RE: Any known soft lockup issue with vfs_write()->fsnotify()?
There was another report of the same issue on CoreOS, 4.14.11-coreos. The host/guest is AWS G4, so the problem is not limited to Azure VMs. It doesn't happen on older kernels like 4.4. Maybe the problem is related to some (recent) changes in fsnotify or other fs code?

Soft lockup kernel panic reboot on AWS instance on fsnotify and vfs_write #2356
https://github.com/coreos/bugs/issues/2356

Thanks,
- Haiyang

> -----Original Message-----
> From: Jan Kara
> Sent: Monday, March 5, 2018 3:49 PM
> To: Dexuan Cui
> Cc: linux-fsde...@vger.kernel.org; Jan Kara; Amir Goldstein; Miklos Szeredi;
> Haiyang Zhang; 'linux-kernel@vger.kernel.org' <ker...@vger.kernel.org>;
> Jork Loeser
> Subject: Re: Any known soft lockup issue with vfs_write()->fsnotify()?
>
> Hi!
>
> On Fri 02-03-18 22:28:50, Dexuan Cui wrote:
> > Recently people are getting a soft lockup issue with vfs_write()->fsnotify().
> > The detailed calltrace is available at:
> > https://github.com/coreos/bugs/issues/2356
> > https://github.com/coreos/bugs/issues/2364
>
> I didn't see them yet.
>
> > The kernel versions showing the issue are:
> > 4.14.11-coreos
> > 4.14.19-coreos
> > 4.13.0-1009 -- this is the kernel with which I'm personally seeing the
> > lockup.
> >
> > I have not had a chance to try the latest mainline kernel yet.
>
> It would be good to try the 4.15 kernel to see whether recent fixes from Miklos
> didn't fix your problem. They should be present in the 4.14.11/19 kernels as well,
> but one never knows...
>
> > Before the lockup error message suddenly appears, Linux has been
> > running fine for many hours. I have NOT found a consistent way to
> > reproduce the lockup yet.
> >
> > Looks like the kernel is stuck in fsnotify(), when it tries to get the
> > fsnotify_mark_srcu lock.
>
> It is not possible that we would 'hang' in srcu_read_lock() - that is just a
> read of one variable and an increment of another. We'd have to be looping
> somewhere, and the watchdog would have to happen to hit us always at that
> place. Weird. Are you sure RIP points to srcu_read_lock?
>
> > "git log fs/notify/fsnotify.c" on the latest mainline shows that some
> > recent patches might help.
> >
> > I'd like to check if this is a known issue.
>
> As I've mentioned above, so far I didn't see reports like this...
>
> 								Honza
> --
> Jan Kara
> SUSE Labs, CR
Re: Any known soft lockup issue with vfs_write()->fsnotify()?
Hi!

On Fri 02-03-18 22:28:50, Dexuan Cui wrote:
> Recently people are getting a soft lockup issue with vfs_write()->fsnotify().
> The detailed calltrace is available at:
> https://github.com/coreos/bugs/issues/2356
> https://github.com/coreos/bugs/issues/2364

I didn't see them yet.

> The kernel versions showing the issue are:
> 4.14.11-coreos
> 4.14.19-coreos
> 4.13.0-1009 -- this is the kernel with which I'm personally seeing the lockup.
>
> I have not had a chance to try the latest mainline kernel yet.

It would be good to try the 4.15 kernel to see whether recent fixes from Miklos didn't fix your problem. They should be present in the 4.14.11/19 kernels as well, but one never knows...

> Before the lockup error message suddenly appears, Linux has been running
> fine for many hours. I have NOT found a consistent way to reproduce the
> lockup yet.
>
> Looks like the kernel is stuck in fsnotify(), when it tries to get the
> fsnotify_mark_srcu lock.

It is not possible that we would 'hang' in srcu_read_lock() - that is just a read of one variable and an increment of another. We'd have to be looping somewhere, and the watchdog would have to happen to hit us always at that place. Weird. Are you sure RIP points to srcu_read_lock?

> "git log fs/notify/fsnotify.c" on the latest mainline shows that some
> recent patches might help.
>
> I'd like to check if this is a known issue.

As I've mentioned above, so far I didn't see reports like this...

								Honza
--
Jan Kara
SUSE Labs, CR
Any known soft lockup issue with vfs_write()->fsnotify()?
Hi,

Recently people are getting a soft lockup issue with vfs_write()->fsnotify(). The detailed calltrace is available at:
https://github.com/coreos/bugs/issues/2356
https://github.com/coreos/bugs/issues/2364

The kernel versions showing the issue are:
4.14.11-coreos
4.14.19-coreos
4.13.0-1009 -- this is the kernel with which I'm personally seeing the lockup.

I have not had a chance to try the latest mainline kernel yet.

Before the lockup error message suddenly appears, Linux has been running fine for many hours. I have NOT found a consistent way to reproduce the lockup yet.

Looks like the kernel is stuck in fsnotify(), when it tries to get the fsnotify_mark_srcu lock.

"git log fs/notify/fsnotify.c" on the latest mainline shows that some recent patches might help.

I'd like to check if this is a known issue.

Looking forward to your insights!

Thanks,
-- Dexuan

For your convenience, this is a calltrace from the first link:

[67361.641359] watchdog: BUG: soft lockup - CPU#10 stuck for 22s!
[java:87260]
[67361.644600] Modules linked in: xfs xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_comment xt_mark veth nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c crc32c_generic vxlan ip6_udp_tunnel udp_tunnel overlay mousedev psmouse sb_edac i2c_piix4 i2c_core evdev edac_core button xenfs xen_privcmd sch_fq_codel nls_ascii nls_cp437 vfat fat dm_verity dm_bufio ext4 crc16 mbcache jbd2 fscrypto crc32c_intel ata_piix aesni_intel xen_blkfront libata aes_x86_64 crypto_simd cryptd glue_helper scsi_mod ixgbevf dm_mirror dm_region_hash dm_log dm_mod dax
[67361.668103] CPU: 10 PID: 87260 Comm: java Not tainted 4.14.11-coreos #1
[67361.670391] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[67361.672581] task: 90d6009dbc80 task.stack: b2388f704000
[67361.674604] RIP: 0010:fsnotify+0x166/0x520
[67361.675971] RSP: 0018:b2388f707e10 EFLAGS: 0202 ORIG_RAX: ff0c
[67361.678462] RAX: 0001 RBX: RCX:
[67361.680986] RDX: 0001 RSI: 0002 RDI: 907294c0
[67361.683340] RBP: 90d2eddffed8 R08: R09:
[67361.685709] R10: dd941da7a100 R11: R12: 90d2eddfff00
[67361.688199] R13: R14: R15:
[67361.690579] FS: 7f491c3f4700() GS:90d6ef88() knlGS:
[67361.693227] CS: 0010 DS: ES: CR0: 80050033
[67361.695206] CR2: 00c421288300 CR3: 0005ba5fc002 CR4: 001606e0
[67361.697655] Call Trace:
[67361.698499]  vfs_write+0x14f/0x1a0
[67361.699656]  SyS_write+0x52/0xc0
[67361.700745]  do_syscall_64+0x59/0x1c0
[67361.701996]  entry_SYSCALL64_slow_path+0x25/0x25
[67361.703536] RIP: 0033:0x7f4b5566643d
[67361.704751] RSP: 002b:7f491c3f0ef0 EFLAGS: 0293 ORIG_RAX: 0001
[67361.707375] RAX: ffda RBX: 0032 RCX: 7f4b5566643d
[67361.709849] RDX: 00f0 RSI: 7f491c3f0f50 RDI: 0f40
[67361.712205] RBP: 7f491c3f0f20 R08: 7f491c3f1030 R09: 0005f4ff70d8
[67361.714573] R10: 00052f06 R11: 0293 R12: 00f0
[67361.716924] R13: 7f491c3f0f50 R14: 0f40 R15:
[67361.719331] Code: 40 4c 89 7c 24 48 4c 89 7c 24 08 8b 44 24 18 25 00 00 03 00 89 44 24 34 4d 85 e4 0f 95 c2 48 83 7c 24 08 00 0f 95 c1 89 c8 08 d0 <0f> 84 96 03 00 00 84 d2 0f 84 e8 02 00 00 48 8b 44 24 40 84 c9