Re: [Lustre-discuss] Lustre-1.8.4 : BUG soft lock up
On 08/10/2011 01:40 AM, Jeff Johnson wrote:
> Greetings,
>
> The below console output is from a 1.8.4 OST (RHEL5.5,
> 2.6.18-194.3.1.el5_lustre.1.8.4, x86_64). Not saying it is a Lustre bug
> for sure. Just wondering if anyone has seen this or something very
> similar. Updating to 1.8.6 WC variant isn't an option at this time.

It was stuck in a kernel swap thread for more than 10 seconds. Possibly a race condition on the disk.

> If anyone has some insight into this I'd appreciate the feedback.
>
> Thanks,
>
> --Jeff
>
> BUG: soft lockup - CPU#6 stuck for 10s! [kswapd0:409]

More to the point, it shouldn't be swapping. What is

    sysctl -a | grep swappiness

? and

    cat /proc/meminfo | grep -i swap

Likely you have some process with a memory leak, and you need to flush cache/swap every now and then to make sure it doesn't fill up.

> CPU 6:
> RIP: 0010:[801011bf] [801011bf] dqput+0x105/0x19f

This is a quota put. It has some nice spin locks in there, and there could be some allocations in some of the function calls. I haven't checked.

http://lxr.free-electrons.com/source/fs/quota/dquot.c?a=microblaze#L718

> RSP: 0018:8101be805cd0 EFLAGS: 0202
> RAX: 81012e03f000 RBX: RCX: 81012e03f000
> RDX: ffe2 RSI: 0002 RDI: 81012f4f01c0
> RBP: 81007fb4c918 R08: 81018b00 R09: 81007fb4c918
> R10: 8101be805c60 R11: 8b6448f0 R12: 8101be805c60
> R13: 8b6448f0 R14: ffe2 R15: 8b6448f0
> FS: () GS:8101bfc2adc0() knlGS:
> CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
> CR2: 00402000 CR3: 00201000 CR4: 06e0
>
> Call Trace:
> [8010182b] dquot_drop+0x30/0x5e
> [8b647e83] :ldiskfs:ldiskfs_dquot_drop+0x43/0x70
> [80022d99] clear_inode+0xb4/0x123
> [80034e52] dispose_list+0x41/0xe0
> [8002d6a7] shrink_icache_memory+0x1b7/0x1e6
> [8003f466] shrink_slab+0xdc/0x153
> [80057e59] kswapd+0x343/0x46c
> [800a0ab2] autoremove_wake_function+0x0/0x2e
> [80057b16] kswapd+0x0/0x46c
> [800a089a] keventd_create_kthread+0x0/0xc4
> [80032890] kthread+0xfe/0x132
> [8009d728] request_module+0x0/0x14d
> [8005dfb1] child_rip+0xa/0x11
> [800a089a] keventd_create_kthread+0x0/0xc4
> [80032792] kthread+0x0/0x132
> [8005dfa7] child_rip+0x0/0x11

There are a couple of bugs in RHEL that this could be similar to.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
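
The two checks asked for in the reply above (the `vm.swappiness` value and the swap lines of `/proc/meminfo`) can be gathered in one step. What follows is only a minimal Python sketch, assuming a Linux host with the standard procfs paths. The function names `parse_meminfo`, `swap_summary` and `read_live` are made up for illustration and are not part of Lustre or any RHEL tool.

```python
import re
from pathlib import Path


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its leading integer value.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([^:]+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def swap_summary(meminfo_text, swappiness_text):
    """Summarise swap state from the raw contents of the two proc files."""
    info = parse_meminfo(meminfo_text)
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    return {
        "swappiness": int(swappiness_text.strip()),
        "swap_total_kb": total,
        "swap_used_kb": total - free,
    }


def read_live():
    """Read the live values; intended to be run on the OSS node itself."""
    meminfo = Path("/proc/meminfo").read_text()
    swappiness = Path("/proc/sys/vm/swappiness").read_text()
    return swap_summary(meminfo, swappiness)
```

Calling `read_live()` on the affected server and comparing `swap_used_kb` over time would show whether swap is actually being consumed. Keeping the parsing separate from the file reads also lets the same functions be run against a saved copy of `/proc/meminfo`.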
[Lustre-discuss] Lustre-1.8.4 : BUG soft lock up
Greetings,

The below console output is from a 1.8.4 OST (RHEL5.5, 2.6.18-194.3.1.el5_lustre.1.8.4, x86_64). Not saying it is a Lustre bug for sure. Just wondering if anyone has seen this or something very similar. Updating to 1.8.6 WC variant isn't an option at this time.

If anyone has some insight into this I'd appreciate the feedback.

Thanks,

--Jeff

BUG: soft lockup - CPU#6 stuck for 10s! [kswapd0:409]
CPU 6:
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) jbd2(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) autofs4(U) hidp(U) l2cap(U) bluetooth(U) lockd(U) sunrpc(U) ip6t_REJECT(U) xt_tcpudp(U) ip6table_filter(U) ip6_tables(U) x_tables(U) ib_iser(U) libiscsi2(U) scsi_transport_iscsi2(U) scsi_transport_iscsi(U) ib_srp(U) rds(U) ib_sdp(U) ib_ipoib(U) ipoib_helper(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) rdma_ucm(U) rdma_cm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_cm(U) iw_cm(U) ib_addr(U) ib_sa(U) mptsas(U) mptctl(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) backlight(U) sbs(U) power_meter(U) hwmon(U) i2c_ec(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) mlx4_ib(U) ib_mad(U) ib_core(U) mlx4_en(U) joydev(U) shpchp(U) sg(U) mlx4_core(U) e1000e(U) serio_raw(U) pcspkr(U) i2c_i801(U) i2c_core(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) mptspi(U) scsi_transport_spi(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) ata_piix(U) libata(U) sd_mod(U) scsi_mod(U) raid1(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 409, comm: kswapd0 Tainted: G 2.6.18-194.3.1.el5_lustre.1.8.4 #1
RIP: 0010:[801011bf] [801011bf] dqput+0x105/0x19f
RSP: 0018:8101be805cd0 EFLAGS: 0202
RAX: 81012e03f000 RBX: RCX: 81012e03f000
RDX: ffe2 RSI: 0002 RDI: 81012f4f01c0
RBP: 81007fb4c918 R08: 81018b00 R09: 81007fb4c918
R10: 8101be805c60 R11: 8b6448f0 R12: 8101be805c60
R13: 8b6448f0 R14: ffe2 R15: 8b6448f0
FS: () GS:8101bfc2adc0() knlGS:
CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 00402000 CR3: 00201000 CR4: 06e0

Call Trace:
[8010182b] dquot_drop+0x30/0x5e
[8b647e83] :ldiskfs:ldiskfs_dquot_drop+0x43/0x70
[80022d99] clear_inode+0xb4/0x123
[80034e52] dispose_list+0x41/0xe0
[8002d6a7] shrink_icache_memory+0x1b7/0x1e6
[8003f466] shrink_slab+0xdc/0x153
[80057e59] kswapd+0x343/0x46c
[800a0ab2] autoremove_wake_function+0x0/0x2e
[80057b16] kswapd+0x0/0x46c
[800a089a] keventd_create_kthread+0x0/0xc4
[80032890] kthread+0xfe/0x132
[8009d728] request_module+0x0/0x14d
[8005dfb1] child_rip+0xa/0x11
[800a089a] keventd_create_kthread+0x0/0xc4
[80032792] kthread+0x0/0x132
[8005dfa7] child_rip+0x0/0x11

-- 
Jeff Johnson
Manager
Aeon Computing

jeff.johnson at aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101 f: 858-412-3845

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117
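
A note on reading the trace above: the frames are listed innermost first, so the top of the list shows kswapd calling shrink_slab, then shrink_icache_memory, dispose_list, clear_inode, ldiskfs_dquot_drop and dquot_drop, with the RIP in dqput. That path is inode-cache reclaim releasing quota references. A small Python sketch for pulling the frames out of such console text follows, in case it helps when comparing several lockup reports. The regex and the name `parse_frames` are assumptions of this sketch, written against the frame format shown in this thread (with or without the `<...>` around the address), and are not taken from any kernel or Lustre tool.

```python
import re

# One frame looks like "[addr] symbol+0xOFF/0xSIZE", optionally with a
# ":module:" prefix before the symbol, as printed by this 2.6.18 kernel.
FRAME_RE = re.compile(
    r"\[<?(?P<addr>[0-9a-f]+)>?\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(trace_text):
    """Return the frames found in trace_text, in the order they appear."""
    frames = []
    for match in FRAME_RE.finditer(trace_text):
        frames.append({
            "module": match.group("module"),  # None for core-kernel symbols
            "symbol": match.group("symbol"),
            "offset": int(match.group("offset"), 16),
            "size": int(match.group("size"), 16),
        })
    return frames


def modules_involved(frames):
    """List the distinct loadable modules that appear in the frames."""
    return sorted({f["module"] for f in frames if f["module"]})
```

Applied to the trace in this message, `modules_involved` would report only `ldiskfs`, which matches the single module-prefixed frame in the list.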