Re: [Lustre-discuss] Lustre-1.8.4 : BUG soft lock up

2011-08-10 Thread Joe Landman
On 08/10/2011 01:40 AM, Jeff Johnson wrote:
> Greetings,
>
> The below console output is from a 1.8.4 OST (RHEL5.5,
> 2.6.18-194.3.1.el5_lustre.1.8.4, x86_64). Not saying it is a Lustre bug
> for sure. Just wondering if anyone has seen this or something very
> similar. Updating to 1.8.6 WC variant isn't an option at this time.

CPU 6 was stuck in the kernel swap thread (kswapd0) for more than 10
seconds.  Possibly a race condition on the disk.


> If anyone has some insight into this I'd appreciate the feedback.
>
> Thanks,
>
> --Jeff
>
> BUG: soft lockup - CPU#6 stuck for 10s! [kswapd0:409]

More to the point, it shouldn't be swapping.  What do these report?

sysctl -a | grep swappiness

and

cat /proc/meminfo  | grep -i swap
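
For what it's worth, the two checks above can be rolled into one small script. This is only a sketch: the function name swap_summary and the returned key names are made up for illustration, and it assumes the usual "Key:   value kB" layout of /proc/meminfo.

```python
import re
from pathlib import Path


def swap_summary(meminfo_text):
    """Return swap figures (in kB) parsed from /proc/meminfo-style text.

    Hypothetical helper for illustration; not part of any Lustre or
    RHEL tooling.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        # Lines of interest look like "SwapTotal:      2097144 kB"
        match = re.match(r"^(Swap\w+):\s+(\d+)\s*kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    return {
        "total_kb": total,
        "free_kb": free,
        "used_kb": total - free,
        "cached_kb": fields.get("SwapCached", 0),
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    swappiness = Path("/proc/sys/vm/swappiness")
    # Only read the files if they exist, so the script is harmless elsewhere.
    if meminfo.exists():
        print(swap_summary(meminfo.read_text()))
    if swappiness.exists():
        print("vm.swappiness =", swappiness.read_text().strip())
```

If used_kb comes back well above zero on the OSS, that would be consistent with something pushing pages out to swap.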

Likely you have some process with a memory leak, and you need to flush 
cache/swap every now and then to make sure it doesn't fill up.

> CPU 6:
>
> RIP: 0010:[801011bf]  [801011bf] dqput+0x105/0x19f

This is a quota put: dqput() drops a reference on a dquot structure.  It
takes some spin locks in there, and there could be some allocations in
some of the function calls.  I haven't checked.

http://lxr.free-electrons.com/source/fs/quota/dquot.c?a=microblaze#L718
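
As a reading aid for the RIP line and the call trace: each frame has the form symbol+0xOFFSET/0xSIZE, where OFFSET is how far into the function the address lies and SIZE is the length of the function, with an optional :module: prefix for code in a loadable module. A small sketch that pulls those fields apart follows. The names FRAME_RE and parse_frame are made up for this illustration and are not part of any kernel tooling.

```python
import re

# Matches "dqput+0x105/0x19f" and ":ldiskfs:ldiskfs_dquot_drop+0x43/0x70".
FRAME_RE = re.compile(
    r"(?::(?P<module>\w+):)?"      # optional ":module:" prefix
    r"(?P<symbol>\w+)"             # function name
    r"\+0x(?P<offset>[0-9a-f]+)"   # offset into the function
    r"/0x(?P<size>[0-9a-f]+)"      # total size of the function
)


def parse_frame(text):
    """Return module, symbol, offset and size for one trace line, or None."""
    match = FRAME_RE.search(text)
    if match is None:
        return None
    return {
        "module": match.group("module"),
        "symbol": match.group("symbol"),
        "offset": int(match.group("offset"), 16),
        "size": int(match.group("size"), 16),
    }


if __name__ == "__main__":
    for line in (
        "RIP: 0010:[801011bf]  [801011bf] dqput+0x105/0x19f",
        "[8b647e83] :ldiskfs:ldiskfs_dquot_drop+0x43/0x70",
    ):
        print(parse_frame(line))
```

For the RIP line this gives an offset of 261 bytes into a 415-byte dqput(), so the CPU was somewhere in the middle of the function rather than at its entry.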

> RSP: 0018:8101be805cd0  EFLAGS: 0202
> RAX: 81012e03f000 RBX:  RCX: 81012e03f000
> RDX: ffe2 RSI: 0002 RDI: 81012f4f01c0
> RBP: 81007fb4c918 R08: 81018b00 R09: 81007fb4c918
> R10: 8101be805c60 R11: 8b6448f0 R12: 8101be805c60
> R13: 8b6448f0 R14: ffe2 R15: 8b6448f0
> FS:  () GS:8101bfc2adc0() knlGS:
> CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> CR2: 00402000 CR3: 00201000 CR4: 06e0
>
> Call Trace:
> [8010182b] dquot_drop+0x30/0x5e
> [8b647e83] :ldiskfs:ldiskfs_dquot_drop+0x43/0x70
> [80022d99] clear_inode+0xb4/0x123
> [80034e52] dispose_list+0x41/0xe0
> [8002d6a7] shrink_icache_memory+0x1b7/0x1e6
> [8003f466] shrink_slab+0xdc/0x153
> [80057e59] kswapd+0x343/0x46c
> [800a0ab2] autoremove_wake_function+0x0/0x2e
> [80057b16] kswapd+0x0/0x46c
> [800a089a] keventd_create_kthread+0x0/0xc4
> [80032890] kthread+0xfe/0x132
> [8009d728] request_module+0x0/0x14d
> [8005dfb1] child_rip+0xa/0x11
> [800a089a] keventd_create_kthread+0x0/0xc4
> [80032792] kthread+0x0/0x132
> [8005dfa7] child_rip+0x0/0x11

There are a couple of bugs in RHEL that this could be similar to.




-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Lustre-1.8.4 : BUG soft lock up

2011-08-09 Thread Jeff Johnson
Greetings,

The below console output is from a 1.8.4 OST (RHEL5.5, 
2.6.18-194.3.1.el5_lustre.1.8.4, x86_64). Not saying it is a Lustre bug 
for sure. Just wondering if anyone has seen this or something very 
similar. Updating to 1.8.6 WC variant isn't an option at this time.

If anyone has some insight into this I'd appreciate the feedback.

Thanks,

--Jeff

BUG: soft lockup - CPU#6 stuck for 10s! [kswapd0:409]
CPU 6:
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) 
jbd2(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U)
osc(U) ksocklnd(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) 
autofs4(U) hidp(U) l2cap(U) bluetooth(U)
lockd(U) sunrpc(U) ip6t_REJECT(U) xt_tcpudp(U) ip6table_filter(U) ip6_tables(U) 
x_tables(U) ib_iser(U) libiscsi2(U)
scsi_transport_iscsi2(U) scsi_transport_iscsi(U) ib_srp(U) rds(U) ib_sdp(U) 
ib_ipoib(U) ipoib_helper(U) ipv6(U) xfrm_nalgo(U)
crypto_api(U) rdma_ucm(U) rdma_cm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_cm(U) 
iw_cm(U) ib_addr(U) ib_sa(U) mptsas(U) mptctl(U)
dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) backlight(U) sbs(U) 
power_meter(U) hwmon(U) i2c_ec(U) dell_wmi(U) wmi(U)
button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) 
parport(U) mlx4_ib(U) ib_mad(U) ib_core(U)
mlx4_en(U) joydev(U) shpchp(U) sg(U) mlx4_core(U) e1000e(U) serio_raw(U) 
pcspkr(U) i2c_i801(U) i2c_core(U) dm_raid45(U)
dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) mptspi(U) 
scsi_transport_spi(U) mptscsih(U) mptbase(U)
scsi_transport_sas(U) ata_piix(U) libata(U) sd_mod(U) scsi_mod(U) raid1(U) 
ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 409, comm: kswapd0 Tainted: G  2.6.18-194.3.1.el5_lustre.1.8.4 #1
RIP: 0010:[801011bf]  [801011bf] dqput+0x105/0x19f
RSP: 0018:8101be805cd0  EFLAGS: 0202
RAX: 81012e03f000 RBX:  RCX: 81012e03f000
RDX: ffe2 RSI: 0002 RDI: 81012f4f01c0
RBP: 81007fb4c918 R08: 81018b00 R09: 81007fb4c918
R10: 8101be805c60 R11: 8b6448f0 R12: 8101be805c60
R13: 8b6448f0 R14: ffe2 R15: 8b6448f0
FS:  () GS:8101bfc2adc0() knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 00402000 CR3: 00201000 CR4: 06e0

Call Trace:
  [8010182b] dquot_drop+0x30/0x5e
  [8b647e83] :ldiskfs:ldiskfs_dquot_drop+0x43/0x70
  [80022d99] clear_inode+0xb4/0x123
  [80034e52] dispose_list+0x41/0xe0
  [8002d6a7] shrink_icache_memory+0x1b7/0x1e6
  [8003f466] shrink_slab+0xdc/0x153
  [80057e59] kswapd+0x343/0x46c
  [800a0ab2] autoremove_wake_function+0x0/0x2e
  [80057b16] kswapd+0x0/0x46c
  [800a089a] keventd_create_kthread+0x0/0xc4
  [80032890] kthread+0xfe/0x132
  [8009d728] request_module+0x0/0x14d
  [8005dfb1] child_rip+0xa/0x11
  [800a089a] keventd_create_kthread+0x0/0xc4
  [80032792] kthread+0x0/0x132
  [8005dfa7] child_rip+0x0/0x11


-- 
Jeff Johnson
Manager
Aeon Computing

jeff.johnson at aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117
