Over the Christmas holiday, two nodes on our cluster froze up with the errors 
below:

Dec 27 09:43:13 dqshtc14 kernel: BUG: soft lockup - CPU#0 stuck for 67s! [pvfs2-client-co:4867]
Dec 27 09:43:13 dqshtc14 kernel: Modules linked in: pvfs2(U) mpt2sas scsi_transport_sas raid_class mptctl mptbase ipmi_devintf dell_rbu autofs4 nfs lockd fscache auth_rpcgss nfs_acl sunrpc bonding 8021q garp stp llc ipv6 power_meter sg shpchp bnx2x libcrc32c mdio dcdbas microcode sb_edac edac_core iTCO_wdt iTCO_vendor_support ext4 mbcache jbd2 sd_mod crc_t10dif ahci wmi megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]

Our cluster has been running long enough, and under a heavy enough load, that 
I would have expected to see this already if it were a systemic problem.

After some Googling and reading, we found many reports of this type of error 
on a variety of Linux distros, none involving PVFS, but no solutions were 
offered either. Has anyone in the PVFS community seen these errors before? 
Is this a bug in the PVFS client, in the kernel, or something else?

We are running RHEL 6.4, kernel 2.6.32, and OrangeFS 2.8.7.

Thank you!

-Roger

-----------------------------------------------------------
Roger V. Moye
Systems Analyst III
XSEDE Campus Champion
University of Texas - MD Anderson Cancer Center
Division of Quantitative Sciences
Pickens Academic Tower - FCT4.6109
Houston, Texas
(713) 792-2134
-----------------------------------------------------------

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
