[Lustre-discuss] More: OSS crashes

2008-07-31 Thread Thomas Roth
Hi all,

I'm still able to bring my OSSs to a standstill, if not crash them outright.
Having reduced the number of stress jobs writing to Lustre (stress -d 2 
--hdd-noclean --hdd-bytes 5M) to four, and having reduced the number of 
OSS threads (options ost oss_num_threads=256 in /etc/modprobe.d/lustre), 
the OSSs no longer freeze entirely. Instead, after ~15 hours:
- all stress jobs have terminated with Input/output error
- the MDT has marked the affected OSTs as Inactive
- the already open connections to the OSS remain active
- interactive collectl, 'watch df' and top sessions are still working
- the number of ll_ost threads is 256 (the number of ll_ost_io is 257?)
- log file writing has obviously stopped after only 10 hours
- already open shells allow commands like ps; I can kill some processes
- new ssh login doesn't work
- any access to disk, as with ls, brings the system to a total freeze
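For reference, the setup described above amounts to the following two pieces; the paths and values are as stated in this thread, and the stress flags are the standard ones from the stress(1) tool:

```shell
# /etc/modprobe.d/lustre -- cap the number of OSS service threads
options ost oss_num_threads=256

# One of the four stress jobs writing to Lustre: two hdd workers, each
# writing 5 MB files; --hdd-noclean leaves the written files in place.
stress -d 2 --hdd-noclean --hdd-bytes 5M
```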

The process table shows six ll_ost_io threads, all using 38.9% CPU and 
all running for 419:21 min.; all the rest are sleeping.
The cause can't be system overload or simply faulty hardware. To give 
an impression of what is going on, I'm quoting the last collectl 
record:

##
### RECORD  139  (1217475195.342) (Thu Jul 31 05:33:15 2008) ###

# CPU SUMMARY (INTR, CTXSW & PROC /sec)
# USER  NICE   SYS  WAIT   IRQ  SOFT STEAL  IDLE  INTR  CTXSW  PROC  RUNQ   RUN   AVG1   AVG5  AVG15
     0     0    14    20     0     5     0    58   425   553K     1   736     6  22.06  31.28  31.13

# DISK SUMMARY (/sec)
#KBRead RMerged  Reads SizeKB  KBWrite WMerged Writes SizeKB
      0       0      0      0    83740     314    861     97

# LUSTRE FILESYSTEM SINGLE OST STATISTICS
#Ost       KBRead   Reads  KBWrite  Writes
OST0004         0       0    40674      63
OST0005         0       0    40858      66
##


That's not too much for the machine, I'd reckon. And as mentioned in an 
earlier post, I have run the very same 'stress' test, also with CPU load 
or I/O load only, locally on machines that had crashed earlier. The test 
runs that wrote to disk finished only when the disks were 100% full 
(the disks were formatted plain ext3 at the time); the tests with I/O 
load = 500 and CPU load = 1k have been running for three days now. Of 
course I don't know how reliable these tests are.
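Assuming the usual stress(1) flags, the local runs just described would look roughly like this; the worker counts are my reading of "I/O load = 500" and "CPU load = 1k":

```shell
stress -d 2 --hdd-noclean --hdd-bytes 5M   # disk run: finishes when the disk is 100% full
stress -i 500                              # 500 sync() workers ("I/O load = 500")
stress -c 1000                             # 1000 sqrt() spinners ("CPU load = 1k")
```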

Looks to me as if a few Lustre threads for some reason can't process 
their I/O any more, building up pressure and finally blocking all 
(disk) I/O.
Knowing that reason, and how to avoid it, would relieve not only these 
servers of some pressure... ;-)

Hm, hardware: the cluster is running Debian Etch, kernel 2.6.22, Lustre 
1.6.5. The OSSs are Supermicro X7DB8 file servers, Xeon E5320, 8 GB RAM, 
with 16 internal disks on two 3ware 9650 RAID controllers, forming two 
OSTs each.

Many thanks for any further hints,
Thomas

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] More: OSS crashes

2008-07-31 Thread Andreas Dilger
On Jul 31, 2008  20:45 +0200, Thomas Roth wrote:
 I'm still successful in bringing my OSSs to a standstill if not crashing 
 them.
 Having reduced the number of stress jobs writing to Lustre (stress -d 2 
 --hdd-noclean --hdd-bytes 5M) to four, and having reduced the number of 
 OSS threads (options ost oss_num_threads=256 in /etc/modprobe.d/lustre), 
 the OSS do not freeze entirely any more. Instead after ~ 15 hours,
 - all stress jobs have terminated with Input/output error
 - the MDT has marked the affected OSTs as Inactive
 - the already open connections to the OSS remain active
 - interactive collectl, watch df, top sessions are still working
 - the number of ll_ost threads is 256 ( number of ll_ost_io is 257 ?)
 - log file writing has obviously stopped after only 10 hours
 - already open shells  allow commands like ps, I can kill some processes
 - new ssh login doesn't work
 - access to disk, as in ls, brings the system to total freeze
 
 The process table shows six ll_ost_io - threads, all using 38.9% cpu, 
 all running for 419:21m. All the rest are sleeping.
 The cause can't be system overloading or simple faulty hardware.

You need to look at the process table (sysrq-t) and get the stacks of
the running and blocked Lustre processes.  Also useful would be the
memory information (sysrq-m), to see if the node is out of free memory
and, if so, where it has gone.

If you can still run some commands, then cat /proc/slabinfo may
also be useful.
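On a box that still has a working shell, the SysRq dumps mentioned above can be triggered through /proc (needs root; the output goes to the kernel log and console, so a serial console or netconsole helps when the local disk is already wedged):

```shell
echo 1 > /proc/sys/kernel/sysrq   # make sure magic SysRq is enabled
echo t > /proc/sysrq-trigger      # sysrq-t: stack trace of every task
echo m > /proc/sysrq-trigger      # sysrq-m: memory / zone usage summary
cat /proc/slabinfo                # slab cache usage, if cat still works
```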

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
