Hi all,
We have a couple of Thumpers here, one running Sol10u8 with the latest patches
applied (just done this morning). After problems with the 10 Gb/s NIC we
disabled it and moved to the on-board e1000g0.
Now, when users launch jobs and hit the box relatively hard, we get very
interesting numbers:
s09:~# zpool iostat 18
               capacity    operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
atlashome   7.64T  12.8T    512    251  51.2M  28.1M
atlashome   7.64T  12.8T  1.28K    174   145M  8.24M
atlashome   7.64T  12.8T  1.17K    123   133M  7.26M
atlashome   7.64T  12.8T  1.60K     36   202M  3.06M
atlashome   7.64T  12.8T  1.37K    106   173M  3.94M
atlashome   7.64T  12.8T  1.31K      8   164M   956K
atlashome   7.64T  12.8T  1.14K     63   144M  2.57M
atlashome   7.64T  12.8T  1.73K     48   218M  2.02M
atlashome   7.64T  12.8T  1.45K      3   184M   429K
atlashome   7.64T  12.8T  1.66K     44   210M  1.55M
atlashome   7.64T  12.8T  1.75K      4   219M   549K
atlashome   7.64T  12.8T  1.88K     34   238M  1.12M
atlashome   7.64T  12.8T  1.62K     26   205M  1.32M
atlashome   7.64T  12.8T  1.79K      9   224M  1.15M
atlashome   7.64T  12.8T  2.12K     42   269M  2.95M
atlashome   7.64T  12.8T  2.35K     14   298M  1.71M
atlashome   7.64T  12.8T  3.12K     49   397M  4.00M
atlashome   7.64T  12.8T  3.61K     55   460M  3.91M
atlashome   7.64T  12.8T  4.32K      7   550M   902K
atlashome   7.64T  12.8T  4.12K     44   525M  3.32M
atlashome   7.64T  12.8T  5.05K      5   644M   643K
atlashome   7.64T  12.8T  4.33K     34   553M  1.70M
atlashome   7.64T  12.8T  4.52K     30   577M  1.69M
atlashome   7.64T  12.8T  4.70K      3   600M   427K
atlashome   7.64T  12.8T  4.71K     36   599M  2.43M
atlashome   7.64T  12.8T  4.49K      2   569M   314K
atlashome   7.64T  12.8T  6.56K     40   832M  2.89M
atlashome   7.64T  12.8T  5.78K     46   735M  3.58M
atlashome   7.64T  12.8T  5.97K      3   759M   345K
atlashome   7.64T  12.8T  6.03K     45   765M  3.29M
atlashome   7.64T  12.8T  5.18K      2   658M   309K
atlashome   7.64T  12.8T  5.64K     37   710M  2.42M
atlashome   7.64T  12.8T  5.44K     42   685M  2.95M
atlashome   7.64T  12.8T  4.73K      4   590M   454K
atlashome   7.64T  12.8T  3.60K     53   447M  3.46M
atlashome   7.64T  12.8T  3.76K     59   469M  2.82M
atlashome   7.64T  12.8T  2.95K     51   367M  1.52M
atlashome   7.64T  12.8T  1.52K     53   191M  1.18M
atlashome   7.64T  12.8T  3.48K     32   434M  1.11M
atlashome   7.64T  12.8T  3.41K     21   432M   533K
atlashome   7.64T  12.8T  3.58K     41   454M  1.56M
atlashome   7.64T  12.8T  2.71K     39   342M  1.36M
Read from remote host s09: Connection timed out
At this point the system crashed. Can someone explain why the pool is reading
data off the disks at 5-8 times the possible bandwidth of the single Gbit
interface? Could this just be a combination of the large record size (128K),
compression being on, and the users reading very tiny files?
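A back-of-the-envelope check of that hypothesis (the 4 KB average file size is my assumption, not a measured value): if every tiny-file read fetches at least one full 128K record off the disks, the amplification alone can account for the gap between the iostat numbers and the wire.

```python
# Rough read-amplification estimate for the numbers above.
# The 4 KB average file size is an assumption, not a measurement.
RECORDSIZE   = 128 * 1024   # recordsize mentioned in the post (128K)
AVG_FILE     = 4 * 1024     # assumed size of a "very tiny" file
PEAK_READ_MB = 832          # peak disk read bandwidth seen in zpool iostat

# Reading a tiny file still fetches at least one full record from disk,
# so each 4 KB of payload can cost up to 128 KB of disk reads.
amplification = RECORDSIZE // AVG_FILE      # 32x
useful_mb = PEAK_READ_MB / amplification    # payload that reaches the wire

print(f"worst-case amplification: {amplification}x")
print(f"useful payload at the 832 MB/s peak: {useful_mb:.0f} MB/s")
```

That would put the payload actually leaving the box at roughly 26 MB/s, comfortably below what a single Gbit link (~119 MB/s raw) can carry, so tiny-file reads against 128K records could indeed explain the disk-side numbers.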
Also, even on the ILOM console I'm totally stuck:
last pid: 2115;  load avg: 101.4, 104.9, 61.3;  up 0+00:32:32    09:49:08
46 processes: 43 sleeping, 2 running, 1 on cpu
CPU states: 31.2% idle, 0.1% user, 68.7% kernel, 0.0% iowait, 0.0% swap
Memory: 16G phys mem, 62M free mem, 4001M swap, 3822M free swap
Feb 25 09:46:55 s09 nfssrv: WARNING: nfsauth: mountd not responding
  PID USERNAME LWP PRI NICE   SIZE    RES STATE   TIME    CPU COMMAND
  736 daemon   999  60  -20    14M   736K sleep  11:56 56.08% nfsd
  160 root      47  59    0  8920K  1152K sleep   0:04  0.33% nscd
 1796 noaccess  18  59    0   189M  1008K sleep   0:21  0.17% java
 2004 root       1  59    0  3396K   236K cpu     0:04  0.12% top
 2012 root       1  60    0  2960K   212K sleep   0:00  0.10% bash
 1915 root       1  59    0  8564K   256K sleep   0:00  0.10% master
    9 root      16  59    0    11M   424K sleep   0:05  0.09% svc.configd
  616 root      23  59    0    21M   868K run     0:13  0.08% fmd
  485 root       3  59    0  2816K   188K sleep   0:00  0.07% automountd
    7 root      14  59    0    15M   324K sleep   0:01  0.07% svc.startd
    1 root       1  59    0  2496K     4K sleep   0:00  0.07% init
 1962 postfix    1  60    0  8768K   264K sleep   0:00  0.06% qmgr
  357 daemon     1  60    0  4540K   288K sleep   0:01  0.06% rpcbind
 2006 root       1  60    0  6328K   164K sleep   0:00  0.06% sshd
  427 root       1  59    0  1436K   248K sleep   0:00  0.04% utmpd
zfs ^C^C^C^C^C^C^C^C
ifconfig -a
s09:~#
s09:~# ifconfig -a
-bash: fork: Resource temporarily unavailable
s09:~# free
-bash: fork: Resource temporarily unavailable
s09:~# Feb 25 09:50:03 s09 sshd[591]: error: fork: Error 0
s09:~# uptime
-bash: fork: Resource temporarily unavailable
s09:~# reboot
-bash: fork: Resource temporarily unavailable
s09:~# shutdown -i 5 -y -g 0
-bash: fork: Resource temporarily unavailable
Any idea how to find out what's going wrong here?
cheers
Carsten
_______________________________________________
Solaris-Users mailing list
[email protected]
http://www.filibeto.org/mailman/listinfo/solaris-users