Hi!
Thank you for the tips!
Sorry for the long mail, but I want to show you some server statistics.
To answer a few further questions:
1. sysctl -w net.inet.ip.mtudisc=0
doesn't have any effect
2. no important messages in /var/log/messages during up-/download
3. Samba downloads show similar behaviour
4. And I ran some further tests with ifstat, iostat and netperf
(see below)
My assumptions:
I think there are some problems with my hdd, especially since dmesg says:
-
root on wd0a
rootdev=0x0 rrootdev=0x300 rawdev=0x302
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 512
c_skip: 0
wd0e: DMA error writing fsbn 2651200 of 2651200-2651211 (wd0 bn 3819472; cn
3789 tn 2 sn 34), retrying
wd0: soft error (corrected)
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 0
c_skip: 0
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 512
c_skip: 0
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 8192
c_skip: 0
wd0e: device timeout writing fsbn 2651200 of 2651200-2651215 (wd0 bn 3819472;
cn 3789 tn 2 sn 34), retrying
wd0: soft error (corrected)
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 512
c_skip: 0
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 512
c_skip: 0
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 512
c_skip: 0
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 6144
c_skip: 0
wd0e: device timeout writing fsbn 2653908 of 2653908-2653919 (wd0 bn 3822180;
cn 3791 tn 13 sn 33), retrying
wd0: soft error (corrected)
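The messages above can be tallied with a short script to see how often each kind of event occurs and which blocks are affected. This is only a minimal sketch in Python: the function name tally_wd_errors and the category names are made up for illustration, the sample holds a few lines copied from the excerpt (with the wrapped lines joined), and the patterns assume the message wording shown here.

```python
import re
from collections import Counter

# A few lines copied from the dmesg excerpt; wrapped lines are joined again.
SAMPLE = """\
wd0(pciide0:0:0): timeout
wd0e: DMA error writing fsbn 2651200 of 2651200-2651211 (wd0 bn 3819472; cn 3789 tn 2 sn 34), retrying
wd0: soft error (corrected)
wd0e: device timeout writing fsbn 2653908 of 2653908-2653919 (wd0 bn 3822180; cn 3791 tn 13 sn 33), retrying
"""

def tally_wd_errors(text):
    """Count controller timeouts, retried writes and corrected soft errors.

    Returns (counts, blocks), where blocks is the sorted list of distinct
    starting fsbn values named in the retried-write messages.
    """
    counts = Counter()
    blocks = set()
    for line in text.splitlines():
        if re.match(r"wd\d+\(.*\): timeout", line):
            counts["timeout"] += 1
        elif "soft error (corrected)" in line:
            counts["soft_error"] += 1
        m = re.search(r"(DMA error|device timeout) writing fsbn (\d+)", line)
        if m:
            counts["retried_write"] += 1
            blocks.add(int(m.group(2)))
    return counts, sorted(blocks)

if __name__ == "__main__":
    counts, blocks = tally_wd_errors(SAMPLE)
    print(dict(counts))
    print("affected fsbn:", blocks)
```

Feeding it the full dmesg text instead of SAMPLE shows whether the retries keep hitting the same few blocks.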
After running a self test with smartctl the results do not look very good:
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed: read failure        40%             2076              169914
And if you have a look at the iostat results you can see repeating patterns
like these:
       wd0               wd0              cpu
  KB/t t/s  MB/s     KB xfr  time  us ni sy in  id
 64.00   1  0.06     64   1  3.50   1  0  0  0  99
  0.00   0  0.00      0   0  0.00   1  0  0  0  99
  0.00   0  0.00      0   0  0.00   0  0  0  0 100
  0.00   0  0.00      0   0  0.00   0  0  0  1  99
Does that mean that the hdd is still busy during the 'zero lines', even though
no more data is passing through the I/O controller? The 3.5 seconds cannot be
a real measured value, because it was recorded within a span of 1 second. So
the transfer actually has to extend into the next 3 seconds, but there it no
longer shows up in the statistics. (I hope you can see what I mean.)
So I think the problem is my hdd.
What do you think?
Jo
netserver on OpenBSD box
netperf on Linux box
netperf -H OpenBSD -l 20
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 16384  16384  16384    20.02       7.74
netserver on Linux box
netperf on OpenBSD box
netperf -H Linux -l 20
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    20.02       7.89
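netperf reports throughput in 10^6 bits/sec, while iostat reports MB/s, so the two are easier to compare after a unit conversion. A minimal sketch in Python, assuming 8 bits per byte and decimal megabytes (iostat may count a megabyte as 2^20 bytes, which would make its figures slightly smaller); the function name is made up for illustration.

```python
def mbit_to_mbyte_per_s(mbit_per_s):
    """Convert 10^6 bits/sec (as printed by netperf) to 10^6 bytes/sec."""
    return mbit_per_s / 8.0

if __name__ == "__main__":
    # Throughput values from the two netperf runs above.
    for label, rate in (("netperf -H OpenBSD", 7.74), ("netperf -H Linux", 7.89)):
        print(f"{label}: {rate} Mbit/s = {mbit_to_mbyte_per_s(rate):.2f} MB/s")
```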
12 MB upload via scp (Linux host - OpenBSD server)
with scp running on Linux box and sshd running on OpenBSD
iostat -w 1 -D -d -C wd0
       wd0               wd0              cpu
  KB/t t/s  MB/s     KB xfr  time  us ni sy in  id
 15.10   0  0.00      2   0  0.99   0  0  0  0 100
  0.00   0  0.00      0   0  0.00   0  0  0  0 100
  0.00   0  0.00      0   0  0.00  16  0  5  0  79
 16.00   5  0.08     79   5  5.95   0  0  0  0 100
 16.00   1  0.02     16   1  0.99   0  0  0  0 100
  0.00   0  0.00      0   0  0.00   5  0  4  0  91
 15.33   3  0.04     46   3  2.59   0  0  0  0 100
 24.00  12  0.28    286  12  1.09   5  0  2  2  91
 37.33   9  0.33    333   9  1.00   8  0  3  6  83
 64.00  14  0.86    882  14  0.97  18  0  8  5  68
 64.00  13  0.81    826  13  0.97  17  0  3  9  71
 64.00  15  0.93    953  15  1.04  12  0  5  8  75
 53.89  19  0.99   1016  19  1.01  22  0  8  4  67
 64.00  14  0.86    882  14  0.95  21  0  5  8  66
 64.00  15  0.93    953  15  1.02  22  0  6  9  63
 64.00  15  0.93    953  15  0.98  30  0  9  6  55
 56.00  18  0.97    992  18  1.00  13  0 10  9  68
 64.00  13  0.81    826  13  1.00  19  0  5  5  71
 64.00  14  0.87    889  14  0.95  12  0  5  4  79
 64.00  15  0.93    953  15  1.07  26  0  8  2  64
       wd0               wd0              cpu
  KB/t t/s  MB/s     KB xfr  time  us ni sy in  id
 64.00  15  0.92    945  15  0.98  15  0  5  5  75
 64.00   3  0.19    191   3  0.20   6  0  2  2  90
  0.00   0  0.00      0   0  0.00   0  0  0  0 100
12 MB download via scp (OpenBSD server - Linux host)
with scp running on Linux box and sshd running on OpenBSD
iostat -w 1 -D -d -C wd0
wd0wd0 cpu
KB/t t/s MB/sKB