slow network performance (part 2)

2006-02-15 Joachim Mathes
Hi!

Thank you for the tips!

Sorry for the long mail, but I want to show you some server statistics.

To answer a few further questions:

1. sysctl -w net.inet.ip.mtudisc=0
   doesn't have any effect
2. no important messages in /var/log/messages during up-/download
3. Samba downloads show similar behaviour
4. I also ran some further tests with ifstat, iostat and netperf
   (see below)

My assumption:
I think there is a problem with my hdd, especially since dmesg says:

-
root on wd0a
rootdev=0x0 rrootdev=0x300 rawdev=0x302
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 512
c_skip: 0
wd0e: DMA error writing fsbn 2651200 of 2651200-2651211 (wd0 bn 3819472; cn 3789 tn 2 sn 34), retrying
wd0: soft error (corrected)
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 0
c_skip: 0
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 512
c_skip: 0
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 8192
c_skip: 0
wd0e: device timeout writing fsbn 2651200 of 2651200-2651215 (wd0 bn 3819472; cn 3789 tn 2 sn 34), retrying
wd0: soft error (corrected)
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 512
c_skip: 0
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 512
c_skip: 0
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 512
c_skip: 0
wd0(pciide0:0:0): timeout
type: ata
c_bcount: 6144
c_skip: 0
wd0e: device timeout writing fsbn 2653908 of 2653908-2653919 (wd0 bn 3822180; cn 3791 tn 13 sn 33), retrying
wd0: soft error (corrected)
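Tallying the absolute block numbers ("wd0 bn N") in these messages shows whether the errors keep hitting the same spot. A minimal sketch follows. The function name failing_blocks is hypothetical, and it assumes the message format shown above with each message on a single line.

```sh
# Hypothetical helper: print each absolute block number ("wd0 bn N")
# found in wd0 error/timeout messages, followed by how many times it
# was reported. Assumes the message format from the dmesg excerpt,
# one message per line. Lines without "(wd0 bn N;" are ignored.
failing_blocks() {
    sed -n 's/.*(wd0 bn \([0-9][0-9]*\);.*/\1/p' |
        awk '{ count[$1]++ } END { for (b in count) print b, count[b] }' |
        sort -n
}

# Usage on the OpenBSD box:
#   dmesg | failing_blocks
```

For the three messages above it would report block 3819472 twice and block 3822180 once.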


I also ran an extended self-test with smartctl, and the results do not look very good:

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       40%             2076              169914


In the iostat results you can also see repeating patterns like this one:

      wd0           wd0           cpu
  KB/t t/s MB/s   KB xfr time  us ni sy in  id
 64.00   1 0.06   64   1 3.50   1  0  0  0  99
  0.00   0 0.00    0   0 0.00   1  0  0  0  99
  0.00   0 0.00    0   0 0.00   0  0  0  0 100
  0.00   0 0.00    0   0 0.00   0  0  0  1  99


Does that mean the hdd is busy during the all-zero lines, even though no
more data passes through the IO controller? The 3.50 seconds cannot be a
real measured value, because each line covers a span of only 1 second. So
the transfer must actually extend into the next 3 seconds, but there it no
longer shows up in the statistics. (I hope you know what I mean.)
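The check described above (a disk time value larger than the 1-second sampling interval) can be done mechanically over a longer iostat run. A minimal awk sketch follows. The function name flag_long_samples and the INTERVAL variable are hypothetical; it assumes the column order of iostat -w 1 -D -d -C wd0 as shown in these tables, where "time" is the sixth field.

```sh
# Hypothetical helper: flag iostat samples whose disk "time" column
# exceeds the sampling interval. Assumes the column layout
#   KB/t t/s MB/s KB xfr time us ni sy in id
# so "time" is field 6 and "xfr" is field 5. INTERVAL should match
# the -w argument given to iostat (default 1 second).
flag_long_samples() {
    awk -v interval="${INTERVAL:-1}" '
        $1 !~ /^[0-9]+\.[0-9]+$/ { next }   # skip header lines
        { sample++ }                        # count data lines only
        $6 + 0 > interval + 0 {
            printf "sample %d: time=%.2f s > interval=%s s (xfr=%d)\n", sample, $6, interval, $5
        }
    '
}

# Usage on the OpenBSD box:
#   iostat -w 1 -D -d -C wd0 | flag_long_samples
```

On the pattern above it prints one line for the 64 KB sample with time 3.50 and nothing for the all-zero lines that follow it.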

So I think the problem is my hdd.

What do you think?

Jo
netserver on OpenBSD box
netperf on Linux box

 netperf -H OpenBSD -l 20
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 16384  16384  16384    20.02       7.74

netserver on Linux box
netperf on OpenBSD box

 netperf -H Linux -l 20
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    20.02       7.89
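netperf reports throughput in 10^6 bits/sec, while iostat reports KB and MB/s, so converting one into the other puts both on the same scale. A minimal sketch follows. The function name is hypothetical, and it assumes iostat's KB and MB mean 1024 and 1048576 bytes.

```sh
# Hypothetical helper: convert a netperf throughput figure, given in
# 10^6 bits/sec, into KB/s and MB/s. Assumption: KB = 1024 bytes and
# MB = 1048576 bytes, as iostat is assumed to use here.
mbit_to_iostat_units() {
    awk -v mbit="$1" 'BEGIN {
        bytes = mbit * 1000000 / 8          # 10^6 bits/s -> bytes/s
        printf "%.1f KB/s, %.2f MB/s\n", bytes / 1024, bytes / 1048576
    }'
}

# Usage:
#   mbit_to_iostat_units 7.74
#   mbit_to_iostat_units 7.89
```

For the two runs above this gives about 945 KB/s (0.92 MB/s) and 963 KB/s (0.94 MB/s), which is in the same range as the roughly 950 KB per second that iostat shows during the scp upload.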



12 MB upload via scp (Linux host to OpenBSD server),
with scp running on the Linux box and sshd running on OpenBSD

 iostat -w 1 -D -d -C wd0
      wd0           wd0           cpu
  KB/t t/s MB/s   KB xfr time  us ni sy in  id
 15.10   0 0.00    2   0 0.99   0  0  0  0 100
  0.00   0 0.00    0   0 0.00   0  0  0  0 100
  0.00   0 0.00    0   0 0.00  16  0  5  0  79
 16.00   5 0.08   79   5 5.95   0  0  0  0 100
 16.00   1 0.02   16   1 0.99   0  0  0  0 100
  0.00   0 0.00    0   0 0.00   5  0  4  0  91
 15.33   3 0.04   46   3 2.59   0  0  0  0 100
 24.00  12 0.28  286  12 1.09   5  0  2  2  91
 37.33   9 0.33  333   9 1.00   8  0  3  6  83
 64.00  14 0.86  882  14 0.97  18  0  8  5  68
 64.00  13 0.81  826  13 0.97  17  0  3  9  71
 64.00  15 0.93  953  15 1.04  12  0  5  8  75
 53.89  19 0.99 1016  19 1.01  22  0  8  4  67
 64.00  14 0.86  882  14 0.95  21  0  5  8  66
 64.00  15 0.93  953  15 1.02  22  0  6  9  63
 64.00  15 0.93  953  15 0.98  30  0  9  6  55
 56.00  18 0.97  992  18 1.00  13  0 10  9  68
 64.00  13 0.81  826  13 1.00  19  0  5  5  71
 64.00  14 0.87  889  14 0.95  12  0  5  4  79
 64.00  15 0.93  953  15 1.07  26  0  8  2  64
      wd0           wd0           cpu
  KB/t t/s MB/s   KB xfr time  us ni sy in  id
 64.00  15 0.92  945  15 0.98  15  0  5  5  75
 64.00   3 0.19  191   3 0.20   6  0  2  2  90
  0.00   0 0.00    0   0 0.00   0  0  0  0 100
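Summing the KB column gives a quick cross-check of how much data the disk actually wrote during a transfer, and how fast it went while it was active. A minimal sketch follows. The function name iostat_totals is hypothetical; it assumes one sample per second (-w 1) and the column layout shown above, where KB is the fourth field.

```sh
# Hypothetical helper: total the KB column (field 4) of
# "iostat -w 1 -D -d -C wd0" output and average it over the samples
# that moved any data. Assumes one sample per second and the column
# layout KB/t t/s MB/s KB xfr time us ni sy in id.
iostat_totals() {
    awk '
        $1 !~ /^[0-9]+\.[0-9]+$/ { next }   # skip header lines
        { total += $4 }
        $4 > 0 { active++ }
        END {
            avg = active ? total / active : 0
            printf "total %d KB in %d active samples, avg %.1f KB/s\n", total, active, avg
        }
    '
}

# Usage on the OpenBSD box:
#   iostat -w 1 -D -d -C wd0 | iostat_totals
```

Applied to the upload table above, the KB column adds up to 12023 KB over 19 active samples, which is consistent with the 12 MB test file.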


12 MB download via scp (OpenBSD server to Linux host),
with scp running on the Linux box and sshd running on OpenBSD

 iostat -w 1 -D -d -C wd0

      wd0           wd0           cpu
  KB/t t/s MB/s   KB

Re: slow network performance (part 2)

2006-02-15 knitti
On 2/15/06, Joachim Mathes wrote:
> So I think the problem is my hdd.
>
> What do you think?
I think the same: get a new disk in there. Until then, you can run

 dd if=/rwd0c of=/dev/null bs=1m

to read every sector of the hdd, and maybe get spare sectors allocated
for some of the failing ones. Of course, make a backup ASAP, before you
do anything else.
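A plain dd stops at the first input error unless conv=noerror is given, so a whole-disk read may end early on a failing drive. The sketch below instead reads the device in 1 MiB chunks and reports each chunk that cannot be read, so the scan continues past bad spots. The function name scan_disk is hypothetical, the device path and the number of chunks (disk size in MiB) are supplied by the caller, and none of this was tested in the thread.

```sh
# Hypothetical helper: read a disk (or any file) in 1 MiB chunks and
# report every chunk that dd fails to read, instead of stopping at the
# first error. $1 = device path, $2 = number of 1 MiB chunks to read.
# Spawns one dd per chunk: slow, but each failure is reported with its
# byte offset.
scan_disk() {
    dev=$1
    chunks=$2
    i=0
    while [ "$i" -lt "$chunks" ]; do
        if ! dd if="$dev" of=/dev/null bs=1048576 skip="$i" count=1 2>/dev/null; then
            echo "read error in chunk $i (offset $((i * 1048576)) bytes)"
        fi
        i=$((i + 1))
    done
}

# Usage (device path and chunk count are assumptions; back up first):
#   scan_disk /dev/rwd0c 19000
```

The single dd with conv=noerror is simpler; the loop's advantage is one clearly labelled line per failing region.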


--knitti