Re: Poor raidframe reconstruct performance on 8-stable

2018-11-29 Thread Mike Pumford




On 27/11/2018 22:16, Mike Pumford wrote:



On 27/11/2018 21:27, Havard Eidnes wrote:


> So for now just assume duff hardware. Sorry for the noise and thanks for
> the suggestions.


Now confirmed as a very broken disk. Currently reconstructing a new 
raid1 onto a 4TB disk at 172MB/s :) Haven't run into a disk running that 
slowly without throwing IO faults before.


Mike



Re: Poor raidframe reconstruct performance on 8-stable

2018-11-27 Thread Mike Pumford




On 27/11/2018 21:27, Havard Eidnes wrote:

> It could be that the caching settings on the new drive are
> different from the old one.  Check and adjust with "dkctl", using
> "getcache" and "setcache" as required.

Just checked that and the settings are the same. I've also disassembled 
and re-assembled the machine after the reconstruct completed (after 3 
days). The write performance of the real raid is now just as slow with 
one disk showing 100% busy and the other showing 1% busy in systat 
vmstat. This was using fresh cables and I also swapped the 2 raid disks 
around on the motherboard connection. The slow performance followed the 
disk and didn't stay bound to the port.


Based on that and the rather more sensible performance of the raid1 set 
in my other NetBSD 8 system  I'm just going to assume the disk is bad 
and replace it ASAP.


I have just got some 4TB disks to replace the existing 2TB volumes, so 
once I've done a data copy I'll have the opportunity to redo a 
reconstruct on the new disks.


So for now just assume duff hardware. Sorry for the noise and thanks for 
the suggestions.


Mike


Re: Poor raidframe reconstruct performance on 8-stable

2018-11-27 Thread Havard Eidnes
> [...] I'm doing a raid1 reconstruct which should
> be a straight read from disk A, write to disk B. The data from systat
> backs that up. Each disk is doing ~110 transfers per second. wd1 the
> source disk is reading at 7MB/s and wd2 the destination is writing at
> 7MB/s.

It could be that the caching settings on the new drive are
different from the old one.  Check and adjust with "dkctl", using
"getcache" and "setcache" as required.

Regards,

- Håvard


Re: Poor raidframe reconstruct performance on 8-stable

2018-11-24 Thread Mike Pumford




On 24/11/2018 09:19, Mike Pumford wrote:



On 23/11/2018 22:20, Jaromir Dolecek wrote:

Can you perhaps try interrupt count via intrctl? iirc someone
complained about some interrupt storm coming from acpi.


Didn't know about that one. That gives:

interrupt id       CPU0   CPU1  CPU2  CPU3  device name(s)
ioapic0 pin 9         0*     0     0     0  acpi SCI
msi0 vec 0            0*     0     0     0  hdaudio0
ioapic0 pin 16      432*     0     0     0  uhci0, unknown
ioapic0 pin 21    76161*     0     0     0  uhci1, fwohci0
ioapic0 pin 19 18004848*     0     0     0  uhci2, uhci4, ahcisata1
ioapic0 pin 18        2*     0     0     0  ehci0, uhci5, ichsmb0
msi1 vec 0            0*     0     0     0  hdaudio1
ioapic0 pin 17       12*     0     0     0  unknown
ioapic0 pin 23        0*     0     0     0  uhci3, ehci1, unknown
ioapic0 pin 22   197505*     0     0     0  unknown, wm0

Pin 19 is the controller for the disks, and vmstat -i says:

interrupt            total  rate
TLB shootdown       838374    16
cpu0 timer         4919245    99
ioapic0 pin 16         432     0
ioapic0 pin 21       76219     1
ioapic0 pin 19    18015996   364
ioapic0 pin 18           2     0
ioapic0 pin 17          12     0
ioapic0 pin 22      197629     3
Total             24047909   486

To follow up on this further: the process isn't CPU-limited in any way. 
The CPU is consistently reporting 99% idle during the operation.
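
For anyone wanting to compare interrupt sources in output like the above, 
here is a minimal Python sketch (not part of the original message). It 
parses the vmstat -i text quoted in this message and picks out the busiest 
source. The helper name parse_vmstat_i is hypothetical, and the parsing 
assumes the last two columns of every row are the total and the rate.

```python
# Sample text copied from the vmstat -i output quoted in this message.
VMSTAT_I = """\
interrupt  total rate
TLB shootdown 838374   16
cpu0 timer   4919245   99
ioapic0 pin 16   432    0
ioapic0 pin 21 76219    1
ioapic0 pin 19  18015996  364
ioapic0 pin 18 2    0
ioapic0 pin 17    12    0
ioapic0 pin 22    197629    3
Total   24047909  486
"""


def parse_vmstat_i(text):
    """Map each interrupt name to (total, rate); assumes the header is line 1."""
    rows = {}
    for line in text.splitlines()[1:]:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        name = " ".join(parts[:-2])  # names may contain spaces
        rows[name] = (int(parts[-2]), int(parts[-1]))
    return rows


rows = parse_vmstat_i(VMSTAT_I)
total_count, total_rate = rows.pop("Total")

# Busiest source by the rate column, and its share of all counted interrupts.
busiest = max(rows, key=lambda name: rows[name][1])
share = rows[busiest][0] / total_count
print(f"{busiest}: {rows[busiest][1]}/s, {share:.0%} of all interrupts")
```

On this data it singles out ioapic0 pin 19, the line shared with ahcisata1, 
which accounts for roughly three quarters of all interrupts counted.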




Mike


Re: Poor raidframe reconstruct performance on 8-stable

2018-11-24 Thread Mike Pumford




On 23/11/2018 22:20, Jaromir Dolecek wrote:

Can you perhaps try interrupt count via intrctl? iirc someone
complained about some interrupt storm coming from acpi.


Didn't know about that one. That gives:

interrupt id       CPU0   CPU1  CPU2  CPU3  device name(s)
ioapic0 pin 9         0*     0     0     0  acpi SCI
msi0 vec 0            0*     0     0     0  hdaudio0
ioapic0 pin 16      432*     0     0     0  uhci0, unknown
ioapic0 pin 21    76161*     0     0     0  uhci1, fwohci0
ioapic0 pin 19 18004848*     0     0     0  uhci2, uhci4, ahcisata1
ioapic0 pin 18        2*     0     0     0  ehci0, uhci5, ichsmb0
msi1 vec 0            0*     0     0     0  hdaudio1
ioapic0 pin 17       12*     0     0     0  unknown
ioapic0 pin 23        0*     0     0     0  uhci3, ehci1, unknown
ioapic0 pin 22   197505*     0     0     0  unknown, wm0

Pin 19 is the controller for the disks, and vmstat -i says:

interrupt            total  rate
TLB shootdown       838374    16
cpu0 timer         4919245    99
ioapic0 pin 16         432     0
ioapic0 pin 21       76219     1
ioapic0 pin 19    18015996   364
ioapic0 pin 18           2     0
ioapic0 pin 17          12     0
ioapic0 pin 22      197629     3
Total             24047909   486

Mike


Re: Poor raidframe reconstruct performance on 8-stable

2018-11-23 Thread Jaromir Dolecek
Can you perhaps try interrupt count via intrctl? iirc someone complained about 
some interrupt storm coming from acpi. 

> On 23 Nov 2018, at 21:15, Mike Pumford wrote:
> 
> 
> 
>> On 23/11/2018 20:01, Michael van Elst wrote:
>> mpumf...@mudcovered.org.uk (Mike Pumford) writes:
>>>> I'm seeing a write speed of 7MB/s which means my 2TB reconstruct is
>>>> going to take 72+hours! Last time I did it it took about 3 hours if I'm
>>>> remembering correctly.
>>> Just updated this to a build from the 17th and there is no change. :(
>> That's a normal behaviour, both disks are compared and if there is
>> a difference, the data from the first disk is copied to the second.
>> That read/write pattern is slow on modern disks.
> Why is this a read/compare? I'm doing a raid1 reconstruct which should be a 
> straight read from disk A, write to disk B. The data from systat backs that 
> up. Each disk is doing ~110 transfers per second. wd1 the source disk is 
> reading at 7MB/s and wd2 the destination is writing at 7MB/s.
> 
> I've reconstructed on this system before with the same disks (the failure is 
> a cabling issue not the disk failing). So unless raid1 raidframe 
> reconstruction has changed in the last year I want to know why an operation 
> that USED to run at 30-40MB/s is now running at 7MB/s. I can't do a direct 
> copy as I need the system to stay up while it's reconstructing and the raid FS 
> is a critical volume. Are you really telling me I'd be better off vaping the 
> disklabel and re-adding the old disk as a fresh component to the raid? I 
> thought that's what I had told raidctl to do.
> 
>> You can try to fail the disk again and restart reconstruction.
> That's how I triggered the reconstruct. The disk was already failed so I did 
> raidctl -R /dev/wd2e raid1 to start the reconstruction. In this case I can't 
> understand why it would be doing a comparison at all as it should be assuming 
> that the destination disk is fresh.
> 
> Mike


Re: Poor raidframe reconstruct performance on 8-stable

2018-11-23 Thread Mike Pumford




On 23/11/2018 20:01, Michael van Elst wrote:

mpumf...@mudcovered.org.uk (Mike Pumford) writes:


I'm seeing a write speed of 7MB/s which means my 2TB reconstruct is
going to take 72+hours! Last time I did it it took about 3 hours if I'm
remembering correctly.



Just updated this to a build from the 17th and there is no change. :(


That's a normal behaviour, both disks are compared and if there is
a difference, the data from the first disk is copied to the second.
That read/write pattern is slow on modern disks.

Why is this a read/compare? I'm doing a raid1 reconstruct which should 
be a straight read from disk A, write to disk B. The data from systat 
backs that up. Each disk is doing ~110 transfers per second. wd1 the 
source disk is reading at 7MB/s and wd2 the destination is writing at 
7MB/s.
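
As a side note for readers, those two systat figures together imply an 
average transfer size and a per-transfer time. The sketch below is only 
back-of-envelope arithmetic, assuming that "MB" here means 10^6 bytes and 
that each disk handles its transfers one at a time; neither assumption is 
confirmed in the thread, and the function name is made up for illustration.

```python
def avg_transfer_kib(throughput_bytes_per_s, transfers_per_s):
    """Average size of one transfer in KiB, given throughput and transfer rate."""
    return throughput_bytes_per_s / transfers_per_s / 1024


THROUGHPUT = 7e6   # 7MB/s from systat, assumed to mean 7 * 10^6 bytes/s
TRANSFERS = 110    # ~110 transfers per second per disk, from systat

size_kib = avg_transfer_kib(THROUGHPUT, TRANSFERS)

# If transfers are serialized, each one takes about this long on average.
ms_per_transfer = 1000 / TRANSFERS

print(f"average transfer: {size_kib:.1f} KiB, {ms_per_transfer:.1f} ms each")
```

That works out to roughly 62 KiB and about 9 ms per transfer, if the 
assumptions hold.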


I've reconstructed on this system before with the same disks (the 
failure is a cabling issue not the disk failing). So unless raid1 
raidframe reconstruction has changed in the last year I want to know why 
an operation that USED to run at 30-40MB/s is now running at 7MB/s. I 
can't do a direct copy as I need the system to stay up while it's 
reconstructing and the raid FS is a critical volume. Are you really 
telling me I'd be better off vaping the disklabel and re-adding the old 
disk as a fresh component to the raid? I thought that's what I had told 
raidctl to do.



You can try to fail the disk again and restart reconstruction.

That's how I triggered the reconstruct. The disk was already failed so I 
did raidctl -R /dev/wd2e raid1 to start the reconstruction. In this case 
I can't understand why it would be doing a comparison at all as it 
should be assuming that the destination disk is fresh.


Mike


Re: Poor raidframe reconstruct performance on 8-stable

2018-11-23 Thread Michael van Elst
mpumf...@mudcovered.org.uk (Mike Pumford) writes:

>> I'm seeing a write speed of 7MB/s which means my 2TB reconstruct is 
>> going to take 72+hours! Last time I did it it took about 3 hours if I'm 
>> remembering correctly.

>Just updated this to a build from the 17th and there is no change. :(

That's a normal behaviour, both disks are compared and if there is
a difference, the data from the first disk is copied to the second.
That read/write pattern is slow on modern disks.

You can try to fail the disk again and restart reconstruction.

Or you can start without raid and copy the data manually. The
reconstruction that still starts when you configure the raid
will then just read both disks at normal speeds.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: Poor raidframe reconstruct performance on 8-stable

2018-11-23 Thread Mike Pumford




On 23/11/2018 19:30, Mike Pumford wrote:
Due to a disk failure I'm having to do a full raid1 reconstruct on one 
of my systems and the performance seems much slower than last time I did 
it on this system.


I'm seeing a write speed of 7MB/s which means my 2TB reconstruct is 
going to take 72+hours! Last time I did it it took about 3 hours if I'm 
remembering correctly.


The filesystem-level performance of the raid when it's working is a lot 
faster than this.


Build date (which reflects the checkout date):
NetBSD  8.0_STABLE NetBSD 8.0_STABLE (GENERIC) #0: Sat Sep  8 09:19:05 BST 2018



Just updated this to a build from the 17th and there is no change. :(

Mike


Poor raidframe reconstruct performance on 8-stable

2018-11-23 Thread Mike Pumford
Due to a disk failure I'm having to do a full raid1 reconstruct on one 
of my systems and the performance seems much slower than last time I did 
it on this system.


I'm seeing a write speed of 7MB/s which means my 2TB reconstruct is 
going to take 72+hours! Last time I did it it took about 3 hours if I'm 
remembering correctly.
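
For reference, the estimate can be checked with a few lines of Python. 
This is a rough sketch rather than anything raidframe reports: it assumes 
the component covers about the whole disk (using the sector count from the 
dmesg lines in this message), that the rate stays constant, and that 
"MB" means 10^6 bytes. The function name is hypothetical.

```python
SECTOR_BYTES = 512
SECTORS = 3907029168  # per-disk sector count from the wd1/wd2 dmesg lines


def reconstruct_hours(total_bytes, rate_bytes_per_s):
    """Hours needed to copy total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_s / 3600


disk_bytes = SECTORS * SECTOR_BYTES

# Observed reconstruct rate of 7MB/s, assumed to mean 7 * 10^6 bytes/s.
hours_at_7 = reconstruct_hours(disk_bytes, 7e6)

print(f"{disk_bytes / 1e12:.2f} TB at 7 MB/s: {hours_at_7:.1f} hours")
```

With these assumptions the figure comes out a little under 80 hours, which 
is consistent with the 72+ hours quoted above.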


The filesystem-level performance of the raid when it's working is a lot 
faster than this.


Build date (which reflects the checkout date):
NetBSD  8.0_STABLE NetBSD 8.0_STABLE (GENERIC) #0: Sat Sep  8 09:19:05 BST 2018


amd64 system with following cpu:
cpu0 at mainbus0 apid 0
cpu0: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz, id 0x6fb


Disks are:
wd1 at atabus6 drive 0
wd1: 
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 1863 GB, 3876021 cyl, 16 head, 63 sec, 512 bytes/sect x 3907029168 sectors
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1(ahcisata1:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)

wd2 at atabus7 drive 0
wd2: 
wd2: drive supports 16-sector PIO transfers, LBA48 addressing
wd2: 1863 GB, 3876021 cyl, 16 head, 63 sec, 512 bytes/sect x 3907029168 sectors
wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd2(ahcisata1:3:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)

raid1: Components: /dev/wd1e /dev/wd2e

Any suggestions?

Mike