Re: Poor raidframe reconstruct performance on 8-stable
On 27/11/2018 22:16, Mike Pumford wrote:
> On 27/11/2018 21:27, Havard Eidnes wrote:
> [...]
> So for now just assume duff hardware. Sorry for the noise and thanks for the suggestions.

Now confirmed as a very broken disk. Currently reconstructing a new raid1 onto a 4TB disk at 172MB/s :) Haven't run into a disk running that slowly without throwing IO faults before.

Mike
Re: Poor raidframe reconstruct performance on 8-stable
On 27/11/2018 21:27, Havard Eidnes wrote:
> It could be that the caching settings on the new drive are different from the old one. Check and adjust with "dkctl", using "getcache" and "setcache" as required.

Just checked that and the settings are the same. I've also disassembled and reassembled the machine after the reconstruct completed (after 3 days). The write performance of the raid in normal operation is now just as slow, with one disk showing 100% busy and the other showing 1% busy in systat vmstat. This was using fresh cables, and I also swapped the two raid disks around on the motherboard connections. The slow performance followed the disk and didn't stay bound to the port.

Based on that, and the rather more sensible performance of the raid1 set in my other NetBSD 8 system, I'm just going to assume the disk is bad and replace it ASAP. I have just got some 4TB disks to replace the existing 2TB volumes, so once I've done a data copy I'll have the opportunity to redo a reconstruct on the new disks.

So for now just assume duff hardware. Sorry for the noise and thanks for the suggestions.

Mike
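The swap test described above separates a bad disk from a bad port or cable. You swap the drives between ports and check whether the "100% busy" symptom moves with the drive. The sketch below expresses that reasoning in Python. It assumes the observations are recorded as a port-to-disk mapping before and after the swap. The port and disk labels are hypothetical; the thread does not name them.

```python
def locate_fault(before: dict[str, str], after: dict[str, str],
                 slow_port_before: str, slow_port_after: str) -> str:
    """Decide whether a slowdown follows the disk or stays with the port.

    before/after map a (hypothetical) port label to the disk attached to it.
    slow_port_* is the port whose disk showed ~100% busy in systat.
    """
    disk_before = before[slow_port_before]
    disk_after = after[slow_port_after]
    if disk_before == disk_after and slow_port_before != slow_port_after:
        return "disk"          # slowness moved with the drive
    if slow_port_before == slow_port_after and disk_before != disk_after:
        return "port"          # slowness stayed with the port/cable
    return "inconclusive"      # e.g. nothing was actually swapped


# Hypothetical labels: two ports, two disks, swapped between runs.
before = {"port2": "diskA", "port3": "diskB"}
after = {"port2": "diskB", "port3": "diskA"}
print(locate_fault(before, after, "port3", "port2"))  # disk
```

The test only works if the drives really trade places. Otherwise both outcomes look the same, which is why the function falls back to "inconclusive".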
Re: Poor raidframe reconstruct performance on 8-stable
> [...] I'm doing a raid1 reconstruct which should
> be a straight read from disk A, write to disk B. The data from systat
> backs that up. Each disk is doing ~110 transfers per second. wd1, the
> source disk, is reading at 7MB/s and wd2, the destination, is writing at
> 7MB/s.

It could be that the caching settings on the new drive are different from the old one. Check and adjust with "dkctl", using "getcache" and "setcache" as required.

Regards,

- Håvard
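Håvard's check compares the cache settings of the two drives side by side, and it can be scripted. The sketch below runs "dkctl <disk> getcache" on each disk and diffs the results. It assumes two things about the output: each line starts with the device path followed by a colon, and the settings otherwise read the same for both drives. That output shape is an assumption, so the helper only strips a leading "/dev/...:" prefix when one is present. The disk names wd1 and wd2 come from the thread.

```python
import subprocess


def normalize(output: str) -> list[str]:
    """Strip an assumed "/dev/<disk>: " prefix from each non-empty line."""
    settings = []
    for line in output.splitlines():
        prefix, sep, rest = line.partition(":")
        if sep and prefix.startswith("/dev/"):
            line = rest
        line = line.strip()
        if line:
            settings.append(line)
    return settings


def cache_settings(disk: str) -> list[str]:
    """Run "dkctl <disk> getcache" (NetBSD) and return the normalized lines."""
    result = subprocess.run(["dkctl", disk, "getcache"],
                            capture_output=True, text=True, check=True)
    return normalize(result.stdout)


def compare_cache(disk_a: str, disk_b: str) -> dict[str, list[str]]:
    """Report settings that appear for one disk but not the other."""
    a, b = set(cache_settings(disk_a)), set(cache_settings(disk_b))
    return {disk_a: sorted(a - b), disk_b: sorted(b - a)}
```

Run it on the affected machine with enough privilege to open the raw disks. If compare_cache("wd1", "wd2") returns two empty lists, both drives report identical cache settings. Any difference would then be adjusted with dkctl's setcache command, as Håvard suggests.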
Re: Poor raidframe reconstruct performance on 8-stable
On 24/11/2018 09:19, Mike Pumford wrote:
> On 23/11/2018 22:20, Jaromir Dolecek wrote:
>> Can you perhaps check the interrupt counts via intrctl? IIRC someone complained about an interrupt storm coming from ACPI.
>
> Didn't know about that one. That gives:
>
> interrupt id          CPU0  CPU1  CPU2  CPU3  device name(s)
> ioapic0 pin 9           0*     0     0     0  acpi SCI
> msi0 vec 0              0*     0     0     0  hdaudio0
> ioapic0 pin 16        432*     0     0     0  uhci0, unknown
> ioapic0 pin 21      76161*     0     0     0  uhci1, fwohci0
> ioapic0 pin 19   18004848*     0     0     0  uhci2, uhci4, ahcisata1
> ioapic0 pin 18          2*     0     0     0  ehci0, uhci5, ichsmb0
> msi1 vec 0              0*     0     0     0  hdaudio1
> ioapic0 pin 17         12*     0     0     0  unknown
> ioapic0 pin 23          0*     0     0     0  uhci3, ehci1, unknown
> ioapic0 pin 22     197505*     0     0     0  unknown, wm0
>
> Pin 19 is the controller for the disks. And vmstat -i says:
>
> interrupt              total     rate
> TLB shootdown         838374       16
> cpu0 timer           4919245       99
> ioapic0 pin 16           432        0
> ioapic0 pin 21         76219        1
> ioapic0 pin 19      18015996      364
> ioapic0 pin 18             2        0
> ioapic0 pin 17            12        0
> ioapic0 pin 22        197629        3
> Total               24047909      486

To follow up on this: the process isn't CPU-limited in any way. CPU time is consistently reported as 99% idle during the operation.

Mike
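The vmstat -i figures quoted above can be sanity-checked with a little arithmetic:

- As far as I know, the rate column is the count averaged over the time since boot, so total divided by rate gives a rough uptime.
- Pin 19's share of the total shows how much of the interrupt load goes to the disk controller. Note that pin 19 is shared with uhci2 and uhci4 according to the intrctl listing.

The sketch below parses the table and computes both figures. Comparing against ~220 transfers/s assumes both disks were still doing the ~110 transfers/s each reported earlier in the thread.

```python
VMSTAT_I = """\
TLB shootdown 838374 16
cpu0 timer 4919245 99
ioapic0 pin 16 432 0
ioapic0 pin 21 76219 1
ioapic0 pin 19 18015996 364
ioapic0 pin 18 2 0
ioapic0 pin 17 12 0
ioapic0 pin 22 197629 3
Total 24047909 486
"""


def parse_vmstat_i(text: str) -> dict[str, tuple[int, int]]:
    """Map interrupt name -> (total, rate); the last two fields are numeric."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        rows[" ".join(fields[:-2])] = (int(fields[-2]), int(fields[-1]))
    return rows


rows = parse_vmstat_i(VMSTAT_I)
total, total_rate = rows["Total"]
disk_total, disk_rate = rows["ioapic0 pin 19"]

uptime_h = total / total_rate / 3600      # rough time since boot, in hours
disk_share = disk_total / total           # fraction of all interrupts on pin 19
per_transfer = disk_rate / (2 * 110)      # assumes 2 disks x ~110 transfers/s

print(f"uptime ~{uptime_h:.1f} h, pin 19 share {disk_share:.0%}, "
      f"{per_transfer:.1f} interrupts per transfer")
```

This works out to about 13.7 hours of uptime, about 75% of interrupts on pin 19, and roughly 1.7 interrupts per disk transfer. On these assumptions the load does not look like an interrupt storm, which fits the 99% idle CPU.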
Re: Poor raidframe reconstruct performance on 8-stable
On 23/11/2018 22:20, Jaromir Dolecek wrote:
> Can you perhaps check the interrupt counts via intrctl? IIRC someone complained about an interrupt storm coming from ACPI.

Didn't know about that one. That gives:

interrupt id          CPU0  CPU1  CPU2  CPU3  device name(s)
ioapic0 pin 9           0*     0     0     0  acpi SCI
msi0 vec 0              0*     0     0     0  hdaudio0
ioapic0 pin 16        432*     0     0     0  uhci0, unknown
ioapic0 pin 21      76161*     0     0     0  uhci1, fwohci0
ioapic0 pin 19   18004848*     0     0     0  uhci2, uhci4, ahcisata1
ioapic0 pin 18          2*     0     0     0  ehci0, uhci5, ichsmb0
msi1 vec 0              0*     0     0     0  hdaudio1
ioapic0 pin 17         12*     0     0     0  unknown
ioapic0 pin 23          0*     0     0     0  uhci3, ehci1, unknown
ioapic0 pin 22     197505*     0     0     0  unknown, wm0

Pin 19 is the controller for the disks. And vmstat -i says:

interrupt              total     rate
TLB shootdown         838374       16
cpu0 timer           4919245       99
ioapic0 pin 16           432        0
ioapic0 pin 21         76219        1
ioapic0 pin 19      18015996      364
ioapic0 pin 18             2        0
ioapic0 pin 17            12        0
ioapic0 pin 22        197629        3
Total               24047909      486

Mike
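Jaromir's question was whether an ACPI interrupt storm is involved. The sketch below parses the intrctl listing above in Python and answers two things: how many interrupts the acpi SCI line has taken, and which CPU each source is bound to. It makes two assumptions about the format:

- The "*" marks the CPU the interrupt is assigned to (that is my understanding of intrctl's output).
- The interrupt ID is always the first three fields, e.g. "ioapic0 pin 19" or "msi0 vec 0".

```python
INTRCTL = """\
ioapic0 pin 9 0* 0 0 0 acpi SCI
msi0 vec 0 0* 0 0 0 hdaudio0
ioapic0 pin 16 432* 0 0 0 uhci0, unknown
ioapic0 pin 21 76161* 0 0 0 uhci1, fwohci0
ioapic0 pin 19 18004848* 0 0 0 uhci2, uhci4, ahcisata1
ioapic0 pin 18 2* 0 0 0 ehci0, uhci5, ichsmb0
msi1 vec 0 0* 0 0 0 hdaudio1
ioapic0 pin 17 12* 0 0 0 unknown
ioapic0 pin 23 0* 0 0 0 uhci3, ehci1, unknown
ioapic0 pin 22 197505* 0 0 0 unknown, wm0
"""
NCPU = 4  # CPU0..CPU3 columns in the listing


def parse_intrctl(text: str, ncpu: int = NCPU) -> list[dict]:
    """Split each row into id, per-CPU counts, bound CPU and device names."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 + ncpu:
            continue
        counts = fields[3:3 + ncpu]
        rows.append({
            "id": " ".join(fields[:3]),
            "counts": [int(c.rstrip("*")) for c in counts],
            "bound_cpu": next((i for i, c in enumerate(counts) if c.endswith("*")), None),
            "devices": " ".join(fields[3 + ncpu:]),
        })
    return rows


rows = parse_intrctl(INTRCTL)
acpi = next(r for r in rows if r["devices"] == "acpi SCI")
print(acpi["counts"])                   # [0, 0, 0, 0]
print({r["bound_cpu"] for r in rows})   # {0}
```

The acpi SCI line shows no interrupts at all, and every source is handled on CPU0.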
Re: Poor raidframe reconstruct performance on 8-stable
Can you perhaps check the interrupt counts via intrctl? IIRC someone complained about an interrupt storm coming from ACPI.

> On 23 Nov 2018 at 21:15, Mike Pumford wrote:
>
>> On 23/11/2018 20:01, Michael van Elst wrote:
>> mpumf...@mudcovered.org.uk (Mike Pumford) writes:
>>>> I'm seeing a write speed of 7MB/s which means my 2TB reconstruct is going to take 72+ hours! Last time I did it, it took about 3 hours if I'm remembering correctly.
>>> Just updated this to a build from the 17th and there is no change. :(
>> That's normal behaviour: both disks are compared and, if there is
>> a difference, the data from the first disk is copied to the second.
>> That read/write pattern is slow on modern disks.
> Why is this a read/compare? I'm doing a raid1 reconstruct which should be a
> straight read from disk A, write to disk B. The data from systat backs that
> up. Each disk is doing ~110 transfers per second. wd1, the source disk, is
> reading at 7MB/s and wd2, the destination, is writing at 7MB/s.
>
> I've reconstructed on this system before with the same disks (the failure is
> a cabling issue, not the disk failing). So unless raid1 RAIDframe
> reconstruction has changed in the last year, I want to know why an operation
> that USED to run at 30-40MB/s is now running at 7MB/s. I can't do a direct
> copy as I need the system to stay up while it's reconstructing, and the raid
> FS is a critical volume. Are you really telling me I'd be better off wiping
> the disklabel and re-adding the old disk as a fresh component to the raid? I
> thought that's what I had told raidctl to do.
>
>> You can try to fail the disk again and restart reconstruction.
> That's how I triggered the reconstruct. The disk was already failed, so I did
> raidctl -R /dev/wd2e raid1 to start the reconstruction. In this case I can't
> understand why it would be doing a comparison at all, as it should be
> assuming that the destination disk is fresh.
>
> Mike
Re: Poor raidframe reconstruct performance on 8-stable
On 23/11/2018 20:01, Michael van Elst wrote:
> mpumf...@mudcovered.org.uk (Mike Pumford) writes:
>>> I'm seeing a write speed of 7MB/s which means my 2TB reconstruct is going to take 72+ hours! Last time I did it, it took about 3 hours if I'm remembering correctly.
>> Just updated this to a build from the 17th and there is no change. :(
> That's normal behaviour: both disks are compared and, if there is a difference, the data from the first disk is copied to the second. That read/write pattern is slow on modern disks.

Why is this a read/compare? I'm doing a raid1 reconstruct, which should be a straight read from disk A and write to disk B. The data from systat backs that up. Each disk is doing ~110 transfers per second: wd1, the source disk, is reading at 7MB/s and wd2, the destination, is writing at 7MB/s.

I've reconstructed on this system before with the same disks (the failure is a cabling issue, not the disk failing). So unless raid1 RAIDframe reconstruction has changed in the last year, I want to know why an operation that USED to run at 30-40MB/s is now running at 7MB/s. I can't do a direct copy as I need the system to stay up while it's reconstructing, and the raid FS is a critical volume. Are you really telling me I'd be better off wiping the disklabel and re-adding the old disk as a fresh component to the raid? I thought that's what I had told raidctl to do.

> You can try to fail the disk again and restart reconstruction.

That's how I triggered the reconstruct. The disk was already failed, so I did

  raidctl -R /dev/wd2e raid1

to start the reconstruction. In this case I can't understand why it would be doing a comparison at all, as it should be assuming that the destination disk is fresh.

Mike
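Mike's systat numbers can be turned into a per-transfer picture with simple arithmetic. The sketch below does that in Python under two assumptions. First, "7MB/s" means 7 x 10^6 bytes/s. Second, purely for comparison, the drive spins at 7200 rpm; the thread never states the spindle speed.

```python
THROUGHPUT = 7e6   # bytes/s per disk, from systat (assumed decimal MB)
TRANSFERS = 110    # transfers/s per disk, from systat
RPM = 7200         # assumption for illustration; not stated in the thread

bytes_per_transfer = THROUGHPUT / TRANSFERS   # average size of one transfer
ms_per_transfer = 1000 / TRANSFERS            # average time per transfer
ms_per_revolution = 60_000 / RPM              # one platter revolution

print(f"{bytes_per_transfer / 1000:.1f} kB per transfer")
print(f"{ms_per_transfer:.2f} ms per transfer vs {ms_per_revolution:.2f} ms per revolution")
```

The result is roughly 64kB and about 9.1ms per transfer. At the assumed 7200 rpm, that is close to one 8.3ms revolution. One possible reading is that each transfer waits for the platter to come round again instead of streaming. That is a hypothesis, not a diagnosis.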
Re: Poor raidframe reconstruct performance on 8-stable
mpumf...@mudcovered.org.uk (Mike Pumford) writes:
>> I'm seeing a write speed of 7MB/s which means my 2TB reconstruct is
>> going to take 72+ hours! Last time I did it, it took about 3 hours if I'm
>> remembering correctly.
>Just updated this to a build from the 17th and there is no change. :(

That's normal behaviour: both disks are compared and, if there is a difference, the data from the first disk is copied to the second. That read/write pattern is slow on modern disks.

You can try to fail the disk again and restart reconstruction.

Or you can start without raid and copy the data manually. The reconstruction that still starts when you configure the raid will then just read both disks at normal speeds.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."
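The pattern Michael describes can be set against the straight copy Mike expected. In the compare pattern, both components are read, compared, and a block is copied only where they differ. The sketch below is a purely illustrative Python version on in-memory "disks". It is not RAIDframe's code, and the block size and data are made up. It counts I/O operations for each approach.

```python
BLOCK = 4  # hypothetical block size in bytes, tiny so the example is readable


def compare_and_copy(src: bytearray, dst: bytearray, block: int = BLOCK) -> dict[str, int]:
    """Read the same block from both components; rewrite dst only if they differ."""
    ops = {"reads": 0, "writes": 0}
    for off in range(0, len(src), block):
        a = bytes(src[off:off + block])
        b = bytes(dst[off:off + block])
        ops["reads"] += 2
        if a != b:
            dst[off:off + block] = a   # write goes where dst was just read
            ops["writes"] += 1
    return ops


def straight_copy(src: bytearray, dst: bytearray, block: int = BLOCK) -> dict[str, int]:
    """Read each block from src and write it to dst without looking at dst."""
    ops = {"reads": 0, "writes": 0}
    for off in range(0, len(src), block):
        dst[off:off + block] = src[off:off + block]
        ops["reads"] += 1
        ops["writes"] += 1
    return ops


source = bytearray(b"AAAABBBBCCCCDDDD")
fresh = bytearray(len(source))           # blank replacement component
print(compare_and_copy(source, fresh))   # {'reads': 8, 'writes': 4}

fresh = bytearray(len(source))
print(straight_copy(source, fresh))      # {'reads': 4, 'writes': 4}
```

Against a blank destination, the compare pass does strictly more I/O than the straight copy. It also reads and then rewrites the same block on the destination, which on a rotating disk typically costs at least an extra revolution per block. The compare approach only saves work when most blocks already match, for example a component that dropped out briefly.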
Re: Poor raidframe reconstruct performance on 8-stable
On 23/11/2018 19:30, Mike Pumford wrote:
> Due to a disk failure I'm having to do a full raid1 reconstruct on one of my systems, and the performance seems much slower than the last time I did it on this system. I'm seeing a write speed of 7MB/s, which means my 2TB reconstruct is going to take 72+ hours! Last time I did it, it took about 3 hours if I'm remembering correctly. The filesystem-level performance of the raid when it's working is a lot faster than this.
>
> Build date (which reflects the checkout date) is:
> NetBSD 8.0_STABLE NetBSD 8.0_STABLE (GENERIC) #0: Sat Sep 8 09:19:05 BST 2018

Just updated this to a build from the 17th and there is no change. :(

Mike
Poor raidframe reconstruct performance on 8-stable
Due to a disk failure I'm having to do a full raid1 reconstruct on one of my systems, and the performance seems much slower than the last time I did it on this system. I'm seeing a write speed of 7MB/s, which means my 2TB reconstruct is going to take 72+ hours! Last time I did it, it took about 3 hours if I'm remembering correctly. The filesystem-level performance of the raid when it's working is a lot faster than this.

Build date (which reflects the checkout date) is:

NetBSD 8.0_STABLE NetBSD 8.0_STABLE (GENERIC) #0: Sat Sep 8 09:19:05 BST 2018

amd64 system with the following CPU:

cpu0 at mainbus0 apid 0
cpu0: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz, id 0x6fb

Disks are:

wd1 at atabus6 drive 0
wd1:
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 1863 GB, 3876021 cyl, 16 head, 63 sec, 512 bytes/sect x 3907029168 sectors
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1(ahcisata1:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
wd2 at atabus7 drive 0
wd2:
wd2: drive supports 16-sector PIO transfers, LBA48 addressing
wd2: 1863 GB, 3876021 cyl, 16 head, 63 sec, 512 bytes/sect x 3907029168 sectors
wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd2(ahcisata1:3:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)

raid1: Components: /dev/wd1e /dev/wd2e

Any suggestions?

Mike
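The 72+ hour figure follows from the component size in the dmesg output. The sketch below reproduces the estimate in Python, using the sector count and sector size reported for wd1. It assumes "MB/s" means 10^6 bytes/s. The 35MB/s rate is only the midpoint of the 30-40MB/s that Mike mentions elsewhere in the thread.

```python
SECTORS = 3907029168   # from "512 bytes/sect x 3907029168 sectors" in dmesg
SECTOR_SIZE = 512      # bytes per sector


def reconstruct_hours(rate_mb_s: float, sectors: int = SECTORS,
                      sector_size: int = SECTOR_SIZE) -> float:
    """Hours to stream the whole component at a constant rate (decimal MB/s)."""
    return sectors * sector_size / (rate_mb_s * 1e6) / 3600


for rate in (7, 35):
    print(f"{rate:>3} MB/s -> {reconstruct_hours(rate):5.1f} h")
```

At 7MB/s this comes to roughly 79 hours, in line with the 72+ hour estimate. At 35MB/s it is roughly 16 hours. For comparison, finishing in 3 hours would need about 185MB/s sustained.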