Re: RAID1 drive replacement help?

2009-09-22 Thread Jeffrey C. Smith

Paul M wrote:

Thinking about this some more, I suspect that what may be happening is that
the disk still thinks it is a spare. Try blowing away the RAID partition,
possibly even replace it with a regular partition and write data to it just
to make sure. Then delete that, recreate the RAID partition and try again to
reconstruct the component.
(It may also be possible to achieve this with the -r option to raidctl, but
I'm unfamiliar with the operation of this switch).


I will nuke the RAID partition. I'll relabel it as a regular partition 
and format a file system on it. I'll then relabel it again as a RAID 
partition. I assume that would count as a nuke 'em from space.  :)


Once it's nuked, what is the series of steps to add it as a component to 
the array? I want to make sure I get it right this time.



Essentially, you configured the disk as a spare, now you want to override
that configuration and configure it as a component.
The man page does say that the spare and the component it was reconstructed
from are interchangeable, but I think the system is getting confused as to
just what wd1d is.


OK...

Taking a different approach, you could keep wd1d as the spare, but add a 
3rd disk to replace the failed component and simply reconstruct onto that 
(using the -B switch to raidctl)


I will look in the box to see if I can get another drive in there. I may 
 be space constrained...



Also - don't forget about the syslog.


Sorry, but I'm not clear on what you mean here? Could you clarify?

Thanks,
Jeff
--
Jeffrey C. Smith   Phone: 512.692.7607
RevolutionONE  Cell : 512.965.3898
j...@revolutionone.com






paulm




Re: RAID1 drive replacement help?

2009-09-17 Thread Jeffrey C. Smith

Paul M wrote:
According to the man page (if my memory is correct), the name 
component1 is a placeholder used by raidctl when it is unable to 
access the drive - in other words this component is bad. Remove the 
drive completely and it will still list it as component1. So 
component1 is not the name.


You need to use (again, if my memory is correct):
# raidctl -v -R /dev/wd1d raid0


Paul,

Thanks for the reply. I still seem to be having trouble. Here's the 
latest status of the array:


# raidctl -v -s raid0
raid0 Components:
   /dev/wd0d: optimal
  component1: failed
No spares.
Component label for /dev/wd0d:
   Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
   Version: 2, Serial Number: 100, Mod Counter: 1237272102
   Clean: No, Status: 0
   sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
   Queue size: 100, blocksize: 512, numBlocks: 238613376
   RAID Level: 1
   Autoconfig: Yes
   Root partition: Yes
   Last configured as: raid0
component1 status is: failed.  Skipping label.
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
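
For scripting around status checks like the one above, here is a minimal sketch that lists every component not reported as optimal. It only parses text in the layout shown in this thread; the function name non_optimal_components and the sample here-document are illustrative assumptions, not part of raidctl.

```shell
#!/bin/sh
# Hypothetical helper: read `raidctl -s` style output on stdin and print
# "name state" for every component whose state is not "optimal".
# Parsing assumes the layout shown in this thread: a "Components:" header,
# one "name: state" line per component, then "Spares:" or "No spares.".
non_optimal_components() {
    awk '
        /Components:$/          { in_comp = 1; next }
        /^Spares:|^No spares\./ { in_comp = 0 }
        in_comp && NF == 2 {
            name = $1; sub(/:$/, "", name)
            if ($2 != "optimal") print name, $2
        }
    '
}

# Sample input copied from the status output in this thread.
non_optimal_components <<'EOF'
raid0 Components:
   /dev/wd0d: optimal
  component1: failed
No spares.
Parity status: clean
EOF
```

On a live system the same function could presumably be fed with `raidctl -s raid0 | non_optimal_components`.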


Now, I try to reconstruct /dev/wd1d (the failed drive):

# raidctl -v -R /dev/wd1d raid0
raidctl: /dev/wd1d is not a component of this device

Still no luck. Any more ideas???

Thanks,
Jeff
--
Jeffrey C. Smith   Phone: 512.692.7607
RevolutionONE  Cell : 512.965.3898
j...@revolutionone.com



Re: RAID1 drive replacement help?

2009-09-17 Thread Jeffrey C. Smith

Paul M wrote:


On 18/09/2009, at 2:28 PM, Jeffrey C. Smith wrote:

Now, I try to reconstruct /dev/wd1d (the failed drive):

# raidctl -v -R /dev/wd1d raid0
raidctl: /dev/wd1d is not a component of this device

Still no luck. Any more ideas???

Thanks,
Jeff



Has your raid0.conf file changed? The one you posted earlier shows that 
/dev/wd1d *is* a component of that array.


It has not changed. Here it is:

# more /etc/raid0.conf
START array
1 2 0
START disks
/dev/wd0d
/dev/wd1d
START layout
128 1 1 1
START queue
fifo 100
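
As a reading aid for the config file above, here is a small sketch that labels each field. The field names follow the section comments used in raidctl(8) (numRow numCol numSpare for the array section; sectPerSU, SUsPerParityUnit, SUsPerReconUnit and RAID level for the layout section); treat that mapping as an assumption and check the man page for your release. The function name describe_raid_conf is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: summarize a RAIDframe config file read on stdin.
# Field meanings are assumed from the section comments in raidctl(8).
describe_raid_conf() {
    awk '
        $1 == "START"       { section = $2; next }
        /^#/ || NF == 0     { next }
        section == "array"  { printf "rows=%s cols=%s spares=%s\n", $1, $2, $3 }
        section == "disks"  { printf "component=%s\n", $1 }
        section == "layout" { printf "sectPerSU=%s SUsPerPU=%s SUsPerRU=%s level=%s\n", $1, $2, $3, $4 }
        section == "queue"  { printf "queue=%s depth=%s\n", $1, $2 }
    '
}

# Sample input copied from /etc/raid0.conf as posted in this thread.
describe_raid_conf <<'EOF'
START array
1 2 0
START disks
/dev/wd0d
/dev/wd1d
START layout
128 1 1 1
START queue
fifo 100
EOF
```

Read this way, the file describes a two-column RAID 1 set with no configured spares, which agrees with the component label printed by raidctl -v -s (Num Columns: 2, sectPerSU: 128, RAID Level: 1).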


What do disklabel wd1 and dmesg | grep wd show? anything suspicious?


Here are the disk labels:
# disklabel wd0
# Inside MBR partition 3: type A6 start 63 size 241248042
# /dev/rwd0c:
type: ESDI
disk: ESDI/IDE disk
label: IC35L120AVV207-1
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 15017
total sectors: 241254720
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  a:          2104452               63  4.2BSD   2048 16384    1
  b:           530145          2104515    swap
  c:        241254720                0  unused
  d:        238613445          2634660    RAID


# disklabel wd1
# Inside MBR partition 3: type A6 start 63 size 241248042
# /dev/rwd1c:
type: ESDI
disk: ESDI/IDE disk
label: IC35L120AVV207-1
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 15017
total sectors: 241254720
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0   # microseconds
track-to-track seek: 0  # microseconds
drivedata: 0

16 partitions:
#                size           offset  fstype [fsize bsize  cpg]
  a:          2104452               63  4.2BSD   2048 16384    1
  b:           530145          2104515    swap
  c:        241254720                0  unused
  d:        238613445          2634660    RAID
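
As a sanity check on the two labels, the partition arithmetic can be verified with a few lines of shell. This sketch assumes one reading of the columns: a is 2104452 sectors at offset 63, b is 530145 sectors at offset 2104515, and d is 238613445 sectors at offset 2634660. It checks that each partition starts where the previous one ends and that d stays inside the MBR partition (start 63, size 241248042). The variable names are illustrative.

```shell
#!/bin/sh
# Values taken from the disklabel output in this thread (in sectors).
mbr_start=63;   mbr_size=241248042
a_off=63;       a_size=2104452
b_off=2104515;  b_size=530145
d_off=2634660;  d_size=238613445

# End of each region = offset + size.
a_end=$((a_off + a_size))
b_end=$((b_off + b_size))
d_end=$((d_off + d_size))
mbr_end=$((mbr_start + mbr_size))

# Report each check that holds.
[ "$a_end" -eq "$b_off" ] && echo "a ends where b starts ($a_end)"
[ "$b_end" -eq "$d_off" ] && echo "b ends where d starts ($b_end)"
[ "$d_end" -le "$mbr_end" ] && echo "d ends at $d_end, MBR partition ends at $mbr_end"
```

Both drives carry the same label, so under this reading the layout at least looks consistent on wd0 and wd1.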


Here's the entire dmesg (lots of wdx/raid0 stuff at the bottom):

# dmesg
OpenBSD 4.5 (GENERIC) #0: Tue Aug 11 17:54:58 CDT 2009
r...@gateway.whiteinstruments.com:/mnt/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Pentium(R) 4 CPU 1400MHz (GenuineIntel 686-class) 1.39 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM
real mem  = 133296128 (127MB)
avail mem = 120193024 (114MB)
mainbus0 at root
bios0 at mainbus0: AT/286+ BIOS, date 08/02/01, BIOS32 rev. 0 @ 0xffe90, SMBIOS rev. 2.3 @ 0xf0450 (97 entries)
bios0: vendor Dell Computer Corporation version A04 date 08/02/2001
bios0: Dell Computer Corporation OptiPlex GX400
acpi0 at bios0: rev 0
acpi0: tables DSDT FACP SSDT BOOT
acpi0: wakeup devices VBTN(S4) PCI0(S5) USB0(S3) USB1(S3) PCI1(S5) KBD_(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 2 (PCI1)
acpicpu0 at acpi0
acpibtn0 at acpi0: VBTN
bios0: ROM list: 0xc/0x9800 0xc9800/0x2800
cpu0 at mainbus0: (uniprocessor)
pci0 at mainbus0 bus 0: configuration mode 1 (bios)
pchb0 at pci0 dev 0 function 0 Intel 82850 Host rev 0x02
intelagp0 at pchb0
agp0 at intelagp0: aperture at 0xf000, size 0x800
ppb0 at pci0 dev 1 function 0 Intel 82850/82860 AGP rev 0x02
pci1 at ppb0 bus 1
vga1 at pci1 dev 0 function 0 NVIDIA Riva TNT2 rev 0x15
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
ppb1 at pci0 dev 30 function 0 Intel 82801BA Hub-to-PCI rev 0x04
pci2 at ppb1 bus 2
dc0 at pci2 dev 8 function 0 Macronix PMAC 98715 rev 0x25: irq 10, address 00:80:c6:fa:cd:20
dcphy0 at dc0 phy 31: internal PHY
rl0 at pci2 dev 10 function 0 Realtek 8139 rev 0x10: irq 11, address 00:50:fc:5f:45:61
rlphy0 at rl0 phy 0: RTL internal PHY
xl0 at pci2 dev 12 function 0 3Com 3c905C 100Base-TX rev 0x78: irq 11, address 00:06:5b:4a:5e:0f
exphy0 at xl0 phy 24: 3Com internal media interface
ichpcib0 at pci0 dev 31 function 0 Intel 82801BA LPC rev 0x04
pciide0 at pci0 dev 31 function 1 Intel 82801BA IDE rev 0x04: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
wd0 at pciide0 channel 0 drive 0: IC35L120AVV207-1
wd0: 16-sector PIO, LBA48, 117800MB, 241254720 sectors
wd1 at pciide0 channel 0 drive 1: IC35L120AVV207-1
wd1: 16-sector PIO, LBA48, 117800MB, 241254720 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
wd1(pciide0:0:1): using PIO mode 4, Ultra-DMA mode 2
atapiscsi0 at pciide0 channel 1 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: LG, CD-ROM CRD-8482B, 1.05 ATAPI 5/cdrom removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
uhci0 at pci0 dev 31 function 2 Intel 82801BA USB rev 0x04: irq 11
ichiic0 at pci0 dev 31 function 3 Intel 82801BA SMBus rev 0x04: irq 10
iic0 at ichiic0
uhci1 at pci0 dev 31 function 4 Intel 82801BA USB rev 0x04: irq 9
auich0 at pci0 dev 31

Re: RAID1 drive replacement help?

2009-09-16 Thread Jeffrey C. Smith

Paul M wrote:

On 16/09/2009, at 10:46 AM, Jeffrey C. Smith wrote:

I am trying to add a new drive to replace a failed drive on my RAID1 
OpenBSD system. I have read the available documentation but can't get 
the drive added permanently. Here's what I've done so far-


...

Any feedback would be most appreciated...


You did not state what the failed disk was - was it also wd1d?


Sorry. Yes it is...

So, you replaced the failed drive, then configured this new drive as a 
spare, and then reconstructed onto it. Is that correct?


Yes. And as a spare the array is up and healthy:

# raidctl -s raid0
raid0 Components:
   /dev/wd0d: optimal
  component1: spared
Spares:
   /dev/wd1d: used_spare
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.

I think this is wrong because the new disk is not a spare, it is one of 
the components.
I think what you need is the -R option to raidctl, rather than the -F 
option.


OK. So I rebooted and the status is now failed:

# raidctl -s raid0
raid0 Components:
   /dev/wd0d: optimal
  component1: failed
No spares.
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.

I try the -R option:

# raidctl -v -R component1 raid0
Reconstruction status:

It doesn't complain and comes back almost immediately (strange). I check 
status again:


# raidctl -s raid0
raid0 Components:
   /dev/wd0d: optimal
  component1: failed
No spares.
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.

Says reconstruction is complete but the device is still failed? Maybe it 
wants the device name:


# raidctl -v -R wd1d raid0
raidctl: wd1d is not a component of this device

Nope. Seems component1 is the correct name.

So, it does not complain when I run the reconstruct command (raidctl -v 
-R component1 raid0) and it even starts the reconstruction progress 
indicator. But, it quickly returns after that and does not seem to do 
anything?


To recap, it seems the only way I can get the drive into the array is to 
add it as a spare and use the -F option to reconstruct it. After that the 
array is up and healthy. If I reboot, the wd1 drive (component1) fails 
and I'm back to where I started.


Any ideas out there???

Thanks much,
Jeff



RAID1 drive replacement help?

2009-09-15 Thread Jeffrey C. Smith
I am trying to add a new drive to replace a failed drive on my RAID1 
OpenBSD system. I have read the available documentation but can't get 
the drive added permanently. Here's what I've done so far-


First, add the drive as a spare:

# raidctl -a /dev/wd1d raid0

Checking the status gives:

# raidctl -s raid0
raid0 Components:
   /dev/wd0d: optimal
  component1: failed
Spares:
   /dev/wd1d: spare
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.

Forcing the reconstruction of the array:

# raidctl -vF component1 raid0
Reconstruction status:
  100% |   | ETA:00:00 /

Now the status looks like this (after about 2 hours of rebuild time):

# raidctl -s raid0
raid0 Components:
   /dev/wd0d: optimal
  component1: spared
Spares:
   /dev/wd1d: used_spare
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.


As far as I can tell the second drive is now part of the array as a 
spare. This is the second time I've completed these steps which explains 
why the parity is clean - I've already rebuilt the parity earlier.


My config file looks like this:

# more /etc/raid0.conf
START array
1 2 0
START disks
/dev/wd0d
/dev/wd1d
START layout
128 1 1 1
START queue
fifo 100


If I reboot the machine, it will come up, but the second drive (/dev/wd1d) 
is still in the failed state. I suspect that I need to reconfigure the 
array to make wd1d a permanent part of the array. However, I am not sure 
how that would be done and I don't want to make a mistake and trash the 
array.


What do I need to do to make the spare a permanent part of the array so 
that on the next system boot it will have both drives in optimal state?


Any feedback would be most appreciated...

Thanks,
Jeff
--
Jeffrey C. Smith   Phone: 512.692.7607
RevolutionONE  Cell : 512.965.3898
j...@revolutionone.com