Re: [zfs-discuss] zfs replace problems please please help

2010-08-12 Thread Seth Keith


 -----Original Message-----
 From: Mark J Musante [mailto:mark.musa...@oracle.com]
 Sent: Wednesday, August 11, 2010 5:03 AM
 To: Seth Keith
 Cc: zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] zfs replace problems please please help
 
 On Tue, 10 Aug 2010, seth keith wrote:
 
  # zpool status
   pool: brick
  state: UNAVAIL
  status: One or more devices could not be used because the label is missing
 or invalid.  There are insufficient replicas for the pool to continue
 functioning.
  action: Destroy and re-create the pool from a backup source.
see: http://www.sun.com/msg/ZFS-8000-5E
  scrub: none requested
  config:
 
 NAME   STATE READ WRITE CKSUM
 brick  UNAVAIL  0 0 0  insufficient replicas
   raidz1   UNAVAIL  0 0 0  insufficient replicas
 c13d0  ONLINE   0 0 0
 c4d0   ONLINE   0 0 0
 c7d0   ONLINE   0 0 0
 c4d1   ONLINE   0 0 0
 replacing  UNAVAIL  0 0 0  insufficient replicas
   c15t0d0  UNAVAIL  0 0 0  cannot open
   c11t0d0  UNAVAIL  0 0 0  cannot open
 c12d0  FAULTED  0 0 0  corrupted data
 c6d0   ONLINE   0 0 0
 
  What I want is to remove c15t0d0 and c11t0d0 and replace with the original 
  c6d1.
 Suggestions?
 
 Do the labels still exist on c6d1?  e.g. what do you get from zdb -l
 /dev/rdsk/c6d1s0?
 
 If the label still exists, and the pool guid is the same as the labels on
 the other disks, you could try doing a zpool detach brick c15t0d0 (or
 c11t0d0), then export and try re-importing.  ZFS may find c6d1 at that
 point.  There's no way to guarantee that'll work.

When I do a zdb -l /dev/rdsk/<any device> I get the same output for all my
drives in the pool, but I don't think it looks right:

# zdb -l /dev/rdsk/c4d0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
--------------------------------------------

If I try this zpool detach action, can it be reversed if there is a problem?


Re: [zfs-discuss] zfs replace problems please please help

2010-08-11 Thread Mark J Musante

On Tue, 10 Aug 2010, seth keith wrote:


# zpool status
 pool: brick
state: UNAVAIL
status: One or more devices could not be used because the label is missing
   or invalid.  There are insufficient replicas for the pool to continue
   functioning.
action: Destroy and re-create the pool from a backup source.
  see: http://www.sun.com/msg/ZFS-8000-5E
scrub: none requested
config:

   NAME   STATE READ WRITE CKSUM
   brick  UNAVAIL  0 0 0  insufficient replicas
 raidz1   UNAVAIL  0 0 0  insufficient replicas
   c13d0  ONLINE   0 0 0
   c4d0   ONLINE   0 0 0
   c7d0   ONLINE   0 0 0
   c4d1   ONLINE   0 0 0
   replacing  UNAVAIL  0 0 0  insufficient replicas
 c15t0d0  UNAVAIL  0 0 0  cannot open
 c11t0d0  UNAVAIL  0 0 0  cannot open
   c12d0  FAULTED  0 0 0  corrupted data
   c6d0   ONLINE   0 0 0

What I want is to remove c15t0d0 and c11t0d0 and replace with the original 
c6d1. Suggestions?


Do the labels still exist on c6d1?  e.g. what do you get from zdb -l 
/dev/rdsk/c6d1s0?


If the label still exists, and the pool guid is the same as the labels on 
the other disks, you could try doing a zpool detach brick c15t0d0 (or 
c11t0d0), then export and try re-importing.  ZFS may find c6d1 at that 
point.  There's no way to guarantee that'll work.
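
Spelled out, the suggested sequence would look roughly like this (pool and
device names taken from the status output above; there is no guarantee the
import picks up c6d1):

   # zdb -l /dev/rdsk/c6d1s0        (check the labels and the pool_guid first)
   # zpool detach brick c15t0d0     (or c11t0d0)
   # zpool export brick
   # zpool import brick
   # zpool status brick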



Re: [zfs-discuss] zfs replace problems please please help

2010-08-11 Thread Mark J Musante

On Wed, 11 Aug 2010, Seth Keith wrote:



When I do a zdb -l /dev/rdsk/<any device> I get the same output for all my 
drives in the pool, but I don't think it looks right:

# zdb -l /dev/rdsk/c4d0


What about /dev/rdsk/c4d0s0?


Re: [zfs-discuss] zfs replace problems please please help

2010-08-11 Thread seth keith
This is for newbies like myself: I was using 'zdb -l' wrong. Just using the
drive name from 'zpool status' or format (e.g. c6d1) didn't work. I needed to
add s0 to the end:

zdb -l /dev/dsk/c6d1s0

gives me a good-looking label (I think). The pool_guid values are the same
for all the drives. I see the first 500GB drive I replaced has children that
are all 500GB drives. The second 500GB drive I replaced has one 2TB child. All
the other drives have two 2TB children.
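
A quick way to compare the pool_guid across all the disks is a loop along
these lines (the device list here is just the one from this pool; adjust it
to match your own):

   # for d in c13d0 c4d0 c7d0 c4d1 c6d0 c6d1; do
   >   echo $d; zdb -l /dev/dsk/${d}s0 | grep pool_guid | sort -u
   > done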

I managed to detach one of the drives being replaced, but I could not detach
the other two 2TB drives. I exported and imported, and now my pool looks like:

  pool: brick
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
brick DEGRADED 0 0 0
  raidz1  DEGRADED 0 0 0
c13d0 ONLINE   0 0 0
c4d0  ONLINE   0 0 0
c7d0  ONLINE   0 0 0
c4d1  ONLINE   0 0 0
14607330800900413650  UNAVAIL  0 0 0  was /dev/dsk/c15t0d0s0
c11t1d0   ONLINE   0 0 0
c6d0  ONLINE   0 0 0

errors: 352808 data errors, use '-v' for a list

Is there some way I can take the original zpool label from the first 500GB drive
I replaced and use it to fix up the other drives in the pool?  What are my
options here...


Re: [zfs-discuss] zfs replace problems please please help

2010-08-11 Thread Mark J Musante

On Wed, 11 Aug 2010, seth keith wrote:


   NAME  STATE READ WRITE CKSUM
   brick DEGRADED 0 0 0
 raidz1  DEGRADED 0 0 0
   c13d0 ONLINE   0 0 0
   c4d0  ONLINE   0 0 0
   c7d0  ONLINE   0 0 0
   c4d1  ONLINE   0 0 0
   14607330800900413650  UNAVAIL  0 0 0  was /dev/dsk/c15t0d0s0
   c11t1d0   ONLINE   0 0 0
   c6d0  ONLINE   0 0 0


OK, that's good - your missing disk can be replaced with a brand new disk 
using zpool replace brick 14607330800900413650 <disk name>.  Then wait 
for the resilver to complete and do a full scrub to be on the safe side.
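
Concretely, assuming the replacement 2TB disk shows up as, say, c15t0d0
(substitute whatever name format reports for it):

   # zpool replace brick 14607330800900413650 c15t0d0
   # zpool status brick       (watch the resilver progress)
   # zpool scrub brick        (once the resilver has finished)
   # zpool status -v brick    (confirm the scrub found no new errors)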



errors: 352808 data errors, use '-v' for a list

Is there some way I can take the original zpool label from the first 500GB 
drive I replaced and use it to fix up the other drives in the pool?


No.  The files with errors can only be restored from any backups you made. 
If there is an original disk that's not part of your pool, you might want 
to make a backup of it, plug it in, and see if a zpool export/zpool 
import will find it.  But it will only find it if zdb -l shows four valid 
labels.
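
In rough terms, with the original 500GB disk reattached (the device name
below is just a placeholder for wherever it shows up):

   # zdb -l /dev/rdsk/c6d1s0    (should show LABEL 0 through LABEL 3, none 'failed to unpack')
   # zpool export brick
   # zpool import brick
   # zpool status -v brick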



Re: [zfs-discuss] zfs replace problems please please help

2010-08-10 Thread Mark J Musante

On Tue, 10 Aug 2010, seth keith wrote:


First off, I don't have the exact failure messages here, and I did not take good
notes of the failures, so I will do the best I can. Please try to give me
advice anyway.

I have a 7-drive raidz1 pool with 500GB drives, and I wanted to replace them all
with 2TB drives. Immediately I ran into trouble. If I tried:

   zpool offline brick <device>


Were you doing an in-place replace?  i.e. pulling out the old disk and 
putting in the new one?



I got a message like: insufficient replicas


This means that there was a problem with the pool already.  When ZFS opens 
a pool, it looks at the disks that are part of that pool.  For raidz1, if 
more than one disk is unopenable, then the pool will report that there are 
no valid replicas, which is probably the error message you saw.


If that's the case, then your pool already had one failed drive in it, and 
you were attempting to disable a second drive.  Do you have a copy of the 
output from zpool status brick from before you tried your experiment?




I tried to

   zpool replace brick <old device> <new device>

and I got something like: '<new device> must be a single disk'


Unfortunately, this just means that we got back an EINVAL from the kernel, 
which could mean any one of a number of things, but probably there was an 
issue with calculating the drive size.  I'd try plugging it in separately and 
using 'format' to see how big Solaris thinks the drive is.
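
For example, with just the new drive attached, something like this shows the
size Solaris sees for each disk without touching the pool:

   # format </dev/null     (lists the disks and their reported sizes, then exits)
   # iostat -En            (per-device details, including a Size: field)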




I finally got replace and offline to work by:

   zpool export brick
   [reboot]
   zpool import brick


Probably didn't need to reboot there.


now

   zpool offline brick <old device>
   zpool replace brick <old device> <new device>


If you use this form for the replace command, you don't need to offline 
the old disk first.  You only need to offline a disk if you're going to 
pull it out.  And then you can do an in-place replace just by issuing 
zpool replace brick <device-you-swapped>.
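
So the in-place flow is just the following (c12d0 here is purely an example
device name):

   # zpool offline brick c12d0     (only because the disk is about to be pulled)
     ... physically swap in the new disk ...
   # zpool replace brick c12d0
   # zpool status brick            (should show 'replacing', then resilvering)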


This worked. zpool status showed replacing in progress, and then after 
about 26 hours of resilvering, everything looked fine. The old device 
was gone, and no errors in the pool. Now I tried to do it again with the 
next device. I missed the zpool offline part however. Immediately, I 
started getting disk errors on both the drive I was replacing and the 
first drive I replaced.


Read errors?  Write errors?  Checksum errors?  Sounds like a full scrub 
would have been a good idea prior to replacing the second disk.
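
For the record, a scrub is just:

   # zpool scrub brick
   # zpool status -v brick     (check the result once it completes)

It reads every block and surfaces latent read/checksum problems while the
raidz1 redundancy is still intact, which is why it is worth running before
pulling the next disk.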


I have the two original drives; they are in good shape and should still 
have all the data on them. Can I somehow put my original zpool back? 
How? Please help!


You can try exporting the pool, plugging in the original drives, and then 
doing a recovery on it.  See the zpool manpage under 'zpool import' for the 
recovery options and what the flags mean.
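
The recovery attempt would look roughly like this; -F is the recovery option
that lets the import discard the last few transactions if that is what it
takes to open the pool, and it is not guaranteed to succeed:

   # zpool export brick
     ... reconnect the original drives ...
   # zpool import -F brick
   # zpool status -v brick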



Re: [zfs-discuss] zfs replace problems please please help

2010-08-10 Thread seth keith
First off, double thanks for replying to my post. I tried your advice but 
something is way wrong. I have all the 2TB drives disconnected and the 7 500GB 
drives connected. All 7 show up in the BIOS and in format. Here are all the 
drives, the original 7 500GB drives: 

   # format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c3d0 <DEFAULT cyl 4859 alt 2 hd 255 sec 63>
  /p...@0,0/pci8086,3...@1c/pci-...@0/i...@1/c...@0,0
   1. c4d0 <Maxtor 7-H81AYZ5-0001-465.76GB>
  /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
   2. c4d1 <WDC WD50-  WD-WCAS8323204-0001-465.76GB>
  /p...@0,0/pci-...@1f,2/i...@0/c...@1,0
   3. c6d0 <WDC WD50-  WD-WCAS8510568-0001-465.76GB>
  /p...@0,0/pci-...@1f,2/i...@1/c...@0,0
   4. c6d1 <WDC WD50-  WD-WCAUF149175-0001-465.76GB>
  /p...@0,0/pci-...@1f,2/i...@1/c...@1,0
   5. c7d0 <Maxtor 7-H81DM5X-0001-465.76GB>
  /p...@0,0/pci-...@1f,5/i...@0/c...@0,0
   6. c12d0 <WDC WD50-  WD-WCAUH024469-0001-465.76GB>
  /p...@0,0/pci8086,2...@1e/pci-...@1/i...@1/c...@0,0
   7. c13d0 <WDC WD50-  WD-WCAS8415731-0001-465.76GB>
  /p...@0,0/pci8086,2...@1e/pci-...@1/i...@0/c...@0,0


Now clear out brick:

# zpool export brick
# zpool status
  pool: rpool
  state: ONLINE
  scrub: none requested
  config:

NAMESTATE READ WRITE CKSUM
rpool   ONLINE   0 0 0
c3d0s0  ONLINE   0 0 0

errors: No known data errors


Then an error on the import:

# zpool import -F brick
cannot open 'brick': I/O error

Now there is a pool but the drives are wrong:

# zpool status
  pool: brick
 state: UNAVAIL
status: One or more devices could not be used because the label is missing
or invalid.  There are insufficient replicas for the pool to continue
functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

NAME   STATE READ WRITE CKSUM
brick  UNAVAIL  0 0 0  insufficient replicas
  raidz1   UNAVAIL  0 0 0  insufficient replicas
c13d0  ONLINE   0 0 0
c4d0   ONLINE   0 0 0
c7d0   ONLINE   0 0 0
c4d1   ONLINE   0 0 0
replacing  UNAVAIL  0 0 0  insufficient replicas
  c15t0d0  UNAVAIL  0 0 0  cannot open
  c11t0d0  UNAVAIL  0 0 0  cannot open
c12d0  FAULTED  0 0 0  corrupted data
c6d0   ONLINE   0 0 0



What I want is to remove c15t0d0 and c11t0d0 and replace with the original 
c6d1. Suggestions?