Re: [zfs-discuss] Help with corrupted pool

2010-02-23 Thread Ethan
On Sun, Feb 21, 2010 at 12:41, Ethan notet...@gmail.com wrote:

 Update: I'm stuck. Again.

 To answer "For curiosity's sake, what happens when you remove (rename) your
 dir with the symlinks?": it finds the devices on p0 with no problem, with
 the symlinks directory deleted.

 After clearing the errors and scrubbing again, no errors were encountered
 in the second scrub. Then I offlined the disk which had errors in the first
 scrub.

 I followed the suggestion to thoroughly test the disk (and remap any bad
 sectors), filling it with random-looking data by encrypting /dev/zero.
 Reading back and decrypting the drive, it all read back as zeros - all
 good.

 I then checked the SMART status of the drive, which had 0 error rates for
 everything. I ran the several-hour extended self-test, whatever that does,
 after which I had two write errors on one drive which weren't there before.
 I believe it's the same drive that had the zfs errors, but I did the SMART
 stuff in linux, not being able to find SMART tools in solaris, and I haven't
 been able to figure out which drive is which. Is there a way to get a
 drive's serial number in solaris? I could identify it by that.
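
 (Something like iostat -En might do it - I gather it prints a Serial No
 field for each device - but I haven't tried it yet, so treat this as a
 guess:

 # iostat -En | egrep -i 'c9t|serial'

 and then match the serials against whatever the SMART tools on linux
 report.)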

 I scrubbed again with the pool degraded. No errors.

 NAME                     STATE     READ WRITE CKSUM
 q                        ONLINE       0     0     0
   raidz1                 ONLINE       0     0     0
     c9t4d0p0             ONLINE       0     0     0
     c9t5d0p0             ONLINE       0     0     0
     c9t2d0p0             ONLINE       0     0     0
     3763020893739678459  UNAVAIL      0     0     0  was /dev/dsk/c9t1d0p0
     c9t0d0p0             ONLINE       0     0     0

 errors: No known data errors

 I tried zpool replace on the drive.
 # zpool replace q 3763020893739678459 c9t1d0
 cannot replace 3763020893739678459 with c9t1d0: device is too small

 Victor was right. I went into 'format' and fought with it for a while.
 Moving the beginning of slice 0 from block 256 down to block 34 was simple
 enough, but I cannot figure out how to tell it I don't want 8MB in slice 8.
 Is it even possible? I haven't got 8MB to spare (as silly as that sounds for
 a 1.5TB drive) - if I can't get rid of slice 8, I may have to stick with
 using p0's. I haven't encountered a problem using them so far (who needs
 partition tables anyway?) but I figured I'd ask if anybody had ideas about
 getting back that space.
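
 For reference, the slice 0 edit itself went roughly like this (prompts
 paraphrased from memory, and note it still leaves slice 8 in place):

 # format -e c9t1d0
 format> partition
 partition> 0        (tag usr, flags wm, starting sector 34,
                      size = as much as it will allow before slice 8)
 partition> label
 partition> quit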
 What's the 8MB for, anyway? Some stuff seems to indicate that it has to do
 with booting the drive, but this will never be a boot drive. That seems to
 be for VTOC stuff, not EFI, though. I did look at switching to VTOC labels,
 but it seems they don't support disks as large as I am using, so I think
 that's out.
 I also see "Information that was stored in the alternate cylinders area,
 the last two cylinders of the disk, is now stored in slice 8." (
 http://docsun.cites.uiuc.edu/sun_docs/C/solaris_9/SUNWaadm/SYSADV1/p117.html)
 Not sure what an alternate cylinders area is - it sounds sort of like
 remapping bad sectors, but that's something that the disk does on its own.

 So, can I get the 8MB back? Should I use p0? Is there another option I'm
 not thinking of? (I could always try diving into the EFI label with a hex
 editor and set it the way I please with no silly slice 8)

 -Ethan


I did a replace onto p0 of the drive I'd randomized, and did a scrub. No
errors. (Then I did another scrub, just for the hell of it; no errors
again.)
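
(For the record, the replace itself was something along the lines of

# zpool replace q 3763020893739678459 c9t1d0p0

- the same GUID that zpool status shows for the missing device, with the p0
device as the new target.)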

I feel fairly content staying with p0's, unless there's a good reason not
to. There are a few things I'm not entirely certain about:

- Is there any significant advantage to having a partition table?
- If there is, is it possible to drop the 8MB slice 8 so that I can actually
have enough space to put my raid on slice 0?
- Should I replace the disk that had errors on the initial scrub, or is it
probably sufficient to just be wary of it, scrub frequently, and replace it
if it encounters any more errors?

-Ethan


Re: [zfs-discuss] Help with corrupted pool

2010-02-21 Thread Ethan
On Thu, Feb 18, 2010 at 16:03, Ethan notet...@gmail.com wrote:

 On Thu, Feb 18, 2010 at 15:31, Daniel Carosone d...@geek.com.au wrote:

 On Thu, Feb 18, 2010 at 12:42:58PM -0500, Ethan wrote:
  On Thu, Feb 18, 2010 at 04:14, Daniel Carosone d...@geek.com.au wrote:
  Although I do notice that right now, it imports just fine using the p0
  devices using just `zpool import q`, no longer having to use import -d
 with
  the directory of symlinks to p0 devices. I guess this has to do with
 having
  repaired the labels and such? Or whatever it's repaired having
 successfully
  imported and scrubbed.

 It's the zpool.cache file at work, storing extra copies of labels with
 corrected device paths.  For curiosity's sake, what happens when you
 remove (rename) your dir with the symlinks?


 I'll let you know when the current scrub finishes.



  After the scrub finished, this is the state of my pool:
  /export/home/ethan/qdsk/c9t1d0p0  DEGRADED     4     0    60  too many errors

 Ick.  Note that there are device errors as well as content (checksum)
 errors, which means it can't just be correctly-copied damage from
 your original pool that was having problems.

 zpool clear and rescrub, for starters, and see if they continue.


 Doing that now.



 I suggest also:
  - carefully checking and reseating cables, etc
  - taking backups now of anything you really wanted out of the pool,
   while it's still available.
  - choosing that disk as the first to replace, and scrubbing again
   after replacing onto it, perhaps twice.
  - doing a dd to overwrite that entire disk with random data and let
   it remap bad sectors, before the replace (not just zeros, and not
   just the sectors a zfs resilver would hit. openssl enc of /dev/zero
   with a lightweight cipher and whatever key; for extra caution read
   back and compare with a second openssl stream using the same key)
  - being generally very watchful and suspicious of that disk in
   particular, look at error logs for clues, etc.


 Very thorough. I have no idea how to do that with openssl, but I will look
 into learning this.


  - being very happy that zfs deals so well with all this abuse, and
   you know your data is ok.


 Yes indeed - very happy.



  I have no idea what happened to the one disk, but No known data errors
 is
  what makes me happy. I'm not sure if I should be concerned about the
  physical disk itself

 given that it's reported disk errors as well as damaged content, yes.


 Okay. Well, it's a brand-new disk and I can exchange it easily enough.



  or just assume that some data got screwed up with all
  this mess. I guess maybe I'll see how the disk behaves during the
 replace
  operations (restoring to it and then restoring from it four times seems
 like
  a pretty good test of it), and if it continues to error, replace the
  physical drive and if necessary restore from the original truecrypt
 volumes.

 Good plan; note the extra scrubs at key points in the process above.


 Will do. Thanks for the tip.



  So, current plan:
  - export the pool.

 shouldn't be needed; zpool offline dev would be enough

  - format c9t1d0 to have one slice being the entire disk.

 Might not have been needed, but given Victor's comments about reserved
 space, you may need to do this manually, yes.  Be sure to use EFI
 labels.  Pick the suspect disk first.

  - import. should be degraded, missing c9t1d0p0.

 no need if you didn't export

  - replace missing c9t1d0p0 with c9t1d0

 yup, or if you've manually partitioned you may need to mention the
 slice number to prevent it repartitioning with the default reserved
 space again. You may even need to use some other slice (s5 or
 whatever), but I don't think so.

  - wait for resilver.
  - repeat with the other four disks.

  - tell us how it went
  - drink beer.

 --
 Dan.


 Okay. Plan is updated to reflect your suggestions. Beer was already in the
 plan, but I forgot to list it. Speaking of which, I see your e-mail address
 is .au, but if you're ever in new york city I'd love to buy you a beer as
 thanks for all your excellent help with this. And anybody else in this
 thread - you guys are awesome.

 -Ethan


Update: I'm stuck. Again.

To answer "For curiosity's sake, what happens when you remove (rename) your
dir with the symlinks?": it finds the devices on p0 with no problem, with
the symlinks directory deleted.

After clearing the errors and scrubbing again, no errors were encountered in
the second scrub. Then I offlined the disk which had errors in the first
scrub.

I followed the suggestion to thoroughly test the disk (and remap any bad
sectors), filling it with random-looking data by encrypting /dev/zero.
Reading back and decrypting the drive, it all read back as zeros - all
good.
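
Roughly the idea, as a sketch rather than the exact commands I ran (this was
on the linux box; the device name and key below are placeholders):

KEY=throwaway
DISK=/dev/sdX                          # placeholder - the suspect drive
SIZE=$(blockdev --getsize64 "$DISK")

# fill the disk with the encryption of /dev/zero (deterministic for a fixed
# key when -nosalt is used); dd stops with "no space left" once the disk is full
openssl enc -aes-128-cbc -nosalt -pass pass:"$KEY" </dev/zero | dd of="$DISK" bs=1M

# regenerate the same stream, trim it to the disk size, compare byte-for-byte
openssl enc -aes-128-cbc -nosalt -pass pass:"$KEY" </dev/zero | head -c "$SIZE" | cmp - "$DISK"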

I then checked the SMART status of the drive, which had 0 error rates for
everything. I ran the several-hour extended self-test, whatever that does,
after which I had two write errors on one drive which weren't there 

Re: [zfs-discuss] Help with corrupted pool

2010-02-18 Thread Daniel Carosone
On Wed, Feb 17, 2010 at 11:37:54PM -0500, Ethan wrote:
  It seems to me that you could also use the approach of 'zpool replace' for
 That is true. It seems like it would then have to rebuild from parity for every
 drive, though, which I think would take rather a long while, wouldn't it?

No longer than copying - plus, it will only resilver active data, so
unless the pool is close to full it could save some time.  Certainly
it will save some hassle and risk of error, plugging and swapping drives
between machines more times.  As a further benefit, all this work will
count towards a qualification cycle for the current hardware setup.

I would recommend using replace, one drive at a time. Since you still
have the original drives to fall back on, you can do this now (before
making more changes to the pool with new data) without being overly
worried about a second failure killing your raidz1 pool.  Normally,
when doing replacements like this on a singly-redundant pool, it's a
good idea to run a scrub after each replace, making sure everything
you just wrote is valid before relying on it to resilver the next
disk. 

If you're keen on copying, I'd suggest doing over the network; that
way your write target is a system that knows the target partitioning
and there's no (mis)calculation of offsets.

--
Dan.



Re: [zfs-discuss] Help with corrupted pool

2010-02-18 Thread Ethan
On Thu, Feb 18, 2010 at 04:14, Daniel Carosone d...@geek.com.au wrote:

 On Wed, Feb 17, 2010 at 11:37:54PM -0500, Ethan wrote:
   It seems to me that you could also use the approach of 'zpool replace'
 for
  That is true. It seems like it would then have to rebuild from parity for every
  drive, though, which I think would take rather a long while, wouldn't it?

 No longer than copying - plus, it will only resilver active data, so
 unless the pool is close to full it could save some time.  Certainly
 it will save some hassle and risk of error, plugging and swapping drives
 between machines more times.  As a further benefit, all this work will
 count towards a qualification cycle for the current hardware setup.

 I would recommend using replace, one drive at a time. Since you still
 have the original drives to fall back on, you can do this now (before
 making more changes to the pool with new data) without being overly
 worried about a second failure killing your raidz1 pool.  Normally,
 when doing replacements like this on a singly-redundant pool, it's a
 good idea to run a scrub after each replace, making sure everything
 you just wrote is valid before relying on it to resilver the next
 disk.

 If you're keen on copying, I'd suggest doing over the network; that
 way your write target is a system that knows the target partitioning
 and there's no (mis)calculation of offsets.

 --
 Dan.



These are good points - it sounds like replacing one at a time is the way to
go. Thanks for pointing out these benefits.
Although I do notice that right now, it imports just fine using the p0
devices using just `zpool import q`, no longer having to use import -d with
the directory of symlinks to p0 devices. I guess this has to do with having
repaired the labels and such? Or whatever it's repaired having successfully
imported and scrubbed.
After the scrub finished, this is the state of my pool:


# zpool status
  pool: q
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 7h18m with 0 errors on Thu Feb 18 06:25:44
2010
config:

NAME                                    STATE     READ WRITE CKSUM
q                                       DEGRADED     0     0     0
  raidz1                                DEGRADED     0     0     0
    /export/home/ethan/qdsk/c9t4d0p0    ONLINE       0     0     0
    /export/home/ethan/qdsk/c9t5d0p0    ONLINE       0     0     0
    /export/home/ethan/qdsk/c9t2d0p0    ONLINE       0     0     0
    /export/home/ethan/qdsk/c9t1d0p0    DEGRADED     4     0    60  too many errors
    /export/home/ethan/qdsk/c9t0d0p0    ONLINE       0     0     0

errors: No known data errors


I have no idea what happened to the one disk, but "No known data errors" is
what makes me happy. I'm not sure if I should be concerned about the
physical disk itself, or just assume that some data got screwed up with all
this mess. I guess maybe I'll see how the disk behaves during the replace
operations (restoring to it and then restoring from it four times seems like
a pretty good test of it), and if it continues to error, replace the
physical drive and if necessary restore from the original truecrypt volumes.


So, current plan:
- export the pool.
- format c9t1d0 to have one slice being the entire disk.
- import. should be degraded, missing c9t1d0p0.
- replace missing c9t1d0p0 with c9t1d0 (should this be c9t1d0s0? my
understanding is that zfs will treat the two about the same, since it adds
the partition table to raw devices if that's what it's given and ends up
using s0 anyway)
- wait for resilver.
- repeat with the other four disks.

Sound good?

-Ethan


Re: [zfs-discuss] Help with corrupted pool

2010-02-18 Thread Cindy Swearingen

Hi Ethan,

Great job putting this pool back together...

I would agree with the disk-by-disk replacement by using the zpool
replace command. You can read about this command here:

http://docs.sun.com/app/docs/doc/817-2271/gazgd?a=view

Having a recent full backup of your data before making any more changes
is always recommended.

You might be able to figure out if c9t1d0p0 is a failing disk by
checking the fmdump -eV output but with the changing devices, it might
be difficult to isolate with gobs of output.
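
Something along these lines is a reasonable starting point (the output file
is just a suggestion):

# fmdump -e
# fmdump -eV > /var/tmp/ereports.txt

The -e summary gives one line per error event with a timestamp, which helps
narrow down where to look in the verbose output.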

Also, if you are using whole disks, then use the c9t*d* designations.
The s* designations are unnecessary for whole disks and building pools
with p* devices isn't recommended.

Thanks,

Cindy


On 02/18/10 10:42, Ethan wrote:
On Thu, Feb 18, 2010 at 04:14, Daniel Carosone d...@geek.com.au wrote:


On Wed, Feb 17, 2010 at 11:37:54PM -0500, Ethan wrote:
   It seems to me that you could also use the approach of 'zpool
replace' for
  That is true. It seems like it would then have to rebuild from parity
for every
  drive, though, which I think would take rather a long while,
wouldn't it?

No longer than copying - plus, it will only resilver active data, so
unless the pool is close to full it could save some time.  Certainly
it will save some hassle and risk of error, plugging and swapping drives
between machines more times.  As a further benefit, all this work will
count towards a qualification cycle for the current hardware setup.

I would recommend using replace, one drive at a time. Since you still
have the original drives to fall back on, you can do this now (before
making more changes to the pool with new data) without being overly
worried about a second failure killing your raidz1 pool.  Normally,
when doing replacements like this on a singly-redundant pool, it's a
good idea to run a scrub after each replace, making sure everything
you just wrote is valid before relying on it to resilver the next
disk.

If you're keen on copying, I'd suggest doing over the network; that
way your write target is a system that knows the target partitioning
and there's no (mis)calculation of offsets.

--
Dan.



These are good points - it sounds like replacing one at a time is the 
way to go. Thanks for pointing out these benefits.
Although I do notice that right now, it imports just fine using the p0 
devices using just `zpool import q`, no longer having to use import -d 
with the directory of symlinks to p0 devices. I guess this has to do 
with having repaired the labels and such? Or whatever it's repaired 
having successfully imported and scrubbed.

After the scrub finished, this is the state of my pool:


# zpool status
  pool: q
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 7h18m with 0 errors on Thu Feb 18 06:25:44 
2010

config:

NAME                                    STATE     READ WRITE CKSUM
q                                       DEGRADED     0     0     0
  raidz1                                DEGRADED     0     0     0
    /export/home/ethan/qdsk/c9t4d0p0    ONLINE       0     0     0
    /export/home/ethan/qdsk/c9t5d0p0    ONLINE       0     0     0
    /export/home/ethan/qdsk/c9t2d0p0    ONLINE       0     0     0
    /export/home/ethan/qdsk/c9t1d0p0    DEGRADED     4     0    60  too many errors
    /export/home/ethan/qdsk/c9t0d0p0    ONLINE       0     0     0

errors: No known data errors


I have no idea what happened to the one disk, but No known data errors 
is what makes me happy. I'm not sure if I should be concerned about the 
physical disk itself, or just assume that some data got screwed up with 
all this mess. I guess maybe I'll see how the disk behaves during the 
replace operations (restoring to it and then restoring from it four 
times seems like a pretty good test of it), and if it continues to 
error, replace the physical drive and if necessary restore from the 
original truecrypt volumes.


So, current plan:
- export the pool.
- format c9t1d0 to have one slice being the entire disk.
- import. should be degraded, missing c9t1d0p0.
- replace missing c9t1d0p0 with c9t1d0 (should this be c9t1d0s0? my 
understanding is that zfs will treat the two about the same, since it 
adds the partition table to raw devices if that's what it's given and 
ends up using s0 anyway)

- wait for resilver.
- repeat with the other four disks.

Sound good?

-Ethan





Re: [zfs-discuss] Help with corrupted pool

2010-02-18 Thread Victor Latushkin

Ethan wrote:

So, current plan:
- export the pool.
- format c9t1d0 to have one slice being the entire disk.
- import. should be degraded, missing c9t1d0p0.
- replace missing c9t1d0p0 with c9t1d0 (should this be c9t1d0s0? my 
understanding is that zfs will treat the two about the same, since it 
adds the partition table to raw devices if that's what it's given and 
ends up using s0 anyway)

- wait for resilver.
- repeat with the other four disks.

Sound good?


Almost. You can run into an issue with size - slice 0 on an EFI-labeled (whole)
disk may not be sufficient to replace a disk in your raidz1.


regards,
victor


Re: [zfs-discuss] Help with corrupted pool

2010-02-18 Thread Ethan
On Thu, Feb 18, 2010 at 13:22, Victor Latushkin victor.latush...@sun.com wrote:

 Ethan wrote:

 So, current plan:
 - export the pool.
 - format c9t1d0 to have one slice being the entire disk.
 - import. should be degraded, missing c9t1d0p0.
 - replace missing c9t1d0p0 with c9t1d0 (should this be c9t1d0s0? my
 understanding is that zfs will treat the two about the same, since it adds
 the partition table to raw devices if that's what it's given and ends up
 using s0 anyway)
 - wait for resilver.
 - repeat with the other four disks.

 Sound good?


 Almost. You can run into an issue with size - slice 0 on an EFI-labeled (whole)
 disk may not be sufficient to replace a disk in your raidz1.

 regards,
 victor


This should be okay, I think. The overhead from truecrypt was 262144 bytes,
so I have that much to spare on the non-truecrypted disks. An EFI GPT is 34
512-byte LBAs at each end, or 34816 bytes total. So there should be plenty
of room.

-Ethan


Re: [zfs-discuss] Help with corrupted pool

2010-02-18 Thread Victor Latushkin

Ethan wrote:
On Thu, Feb 18, 2010 at 13:22, Victor Latushkin victor.latush...@sun.com wrote:


Ethan wrote:

So, current plan:
- export the pool.
- format c9t1d0 to have one slice being the entire disk.
- import. should be degraded, missing c9t1d0p0.
- replace missing c9t1d0p0 with c9t1d0 (should this be c9t1d0s0?
my understanding is that zfs will treat the two about the same,
since it adds the partition table to raw devices if that's what
it's given and ends up using s0 anyway)
- wait for resilver.
- repeat with the other four disks.

Sound good?


Almost. You can run into an issue with size - slice 0 on an EFI-labeled
(whole) disk may not be sufficient to replace a disk in your raidz1.

regards,
victor


This should be okay, I think. The overhead from truecrypt was 262144 
bytes, so I have that much to spare on the non-truecrypted disks. An EFI 
GPT is 34 512-byte LBAs at each end, or 34816 bytes total. So there 
should be plenty of room.


By default ZFS creates s0 on an EFI-labeled disk at an offset of 256 sectors from the
beginning of the disk. Also, there's an 8MB reserved partition, number 8.
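
Rough numbers, in bytes (shell arithmetic; the slice 8 figure is the usual
8MB reserved partition):

echo $(( 256 * 512 ))                      # default s0 offset    =  131072
echo $(( 8 * 1024 * 1024 ))                # reserved slice 8     = 8388608
echo $(( 1500301910016 - 1500301647872 ))  # your truecrypt slack =  262144

So the default layout gives away far more than the 262144 bytes of slack you
have.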



Re: [zfs-discuss] Help with corrupted pool

2010-02-18 Thread Daniel Carosone
On Thu, Feb 18, 2010 at 12:42:58PM -0500, Ethan wrote:
 On Thu, Feb 18, 2010 at 04:14, Daniel Carosone d...@geek.com.au wrote:
 Although I do notice that right now, it imports just fine using the p0
 devices using just `zpool import q`, no longer having to use import -d with
 the directory of symlinks to p0 devices. I guess this has to do with having
 repaired the labels and such? Or whatever it's repaired having successfully
 imported and scrubbed.

It's the zpool.cache file at work, storing extra copies of labels with
corrected device paths.  For curiosity's sake, what happens when you
remove (rename) your dir with the symlinks?  

 After the scrub finished, this is the state of my pool:
 /export/home/ethan/qdsk/c9t1d0p0  DEGRADED     4     0    60  too many errors

Ick.  Note that there are device errors as well as content (checksum)
errors, which means it can't just be correctly-copied damage from
your original pool that was having problems.

zpool clear and rescrub, for starters, and see if they continue.  

I suggest also:
 - carefully checking and reseating cables, etc
 - taking backups now of anything you really wanted out of the pool,
   while it's still available.
 - choosing that disk as the first to replace, and scrubbing again
   after replacing onto it, perhaps twice.
 - doing a dd to overwrite that entire disk with random data and let
   it remap bad sectors, before the replace (not just zeros, and not
   just the sectors a zfs resilver would hit. openssl enc of /dev/zero
   with a lightweight cipher and whatever key; for extra caution read
   back and compare with a second openssl stream using the same key)
 - being generally very watchful and suspicious of that disk in
   particular, look at error logs for clues, etc.
 - being very happy that zfs deals so well with all this abuse, and
   you know your data is ok.

 I have no idea what happened to the one disk, but No known data errors is
 what makes me happy. I'm not sure if I should be concerned about the
 physical disk itself

given that it's reported disk errors as well as damaged content, yes.

 or just assume that some data got screwed up with all
 this mess. I guess maybe I'll see how the disk behaves during the replace
 operations (restoring to it and then restoring from it four times seems like
 a pretty good test of it), and if it continues to error, replace the
 physical drive and if necessary restore from the original truecrypt volumes.

Good plan; note the extra scrubs at key points in the process above.

 So, current plan:
 - export the pool.

shouldn't be needed; zpool offline dev would be enough

 - format c9t1d0 to have one slice being the entire disk.

Might not have been needed, but given Victor's comments about reserved
space, you may need to do this manually, yes.  Be sure to use EFI
labels.  Pick the suspect disk first.

 - import. should be degraded, missing c9t1d0p0.

no need if you didn't export

 - replace missing c9t1d0p0 with c9t1d0 

yup, or if you've manually partitioned you may need to mention the
slice number to prevent it repartitioning with the default reserved
space again. You may even need to use some other slice (s5 or
whatever), but I don't think so.

 - wait for resilver.
 - repeat with the other four disks.

 - tell us how it went
 - drink beer.

--
Dan.



Re: [zfs-discuss] Help with corrupted pool

2010-02-18 Thread Ethan
On Thu, Feb 18, 2010 at 15:31, Daniel Carosone d...@geek.com.au wrote:

 On Thu, Feb 18, 2010 at 12:42:58PM -0500, Ethan wrote:
  On Thu, Feb 18, 2010 at 04:14, Daniel Carosone d...@geek.com.au wrote:
  Although I do notice that right now, it imports just fine using the p0
  devices using just `zpool import q`, no longer having to use import -d
 with
  the directory of symlinks to p0 devices. I guess this has to do with
 having
  repaired the labels and such? Or whatever it's repaired having
 successfully
  imported and scrubbed.

 It's the zpool.cache file at work, storing extra copies of labels with
 corrected device paths.  For curiosity's sake, what happens when you
 remove (rename) your dir with the symlinks?


I'll let you know when the current scrub finishes.



  After the scrub finished, this is the state of my pool:
  /export/home/ethan/qdsk/c9t1d0p0  DEGRADED     4     0    60  too many errors

 Ick.  Note that there are device errors as well as content (checksum)
 errors, which means it can't just be correctly-copied damage from
 your original pool that was having problems.

 zpool clear and rescrub, for starters, and see if they continue.


Doing that now.



 I suggest also:
  - carefully checking and reseating cables, etc
  - taking backups now of anything you really wanted out of the pool,
   while it's still available.
  - choosing that disk as the first to replace, and scrubbing again
   after replacing onto it, perhaps twice.
  - doing a dd to overwrite that entire disk with random data and let
   it remap bad sectors, before the replace (not just zeros, and not
   just the sectors a zfs resilver would hit. openssl enc of /dev/zero
   with a lightweight cipher and whatever key; for extra caution read
   back and compare with a second openssl stream using the same key)
  - being generally very watchful and suspicious of that disk in
   particular, look at error logs for clues, etc.


Very thorough. I have no idea how to do that with openssl, but I will look
into learning this.


  - being very happy that zfs deals so well with all this abuse, and
   you know your data is ok.


Yes indeed - very happy.



  I have no idea what happened to the one disk, but No known data errors
 is
  what makes me happy. I'm not sure if I should be concerned about the
  physical disk itself

 given that it's reported disk errors as well as damaged content, yes.


Okay. Well, it's a brand-new disk and I can exchange it easily enough.



  or just assume that some data got screwed up with all
  this mess. I guess maybe I'll see how the disk behaves during the replace
  operations (restoring to it and then restoring from it four times seems
 like
  a pretty good test of it), and if it continues to error, replace the
  physical drive and if necessary restore from the original truecrypt
 volumes.

 Good plan; note the extra scrubs at key points in the process above.


Will do. Thanks for the tip.



  So, current plan:
  - export the pool.

 shouldn't be needed; zpool offline dev would be enough

  - format c9t1d0 to have one slice being the entire disk.

 Might not have been needed, but given Victor's comments about reserved
 space, you may need to do this manually, yes.  Be sure to use EFI
 labels.  Pick the suspect disk first.

  - import. should be degraded, missing c9t1d0p0.

 no need if you didn't export

  - replace missing c9t1d0p0 with c9t1d0

 yup, or if you've manually partitioned you may need to mention the
 slice number to prevent it repartitioning with the default reserved
 space again. You may even need to use some other slice (s5 or
 whatever), but I don't think so.

  - wait for resilver.
  - repeat with the other four disks.

  - tell us how it went
  - drink beer.

 --
 Dan.


Okay. Plan is updated to reflect your suggestions. Beer was already in the
plan, but I forgot to list it. Speaking of which, I see your e-mail address
is .au, but if you're ever in new york city I'd love to buy you a beer as
thanks for all your excellent help with this. And anybody else in this
thread - you guys are awesome.

-Ethan


Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Daniel Carosone
On Wed, Feb 17, 2010 at 12:31:27AM -0500, Ethan wrote:
 And I just realized - yes, labels 2 and 3 are in the wrong place relative to
 the end of the drive; I did not take into account the overhead taken up by
 truecrypt when dd'ing the data. The raw drive is 1500301910016 bytes; the
 truecrypt volume is 1500301647872 bytes. Off by 262144 bytes - I need a
 slice that is sized like the truecrypt volume.

It shouldn't matter if the slice is larger than the original; this is
how autoexpand works.   2 should be near the start (with 1), 3 should
be near the logical end (with 4).

Did this resolve the issue? You didn't say, and I have my doubts. I'm
not sure this is your problem, but it seems you're on the track to
finding the real problem.  

In the labels you can see, are the txg's the same for all pool
members?  If not, you may still need import -F, once all the
partitioning gets sorted out.
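
Something like this should show it, assuming the p0 devices are the right
ones to read labels from (untested; adjust the names to suit):

for d in /dev/rdsk/c9t*d0p0; do echo "== $d"; zdb -l $d | grep txg; done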

Also, re-reading what I wrote above, I realised I was being ambiguous
in my use of label.  Sometimes I meant the zfs labels that zdb -l
prints, and sometimes I meant the vtoc that format uses for slices. In
the BSD world we call those labels too, and I didn't realise I was
mixing terms.  Sorry for any confusion but it seems you figured out
what I meant :) 

--
Dan.



Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Ethan
On Wed, Feb 17, 2010 at 15:22, Daniel Carosone d...@geek.com.au wrote:

 On Wed, Feb 17, 2010 at 12:31:27AM -0500, Ethan wrote:
  And I just realized - yes, labels 2 and 3 are in the wrong place relative
 to
  the end of the drive; I did not take into account the overhead taken up
 by
  truecrypt when dd'ing the data. The raw drive is 1500301910016 bytes; the
  truecrypt volume is 1500301647872 bytes. Off by 262144 bytes - I need a
  slice that is sized like the truecrypt volume.

 It shouldn't matter if the slice is larger than the original; this is
 how autoexpand works.   2 should be near the start (with 1), 3 should
 be near the logical end (with 4).

 Did this resolve the issue? You didn't say, and I have my doubts. I'm
 not sure this is your problem, but it seems you're on the track to
 finding the real problem.

 In the labels you can see, are the txg's the same for all pool
 members?  If not, you may still need import -F, once all the
 partitioning gets sorted out.

 Also, re-reading what I wrote above, I realised I was being ambiguous
 in my use of label.  Sometimes I meant the zfs labels that zdb -l
 prints, and sometimes I meant the vtoc that format uses for slices. In
 the BSD world we call those labels too, and I didn't realise I was
 mixing terms.  Sorry for any confusion but it seems you figured out
 what I meant :)

 --
 Dan.


I have not yet successfully imported. I can see two ways of making progress
forward. One is forcing zpool to attempt to import using slice 2 for each
disk rather than slice 8. If this is how autoexpand works, as you say, it
seems like it should work fine for this. But I don't know how, or if it is
possible to, make it use slice 2.

The other way is to make a slice that is the correct size of the volumes as
I had them before (262144 bytes less than the size of the disk). It seems
like this should cause zpool to prefer to use this slice over slice 8, as it
can find all 4 labels, rather than just labels 0 and 1. I don't know how to
go about this either, or if it's possible. I have been starting to read
documentation on slices in solaris but haven't had time to get far enough to
figure out what I need.

I also have my doubts about this solving my actual issues - the ones that
caused me to be unable to import in zfs-fuse. But I need to solve this issue
before I can move forward to figuring out/solving whatever that issue was.

txg is the same for every volume.

-Ethan


Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Daniel Carosone
On Wed, Feb 17, 2010 at 03:37:59PM -0500, Ethan wrote:
 On Wed, Feb 17, 2010 at 15:22, Daniel Carosone d...@geek.com.au wrote:
 I have not yet successfully imported. I can see two ways of making progress
 forward. One is forcing zpool to attempt to import using slice 2 for each
 disk rather than slice 8. If this is how autoexpand works, as you say, it
 seems like it should work fine for this. But I don't know how, or if it is
 possible to, make it use slice 2.

Just get rid of 8? :-)

Normally, when using the whole disk, convention is that slice 0 is
used, and there's a small initial offset (for the EFI label).  I think
you probably want to make a slice 0 that spans the right disk sectors.

Were you using some other partitioning inside the truecrypt disks?
What devices were given to zfs-fuse, and what was their starting
offset? You may need to account for that, too.  How did you copy the
data, and to what target device, on what platform?  Perhaps the
truecrypt device's partition table is now at the start of the physical
disk, but solaris can't read it properly? If that's an MBR partition
table (which you look at with fdisk), you could try zdb -l on
/dev/dsk/c...p[01234] as well. 
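
For example, something along these lines (untested, and the device names are
only placeholders) would show how many of the four labels each device node
exposes:

for d in /dev/rdsk/c9t1d0p0 /dev/rdsk/c9t1d0p1 /dev/rdsk/c9t1d0s2; do
    echo "$d: $(zdb -l $d 2>/dev/null | grep -c version) of 4 labels readable"
done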

We're just guessing here.. to provide more concrete help, you'll need
to show us some of the specifics, both of what you did and what you've
ended up with. fdisk and format partition tables and zdb -l output
would be a good start. 

Figuring out what is different about the disk where s2 was used would
be handy too.  That may be a synthetic label because something is
missing from that disk that the others have.

 The other way is to make a slice that is the correct size of the volumes as
 I had them before (262144 bytes less than the size of the disk). It seems
 like this should cause zpool to prefer to use this slice over slice 8, as it
 can find all 4 labels, rather than just labels 0 and 1. I don't know how to
 go about this either, or if it's possible. I have been starting to read
 documentation on slices in solaris but haven't had time to get far enough to
 figure out what I need.

format will let you examine and edit these.  Start by making sure they
have all the same partitioning, flags, etc.

 I also have my doubts about this solving my actual issues - the ones that
 caused me to be unable to import in zfs-fuse. But I need to solve this issue
 before I can move forward to figuring out/solving whatever that issue was.

Yeah - my suspicion is that import -F may help here.  That is a pool
recovery mode, where it progressively rolls back transactions until it
finds one that validates correctly.  It was only added recently and is
probably missing from the fuse version. 

--
Dan.





Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Ethan
On Wed, Feb 17, 2010 at 16:14, Daniel Carosone d...@geek.com.au wrote:

 On Wed, Feb 17, 2010 at 03:37:59PM -0500, Ethan wrote:
  On Wed, Feb 17, 2010 at 15:22, Daniel Carosone d...@geek.com.au wrote:
  I have not yet successfully imported. I can see two ways of making
 progress
  forward. One is forcing zpool to attempt to import using slice 2 for each
  disk rather than slice 8. If this is how autoexpand works, as you say, it
  seems like it should work fine for this. But I don't know how, or if it
 is
  possible to, make it use slice 2.

 Just get rid of 8? :-)


That sounds like an excellent idea, but, being very new to opensolaris, I
have no idea how to do this. I'm reading through
http://multiboot.solaris-x86.org/iv/3.html at the moment. You mention the
'format' utility below, which I will read more into.



 Normally, when using the whole disk, convention is that slice 0 is
 used, and there's a small initial offset (for the EFI label).  I think
 you probably want to make a slice 0 that spans the right disk sectors.

 Were you using some other partitioning inside the truecrypt disks?
 What devices were given to zfs-fuse, and what was their starting
 offset? You may need to account for that, too.  How did you copy the
 data, and to what target device, on what platform?  Perhaps the
 truecrypt device's partition table is now at the start of the physical
 disk, but solaris can't read it properly? If that's an MBR partition
 table (which you look at with fdisk), you could try zdb -l on
 /dev/dsk/c...p[01234] as well.


There was no partitioning on the truecrypt disks. The truecrypt volumes
occupied the whole raw disks (1500301910016 bytes each). The devices that I
gave to the zpool on linux were the whole raw devices that truecrypt exposed
(1500301647872 bytes each). There were no partition tables on either the raw
disks or the truecrypt volumes, just truecrypt headers on the raw disk and
zfs on the truecrypt volumes.
I copied the data simply using

dd if=/dev/mapper/truecrypt1 of=/dev/sdb

on linux, where /dev/mapper/truecrypt1 is the truecrypt volume for one hard
disk (which was on /dev/sda) and /dev/sdb is a new blank drive of the same
size as the old drive (but slightly larger than the truecrypt volume). And
repeat likewise for each of the five drives.

The labels 2 and 3 should be on the drives, but they are 262144 bytes
further from the end of slice 2 than zpool must be looking.

I could create a partition table on each drive, specifying a partition with
the size of the truecrypt volume, and re-copy the data onto this partition
(would have to re-copy as creating the partition table would overwrite zfs
data, as zfs starts at byte 0). Would this be preferable? I was under some
impression that zpool devices were preferred to be raw drives, not
partitions, but I don't recall where I came to believe that much less
whether it's at all correct.




 We're just guessing here.. to provide more concrete help, you'll need
 to show us some of the specifics, both of what you did and what you've
 ended up with. fdisk and format partition tables and zdb -l output
 would be a good start.

 Figuring out what is different about the disk where s2 was used would
 be handy too.  That may be a synthetic label because something is
 missing from that disk that the others have.

  The other way is to make a slice that is the correct size of the volumes
 as
  I had them before (262144 bytes less than the size of the disk). It seems
  like this should cause zpool to prefer to use this slice over slice 8, as
 it
  can find all 4 labels, rather than just labels 0 and 1. I don't know how
 to
  go about this either, or if it's possible. I have been starting to read
  documentation on slices in solaris but haven't had time to get far enough
 to
  figure out what I need.

 format will let you examine and edit these.  Start by making sure they
 have all the same partitioning, flags, etc.


I will have a look at format, but if this operates on partition tables,
well, my disks have none at the moment so I'll have to remedy that.



  I also have my doubts about this solving my actual issues - the ones that
  caused me to be unable to import in zfs-fuse. But I need to solve this
 issue
  before I can move forward to figuring out/solving whatever that issue
 was.

 Yeah - my suspicion is that import -F may help here.  That is a pool
 recovery mode, where it rolls back progressive transactions until it
 finds one that validates correctly.  It was only added recently and is
 probably missing from the fuse version.

 --
 Dan.


as for using import -F, I am on snv_111b, which I am not sure has -F for
import. I tried to update to the latest dev build (using the instructions
at http://pkg.opensolaris.org/dev/en/index.shtml ) but things are behaving
very strangely. I get error messages on boot - gconf-sanity-check-2 exited
with error status 256, and when I dismiss this and go into gnome, terminal
is messed up and doesn't echo anything I type, and I can't ssh in (error
message about not able to allocate a TTY). anyway, zfs mailing list isn't
really the place to be discussing that I suppose.

Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Ethan
On Wed, Feb 17, 2010 at 16:25, Daniel Carosone d...@geek.com.au wrote:

 On Thu, Feb 18, 2010 at 08:14:03AM +1100, Daniel Carosone wrote:
  I think
  you probably want to make a slice 0 that spans the right disk sectors.
 [..]
  you could try zdb -l on /dev/dsk/c...p[01234] as well.

 Depending on how and what you copied, you may have zfs data that start
 at sector 0, with no space for any partitioning labels at all.   If
 zdb -l /dev/rdsk/c..p0 shows a full set, this is what has happened.
 Trying to write partition tables may overwrite some of the zfs labels.

 zfs won't import such a pool by default (it doesn't check those
 devices).  You could cheat, by making a directory with symlinks to the
 p0 devices, and using import -d, but this will not work at boot.  It
 would be a way to verify current state, so you can then plan next
 steps.

 --
 Dan.


It looks like using p0 is exactly what I want, actually. Are s2 and p0 both
the entire disk?
The idea of symlinking to the full-disk devices from a directory and using
-d had crossed my mind, but I wasn't sure about it. I think that is
something worth trying. I'm not too concerned about it not working at boot -
I just want to get something working at all, at the moment.

-Ethan


Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Daniel Carosone
On Wed, Feb 17, 2010 at 04:48:23PM -0500, Ethan wrote:
 It looks like using p0 is exactly what I want, actually. Are s2 and p0 both
 the entire disk?

No. s2 depends on there being a solaris partition table (Sun or EFI),
and if there's also an fdisk partition table (disk shared with other
OS), s2 will only cover the solaris part of the disk.  It also
typically doesn't cover the last 2 cylinders, which solaris calls
reserved for hysterical raisins. 

 The idea of symlinking to the full-disk devices from a directory and using
 -d had crossed my mind, but I wasn't sure about it. I think that is
 something worth trying. 

Note, I haven't tried it either..

 I'm not too concerned about it not working at boot -
 I just want to get something working at all, at the moment.

Yup.

--
Dan.



Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Daniel Carosone
On Wed, Feb 17, 2010 at 04:44:19PM -0500, Ethan wrote:
 There was no partitioning on the truecrypt disks. The truecrypt volumes
 occupied the whole raw disks (1500301910016 bytes each). The devices that I
 gave to the zpool on linux were the whole raw devices that truecrypt exposed
 (1500301647872 bytes each). There were no partition tables on either the raw
 disks or the truecrypt volumes, just truecrypt headers on the raw disk and
 zfs on the truecrypt volumes.
 I copied the data simply using
 
 dd if=/dev/mapper/truecrypt1 of=/dev/sdb

Ok, then as you noted, you want to start with the ..p0 device, as the 
equivalent. 

 The labels 2 and 3 should be on the drives, but they are 262144 bytes
 further from the end of slice 2 than zpool must be looking.

I don't think so.. They're found by counting from the start; the end
can move out further (LUN expansion), and with autoexpand the vdev can
be extended (adding metaslabs) and the labels will be rewritten at the
new end after the last metaslab.  

I think the issue is that there are no partitions on the devices that allow
import to read that far.  Fooling it into using p0 would work around this.

 I could create a partition table on each drive, specifying a partition with
 the size of the truecrypt volume, and re-copy the data onto this partition
 (would have to re-copy as creating the partition table would overwrite zfs
 data, as zfs starts at byte 0). Would this be preferable?

Eventually, probably, yes - once you've confirmed all the speculation,
gotten past the partitioning issue to whatever other damage is in the
pool, resolved that, and have some kind of access to your data.

There are other options as well, including using replace
one at a time, or send|recv.
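
For the send|recv route, a sketch - this assumes a second pool with enough
room, and the pool/snapshot names here are made up:

zfs snapshot -r q@move
zfs send -R q@move | zfs recv -d newpool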

 I was under some
 impression that zpool devices were preferred to be raw drives, not
 partitions, but I don't recall where I came to believe that much less
 whether it's at all correct.

Sort of. zfs commands can be handed bare disks; internally they put
EFI labels on them automatically (though evidently not the fuse
variants).

ZFS mostly makes partitioning go away (hooray), but it still becomes
important in cases like this - shared disks and migration between
operating systems.

 as for using import -F, I am on snv_111b, which I am not sure has -F for
 import. 

Nope.

 I tried to update to the latest dev build (using the instructions
 at http://pkg.opensolaris.org/dev/en/index.shtml ) but things are behaving
 very strangely. I get error messages on boot - gconf-sanity-check-2 exited
 with error status 256, and when I dismiss this and go into gnome, terminal
 is messed up and doesn't echo anything I type, and I can't ssh in (error
 message about not able to allocate a TTY). anyway, zfs mailing list isn't
 really the place to be discussing that I suppose.

Not really, but read the release notes.

Alternately, if this is a new machine, you could just reinstall (or
boot from) a current livecd/usb, download from genunix.org

--
Dan.




Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Ethan
On Wed, Feb 17, 2010 at 17:44, Daniel Carosone d...@geek.com.au wrote:

 On Wed, Feb 17, 2010 at 04:48:23PM -0500, Ethan wrote:
  It looks like using p0 is exactly what I want, actually. Are s2 and p0
 both
  the entire disk?

 No. s2 depends on there being a solaris partition table (Sun or EFI),
 and if there's also an fdisk partition table (disk shared with other
 OS), s2 will only cover the solaris part of the disk.  It also
 typically doesn't cover the last 2 cylinders, which solaris calls
 reserved for hysterical raisins.

  The idea of symlinking to the full-disk devices from a directory and
 using
  -d had crossed my mind, but I wasn't sure about it. I think that is
  something worth trying.

 Note, I haven't tried it either..

  I'm not too concerned about it not working at boot -
  I just want to get something working at all, at the moment.

 Yup.

 --
 Dan.



Success!

I made a directory and symlinked p0's for all the disks:

et...@save:~/qdsk# ls -al
total 13
drwxr-xr-x   2 root  root    7 Feb 17 23:06 .
drwxr-xr-x  21 ethan staff  31 Feb 17 14:16 ..
lrwxrwxrwx   1 root  root   17 Feb 17 23:06 c9t0d0p0 -> /dev/dsk/c9t0d0p0
lrwxrwxrwx   1 root  root   17 Feb 17 23:06 c9t1d0p0 -> /dev/dsk/c9t1d0p0
lrwxrwxrwx   1 root  root   17 Feb 17 23:06 c9t2d0p0 -> /dev/dsk/c9t2d0p0
lrwxrwxrwx   1 root  root   17 Feb 17 23:06 c9t4d0p0 -> /dev/dsk/c9t4d0p0
lrwxrwxrwx   1 root  root   17 Feb 17 23:06 c9t5d0p0 -> /dev/dsk/c9t5d0p0


I attempt to import using -d:


et...@save:~/qdsk# zpool import -d .
  pool: q
id: 5055543090570728034
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
config:

q ONLINE
  raidz1  ONLINE
/export/home/ethan/qdsk/c9t4d0p0  ONLINE
/export/home/ethan/qdsk/c9t5d0p0  ONLINE
/export/home/ethan/qdsk/c9t2d0p0  ONLINE
/export/home/ethan/qdsk/c9t1d0p0  ONLINE
/export/home/ethan/qdsk/c9t0d0p0  ONLINE


The pool is not imported. This does look promising though. I attempt to
import using the name:


et...@save:~/qdsk# zpool import -d . q


it sits there for a while. I worry that it's going to hang forever like it
did on linux.
but then it returns!


et...@save:~/qdsk# zpool status
  pool: q
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: scrub in progress for 0h2m, 0.43% done, 8h57m to go
config:

NAME                                    STATE     READ WRITE CKSUM
q                                       ONLINE       0     0     0
  raidz1                                ONLINE       0     0     0
    /export/home/ethan/qdsk/c9t4d0p0    ONLINE       0     0     0
    /export/home/ethan/qdsk/c9t5d0p0    ONLINE       0     0     0
    /export/home/ethan/qdsk/c9t2d0p0    ONLINE       0     0     0
    /export/home/ethan/qdsk/c9t1d0p0    ONLINE       0     0     0
    /export/home/ethan/qdsk/c9t0d0p0    ONLINE       0     0     0

errors: No known data errors


All the filesystems are there, all the files are there. Life is good.
Thank you all so much.

-Ethan


Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Daniel Carosone
On Wed, Feb 17, 2010 at 06:15:25PM -0500, Ethan wrote:
 Success!

Awesome.  Let that scrub finish before celebrating completely, but
this looks like a good place to stop and consider what you want for an
end state. 

--
Dan.




Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Ethan
On Wed, Feb 17, 2010 at 18:24, Daniel Carosone d...@geek.com.au wrote:

 On Wed, Feb 17, 2010 at 06:15:25PM -0500, Ethan wrote:
  Success!

 Awesome.  Let that scrub finish before celebrating completely, but
 this looks like a good place to stop and consider what you want for an
 end state.

 --
 Dan.


True. Thinking about where to end up - I will be staying on opensolaris
despite having no truecrypt. My paranoia likes having encryption, but it's
not really necessary for me, and it looks like encryption will be coming to
zfs itself soon enough. So, no need to consider getting things working on
zfs-fuse again.

I should have a partition table, for one thing, I suppose. The partition
table is EFI GUID Partition Table, looking at the relevant documentation.
So, I'll need to somehow shift my zfs data down by 17408 bytes (34 512-byte
LBA's, the size of the GPT's stuff at the beginning of the disk) - perhaps
just by copying from the truecrypt volumes as I did before, but with offset
of 17408 bytes. Then I should be able to use format to make the correct
partition information, and use the s0 partition for each drive as seems to
be the standard way of doing things. Or maybe I can format (write the GPT)
first, then get linux to recognize the GPT, and copy from truecrypt into the
partition.
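
Concretely, for the re-copy I'm picturing something like this back on the
linux box (device names are placeholders; 34 sectors x 512 bytes = 17408):

dd if=/dev/mapper/truecrypt1 of=/dev/sdX seek=34

using dd's default 512-byte blocks, so seek=34 skips exactly the space that
the GPT header and partition entries occupy at the start of the disk.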

Does that sound correct / sensible? Am I missing or mistaking anything?

Thanks,
-Ethan

PS: scrub in progress for 4h4m, 65.43% done, 2h9m to go - no errors yet.
Looking good.


Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Bob Friesenhahn

On Wed, 17 Feb 2010, Ethan wrote:


I should have a partition table, for one thing, I suppose. The partition table 
is EFI GUID Partition
Table, looking at the relevant documentation. So, I'll need to somehow shift my 
zfs data down by 17408
bytes (34 512-byte LBA's, the size of the GPT's stuff at the beginning of the 
disk) - perhaps just by

Does that sound correct / sensible? Am I missing or mistaking anything? 


It seems to me that you could also use the approach of 'zpool replace' 
for each device in turn until all of the devices are re-written to 
normal Solaris/zfs defaults.  This would also allow you to expand the
partition size a bit for a larger pool.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Ethan
On Wed, Feb 17, 2010 at 23:21, Bob Friesenhahn bfrie...@simple.dallas.tx.us
 wrote:

 On Wed, 17 Feb 2010, Ethan wrote:


 I should have a partition table, for one thing, I suppose. The partition
 table is EFI GUID Partition
 Table, looking at the relevant documentation. So, I'll need to somehow
 shift my zfs data down by 17408
 bytes (34 512-byte LBA's, the size of the GPT's stuff at the beginning of
 the disk) - perhaps just by

 Does that sound correct / sensible? Am I missing or mistaking anything?


 It seems to me that you could also use the approach of 'zpool replace' for
 each device in turn until all of the devices are re-written to normal
 Solaris/zfs defaults.  This would also allow you to expand the partition
 size a bit for a larger pool.

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


That is true. It seems like it would then have to rebuild from parity for every
drive, though, which I think would take rather a long while, wouldn't it?
I could put in a new drive to overwrite. Then the replace command would just
copy from the old drive rather than rebuilding from parity (I think? that
seems like the sensible thing for it to do, anyway.) But I don't have a
spare drive for this - I have the original drives that still contain the
truecrypt volumes, but I am disinclined to start overwriting these, this
episode having given me a healthy paranoia about having good backups.
I guess this question just comes down to weighing whether rebuilding each
from parity or re-copying from the truecrypt volumes to a different offset
is more of a hassle.

-Ethan


Re: [zfs-discuss] Help with corrupted pool

2010-02-17 Thread Damon Atkins
Create a new empty pool on the Solaris system and let it format the disks, etc.,
i.e. use the disk names cXtXd0. This should put the EFI label on the disks and
then set up the partitions for you. Just in case, here is an example.

Go back to the Linux box and see if you can use tools to see the same
partition layout; if you can, then dd it to the correct spot, which in Solaris
is c5t2d0s0. (zfs send | zfs recv would be easier.)

-bash-4.0$ pfexec fdisk -R -W - /dev/rdsk/c5t2d0p0

* /dev/rdsk/c5t2d0p0 default fdisk table
* Dimensions:
*    512 bytes/sector
*    126 sectors/track
*    255 tracks/cylinder
*  60800 cylinders
*
* systid:
*1: DOSOS12
*  238: EFI_PMBR
*  239: EFI_FS
*
* IdAct  Bhead  Bsect  BcylEhead  Esect  EcylRsect  Numsect
  238   025563 102325563 10231  1953525167
  0 00  0  0   0  0  0   0  0
  0 00  0  0   0  0  0   0  0
  0 00  0  0   0  0  0   0  0


-bash-4.0$ pfexec prtvtoc /dev/rdsk/c5t2d0
* /dev/rdsk/c5t2d0 partition map
*
* Dimensions:
* 512 bytes/sector
* 1953525168 sectors
* 1953525101 accessible sectors
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector     Count    Sector
*          34        222       255
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      4    00         256  1953508495  1953508750
       8     11    00  1953508751       16384  1953525134
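
For what it's worth, a minimal sketch of that sequence (pool and device names
are only examples, and the image path is a placeholder for wherever the copy
of the truecrypt volume lives; the scratch pool exists purely so that Solaris
writes the EFI label and slices for you):

-bash-4.0$ pfexec zpool create scratch c5t2d0    # Solaris lays down the EFI label; s0 starts at sector 256, as in the prtvtoc output above
-bash-4.0$ pfexec zpool destroy scratch          # the label and slices stay on the disk
-bash-4.0$ pfexec dd if=/path/to/copied-image of=/dev/rdsk/c5t2d0s0 bs=1048576   # placeholder path

The dd could equally be run from the Linux box against the corresponding
partition device, as long as s0 is at least as large as the copied image.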


[zfs-discuss] Help with corrupted pool

2010-02-16 Thread Ethan
This is the current state of my pool:

et...@save:~# zpool import
  pool: q
    id: 5055543090570728034
 state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        q             UNAVAIL  insufficient replicas
          raidz1      UNAVAIL  insufficient replicas
            c9t4d0s8  UNAVAIL  corrupted data
            c9t5d0s2  ONLINE
            c9t2d0s8  UNAVAIL  corrupted data
            c9t1d0s8  UNAVAIL  corrupted data
            c9t0d0s8  UNAVAIL  corrupted data


back story:
I was previously running and using this pool on linux using zfs-fuse.
One day the zfs-fuse daemon behaved strangely. zpool and zfs commands gave a
message about not being able to connect to the daemon. The filesystems for
the pool q were still available and seemed to be working correctly. I
started the zfs-fuse daemon again. I'm not sure if this meant that there
were two daemons running, since the filesystem was still available but I
couldn't get any response from the zfs or zpool commands. I then decided
instead just to reboot.
After rebooting, the pool appeared to import successfully, but `zfs list`
showed no filesystems.
I rebooted again, not really having any better ideas. After that, `zpool
import` just hung forever.
I decided I should get off of the fuse/linux implementation and use a more
recent version of zfs in its native environment, so I installed
opensolaris.
I had been running the pool on truecrypt encrypted volumes, so I copied them
off of the encrypted volumes onto blank volumes, and put them on
opensolaris. I got the above when I tried to import.
Now I have no idea where to go from here.

It doesn't seem like my data should be just gone - there is no problem with
the physical drives. It seems unlikely that a misbehaving zfs-fuse would
completely corrupt the data of 4 out of 5 drives (or so I am hoping).

Is there any hope for my data? I have some not-very-recent backups of some
fraction of it, but if recovering this is possible that would of course be
infinitely preferable.

-Ethan


Re: [zfs-discuss] Help with corrupted pool

2010-02-16 Thread Daniel Carosone
On Tue, Feb 16, 2010 at 10:06:13PM -0500, Ethan wrote:
 This is the current state of my pool:
 
 et...@save:~# zpool import
   pool: q
 id: 5055543090570728034
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
see: http://www.sun.com/msg/ZFS-8000-5E
 config:
 
 q UNAVAIL  insufficient replicas
   raidz1  UNAVAIL  insufficient replicas
 c9t4d0s8  UNAVAIL  corrupted data
 c9t5d0s2  ONLINE
 c9t2d0s8  UNAVAIL  corrupted data
 c9t1d0s8  UNAVAIL  corrupted data
 c9t0d0s8  UNAVAIL  corrupted data
 
 
 back story:
 I was previously running and using this pool on linux using zfs-fuse.

Two things to try (rough command sketch below):

 - import -F (with -n, first time) on a recent build
 - zdb -l dev for each of the devs above, compare and/or post.  This
   helps ensure that you copied correctly, with respect to all the various
   translations, labelling, partitioning etc differences between the
   platforms.  Since you apparently got at least one right, hopefully this
   is less of an issue if you did the same for all.
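
A rough sketch of both, using the pool name and device paths from earlier in
the thread (substitute your own; the import rewind options need a build recent
enough to have them):

# zpool import -Fn q           # dry run: report what a recovery import would do
# zpool import -F q            # actually attempt the recovery import
# zdb -l /dev/dsk/c9t4d0s8     # repeat for each device/slice; all four labels should unpack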

--
Dan.





Re: [zfs-discuss] Help with corrupted pool

2010-02-16 Thread Daniel Carosone
On Wed, Feb 17, 2010 at 02:30:28PM +1100, Daniel Carosone wrote:
  c9t4d0s8  UNAVAIL  corrupted data
  c9t5d0s2  ONLINE
  c9t2d0s8  UNAVAIL  corrupted data
  c9t1d0s8  UNAVAIL  corrupted data
  c9t0d0s8  UNAVAIL  corrupted data

  - zdb -l dev for each of the devs above, compare and/or post.  this
helps ensure that you copied correctly, with respect to all the various
translations, labelling, partitioning etc differences between the
    platforms.  Since you apparently got at least one right, hopefully this
    is less of an issue if you did the same for all. 

Actually, looking again, is there any significance to the fact that s2
on one disk is ok, and s8 on the others are not?  Perhaps start with
the zdb -l, and make sure you're pointing at the right data before
importing. 

--
Dan.





Re: [zfs-discuss] Help with corrupted pool

2010-02-16 Thread Ethan
On Tue, Feb 16, 2010 at 22:35, Daniel Carosone d...@geek.com.au wrote:

 On Wed, Feb 17, 2010 at 02:30:28PM +1100, Daniel Carosone wrote:
   c9t4d0s8  UNAVAIL  corrupted data
   c9t5d0s2  ONLINE
   c9t2d0s8  UNAVAIL  corrupted data
   c9t1d0s8  UNAVAIL  corrupted data
   c9t0d0s8  UNAVAIL  corrupted data

   - zdb -l dev for each of the devs above, compare and/or post.  this
 helps ensure that you copied correctly, with respect to all the
 various
 translations, labelling, partitioning etc differences between the
  platforms.  Since you apparently got at least one right, hopefully this
  is less of an issue if you did the same for all.

  Actually, looking again, is there any significance to the fact that s2
 on one disk is ok, and s8 on the others are not?  Perhaps start with
 the zdb -l, and make sure you're pointing at the right data before
 importing.

 --
 Dan.


I do not know if there is any significance to the okay disk being s2 and the
others s8 - in fact I do not know what the numbers mean at all, being out of
my element in opensolaris (but trying to learn as much as I can, as quickly
as I can).
As for the copying, all that I did was `dd if=the truecrypt volume of=the
new disk` for each of the five disks.
Output of zdb -l looks identical for each of the five volumes, apart from
guid. I have pasted one of them below.
Thanks for your help.

-Ethan


et...@save:/dev# zdb -l dsk/c9t2d0s8

LABEL 0

version=13
name='q'
state=1
txg=361805
pool_guid=5055543090570728034
hostid=8323328
hostname='that'
top_guid=441634638335554713
guid=13840197833631786818
vdev_tree
type='raidz'
id=0
guid=441634638335554713
nparity=1
metaslab_array=23
metaslab_shift=32
ashift=9
asize=7501483868160
is_log=0
children[0]
type='disk'
id=0
guid=459016284133602
path='/dev/mapper/truecrypt3'
whole_disk=0
children[1]
type='disk'
id=1
guid=12502103998258102871
path='/dev/mapper/truecrypt2'
whole_disk=0
children[2]
type='disk'
id=2
guid=13840197833631786818
path='/dev/mapper/truecrypt1'
whole_disk=0
children[3]
type='disk'
id=3
guid=3763020893739678459
path='/dev/mapper/truecrypt5'
whole_disk=0
children[4]
type='disk'
id=4
guid=4929061713231157616
path='/dev/mapper/truecrypt4'
whole_disk=0

LABEL 1

version=13
name='q'
state=1
txg=361805
pool_guid=5055543090570728034
hostid=8323328
hostname='that'
top_guid=441634638335554713
guid=13840197833631786818
vdev_tree
type='raidz'
id=0
guid=441634638335554713
nparity=1
metaslab_array=23
metaslab_shift=32
ashift=9
asize=7501483868160
is_log=0
children[0]
type='disk'
id=0
guid=459016284133602
path='/dev/mapper/truecrypt3'
whole_disk=0
children[1]
type='disk'
id=1
guid=12502103998258102871
path='/dev/mapper/truecrypt2'
whole_disk=0
children[2]
type='disk'
id=2
guid=13840197833631786818
path='/dev/mapper/truecrypt1'
whole_disk=0
children[3]
type='disk'
id=3
guid=3763020893739678459
path='/dev/mapper/truecrypt5'
whole_disk=0
children[4]
type='disk'
id=4
guid=4929061713231157616
path='/dev/mapper/truecrypt4'
whole_disk=0

LABEL 2

failed to unpack label 2

LABEL 3

failed to unpack label 3


Re: [zfs-discuss] Help with corrupted pool

2010-02-16 Thread Richard Elling
On Feb 16, 2010, at 7:57 PM, Ethan wrote:
 On Tue, Feb 16, 2010 at 22:35, Daniel Carosone d...@geek.com.au wrote:
 On Wed, Feb 17, 2010 at 02:30:28PM +1100, Daniel Carosone wrote:
   c9t4d0s8  UNAVAIL  corrupted data
   c9t5d0s2  ONLINE
   c9t2d0s8  UNAVAIL  corrupted data
   c9t1d0s8  UNAVAIL  corrupted data
   c9t0d0s8  UNAVAIL  corrupted data

slice 8 tends to be tiny and slice 2 is the whole disk, which is why
you can't find label 2 or 3, which are at the end of the disk.

Try exporting the pool and then import.
 -- richard

 

Re: [zfs-discuss] Help with corrupted pool

2010-02-16 Thread Ethan
On Tue, Feb 16, 2010 at 23:24, Richard Elling richard.ell...@gmail.comwrote:

 On Feb 16, 2010, at 7:57 PM, Ethan wrote:
  On Tue, Feb 16, 2010 at 22:35, Daniel Carosone d...@geek.com.au wrote:
  On Wed, Feb 17, 2010 at 02:30:28PM +1100, Daniel Carosone wrote:
c9t4d0s8  UNAVAIL  corrupted data
c9t5d0s2  ONLINE
c9t2d0s8  UNAVAIL  corrupted data
c9t1d0s8  UNAVAIL  corrupted data
c9t0d0s8  UNAVAIL  corrupted data

 slice 8 tends to be tiny and slice 2 is the whole disk, which is why
 you can't find label 2 or 3, which are at the end of the disk.

 Try exporting the pool and then import.
  -- richard


The pool never imports successfully, so I can't export it. The import just
gives the output I pasted, and the pool is not imported.
If slice 2 is the whole disk, why is zpool trying to use slice 8 for all
but one disk? Can I explicitly tell zpool to use slice 2 for each device?

-Ethan


Re: [zfs-discuss] Help with corrupted pool

2010-02-16 Thread Daniel Carosone
On Tue, Feb 16, 2010 at 11:39:39PM -0500, Ethan wrote:
 If slice 2 is the whole disk, why is zpool trying to use slice 8 for all
 but one disk? 

Because it's finding at least part of the labels for the pool member there.

Please check the partition tables of all the disks, and use zdb -l on
the various partitions, to make sure that you haven't got funny
offsets or other problems hiding the data from import. 

In a default solaris label, s2 and s8 start at cylinder 0 but are
vastly different sizes.  You need to arrange for your labels to match
however the data you copied got laid out.

 Can I explicitly tell zpool to use slice 2 for each device?

Not for import, only at creation time.  On import, devices are chosen
by inspection of the zfs labels within.  zdb -l will print those for
you; when you can see all 4 labels for all devices your import has a
much better chance of success.
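
Concretely, those checks might look like this, per disk (c9t0d0 is just an
example; substitute each of the five devices):

# prtvtoc /dev/rdsk/c9t0d0s2    # where does each slice start, and how big is it?
# zdb -l /dev/dsk/c9t0d0s2      # are all four labels readable on this slice?
# zdb -l /dev/dsk/c9t0d0s8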

--
Dan.




Re: [zfs-discuss] Help with corrupted pool

2010-02-16 Thread Ethan
On Tue, Feb 16, 2010 at 23:57, Daniel Carosone d...@geek.com.au wrote:

 On Tue, Feb 16, 2010 at 11:39:39PM -0500, Ethan wrote:
  If slice 2 is the whole disk, why is zpool trying to use slice 8 for
 all
  but one disk?

 Because it's finding at least part of the labels for the pool member there.

 Please check the partition tables of all the disks, and use zdb -l on
 the various partitions, to make sure that you haven't got funny
 offsets or other problems hiding the data from import.

 In a default solaris label, s2 and s8 start at cylinder 0 but are
 vastly different sizes.  You need to arrange for your labels to match
 however the data you copied got laid out.

  Can I explicitly tell zpool to use slice 2 for each device?

 Not for import, only at creation time.  On import, devices are chosen
 by inspection of the zfs labels within.  zdb -l will print those for
 you; when you can see all 4 labels for all devices your import has a
 much better chance of success.

 --
 Dan.


How would I go about arranging labels?
I only see labels 0 and 1 (and do not see labels 2 and 3) on every device,
for both slice 8 (which makes sense if 8 is just part of the drive; the zfs
devices take up the whole drive) and slice 2 (which doesn't seem to make
sense to me).

Since only two of the four labels are showing up for each of the drives on
both slice 2 and slice 8, I guess that causes zpool to not have a preference
between slice 2 and slice 8? So it just picks whichever it sees first, which
happened to be slice 2 for one of the drives, but 8 for the others? (I am
really just guessing at this.)

So, on one hand, the fact that it says slice 2 is online for one drive makes
me think that if I could get it to use slice 2 for the rest maybe it would
work.
On the other hand, the fact that I can't see labels 2 and 3 on slice 2 for
any drive (even the one that says it's online) is worrisome and I want to
figure out what's up with that.

Labels 2 and 3 _do_ show up (and look right) in zdb -l running in zfs-fuse
on linux, on the truecrypt volumes.

If it might just be a matter of arranging the labels so that the beginning
and end of a slice are in the right place, that sounds promising, although I
have no idea how I go about arranging labels. Could you point me in the
direction of what utility I might use or some documentation to get me
started in that direction?

Thanks,
-Ethan


Re: [zfs-discuss] Help with corrupted pool

2010-02-16 Thread Ethan
On Wed, Feb 17, 2010 at 00:27, Ethan notet...@gmail.com wrote:

 So, on one hand, the fact that it says slice 2 is online for one drive
 makes me think that if I could get it to use slice 2 for the rest maybe it
 would work.
 On the other hand, the fact that I can't see labels 2 and 3 on slice 2 for
 any drive (even the one that says it's online) is worrisome and I want to
 figure out what's up with that.

 Labels 2 and 3 _do_ show up (and look right) in zdb -l running in zfs-fuse
 on linux, on the truecrypt volumes.


And I just realized - yes, labels 2 and 3 are in the wrong place relative to
the end of the drive; I did not take into account the overhead taken up by
truecrypt when dd'ing the data. The raw drive is 1500301910016 bytes; the
truecrypt volume is 1500301647872 bytes. Off by 262144 bytes - I need a
slice that is sized like the truecrypt volume.
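
As a sanity check on that difference (512-byte sectors assumed throughout):

$ echo $(( 1500301910016 - 1500301647872 ))
262144
$ echo $(( 1500301647872 / 512 ))
2930276656

So a matching slice needs to span 2930276656 sectors, which is 512 sectors
(256 KiB) fewer than the raw disk.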