Re: [zfs-discuss] cannot replace X with Y: devices have different sector alignment

2012-11-12 Thread Trond Michelsen
On Sat, Nov 10, 2012 at 5:00 PM, Tim Cook t...@cook.ms wrote:
> On Sat, Nov 10, 2012 at 9:48 AM, Jan Owoc jso...@gmail.com wrote:
>> On Sat, Nov 10, 2012 at 8:14 AM, Trond Michelsen tron...@gmail.com wrote:
>>> How can I replace the drive without migrating all the data to a
>>> different pool? It is possible, I hope?
>> I had the same problem. I tried copying the partition layout and some
>> other stuff but without success. I ended up having to recreate the
>> pool and now have a non-mirrored root fs.
>> If anyone has figured out how to mirror drives after getting the
>> message about sector alignment, please let the list know :-).
> Not happening with anything that exists today.  The only way this would be
> possible is with bp_rewrite which would allow you to evacuate a vdev
> (whether it be for a situation like this, or just to shrink a pool).  What
> you're trying to do is write a block for block copy to a disk that's made up
> of a different block structure.  Not happening.

That is disappointing. I'll probably manage to find a used 2TB drive
with 512b blocksize, so I'm sure I'll be able to keep the pool alive,
but I had planned to swap all 2TB drives for 4TB drives within a year
or so. This is apparently not an option anymore. I'm also a bit
annoyed, because I cannot remember seeing any warnings (other than
performance-related ones) about mixing 512b and 4kB blocksize discs in a pool,
or any warnings that you'll be severely restricted if you use 512b
blocksize discs at all.

> *insert everyone saying they want bp_rewrite and the guys who have the
> skills to do so saying their enterprise customers have other needs*

bp_rewrite is what's needed to remove vdevs, right? If so, yes, being
able to remove (or replace) a vdev would've solved my problem.
However, I don't see why this wouldn't be desirable for enterprise
customers. 512b blocksize discs are rapidly disappearing from the
market. Enterprise discs fail occasionally too, and if 512b blocksize
discs can't be replaced by 4kB blocksize discs, then that effectively
means that you can't replace failed drives on ZFS. I would think that
being able to do so is a desirable feature of an enterprise storage solution.

-- 
Trond Michelsen


Re: [zfs-discuss] cannot replace X with Y: devices have different sector alignment

2012-11-12 Thread Trond Michelsen
On Sat, Nov 10, 2012 at 5:04 PM, Tim Cook t...@cook.ms wrote:
> On Sat, Nov 10, 2012 at 9:59 AM, Jan Owoc jso...@gmail.com wrote:
>> Apparently the currently-suggested way (at least in OpenIndiana) is to:
>> 1) create a zpool on the 4k-native drive
>> 2) zfs send | zfs receive the data
>> 3) mirror back onto the non-4k drive
>>
>> I can't test it at the moment on my setup - has anyone tested this to
>> work?
> That would absolutely work, but it's not really a fix for this situation.
> For OP to do this he'd need 42 new drives (or at least enough drives to
> provide the same capacity as what he's using) to mirror to and then mirror
> back.  The only way this is happening for most people is if they only have a
> very small pool, and have the ability to add an equal amount of storage to
> dump to.  Probably not a big deal if you've only got a handful of drives, or
> if the drives you have are small and you can take downtime.  Likely
> impossible for OP with 42 large drives.
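
For anyone following along, the workflow Jan describes would look roughly
like this (just a sketch with made-up device names; I haven't actually run
these exact commands):

# zpool create newpool c9t0d0                            (new pool on the 4k-native drive)
# zfs snapshot -r tank@migrate
# zfs send -R tank@migrate | zfs receive -F -d newpool
# zpool attach newpool c9t0d0 c9t1d0                     (mirror back onto the old 512b drive)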

Well, if I have to migrate, I basically have three alternatives:

1. safe way:
  a) buy 24 4TB drives
  b) migrate everything

2. scary way:
  a) buy 6 4TB drives
  b) migrate about 12TB of data to the new pool
  c) split all mirror vdevs on the old pool, add the freed 4k discs to the new pool
  d) migrate the remaining data to the new pool while holding my breath
  e) destroy the old pool and reattach its discs to vdevs in the new pool

3. slightly less scary way:
  a) buy 23 3TB drives
  b) set up a new pool with 4x mirrored vdevs and 15x non-redundant vdevs
  c) migrate everything from the old pool
  d) detach the 3TB discs from mirrors in the old pool and attach them to
     vdevs in the new pool (roughly the detach/attach sequence sketched below)
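
The detach/attach steps in options 2 and 3 would be something like this,
per disk (a sketch only, with made-up device names):

# zpool detach tank c0t1d0               (free one half of a mirror in the old pool)
# zpool attach newpool c9t5d0 c0t1d0     (attach it to a single-disk vdev in the new pool)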

I've got room for the first method, but it'll be prohibitively
expensive, even if I sell the old drives. Until 4TB drives drop below
$100, this won't be a realistic option. I don't think I've got the
nerves to do it the scary way :) The third option is a lot cheaper
than the first, but it'll still be a solid chunk of money, so I'll
probably have to think about that for a bit.


That said, I've already migrated far too many times. I really,
really don't want to migrate the pool again if it can be avoided.
I've already migrated from raidz1 to raidz2, and then from raidz2 to
mirror vdevs. Then, even though I already had a mix of 512b and 4k
discs in the pool, when I bought new 3TB discs, I couldn't add them to
the pool, and I had to set up a new pool with ashift=12. In
retrospect, I should have built the new pool without the 2TB drives,
and had I known then what I know now, I would definitely have done that.

-- 
Trond Michelsen


Re: [zfs-discuss] cannot replace X with Y: devices have different sector alignment

2012-11-12 Thread Trond Michelsen
On Sat, Nov 10, 2012 at 6:19 PM, Jim Klimov jimkli...@cos.ru wrote:
> On 2012-11-10 17:16, Jan Owoc wrote:
>> Any other ideas short of block pointer rewrite?
> A few... one is an idea of what could be the cause: AFAIK the
> ashift value is not so much per-pool as per-toplevel-vdev.
> If the pool started as a set of the 512b drives and was then
> expanded to include sets of 4K drives, this mixed ashift could
> happen...

Now I'm really confused. Turns out, my system is the opposite:

# zdb -C tank | grep ashift
ashift: 12
ashift: 12
ashift: 12
ashift: 12
ashift: 12
ashift: 12
ashift: 9
ashift: 9
ashift: 9
ashift: 9
ashift: 9
ashift: 9
ashift: 9
ashift: 9
ashift: 9
ashift: 12
ashift: 12
ashift: 12
ashift: 12
ashift: 12
ashift: 12

I had an old pool with ashift=9, and when I tried to add new disks,
zpool wouldn't let me. So I ended up creating a new pool with
ashift=12, and after migrating, destroyed the old pool and added the
old drives to the new one. I was told at the time that as long
as the pool is created with ashift=12, new vdevs would have ashift=12
as well. Obviously, that's not the case. I did verify that ashift was
12 after creating the pool, but I apparently did not check after
adding the old drives, because this is the first time I've noticed
that there's any ashift=9 in the pool.
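
To see which actual vdevs got which ashift, something like this should
work (the exact output format depends on the zdb version):

# zdb -C tank | egrep 'type|ashift|path'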

-- 
Trond Michelsen


Re: [zfs-discuss] cannot replace X with Y: devices have different sector alignment

2012-11-10 Thread Trond Michelsen
On Tue, Sep 25, 2012 at 6:42 PM, LIC mesh licm...@gmail.com wrote:
> The new drive I bought correctly identifies as 4096 byte blocksize!
> So...OI doesn't like it merging with the existing pool.

So... Any solution to this yet?

I've got a 42-drive zpool (21 mirror vdevs) with 12 2TB drives that
have 512-byte blocksize. The remaining drives are 3TB with 4k blocksize,
and the pool uses ashift=12. Recently this happened to one of the 2TB
drives:

mirror-13                  DEGRADED     0     0     0
  c4t5000C5002AA2F8D6d0    UNAVAIL      0     0     0  cannot open
  c4t5000C5002AB4FF17d0    ONLINE       0     0     0

and even though it came back after a reboot, I'd like to swap it for a
new drive. Obviously, all new drives have 4k blocksize, so I decided
to replace both drives in the vdev with 3TB drives. The new drives are
Seagate ST3000DM001-1CH1, and there are already 12 of these in the
pool.
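
The plan was roughly this (the pairing of old and new disks below is just
for illustration):

# zpool replace tank c4t5000C5002AA2F8D6d0 c4t5000C5004DE863F2d0
# zpool replace tank c4t5000C5002AB4FF17d0 c4t5000C5004DE1EFF2d0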

# iostat -En
...
c4t5000C5004DE1EFF2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3000DM001-1CH1 Revision: CC43 Serial No: Z1F0TKXV
Size: 3000.59GB 3000592982016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c4t5000C5004DE863F2d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3000DM001-1CH1 Revision: CC43 Serial No: Z1F0VHTG
Size: 3000.59GB 3000592982016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

c4t5000C5004DD3F76Bd0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: ST3000DM001-1CH1 Revision: CC43 Serial No: Z1F0T1QX
Size: 3000.59GB 3000592982016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0


When I try to replace the old drive, I get this error:

# zpool replace tank c4t5000C5002AA2F8D6d0 c4t5000C5004DE863F2d0
cannot replace c4t5000C5002AA2F8D6d0 with c4t5000C5004DE863F2d0:
devices have different sector alignment


How can I replace the drive without migrating all the data to a
different pool? It is possible, I hope?


-- 
Trond Michelsen


Re: [zfs-discuss] Finding disks [was: # disks per vdev]

2011-07-05 Thread Trond Michelsen
On Tue, Jul 5, 2011 at 12:54 PM, Lanky Doodle lanky_doo...@hotmail.com wrote:
> OK, I have finally settled on hardware;
> 2x LSI SAS3081E-R controllers

Beware that this controller does not support drives larger than 2TB.

-- 
Trond Michelsen


Re: [zfs-discuss] RaidzN blocksize ... or blocksize in general ... and resilver

2010-10-20 Thread Trond Michelsen
On Wed, Oct 20, 2010 at 2:50 PM, Edward Ned Harvey sh...@nedharvey.com wrote:
> One of the above mentioned disks needed to be resilvered yesterday.
> (Actually a 2T disk.)  It has now resilvered 1.12T in 18.5 hrs, and has 10.5
> hrs remaining.  This is a mirror.  The problem would be several times worse
> if it were a raidz.

Is this one of those Advanced Format drives (Western Digital EARS or
Samsung F4) that emulate 512-byte sectors? Or is that only a
problem with raidz anyway?

-- 
Trond Michelsen


[zfs-discuss] migration / vdev balancing

2010-10-19 Thread Trond Michelsen
Hi.

I have a pool with 3 raidz1 vdevs (5*1.5TB + 5*1.5TB + 5*1TB), and I
want to create 6-disk raidz2 vdevs instead. I've bought 12 2TB drives,
and I already have additional 1.5TB and 1TB drives. My cabinet can
only hold 24 drives (connected to an LSI SAS controller and a
Supermicro SAS backplane), so the idea is to get rid of the 1TB drives
when I'm done with the migration. During the migration, I'll connect 6
drives to internal SATA ports.

Anyway - what is the best way to migrate the data in this system? I'm
assuming that upgrading a raidz1 vdev to raidz2 is not possible, and
that I have to create a new pool, zfs send all the datasets, and
destroy the old pool. Is that correct?
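
If so, I assume the migration itself would be something along these lines
(a sketch only - the pool and disk names are made up):

# zpool create newpool raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
                       raidz2 c2t6d0 c2t7d0 c3t0d0 c3t1d0 c3t2d0 c3t3d0
# zfs snapshot -r oldpool@migrate
# zfs send -R oldpool@migrate | zfs receive -F -d newpool
# zpool destroy oldpool
# zpool add newpool raidz2 <six of the 1.5TB drives>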

The current pool holds 16TB, and with 2*6 2TB drives in raidz2 I
should also get 16TB, so it should be possible to just zfs send all
datasets to the new pool, destroy the old one, and add the 1.5TB drives
to the new pool. This should be fairly straightforward, but the way I
understand it, if I do this I'll end up with two completely full
vdevs and two completely empty ones, and I'd prefer the vdevs to be
fairly evenly balanced.

What's the best way to balance the pool? Creating a temporary pool
with the 1TB drives, zfs sending each dataset there, destroying it on
the main pool, and zfs sending it back?
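
Per dataset, that would be something like this (sketch only; pool and
dataset names are made up):

# zfs snapshot mainpool/data@shuffle
# zfs send mainpool/data@shuffle | zfs receive temppool/data
# zfs destroy -r mainpool/data
# zfs send temppool/data@shuffle | zfs receive mainpool/data
# zfs destroy -r temppool/data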

This will copy all data three times (once for the first migration,
then out and back in again), so it'll probably take a few days. But,
if I understand correctly how zfs writes data to the vdevs, they
should at least end up pretty well balanced at the end of the
process.

Are my assumptions correct? Are there any better/faster ways?

-- 
Trond Michelsen