Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-23 Thread Paul Schlie
It also wouldn't be a bad idea for ZFS to verify that drives designated as
hot spares in fact have sufficient capacity to be compatible replacements
for the configurations they back, before they are actually critically required.
Drives that otherwise appear to have equivalent capacity may not, and that
wouldn't be a nice thing to first discover upon attempted replacement of a
failed drive.
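
A minimal manual approximation of that check, assuming EFI-labeled whole disks
(so prtvtoc reports accessible sectors) and purely illustrative device names -
c2d1 as a pool member, c5d0 as the designated spare:

   prtvtoc /dev/rdsk/c2d1s0 | grep 'accessible sectors'   # existing pool member
   prtvtoc /dev/rdsk/c5d0s0 | grep 'accessible sectors'   # hot spare

The spare's accessible-sector count should be greater than or equal to that of
any member it might replace; checking this when the spare is added avoids the
surprise described above.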




Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-23 Thread Blake
+1

On Thu, Jan 22, 2009 at 11:12 PM, Paul Schlie sch...@comcast.net wrote:
 It also wouldn't be a bad idea for ZFS to verify that drives designated as
 hot spares in fact have sufficient capacity to be compatible replacements
 for the configurations they back, before they are actually critically required.
 Drives that otherwise appear to have equivalent capacity may not, and that
 wouldn't be a nice thing to first discover upon attempted replacement of a
 failed drive.




Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-22 Thread Dale Sears
Would this work?  (to get rid of an EFI label).

dd if=/dev/zero of=/dev/dsk/thedisk bs=1024k count=1

Then use

format

format might complain that the disk is not labeled.  You
can then label the disk.

Dale



Antonius wrote:
 can you recommend a walk-through for this process, or a bit more of a 
 description? I'm not quite sure how I'd use that utility to repair the EFI 
 label


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-22 Thread Antonius
Yes, that's exactly what I did. The issue is that I can't get the corrected
label to be written once I've zeroed the drive. I get an error from fdisk that
apparently still sees the backup label.


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-22 Thread Jonathan Edwards
Not quite .. it's 16KB at the front and 8MB at the back of the disk (16384
sectors) for the Solaris EFI label - so you need to zero out both of these.
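
A hedged sketch of that zeroing, assuming 512-byte sectors, the whole-disk p0
device, and the 976,773,167-sector figure that appears later in this thread;
adjust the device name and the seek offset for your own drive:

   # primary label area at the front (16KB = 32 sectors)
   dd if=/dev/zero of=/dev/rdsk/c3d0p0 bs=512 count=32
   # backup label area in the last 8MB (16384 sectors): 976773167 - 16384 = 976756783
   dd if=/dev/zero of=/dev/rdsk/c3d0p0 bs=512 seek=976756783 count=16384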

Of course, since these drives are under 1TB, I find it's easier to format
to SMI (VTOC) .. with format -e (choose SMI, label, save, validate -
then choose EFI).

but to Casper's point - you might want to make sure that fdisk is  
using the whole disk .. you should probably reinitialize the fdisk  
sectors either with the fdisk command or run fdisk from format (delete  
the partition, create a new partition using 100% of the disk, blah,  
blah) ..
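
For the non-interactive route, a minimal sketch (device name illustrative; -B
writes a default table with one Solaris partition spanning the whole disk, and
-W - dumps the resulting table to standard output for inspection):

   fdisk -B /dev/rdsk/c3d0p0
   fdisk -W - /dev/rdsk/c3d0p0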

Finally - glancing at the format output - there appears to be a mix of
labels on these disks, as you've got a mix of c#d# entries and c#t#d#
entries, so I might suspect fdisk might not be consistent across the
various disks here .. also noticed that you dumped the vtoc for c3d0
and c4d0, but you're replacing c2d1 (of unknown size/layout) with c1d1
(never dumped in your emails) .. so while this has been an animated
(slightly trollish) discussion on right-sizing (odd - I've typically
only seen that term as an ONTAPism) with some short-stroking digs ..
it's a little unclear what the c1d1s0 slice looks like here or what
the cylinder count is - I agree it should be the same - but it would
be nice to see from my armchair here.

On Jan 22, 2009, at 3:32 AM, Dale Sears wrote:

 Would this work?  (to get rid of an EFI label).

   dd if=/dev/zero of=/dev/dsk/thedisk bs=1024k count=1

 Then use

   format

 format might complain that the disk is not labeled.  You
 can then label the disk.

 Dale



 Antonius wrote:
 can you recommend a walk-through for this process, or a bit more of  
 a description? I'm not quite sure how I'd use that utility to  
 repair the EFI label


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-21 Thread Antonius
You mentioned one, so what do you recommend as a workaround?
I've tried re-initializing the disks on another system's HW RAID controller, but
I still get the same error.


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-21 Thread Casper . Dik


The user DEFINITELY isn't expecting 500,000,000,000 bytes, or whatever
slightly smaller right-sized figure you meant to say; they're expecting 500GB.
You know, 536,870,912,000 bytes.  But even if the drive mfg's calculated it
correctly, they wouldn't even be getting that due to filesystem overhead.

Then you have a very stupid user who has been living in a cave.

The only reason why we incorrectly label memory is that the systems are
binary.  (Incorrect, because there's one standard and it says that
K, M, G and T are powers of 10.)
The computer cannot efficiently address non-binary-sized memory.

IIRC, some stupid user did indeed sue WD and he won, but that is in
America (I'm sure that the km is 1024 meters in the US)

Since that lawsuit the vendors all make sure that the specification says 
how many addressable sectors are in a disk.

You make the right-sized disk a big issue.  And perhaps it is; however,
ZFS has been out for a number of years and no one complained about it before.
It's not just that it isn't a big priority, it's not even on the list.

File a bug/rfe, if you want this fixed.

Casper



Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-21 Thread Casper . Dik

so you're suggesting I buy 750s to replace the 500s. then if a 750 fails buy 
another bigger drive again?

Have you filed a bug/rfe to fix this in ZFS in future?

Anyway, you only need to change the 750GB drives if:
- all 500GB drives are replaced by 750GB disks
- and they're all bigger than the newest 750GB

the drives are RMA replacements for the other disks that faulted in the
array before. they are the  same brand, model and model number, apparently
not so under the label though, but no way I could tell that before.

That is really weird.

Or is this, perhaps, because you use an EFI label on the disks and
we now label the disks differently?  (I think we now make sure that the
ZFS label starts at a 128K offset; before, it did not.)

Casper



Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-21 Thread Richard Elling
I believe this is an fdisk issue.  But I don't think any
of the fdisk engineers hang out on this forum.

You might try partitioning the disk on another OS.
  -- richard

Antonius wrote:
 I'll attach 2 files of output from 2 disks:
 
 c4d0 is a current member of the zpool that is a sibling (a member of
 the same batch, a couple of serial number increments different) of the faulted
 disk to replace, and is currently running without issue
 
 and c3d0 is a new disk I got back as a replacement for a failed disk
 that's obviously different. It appears the EFI label needs fixing; I just
 can't get it to stick with any combination of commands I've tried.
 
 eg removing and resetting all partitions with fdisk -e
 and trying to recreate with geometry as per the existing pool members even 
 after trying to dd the first section of all partitions:
 
 bash-3.2# fdisk -A 238:0:0:1:0:254:63:1023:1:976773167 c3d0
 fdisk: EFI partitions must encompass the entire disk
 (input numsect: 976773167 - avail: 976760063)
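
(For what it's worth, the shortfall in that error works out to 976,773,167 -
976,760,063 = 13,104 sectors, roughly 6.7 MB at 512 bytes per sector - i.e. the
geometry being forced asks for more sectors than fdisk thinks this particular
drive can make available.)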
 
 
 
 


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-21 Thread Al Tobey
Grab the AOE driver and pull aoelabinit out of the package.  They wrote it
just for forcing EFI or Sun labels onto disks when the normal Solaris tools get
in the way.  Coraid's website looks like it's broken at the moment, so you may
need to find it elsewhere on the web.


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-21 Thread Antonius
can you recommend a walk-through for this process, or a bit more of a 
description? I'm not quite sure how I'd use that utility to repair the EFI label


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-20 Thread Tim
On Mon, Jan 19, 2009 at 5:39 PM, Adam Leventhal a...@eng.sun.com wrote:

  And again, I say take a look at the market today, figure out a
 percentage,
  and call it done.  I don't think you'll find a lot of users crying foul
 over
  losing 1% of their drive space when they don't already cry foul over the
  false advertising that is drive sizes today.

 Perhaps it's quaint, but 5GB still seems like a lot to me to throw away.


That wasn't a hard number, that was a hypothetical number.  On 750GB drives
I'm only seeing them lose in the area of 300-500MB.



  I have two disks in one of my systems... both maxtor 500GB drives,
 purchased
  at the same time shortly after the buyout.  One is a rebadged Seagate,
 one
  is a true, made in China Maxtor.  Different block numbers... same model
  drive, purchased at the same time.
 
  Wasn't zfs supposed to be about using software to make up for
 deficiencies
  in hardware?  It would seem this request is exactly that...

 That's a fair point, and I do encourage you to file an RFE, but a) Sun has
 already solved this problem in a different way as a company with our
 products
 and b) users already have the ability to right-size drives.

 Perhaps a better solution would be to handle the procedure of replacing a
 disk
 with a slightly smaller one by migrating data and then treating the extant
 disks as slightly smaller as well. This would have the advantage of being
 far
 more dynamic and of only applying the space tax in situations where it
 actually
 applies.


A) Should have a big bright * next to it referencing our packaged storage
solutions.  I've got plenty of 72G Sun drives still lying around that
aren't all identical block numbers ;)  Yes, an RMA is great, but when I've
got spares sitting on the shelf and I lose a drive at 4:40pm on a Friday,
I'm going to stick the spare off the shelf in, call Sun, and put the
replacement back on the shelf on Monday.  /horse beaten

B) I think we can both agree that having to pre-slice every disk that goes
into the system is not a viable long-term solution to this issue.

That being said, your conclusion sounds like a perfectly acceptable/good
idea to me for all of the technical people such as those on this list.

Joe User is another story, but much like adding a single drive to a
raid-z(2) vdev, I doubt that's a target market for Sun at this time.


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-20 Thread Moore, Joe
  Ross wrote:
  The problem is they might publish these numbers, but we really have
  no way of controlling what number manufacturers will choose to use
  in the future.

  If for some reason future 500GB drives all turn out to be slightly
  smaller than the current ones you're going to be stuck.  Reserving
  1-2% of space in exchange for greater flexibility in replacing
  drives sounds like a good idea to me.  As others have said, RAID
  controllers have been doing this for long enough that even the very
  basic models do it now, and I don't understand why such simple
  features like this would be left out of ZFS.

It would certainly be terrible to go back to the days where 5% of the filesystem 
space is inaccessible to users, and force the sysadmin to manually change that 
percentage to 0 to get full use of the disk.

Oh wait, UFS still does that, and it's a configurable parameter at mkfs time 
(and can be tuned on the fly)

For a ZFS pool, (until block pointer rewrite capability) this would have to be 
a pool-create-time parameter.  Perhaps a --usable-size=N[%] option which would 
either cut down the size of the EFI slices or fake the disk geometry so the EFI 
label ends early.

Or it would be a small matter of programming to build a perl wrapper for zpool 
create that would accomplish the same thing.
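
A rough sketch of what such a wrapper could look like - in shell rather than
perl, and with the 98% figure, the device naming, and the prtvtoc parsing all
assumptions for illustration rather than a tested tool (it also assumes the
disks already carry an EFI label so that prtvtoc reports accessible sectors):

   #!/bin/sh
   # hypothetical "create a pool on ~98% slices" wrapper -- illustration only
   PCT=98
   POOL="$1"; shift
   DEVS=""
   for d in "$@"; do
       total=`prtvtoc /dev/rdsk/${d}s0 | awk '$3 == "accessible" { print $2 }'`
       want=`expr ${total:-0} / 100 \* $PCT`
       echo "${d}: ${total:-?} accessible sectors; would size slice 0 to ${want} sectors"
       # ... relabel the disk here (format/fmthard) so that s0 spans $want sectors ...
       DEVS="$DEVS ${d}s0"
   done
   echo zpool create "$POOL" $DEVS   # drop the leading echo once the slices exist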

--Joe


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-20 Thread Miles Nordin
 mj == Moore, Joe joe.mo...@siemens.com writes:

mj For a ZFS pool, (until block pointer rewrite capability) this
mj would have to be a pool-create-time parameter. 

naw.  You can just make ZFS do it all the time, like the other storage
vendors do.  no parameters.

You can invent parameter-free ways of turning it off.  For example,

 1. label the disk with an EFI label taking up the whole disk

 2. point ZFS at slice zero instead of the whole disk, like
/dev/dsk/c0t0d0s0 instead of /dev/dsk/c0t0d0

 3. ZFS will then be written to know it's supposed to use the entire
disk instead of writing a new label, but will still behave as
though it owns the disk cache-wise.

-or-

 1. label the disk any way you like

 2. point ZFS at the whole disk, /dev/dsk/c0t0d0.  And make that
whole-disk device name work for all disks no matter what
controller, whether or not they're ``removeable,'' or how they're
labeled, like the equivalent device name does in Linux, FreeBSD,
and Mac OS X.

 3. ZFS should remove your label and write a one-slice EFI label that
doesn't use the entire disk, and rounds down to a
bucketed/quantized/whole-ish number.  If the disk is a replacement
for a component of an existing vdev, the EFI labelsize it picks
will be the *larger* of:

a. The right-size ZFS would have picked if the disk weren't a
   replacement

b. the smallest existing component in the vdev
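
A toy model of that selection rule, with an invented 1 GiB rounding quantum
(2,097,152 sectors of 512 bytes) - this only illustrates the policy described
above, and is not anything ZFS actually implements:

   # round a sector count down to whole 1 GiB buckets
   quantize() { expr "$1" / 2097152 \* 2097152; }
   # pick the label size: the larger of the quantized right-size for the new
   # disk ($1) and the smallest existing component in the vdev ($2)
   pick_label_size() {
       q=`quantize "$1"`
       if [ "$q" -gt "$2" ]; then echo "$q"; else echo "$2"; fi
   }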

Most people will not even notice the feature exists except by getting
errors less often.  AIUI this is how it works with other RAID layers,
the cheap and expensive alike among ``hardware'' RAID, and this
common-practice is very ZFS-ish.  except hardware RAID is proprietary
so you cannot determine their exact policy, while in ZFS you would be
able to RTFS and figure it out.

But there is still no need for parameters.  There isn't even a need to
explain the feature to the user.

I guess this has by now become a case of the silly unimportant but
easy-to-understand feature dominating the mailing list because it's so
obvious that everyone's qualified to pipe up with his opinion, so
maybe I'm a bit late and should have let it die.




Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-20 Thread Moore, Joe
 Miles Nordin wrote:
  mj == Moore, Joe joe.mo...@siemens.com writes:
 
 mj For a ZFS pool, (until block pointer rewrite capability) this
 mj would have to be a pool-create-time parameter. 
 
 naw.  You can just make ZFS do it all the time, like the other storage
 vendors do.  no parameters.

Other storage vendors have specific compatibility requirements for the disks 
you are allowed to install in their chassis.

On the other hand, OpenSolaris is intended to work on commodity hardware.

And there is no way to change this after the pool has been created, since after 
that time, the disk size can't be changed.  So whatever policy is used by 
default, it is very important to get it right.


(snip)
 
 Most people will not even notice the feature exists except by getting
 errors less often.  AIUI this is how it works with other RAID layers,
 the cheap and expensive alike among ``hardware'' RAID, and this
 common-practice is very ZFS-ish.  except hardware RAID is proprietary
 so you cannot determine their exact policy, while in ZFS you would be
 able to RTFS and figure it out.

Sysadmins should not be required to RTFS.  Behaviors should be documented in 
other places too.

 
 But there is still no need for parameters.  There isn't even a need to
 explain the feature to the user.

There isn't a need to explain the feature to the user?  That's one of the most
irresponsible responses I've heard lately.  A user is expecting their 500GB
disk to be 500,000,000,000 bytes, not some slightly smaller right-sized number
of bytes, unless that feature is explained.

Parameters with reasonable defaults (and a reasonable way to change them) allow 
users who care about the parameter and understand the tradeoffs involved in 
changing from the default to make their system work better.

If I didn't want to be able to tune my system for performance, I would be 
running Windows.  OpenSolaris is about transparency, not just Open Source.

--Joe


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-20 Thread Richard Elling
[I hate to keep dragging this thread forward, but...]

Moore, Joe wrote:
 And there is no way to change this after the pool has been created,
 since after that time, the disk size can't be changed.  So whatever
 policy is used by default, it is very important to get it right.

Today, vdev size can be grown, but not shrunk, on the fly, without
causing any copying of data.  If you need to shrink today, you need
to copy the data.  This is also true of many, but not all, file systems.
  -- richard


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-20 Thread Miles Nordin
 jm == Moore, Joe joe.mo...@siemens.com writes:

jm Sysadmins should not be required to RTFS.

I never said they were.  The comparison was between hardware RAID and
ZFS, not between two ZFS alternatives.  The point: other systems'
behavior is entirely secret.  Therefore, secret, opaque, undiscussed
right-sizing is the baseline.  The industry-wide baseline is not
guaranteeing to use the whole disk no matter what, nor is it building
a flag-ridden partitioning tool with bikeshed HOWTO documentation into
zpool full of multi-paragraph Windows ExPee-style CYA ``are you SURE
you want to use the whole disk, because blah bla blahblah blha
blaaagh'' modal dialog box warnings.

This overdiscussion feels like the way X.509 and IPsec grow and grow,
accomodating every feature dreamed up by people who don't have to
implement or live with the result because each feature is so important
that some day it'd be disastrous not to have it.

jm There isn't a need to explain the feature to the user?  That's
jm one of the most irresponsible responses I've heard lately.

It's fine if you disagree, but the disastrous tone makes no sense.
Other filesystems and RAID layers consume similar amounts of space for
metadata, labels, bitmaps, whatever.  The suggestion is neither
surprising nor harmful, especially compared to the current behavior.

anyway probably none of it matters because of the IDEMA sizes, and the
rewrite/evacuation feature that will hopefully be done a couple years
from now.




Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-20 Thread Tim
On Tue, Jan 20, 2009 at 2:26 PM, Moore, Joe joe.mo...@siemens.com wrote:


 Other storage vendors have specific compatibility requirements for the
 disks you are allowed to install in their chassis.


And again, the reason for those requirements is 99% about making money, not
a technical one.  If you go back far enough in time, nearly all of them at
some point allowed non-approved disks into the system, or there was firmware
available to flash unsupported drives to make them work.  Heck, if you knew
the right people you could still do that today...




 There isn't a need to explain the feature to the user?  That's one of the
 most irresponsible responses I've heard lately.  A user is expecting their
 500GB disk to be 500,000,000,000 bytes, not some slightly smaller right-sized
 number of bytes, unless that feature is explained.


The user DEFINITELY isn't expecting 500,000,000,000 bytes, or whatever
slightly smaller right-sized figure you meant to say; they're expecting 500GB.
You know, 536,870,912,000 bytes.  But even if the drive mfg's calculated it
correctly, they wouldn't even be getting that due to filesystem overhead.

Funny I haven't seen any posts to the list from you demanding that Sun
release exact specifications for how much overhead is lost to metadata,
snapshots, and filesystem structure...




 Parameters with reasonable defaults (and a reasonable way to change them)
 allow users who care about the parameter and understand the tradeoffs
 involved in changing from the default to make their system work better.

 If I didn't want to be able to tune my system for performance, I would be
 running Windows.  OpenSolaris is about transparency, not just Open Source.


If you fill the disks 100% full, you won't need to worry about performance.
In fact, I would wager if the only space you have left on the device is the
amount you lost to right-sizing, the pool will have already toppled over and
died.

Although I do agree with you, being able to change from the default
behavior, in general, is a good idea.  Agreeing on what that default
behavior should be is probably another issue entirely ;)

I would imagine this could be something set perhaps with a flag in
bootenv.rc (or wherevever deemed appropriate).


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-20 Thread Anton B. Rang
The user DEFINITELY isn't expecting 500,000,000,000 bytes, or whatever
slightly smaller right-sized figure you meant to say; they're expecting 500GB.
You know, 536,870,912,000 bytes.  But even if the drive mfg's calculated it
correctly, they wouldn't even be getting that due to filesystem overhead.

I doubt there are any users left in the world that would expect that -- the 
drive manufacturers have made it clear for the past 20 years that 500 GB = 
500*10^9, not 500*2^30.  Even the OS vendors have finally (for the most part) 
started displaying GB instead of GiB.

And again, the reason for [certified devices] is 99% about making money, not a 
technical one.

Yes and no.  From my experience at three storage vendors, it *is* about making 
money (aren't all corporate decisions supposed to be?) but it's less about 
making money by selling overpriced drives than by not *losing* money by trying 
to support hardware that doesn't quite work.  It's a dirty little secret of the 
drive/controller/array industry (and networking, for that matter) that two 
arbitrary pieces of hardware which are supposed to conform to a standard will 
usually, mostly, work together -- but not always, and when they fail, it's very 
difficult to track down (usually impossible in a customer environment).  By 
limiting which drives, controllers, firmware revisions, etc. are supported, we 
reduce the support burden immensely and are able to ensure that we can actually 
test what a customer is using.

A few specific examples I've seen personally:

* SCSI drives with caches that would corrupt data if the mode pages were set 
wrong.
* SATA adapters which couldn't always complete commands simultaneously on 
multiple channels (leading to timeouts or I/O errors).
* SATA controllers which couldn't quite deal with timing at one edge of the 
spec ... and drives which pushed the timing to that edge under the right 
conditions.
* Drive firmware which silently dropped commands when the queue depth got too 
large.

All of these would 'mostly work', especially in desktop use (few outstanding 
commands, no changes to default parameters, no use of task control messages), 
but would fail in other environments in ways that were almost impossible to 
track down without specialized hardware.

When I was in a software-only RAID company, we did support nearly arbitrary 
hardware -- but we had a compatible list of what we'd tested, and for 
everything else, the users were pretty much on their own. That's OK for home 
users, but for critical data, the greatly increased risk is not worth saving a 
few thousand (or even tens of thousands) dollars.


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-20 Thread Antonius
so you're suggesting I buy 750s to replace the 500s. then if a 750 fails buy 
another bigger drive again?

the drives are RMA replacements for the other disks that faulted in the array 
before. they are the same brand, model and model number, apparently not so 
under the label though, but no way I could tell that before.


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Antonius
yes, it's the same make and model as most of the other disks in the zpool and 
reports the same number of sectors


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Ross
The problem is they might publish these numbers, but we really have no way of 
controlling what number manufacturers will choose to use in the future.

If for some reason future 500GB drives all turn out to be slightly smaller than 
the current ones you're going to be stuck.  Reserving 1-2% of space in exchange 
for greater flexibility in replacing drives sounds like a good idea to me.  As 
others have said, RAID controllers have been doing this for long enough that 
even the very basic models do it now, and I don't understand why such simple 
features like this would be left out of ZFS.

Fair enough, for high end enterprise kit where you want to squeeze every byte 
out of the system (and know you'll be buying Sun drives), you might not want 
this, but it would have been trivial to turn this off for kit like that.  It's 
certainly a lot easier to expand a pool than shrink it!


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Blake
I'm going waaay out on a limb here, as a non-programmer...but...

Since the source is open, maybe community members should organize and
work on some sort of sizing algorithm?  I can certainly imagine Sun
deciding to do this in the future - I can also imagine that it's not
at the top of Sun's priority list (most of the devices they deal with
are their own, and perhaps not subject to the right-sizing issue).  If
it matters to the community, why not, as a community, try to
fix/improve zfs in this way?

Again, I've not even looked at the code for block allocation or
whatever it might be called in this case, so I could be *way* off here
:)

Lastly, Antonius, you can try the zpool trick to get this disk
relabeled, I think.  Try 'zpool create temp_pool [problem_disk]' then
'zpool destroy temp_pool' - this should relabel the disk in question
and set up the defaults that zfs uses.  Can you also run format >
partition > print on one of the existing disks and send the output so
that we can see what the existing disk looks like? (Off-list directly
to me if you prefer).
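
Spelled out, with placeholder pool and device names, that sequence looks
roughly like this:

   zpool create temp_pool c3d0    # ZFS writes its own EFI label to the disk
   zpool destroy temp_pool        # releases the disk; the label stays in place
   format                         # then select an existing pool disk and use
                                  # partition > print to compare the layouts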

cheers,
Blake


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Richard Elling
Ross wrote:
 The problem is they might publish these numbers, but we really have no way of 
 controlling what number manufacturers will choose to use in the future.

 If for some reason future 500GB drives all turn out to be slightly smaller 
 than the current ones you're going to be stuck.  Reserving 1-2% of space in 
 exchange for greater flexibility in replacing drives sounds like a good idea 
 to me.  As others have said, RAID controllers have been doing this for long 
 enough that even the very basic models do it now, and I don't understand why 
 such simple features like this would be left out of ZFS.

   

I have added the following text to the best practices guide:

* When a vdev is replaced, the size of the replacement vdev, measured by
usable sectors, must be the same or greater than the vdev being replaced.
This can be confusing when whole disks are used because different models of
disks may provide a different number of usable sectors. For example, if a
pool was created with a 500 GByte drive and you need to replace it with
another 500 GByte drive, then you may not be able to do so if the drives are
not of the same make, model, and firmware revision. Consider planning ahead
and reserving some space by creating a slice which is smaller than the whole
disk instead of the whole disk.
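
As a hypothetical illustration of that last suggestion (device names and the
amount reserved are placeholders, and note the write-cache caveat raised in
the reply below):

   format -e c3d0                 # partition: size slice 0 a little below the
                                  # full capacity, then label and quit
   zpool create tank raidz c1d1s0 c2d1s0 c3d0s0   # build the pool on slices,
                                                  # not on whole disks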

 Fair enough, for high end enterprise kit where you want to squeeze every byte 
 out of the system (and know you'll be buying Sun drives), you might not want 
 this, but it would have been trivial to turn this off for kit like that.  
 It's certainly a lot easier to expand a pool than shrink it!
   

Actually, enterprise customers do not ever want to squeeze every byte, they
would rather have enough margin to avoid such issues entirely.  This is what
I was referring to earlier in this thread wrt planning.
 -- richard



Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Jim Dunham
Richard,

 Ross wrote:
 The problem is they might publish these numbers, but we really have  
 no way of controlling what number manufacturers will choose to use  
 in the future.

 If for some reason future 500GB drives all turn out to be slightly  
 smaller than the current ones you're going to be stuck.  Reserving  
 1-2% of space in exchange for greater flexibility in replacing  
 drives sounds like a good idea to me.  As others have said, RAID  
 controllers have been doing this for long enough that even the very  
 basic models do it now, and I don't understand why such simple  
 features like this would be left out of ZFS.



 I have added the following text to the best practices guide:

 * When a vdev is replaced, the size of the replacement vdev, measured by
 usable sectors, must be the same or greater than the vdev being replaced.
 This can be confusing when whole disks are used because different models
 of disks may provide a different number of usable sectors. For example,
 if a pool was created with a 500 GByte drive and you need to replace it
 with another 500 GByte drive, then you may not be able to do so if the
 drives are not of the same make, model, and firmware revision. Consider
 planning ahead and reserving some space by creating a slice which is
 smaller than the whole disk instead of the whole disk.

Creating a slice, instead of using the whole disk, will cause ZFS to  
not enable write-caching on the underlying device.
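
If you do take the slice route, the cache can usually be turned back on by
hand from format's expert mode, assuming the drive and driver permit it (a
minimal sketch; whether the setting persists across power cycles depends on
the drive):

   format -e c3d0
   format> cache
   cache> write_cache
   write_cache> enable
   write_cache> display    # should now report that the write cache is enabled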

- Jim



 Fair enough, for high end enterprise kit where you want to squeeze  
 every byte out of the system (and know you'll be buying Sun  
 drives), you might not want this, but it would have been trivial to  
 turn this off for kit like that.  It's certainly a lot easier to  
 expand a pool than shrink it!


 Actually, enterprise customers do not ever want to squeeze every  
 byte, they
 would rather have enough margin to avoid such issues entirely.  This  
 is what
 I was referring to earlier in this thread wrt planning.
 -- richard



Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Adam Leventhal
 Since it's done in software by HDS, NetApp, and EMC, that's complete
 bullshit.  Forcing people to spend 3x the money for a Sun drive that's
 identical to the seagate OEM version is also bullshit and a piss-poor
 answer.

I didn't know that HDS, NetApp, and EMC all allow users to replace their
drives with stuff they've bought at Fry's. Is this still covered by their
service plan or would this only be in an unsupported config?

Thanks.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Richard Elling
Jim Dunham wrote:
 Richard,
   
 Ross wrote:
 
 The problem is they might publish these numbers, but we really have  
 no way of controlling what number manufacturers will choose to use  
 in the future.

 If for some reason future 500GB drives all turn out to be slightly  
 smaller than the current ones you're going to be stuck.  Reserving  
 1-2% of space in exchange for greater flexibility in replacing  
 drives sounds like a good idea to me.  As others have said, RAID  
 controllers have been doing this for long enough that even the very  
 basic models do it now, and I don't understand why such simple  
 features like this would be left out of ZFS.


   
 I have added the following text to the best practices guide:

 * When a vdev is replaced, the size of the replacement vdev, measured by
 usable sectors, must be the same or greater than the vdev being replaced.
 This can be confusing when whole disks are used because different models
 of disks may provide a different number of usable sectors. For example,
 if a pool was created with a 500 GByte drive and you need to replace it
 with another 500 GByte drive, then you may not be able to do so if the
 drives are not of the same make, model, and firmware revision. Consider
 planning ahead and reserving some space by creating a slice which is
 smaller than the whole disk instead of the whole disk.
 

 Creating a slice, instead of using the whole disk, will cause ZFS to  
 not enable write-caching on the underlying device.
   

Correct.  Engineering trade-off.  Since most folks don't read the manual,
or the best practices guide, until after they've hit a problem, it is 
really
just a CYA entry :-(

BTW, I also added a quick link to CR 4852783, reduce pool capacity, which
is the feature which has a good chance of making this point moot.
 -- richard



Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Miles Nordin
 edm == Eric D Mudama edmud...@bounceswoosh.org writes:

   edm If, instead of having ZFS manage these differences, a user
   edm simply created slices that were, say, 98%

if you're willing to manually create slices, you should be able to
manually enable the write cache, too, while you're in there, so I
wouldn't worry about that.  I'd worry a little about the confusion
over this write cache bit in general---where the write cache setting
is stored and when it's enabled and when (if?) it's disabled, if the
rules differ on each type of disk attachment, and if you plug the disk
into Linux will Linux screw up the setting by auto-enabling at boot or
by auto-disabling at shutdown or does Linux use stateless versions
(analagous to sdparm without --save) when it prints that boot-time
message about enabling write caches?  For example weirdness, on iSCSI
I get this, on a disk to which I've let ZFS write a GPT/EFI label:

write_cache display
Write Cache is disabled
write_cache enable
Write cache setting is not changeable

so is that a bug of my iSCSI target, and is there another implicit
write cache inside the iSCSI initiator or not?  The Linux hdparm man
page says:

   -W Disable/enable  the  IDE  drive's write-caching feature (default
  state is undeterminable; manufacturer/model specific).

so is the write_cache 'display' feature in 'format -e' actually
reliable?  Or is it impossible to reliably read this setting on an ATA
drive, and 'format -e' is making stuff up?

With Linux I can get all kinds of crazy caching data from a SATA disk:

r...@node0 ~ # sdparm --page=ca --long /dev/sda
/dev/sda: ATA   WDC WD1000FYPS-0  02.0
Caching (SBC) [PS=0] mode page:
  IC  0  Initiator control
  ABPF0  Abort pre-fetch
  CAP 0  Caching analysis permitted
  DISC0  Discontinuity
  SIZE0  Size (1-CSS valid, 0-NCS valid)
  WCE 1  Write cache enable
  MF  0  Multiplication factor
  RCD 0  Read cache disable
  DRRP0  Demand read retension priority
  WRP 0  Write retension priority
  DPTL0  Disable pre-fetch transfer length
  MIPF0  Minimum pre-fetch
  MAPF0  Maximum pre-fetch
  MAPFC   0  Maximum pre-fetch ceiling
  FSW 0  Force sequential write
  LBCSS   0  Logical block cache segment size
  DRA 0  Disable read ahead
  NV_DIS  0  Non-volatile cache disable
  NCS 0  Number of cache segments
  CSS 0  Cache segment size

but what's actually coming from the drive, and what's fabricated by
the SCSI-to-SATA translator built into Garzik's libata?  Because I
think Solaris has such a translator, too, if it's attaching sd to SATA
disks.  I'm guessing it's all a fantasy because:

r...@node0 ~ # sdparm --clear=WCE /dev/sda
/dev/sda: ATA   WDC WD1000FYPS-0  02.0
change_mode_page: failed setting page: Caching (SBC)

but neverminding the write cache, I'd be happy saying ``just round
down disk sizes using the labeling tool instead of giving ZFS the
whole disk, if you care,'' IF the following things were true:

 * doing so were written up as a best-practice.  because, I think it's
   a best practice if the rest of the storage industry from EMC to $15
   promise cards is doing it, though maybe it's not important any more
   because of IDEMA.  And right now very few people are likely to have
   done it because of the way they've been guided into the setup process.

 * it were possible to do this label-sizing to bootable mirrors in the
   various traditional/IPS/flar/jumpstart installers

 * there weren't a proliferation of >= 4 labeling tools in Solaris,
   each riddled with assertion bailouts and slightly different
   capabilities.  Linux also has a mess of labeling tools, but they're
   less assertion-riddled, and usually you can pick one and use it for
   everything---you don't have to drag out a different tool for USB
   sticks because they're considered ``removeable.''  Also it's always
   possible to write to the unpartitioned block device with 'dd' on
   Linux (and FreeBSD and Mac OS X), no matter what label is on the
   disk, while Solaris doesn't seem to have an unpartitioned device.
   And finally the Linux formatting tools work by writing to this
   unpartitioned device, not by calling into a rat's nest of ioctl's,
   so they're much easier for me to get along with.

   Part of the attraction of ZFS should be avoiding this messy part of
   Solaris, but we still have to use format/fmthard/fdisk/rmformat, to
   swap label types because ZFS won't, to frob the write cache because
   ZFS's user interface is too simple and does that semi-automatically
   though I'm not sure all the rules it's using, to enumerate the
   installed disks, to determine in which of the several states
   working / connected-but-not-identified / disconnected /
   disconnected-but-refcounted the iSCSI initiator is in.

   And while ZFS will do special things to an UNlabeled disk, I'm not
   

Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Tim
On Mon, Jan 19, 2009 at 11:05 AM, Adam Leventhal a...@eng.sun.com wrote:

  Since it's done in software by HDS, NetApp, and EMC, that's complete
  bullshit.  Forcing people to spend 3x the money for a Sun drive that's
  identical to the seagate OEM version is also bullshit and a piss-poor
  answer.

 I didn't know that HDS, NetApp, and EMC all allow users to replace their
 drives with stuff they've bought at Fry's. Is this still covered by their
 service plan or would this only be in an unsupported config?

 Thanks.

 Adam



So because an enterprise vendor requires you to use their drives in their
array, suddenly zfs can't right-size?  Vendor requirements have absolutely
nothing to do with their right-sizing, and everything to do with them
wanting your money.

Are you telling me zfs is deficient to the point it can't handle basic
right-sizing like a 15$ sata raid adapter?

--Tim


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Julien Gabel
 Creating a slice, instead of using the whole disk, will cause ZFS to
 not enable write-caching on the underlying device.

 Correct.  Engineering trade-off.  Since most folks don't read the manual,
 or the best practices guide, until after they've hit a problem, it is really
 just a CYA entry :-(

It seems this trade-off can now be mitigated, judging by Roch Bourbonnais'
comment on another thread on this list:
- http://mail.opensolaris.org/pipermail/zfs-discuss/2009-January/054587.html

In particular:
 If ZFS owns a disk it will enable the write cache on the drive but I'm
  not positive this has a great performance impact today.  It used to
  but that was before we had a proper NCQ implementation.  Today
  I don't know that it helps much.  That this is because we always
  flush the cache when consistency requires it.

-- 
julien.
http://blog.thilelli.net/


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Adam Leventhal
   Since it's done in software by HDS, NetApp, and EMC, that's complete
   bullshit.  Forcing people to spend 3x the money for a Sun drive that's
   identical to the seagate OEM version is also bullshit and a piss-poor
   answer.
 
  I didn't know that HDS, NetApp, and EMC all allow users to replace their
  drives with stuff they've bought at Fry's. Is this still covered by their
  service plan or would this only be in an unsupported config?
 
 So because an enterprise vendor requires you to use their drives in their
 array, suddenly zfs can't right-size?  Vendor requirements have absolutely
 nothing to do with their right-sizing, and everything to do with them
 wanting your money.

Sorry, I must have missed your point. I thought that you were saying that
HDS, NetApp, and EMC had a different model. Were you merely saying that the
software in those vendors' products operates differently than ZFS?

 Are you telling me zfs is deficient to the point it can't handle basic
 right-sizing like a 15$ sata raid adapter?

How do these $15 sata raid adapters solve the problem? The more details you
could provide, the better, obviously.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Bob Friesenhahn
On Mon, 19 Jan 2009, Adam Leventhal wrote:

 Are you telling me zfs is deficient to the point it can't handle basic
 right-sizing like a 15$ sata raid adapter?

 How do there $15 sata raid adapters solve the problem? The more details you
 could provide the better obviously.

It is really quite simple.  If the disk is resilvered but the new
drive is a bit too small, then the RAID card might tell you that a bit
of data might have been lost in the last sectors, or it may just assume
that you didn't need that data, or maybe a bit of cryptic message text
scrolls off the screen a split second after it has been issued.  Or if 
you try to write at the end of the volume and one of the replacement 
drives is a bit too short, then the RAID card may return a hard read 
or write error.  Most filesystems won't try to use that last bit of 
space anyway since they run real slow when the disk is completely 
full, or their flimsy formatting algorithm always wastes a bit of the 
end of the disk.  Only ZFS is rash enough to use all of the space 
provided to it, and actually expect that the space continues to be 
usable.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Tim
On Mon, Jan 19, 2009 at 12:39 PM, Adam Leventhal a...@eng.sun.com wrote:


 Sorry, I must have missed your point. I thought that you were saying that
 HDS, NetApp, and EMC had a different model. Were you merely saying that the
 software in those vendors' products operates differently than ZFS?


Gosh, was the point that hard to get?  Let me state it a fourth time:  They
all short stroke the disks to avoid the CF that results from all drives not
adhering to a strict sizing standard.




  Are you telling me zfs is deficient to the point it can't handle basic
  right-sizing like a 15$ sata raid adapter?

 How do there $15 sata raid adapters solve the problem? The more details you
 could provide the better obviously.


They short stroke the disk so that when you buy a new 500GB drive that isn't
the exact same number of blocks you aren't screwed.  It's a design choice to
be both sane, and to make the end-user's life easier.  You know, sort of like
you not letting people choose their raid layout...

--Tim


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Tim
On Mon, Jan 19, 2009 at 1:12 PM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:

 On Mon, 19 Jan 2009, Adam Leventhal wrote:


  Are you telling me zfs is deficient to the point it can't handle basic
 right-sizing like a 15$ sata raid adapter?


 How do there $15 sata raid adapters solve the problem? The more details
 you
 could provide the better obviously.


 It is really quite simple.  If the disk is resilvered but the new drive is
 a bit too small, then the RAID card might tell you that a bit of data might
 have lost in the last sectors, or it may just assume that you didn't need
 that data, or maybe a bit of cryptic message text scrolls off the screen a
 split second after it has been issued.  Or if you try to write at the end of
 the volume and one of the replacement drives is a bit too short, then the
 RAID card may return a hard read or write error.  Most filesystems won't try
 to use that last bit of space anyway since they run real slow when the disk
 is completely full, or their flimsy formatting algorithm always wastes a bit
 of the end of the disk.  Only ZFS is rash enough to use all of the space
 provided to it, and actually expect that the space continues to be usable.



It's a horribly *bad thing* to not use the entire disk and right-size it for
sanity's sake.  That's why Sun currently sells arrays that do JUST THAT.

I'd wager fishworks does just that as well.  Why don't you open source that
code and prove me wrong ;)

I'm wondering why they don't come right out with it and say we want to
intentionally make this painful to our end users so that they buy our
packaged products.  It'd be far more honest and productive than this
pissing match.


--Tim


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Adam Leventhal
On Mon, Jan 19, 2009 at 01:35:22PM -0600, Tim wrote:
   Are you telling me zfs is deficient to the point it can't handle basic
   right-sizing like a 15$ sata raid adapter?
 
  How do there $15 sata raid adapters solve the problem? The more details you
  could provide the better obviously.
 
 They short stroke the disk so that when you buy a new 500GB drive that isn't
 the exact same number of blocks you aren't screwed.  It's a design choice to
 be both sane, and to make the end-users life easier.  You know, sort of like
 you not letting people choose their raid layout...

Drive vendors, it would seem, have an incentive to make their 500GB drives
as small as possible. Should ZFS then choose some amount of padding at the
end of each device and chop it off as insurance against a slightly smaller
drive? How much of the device should it chop off? Conversely, should users
have the option to use the full extent of the drives they've paid for, say,
if they're using a vendor that already provides that guarantee?

 You know, sort of like you not letting people choose their raid layout...

Yes, I'm not saying it shouldn't be done. I'm asking what the right answer
might be.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Richard Elling
Tim wrote:
 On Mon, Jan 19, 2009 at 1:12 PM, Bob Friesenhahn 
 bfrie...@simple.dallas.tx.us wrote:
 
 On Mon, 19 Jan 2009, Adam Leventhal wrote:
 
 
 Are you telling me zfs is deficient to the point it can't
 handle basic
 right-sizing like a 15$ sata raid adapter?
 
 
 How do there $15 sata raid adapters solve the problem? The more
 details you
 could provide the better obviously.

Note that for the LSI RAID controllers Sun uses on many products,
if you take a disk that was JBOD and tell the controller to make
it RAIDed, then the controller will relabel the disk for you and
will cause you to lose the data.  As best I can tell, ZFS is better
in that it will protect your data rather than just relabeling and
clobbering your data.  AFAIK, NVidia and others do likewise.

 It is really quite simple.  If the disk is resilvered but the new
 drive is a bit too small, then the RAID card might tell you that a
 bit of data might have lost in the last sectors, or it may just
 assume that you didn't need that data, or maybe a bit of cryptic
 message text scrolls off the screen a split second after it has been
 issued.  Or if you try to write at the end of the volume and one of
 the replacement drives is a bit too short, then the RAID card may
 return a hard read or write error.  Most filesystems won't try to
 use that last bit of space anyway since they run real slow when the
 disk is completely full, or their flimsy formatting algorithm always
 wastes a bit of the end of the disk.  Only ZFS is rash enough to use
 all of the space provided to it, and actually expect that the space
 continues to be usable.
 
 
 
 It's a horribly *bad thing* to not use the entire disk and right-size it 
 for sanity's sake.  That's why Sun currently sells arrays that do JUST 
 THAT.  

??

 I'd wager fishworks does just that as well.  Why don't you open source 
 that code and prove me wrong ;)

I don't think so, because fishworks is an engineering team and I
don't think I can reserve space on a person... at least not legally
where I live :-)

But this is not a problem for the Sun Storage 7000 systems because
the supported disks are already right-sized.

 I'm wondering why they don't come right out with it and say we want to 
 intentionally make this painful to our end users so that they buy our 
 packaged products.  It'd be far more honest and productive than this 
 pissing match.

I think that if there is enough real desire for this feature,
then someone would file an RFE on http://bugs.opensolaris.org
It would help to attach diffs to the bug and it would help to
reach a consensus on the amount of space to be reserved prior
to filing.  This is not an intractable problem and easy workarounds
already exist, but if ease of use is more valuable than squeezing
every last block, then the RFE should fly.
  -- richard


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Tim
On Mon, Jan 19, 2009 at 2:55 PM, Adam Leventhal a...@eng.sun.com wrote:

 Drive vendors, it would seem, have an incentive to make their 500GB
 drives
 as small as possible. Should ZFS then choose some amount of padding at the
 end of each device and chop it off as insurance against a slightly smaller
 drive? How much of the device should it chop off? Conversely, should users
 have the option to use the full extent of the drives they've paid for, say,
 if they're using a vendor that already provides that guarantee?


Drive vendors, it would seem, have incentive to make their 500GB drives as
cheap as possible.  The two are not necessarily one and the same.

And again, I say take a look at the market today, figure out a percentage,
and call it done.  I don't think you'll find a lot of users crying foul over
losing 1% of their drive space when they don't already cry foul over the
false advertising that is drive sizes today.

In any case, you might as well can ZFS entirely because it's not really fair
that users are losing disk space to raid and metadata... see where this
argument is going?

I really, REALLY doubt you're going to have users screaming at you for
losing 1% (or whatever the figure ends up being) to a right-sizing
algorithm.  In fact, I would bet the average user will NEVER notice if you
don't tell them ahead of time.  Sort of like the average user had absolutely
no clue that 500GB drives were of slightly differing block numbers, and he'd
end up screwed six months down the road if he couldn't source an identical
drive.

I have two disks in one of my systems... both maxtor 500GB drives, purchased
at the same time shortly after the buyout.  One is a rebadged Seagate, one
is a true, made in China Maxtor.  Different block numbers... same model
drive, purchased at the same time.

Wasn't zfs supposed to be about using software to make up for deficiencies
in hardware?  It would seem this request is exactly that...





  You know, sort of like you not letting people choose their raid layout...

 Yes, I'm not saying it shouldn't be done. I'm asking what the right answer
 might be.


The *right answer* in simplifying storage is not to manually slice up every
disk you insert into the system to avoid this issue.

The right answer is to right-size by default and give admins the option to skip
it if they really want.  Sort of like I'd argue the right answer on the
7000 is to give users the raid options you do today by default, and allow
them to lay it out themselves from some sort of advanced *at your own risk*
mode, whether that be command line (the best place I'd argue) or something
else.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Adam Leventhal
 And again, I say take a look at the market today, figure out a percentage,
 and call it done.  I don't think you'll find a lot of users crying foul over
 losing 1% of their drive space when they don't already cry foul over the
 false advertising that is drive sizes today.

Perhaps it's quaint, but 5GB still seems like a lot to me to throw away.

 In any case, you might as well can ZFS entirely because it's not really fair
 that users are losing disk space to raid and metadata... see where this
 argument is going?

Well, I see where this _specious_ argument is going.

 I have two disks in one of my systems... both maxtor 500GB drives, purchased
 at the same time shortly after the buyout.  One is a rebadged Seagate, one
 is a true, made in China Maxtor.  Different block numbers... same model
 drive, purchased at the same time.
 
 Wasn't zfs supposed to be about using software to make up for deficiencies
 in hardware?  It would seem this request is exactly that...

That's a fair point, and I do encourage you to file an RFE, but a) as a company,
Sun has already solved this problem in a different way with our products,
and b) users already have the ability to right-size drives.

Perhaps a better solution would be to handle the procedure of replacing a disk
with a slightly smaller one by migrating data and then treating the extant
disks as slightly smaller as well. This would have the advantage of being far
more dynamic and of only applying the space tax in situations where it actually
applies.

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-19 Thread Blake
So the place we are arriving is to push the RFE for shrinkable pools?

Warning the user about the difference in actual drive size, then
offering to shrink the pool to allow a smaller device seems like a
nice solution to this problem.

The ability to shrink pools might be very useful in other situations.
Say I built a server that once did a decent amount of iops using SATA
disks, and now that the workload's iops have greatly increased (busy
database?), I need SAS disks.  If I'd originally bought 500gb SATA
(current sweet spot) disks, I might have a lot of empty space in my
pool.  Shrinking the pool would allow me to migrate to smaller
(capacity) SAS disks with much better seek times, without being forced
to buy 2x as many disks due to the higher cost/gb of SAS.

I think I remember an RFE for shrinkable pools, but can't find it -
can someone post a link if they know where it is?

cheers,
Blake
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread dick hoogendijk
On Sat, 17 Jan 2009 23:18:35 PST
Antonius antoni...@gmail.com wrote:

Maybe the other disk has an EFI label?

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS sxce snv105 ++
+ All that's really worth doing is what we do for others (Lewis Carrol)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Casper . Dik


So you're saying zfs does absolutely no right-sizing?  That sounds like a
bad idea all around...

You can use a bigger disk; NOT a smaller disk.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Antonius
If so, what should I do to remedy that? Just reformat it?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread JZ
meh


- Original Message - 
From: Antonius antoni...@gmail.com
To: zfs-discuss@opensolaris.org
Sent: Sunday, January 18, 2009 6:54 AM
Subject: Re: [zfs-discuss] replace same sized disk fails with too small 
error


 If so what should I do to remedy that? just reformat it?
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Tim
On Sun, Jan 18, 2009 at 5:18 AM, casper@sun.com wrote:



 So you're saying zfs does absolutely no right-sizing?  That sounds like a
 bad idea all around...

 You can use a bigger disk; NOT a smaller disk.

 Casper


Right, which is an absolutely piss-poor design decision and why every major
storage vendor right-sizes drives.  What happens if I have an old Maxtor
drive in my pool whose 500g is just slightly larger than every other mfg's
on the market?  You know, the one who is no longer making their own drives
since being purchased by seagate.  I can't replace the drive anymore?
*GREAT*.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Adam Leventhal
 Right, which is an absolutely piss poor design decision and why  
 every major storage vendor right-sizes drives.  What happens if I  
 have an old maxtor drive in my pool whose 500g is just slightly  
 larger than every other mfg on the market?  You know, the one who is  
 no longer making their own drives since being purchased by seagate.   
 I can't replace the drive anymore?  *GREAT*.


Sun does right-size our drives. Are we talking about replacing a
device bought from Sun with another device bought from Sun? If these
are just drives that fell off the back of some truck, you may not have
that assurance.

Adam

--
Adam Leventhal, Fishworks    http://blogs.sun.com/ahl

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Casper . Dik


Right, which is an absolutely piss poor design decision and why every major
storage vendor right-sizes drives.  What happens if I have an old maxtor
drive in my pool whose 500g is just slightly larger than every other mfg
on the market?  You know, the one who is no longer making their own drives
since being purchased by seagate.  I can't replace the drive anymore?
*GREAT*.

With a larger drive.

Who can replace drives with smaller drives?

What exactly does "right size drives" mean?  They don't use all of the 
disk?

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Bob Friesenhahn
On Sun, 18 Jan 2009, Tim wrote:
 Right, which is an absolutely piss poor design decision and why every major
 storage vendor right-sizes drives.  What happens if I have an old maxtor
 drive in my pool whose 500g is just slightly larger than every other mfg
 on the market?  You know, the one who is no longer making their own drives
 since being purchased by seagate.  I can't replace the drive anymore?
 *GREAT*.

I appreciate that in these times of financial hardship you cannot
afford a 750GB drive to replace the oversized 500GB drive.  Sorry
to hear about your situation.

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Will Murnane
On Sun, Jan 18, 2009 at 16:51, Bob Friesenhahn
bfrie...@simple.dallas.tx.us wrote:
 I appreciate that in these times of financial hardship that you can
 not afford a 750GB drive to replace the oversized 500GB drive.  Sorry
 to hear about your situation.
That's easy to say, but what if there were no larger alternative?
Suppose I have a pool composed of those 1.5TB Seagate disks, and
Hitachi puts out some of the same capacity that are actually
slightly smaller.  A drive fails in my array, I buy a Hitachi disk to
replace it, and it doesn't work.  If I can't get a large enough drive
to replace the missing disk with, it'd be a shame to have to destroy
and recreate the pool on smaller media.

Perhaps this is yet another problem that can be solved with BP
rewrite.  If 'zpool replace' detects that a disk is slightly smaller,
but not so small that it can't hold all the data, it could warn the user
first and then allow the replacement anyway.
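
In other words, something like the following sketch (hypothetical names, not
actual ZFS code; it assumes the pool can report how many bytes are actually
allocated on the outgoing device):

    # Rough sketch of the proposed check -- illustrative only.
    def can_replace(new_disk_bytes, old_disk_bytes, allocated_bytes):
        if new_disk_bytes >= old_disk_bytes:
            return True                  # same size or bigger: always fine
        if new_disk_bytes >= allocated_bytes:
            # Smaller, but still big enough for the data actually in use.
            # Warn, then allow it; BP rewrite would be needed to move any
            # blocks living past the new end of the device.
            print("warning: replacement device is smaller than the original")
            return True
        return False                     # too small to hold the data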

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Tim
On Sun, Jan 18, 2009 at 10:17 AM, casper@sun.com wrote:



 Right, which is an absolutely piss poor design decision and why every
 major
 storage vendor right-sizes drives.  What happens if I have an old maxtor
 drive in my pool whose 500g is just slightly larger than every other mfg
 on the market?  You know, the one who is no longer making their own drives
 since being purchased by seagate.  I can't replace the drive anymore?
 *GREAT*.

 With a larger drive.

 Who can replace drives with smaller drives?

 What exactly does right size drives mean?  They don't use all of the
 disk?

 Casper



Right-sizing is when the volume manager intentionally short-strokes the drive,
because not every vendor's 500GB is the same size.  Hence the
OP's problem.

How aggressive the short-stroking is depends on the OEM.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Bob Friesenhahn
On Sun, 18 Jan 2009, Will Murnane wrote:
 That's easy to say, but what if there were no larger alternative?
 Suppose I have a pool composed of those 1.5TB Seagate disks, and
 Hitachi puts out some of the same capacity that are actually
 slightly smaller.  A drive fails in my array, I buy a Hitachi disk to
 replace it, and it doesn't work.  If I can't get a large enough drive
 to replace the missing disk with, it'd be a shame to have to destroy
 and recreate the pool on smaller media.

What do you propose that OpenSolaris should do about this?  Should 
OpenSolaris use some sort of a table of common size drives, or use 
an algorithm which determines certain discrete usage values based on 
declared drive sizes and a margin for error?  What should OpenSolaris 
of today do with the 20TB disk drives of tomorrow?  What should the 
margin for error of a 30TB disk drive be?  Is it ok to arbitrarily 
ignore 3/4TB of storage space?

If the drive is actually a huge 20TB LUN exported from a SAN RAID 
array, how should the margin for error be handled in that case?

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Tim
On Sun, Jan 18, 2009 at 10:16 AM, Adam Leventhal a...@eng.sun.com wrote:

 Right, which is an absolutely piss poor design decision and why every major
 storage vendor right-sizes drives.  What happens if I have an old maxtor
 drive in my pool whose 500g is just slightly larger than every other mfg
 on the market?  You know, the one who is no longer making their own drives
 since being purchased by seagate.  I can't replace the drive anymore?
  *GREAT*.



 Sun does right size our drives. Are we talking about replacing a device
 bought from sun with another device bought from Sun? If these are just
 drives that fell off the back of some truck, you may not have that
 assurance.

 Adam

 --
 Adam Leventhal, Fishworks    http://blogs.sun.com/ahl



Since it's done in software by HDS, NetApp, and EMC, that's complete
bullshit.  Forcing people to spend 3x the money for a Sun drive that's
identical to the seagate OEM version is also bullshit and a piss-poor
answer.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Tim
On Sun, Jan 18, 2009 at 12:19 PM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:

 On Sun, 18 Jan 2009, Will Murnane wrote:
  That's easy to say, but what if there were no larger alternative?
  Suppose I have a pool composed of those 1.5TB Seagate disks, and
  Hitachi puts out some of the same capacity that are actually
  slightly smaller.  A drive fails in my array, I buy a Hitachi disk to
  replace it, and it doesn't work.  If I can't get a large enough drive
  to replace the missing disk with, it'd be a shame to have to destroy
  and recreate the pool on smaller media.

 What do you propose that OpenSolaris should do about this?  Should
 OpenSolaris use some sort of a table of common size drives, or use
 an algorithm which determines certain discrete usage values based on
 declared drive sizes and a margin for error?  What should OpenSolaris
 of today do with the 20TB disk drives of tomorrow?  What should the
 margin for error of a 30TB disk drive be?  Is it ok to arbitrarily
 ignore 3/4TB of storage space?

 If the drive is actually a huge 20TB LUN exported from a SAN RAID
 array, how should the margin for error be handled in that case?

 Bob
 ==
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Take a look at drives on the market, figure out a percentage, and call it a
day.  If there's a significant issue with 20TB drives of the future, issue
a bug report and a fix, just like every other issue that comes up.


--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Ellis, Mike
Does this all go away when BP-rewrite gets fully resolved/implemented?

Short of the pool being 100% full, shouldn't it allow a rebalancing
operation and a possible LUN/device-size shrink to match the new device
that is being inserted?

Thanks,

 -- MikeE

-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bob
Friesenhahn
Sent: Sunday, January 18, 2009 1:19 PM
To: Will Murnane
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] replace same sized disk fails with too small
error

On Sun, 18 Jan 2009, Will Murnane wrote:
 That's easy to say, but what if there were no larger alternative?
 Suppose I have a pool composed of those 1.5TB Seagate disks, and
 Hitachi puts out some of the same capacity that are actually
 slightly smaller.  A drive fails in my array, I buy a Hitachi disk to
 replace it, and it doesn't work.  If I can't get a large enough drive
 to replace the missing disk with, it'd be a shame to have to destroy
 and recreate the pool on smaller media.

What do you propose that OpenSolaris should do about this?  Should 
OpenSolaris use some sort of a table of common size drives, or use 
an algorithm which determines certain discrete usage values based on 
declared drive sizes and a margin for error?  What should OpenSolaris 
of today do with the 20TB disk drives of tomorrow?  What should the 
margin for error of a 30TB disk drive be?  Is it ok to arbitrarily 
ignore 3/4TB of storage space?

If the drive is actually a huge 20TB LUN exported from a SAN RAID 
array, how should the margin for error be handled in that case?

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us,
http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Will Murnane
On Sun, Jan 18, 2009 at 18:19, Bob Friesenhahn
bfrie...@simple.dallas.tx.us wrote:
 What do you propose that OpenSolaris should do about this?
Take drive size, divide by 100, round down to two significant digits.
Floor to a multiple of that size.  This method wastes no more than 1%
of the disk space, and gives a reasonable (I think) number.

For example: I have a machine with a 250GB disk that is 251000193024
bytes long.
$ python
>>> n = str(251000193024 // 100)
>>> int(n[:2] + "0" * (len(n) - 2)) * 100
250000000000L
So treat this volume as being 250 billion bytes long, exactly.

Most drives are sold with two significant digits in the size: 320 GB,
400 GB, 640GB, 1.0 TB, etc.  I don't see this changing any time
particularly soon; unless someone starts selling a 1.25 TB drive or
something, two digits will suffice.  Even then, this formula would
give you 96% (1.2/1.25) of the disk's capacity.

Note that this method also works for small-capacity disks: suppose I
have a disk that's exactly 250 billion bytes long.  This formula will
produce 250 billion as the size it is to be treated as.  Thus,
replacing my 251 billion byte disk with a 250 billion byte one will
not be a problem.
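
Wrapped up as a function, the rule above looks roughly like this (a sketch of
the rounding rule only, with a made-up name; it is not anything ZFS does
today):

    def right_size(nbytes):
        # Keep the first two significant digits of the capacity, zero the rest.
        n = str(nbytes // 100)
        return int(n[:2] + "0" * (len(n) - 2)) * 100

    print(right_size(251000193024))   # 250000000000, the 250GB disk above
    print(right_size(250000000000))   # 250000000000, exactly 250 billion bytes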

 Is it ok to arbitrarily ignore 3/4TB of storage
 space?
If it's less than 1% of the disk space, I don't see a problem doing so.

 If the drive is actually a huge 20TB LUN exported from a SAN RAID array,
 how should the margin for error be handled in that case?
So make it configurable if you must.  If no partition table exists
when zpool create is called, make it right-size the disks, but if
a pre-existing EFI label is there, use it instead.  Or make a flag
that tells zpool create not to right-size.

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Bob Friesenhahn
On Sun, 18 Jan 2009, Will Murnane wrote:
 Most drives are sold with two significant digits in the size: 320 GB,
 400 GB, 640GB, 1.0 TB, etc.  I don't see this changing any time
 particularly soon; unless someone starts selling a 1.25 TB drive or
 something, two digits will suffice.  Even then, this formula would
 give you 96% (1.2/1.25) of the disk's capacity.

If the drive is attached to a RAID controller which steals part of its 
capacity for its own purposes, how will you handle that?

These stated drive sizes are just marketing terms and do not have a 
sound technical basis.  Don't drive vendors provide actual sizing 
information in their specification sheets so that knowledgeable 
people can purchase the right sized drive?

Bob
==
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Tim
On Sun, Jan 18, 2009 at 1:30 PM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:

 On Sun, 18 Jan 2009, Will Murnane wrote:
  Most drives are sold with two significant digits in the size: 320 GB,
  400 GB, 640GB, 1.0 TB, etc.  I don't see this changing any time
  particularly soon; unless someone starts selling a 1.25 TB drive or
  something, two digits will suffice.  Even then, this formula would
  give you 96% (1.2/1.25) of the disk's capacity.

 If the drive is attached to a RAID controller which steals part of its
 capacity for its own purposes, how will you handle that?

 These stated drive sizes are just marketing terms and do not have a
 sound technical basis.  Don't drive vendors provide actual sizing
 information in their specification sheets so that knowledgeable
 people can purchase the right sized drive?

 Bob
 ==
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


You look at the size of the drive and you take a set percentage off...  If
it's a LUN and it's so far off it still can't be added with the percentage
that works across the board for EVERYTHING ELSE, you change the size of the
LUN at the storage array or adapter.

I know it's fun to pretend this is rocket science and impossible, but the
fact remains the rest of the industry has managed to make it work.  I have a
REAL tough time believing that Sun and/or zfs is so deficient it's an
insurmountable obstacle for them.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Eric D. Mudama
On Sun, Jan 18 at 13:43, Tim wrote:
   You look at the size of the drive and you take a set percentage off...  If
   it's a LUN and it's so far off it still can't be added with the
   percentage that works across the board for EVERYTHING ELSE, you change the
   size of the LUN at the storage array or adapter.

   I know it's fun to pretend this is rocket science and impossible, but the
   fact remains the rest of the industry has managed to make it work.  I have
   a REAL tough time believing that Sun and/or zfs is so deficient it's an
   insurmountable obstacle for them.

If, instead of having ZFS manage these differences, a user simply
created slices that were, say, 98% as big as the average number of
sectors in a XXX GB drive... would ZFS enable write cache on that
device or not?

I thought I'd read that ZFS didn't use write cache on slices because
it couldn't guarantee that the other slices were used in a
write-cache-safe fashion, would that apply to cases where no other
slices were allocated?

-- 
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread JZ
Hi Bob, Will, Tim,
I also had some off-list comments on my irrelevant comments.
So I will try to make this post less irrelevant, though my thoughts on this
topic may be off the list's line of discussion, as usual.

From the major storage vendors I know, network storage systems, as integrated
products, are only offered with the same size/type of drives in a
traditional RAID set (not the V-RAID style). Mixing different drives in a
traditional RAID set is not recommended by many vendors, and I think that
taking that as a policy cuts off much of the trouble of trying to mix
different drives in a RAID set.

And folks, the last time I really got into the largest database (by Winter) 
data sets, their sizes were not really as huge as I thought.
http://www.wintercorp.com/VLDB/2005_TopTen_Survey/TopTenProgram.html

Again, I think, the exponential data growth we have been talking about for a 
few years is more in file data. Databases use block-storage, very efficient 
on capacity.

The kind of drives is not as important as how do you use those drives. IMHO.

Best,
z


- Original Message - 
From: Bob Friesenhahn bfrie...@simple.dallas.tx.us
To: Will Murnane will.murn...@gmail.com
Cc: zfs-discuss@opensolaris.org
Sent: Sunday, January 18, 2009 2:30 PM
Subject: Re: [zfs-discuss] replace same sized disk fails with too small 
error


 On Sun, 18 Jan 2009, Will Murnane wrote:
 Most drives are sold with two significant digits in the size: 320 GB,
 400 GB, 640GB, 1.0 TB, etc.  I don't see this changing any time
 particularly soon; unless someone starts selling a 1.25 TB drive or
 something, two digits will suffice.  Even then, this formula would
 give you 96% (1.2/1.25) of the disk's capacity.

 If the drive is attached to a RAID controller which steals part of its
 capacity for its own purposes, how will you handle that?

 These stated drive sizes are just marketing terms and do not have a
 sound technical basis.  Don't drive vendors provide actual sizing
 information in their specification sheets so that knowledgeable
 people can purchase the right sized drive?

 Bob
 ==
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Tim
On Sun, Jan 18, 2009 at 1:56 PM, Eric D. Mudama
edmud...@bounceswoosh.org wrote:

 On Sun, Jan 18 at 13:43, Tim wrote:

  You look at the size of the drive and you take a set percentage off...
  If
  it's a LUN and it's so far off it still can't be added with the
  percentage that works across the board for EVERYTHING ELSE, you change
 the
  size of the LUN at the storage array or adapter.

  I know it's fun to pretend this is rocket science and impossible, but the
  fact remains the rest of the industry has managed to make it work.  I
 have
  a REAL tough time believing that Sun and/or zfs is so deficient it's an
  insurmountable obstacle for them.


 If, instead of having ZFS manage these differences, a user simply
 created slices that were, say, 98% as big as the average number of
 sectors in a XXX GB drive... would ZFS enable write cache on that
 device or not?

 I thought I'd read that ZFS didn't use write cache on slices because
 it couldn't guarantee that the other slices were used in a
 write-cache-safe fashion, would that apply to cases where no other
 slices were allocated?


It will disable it by default, but you can manually re-enable it.  That's
not so much the point though.  ZFS is supposed to be an all-in-one
filesystem/volume manager.  When I have to start going through format every time I
add a drive, it's a non-starter, not to mention it's a kludge.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Al Tobey
I ran into a bad label causing this once.

Usually the s2 slice is a good bet for your whole disk device, but if it's EFI
labeled, you need to use p0 (somebody correct me if I'm wrong).

I like to zero the first few megs of a drive before doing any of this stuff.
This will destroy any data.

Obviously, change c7t1d0p0 to whatever your drive's device is.

    dd if=/dev/zero of=/dev/rdsk/c7t1d0p0 bs=512 count=8192

For EFI you may also need to zero the end of the disk too because it writes the
VTOC to both the beginning and end for redundancy.  I'm not sure of the best
way to get the drive size in blocks without using format(1M), so I'll leave that
as an exercise for the reader.  For my 500gb disks it was something like:

976533504 is $number_of_blocks (from format) - 8192 (4MB in 512-byte blocks).

    dd if=/dev/zero of=/dev/rdsk/c7t0d0p0 bs=512 count=8192 seek=976533504

When you run format -> fdisk, it should prompt you to write a new Solaris label
to the disk.  Just accept all the defaults.

    format -d c7t1d0

Remember to double-check your devices and wait a beat before pressing enter
with those dd commands as they destroy without warning or checking.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread JZ
Yes, I agree, the command-line interface is more efficient, and riskier, than a GUI.
You will have to be very careful when doing that.

Best,
z


- Original Message - 
From: Al Tobey tob...@gmail.com
To: zfs-discuss@opensolaris.org
Sent: Sunday, January 18, 2009 3:09 PM
Subject: Re: [zfs-discuss] replace same sized disk fails with too small 
error


 I ran into a bad label causing this once.

 Usually the s2 slice is a good bet for your whole disk device, but if it's
 EFI labeled, you need to use p0 (somebody correct me if I'm wrong).

 I like to zero the first few megs of a drive before doing any of this
 stuff.  This will destroy any data.

 Obviously, change c7t1d0p0 to whatever your drive's device is.

     dd if=/dev/zero of=/dev/rdsk/c7t1d0p0 bs=512 count=8192

 For EFI you may also need to zero the end of the disk too because it
 writes the VTOC to both the beginning and end for redundancy.  I'm not
 sure of the best way to get the drive size in blocks without using
 format(1M), so I'll leave that as an exercise for the reader.  For my
 500gb disks it was something like:

 976533504 is $number_of_blocks (from format) - 8192 (4MB in 512-byte
 blocks).

     dd if=/dev/zero of=/dev/rdsk/c7t0d0p0 bs=512 count=8192 seek=976533504

 When you run format -> fdisk, it should prompt you to write a new Solaris
 label to the disk.  Just accept all the defaults.

     format -d c7t1d0

 Remember to double-check your devices and wait a beat before pressing
 enter with those dd commands as they destroy without warning or checking.
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Richard Elling
comment at the bottom...

Tim wrote:
 On Sun, Jan 18, 2009 at 1:56 PM, Eric D. Mudama 
 edmud...@bounceswoosh.org wrote:
 
 On Sun, Jan 18 at 13:43, Tim wrote:
 
  You look at the size of the drive and you take a set percentage
 off...  If
  it's a LUN and it's so far off it still can't be added with the
  percentage that works across the board for EVERYTHING ELSE, you
 change the
  size of the LUN at the storage array or adapter.
 
  I know it's fun to pretend this is rocket science and
 impossible, but the
  fact remains the rest of the industry has managed to make it
 work.  I have
  a REAL tough time believing that Sun and/or zfs is so deficient
 it's an
  insurmountable obstacle for them.
 
 
 If, instead of having ZFS manage these differences, a user simply
 created slices that were, say, 98% as big as the average number of
 sectors in a XXX GB drive... would ZFS enable write cache on that
 device or not?
 
 I thought I'd read that ZFS didn't use write cache on slices because
 it couldn't guarantee that the other slices were used in a
 write-cache-safe fashion, would that apply to cases where no other
 slices were allocated?
 
 
 It will disable it by default, but you can manually re-enable it.  
 That's not so much the point though.  ZFS is supposed to be 
 filesystem/volume manager all-in-one.  When I have to start going 
 through format every time I add a drive, it's a non-starter, not to 
 mention it's a kludge.

DIY.  Personally, I'd be more upset if ZFS reserved any sectors
for some potential swap I might want to do later, but may never
need to do.  If you want to reserve some space for swappage, DIY.

As others have noted, this is not a problem for systems vendors
because we try, and usually succeed, at ensuring that our multiple
sources of disk drives are compatible such that we can swap one
for another.
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Tim
On Sun, Jan 18, 2009 at 2:43 PM, Richard Elling richard.ell...@sun.com wrote:

 comment at the bottom...
 DIY.  Personally, I'd be more upset if ZFS reserved any sectors
 for some potential swap I might want to do later, but may never
 need to do.  If you want to reserve some space for swappage, DIY.

 As others have noted, this is not a problem for systems vendors
 because we try, and usually succeed, at ensuring that our multiple
 sources of disk drives are compatible such that we can swap one
 for another.
  -- richard



And again I call BS.  I've pulled drives out of a USP-V, Clariion, DMX, and
FAS3040.  Every single one had drives of slightly differing sizes.  Every
single one is right-sized at format time.

Hell, here's a filer I have sitting in a lab right now:

  RAID Disk  Device  HA SHELF BAY  CHAN  Pool  Type  RPM  Used (MB/blks)    Phys (MB/blks)
  ---------  ------  ------------  ----  ----  ----  ---  ----------------  ----------------
  dparity    0b.32   0b   2    0   FC:B  -     FCAL  1    68000/139264000   68444/140174232
  parity     0b.33   0b   2    1   FC:B  -     FCAL  1    68000/139264000   68444/140174232
  data       0b.34   0b   2    2   FC:B  -     FCAL  1    68000/139264000   68552/140395088
Notice that lines 2 and 3 have different physical block counts, and those are BOTH
Seagate Cheetahs, just different generations.  So it gets short-stroked to
68000 from 68552 or 68444.
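
For what it's worth, the amount being trimmed there is well under one percent.
A quick check against the Used/Phys MB figures quoted above:

    for label, phys_mb in [("0b.32/0b.33", 68444), ("0b.34", 68552)]:
        used_mb = 68000
        print("%s: %.2f%% reserved" % (label, 100.0 * (phys_mb - used_mb) / phys_mb))
    # prints roughly 0.65% and 0.81%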

And NO, the re-branded USP-V's Sun sell's don't do anything any differently,
so stop lying, it's getting old.

If you're so concerned with the storage *lying* or *hiding* space, I assume
you're leading the charge at Sun to properly advertise drive sizes, right?
Because the 1TB drive I can buy from Sun today is in no way, shape, or form
able to store 1TB of data.  You use the same *fuzzy math* the rest of the
industry does.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Richard Elling
Tim wrote:
 On Sun, Jan 18, 2009 at 2:43 PM, Richard Elling richard.ell...@sun.com wrote:
 
 comment at the bottom...
 DIY.  Personally, I'd be more upset if ZFS reserved any sectors
 for some potential swap I might want to do later, but may never
 need to do.  If you want to reserve some space for swappage, DIY.
 
 As others have noted, this is not a problem for systems vendors
 because we try, and usually succeed, at ensuring that our multiple
 sources of disk drives are compatible such that we can swap one
 for another.
  -- richard
 
 
 
 And again I call BS.  I've pulled drives out of a USP-V, Clariion, DMX, 
 and FAS3040.  Every single one had drives of slightly differing sizes.  
 Every single one is right-sized at format time.

It is naive to think that different storage array vendors
would care about people trying to use another array vendor's
disks in their arrays. In fact, you should get a flat,
impersonal, not supported response.

What vendors can do is make sure that if you get a disk
which is supported in a platform and replace it with another
disk which is also supported, and the same size, then it will
just work. In order for this method to succeed, a least
common size is used.

 Hell, here's a filer I have sitting in a lab right now:
 
   RAID DiskDeviceHA  SHELF BAY CHAN Pool Type  RPM  Used 
 (MB/blks)Phys (MB/blks)
   ----    - 
 ----
   dparity 0b.320b2   0   FC:B   -  FCAL 1 
 68000/139264000   68444/140174232
   parity  0b.330b2   1   FC:B   -  FCAL 1 
 68000/139264000   68444/140174232
   data0b.340b2   2   FC:B   -  FCAL 1 
 68000/139264000   68552/140395088
 
 Notice line's 2 and 3 are different physical block size, and those are 
 BOTH seagate cheetah's, just different generation.  So, it gets short 
 stroked to 68000 from 68552 or 68444.
 
 And NO, the re-branded USP-V's Sun sell's don't do anything any 
 differently, so stop lying, it's getting old.

Vendors can change the default label, which is how it is
implemented.  For example, if we source XYZ-GByte disks
from two different vendors intended for the same platform,
then we will ensure that the number of available sectors
is the same, otherwise the FRU costs would be very high.
No conspiracy here... just good planning.

 If you're so concerned with the storage *lying* or *hiding* space, I 
 assume you're leading the charge at Sun to properly advertise drive 
 sizes, right?  Because the 1TB drive I can buy from Sun today is in no 
 way, shape, or form able to store 1TB of data.  You use the same *fuzzy 
 math* the rest of the industry does.

There is no fuzzy math.  Disk vendors size by base 10.
They explicitly state this in their product documentation,
as business law would expect.
http://en.wikipedia.org/wiki/Mebibyte
  -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Tim
On Sun, Jan 18, 2009 at 3:39 PM, Richard Elling richard.ell...@sun.com wrote:

 Tim wrote:
 It is naive to think that different storage array vendors
 would care about people trying to use another array vendors
 disks in their arrays. In fact, you should get a flat,

impersonal, not supported response.


But we aren't talking about me trying to stick disks into Sun's arrays.
We're talking about how this open-source, supposedly all-in-one volume manager
and filesystem handles new disks.  You know, the one that was supposed to
make all of our lives infinitely easier and simplify managing lots and
lots of disks, whether they be inside an official Sun array or just a
server running Solaris.


 What vendors can do, is make sure that if you get a disk
 which is supported in a platform and replace it with another
 disk which is also supported, and the same size, then it will
 just work. In order for this method to succeed, a least,
 common size is used.


The ONLY reason vendors put special labels or firmware on disks is to force
you to buy them direct.  Let's not pretend there's something magical about
an HDS 1TB drive or a Sun 1TB drive.  They're rolling off the same line
as everyone else's.  The way they ensure the disk works is by short stroking
them from the start...

It's *naive* to claim it's any sort of technical limitation.



 Vendors can change the default label, which is how it is
 implemented.  For example, if we source XYZ-GByte disks
 from two different vendors intended for the same platform,
 then we will ensure that the number of available sectors
 is the same, otherwise the FRU costs would be very high.
 No conspiracy here... just good planning.


The number of blocks on the disks won't be the same, which is why they're
right-sized per the above.  Do I really need to start pulling disks from my Sun
systems to prove this point?  Sun does not require exact block counts any
more than HDS, EMC, or NetApp.  So for the life of the server, I can call in
and get the exact same part that broke in the box from Sun, because they've
got contracts with the drive mfg's.  What happens when I'm out of the
supported life of the system?  Oh, I just buy a new one?  Because having my
volume manager use a bit of intelligence and short-stroke the disk like I
would expect from the start is a *bad idea*.

The sad part about all of this is that the $15 Promise RAID controller in my
desktop short-strokes by default, and you're telling me zfs can't, or won't.



 There is no fuzzy math.  Disk vendors size by base 10.
 They explicitly state this in their product documentation,
 as business law would expect.
 http://en.wikipedia.org/wiki/Mebibyte
  -- richard


If it's not fuzzy math, drive mfg's wouldn't lose in court over the false
advertising, would they?
http://apcmag.com/seagate_settles_class_action_cash_back_over_misleading_hard_drive_capacities.htm



At the end of the day, this back and forth changes nothing though.  The
default behavior when zfs imports a new disk should be to right-size by a
fairly conservative amount if you're (you as in Sun, not you as in Richard)
going to continue to market it as you have in the past.  It most definitely
does not eliminate the same old pains of managing disks with Solaris if I
have to start messing with labels and slices again.  The whole point of
merging a volume manager/filesystem/etc is to take away that pain.  That is
not even remotely manageable over the long term.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-18 Thread Eric D. Mudama
On Sun, Jan 18 at 15:00, Tim wrote:
   If you're so concerned with the storage *lying* or *hiding* space, I
   assume you're leading the charge at Sun to properly advertise drive sizes,
   right?  Because the 1TB drive I can buy from Sun today is in no way,
   shape, or form able to store 1TB of data.  You use the same *fuzzy math*
   the rest of the industry does.

While in general I'd like to see a combined FS/VM be smarter, as you
do, on this point I disagree with you.  Most drive vendors publish the
exact sector counts of each model that they ship, and this should be
sufficient for your purposes.

As an arbitrary example, Seagate lists a number of Guaranteed
Sectors in their technical specifications for each unique model
number.

Their 7200.11 500GB drive ST3500320AS guarantees 976,773,168 sectors,
which happens to exactly equal the IDEMA amount for 500GB.

While rounding down to the next IDEMA multiple might make sense,
depending on the technique that could cost you 1GB per device, and
I'm sure a lot of people would rather not have that limitation.
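
For reference, that guaranteed LBA count matches the commonly cited IDEMA
capacity formula (97,696,368 sectors plus 1,953,504 per decimal GB above
50GB). A quick check, assuming that formula:

    def idema_sectors(capacity_gb):
        # Commonly cited IDEMA LBA count for a nominal capacity in decimal GB.
        return 97696368 + 1953504 * (capacity_gb - 50)

    print(idema_sectors(500))   # 976773168, the ST3500320AS guaranteed count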


-- 
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] replace same sized disk fails with too small error

2009-01-17 Thread Antonius
Volume name = 
ascii name  = SAMSUNG-S0VVJ1CP30539-0001-465.76GB
bytes/sector=  512
sectors = 976760063
accessible sectors = 976760030
Part         Tag  Flag    First Sector       Size   Last Sector
   0         usr    wm             256   465.75GB     976743646
   1  unassigned    wm               0          0             0
   2  unassigned    wm               0          0             0
   3  unassigned    wm               0          0             0
   4  unassigned    wm               0          0             0
   5  unassigned    wm               0          0             0
   6  unassigned    wm               0          0             0
   8    reserved    wm       976743647     8.00MB     976760030

This is the readout from a disk it's meant to replace.  It looks like the same
number of sectors, as it should be, being the same model.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss