Re: [zfs-discuss] replace same sized disk fails with too small error
It also wouldn't be a bad idea for ZFS to verify that drives designated as hot spares in fact have sufficient capacity to be compatible replacements for the particular configuration, before they are critically required. If drives that otherwise appear to have equivalent capacity may not, that's not something you want to first discover upon attempted replacement of a failed drive.
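Until something like that exists, a rough manual stand-in (device names here are hypothetical, and this assumes EFI-labeled whole disks, where prtvtoc reports an accessible-sector count):

  # compare the usable sector counts of a pool member and its designated spare
  prtvtoc /dev/rdsk/c1t0d0s0 | grep -i accessible
  prtvtoc /dev/rdsk/c1t5d0s0 | grep -i accessible
  # the spare must report at least as many sectors as the member it may
  # replace, or the eventual zpool replace will fail with the too-small error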
Re: [zfs-discuss] replace same sized disk fails with too small error
+1

On Thu, Jan 22, 2009 at 11:12 PM, Paul Schlie sch...@comcast.net wrote: [snip]
Re: [zfs-discuss] replace same sized disk fails with too small error
Would this work (to get rid of an EFI label)?

  dd if=/dev/zero of=/dev/dsk/thedisk bs=1024k count=1

Then use format. format might complain that the disk is not labeled; you can then label the disk.

Dale

Antonius wrote:
> can you recommend a walk-through for this process, or a bit more of a
> description? I'm not quite sure how I'd use that utility to repair the
> EFI label
Re: [zfs-discuss] replace same sized disk fails with too small error
Yes, that's exactly what I did. The issue is that I can't get the corrected label to be written once I've zeroed the drive. I get an error from fdisk that apparently sees the backup label.
Re: [zfs-discuss] replace same sized disk fails with too small error
not quite .. it's 16KB at the front and 8MB at the back of the disk (16384 sectors) for the Solaris EFI label - so you need to zero out both of these, of course. Since these drives are 1TB, I find it's easier to format to SMI (VTOC) with format -e (choose SMI, label, save, validate - then choose EFI).

but to Casper's point - you might want to make sure that fdisk is using the whole disk .. you should probably reinitialize the fdisk sectors, either with the fdisk command or by running fdisk from format (delete the partition, create a new partition using 100% of the disk, blah, blah)

finally - glancing at the format output - there appears to be a mix of labels on these disks, as you've got a mix of c#d# entries and c#t#d# entries, so I might suspect fdisk isn't consistent across the various disks here .. I also noticed that you dumped the vtoc for c3d0 and c4d0, but you're replacing c2d1 (of unknown size/layout) with c1d1 (never dumped in your emails) .. so while this has been an animated (slightly trollish) discussion on right-sizing (odd - I've typically only seen that term as an ONTAPism) with some short-stroking digs, it's a little unclear what the c1d1s0 slice looks like here or what the cylinder count is - I agree it should be the same, but it would be nice to see from my armchair here

On Jan 22, 2009, at 3:32 AM, Dale Sears wrote: [snip]
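To make the front-and-back point concrete, a sketch of clearing both EFI label copies by hand (the device name and sector count are stand-ins - take the real count from the drive, and note p0 is the x86 whole-disk node):

  # front: primary label area (16KB = 32 x 512-byte sectors)
  dd if=/dev/zero of=/dev/rdsk/c3d0p0 bs=512 count=32
  # back: backup label area (8MB = 16384 sectors at the end of the disk)
  dd if=/dev/zero of=/dev/rdsk/c3d0p0 bs=512 seek=$((976773168 - 16384)) count=16384
  # then relabel: format -e -> label -> SMI, save, verify - then label -> EFI if wanted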
Re: [zfs-discuss] replace same sized disk fails with too small error
You mentioned one; so what do you recommend as a workaround? I've tried re-initializing the disks on another system's HW RAID controller, but I still get the same error.
Re: [zfs-discuss] replace same sized disk fails with too small error
> The user DEFINITELY isn't expecting 5*10^11 bytes, or what you meant to
> say 500*10^9 bytes, they're expecting 500GB. You know, 536,870,912,000
> bytes. But even if the drive mfgs calculated it correctly, they wouldn't
> even be getting that due to filesystem overhead.

Then you have a very stupid user who has been living in a cave. The only reason we label memory incorrectly is because the systems are binary. (Incorrect, because there's one standard and it says that K, M, G and T are powers of 10.) The computer cannot efficiently address non-binary sized memory.

IIRC, some stupid user did indeed sue WD and he won, but that is in America (I'm sure the km is 1024 meters in the US). Since that lawsuit the vendors all make sure that the specification says how many addressable sectors are on a disk.

You make the right-sized disk a big issue. And perhaps it is; however, ZFS has been out for a number of years and no one complained about it before. It's just not a big priority; it's not even on the list. File a bug/RFE if you want this fixed.

Casper
Re: [zfs-discuss] replace same sized disk fails with too small error
> so you're suggesting I buy 750s to replace the 500s. then if a 750 fails
> buy another bigger drive again?

Have you filed a bug/RFE to fix this in ZFS in the future? Anyway, you only need to change the 750GB drives if:

- all 500GB drives are replaced by 750GB disks
- and they're all bigger than the newest 750GB

> the drives are RMA replacements for the other disks that faulted in the
> array before. they are the same brand, model and model number, apparently
> not so under the label though, but no way I could tell that before.

That is really weird. Or is this, perhaps, because you use an EFI label on the disks and we now label the disks differently? (I think we make sure that the ZFS label starts at a 128K offset now; before, it did not.)

Casper
Re: [zfs-discuss] replace same sized disk fails with too small error
I believe this is an fdisk issue, but I don't think any of the fdisk engineers hang out on this forum. You might try partitioning the disk on another OS.
-- richard

Antonius wrote:
> I'll attach two files of output from two disks: c4d0 is a current member
> of the zpool that is a sibling (as in a member of the same batch, a couple
> of serial-number increments different) of the faulted disk to replace,
> currently running without issue; c3d0 is a new disk I got back as a
> replacement for a failed disk, and it's obviously different. It appears
> the EFI label needs fixing - I just can't get it to stick with any
> combination of commands I've tried, e.g. removing and resetting all
> partitions with fdisk -e and trying to recreate with geometry as per the
> existing pool members, even after trying to dd the first section of all
> partitions:
>
>   bash-3.2# fdisk -A 238:0:0:1:0:254:63:1023:1:976773167 c3d0
>   fdisk: EFI partitions must encompass the entire disk
>   (input numsect: 976773167 - avail: 976760063)
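One more hedged thing to try before resorting to another OS (x86 naming again, per the reinitialize-fdisk suggestion earlier in the thread): wipe the fdisk table, let fdisk write its default single partition spanning the whole disk, then relabel:

  fdisk -B /dev/rdsk/c3d0p0    # -B: default to one partition using the full disk
  format -e c3d0               # then: label

Note fdisk -B writes a Solaris partition type rather than the EFI-protective (0xEE) type the failing fdisk -A line above was trying to set, so treat this as a starting point rather than the finished label.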
Re: [zfs-discuss] replace same sized disk fails with too small error
Grab the AOE driver and pull aoelabinit out of the package. They wrote it just for forcing EFI or Sun labels onto disks when the normal Solaris tools get in the way. Coraid's website looks like it's broken at the moment, so you may need to find it elsewhere on the web.
Re: [zfs-discuss] replace same sized disk fails with too small error
Can you recommend a walk-through for this process, or a bit more of a description? I'm not quite sure how I'd use that utility to repair the EFI label.
Re: [zfs-discuss] replace same sized disk fails with too small error
On Mon, Jan 19, 2009 at 5:39 PM, Adam Leventhal a...@eng.sun.com wrote:
>> And again, I say take a look at the market today, figure out a
>> percentage, and call it done. I don't think you'll find a lot of users
>> crying foul over losing 1% of their drive space when they don't already
>> cry foul over the false advertising that is drive sizes today.
>
> Perhaps it's quaint, but 5GB still seems like a lot to me to throw away.

That wasn't a hard number, that was a hypothetical number. On 750GB drives I'm only seeing them lose in the area of 300-500MB.

>> I have two disks in one of my systems... both maxtor 500GB drives,
>> purchased at the same time shortly after the buyout. One is a rebadged
>> Seagate, one is a true, made-in-China Maxtor. Different block numbers...
>> same model drive, purchased at the same time. Wasn't zfs supposed to be
>> about using software to make up for deficiencies in hardware? It would
>> seem this request is exactly that...
>
> That's a fair point, and I do encourage you to file an RFE, but a) Sun
> has already solved this problem in a different way as a company with our
> products and b) users already have the ability to right-size drives.
>
> Perhaps a better solution would be to handle the procedure of replacing a
> disk with a slightly smaller one by migrating data and then treating the
> extant disks as slightly smaller as well. This would have the advantage
> of being far more dynamic and of only applying the space tax in
> situations where it actually applies.

A) should have a big bright * next to it referencing our packaged storage solutions. I've got plenty of 72G Sun drives still lying around that aren't all identical block numbers ;) Yes, an RMA is great, but when I've got spares sitting on the shelf and I lose a drive at 4:40pm on a Friday, I'm going to stick the spare off the shelf in, call Sun, and put the replacement back on the shelf on Monday. /horse beaten

B) I think we can both agree that having to pre-slice every disk that goes into the system is not a viable long-term solution to this issue. That being said, your conclusion sounds like a perfectly acceptable/good idea to me for all of the technical people such as those on this list. Joe User is another story, but much like adding a single drive to a raid-z(2) vdev, I doubt that's a target market for Sun at this time.
Re: [zfs-discuss] replace same sized disk fails with too small error
Ross wrote:
> The problem is they might publish these numbers, but we really have no
> way of controlling what number manufacturers will choose to use in the
> future. If for some reason future 500GB drives all turn out to be
> slightly smaller than the current ones, you're going to be stuck.
> Reserving 1-2% of space in exchange for greater flexibility in replacing
> drives sounds like a good idea to me. As others have said, RAID
> controllers have been doing this for long enough that even the very basic
> models do it now, and I don't understand why such a simple feature would
> be left out of ZFS.

It would certainly be terrible to go back to the days where 5% of the filesystem space is inaccessible to users, and to force the sysadmin to manually change that percentage to 0 to get full use of the disk. Oh wait, UFS still does that, and it's a configurable parameter at mkfs time (and can be tuned on the fly).

For a ZFS pool, (until block pointer rewrite capability) this would have to be a pool-create-time parameter. Perhaps a --usable-size=N[%] option which would either cut down the size of the EFI slices or fake the disk geometry so the EFI label ends early. Or it would be a small matter of programming to build a perl wrapper for zpool create that would accomplish the same thing (a rough sketch follows below).

--Joe
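An untested sketch of that wrapper idea (plain shell rather than perl; the pool and disk names are placeholders, and it assumes EFI-labeled disks so prtvtoc reports accessible sectors and slice 0 can start at sector 34):

  #!/bin/sh
  # zpool_create_98: carve slice 0 at ~98% of each disk, then build the
  # pool on the slices instead of the whole disks
  POOL=$1; shift
  SLICES=""
  for DISK in "$@"; do
      TOTAL=`prtvtoc /dev/rdsk/${DISK}s0 | awk '/accessible/ {print $2}'`
      USABLE=`expr $TOTAL / 100 \* 98`
      # partition 0, tag 0, flags 00, start sector 34, length $USABLE
      fmthard -d 0:0:00:34:$USABLE /dev/rdsk/${DISK}s0
      SLICES="$SLICES ${DISK}s0"
  done
  zpool create $POOL $SLICES    # insert raidz/mirror before the slices as needed

One caveat that comes up later in the thread: when given slices instead of whole disks, ZFS won't enable the drives' write caches for you.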
Re: [zfs-discuss] replace same sized disk fails with too small error
>>>>> "mj" == Moore, Joe <joe.mo...@siemens.com> writes:

    mj> For a ZFS pool, (until block pointer rewrite capability) this
    mj> would have to be a pool-create-time parameter.

naw. You can just make ZFS do it all the time, like the other storage vendors do. no parameters. You can invent parameter-free ways of turning it off. For example:

 1. label the disk with an EFI label taking up the whole disk

 2. point ZFS at slice zero instead of the whole disk, like /dev/dsk/c0t0d0s0 instead of /dev/dsk/c0t0d0

 3. ZFS will then be written to know it's supposed to use the entire disk instead of writing a new label, but will still behave as though it owns the disk cache-wise.

-or-

 1. label the disk any way you like

 2. point ZFS at the whole disk, /dev/dsk/c0t0d0. And make that whole-disk device name work for all disks no matter what controller, whether or not they're ``removeable,'' or how they're labeled, like the equivalent device name does in Linux, FreeBSD, and Mac OS X.

 3. ZFS should remove your label and write a one-slice EFI label that doesn't use the entire disk, and rounds down to a bucketed/quantized/whole-ish number. If the disk is a replacement for a component of an existing vdev, the EFI label size it picks will be the *larger* of:

    a. the right-size ZFS would have picked if the disk weren't a replacement

    b. the smallest existing component in the vdev

Most people will not even notice the feature exists except by getting errors less often. AIUI this is how it works with other RAID layers, the cheap and expensive alike among ``hardware'' RAID, and this common practice is very ZFS-ish - except hardware RAID is proprietary, so you cannot determine their exact policy, while in ZFS you would be able to RTFS and figure it out. But there is still no need for parameters. There isn't even a need to explain the feature to the user.

I guess this has by now become a case of the silly, unimportant, but easy-to-understand feature dominating the mailing list because it's so obvious that everyone's qualified to pipe up with his opinion, so maybe I'm a bit late and should have let it die.
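The rounding in step 3 could be as dumb as this (pure illustration, not ZFS code; sizes are in 512-byte sectors, and REPORTED_SECTORS/SMALLEST_MEMBER are hypothetical inputs):

  BUCKET=2097152                                       # 1 GiB of 512-byte sectors
  SIZE=`expr $REPORTED_SECTORS / $BUCKET \* $BUCKET`   # round down to a whole GiB
  # replacement rule: take the larger of the quantized size and the
  # smallest existing component in the vdev
  if [ $SIZE -lt $SMALLEST_MEMBER ]; then
      SIZE=$SMALLEST_MEMBER
  fi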
Re: [zfs-discuss] replace same sized disk fails with too small error
Miles Nordin wrote:
> naw. You can just make ZFS do it all the time, like the other storage
> vendors do. no parameters.

Other storage vendors have specific compatibility requirements for the disks you are allowed to install in their chassis. On the other hand, OpenSolaris is intended to work on commodity hardware. And there is no way to change this after the pool has been created, since after that time the disk size can't be changed. So whatever policy is used by default, it is very important to get it right.

(snip)

> Most people will not even notice the feature exists except by getting
> errors less often. AIUI this is how it works with other RAID layers ...
> except hardware RAID is proprietary, so you cannot determine their exact
> policy, while in ZFS you would be able to RTFS and figure it out.

Sysadmins should not be required to RTFS. Behaviors should be documented in other places too.

> But there is still no need for parameters. There isn't even a need to
> explain the feature to the user.

There isn't a need to explain the feature to the user? That's one of the most irresponsible responses I've heard lately. A user is expecting their 500GB disk to be 5*10^11 bytes, not 4.9995*10^11 bytes, unless that feature is explained.

Parameters with reasonable defaults (and a reasonable way to change them) allow users who care about the parameter, and who understand the tradeoffs involved in changing from the default, to make their system work better. If I didn't want to be able to tune my system for performance, I would be running Windows. OpenSolaris is about transparency, not just Open Source.

--Joe
Re: [zfs-discuss] replace same sized disk fails with too small error
[I hate to keep dragging this thread forward, but...]

Moore, Joe wrote:
> And there is no way to change this after the pool has been created, since
> after that time the disk size can't be changed. So whatever policy is
> used by default, it is very important to get it right.

Today, vdev size can be grown, but not shrunk, on the fly, without causing any copying of data. If you need to shrink today, you need to copy the data. This is also true of many, but not all, file systems.
-- richard
Re: [zfs-discuss] replace same sized disk fails with too small error
>>>>> "jm" == Moore, Joe <joe.mo...@siemens.com> writes:

    jm> Sysadmins should not be required to RTFS.

I never said they were. The comparison was between hardware RAID and ZFS, not between two ZFS alternatives. The point: other systems' behavior is entirely secret. Therefore, secret opaque undiscussed right-sizing is the baseline. The industry-wide baseline is not guaranteeing to use the whole disk no matter what, nor is it building a flag-ridden partitioning tool with bikeshed HOWTO documentation into zpool, full of multi-paragraph Windows ExPee-style CYA ``are you SURE you want to use the whole disk, because blah bla blahblah blha blaaagh'' modal dialog box warnings. This overdiscussion feels like the way X.509 and IPsec grow and grow, accommodating every feature dreamed up by people who don't have to implement or live with the result, because each feature is so important that some day it'd be disastrous not to have it.

    jm> There isn't a need to explain the feature to the user? That's
    jm> one of the most irresponsible responses I've heard lately.

It's fine if you disagree, but the disastrous tone makes no sense. Other filesystems and RAID layers consume similar amounts of space for metadata, labels, bitmaps, whatever. The suggestion is neither surprising nor harmful, especially compared to the current behavior.

Anyway, probably none of it matters because of the IDEMA sizes, and the rewrite/evacuation feature that will hopefully be done a couple years from now.
Re: [zfs-discuss] replace same sized disk fails with too small error
On Tue, Jan 20, 2009 at 2:26 PM, Moore, Joe joe.mo...@siemens.com wrote:
> Other storage vendors have specific compatibility requirements for the
> disks you are allowed to install in their chassis.

And again, the reason for those requirements is 99% about making money, not a technical one. If you go back far enough in time, nearly all of them at some point allowed non-approved disks into the system, or there was firmware available to flash unsupported drives to make them work. Heck, if you knew the right people you could still do that today...

> There isn't a need to explain the feature to the user? That's one of the
> most irresponsible responses I've heard lately. A user is expecting their
> 500GB disk to be 5*10^11 bytes, not 4.9995*10^11 bytes, unless that
> feature is explained.

The user DEFINITELY isn't expecting 5*10^11 bytes, or what you meant to say 500*10^9 bytes, they're expecting 500GB. You know, 536,870,912,000 bytes. But even if the drive mfgs calculated it correctly, they wouldn't even be getting that due to filesystem overhead. Funny, I haven't seen any posts to the list from you demanding that Sun release exact specifications for how much overhead is lost to metadata, snapshots, and filesystem structure...

> Parameters with reasonable defaults (and a reasonable way to change them)
> allow users who care about the parameter, and who understand the
> tradeoffs involved in changing from the default, to make their system
> work better. If I didn't want to be able to tune my system for
> performance, I would be running Windows. OpenSolaris is about
> transparency, not just Open Source.

If you fill the disks 100% full, you won't need to worry about performance. In fact, I would wager that if the only space you have left on the device is the amount you lost to right-sizing, the pool will have already toppled over and died. Although I do agree with you: being able to change from the default behavior is, in general, a good idea. Agreeing on what that default behavior should be is probably another issue entirely ;) I would imagine this could be something set perhaps with a flag in bootenv.rc (or wherever deemed appropriate).
Re: [zfs-discuss] replace same sized disk fails with too small error
> The user DEFINITELY isn't expecting 5*10^11 bytes, or what you meant to
> say 500*10^9 bytes, they're expecting 500GB. You know, 536,870,912,000
> bytes. But even if the drive mfgs calculated it correctly, they wouldn't
> even be getting that due to filesystem overhead.

I doubt there are any users left in the world that would expect that -- the drive manufacturers have made it clear for the past 20 years that 500 GB = 500*10^9, not 500*2^30. Even the OS vendors have finally (for the most part) started displaying GB instead of GiB.

> And again, the reason for [certified devices] is 99% about making money,
> not a technical one.

Yes and no. From my experience at three storage vendors, it *is* about making money (aren't all corporate decisions supposed to be?) but it's less about making money by selling overpriced drives than by not *losing* money trying to support hardware that doesn't quite work. It's a dirty little secret of the drive/controller/array industry (and networking, for that matter) that two arbitrary pieces of hardware which are supposed to conform to a standard will usually, mostly, work together -- but not always, and when they fail, it's very difficult to track down (usually impossible in a customer environment). By limiting which drives, controllers, firmware revisions, etc. are supported, we reduce the support burden immensely and are able to ensure that we can actually test what a customer is using.

A few specific examples I've seen personally:

* SCSI drives with caches that would corrupt data if the mode pages were set wrong.
* SATA adapters which couldn't always complete commands simultaneously on multiple channels (leading to timeouts or I/O errors).
* SATA controllers which couldn't quite deal with timing at one edge of the spec ... and drives which pushed the timing to that edge under the right conditions.
* Drive firmware which silently dropped commands when the queue depth got too large.

All of these would 'mostly work', especially in desktop use (few outstanding commands, no changes to default parameters, no use of task control messages), but would fail in other environments in ways that were almost impossible to track down without specialized hardware.

When I was in a software-only RAID company, we did support nearly arbitrary hardware -- but we had a compatibility list of what we'd tested, and for everything else the users were pretty much on their own. That's OK for home users, but for critical data, the greatly increased risk is not worth saving a few thousand (or even tens of thousands of) dollars.
Re: [zfs-discuss] replace same sized disk fails with too small error
So you're suggesting I buy 750s to replace the 500s? Then if a 750 fails, buy another bigger drive again?

The drives are RMA replacements for the other disks that faulted in the array before. They are the same brand, model and model number; apparently not so under the label, though, but there was no way I could tell that before.
Re: [zfs-discuss] replace same sized disk fails with too small error
Yes, it's the same make and model as most of the other disks in the zpool, and it reports the same number of sectors.
Re: [zfs-discuss] replace same sized disk fails with too small error
The problem is they might publish these numbers, but we really have no way of controlling what number manufacturers will choose to use in the future. If for some reason future 500GB drives all turn out to be slightly smaller than the current ones, you're going to be stuck. Reserving 1-2% of space in exchange for greater flexibility in replacing drives sounds like a good idea to me. As others have said, RAID controllers have been doing this for long enough that even the very basic models do it now, and I don't understand why such a simple feature would be left out of ZFS.

Fair enough, for high-end enterprise kit where you want to squeeze every byte out of the system (and know you'll be buying Sun drives) you might not want this, but it would have been trivial to make it possible to turn off for kit like that. It's certainly a lot easier to expand a pool than shrink it!
Re: [zfs-discuss] replace same sized disk fails with too small error
I'm going waaay out on a limb here, as a non-programmer... but since the source is open, maybe community members should organize and work on some sort of sizing algorithm? I can certainly imagine Sun deciding to do this in the future - I can also imagine that it's not at the top of Sun's priority list (most of the devices they deal with are their own, and perhaps not subject to the right-sizing issue). If it matters to the community, why not, as a community, try to fix/improve ZFS in this way? Again, I've not even looked at the code for block allocation, or whatever it might be called in this case, so I could be *way* off here :)

Lastly, Antonius, you can try the zpool trick to get this disk relabeled, I think: 'zpool create temp_pool [problem_disk]', then 'zpool destroy temp_pool' - this should relabel the disk in question and set up the defaults that ZFS uses (see the sketch below). Can you also run format -> partition -> print on one of the existing disks and send the output, so that we can see what the existing disk looks like? (Off-list directly to me if you prefer.)

cheers, Blake
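Spelled out, with the thread's device names standing in (and hedged: -f may be needed if a stale label is present):

  zpool create -f temp_pool c3d0   # writes a fresh EFI label and ZFS defaults
  zpool destroy temp_pool          # releases the disk; the new label stays
  format -e c4d0                   # then: partition -> print, on an existing member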
Re: [zfs-discuss] replace same sized disk fails with too small error
Ross wrote:
> [snip]

I have added the following text to the best practices guide:

* When a vdev is replaced, the size of the replacement vdev, measured by usable sectors, must be the same or greater than the vdev being replaced. This can be confusing when whole disks are used, because different models of disk may provide a different number of usable sectors. For example, if a pool was created with a 500 GByte drive and you need to replace it with another 500 GByte drive, then you may not be able to do so if the drives are not of the same make, model, and firmware revision. Consider planning ahead and reserving some space by creating a slice which is smaller than the whole disk instead of using the whole disk.

> Fair enough, for high-end enterprise kit where you want to squeeze every
> byte out of the system (and know you'll be buying Sun drives) you might
> not want this, but it would have been trivial to make it possible to turn
> off for kit like that. It's certainly a lot easier to expand a pool than
> shrink it!

Actually, enterprise customers do not ever want to squeeze every byte; they would rather have enough margin to avoid such issues entirely. This is what I was referring to earlier in this thread wrt planning.
-- richard
Re: [zfs-discuss] replace same sized disk fails with too small error
Richard,

> * When a vdev is replaced, the size of the replacement vdev, measured by
> usable sectors, must be the same or greater than the vdev being replaced.
> [snip] Consider planning ahead and reserving some space by creating a
> slice which is smaller than the whole disk instead of using the whole
> disk.

Creating a slice, instead of using the whole disk, will cause ZFS to not enable write-caching on the underlying device.

- Jim
Re: [zfs-discuss] replace same sized disk fails with too small error
> Since it's done in software by HDS, NetApp, and EMC, that's complete
> bullshit. Forcing people to spend 3x the money for a Sun drive that's
> identical to the seagate OEM version is also bullshit and a piss-poor
> answer.

I didn't know that HDS, NetApp, and EMC all allow users to replace their drives with stuff they've bought at Fry's. Is this still covered by their service plan, or would this only be in an unsupported config? Thanks.

Adam
-- Adam Leventhal, Fishworks http://blogs.sun.com/ahl
Re: [zfs-discuss] replace same sized disk fails with too small error
Jim Dunham wrote:
> Creating a slice, instead of using the whole disk, will cause ZFS to not
> enable write-caching on the underlying device.

Correct. Engineering trade-off. Since most folks don't read the manual, or the best practices guide, until after they've hit a problem, it is really just a CYA entry :-(

BTW, I also added a quick link to CR 4852783, reduce pool capacity, which is the feature which has a good chance of making this point moot.
-- richard
Re: [zfs-discuss] replace same sized disk fails with too small error
>>>>> "edm" == Eric D Mudama <edmud...@bounceswoosh.org> writes:

    edm> If, instead of having ZFS manage these differences, a user
    edm> simply created slices that were, say, 98%

if you're willing to manually create slices, you should be able to manually enable the write cache, too, while you're in there, so I wouldn't worry about that.

I'd worry a little about the confusion over this write cache bit in general---where the write cache setting is stored, when it's enabled and when (if?) it's disabled, whether the rules differ on each type of disk attachment, and whether, if you plug the disk into Linux, Linux will screw up the setting by auto-enabling at boot or auto-disabling at shutdown, or whether Linux uses stateless versions (analogous to sdparm without --save) when it prints that boot-time message about enabling write caches.

For example weirdness, on iSCSI I get this, on a disk to which I've let ZFS write a GPT/EFI label:

  write_cache display
  Write Cache is disabled
  write_cache enable
  Write cache setting is not changeable

so is that a bug of my iSCSI target, and is there another implicit write cache inside the iSCSI initiator or not? The Linux hdparm man page says:

  -W  Disable/enable the IDE drive's write-caching feature (default
      state is undeterminable; manufacturer/model specific).

so is the write_cache 'display' feature in 'format -e' actually reliable? Or is it impossible to reliably read this setting on an ATA drive, and 'format -e' is making stuff up? With Linux I can get all kinds of crazy caching data from a SATA disk:

  r...@node0 ~ # sdparm --page=ca --long /dev/sda
  /dev/sda: ATA  WDC WD1000FYPS-0  02.0
  Caching (SBC) [PS=0] mode page:
    IC     0  Initiator control
    ABPF   0  Abort pre-fetch
    CAP    0  Caching analysis permitted
    DISC   0  Discontinuity
    SIZE   0  Size (1-CSS valid, 0-NCS valid)
    WCE    1  Write cache enable
    MF     0  Multiplication factor
    RCD    0  Read cache disable
    DRRP   0  Demand read retention priority
    WRP    0  Write retention priority
    DPTL   0  Disable pre-fetch transfer length
    MIPF   0  Minimum pre-fetch
    MAPF   0  Maximum pre-fetch
    MAPFC  0  Maximum pre-fetch ceiling
    FSW    0  Force sequential write
    LBCSS  0  Logical block cache segment size
    DRA    0  Disable read ahead
    NV_DIS 0  Non-volatile cache disable
    NCS    0  Number of cache segments
    CSS    0  Cache segment size

but what's actually coming from the drive, and what's fabricated by the SCSI-to-SATA translator built into Garzik's libata? Because I think Solaris has such a translator, too, if it's attaching sd to SATA disks. I'm guessing it's all a fantasy because:

  r...@node0 ~ # sdparm --clear=WCE /dev/sda
  /dev/sda: ATA  WDC WD1000FYPS-0  02.0
  change_mode_page: failed setting page: Caching (SBC)

but neverminding the write cache, I'd be happy saying ``just round down disk sizes using the labeling tool instead of giving ZFS the whole disk, if you care,'' IF the following things were true:

 * doing so were written up as a best practice - because I think it's a best practice if the rest of the storage industry from EMC to $15 Promise cards is doing it, though maybe it's not important any more because of IDEMA. And right now very few people are likely to have done it, because of the way they've been guided into the setup process.

 * it were possible to do this label-sizing to bootable mirrors in the various traditional/IPS/flar/jumpstart installers

 * there weren't a proliferation of >= 4 labeling tools in Solaris, each riddled with assertion bailouts and slightly different capabilities. Linux also has a mess of labeling tools, but they're less assertion-riddled, and usually you can pick one and use it for everything---you don't have to drag out a different tool for USB sticks because they're considered ``removeable.'' Also it's always possible to write to the unpartitioned block device with 'dd' on Linux (and FreeBSD and Mac OS X), no matter what label is on the disk, while Solaris doesn't seem to have an unpartitioned device. And finally, the Linux formatting tools work by writing to this unpartitioned device, not by calling into a rat's nest of ioctls, so they're much easier for me to get along with.

Part of the attraction of ZFS should be avoiding this messy part of Solaris, but we still have to use format/fmthard/fdisk/rmformat: to swap label types because ZFS won't, to frob the write cache because ZFS's user interface is too simple and does that semi-automatically (though I'm not sure of all the rules it's using), to enumerate the installed disks, and to determine in which of the several states (working / connected-but-not-identified / disconnected / disconnected-but-refcounted) the iSCSI initiator is. And while ZFS will do special things to an UNlabeled disk, I'm not ...
Re: [zfs-discuss] replace same sized disk fails with too small error
On Mon, Jan 19, 2009 at 11:05 AM, Adam Leventhal a...@eng.sun.com wrote:
>> Since it's done in software by HDS, NetApp, and EMC, that's complete
>> bullshit. Forcing people to spend 3x the money for a Sun drive that's
>> identical to the seagate OEM version is also bullshit and a piss-poor
>> answer.
>
> I didn't know that HDS, NetApp, and EMC all allow users to replace their
> drives with stuff they've bought at Fry's. Is this still covered by their
> service plan or would this only be in an unsupported config?

So because an enterprise vendor requires you to use their drives in their array, suddenly zfs can't right-size? Vendor requirements have absolutely nothing to do with their right-sizing, and everything to do with them wanting your money. Are you telling me zfs is deficient to the point it can't handle basic right-sizing like a $15 SATA RAID adapter?

--Tim
Re: [zfs-discuss] replace same sized disk fails with too small error
>> Creating a slice, instead of using the whole disk, will cause ZFS to not
>> enable write-caching on the underlying device.
>
> Correct. Engineering trade-off. Since most folks don't read the manual,
> or the best practices guide, until after they've hit a problem, it is
> really just a CYA entry :-(

It seems this trade-off can now be mitigated, per Roch Bourbonnais' comment on another thread on this list:

- http://mail.opensolaris.org/pipermail/zfs-discuss/2009-January/054587.html

In particular:

> If ZFS owns a disk it will enable the write cache on the drive, but I'm
> not positive this has a great performance impact today. It used to, but
> that was before we had a proper NCQ implementation. Today I don't know
> that it helps much. This is because we always flush the cache when
> consistency requires it.

-- julien. http://blog.thilelli.net/
Re: [zfs-discuss] replace same sized disk fails with too small error
> So because an enterprise vendor requires you to use their drives in their
> array, suddenly zfs can't right-size? Vendor requirements have absolutely
> nothing to do with their right-sizing, and everything to do with them
> wanting your money.

Sorry, I must have missed your point. I thought that you were saying that HDS, NetApp, and EMC had a different model. Were you merely saying that the software in those vendors' products operates differently than ZFS?

> Are you telling me zfs is deficient to the point it can't handle basic
> right-sizing like a $15 SATA RAID adapter?

How do these $15 SATA RAID adapters solve the problem? The more details you could provide, the better, obviously.

Adam
-- Adam Leventhal, Fishworks http://blogs.sun.com/ahl
Re: [zfs-discuss] replace same sized disk fails with too small error
On Mon, 19 Jan 2009, Adam Leventhal wrote:
>> Are you telling me zfs is deficient to the point it can't handle basic
>> right-sizing like a $15 SATA RAID adapter?
>
> How do these $15 SATA RAID adapters solve the problem? The more details
> you could provide, the better, obviously.

It is really quite simple. If the disk is resilvered but the new drive is a bit too small, then the RAID card might tell you that a bit of data might have been lost in the last sectors, or it may just assume that you didn't need that data, or maybe a bit of cryptic message text scrolls off the screen a split second after it has been issued. Or if you try to write at the end of the volume and one of the replacement drives is a bit too short, then the RAID card may return a hard read or write error.

Most filesystems won't try to use that last bit of space anyway, since they run real slow when the disk is completely full, or their flimsy formatting algorithm always wastes a bit of the end of the disk. Only ZFS is rash enough to use all of the space provided to it, and actually expect that the space continues to be usable.

Bob
== Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] replace same sized disk fails with too small error
On Mon, Jan 19, 2009 at 12:39 PM, Adam Leventhal a...@eng.sun.com wrote:
> Sorry, I must have missed your point. I thought that you were saying that
> HDS, NetApp, and EMC had a different model. Were you merely saying that
> the software in those vendors' products operates differently than ZFS?

Gosh, was the point that hard to get? Let me state it a fourth time: they all short-stroke the disks to avoid the CF that results from drives not adhering to a strict sizing standard.

> How do these $15 SATA RAID adapters solve the problem? The more details
> you could provide, the better, obviously.

They short-stroke the disk so that when you buy a new 500GB drive that isn't the exact same number of blocks, you aren't screwed. It's a design choice to be both sane and to make the end-user's life easier. You know, sort of like you not letting people choose their raid layout...

--Tim
Re: [zfs-discuss] replace same sized disk fails with too small error
On Mon, Jan 19, 2009 at 1:12 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:
> It is really quite simple. If the disk is resilvered but the new drive is
> a bit too small, then the RAID card might tell you that a bit of data
> might have been lost in the last sectors [snip] Only ZFS is rash enough
> to use all of the space provided to it, and actually expect that the
> space continues to be usable.

It's a horribly *bad thing* to not use the entire disk and right-size it for sanity's sake. That's why Sun currently sells arrays that do JUST THAT. I'd wager fishworks does just that as well. Why don't you open source that code and prove me wrong ;)

I'm wondering why they don't come right out with it and say we want to intentionally make this painful to our end users so that they buy our packaged products. It'd be far more honest and productive than this pissing match.

--Tim
Re: [zfs-discuss] replace same sized disk fails with too small error
On Mon, Jan 19, 2009 at 01:35:22PM -0600, Tim wrote:
> They short-stroke the disk so that when you buy a new 500GB drive that
> isn't the exact same number of blocks, you aren't screwed. It's a design
> choice to be both sane and to make the end-user's life easier.

Drive vendors, it would seem, have an incentive to make their 500GB drives as small as possible. Should ZFS then choose some amount of padding at the end of each device and chop it off as insurance against a slightly smaller drive? How much of the device should it chop off? Conversely, should users have the option to use the full extent of the drives they've paid for, say, if they're using a vendor that already provides that guarantee?

> You know, sort of like you not letting people choose their raid layout...

Yes, I'm not saying it shouldn't be done. I'm asking what the right answer might be.

Adam
-- Adam Leventhal, Fishworks http://blogs.sun.com/ahl
Re: [zfs-discuss] replace same sized disk fails with too small error
Tim wrote:
> On Mon, Jan 19, 2009 at 1:12 PM, Bob Friesenhahn wrote:
>> [snip]

Note that for the LSI RAID controllers Sun uses on many products, if you take a disk that was JBOD and tell the controller to make it RAIDed, the controller will relabel the disk for you and cause you to lose the data. As best I can tell, ZFS is better in that it will protect your data rather than just relabeling and clobbering it. AFAIK, NVidia and others do likewise.

> It's a horribly *bad thing* to not use the entire disk and right-size it
> for sanity's sake. That's why Sun currently sells arrays that do JUST
> THAT.

??

> I'd wager fishworks does just that as well. Why don't you open source
> that code and prove me wrong ;)

I don't think so, because Fishworks is an engineering team, and I don't think I can reserve space on a person... at least not legally where I live :-) But this is not a problem for the Sun Storage 7000 systems, because the supported disks are already right-sized.

> I'm wondering why they don't come right out with it and say we want to
> intentionally make this painful to our end users so that they buy our
> packaged products. It'd be far more honest and productive than this
> pissing match.

I think that if there is enough real desire for this feature, then someone should file an RFE on http://bugs.opensolaris.org. It would help to attach diffs to the bug, and it would help to reach a consensus on the amount of space to be reserved prior to filing. This is not an intractable problem and easy workarounds already exist, but if ease of use is more valuable than squeezing every last block, then the RFE should fly.
-- richard
Re: [zfs-discuss] replace same sized disk fails with too small error
On Mon, Jan 19, 2009 at 2:55 PM, Adam Leventhal a...@eng.sun.com wrote:
> Drive vendors, it would seem, have an incentive to make their 500GB
> drives as small as possible. Should ZFS then choose some amount of
> padding at the end of each device and chop it off as insurance against a
> slightly smaller drive? How much of the device should it chop off?
> Conversely, should users have the option to use the full extent of the
> drives they've paid for, say, if they're using a vendor that already
> provides that guarantee?

Drive vendors, it would seem, have an incentive to make their 500GB drives as cheap as possible. The two are not necessarily one and the same. And again, I say take a look at the market today, figure out a percentage, and call it done. I don't think you'll find a lot of users crying foul over losing 1% of their drive space when they don't already cry foul over the false advertising that is drive sizes today. In any case, you might as well can ZFS entirely, because it's not really fair that users are losing disk space to raid and metadata... see where this argument is going?

I really, REALLY doubt you're going to have users screaming at you for losing 1% (or whatever the figure ends up being) to a right-sizing algorithm. In fact, I would bet the average user will NEVER notice if you don't tell them ahead of time. Sort of like the average user had absolutely no clue that 500GB drives were of slightly differing block numbers, and that he'd end up screwed six months down the road if he couldn't source an identical drive.

I have two disks in one of my systems... both maxtor 500GB drives, purchased at the same time shortly after the buyout. One is a rebadged Seagate, one is a true, made-in-China Maxtor. Different block numbers... same model drive, purchased at the same time. Wasn't zfs supposed to be about using software to make up for deficiencies in hardware? It would seem this request is exactly that...

> You know, sort of like you not letting people choose their raid layout...
>
> Yes, I'm not saying it shouldn't be done. I'm asking what the right
> answer might be.

The *right answer* in simplifying storage is not to manually slice up every disk you insert into the system to avoid this issue. The right answer is to right-size by default and give admins the option to skip it if they really want. Sort of like I'd argue the right answer on the 7000 is to give users the raid options you do today by default, and allow them to lay it out themselves from some sort of advanced *at your own risk* mode, whether that be command line (the best place, I'd argue) or something else.

--Tim
Re: [zfs-discuss] replace same sized disk fails with too small error
> And again, I say take a look at the market today, figure out a
> percentage, and call it done. I don't think you'll find a lot of users
> crying foul over losing 1% of their drive space when they don't already
> cry foul over the false advertising that is drive sizes today.

Perhaps it's quaint, but 5GB still seems like a lot to me to throw away.

> In any case, you might as well can ZFS entirely, because it's not really
> fair that users are losing disk space to raid and metadata... see where
> this argument is going?

Well, I see where this _specious_ argument is going.

> I have two disks in one of my systems... both maxtor 500GB drives,
> purchased at the same time shortly after the buyout. One is a rebadged
> Seagate, one is a true, made-in-China Maxtor. Different block numbers...
> same model drive, purchased at the same time. Wasn't zfs supposed to be
> about using software to make up for deficiencies in hardware? It would
> seem this request is exactly that...

That's a fair point, and I do encourage you to file an RFE, but a) Sun has already solved this problem in a different way as a company with our products, and b) users already have the ability to right-size drives.

Perhaps a better solution would be to handle the procedure of replacing a disk with a slightly smaller one by migrating data and then treating the extant disks as slightly smaller as well. This would have the advantage of being far more dynamic and of only applying the space tax in situations where it actually applies.

Adam
-- Adam Leventhal, Fishworks http://blogs.sun.com/ahl
Re: [zfs-discuss] replace same sized disk fails with too small error
So the place we are arriving at is to push the RFE for shrinkable pools? Warning the user about the difference in actual drive size, then offering to shrink the pool to allow a smaller device, seems like a nice solution to this problem.

The ability to shrink pools might be very useful in other situations. Say I built a server that once did a decent amount of iops using SATA disks, and now that the workload's iops demand has greatly increased (busy database?), I need SAS disks. If I'd originally bought 500gb SATA disks (the current sweet spot), I might have a lot of empty space in my pool. Shrinking the pool would allow me to migrate to smaller (capacity) SAS disks with much better seek times, without being forced to buy 2x as many disks due to the higher cost/gb of SAS.

I think I remember an RFE for shrinkable pools, but can't find it - can someone post a link if they know where it is?

cheers, Blake
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sat, 17 Jan 2009 23:18:35 PST Antonius antoni...@gmail.com wrote: Maybe the other disk has an EFI label? -- Dick Hoogendijk -- PGP/GnuPG key: 01D2433D + http://nagual.nl/ | SunOS sxce snv105 ++ + All that's really worth doing is what we do for others (Lewis Carroll) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
So you're saying zfs does absolutely no right-sizing? That sounds like a bad idea all around... You can use a bigger disk; NOT a smaller disk. Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
If so, what should I do to remedy that? Just reformat it? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
meh - Original Message - From: Antonius antoni...@gmail.com To: zfs-discuss@opensolaris.org Sent: Sunday, January 18, 2009 6:54 AM Subject: Re: [zfs-discuss] replace same sized disk fails with too small error If so, what should I do to remedy that? Just reformat it? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, Jan 18, 2009 at 5:18 AM, casper@sun.com wrote: So you're saying zfs does absolutely no right-sizing? That sounds like a bad idea all around... You can use a bigger disk; NOT a smaller disk. Casper Right, which is an absolutely piss poor design decision and why every major storage vendor right-sizes drives. What happens if I have an old maxtor drive in my pool whose 500g is just slightly larger than every other mfg on the market? You know, the one who is no longer making their own drives since being purchased by seagate. I can't replace the drive anymore? *GREAT*. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
Right, which is an absolutely piss poor design decision and why every major storage vendor right-sizes drives. What happens if I have an old maxtor drive in my pool whose 500g is just slightly larger than every other mfg on the market? You know, the one who is no longer making their own drives since being purchased by seagate. I can't replace the drive anymore? *GREAT*. Sun does right-size our drives. Are we talking about replacing a device bought from Sun with another device bought from Sun? If these are just drives that fell off the back of some truck, you may not have that assurance. Adam -- Adam Leventhal, Fishworks http://blogs.sun.com/ahl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
Right, which is an absolutely piss poor design decision and why every major storage vendor right-sizes drives. What happens if I have an old maxtor drive in my pool whose 500g is just slightly larger than every other mfg on the market? You know, the one who is no longer making their own drives since being purchased by seagate. I can't replace the drive anymore? *GREAT*. With a larger drive. Who can replace drives with smaller drives? What exactly does right size drives mean? They don't use all of the disk? Casper ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, 18 Jan 2009, Tim wrote: Right, which is an absolutely piss poor design decision and why every major storage vendor right-sizes drives. What happens if I have an old maxtor drive in my pool whose 500g is just slightly larger than every other mfg on the market? You know, the one who is no longer making their own drives since being purchased by seagate. I can't replace the drive anymore? *GREAT*. I appreciate that in these times of financial hardship you cannot afford a 750GB drive to replace the oversized 500GB drive. Sorry to hear about your situation. Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, Jan 18, 2009 at 16:51, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: I appreciate that in these times of financial hardship you cannot afford a 750GB drive to replace the oversized 500GB drive. Sorry to hear about your situation. That's easy to say, but what if there were no larger alternative? Suppose I have a pool composed of those 1.5TB Seagate disks, and Hitachi puts out some drives of the same nominal capacity that are actually slightly smaller. A drive fails in my array, I buy a Hitachi disk to replace it, and it doesn't work. If I can't get a large enough drive to replace the missing disk with, it'd be a shame to have to destroy and recreate the pool on smaller media. Perhaps this is yet another problem that can be solved with BP rewrite. If zpool replace detects that a disk is slightly smaller but not so small that it can't hold all the data, warn the user first but then allow them to replace the disk anyway. Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, Jan 18, 2009 at 10:17 AM, casper@sun.com wrote: Right, which is an absolutely piss poor design decision and why every major storage vendor right-sizes drives. What happens if I have an old maxtor drive in my pool whose 500g is just slightly larger than every other mfg on the market? You know, the one who is no longer making their own drives since being purchased by seagate. I can't replace the drive anymore? *GREAT*. With a larger drive. Who can replace drives with smaller drives? What exactly does right size drives mean? They don't use all of the disk? Casper Right-sizing is when the volume manager intentionally short-strokes the drive, because not every vendor's 500GB is the same size. Hence the OP's problem. How aggressive the short-stroke is depends on the OEM. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, 18 Jan 2009, Will Murnane wrote: That's easy to say, but what if there were no larger alternative? Suppose I have a pool composed of those 1.5TB Seagate disks, and Hitachi puts out some drives of the same nominal capacity that are actually slightly smaller. A drive fails in my array, I buy a Hitachi disk to replace it, and it doesn't work. If I can't get a large enough drive to replace the missing disk with, it'd be a shame to have to destroy and recreate the pool on smaller media. What do you propose that OpenSolaris should do about this? Should OpenSolaris use some sort of a table of common size drives, or use an algorithm which determines certain discrete usage values based on declared drive sizes and a margin for error? What should OpenSolaris of today do with the 20TB disk drives of tomorrow? What should the margin for error of a 30TB disk drive be? Is it ok to arbitrarily ignore 3/4TB of storage space? If the drive is actually a huge 20TB LUN exported from a SAN RAID array, how should the margin for error be handled in that case? Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, Jan 18, 2009 at 10:16 AM, Adam Leventhal a...@eng.sun.com wrote: Right, which is an absolutely piss poor design decision and why every major storage vendor right-sizes drives. What happens if I have an old maxtor drive in my pool whose 500g is just slightly larger than every other mfg on the market? You know, the one who is no longer making their own drives since being purchased by seagate. I can't replace the drive anymore? *GREAT*. Sun does right size our drives. Are we talking about replacing a device bought from sun with another device bought from Sun? If these are just drives that fell off the back of some truck, you may not have that assurance. Adam -- Adam Leventhal, Fishworkshttp://blogs.sun.com/ahl Since it's done in software by HDS, NetApp, and EMC, that's complete bullshit. Forcing people to spend 3x the money for a Sun drive that's identical to the seagate OEM version is also bullshit and a piss-poor answer. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, Jan 18, 2009 at 12:19 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Sun, 18 Jan 2009, Will Murnane wrote: That's easy to say, but what if there were no larger alternative? Suppose I have a pool composed of those 1.5TB Seagate disks, and Hitachi puts out some drives of the same nominal capacity that are actually slightly smaller. A drive fails in my array, I buy a Hitachi disk to replace it, and it doesn't work. If I can't get a large enough drive to replace the missing disk with, it'd be a shame to have to destroy and recreate the pool on smaller media. What do you propose that OpenSolaris should do about this? Should OpenSolaris use some sort of a table of common size drives, or use an algorithm which determines certain discrete usage values based on declared drive sizes and a margin for error? What should OpenSolaris of today do with the 20TB disk drives of tomorrow? What should the margin for error of a 30TB disk drive be? Is it ok to arbitrarily ignore 3/4TB of storage space? If the drive is actually a huge 20TB LUN exported from a SAN RAID array, how should the margin for error be handled in that case? Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ Take a look at drives on the market, figure out a percentage, and call it a day. If there's a significant issue with 20TB drives of the future, issue a bug report and a fix, just like every other issue that comes up. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
Does this all go away when BP-rewrite gets fully resolved/implemented? Short of the pool being 100% full, it should allow a rebalancing operation and a possible LUN/device-size shrink to match the new device that is being inserted? Thanks, -- MikeE -Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bob Friesenhahn Sent: Sunday, January 18, 2009 1:19 PM To: Will Murnane Cc: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] replace same sized disk fails with too small error On Sun, 18 Jan 2009, Will Murnane wrote: That's easy to say, but what if there were no larger alternative? Suppose I have a pool composed of those 1.5TB Seagate disks, and Hitachi puts out some drives of the same nominal capacity that are actually slightly smaller. A drive fails in my array, I buy a Hitachi disk to replace it, and it doesn't work. If I can't get a large enough drive to replace the missing disk with, it'd be a shame to have to destroy and recreate the pool on smaller media. What do you propose that OpenSolaris should do about this? Should OpenSolaris use some sort of a table of common size drives, or use an algorithm which determines certain discrete usage values based on declared drive sizes and a margin for error? What should OpenSolaris of today do with the 20TB disk drives of tomorrow? What should the margin for error of a 30TB disk drive be? Is it ok to arbitrarily ignore 3/4TB of storage space? If the drive is actually a huge 20TB LUN exported from a SAN RAID array, how should the margin for error be handled in that case? Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, Jan 18, 2009 at 18:19, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: What do you propose that OpenSolaris should do about this? Take the drive size, divide by 100, round down to two significant digits, and scale back up; that floors the size to two significant digits. This method wastes no more than 1% of the disk space, and gives a reasonable (I think) number. For example: I have a machine with a 250GB disk that is 251000193024 bytes long.

    $ python
    >>> n = str(251000193024 // 100)
    >>> int(n[:2] + "0" * (len(n) - 2)) * 100
    250000000000L

So treat this volume as being 250 billion bytes long, exactly. Most drives are sold with two significant digits in the size: 320 GB, 400 GB, 640 GB, 1.0 TB, etc. I don't see this changing any time particularly soon; unless someone starts selling a 1.25 TB drive or something, two digits will suffice. Even then, this formula would give you 96% (1.2/1.25) of the disk's capacity. Note that this method also works for small-capacity disks: suppose I have a disk that's exactly 250 billion bytes long. This formula will produce 250 billion as the size it is to be treated as. Thus, replacing my 251 billion byte disk with a 250 billion byte one will not be a problem. Is it ok to arbitrarily ignore 3/4TB of storage space? If it's less than 1% of the disk space, I don't see a problem doing so. If the drive is actually a huge 20TB LUN exported from a SAN RAID array, how should the margin for error be handled in that case? So make it configurable if you must. If no partition table exists when zpool create is called, make it right-size the disks, but if a pre-existing EFI label is there, use it instead. Or make a flag that tells zpool create not to right-size. Will ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
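Will's recipe, spelled out as a self-contained function (the function name is mine, not from the thread):

    def two_digit_floor(nbytes):
        # Keep the two leading decimal digits of nbytes//100, zero the
        # rest, and scale back up: nbytes floored to two significant
        # digits, wasting less than 1% of the raw capacity.
        n = str(nbytes // 100)
        return int(n[:2] + "0" * (len(n) - 2)) * 100

    print(two_digit_floor(251000193024))  # 250000000000
    print(two_digit_floor(250000000000))  # 250000000000, already a floor

As Will notes, the second call shows why a disk of exactly 250 billion bytes could replace the 251-billion-byte one: both are treated as the same size.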
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, 18 Jan 2009, Will Murnane wrote: Most drives are sold with two significant digits in the size: 320 GB, 400 GB, 640 GB, 1.0 TB, etc. I don't see this changing any time particularly soon; unless someone starts selling a 1.25 TB drive or something, two digits will suffice. Even then, this formula would give you 96% (1.2/1.25) of the disk's capacity. If the drive is attached to a RAID controller which steals part of its capacity for its own purposes, how will you handle that? These stated drive sizes are just marketing terms and do not have a sound technical basis. Don't drive vendors provide actual sizing information in their specification sheets so that knowledgeable people can purchase the right-sized drive? Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, Jan 18, 2009 at 1:30 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Sun, 18 Jan 2009, Will Murnane wrote: Most drives are sold with two significant digits in the size: 320 GB, 400 GB, 640 GB, 1.0 TB, etc. I don't see this changing any time particularly soon; unless someone starts selling a 1.25 TB drive or something, two digits will suffice. Even then, this formula would give you 96% (1.2/1.25) of the disk's capacity. If the drive is attached to a RAID controller which steals part of its capacity for its own purposes, how will you handle that? These stated drive sizes are just marketing terms and do not have a sound technical basis. Don't drive vendors provide actual sizing information in their specification sheets so that knowledgeable people can purchase the right-sized drive? Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ You look at the size of the drive and you take a set percentage off... If it's a LUN and it's so far off it still can't be added with the percentage that works across the board for EVERYTHING ELSE, you change the size of the LUN at the storage array or adapter. I know it's fun to pretend this is rocket science and impossible, but the fact remains the rest of the industry has managed to make it work. I have a REAL tough time believing that Sun and/or zfs is so deficient it's an insurmountable obstacle for them. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, Jan 18 at 13:43, Tim wrote: You look at the size of the drive and you take a set percentage off... If it's a LUN and it's so far off it still can't be added with the percentage that works across the board for EVERYTHING ELSE, you change the size of the LUN at the storage array or adapter. I know it's fun to pretend this is rocket science and impossible, but the fact remains the rest of the industry has managed to make it work. I have a REAL tough time believing that Sun and/or zfs is so deficient it's an insurmountable obstacle for them. If, instead of having ZFS manage these differences, a user simply created slices that were, say, 98% as big as the average number of sectors in a XXX GB drive... would ZFS enable write cache on that device or not? I thought I'd read that ZFS didn't use write cache on slices because it couldn't guarantee that the other slices were used in a write-cache-safe fashion, would that apply to cases where no other slices were allocated? -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
Hi Bob, Will, Tim, I also had some off-list comments on my irrelevant comments. So I will try to make this post less irrelevant, though my thoughts on this topic may be off the list discussion line of thoughts, as usual. From the major storage vendors I know, network storage systems, as integrated products, are only offered with the same size/type of drives in a traditional RAID set (not the V-RAID style). Mixing different drives in a traditional RAID set is not recommended by many vendors, and I think taking that as a policy will cut out much of the trouble of trying to mix different drives in a RAID set. And folks, the last time I really got into the largest database (by Winter) data sets, their sizes were not really as huge as I thought. http://www.wintercorp.com/VLDB/2005_TopTen_Survey/TopTenProgram.html Again, I think the exponential data growth we have been talking about for a few years is more in file data. Databases use block storage, which is very efficient on capacity. The kind of drives is not as important as how you use those drives. IMHO. Best, z - Original Message - From: Bob Friesenhahn bfrie...@simple.dallas.tx.us To: Will Murnane will.murn...@gmail.com Cc: zfs-discuss@opensolaris.org Sent: Sunday, January 18, 2009 2:30 PM Subject: Re: [zfs-discuss] replace same sized disk fails with too small error On Sun, 18 Jan 2009, Will Murnane wrote: Most drives are sold with two significant digits in the size: 320 GB, 400 GB, 640 GB, 1.0 TB, etc. I don't see this changing any time particularly soon; unless someone starts selling a 1.25 TB drive or something, two digits will suffice. Even then, this formula would give you 96% (1.2/1.25) of the disk's capacity. If the drive is attached to a RAID controller which steals part of its capacity for its own purposes, how will you handle that? These stated drive sizes are just marketing terms and do not have a sound technical basis. Don't drive vendors provide actual sizing information in their specification sheets so that knowledgeable people can purchase the right-sized drive? Bob == Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, Jan 18, 2009 at 1:56 PM, Eric D. Mudama edmud...@bounceswoosh.org wrote: On Sun, Jan 18 at 13:43, Tim wrote: You look at the size of the drive and you take a set percentage off... If it's a LUN and it's so far off it still can't be added with the percentage that works across the board for EVERYTHING ELSE, you change the size of the LUN at the storage array or adapter. I know it's fun to pretend this is rocket science and impossible, but the fact remains the rest of the industry has managed to make it work. I have a REAL tough time believing that Sun and/or zfs is so deficient it's an insurmountable obstacle for them. If, instead of having ZFS manage these differences, a user simply created slices that were, say, 98% as big as the average number of sectors in a XXX GB drive... would ZFS enable write cache on that device or not? I thought I'd read that ZFS didn't use write cache on slices because it couldn't guarantee that the other slices were used in a write-cache-safe fashion; would that apply to cases where no other slices were allocated? It will disable it by default, but you can manually re-enable it. That's not so much the point though. ZFS is supposed to be a filesystem/volume manager all-in-one. When I have to start going through format every time I add a drive, it's a non-starter, not to mention it's a kludge. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
I ran into a bad label causing this once.

Usually the s2 slice is a good bet for your whole disk device, but if it's EFI labeled, you need to use p0 (somebody correct me if I'm wrong).

I like to zero the first few megs of a drive before doing any of this stuff. This will destroy any data. Obviously, change c7t1d0p0 to whatever your drive's device is.

    dd if=/dev/zero of=/dev/rdsk/c7t1d0p0 bs=512 count=8192

For EFI you may also need to zero the end of the disk too, because it writes the VTOC to both the beginning and end for redundancy. I'm not sure of the best way to get the drive size in blocks without using format(1M), so I'll leave that as an exercise for the reader. For my 500GB disks it was something like this, where 976533504 is $number_of_blocks (from format) - 8192 (4MB in 512-byte blocks):

    dd if=/dev/zero of=/dev/rdsk/c7t0d0p0 bs=512 count=8192 seek=976533504

When you run format -> fdisk, it should prompt you to write a new Solaris label to the disk. Just accept all the defaults.

    format -d c7t1d0

Remember to double-check your devices and wait a beat before pressing enter with those dd commands, as they destroy without warning or checking. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
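A quick check of the seek arithmetic in that second dd, using the numbers from Al's example (the helper function is purely illustrative, and 976541696 is the total block count his figures imply for his 500GB disk):

    WIPE_BLOCKS = 8192  # 4MB in 512-byte blocks, one label copy's worth

    def tail_wipe_seek(total_blocks):
        # Start zeroing so the wipe ends at the last block, covering the
        # backup EFI label stored at the end of the disk.
        return total_blocks - WIPE_BLOCKS

    print(tail_wipe_seek(976541696))  # 976533504, the seek= value above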
Re: [zfs-discuss] replace same sized disk fails with too small error
Yes, I agree: a command interface is more efficient, and more risky, than a GUI. You will have to be very careful when doing that. Best, z - Original Message - From: Al Tobey tob...@gmail.com To: zfs-discuss@opensolaris.org Sent: Sunday, January 18, 2009 3:09 PM Subject: Re: [zfs-discuss] replace same sized disk fails with too small error I ran into a bad label causing this once. Usually the s2 slice is a good bet for your whole disk device, but if it's EFI labeled, you need to use p0 (somebody correct me if I'm wrong). I like to zero the first few megs of a drive before doing any of this stuff. This will destroy any data. Obviously, change c7t1d0p0 to whatever your drive's device is. dd if=/dev/zero of=/dev/rdsk/c7t1d0p0 bs=512 count=8192 For EFI you may also need to zero the end of the disk too, because it writes the VTOC to both the beginning and end for redundancy. I'm not sure of the best way to get the drive size in blocks without using format(1M), so I'll leave that as an exercise for the reader. For my 500GB disks it was something like this, where 976533504 is $number_of_blocks (from format) - 8192 (4MB in 512-byte blocks): dd if=/dev/zero of=/dev/rdsk/c7t0d0p0 bs=512 count=8192 seek=976533504 When you run format -> fdisk, it should prompt you to write a new Solaris label to the disk. Just accept all the defaults. format -d c7t1d0 Remember to double-check your devices and wait a beat before pressing enter with those dd commands, as they destroy without warning or checking. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
comment at the bottom... Tim wrote: On Sun, Jan 18, 2009 at 1:56 PM, Eric D. Mudama edmud...@bounceswoosh.org wrote: On Sun, Jan 18 at 13:43, Tim wrote: You look at the size of the drive and you take a set percentage off... If it's a LUN and it's so far off it still can't be added with the percentage that works across the board for EVERYTHING ELSE, you change the size of the LUN at the storage array or adapter. I know it's fun to pretend this is rocket science and impossible, but the fact remains the rest of the industry has managed to make it work. I have a REAL tough time believing that Sun and/or zfs is so deficient it's an insurmountable obstacle for them. If, instead of having ZFS manage these differences, a user simply created slices that were, say, 98% as big as the average number of sectors in a XXX GB drive... would ZFS enable write cache on that device or not? I thought I'd read that ZFS didn't use write cache on slices because it couldn't guarantee that the other slices were used in a write-cache-safe fashion; would that apply to cases where no other slices were allocated? It will disable it by default, but you can manually re-enable it. That's not so much the point though. ZFS is supposed to be a filesystem/volume manager all-in-one. When I have to start going through format every time I add a drive, it's a non-starter, not to mention it's a kludge. DIY. Personally, I'd be more upset if ZFS reserved any sectors for some potential swap I might want to do later, but may never need to do. If you want to reserve some space for swappage, DIY. As others have noted, this is not a problem for systems vendors because we try, and usually succeed, at ensuring that our multiple sources of disk drives are compatible such that we can swap one for another. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, Jan 18, 2009 at 2:43 PM, Richard Elling richard.ell...@sun.com wrote: comment at the bottom... DIY. Personally, I'd be more upset if ZFS reserved any sectors for some potential swap I might want to do later, but may never need to do. If you want to reserve some space for swappage, DIY. As others have noted, this is not a problem for systems vendors because we try, and usually succeed, at ensuring that our multiple sources of disk drives are compatible such that we can swap one for another. -- richard And again I call BS. I've pulled drives out of a USP-V, Clariion, DMX, and FAS3040. Every single one had drives of slightly differing sizes. Every single one is right-sized at format time. Hell, here's a filer I have sitting in a lab right now:

    RAID Disk  Device  HA  SHELF  BAY  CHAN  Pool  Type  RPM  Used (MB/blks)    Phys (MB/blks)
    ---------  ------  --  -----  ---  ----  ----  ----  ---  ----------------  ----------------
    dparity    0b.32   0b  2      0    FC:B  -     FCAL  1    68000/139264000   68444/140174232
    parity     0b.33   0b  2      1    FC:B  -     FCAL  1    68000/139264000   68444/140174232
    data       0b.34   0b  2      2    FC:B  -     FCAL  1    68000/139264000   68552/140395088

Notice rows 2 and 3 show different physical block counts, and those are BOTH Seagate Cheetahs, just different generations. So it gets short-stroked to 68000 from 68552 or 68444. And NO, the re-branded USP-Vs Sun sells don't do anything any differently, so stop lying, it's getting old. If you're so concerned with the storage *lying* or *hiding* space, I assume you're leading the charge at Sun to properly advertise drive sizes, right? Because the 1TB drive I can buy from Sun today is in no way, shape, or form able to store 1TB of data. You use the same *fuzzy math* the rest of the industry does. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
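For scale, the trim in that listing works out to well under the 1% figure discussed earlier (my arithmetic, not from the thread):

    used = 68000  # MB, the right-sized capacity in the listing
    for phys in (68444, 68552):  # MB, the two physical capacities listed
        print("%d -> %.2f%% trimmed" % (phys, 100.0 * (phys - used) / phys))
    # 68444 -> 0.65% trimmed
    # 68552 -> 0.81% trimmed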
Re: [zfs-discuss] replace same sized disk fails with too small error
Tim wrote: On Sun, Jan 18, 2009 at 2:43 PM, Richard Elling richard.ell...@sun.com wrote: comment at the bottom... DIY. Personally, I'd be more upset if ZFS reserved any sectors for some potential swap I might want to do later, but may never need to do. If you want to reserve some space for swappage, DIY. As others have noted, this is not a problem for systems vendors because we try, and usually succeed, at ensuring that our multiple sources of disk drives are compatible such that we can swap one for another. -- richard And again I call BS. I've pulled drives out of a USP-V, Clariion, DMX, and FAS3040. Every single one had drives of slightly differing sizes. Every single one is right-sized at format time. It is naive to think that different storage array vendors would care about people trying to use another array vendor's disks in their arrays. In fact, you should get a flat, impersonal "not supported" response. What vendors can do is make sure that if you get a disk which is supported in a platform and replace it with another disk which is also supported, and the same size, then it will just work. In order for this method to succeed, a least common size is used. Hell, here's a filer I have sitting in a lab right now:

    RAID Disk  Device  HA  SHELF  BAY  CHAN  Pool  Type  RPM  Used (MB/blks)    Phys (MB/blks)
    ---------  ------  --  -----  ---  ----  ----  ----  ---  ----------------  ----------------
    dparity    0b.32   0b  2      0    FC:B  -     FCAL  1    68000/139264000   68444/140174232
    parity     0b.33   0b  2      1    FC:B  -     FCAL  1    68000/139264000   68444/140174232
    data       0b.34   0b  2      2    FC:B  -     FCAL  1    68000/139264000   68552/140395088

Notice rows 2 and 3 show different physical block counts, and those are BOTH Seagate Cheetahs, just different generations. So it gets short-stroked to 68000 from 68552 or 68444. And NO, the re-branded USP-Vs Sun sells don't do anything any differently, so stop lying, it's getting old. Vendors can change the default label, which is how it is implemented. For example, if we source XYZ-GByte disks from two different vendors intended for the same platform, then we will ensure that the number of available sectors is the same, otherwise the FRU costs would be very high. No conspiracy here... just good planning. If you're so concerned with the storage *lying* or *hiding* space, I assume you're leading the charge at Sun to properly advertise drive sizes, right? Because the 1TB drive I can buy from Sun today is in no way, shape, or form able to store 1TB of data. You use the same *fuzzy math* the rest of the industry does. There is no fuzzy math. Disk vendors size by base 10. They explicitly state this in their product documentation, as business law would expect. http://en.wikipedia.org/wiki/Mebibyte -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, Jan 18, 2009 at 3:39 PM, Richard Elling richard.ell...@sun.com wrote: Tim wrote: It is naive to think that different storage array vendors would care about people trying to use another array vendor's disks in their arrays. In fact, you should get a flat, impersonal "not supported" response. But we aren't talking about me trying to stick disks into Sun's arrays. We're talking about how this open-source, supposedly all-in-one volume manager and filesystem handles new disks. You know, the one that was supposed to make all of our lives infinitely easier, and simplify managing lots and lots of disks, whether they be inside of an official Sun array or just a server running Solaris. What vendors can do is make sure that if you get a disk which is supported in a platform and replace it with another disk which is also supported, and the same size, then it will just work. In order for this method to succeed, a least common size is used. The ONLY reason vendors put special labels or firmware on disks is to force you to buy them direct. Let's not pretend there's something magical about an HDS 1TB drive or a Sun 1TB drive. They're rolling off the same line as everyone else's. The way they ensure the disk works is by short-stroking them from the start... It's *naive* to claim it's any sort of technical limitation. Vendors can change the default label, which is how it is implemented. For example, if we source XYZ-GByte disks from two different vendors intended for the same platform, then we will ensure that the number of available sectors is the same, otherwise the FRU costs would be very high. No conspiracy here... just good planning. The number of blocks on the disks won't be the same. Which is why they're right-sized per above. Do I really need to start pulling disks from my Sun systems to prove this point? Sun does not require exact block counts any more than HDS, EMC, or NetApp. So for the life of the server, I can call in and get the exact same part that broke in the box from Sun, because they've got contracts with the drive mfg's. What happens when I'm out of the supported life of the system? Oh, I just buy a new one? Because having my volume manager use a bit of intelligence and short-stroke the disk from the start, like I would expect, is a *bad idea*? The sad part about all of this is that the $15 Promise raid controller in my desktop short-strokes by default, and you're telling me zfs can't, or won't. There is no fuzzy math. Disk vendors size by base 10. They explicitly state this in their product documentation, as business law would expect. http://en.wikipedia.org/wiki/Mebibyte -- richard If it's not fuzzy math, drive mfg's wouldn't lose in court over the false advertising, would they? http://apcmag.com/seagate_settles_class_action_cash_back_over_misleading_hard_drive_capacities.htm At the end of the day, this back and forth changes nothing though. The default behavior for zfs importing a new disk should be right-sizing by a fairly conservative amount if you're (you as in Sun, not you as in Richard) going to continue to market it as you have in the past. It most definitely does not eliminate the same old pains of managing disks with Solaris if I have to start messing with labels and slices again. The whole point of merging a volume manager/filesystem/etc. is to take away that pain. That is not even remotely manageable over the long term. --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] replace same sized disk fails with too small error
On Sun, Jan 18 at 15:00, Tim wrote: If you're so concerned with the storage *lying* or *hiding* space, I assume you're leading the charge at Sun to properly advertise drive sizes, right? Because the 1TB drive I can buy from Sun today is in no way, shape, or form able to store 1TB of data. You use the same *fuzzy math* the rest of the industry does. While in general I'd like to see a combined FS/VM be smarter, as you do, on this point I disagree with you. Most drive vendors publish the exact sector counts of each model that they ship, and this should be sufficient for your purposes. As an arbitrary example, Seagate lists a number of Guaranteed Sectors in their technical specifications for each unique model number. Their 7200.11 500GB drive ST3500320AS guarantees 976,773,168 sectors, which happens to exactly equal the IDEMA amount for 500GB. While rounding down to the next IDEMA multiple might make sense, depending on the technique that could cost you 1GB per device, and I'm sure a lot of people would rather not have that limitation. -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
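For reference, the IDEMA convention Eric mentions fixes the LBA count as a linear function of the advertised capacity. As I recall the LBA1-02 formula (treat the exact constants as an assumption to verify against the IDEMA document, not as thread-verified):

    def idema_lba_count(capacity_gb):
        # IDEMA LBA1-02 for 512-byte-sector drives of 50GB and up:
        # a fixed base at 50GB plus 1,953,504 LBAs per additional GB.
        return 97696368 + 1953504 * (capacity_gb - 50)

    print(idema_lba_count(500))  # 976773168, matching the ST3500320AS spec

That would explain why the ST3500320AS lands on exactly 976,773,168 sectors, and why drives from vendors following the convention interchange cleanly.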
Re: [zfs-discuss] replace same sized disk fails with too small error
Volume name =
ascii name = SAMSUNG-S0VVJ1CP30539-0001-465.76GB
bytes/sector = 512
sectors = 976760063
accessible sectors = 976760030

    Part  Tag         Flag  First Sector  Size      Last Sector
       0  usr         wm    256           465.75GB  976743646
       1  unassigned  wm    0             0         0
       2  unassigned  wm    0             0         0
       3  unassigned  wm    0             0         0
       4  unassigned  wm    0             0         0
       5  unassigned  wm    0             0         0
       6  unassigned  wm    0             0         0
       8  reserved    wm    976743647    8.00MB     976760030

This is the readout from a disk it's meant to replace. It looks like the same number of sectors, as it should be, being the same model. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss