Re: [zfs-discuss] zfs promote and ENOSPC
Hi Mike,

It looks like 6452872 -- 'zfs promote' needs enough free space to succeed.

Regards,

Mike Gerdts wrote:
> I needed to free up some space to be able to create and populate a new
> upgrade. I was caught off guard by the amount of free space required by
> zfs promote.
>
> bash-3.2# uname -a
> SunOS indy2 5.11 snv_86 i86pc i386 i86pc
> bash-3.2# zfs list
> NAME                                  USED  AVAIL  REFER  MOUNTPOINT
> rpool                                5.49G  1.83G    55K  /rpool
> [EMAIL PROTECTED]                    46.5K      -  49.5K  -
> rpool/ROOT                           5.39G  1.83G    18K  none
> rpool/ROOT/2008.05                   2.68G  1.83G  3.38G  legacy
> rpool/ROOT/2008.05/opt                814M  1.83G  22.3M  legacy
> rpool/ROOT/2008.05/[EMAIL PROTECTED]   43K      -  22.3M  -
> rpool/ROOT/2008.05/opt/SUNWspro       739M  1.83G   739M  legacy
> rpool/ROOT/2008.05/opt/netbeans      52.9M  1.83G  52.9M  legacy
> rpool/ROOT/preview2                  2.71G  1.83G  2.71G  /mnt
> rpool/ROOT/[EMAIL PROTECTED]         6.13M      -  2.71G  -
> rpool/ROOT/preview2/opt                27K  1.83G  22.3M  legacy
> rpool/export                         89.8M  1.83G    19K  /export
> rpool/export/home                    89.8M  1.83G  89.8M  /export/home
> bash-3.2# zfs promote rpool/ROOT/2008.05
> cannot promote 'rpool/ROOT/2008.05': out of space
>
> Notice that I have 1.83 GB of free space and the snapshot from which the
> clone was created (rpool/ROOT/[EMAIL PROTECTED]) is 2.71 GB. It was not
> until I had more than 2.71 GB of free space that I could promote
> rpool/ROOT/2008.05. This behavior does not seem to be documented. Is it a
> bug in the documentation or zfs?

--
Regards,

Robin Guo, Xue-Bin Guo
Solaris Kernel and Data Service QE, Sun China Engineering and Research Institute
Phone: +86 10 82618200 +82296
Email: [EMAIL PROTECTED]
Blog: http://blogs.sun.com/robinguo

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
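For readers unfamiliar with what promote actually does, here is a minimal sketch of the clone/origin relationship under discussion; the pool and dataset names are purely illustrative. Promoting a clone re-parents the origin snapshot onto the clone, so the snapshot's space accounting moves to the promoted filesystem:

  # zfs snapshot tank/orig@snap            # origin snapshot
  # zfs clone tank/orig@snap tank/clone    # clone initially shares all blocks with the snapshot
  # zfs promote tank/clone                 # the snapshot becomes tank/clone@snap; tank/orig is now the clone
  # zfs list -r tank                       # USED for the snapshot is now charged to tank/clone

Whether promote should demand free space on the order of what the origin snapshot references, as Mike observed, is exactly the question raised in this thread.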
Re: [zfs-discuss] SMC Webconsole 3.1 and ZFS Administration 1.0 - stacktraces in snv_b89
Likewise. It just plain doesn't work. Not required, though, since the command line is okay and way powerful ;) And there are some more interesting challenges to work on, so I haven't pushed this problem any further yet. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS conflict with MAID?
Richard Elling schrieb:
> Tobias Exner wrote:
>> Hi John,
>> I've done some tests with a Sun X4500 with ZFS and MAID, using the powerd of Solaris 10 to power down disks that weren't accessed for a configured time. It's working fine... The only thing I ran into was that it took around a minute to power on 4 disks in a zfs pool. The problem seems to be that powerd starts the disks sequentially.
>
> Did you power down disks or spin down disks? It is relatively easy to spin down (or up) disks with luxadm stop (start). If a disk is accessed, then it will spin itself up. By default, the timeout for disk response is 60 seconds, and most disks can spin up in less than 60 seconds.

luxadm is not very helpful when I want an automatic MAID solution. The Solaris powerd spins the disks down automatically and the power consumption falls below 1 watt (3,5). My tests show that it takes around 20 seconds to power up a single disk and get access. I actually don't know why it takes 55 seconds to spin up 4 disks in a zfs pool, but those are my results.. I tried to open an RFE... but so far without success.

> Perhaps because disks will spin up when an access is requested, so to solve your problem you'd have to make sure that all of a set of disks are accessed when any in the set are accessed -- butt-ugly.

As far as I know, when I'm using a zfs pool I have no way to influence which disk will be accessed when I just try to read or write. Do you know more?

> NB. back when I had a largish pile of smallish disks hanging off my workstation for testing, a simple cron job running luxadm stop helped my energy bill :-)
> -- richard

regards,

Tobias

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
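Richard's closing remark about a cron job is easy to sketch. The following is only an illustration with made-up device paths; the paths would have to be replaced with the disks actually in the pool, and any later access will spin the disks back up on its own:

  #!/bin/sh
  # spin-down-disks.sh: spin down a fixed set of data disks.
  # Device paths below are examples only.
  for d in /dev/rdsk/c2t0d0s2 /dev/rdsk/c2t1d0s2; do
          /usr/sbin/luxadm stop "$d"
  done

  # crontab entry (illustrative): spin the disks down at 01:00 every night
  0 1 * * * /usr/local/bin/spin-down-disks.sh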
[zfs-discuss] SSD reliability, wear levelling, warranty period
I've been reading, with great (personal/professional) interest about Sun getting very serious about SSD-equipping servers as a standard feature in the 2nd half of this year. Yeah! Excellent news - and it's nice to see Sun lead, rather than trail the market! Those of us, who are ZFS zealots, know the value of a ZFS log, and/or ZFS cache device and how these devices can (very positively) impact the performance of a ZFS raid configuration built on cost effective SATA disk drives. But - based on personal observation - there is a lot of hype surrounding SSD reliability. Obviously the *promise* of this technology is higher performance and *reliability* with lower power requirements due to no (mechanical) moving parts. But... if you look broadly at the current SSD product offerings, you see: a) lower than expected performance - particularly in regard to write IOPS (I/O Ops per Second) and b) warranty periods that are typically 1 year - with the (currently rare) exception of products that are offered with a 5 year warranty. Obviously, for SSD products to live up to the current marketing hype, they need to deliver superior performance and *reliability*. Everyone I know *wants* one or more SSD devices - but they also have the expectation that those devices will come with a warranty at least equivalent to current hard disk drives (3 or 5 years). So ... I'm interested in learning from anyone on this list, and, in particular, from Team ZFS, what the reality is regarding SSD reliability. Obviously Sun employees are not going to compromise their employment and divulge upcoming product specific data - but there must be *some* data (white papers etc) in the public domain that would provide some relevant technical data?? Regards, -- Al Hopper Logical Approach Inc,Plano,TX [EMAIL PROTECTED] Voice: 972.379.2133 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
Hi Al, Sorry, but leading the market is not right at this point. www.superssd.com has the answer to all those questions about SSD and reliability/speed for many years.. But I'm with you. I'm looking forward the coming products of SUN concerning SSD.. btw: it's seems to me that this thread is a little bit OT. regards, Tobias Exner Al Hopper schrieb: I've been reading, with great (personal/professional) interest about Sun getting very serious about SSD-equipping servers as a standard feature in the 2nd half of this year. Yeah! Excellent news - and it's nice to see Sun lead, rather than trail the market! Those of us, who are ZFS zealots, know the value of a ZFS log, and/or ZFS cache device and how these devices can (very positively) impact the performance of a ZFS raid configuration built on cost effective SATA disk drives. But - based on personal observation - there is a lot of hype surrounding SSD reliability. Obviously the *promise* of this technology is higher performance and *reliability* with lower power requirements due to no (mechanical) moving parts. But... if you look broadly at the current SSD product offerings, you see: a) lower than expected performance - particularly in regard to write IOPS (I/O Ops per Second) and b) warranty periods that are typically 1 year - with the (currently rare) exception of products that are offered with a 5 year warranty. Obviously, for SSD products to live up to the current marketing hype, they need to deliver superior performance and *reliability*. Everyone I know *wants* one or more SSD devices - but they also have the expectation that those devices will come with a warranty at least equivalent to current hard disk drives (3 or 5 years). So ... I'm interested in learning from anyone on this list, and, in particular, from Team ZFS, what the reality is regarding SSD reliability. Obviously Sun employees are not going to compromise their employment and divulge upcoming product specific data - but there must be *some* data (white papers etc) in the public domain that would provide some relevant technical data?? Regards, ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
On Wed, Jun 11, 2008 at 3:59 AM, Tobias Exner [EMAIL PROTECTED] wrote: Hi Al, Sorry, but leading the market is not right at this point. www.superssd.com has the answer to all those questions about SSD and reliability/speed for many years.. But I'm with you. I'm looking forward the coming products of SUN concerning SSD.. btw: it's seems to me that this thread is a little bit OT. I don't think its OT - because SSDs make perfect sense as ZFS log and/or cache devices. If I did not make that clear in my OP then I failed to communicate clearly. In both these roles (log/cache) reliability is of the utmost importance. Thanks for the link - I'll take a look/see. regards, Tobias Exner Al Hopper schrieb: I've been reading, with great (personal/professional) interest about Sun getting very serious about SSD-equipping servers as a standard feature in the 2nd half of this year. Yeah! Excellent news - and it's nice to see Sun lead, rather than trail the market! Those of us, who are ZFS zealots, know the value of a ZFS log, and/or ZFS cache device and how these devices can (very positively) impact the performance of a ZFS raid configuration built on cost effective SATA disk drives. But - based on personal observation - there is a lot of hype surrounding SSD reliability. Obviously the *promise* of this technology is higher performance and *reliability* with lower power requirements due to no (mechanical) moving parts. But... if you look broadly at the current SSD product offerings, you see: a) lower than expected performance - particularly in regard to write IOPS (I/O Ops per Second) and b) warranty periods that are typically 1 year - with the (currently rare) exception of products that are offered with a 5 year warranty. Obviously, for SSD products to live up to the current marketing hype, they need to deliver superior performance and *reliability*. Everyone I know *wants* one or more SSD devices - but they also have the expectation that those devices will come with a warranty at least equivalent to current hard disk drives (3 or 5 years). So ... I'm interested in learning from anyone on this list, and, in particular, from Team ZFS, what the reality is regarding SSD reliability. Obviously Sun employees are not going to compromise their employment and divulge upcoming product specific data - but there must be *some* data (white papers etc) in the public domain that would provide some relevant technical data?? Regards, Regards, -- Al Hopper Logical Approach Inc,Plano,TX [EMAIL PROTECTED] Voice: 972.379.2133 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
On Jun 11, 2008, at 1:16 AM, Al Hopper wrote: But... if you look broadly at the current SSD product offerings, you see: a) lower than expected performance - particularly in regard to write IOPS (I/O Ops per Second) True. Flash is quite asymmetric in its performance characteristics. That said, the L2ARC has been specially designed to play well with the natural strengths and weaknesses of flash. and b) warranty periods that are typically 1 year - with the (currently rare) exception of products that are offered with a 5 year warranty. You'll see a new class of SSDs -- eSSDs -- designed for the enterprise with longer warranties and more write/erase cycles. Further, ZFS will do its part by not killing the write/erase cycles of the L2ARC by constantly streaming as fast as possible. You should see lifetimes in the 3-5 year range on typical flash. Obviously, for SSD products to live up to the current marketing hype, they need to deliver superior performance and *reliability*. Everyone I know *wants* one or more SSD devices - but they also have the expectation that those devices will come with a warranty at least equivalent to current hard disk drives (3 or 5 years). I don't disagree entirely, but as a cache device flash actually can be fairly unreliable and we'll pick it up in ZFS. So ... I'm interested in learning from anyone on this list, and, in particular, from Team ZFS, what the reality is regarding SSD reliability. Obviously Sun employees are not going to compromise their employment and divulge upcoming product specific data - but there must be *some* data (white papers etc) in the public domain that would provide some relevant technical data?? A typical high-end SSD can sustain 100k write/erase cycles so you can do some simple math to see that a 128GB device written to at a rate of 150M/s will last nearly 3 years. Again, note that unreliable devices will result in a performance degradation when you fail a checksum in the L2ARC, but the data will still be valid out of the main storage pool. You're going to see much more on this in the next few months. I made a post to my blog that probably won't answer your questions directly, but may help inform you about what we have in mind. http://blogs.sun.com/ahl/entry/flash_hybrid_pools_and_future Adam -- Adam Leventhal, Fishworkshttp://blogs.sun.com/ahl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
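For reference, the arithmetic behind Adam's estimate (taking the vendor's 100k cycle rating at face value and assuming perfect wear levelling) works out roughly as follows: 128 GB x 100,000 write/erase cycles is about 12.8 PB of total writes; at a sustained 150 MB/s that is about 8.5 x 10^7 seconds, a little under 1,000 days, or roughly 2.7 years of continuous writing.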
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
The reliability of flash increases a lot if "wear leveling" is implemented and there's the capability to build a RAID over a couple of flash modules (maybe automatically by the controller). And if there are RAM modules as a cache in front of the flash, most of the problems regarding fast read and write access will be solved. I'm very interested in what kind of data security will be implemented by Sun in the future. I was not able to find any technical information until now.

@ Adam: I never heard of "eSSD". Do you have more information about this? Google and I cannot find anything.

regards,

Tobias

Adam Leventhal schrieb: On Jun 11, 2008, at 1:16 AM, Al Hopper wrote: But... if you look broadly at the current SSD product offerings, you see: a) lower than expected performance - particularly in regard to write IOPS (I/O Ops per Second) True. Flash is quite asymmetric in its performance characteristics. That said, the L2ARC has been specially designed to play well with the natural strengths and weaknesses of flash. and b) warranty periods that are typically 1 year - with the (currently rare) exception of products that are offered with a 5 year warranty. You'll see a new class of SSDs -- eSSDs -- designed for the enterprise with longer warranties and more write/erase cycles. Further, ZFS will do its part by not killing the write/erase cycles of the L2ARC by constantly streaming as fast as possible. You should see lifetimes in the 3-5 year range on typical flash. Obviously, for SSD products to live up to the current marketing hype, they need to deliver superior performance and *reliability*. Everyone I know *wants* one or more SSD devices - but they also have the expectation that those devices will come with a warranty at least equivalent to current hard disk drives (3 or 5 years). I don't disagree entirely, but as a cache device flash actually can be fairly unreliable and we'll pick it up in ZFS. So ... I'm interested in learning from anyone on this list, and, in particular, from Team ZFS, what the reality is regarding SSD reliability. Obviously Sun employees are not going to compromise their employment and divulge upcoming product specific data - but there must be *some* data (white papers etc) in the public domain that would provide some relevant technical data?? A typical high-end SSD can sustain 100k write/erase cycles so you can do some simple math to see that a 128GB device written to at a rate of 150M/s will last nearly 3 years. Again, note that unreliable devices will result in a performance degradation when you fail a checksum in the L2ARC, but the data will still be valid out of the main storage pool. You're going to see much more on this in the next few months. I made a post to my blog that probably won't answer your questions directly, but may help inform you about what we have in mind. http://blogs.sun.com/ahl/entry/flash_hybrid_pools_and_future Adam -- Adam Leventhal, Fishworks http://blogs.sun.com/ahl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
Tobias Exner wrote: The reliability of flash increasing alot if wear leveling is implemented and there's the capability to build a raid over a couple of flash-modules ( maybe automatically by the controller ). And if there are RAM-modules as a cache infront of the flash the most problems will be solved regarding fast read- and write-access. I'm very interested what kind of data security will be implemented by SUN in future. I was not able to find any technical information until now. If by data security you mean encrypting the data then see this project: http://opensolaris.org/os/project/zfs-crypto/ If you don't mean encrypting the data in the filesystem then what do you mean by that term ? Note that Sun also has tape encryption products as well. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?
On Sat, 7 Jun 2008, Mattias Pantzare wrote: If I need to count useage I can use du. But if you can implement space usage info on a per-uid basis you are not far from quota per uid... That sounds like quite a challenge. UIDs are just numbers and new ones can appear at any time. Files with existing UIDs can have their UIDs switched from one to another at any time. The space used per UID needs to be tallied continuously and needs to track every change, including real-time file growth and truncation. We are ultimately talking about 128 bit counters here. Instead of having one counter per filesystem we now have potentially hundreds of thousands, which represents substantial memory. But if you already have the ZAP code, you ought to be able to do quick lookups of arbitrary byte sequences, right? Just assume that a value not stored is zero (or infinity, or uninitialized, as applicable), and you have the same functionality as the sparse quota file on ufs, without the problems. Besides, uid/gid/sid quotas would usually make more sense at the zpool level than at the individual filesystem level, so perhaps it's not _that_ bad. Which is to say, you want user X to have an n GB quota over the whole zpool, and you probably don't so much care whether the filesystem within the zpool corresponds to his home directory or to some shared directory. Multicore systems have the additional challenge that this complex information needs to be effectively shared between cores. Imagine if you have 512 CPU cores, all of which are running some of the ZFS code and have their own caches which become invalidated whenever one of those counters is updated. This sounds like a no-go for an almost infinite-sized pooled last word filesystem like ZFS. ZFS is already quite lazy at evaluating space consumption. With ZFS, 'du' does not always reflect true usage since updates are delayed. Whatever mechanism can check at block allocation/deallocation time to keep track of per-filesystem space (vs a filesystem quota, if there is one) could surely also do something similar against per-uid/gid/sid quotas. I suspect a lot of existing functions and data structures could be reused or adapted for most of it. Just one more piece of metadata to update, right? Not as if ufs quotas had zero runtime penalty if enabled. And you only need counters and quotas in-core for identifiers applicable to in-core znodes, not for every identifier used on the zpool. Maybe I'm off base on the details. But in any event, I expect that it's entirely possible to make it happen, scalably. Just a question of whether it's worth the cost of designing, coding, testing, documenting. I suspect there may be enough scenarios for sites with really high numbers of accounts (particularly universities, which are not only customers in their own right, but a chance for future mindshare) that it might be worthwhile, but I don't know that to be the case. IMO, even if no one sort of site using existing deployment architectures would justify it, given the future blurring of server, SAN, and NAS (think recent SSD announcement + COMSTAR + iSCSI initiator + separate device for zfs zil cache + in-kernel CIFS + enterprise authentication with Windows interoperability + Thumper + ...), the ability to manage all that storage in all sorts of as-yet unforseen deployment configurations _by user or other identity_ may well be important across a broad base of customers. Maybe identity-based, as well as filesystem-based quotas, should be part of that. 
This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
> btw: it's seems to me that this thread is a little bit OT.
>
> I don't think its OT - because SSDs make perfect sense as ZFS log and/or cache devices. If I did not make that clear in my OP then I failed to communicate clearly. In both these roles (log/cache) reliability is of the utmost importance.

Older SSDs (before cheap and relatively high-cycle-limit flash) were RAM cache+battery+hard disk. Surely RAM+battery+flash is also possible; the battery only needs to keep the RAM alive long enough to stage to the flash. That keeps the write count on the flash down, and the speed up (RAM being faster than flash). Such a device would of course cost more, and be less dense (given having to have battery+charging circuits and RAM as well as flash), than a pure flash device. But with more limited write rates needed, and no moving parts, _provided_ it has full ECC and maybe radiation-hardened flash (if that exists), I can't imagine why such a device couldn't be exceedingly reliable and have quite a long lifetime (with the battery, hopefully replaceable, being more of a limitation than the flash). It could be a matter of paying for how much quality you want...

As for reliability, from zpool(1m):

     log     A separate intent log device. If more than one log device is
             specified, then writes are load-balanced between devices. Log
             devices can be mirrored. However, raidz and raidz2 are not
             supported for the intent log. For more information, see the
             “Intent Log” section.

     cache   A device used to cache storage pool data. A cache device cannot
             be mirrored or part of a raidz or raidz2 configuration. For more
             information, see the “Cache Devices” section.

  [...]

  Cache Devices
     Devices can be added to a storage pool as “cache devices.” These devices
     provide an additional layer of caching between main memory and disk. For
     read-heavy workloads, where the working set size is much larger than what
     can be cached in main memory, using cache devices allow much more of this
     working set to be served from low latency media. Using cache devices
     provides the greatest performance improvement for random read-workloads
     of mostly static content.

     To create a pool with cache devices, specify a “cache” vdev with any
     number of devices. For example:

       # zpool create pool c0d0 c1d0 cache c2d0 c3d0

     Cache devices cannot be mirrored or part of a raidz configuration. If a
     read error is encountered on a cache device, that read I/O is reissued to
     the original storage pool device, which might be part of a mirrored or
     raidz configuration.

     The content of the cache devices is considered volatile, as is the case
     with other system caches.

That tells me that the zil can be mirrored and zfs can recover from cache errors. I think that means that these devices don't need to be any more reliable than regular disks, just much faster. So...expensive ultra-reliability SSD, or much less expensive SSD plus mirrored zil? Given what zfs can do with cheap SATA, my bet is on the latter... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
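To make the same point concretely, this is roughly what adding a mirrored log and some cache devices to an existing pool looks like; the pool and device names are illustrative only, not from anyone's actual configuration:

  # zpool add tank log mirror c4d0 c5d0   # separate intent log, mirrored
  # zpool add tank cache c6d0 c7d0        # L2ARC cache devices (contents are volatile)
  # zpool status tank                     # shows the log mirror and the cache vdevs

Since the log can be mirrored and a failed cache read simply falls back to the main pool, the fast devices do not have to be any more trustworthy than ordinary disks.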
[zfs-discuss] Cruft left after update
Hi, after updating to snv_90 (several retries before I patched pkg) I was left with the following:

NAME  USED  AVAIL  REFER  MOUNTPOINT
rpool  9.87G  24.6G  62K  /rpool
[EMAIL PROTECTED]  19.5K  -  55K  -
rpool/ROOT  7.96G  24.6G  18K  /rpool/ROOT
rpool/[EMAIL PROTECTED]  15K  -  18K  -
rpool/ROOT/opensolaris  55.7M  24.6G  2.95G  legacy
rpool/ROOT/opensolaris-10  7.91G  24.6G  4.44G  legacy
rpool/ROOT/[EMAIL PROTECTED]  8.56M  -  2.22G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-01-07:14:04  4.26M  -  2.36G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-01-08:08:11  5.35M  -  2.95G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-07-20:01:54  97.5K  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-00:28:37  56K  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-00:38:34  120K  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-00:56:46  76K  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-01:06:44  121K  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-06:00:33  3.51M  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-06:39:31  1.62M  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-07:38:23  74K  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-07:55:15  59K  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-08:47:22  49K  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-09:47:45  2.21M  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-10:33:50  2.88M  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-13:18:02  2.86M  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-14:16:02  1.98M  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-15:11:06  967K  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-15:23:41  1.01M  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-16:43:01  925K  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-17:05:32  925K  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-20:10:03  5.05M  -  3.41G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-08-23:04:19  6.47M  -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-12:12:05  238K  -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-12:58:23  160K  -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-13:35:41  28K  -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-14:33:15  224K  -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-15:13:10  82K  -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-15:32:31  98K  -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-15:44:04  82K  -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-16:11:51  108K  -  4.15G  -
rpool/ROOT/[EMAIL PROTECTED]:-:2008-06-09-19:25:12  12.8M  -  4.15G  -
rpool/ROOT/opensolaris-10/opt  1.76G  24.6G  1.76G  /opt
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]  72K  -  3.61M  -
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]:-:2008-06-01-07:14:04  39K  -  595M  -
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]:-:2008-06-01-08:08:11  48K  -  622M  -
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]:-:2008-06-08-00:28:37  510K  -  1.76G  -
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]:-:2008-06-08-23:04:19  177K  -  1.76G  -
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]:-:2008-06-09-14:33:15  118K  -  1.76G  -
rpool/ROOT/opensolaris-10/[EMAIL PROTECTED]:-:2008-06-09-19:25:12  161K  -  1.76G  -
rpool/ROOT/opensolaris/opt  0  24.6G  622M  /opt
rpool/export  1.90G  24.6G  19K  /export
rpool/[EMAIL PROTECTED]  15K  -  19K  -
rpool/export/home  1.90G  24.6G  1.90G  /export/home
rpool/export/[EMAIL PROTECTED]  18K  -  21K  -

Which one of these can I clear? Thanks

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
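A hedged sketch of how the cleanup might be approached; nothing here is safe to paste blindly, since which boot environments and snapshots are still needed depends on which BE is actually in use, and it assumes the beadm(1M) that ships with the 2008.05-based bits is present:

  # zfs list -t snapshot -o name,used -r rpool       # see how much space each snapshot pins
  # beadm list                                       # confirm which boot environment is active
  # beadm destroy <old-BE-name>                      # remove an unused BE and its datasets
  # zfs destroy rpool/ROOT/<BE>@<old-snapshot-name>  # remove individual image-update snapshots

The [EMAIL PROTECTED]:-:2008-... entries appear to be the snapshots that pkg image-update created along the way; once a boot environment is known good, its older snapshots are usually the safest things to reclaim.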
Re: [zfs-discuss] Growing root pool ?
On Tue, Jun 10, 2008 at 11:33:36AM -0700, Wyllys Ingersoll wrote: Im running build 91 with ZFS boot. It seems that ZFS will not allow me to add an additional partition to the current root/boot pool because it is a bootable dataset. Is this a known issue that will be fixed or a permanent limitation? The current limitation is that a bootable pool be limited to one disk or one disk and a mirror. When your data is striped across multiple disks, that makes booting harder. From a post to zfs-discuss about two months ago: ... we do have plans to support booting from RAID-Z. The design is still being worked out, but it's likely that it will involve a new kind of dataset which is replicated on each disk of the RAID-Z pool, and which contains the boot archive and other crucial files that the booter needs to read. I don't have a projected date for when it will be available. It's a lower priority project than getting the install support for zfs boot done. - Darren If I read you right, with little or nothing extra, that would enable growing rpool as well, since what it would really do is ensure /boot (and whatever if anything else) was mirrored even though the rest of the zpool was raidz or raidz2; which would also ensure that those critical items were _not_ spread across the stripe that would result from adding devices to an existing zpool. Of course installation and upgrade would have to be able to recognize and deal with such exotica too. Which seems to pose a problem, since having one dataset in the zpool mirrored while the rest is raidz and/or extended by a stripe implies to me that some space is more or less reserved for that purpose, or that such a dataset couldn't be snapshotted, or both; so I suppose there might be a smaller-than-total-capacity limit on the number of BEs possible. http://en.wikipedia.org/wiki/TANSTAAFL ... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Growing root pool ?
I'm not even trying to stripe it across multiple disks, I just want to add another partition (from the same physical disk) to the root pool. Perhaps that is a distinction without a difference, but my goal is to grow my root pool, not stripe it across disks or enable raid features (for now). Currently, my root pool is using c1t0d0s4 and I want to add c1t0d0s0 to the pool, but can't. -Wyllys This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs promote and ENOSPC (+panic with dtrace)
On Wed, Jun 11, 2008 at 12:58 AM, Robin Guo [EMAIL PROTECTED] wrote:
> Hi, Mike, It's like 6452872, it need enough space for 'zfs promote'

Not really - in 6452872 a file system is at its quota before the promote is issued. I expect that a promote may cause several KB of metadata changes that require some space and as such would require more space than the quota. In my case, quotas are not in use. I had over 1.8 GB free before I issued the zfs promote and fully expected to have roughly the same amount of space free after the promote. It seems as though a wrong comparison about the amount of required free space is being made.

I have been able to reproduce it - but then when I started poking at it with dtrace (no destructive actions) I got a panic.

# mdb *.0
Loading modules: [ unix genunix specfs dtrace cpu.generic uppc scsi_vhci zfs random ip hook neti sctp arp usba fctl md lofs sppp crypto ptm ipc fcp fcip cpc logindmux sv nsctl sdbc ufs rdc ii nsmb ]
::status
debugging crash dump vmcore.0 (32-bit) from indy2
operating system: 5.11 snv_86 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=e0620d38 addr=200 occurred in module unknown due to a NULL pointer dereference
dump content: kernel pages only
::stack
0x200(eb1ea000)
zfs_ioc_promote+0x3b()
zfsdev_ioctl+0xd8(2d8, 5a23, 8045e40, 13, e8b3a020, e0620f78)
cdev_ioctl+0x2e(2d8, 5a23, 8045e40, 13, e8b3a020, e0620f78)
spec_ioctl+0x65(ddfb6c00, 5a23, 8045e40, 13, e8b3a020, e0620f78)
fop_ioctl+0x49(ddfb6c00, 5a23, 8045e40, 13, e8b3a020, e0620f78)
ioctl+0x155()
sys_call+0x10c()

The dtrace command that I was running was:

dtrace -n 'fbt:zfs:dsl_dataset_promote:return { trace(arg0); stack() }'

-- Mike Gerdts http://mgerdts.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS Quota question
Hi all, I'm new to the list and I thought I'd start out on the right foot. ZFS is great, but I have a couple of questions.

I have a Try-n-Buy x4500 with one large zfs pool with 40 1TB drives in it. The pool is named backup. Within this pool I have a number of filesystems:

backup/clients
backup/clients/bob
backup/clients/daniel
...

Now bob and daniel are populated by rsync over ssh to synchronize filesystems with client machines (the data will then be written to an SL500). I'd like to set the quota on backup/clients to some arbitrarily small amount. That seems pretty handy, since nothing should go into backup/clients itself, only into the filesystems backup/clients/*. But when I set the quota on backup/clients, I am unable to increase the quota for the sub-filesystems (bob, daniel, etc). Any ideas if this is possible or how to do it?

Thanks
Dave

David Glaser
Systems Administrator
LSA Information Technology
University of Michigan

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
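One hedged illustration of the behaviour described above, using the dataset names from the post. A plain quota on backup/clients caps backup/clients and everything beneath it, so the children can never use more than the parent's quota regardless of their own settings. If the goal is only to keep stray files out of backup/clients itself, a refquota (which limits the space a dataset references itself, excluding descendants) may be closer to what is wanted, assuming the ZFS version on the box supports it:

  # zfs set quota=none backup/clients         # don't cap the children through the parent
  # zfs set refquota=10M backup/clients       # keep backup/clients itself nearly empty
  # zfs set quota=500G backup/clients/bob     # per-client limit
  # zfs set quota=500G backup/clients/daniel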
Re: [zfs-discuss] Growing root pool ?
Wyllys Ingersoll wrote:
> I'm not even trying to stripe it across multiple disks, I just want to add another partition (from the same physical disk) to the root pool. Perhaps that is a distinction without a difference, but my goal is to grow my root pool, not stripe it across disks or enable raid features (for now). Currently, my root pool is using c1t0d0s4 and I want to add c1t0d0s0 to the pool, but can't.

DANGER: Uncharted territory!!!

That said, if the space on the disk (for the 2 partitions) is contiguous (which it doesn't appear to be in your case), or could be made contiguous by moving some other slice out of the way, then you should (note: I haven't tried this, and there is a chance for human error to mess things up even if it will work - and there's some chance it won't work even if you do it perfectly) be able to grow the root pool by deleting the new (second) partition and redefining the original partition to extend across the space of both partitions. Once that's done, a 'zpool replace <pool> c1t0d0sX c1t0d0sX' should notify ZFS that the slice is bigger, and it will grow the pool to match.

You have s4 and s0, so I bet the space is not contiguous, and I'd guess the free space is earlier on the disk, not later. You might be able to get around that by mirroring s4 to s0 first and then detaching s4, so that you're only using s0 and the beginning of the disk... but that's just more changes that could introduce problems.

Needless to say, I wouldn't try this on a system I really needed without: 1) Really good backups! and possibly, 2) Trying it out first on a virtual machine, or different HW.

Personally, unless I really wanted to prove I could do it, I'd just back up and reinstall. ;) sorry.

-Kyle

> -Wyllys
> This message posted from opensolaris.org
> ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
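A compact sketch of the slice-growing procedure Kyle outlines, with the same caveats he gives (untested, destructive if the cylinder arithmetic is wrong, and the slice names are only Wyllys's example layout):

  # format                                   # in the partition menu: delete the freed slice,
                                             # then extend the root slice over its cylinders
  # zpool replace <pool> c1t0d0s4 c1t0d0s4   # re-open the slice so ZFS sees the new size
  # zpool list <pool>                        # capacity should grow (a reboot may be needed,
                                             # as noted later in this thread)

Mirroring to a second disk first, as Kyle suggests, is the safer route whenever a spare disk is available.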
Re: [zfs-discuss] Growing root pool ?
I'm not even trying to stripe it across multiple disks, I just want to add another partition (from the same physical disk) to the root pool. Perhaps that is a distinction without a difference, but my goal is to grow my root pool, not stripe it across disks or enable raid features (for now). Currently, my root pool is using c1t0d0s4 and I want to add c1t0d0s0 to the pool, but can't. -Wyllys Right, that's how it is right now (which the other guy seemed to be suggesting might change eventually, but nobody knows when because it's just not that important compared to other things). AFAIK, if you could shrink the partition whose data is after c1t0d0s4 on the disk, you could grow c1t0d0s4 by that much, and I _think_ zfs would pick up the growth of the device automatically. (ufs partitions can be grown like that, or by being on an SVM or VxVM volume that's grown, but then one has to run a command specific to ufs to grow the filesystem to use the additional space). I think zpools are supposed to grow automatically if SAN LUNs are grown, and this should be a similar situation, anyway. But if you can do that, and want to try it, just be careful. And of course you couldn't shrink it again, either. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SMC Webconsole 3.1 and ZFS Administration 1.0 - stacktraces in snv_b89
Yeah. The command line works fine. I thought it a bit curious that there was an issue with the HTTP interface, though. It's low priority, I guess, because it doesn't really impact the functionality. Thanks for the responses. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SATA controller suggestion
If you're worried about the bandwidth limitations of putting something like the Supermicro card in a PCI slot, how about using an active riser card to convert from PCI-E to PCI-X? One of these, or something similar: http://www.tyan.com/product_accessories_spec.aspx?pid=26 on sale at http://www.amazon.com/dp/B000OH5J9G?smid=ATVPDKIKX0DERtag=nextag-ce-tier2-20linkCode=asn I'm sure you can find something similar for less, and I have seen ones that go from PCI-E x16 to several PCI-X as well. That and the Supermicro are under half the price of the cheapest LSI PCI-E card. Lee This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SATA controller suggestion
On Wed, Jun 11, 2008 at 10:18 AM, Lee [EMAIL PROTECTED] wrote: If your worried about the bandwidth limitations of putting something like the supermicro card in a pci slot how about using an active riser card to convert from PCI-E to PCI-X. One of these, or something similar: http://www.tyan.com/product_accessories_spec.aspx?pid=26 on sale at http://www.amazon.com/dp/B000OH5J9G?smid=ATVPDKIKX0DERtag=nextag-ce-tier2-20linkCode=asn I'm sure you can find something similar for less, and I have seen ones that go from PCI-E x16 to several PCI-X as well. That and the supermicro are under half the price of the cheapest LSI PCI-E card. Lee This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss Are those universal though? I was under the impression it had to be supported by the motherboard, or you'd fry all components involved. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?
Richard L. Hamilton wrote: Whatever mechanism can check at block allocation/deallocation time to keep track of per-filesystem space (vs a filesystem quota, if there is one) could surely also do something similar against per-uid/gid/sid quotas. I suspect a lot of existing functions and data structures could be reused or adapted for most of it. Just one more piece of metadata to update, right? Not as if ufs quotas had zero runtime penalty if enabled. And you only need counters and quotas in-core for identifiers applicable to in-core znodes, not for every identifier used on the zpool. The current quota system does its checking of quota constraints in the DSL (dsl_sync_task_group_sync ends up getting the quota check made) A user based quota system would I believe need to be in the ZPL because that is where we understand what users are. I suspect this means that quotas would probably be easiest implemented per dataset rather than per pool. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
On Wed, 11 Jun 2008, Al Hopper wrote: disk drives. But - based on personal observation - there is a lot of hype surrounding SSD reliability. Obviously the *promise* of this technology is higher performance and *reliability* with lower power requirements due to no (mechanical) moving parts. But... if you look broadly at the current SSD product offerings, you see: a) lower than expected performance - particularly in regard to write IOPS (I/O Ops per Second) and b) warranty periods that are typically 1 year - with the (currently rare) exception of products that are offered with a 5 year warranty. Other than the fact that SSDs eventually wear out from use, SSDs are no different from any other electronic device in that the number of individual parts, and the individual reliability of those parts, results in an overall reliability factor for the subsystem comprised of those parts. SSDs are jam-packed with parts. In fact, if you were to look inside an SSD and then look at how typical computers are implemented these days, you will see that one SSD has a whole lot more complex parts than the rest of the computer. SSDs will naturally become more reliable as their parts count is reduced due to higher integration and product maturity. Large SSD storage capacity requires more parts so large storage devices have less relability than smaller devices comprised of similar parts. SSDs are good for laptop reliability since hard drives tend to fail with high shock levels and laptops are often severely abused. Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SATA controller suggestion
I don't think so, not all of them anyway. They also sell ones that have a proprietary goldfinger, which obviously would not work. The spec does not mention any specific restrictions, just lists the interface types (but it is fairly brief), and you can certainly buy PCI - PCI-E generic adapters: http://virtuavia.eu/shop/pci-express-to-pci-adapter-p29855.html which use a similar bridge chip. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
On Jun 11, 2008, at 11:35 AM, Bob Friesenhahn wrote: On Wed, 11 Jun 2008, Al Hopper wrote: disk drives. But - based on personal observation - there is a lot of hype surrounding SSD reliability. Obviously the *promise* of this technology is higher performance and *reliability* with lower power requirements due to no (mechanical) moving parts. But... if you look broadly at the current SSD product offerings, you see: a) lower than expected performance - particularly in regard to write IOPS (I/O Ops per Second) and b) warranty periods that are typically 1 year - with the (currently rare) exception of products that are offered with a 5 year warranty. Other than the fact that SSDs eventually wear out from use, SSDs are no different from any other electronic device in that the number of individual parts, and the individual reliability of those parts, results in an overall reliability factor for the subsystem comprised of those parts. SSDs are jam-packed with parts. In fact, if you were to look inside an SSD and then look at how typical computers are implemented these days, you will see that one SSD has a whole lot more complex parts than the rest of the computer. SSDs will naturally become more reliable as their parts count is reduced due to higher integration and product maturity. Large SSD storage capacity requires more parts so large storage devices have less relability than smaller devices comprised of similar parts. SSDs are good for laptop reliability since hard drives tend to fail with high shock levels and laptops are often severely abused. Yeah I was going to add the fact that they dont spin at 7k+ rpm and have no 'moving' parts. I do agree that there is a lot of circuitry involved and eventually they will reduce that just like they did with mainboards. Remember how packed they used to be? Either way, I'm really interested in the vendor and technology Sun will choose for providing these SSD's in systems or as an add on card/ drive. -Andy Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?
On Wed, 11 Jun 2008, Richard L. Hamilton wrote: But if you already have the ZAP code, you ought to be able to do quick lookups of arbitrary byte sequences, right? Just assume that a value not stored is zero (or infinity, or uninitialized, as applicable), and you have the same functionality as the sparse quota file on ufs, without the problems. I don't know anything about the ZAP code, but I do know that CPU caches are only so large and there can be many caches for the same data since each CPU has its own cache. Some of us do actual computing using these same CPUs, so it would be nice if they weren't entirely consumed by the filesystem. Current application performance on today's hardware absolutely sucks as compared to its theoretical potential. Let's try to improve that performance rather than adding more cache thrashing, leading to more wait states. Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
Hi All;

Every NAND-based SSD has some RAM. Consumer-grade products will have smaller, non-battery-protected RAM, a smaller number of NAND chips working in parallel, and a slower CPU to distribute the load. Consumer products will also have fewer spare cells. Enterprise SSDs are generally composed of several NAND devices and a lot of spare cells, controlled by a fast microcomputer which also has some cache and a supercapacitor to protect that cache. Regardless of NAND write-cycle capability, vendors can design a reliable SSD by incorporating more spare cells into the design.

Mertol

Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email [EMAIL PROTECTED]

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Richard L. Hamilton
Sent: Wednesday, June 11, 2008 2:58 PM
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SSD reliability, wear levelling, warranty period

btw: it's seems to me that this thread is a little bit OT. I don't think its OT - because SSDs make perfect sense as ZFS log and/or cache devices. If I did not make that clear in my OP then I failed to communicate clearly. In both these roles (log/cache) reliability is of the utmost importance. Older SSDs (before cheap and relatively high-cycle-limit flash) were RAM cache+battery+hard disk. Surely RAM+battery+flash is also possible; the battery only needs to keep the RAM alive long enough to stage to the flash. That keeps the write count on the flash down, and the speed up (RAM being faster than flash). Such a device would of course cost more, and be less dense (given having to have battery+charging circuits and RAM as well as flash), than a pure flash device. But with more limited write rates needed, and no moving parts, _provided_ it has full ECC and maybe radiation-hardened flash (if that exists), I can't imagine why such a device couldn't be exceedingly reliable and have quite a long lifetime (with the battery, hopefully replaceable, being more of a limitation than the flash). It could be a matter of paying for how much quality you want... As for reliability, from zpool(1m): log A separate intent log device. If more than one log device is specified, then writes are load-balanced between devices. Log devices can be mirrored. However, raidz and raidz2 are not supported for the intent log. For more information, see the “Intent Log” section. cache A device used to cache storage pool data. A cache device cannot be mirrored or part of a raidz or raidz2 configuration. For more information, see the “Cache Devices” section. [...] Cache Devices Devices can be added to a storage pool as “cache devices.” These devices provide an additional layer of caching between main memory and disk. For read-heavy workloads, where the working set size is much larger than what can be cached in main memory, using cache devices allow much more of this working set to be served from low latency media. Using cache devices provides the greatest performance improvement for random read-workloads of mostly static content. To create a pool with cache devices, specify a “cache” vdev with any number of devices. For example: # zpool create pool c0d0 c1d0 cache c2d0 c3d0 Cache devices cannot be mirrored or part of a raidz configuration. If a read error is encountered on a cache device, that read I/O is reissued to the original storage pool device, which might be part of a mirrored or raidz configuration. The content of the cache devices is considered volatile, as is the case with other system caches. That tells me that the zil can be mirrored and zfs can recover from cache errors. I think that means that these devices don't need to be any more reliable than regular disks, just much faster. So...expensive ultra-reliability SSD, or much less expensive SSD plus mirrored zil? Given what zfs can do with cheap SATA, my bet is on the latter... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SATA controller suggestion
On Wed, Jun 11, 2008 at 8:21 AM, Tim [EMAIL PROTECTED] wrote: Are those universal though? I was under the impression it had to be supported by the motherboard, or you'd fry all components involved. There are PCI/PCI-X to PCI-e bridge chips available (as well as PCI-e to AGP) and they're part of the spec. As to how well they actually work on a separate riser card, I'm not sure. I like the idea though. This board looks decent if you need a ton of drives. The second x16 slot is actually an x4 electrical, but that's not too shabby for a $100 mobo. http://www.newegg.com/Product/Product.aspx?Item=N82E16813128335 -B -- Brandon High [EMAIL PROTECTED] The good is the enemy of the best. - Nietzsche ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
On Wed, Jun 11, 2008 at 10:35 AM, Bob Friesenhahn [EMAIL PROTECTED] wrote: On Wed, 11 Jun 2008, Al Hopper wrote: disk drives. But - based on personal observation - there is a lot of hype surrounding SSD reliability. Obviously the *promise* of this technology is higher performance and *reliability* with lower power requirements due to no (mechanical) moving parts. But... if you look broadly at the current SSD product offerings, you see: a) lower than expected performance - particularly in regard to write IOPS (I/O Ops per Second) and b) warranty periods that are typically 1 year - with the (currently rare) exception of products that are offered with a 5 year warranty. Other than the fact that SSDs eventually wear out from use, SSDs are no different from any other electronic device in that the number of individual parts, and the individual reliability of those parts, results in an overall reliability factor for the subsystem comprised of those parts. SSDs are jam-packed with parts. In fact, if you were to look inside an SSD and then look at how typical computers are implemented these days, you will see that one SSD has a whole lot more complex parts than the rest of the computer. Agreed - but the effect on overall system reliability is dominated by the required number of interconnections (soldered joints etc), rather than the total number of parts. But we're drifting OT here... SSDs will naturally become more reliable as their parts count is reduced due to higher integration and product maturity. Large SSD storage capacity requires more parts so large storage devices have less relability than smaller devices comprised of similar parts. Again - agreed - but the root problem being addressed is the reduction in the number of *interconnections* - which is directly related to the number of parts. SSDs are good for laptop reliability since hard drives tend to fail with high shock levels and laptops are often severely abused. My personal experience, echoed by numereous others I've talked with, is that a typical laptop drive dies in 18 months - whether the laptop travels or stays fixed on a desktop with occasional travel, for example, in the office all week and brought home for the weekend. For most laptops, the real enemy of laptop disk drive reliability is the operation of the drive at elevated temperatures common inside a laptop - rather than vibration/shock. I don't remember the number - but a vast number of laptops spend the vast majority of their time glued to a desk. Regards, -- Al Hopper Logical Approach Inc,Plano,TX [EMAIL PROTECTED] Voice: 972.379.2133 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
On Wed, Jun 11, 2008 at 4:31 AM, Adam Leventhal [EMAIL PROTECTED] wrote: On Jun 11, 2008, at 1:16 AM, Al Hopper wrote: But... if you look broadly at the current SSD product offerings, you see: a) lower than expected performance - particularly in regard to write IOPS (I/O Ops per Second) True. Flash is quite asymmetric in its performance characteristics. That said, the L2ARC has been specially designed to play well with the natural strengths and weaknesses of flash. and b) warranty periods that are typically 1 year - with the (currently rare) exception of products that are offered with a 5 year warranty. You'll see a new class of SSDs -- eSSDs -- designed for the enterprise with longer warranties and more write/erase cycles. Further, ZFS will do its part by not killing the write/erase cycles of the L2ARC by constantly streaming as fast as possible. You should see lifetimes in the 3-5 year range on typical flash. Obviously, for SSD products to live up to the current marketing hype, they need to deliver superior performance and *reliability*. Everyone I know *wants* one or more SSD devices - but they also have the expectation that those devices will come with a warranty at least equivalent to current hard disk drives (3 or 5 years). I don't disagree entirely, but as a cache device flash actually can be fairly unreliable and we'll pick it up in ZFS. So ... I'm interested in learning from anyone on this list, and, in particular, from Team ZFS, what the reality is regarding SSD reliability. Obviously Sun employees are not going to compromise their employment and divulge upcoming product specific data - but there must be *some* data (white papers etc) in the public domain that would provide some relevant technical data?? A typical high-end SSD can sustain 100k write/erase cycles so you can do some simple math to see that a 128GB device written to at a rate of 150M/s will last nearly 3 years. Again, note that unreliable devices will result in a performance degradation when you fail a checksum in the L2ARC, but the data will still be valid out of the main storage pool. You're going to see much more on this in the next few months. I made a post to my blog that probably won't answer your questions directly, but may help inform you about what we have in mind. http://blogs.sun.com/ahl/entry/flash_hybrid_pools_and_future Adam -- Adam Leventhal, Fishworkshttp://blogs.sun.com/ahl Ahh Haa! So this is the secret project (probably one of many) that you guys have been working on! :) Great post and I really appreciate how this thread has provided lots of interesting stuff to think about. I think that I'll (personally) avoid the initial rush-to-market comsumer level products by vendors with no track record of high tech software development - let alone those who probably can't afford the PhD level talent it takes to get the wear leveling algorithms correct - and then to implement them correctly. Instead I'll wait for a Sun product - from a company with a track record of proven design and *implementation* for enterprise level products (software and hardware). Otherwise, I think that I would be really upset with an SSD device that died every 2+ years - even if it has a 5 year warranty. No one I know would tolerate that kind of system disruption from todays hard disk drives - despite anticipated failures. Its more aggravation that most production oriented systems can simply live without! Again - thanks to all contributors for this interesting thread. 
Regards, -- Al Hopper Logical Approach Inc, Plano, TX [EMAIL PROTECTED] Voice: 972.379.2133 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
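For readers who want to sanity-check the lifetime figure quoted above, a rough worked example (it assumes the 100k write/erase endurance and the 150 MB/s sustained write rate from Adam's post, and it ignores write amplification and wear-leveling overhead):

  128 GB x 100,000 cycles = ~12.8 PB of total writes
  12.8 PB / 150 MB/s      = ~85 million seconds = ~988 days = ~2.7 years

So "nearly 3 years" assumes the device is written flat-out around the clock; a cache device that streams more gently should last correspondingly longer.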
Re: [zfs-discuss] Growing root pool ?
Luckily, my system had a pair of identical 232GB disks. The 2nd wasn't yet used, so by juggling mirrors (create 3 mirrors, detach the one to change, etc...), I was able to reconfigure my disks more to my liking - all without a single reboot or loss of data. I now have 2 pools - a 20GB root pool and a 210GB other pool, each mirrored on the other disk. If not for the extra disk and the wonderful zfs snapshot/send/receive features, it would have taken a lot more time and aggravation to get it straightened out. -Wyllys This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
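For anyone wanting to try the same juggling act, here is a rough sketch of the kind of sequence described above (the pool names, device names and slice layout are invented; the real steps depend entirely on your partitioning):

  # snapshot the data to move and copy it into a new pool on the spare disk
  zfs snapshot -r oldpool/data@move
  zpool create datapool c1t1d0s3
  zfs send oldpool/data@move | zfs receive datapool/data

  # once the copy is verified, free the old slice and re-mirror the new pool
  # onto a matching slice of the first disk
  zpool attach datapool c1t1d0s3 c1t0d0s3

The advantage over dump/restore is that every intermediate step leaves you with a live, importable copy of the data.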
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
Your key problem is going to be: will Sun use SLC or MLC? From what I have read, the trend now is towards MLC chips, which have a much lower number of write cycles but are cheaper and offer more storage. Vendors then end up layering ECC and wear-levelling on top to address the shortened life-span. A lot of the large USB thumb drives now are MLC, and the appallingly slow write speeds and shortened lifespan are a problem; the older 4-gig and 8-gig SLC versions of the same devices are superior. Hard to find this info though.

I would use any SSD in a mirror, assuming there WILL be cells going out over time. This would be true for me with both boot drives and slog devices. You can mirror the log device for ZFS as well. I'm not really clear, though, whether the log device is a big enough win on SSD to warrant all this trouble. Depends on your application. If performance were my god, I would chase after something like a RAMSAN device instead for logging: RAM-based performance with disk as backing store. YMMV.

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
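Mirroring the slog is just one extra word on the command line; a minimal sketch, assuming a pool named tank, two invented SSD device names, and a build recent enough to support separate log devices:

  zpool add tank log mirror c2t0d0 c2t1d0
  zpool status tank

Note that on current builds a log device cannot be removed from the pool once added, so it is worth being sure before you commit to one.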
Re: [zfs-discuss] Growing root pool ?
On Wed, 2008-06-11 at 07:40 -0700, Richard L. Hamilton wrote:
>> I'm not even trying to stripe it across multiple disks, I just want to add another partition (from the same physical disk) to the root pool. Perhaps that is a distinction without a difference, but my goal is to grow my root pool, not stripe it across disks or enable raid features (for now). Currently, my root pool is using c1t0d0s4 and I want to add c1t0d0s0 to the pool, but can't. -Wyllys
>
> Right, that's how it is right now (which the other guy seemed to be suggesting might change eventually, but nobody knows when, because it's just not that important compared to other things). AFAIK, if you could shrink the partition whose data is after c1t0d0s4 on the disk, you could grow c1t0d0s4 by that much, and I _think_ zfs would pick up the growth of the device automatically.

This works. ZFS doesn't notice the size increase until you reboot.

I've been installing systems over the past year with a slice arrangement intended to make it easy to go to zfs root:

  s0  ZFS pool at start of disk
  s1  swap
  s3  UFS boot environment #1
  s4  UFS boot environment #2
  s7  SVM metadb (if mirrored root)

I was happy to discover that this paid off. Once I had upgraded a BE to nv_90 and was running on it, it was a matter of:

  lucreate -p $pool -n nv_90zfs
  luactivate nv_90zfs
  init 6                            (reboot)
  ludelete <other BEs>
  format -> partition: delete slices other than s0, grow s0 to full disk
  reboot

and you're all ZFS all the time.

- Bill ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
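A quick way to confirm that the root pool really did pick up the larger s0 after that final reboot (the pool name rpool is assumed here) is to compare the pool size before and after:

  zpool list rpool
  zpool status rpool

The SIZE column from zpool list should jump to roughly the full disk once the slice has been grown and the box rebooted.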
Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?
This is one of those issues where the developers generally seem to think that old-style quotas are legacy baggage, and that people running large home-directory servers with 10,000+ users are a minority that can safely be ignored. I can understand their thinking. However, it does represent a problem here at the University of California, Davis. I would love to replace our Solaris 9 home-directory server with one running Solaris 10 and ZFS; a past issue with UFS corruption keeps us all nervous. But there is no real alternative to UFS+quotas as yet, it seems. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
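As the subject line suggests, the workaround people use today is one filesystem per user with a per-filesystem quota, which is easy enough to script; a rough sketch (the pool name, mountpoints and the 2G figure are all made up for illustration):

  zfs create tank/home
  for u in alice bob carol; do
      zfs create -o quota=2G -o mountpoint=/export/home/$u tank/home/$u
  done

Whether 10,000 datasets per server - and the resulting mount and share times - are actually manageable is exactly the open question in this thread.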
Re: [zfs-discuss] Growing root pool ?
I had a similar configuration until my recent reinstall of snv_91. Now I have just 2 ZFS pools - one for root+boot (big enough to hold multiple BEs and do LiveUpgrades) and another for the rest of my data. -Wyllys This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/receive issue
see: http://bugs.opensolaris.org/view_bug.do?bug_id=6700597 This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
On Wed, Jun 11, 2008 at 01:51:17PM -0500, Al Hopper wrote:
> I think that I'll (personally) avoid the initial rush-to-market consumer-level products by vendors with no track record of high-tech software development - let alone those who probably can't afford the PhD-level talent it takes to get the wear-leveling algorithms correct - and then to implement them correctly. Instead I'll wait for a Sun product - from a company with a track record of proven design and *implementation* for enterprise-level products (software and hardware).

Wear leveling is actually a fairly mature technology. I'm more concerned with what will happen as people continue pushing these devices out of the consumer space and into the enterprise, where stuff like failure modes and reliability matters in a completely different way. If my iPod sucks that's a hassle, but it's a different matter if an SSD hangs an I/O request on my enterprise system.

Adam -- Adam Leventhal, Fishworks http://blogs.sun.com/ahl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS conflict with MAID?
A Darren Dunham wrote:
> On Tue, Jun 10, 2008 at 05:32:21PM -0400, Torrey McMahon wrote:
>> However, some apps will probably be very unhappy if i/o takes 60 seconds to complete.
>
> It's certainly not uncommon for that to occur in an NFS environment. All of our applications seem to hang on just fine for minor planned and unplanned outages. Would the apps behave differently in this case? (I'm certainly not thinking of a production database for such a configuration.)

Some applications have their own internal timers that track i/o time and, if it doesn't complete in time, will error out. I don't know which part of the stack the timer was in, but I've seen an Oracle RAC cluster on QFS time out much faster than the SCSI retries normally allow for. (I think it was Oracle in that case...) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs receive - list contents of incremental stream?
Thanks, Matt. Are you interested in feedback on various questions regarding how to display results? On list or off? Thanks. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS root boot failure?
So I decided to test out failure modes of ZFS root mirrors. Installed on a V240 with nv90. Worked great. Pulled out disk1, then replaced it and attached again, resilvered, all good. Now I pull out disk0 to simulate failure there. OS up and running fine, but lots of error messages about SYNC CACHE. Next I decided to init 0, reinsert disk 0, and reboot. Uh oh!

  Probing system devices
  Probing memory
  Probing I/O buses

  Sun Fire V240, No Keyboard
  Copyright 2007 Sun Microsystems, Inc. All rights reserved.
  OpenBoot 4.22.33, 8192 MB memory installed, Serial #54881337.
  Ethernet address 0:3:ba:45:6c:39, Host ID: 83456c39.

  Rebooting with command: boot
  Boot device: /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a  File and args:
  SunOS Release 5.11 Version snv_90 64-bit
  Copyright 1983-2008 Sun Microsystems, Inc. All rights reserved.
  Use is subject to license terms.
  NOTICE: ***
  * This device is not bootable!
  * It is either offlined or detached or faulted.
  * Please try to boot from a different device.
  ***
  NOTICE: spa_import_rootpool: error 22
  Cannot mount root on /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a fstype zfs

  panic[cpu1]/thread=180e000: vfs_mountroot: cannot mount root

  0180b950 genunix:vfs_mountroot+348 (600, 200, 800, 200, 1874800, 12b6000)
    %l0-3: 0001d524 0064 0001d4c0 1d4c
    %l4-7: 05dc 1770 0640 018c7000
  0180ba10 genunix:main+b4 (1815000, 180c000, 1837240, 18151f8, 1, 180e000)
    %l0-3: 01838258 70002000 010bfc00
    %l4-7: 0183c400 0001 0180c000 01837c00

  skipping system dump - no dump device configured
  rebooting...

This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS root boot failure?
Sounds correct to me. The disk isn't sync'd, so boot should fail. If you pull disk0, or set disk1 as the primary boot device, what does it do? You can't expect it to resilver before booting.

On 6/11/08, Vincent Fox [EMAIL PROTECTED] wrote:
> [original report and console transcript quoted in full above]

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
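For what it's worth, the "set disk1 as the primary boot device" part is just OpenBoot configuration; a rough sketch from the ok prompt (the disk0/disk1 aliases are assumed to exist, as they normally do on a V240):

  ok printenv boot-device
  ok setenv boot-device disk1 disk0
  ok boot disk1

With a ZFS root mirror the important thing is simply that the half you boot from holds the newest copy of the pool.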
Re: [zfs-discuss] ZFS Quota question
Glaser, David [EMAIL PROTECTED] writes:
> Hi all, I'm new to the list and I thought I'd start out on the right foot. ZFS is great, but I have a couple of questions. I have a Try-n-buy x4500 with one large zfs pool with 40 1TB drives in it. The pool is named backup. Within this pool, I have a number of filesystems:
>
>   backup/clients
>   backup/clients/bob
>   backup/clients/daniel
>   ...
>
> Now bob and daniel are populated by rsync over ssh to synchronize filesystems with client machines (the data will then be written to an SL500). I'd like to set the quota on backup/clients to some arbitrarily small amount. That seems pretty handy, since nothing should go into backup/clients directly, only into the child filesystems backup/clients/*. But when I set the quota on backup/clients, I am unable to increase the quota for the sub-filesystems (bob, daniel, etc). Any ideas if this is possible or how to do it?

Sounds like you want refquota. From zfs(1M):

  refquota=size | none
    Limits the amount of space a dataset can consume. This property enforces a hard limit on the amount of space used. This hard limit does not include space used by descendents, including file systems and snapshots.

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
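A hedged example of how that plays out for the layout above (the sizes are invented, and this assumes a build new enough to have the refquota property):

  # the parent dataset itself can hold almost nothing...
  zfs set refquota=100M backup/clients
  # ...while the children get whatever limits you actually want
  zfs set quota=2T backup/clients/bob
  zfs set quota=2T backup/clients/daniel

Unlike quota, refquota on backup/clients does not count space used by its descendents, so the children's quotas can be raised independently of the parent's limit.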
Re: [zfs-discuss] ZFS root boot failure?
Ummm, could you back up a bit there? What do you mean, the disk isn't sync'd so boot should fail? I'm coming from UFS, of course, where I'd expect to be able to fix a damaged boot drive after it drops into a single-user root prompt.

I believe I did try boot disk1, but that failed, I think due to an earlier trial with it where I scrambled it with dd and then resilvered, and then removed it, replaced it, and resilvered again. I think I ended up with an unusable boot sector on disk1; it didn't boot, but I didn't copy the message down, sorry. I suppose all that would have been left is to boot from media or a jumpstart server in single-user mode and attempt repairs. Unfortunately I have since re-jumpstarted the system clean. This was plain nv90 both times, by the way - no /etc/system tweaks.

I have to pull the motherboard on the V240 and replace it tomorrow; maybe on Friday I will be able to repeat my experiment. I just wanted to run through some failure modes so I know what to expect when boot drives die on me. Thanks! This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS root boot failure?
Vincent Fox wrote:
> So I decided to test out failure modes of ZFS root mirrors. Installed on a V240 with nv90. Worked great. Pulled out disk1, then replaced it and attached again, resilvered, all good. Now I pull out disk0 to simulate failure there. OS up and running fine, but lots of error messages about SYNC CACHE. Next I decided to init 0, and reinsert disk 0, and reboot. Uh oh!

This is actually very good. It means that ZFS recognizes that there are two out-of-sync mirrors and you booted from the oldest version. What happens when you change the boot order?
-- richard

> [console transcript quoted in full in the original report above]

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS root boot failure?
Vincent Fox wrote:
> [previous message quoted in full above]

Sequence-of-events failures are one of the most common fatal errors in complex systems. In this case, you induced a failure mode we call amnesia. It works like this. Consider a system with two (!) mirrored disks (A, B) working normally and in sync:

  - At time0, disconnect disk A. It still contains a view of the system state, but is not accessible by the system.
  - At time1, the system gives up on disk A and proceeds using disk B. Now the two disks are no longer in sync, and the data on disk B is newer than the data on disk A.
  - At time2, shut down the system and re-attach disk A.

The correct behaviour is that disk A is old and its data should be ignored until repaired; disk B should be the primary, authoritative view of the system state. This failure mode is called amnesia because disk A doesn't remember the changes that should have occurred if it had been an active, functional member of the system.

AFAIK, SVM will not handle this problem well. ZFS and Solaris Cluster can detect it because the configuration metadata knows the time difference (ZFS can detect it by the latest txg). I predict that if you had booted from disk B, then it would have worked (but I don't have the hardware setup to test this tonight).

NB, for those who don't know about SPARC boot sequences: the OpenBoot program has a default boot device list and will try the first device, then the second, and so on. This is similar to how most BIOSes work. While you wouldn't normally expect to need to worry about this, it makes a difference in the case of amnesia.
-- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
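If you ever want to see which half of a root mirror is the stale one, the ZFS label on each device records the last txg it took part in; a rough sketch (the device names are invented, and the exact zdb -l output format varies between builds):

  zdb -l /dev/dsk/c1t0d0s0 | grep txg
  zdb -l /dev/dsk/c1t1d0s0 | grep txg

The side reporting the lower txg is the amnesiac one and should be resilvered from the other.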
Re: [zfs-discuss] ZFS conflict with MAID?
Torrey McMahon wrote:
> A Darren Dunham wrote:
>> On Tue, Jun 10, 2008 at 05:32:21PM -0400, Torrey McMahon wrote:
>>> However, some apps will probably be very unhappy if i/o takes 60 seconds to complete.
>>
>> It's certainly not uncommon for that to occur in an NFS environment. All of our applications seem to hang on just fine for minor planned and unplanned outages. Would the apps behave differently in this case? (I'm certainly not thinking of a production database for such a configuration.)
>
> Some applications have their own internal timers that track i/o time and, if it doesn't complete in time, will error out. I don't know which part of the stack the timer was in, but I've seen an Oracle RAC cluster on QFS time out much faster than the SCSI retries normally allow for. (I think it was Oracle in that case...)

Oracle bails out after 10 minutes (ORA-27062) - ask me how I know... :-P
-- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss