Re: [zfs-discuss] part of active zfs pool error message reports incorrect device
I'd agree with export/import *IF* the drive should be good; however, I have a drive that was pulled from the pool a long time ago (to flash the drive) - the data on it is useless. Exporting/importing would cause either a) errors, or b) a scrub to be run, which should overwrite it completely. There seems to be a need for some simple method to say: FORCE this drive out of the pool.

Steve Radich
www.BitShop.com

-- This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
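For reference, a hedged sketch of the existing commands that come closest to forcing a device out without an export/import cycle. The pool and device names here are assumptions for illustration:

```shell
# Stop ZFS from issuing any further I/O to the dead drive
zpool offline tank c1t2d0

# Permanently drop it from a mirror vdev (mirrors only)
zpool detach tank c1t2d0

# Or swap in a replacement and let the pool resilver onto it
zpool replace tank c1t2d0 c1t3d0
```

None of these is quite the one-step "force this drive out" the poster asks for; `detach` only works on mirror vdevs, and `replace` requires a substitute device.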
Re: [zfs-discuss] ZFS Defragmentation - will it happen?
The way I understand it, this requires bp rewrite, and it's being worked on.

On Sat, Jan 23, 2010 at 5:40 AM, Colin Raven co...@clearcutnetworks.com wrote:
Can anyone comment on the likelihood of ZFS defrag becoming a reality some day? If so, any approximation as to when? I realize this isn't exactly a trivial endeavor, but it sure would be nice to see.
Re: [zfs-discuss] hard drive choice, TLER/ERC/CCTL
Thanks a lot. I'd looked at SO many different RAID boxes and never had a good feeling about them from the point of view of data safety, so when I read the 'A Conversation with Jeff Bonwick and Bill Moore – The future of file systems' article (http://queue.acm.org/detail.cfm?id=1317400), I was convinced that ZFS sounded like what I needed, and thought I'd try to help others see how good ZFS was and how to build their own home systems. Publishing the notes as articles had the side benefit of allowing me to refer back to them when I was reinstalling a new SXCE build afresh... :)

It's good to see that you've been able to set the error reporting time using HDAT2 for your Samsung HD154UI drives, but it is a pity that the change does not persist through cold starts. From a brief look, it appears the utility runs under DOS, so I wonder if it would be possible to convert the code into C and run it immediately after OpenSolaris has booted? That would seem a reasonable automated workaround. I might take a little look at the code.

However, the big questions still remain:
1. Does ZFS benefit from shorter error reporting times?
2. Do shorter error reporting times provide any significant data safety, for example by preventing ZFS from kicking a drive from a vdev?

Those are the questions I would like to hear somebody answer authoritatively.

Cheers,
Simon
http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
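One possible run-at-boot workaround, sketched here as an assumption: recent versions of smartmontools can set the drive's SCT Error Recovery Control timer with `smartctl -l scterc`, which covers the same ground as the HDAT2 setting on drives that support SCT. Whether the HD154UI honours it, and the device names below, are assumptions:

```shell
# Re-apply a 7-second (70 decisecond) error-recovery timeout at
# every boot, since the setting is usually volatile across cold
# starts. Drive list is illustrative.
for d in c7t0d0 c7t1d0 c7t2d0; do
    smartctl -l scterc,70,70 /dev/rdsk/${d}s0
done
```

This could be dropped into an rc script or an SMF service so it runs before the pool sees heavy I/O.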
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Interesting question. The answer I came to, perhaps through lack of information and experience, is that there isn't a best 1.5TB drive. I decided that 1.5TB is too big, and that it's better to use more, smaller devices so I could get to raidz3.

The reasoning came after reading the case for triple-parity RAID, with its curves showing time to failure versus time to resilver a single lost drive. Time to failure will remain constant-ish, while time to resilver will increase as the number of bits inside a single drive increases, largely because the input/output bandwidth is going to increase only very slowly. The bigger the number of bits in a single drive compared to the time to write a new, full disk's worth of bits, the bigger the window for a second-drive failure. Hence the third parity version is desirable. In general, the more drives of smaller capacity within a vdev (within reason), the less exposure to a double fault.

This led me to look at sub-terabyte drives, and that's how I accidentally found those 0.75TB raid-rated drives, although 'raid rated' wasn't what I was looking for. I was after the best cost/bit in a six-drive batch with a top cost limit. After reading through the best-practices stuff, I clumsily decided that a six- or seven-drive raidz3 would be a good idea. And I have a natural leaning to stay !OFF! the leading edge of technology where keeping data reliable is involved - a personal quirk I learned by getting several scars to remind me.

How's that for a mishmash of misunderstanding? 8-)
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
I just took a look at customer feedback on this drive here: 36% rate it with one star, which I would consider alarming. Take a look here, ordered from lowest rating to highest; note the recency of the comments and the descriptions:

Seagate Barracuda LP ST31500541AS 1.5TB 5900 RPM:
http://www.newegg.com/Product/ProductReview.aspx?Item=22-148-412&SortField=3&SummaryType=0&Pagesize=10&SelectedRating=-1&PurchaseMark=&VideoOnlyMark=False&VendorMark=&Page=1&Keywords=%28keywords%29

Is this the model you mean? If so, I might look at some other alternative possibilities.

So, we have apparently problematic newest-revision WD Green 'EADS' and 'EARS' models, and an apparently problematic Seagate model described here. That leaves Hitachi and Samsung.

I had past 'experiences' with post-IBM 'Deathstar' Hitachi drives, so I think for now I shall be looking into the Samsungs, as from the customer reviews it seems these could be the most reliable consumer-priced high-capacity drives available right now.

It does seem to be proving a big challenge for the drive manufacturers to produce reliable high-capacity consumer-priced drives. Maybe this is Samsung's opportunity to prove how good they are?

Samsung 1.5TB HD154UI 3-platter drive:
http://www.newegg.com/Product/Product.aspx?Item=N82E16822152175&Tpk=HD154UI
Samsung 2TB HD203WI 4-platter drive:
http://www.newegg.com/Product/Product.aspx?Item=N82E16822152202&Tpk=HD203WI
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Sat, 23 Jan 2010, R.G. Keen wrote:
The reasoning came after reading the case for triple-parity raid. ...snip... Hence, the third parity version is desirable.

Resilver time is definitely an important criterion. Besides the number of raw bits to transfer from the drive, you will also find that today's super-capacity SATA drives rotate more slowly, which increases access times. Since resilver is done in (roughly) the order that data was written, access time will be important to resilver times. A pool which has aged due to many snapshots, file updates, and file deletions will require more seeks. The smaller drives are more responsive, so their improved access time will help reduce resilver times. In other words, I think that you are making a wise choice. :-)

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] ZFS filesystem lock after running auto-re plicate.ksh - how to clear?
Fletcher Cocquyt fcocquyt at stanford.edu writes:
I found this script for replicating zfs data: http://www.infrageeks.com/groups/infrageeks/wiki/8fb35/zfs_autoreplicate_script.html - I am testing it out in the lab with b129. It errored out the first run with some syntax error about the send component (recursive needed?) ..snip.. How do I clear the lock - I have not been able to find documentation on this... thanks!

Hi, as one helpful user pointed out, the lock is not from ZFS, but an attribute set by the script to prevent contention (multiple simultaneous replications, etc.). I used zfs get/set to clear the attribute and I was able to replicate the initial dataset - still working on the incrementals! Thanks!
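The zfs get/set approach described above can be sketched like this; the property name below is illustrative, since each script version may use its own (check the script source for the exact name):

```shell
# List the dataset's user properties to find the script's lock flag
zfs get all tank/data | grep -i lock

# Clear a user property; 'zfs inherit' removes a locally-set
# user property entirely (property name is an assumption)
zfs inherit com.example:locked tank/data
```

Clearing with `zfs inherit` is usually cleaner than `zfs set ...=false`, since it removes the property rather than leaving a stale value behind.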
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
In general I agree completely with what you are saying. Making reliable large-capacity drives does appear to have become very difficult for the drive manufacturers, judging by the many sad comments from drive buyers on popular, highly-trafficked sales outlets' websites like Newegg.

And I think your 750GB choice should be a good one. I'm currently using 750GB drives (WD7500AAKS) and they have worked flawlessly over the last 2 years. But knowing that drives don't last forever, it's time I looked for some new ones, assuming they can reasonably be judged reliable from customer ratings and reports. If there's one manufacturer that *may* possibly have proved the exception, it might be Samsung with their 1.5TB and 2TB drives -- see my post just a little further up.

And using triple-parity RAID-Z3 does seem a good idea now when using these higher-capacity drives. Or perhaps RAID-Z2 with a hot spare? I don't know which is better -- I guess RAID-Z3 is better, AND having a spare available ready to replace a failed drive when it happens. But I think I read that drive bearings can seize up if left unused, so I don't know. Any comments?

For resilvering to be required, I presume this will occur mostly in the event of a mechanical failure. Soft failures like bad sectors will presumably not require resilvering of the whole drive, as these types of error are probably easily fixable by re-writing the bad sector(s) elsewhere using available parity data in redundant arrays. So in this case larger capacities and resilvering time shouldn't become an issue, right?

And there's one big item of huge importance here, which is often overlooked: one should always have a reasonably current backup available. Home RAID users often pay out the money for a high-capacity NAS and then think they're safe from failure, but a backup is still required to guard against loss.
I do have a separate Solaris / ZFS machine dedicated to backups, but I admit to not using it enough -- something I should improve. It contains a backup, but an old one. Part of the reason is that, to save money, I filled it with old drives of varying capacity in a *non-redundant* config to maximise the available space from smaller drives mixed with larger drives. Being non-redundant, I shouldn't depend on its integrity, as there is a high likelihood of it containing multiple latent errors (bit rot).

What might be a good idea for a backup box is to use a large case to house all your old drives in multiple matched-capacity redundant vdevs. This way, each time you upgrade, you can still make use of your old drives in your backup machine without disturbing the backup pool - i.e. simply adding a new vdev each time...
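Growing the backup pool one matched vdev at a time, as described above, is a single command per upgrade cycle. A sketch, with pool and device names as assumptions:

```shell
# Add another mirror of two matched-capacity retired drives to the
# backup pool; existing vdevs and the data on them are untouched
zpool add backup mirror c5t0d0 c5t1d0

# Confirm the new vdev appears alongside the old ones
zpool status backup
```

Note that `zpool add` is permanent: vdevs cannot be removed from the pool afterwards, so it is worth double-checking the device names before running it.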
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Jan 23, 2010, at 12:04, Simon Breden wrote: And I think your 750GB choice should be a good one. I'm currently using 750GB drives (WD7500AAKS) and they have worked flawlessly over the last 2 years. But knowing that drives don't last forever, it's time I looked for some new ones, assuming they can be reasonably assumed to be reliable from customer ratings and reports. If there's one manufacturer that *may* possibly have proved the exception, it might be Samsung with their 1.5TB and 2TB drives -- see my post just a little further up. Have your storage needs expanded such that you've outgrown your current capacity? It may seem counter-intuitive, but is it worth considering replacing your current 750 GB drives with newer 750 GB drives, instead of going to a larger size? Would simply buying new drives be sufficient to get a new warranty, and presumably a device that has less wear on it? More is not always better (though it is more :). For resilvering to be required, I presume this will occur mostly in the event of a mechanical failure. Soft failures like bad sectors will presumably not require resilvering of the whole drive to occur, as these types of error are probably easily fixable by re-writing the bad sector(s) elsewhere using available parity data in redundant arrays. So in this case larger capacities and resilvering time shouldn't become an issue, right? Correct. Though it's recommended to run a 'scrub' on a regular (weekly?) basis to make sure data corruption / bit flipping is caught early. This will take some time and eat I/O, but can be done during low traffic times (overnight?). Scrubbing (like resilvering) is only done over used blocks, and not over the entire drive(s). And there's one big item of huge importance here, which is often overlooked by people, and that is the fact that one should always have a reasonably current backup available. 
Home RAID users often pay out the money for a high-capacity NAS and then think they're safe from failure, but a backup is still required to guard against loss.

Depends on what the NAS is used for. It may be the backup volume for the desktops / laptops of the house, in which case it's not /that/ essential for a backup of the backup to be done -- though copying the data to an external drive regularly, and taking that offsite (work), would be useful in the case of fire or burglary. Of course, if the NAS is the 'primary' data store for any data, and you're not replicating that data anywhere, you're tempting fate. There are two types of computer users: those who have experienced catastrophic data failure, and those who will.

I use OS X at home and have a FireWire drive for Time Machine, but I also purchased a FW dock and two stand-alone hard drives, and I use SuperDuper! to clone my boot volume to one of them every Sunday. Then on Monday I take that drive (Disk A) to work, and bring back the one I have there (Disk B). The syncing takes about 25 minutes each week with minimal effort (plug in drive, launch SD!, press Copy).
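The regular scrub recommended earlier in this thread is easy to schedule during low-traffic hours. A minimal root crontab sketch, assuming a pool named 'tank':

```
# /var/spool/cron/crontabs/root entry:
# scrub the pool every Sunday at 03:00
0 3 * * 0 /usr/sbin/zpool scrub tank
```

`zpool scrub` returns immediately and the scrub runs in the background; `zpool status tank` shows its progress and any checksum errors it repaired.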
Re: [zfs-discuss] zero out block / sectors
On Fri, Jan 22, 2010 at 2:42 PM, Cindy Swearingen cindy.swearin...@sun.com wrote:
Hi John, You might check with the virtualguru, Rudolf Kutina.

Unfortunately, Rudolf's last day at Sun was Jan 15th:
http://blogs.sun.com/VirtualGuru/entry/kiss_of_dead_my_last
You can still catch up with him at his new blog: http://virtguru.wordpress.com/

.snip..

Regards,
-- Al Hopper
Logical Approach Inc, Plano, TX
a...@logical-approach.com
Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On January 23, 2010 8:04:50 AM -0800 R.G. Keen k...@geofex.com wrote:
The answer I came to, perhaps through lack of information and experience, is that there isn't a best 1.5tb drive. I decided that 1.5tb is too big, and that it's better to use more and smaller devices so I could get to raidz3.

Please explain. I don't understand how smaller devices get you to raidz3. With smaller devices, you probably have less need for raidz3, as you have more redundancy? It's the larger drives that force you to add more parity.

-frank
[zfs-discuss] nfs mounts don't follow child filesystems?
I thought with NFS4 *on solaris* that clients would follow the zfs filesystem hierarchy and mount sub-filesystems. That doesn't seem to be happening and I can't find any documentation on it (either way). Did I only dream up this feature or does it actually exist? I am using s10_u8. thanks -frank
Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool
Hi, I found some time and was able to test again:
- verify with unique guid of the device
- verify with autoreplace = off

Indeed, autoreplace was set to yes for the pools, so I disabled it:

VOL     PROPERTY     VALUE  SOURCE
nxvol2  autoreplace  off    default

I erased the labels on the cache disk and added it again to the pool. Now both cache disks have different guids:

# cache device in node1
r...@nex1:/volumes# zdb -l -e /dev/rdsk/c0t2d0s0
LABEL 0
    version=14
    state=4
    guid=15970804704220025940

# cache device in node2
r...@nex2:/volumes# zdb -l -e /dev/rdsk/c0t2d0s0
LABEL 0
    version=14
    state=4
    guid=2866316542752696853

The guids are different. However, after switching the pool nxvol2 to node1 (where nxvol1 was active), the disk is still picked up as a cache device:

# nxvol2 switched to this node ...
volume: nxvol2
state: ONLINE
scrub: none requested
config:
        NAME        STATE    READ WRITE CKSUM
        nxvol2      ONLINE      0     0     0
          mirror    ONLINE      0     0     0
            c3t10d0 ONLINE      0     0     0
            c4t13d0 ONLINE      0     0     0
          mirror    ONLINE      0     0     0
            c3t9d0  ONLINE      0     0     0
            c4t12d0 ONLINE      0     0     0
          mirror    ONLINE      0     0     0
            c3t8d0  ONLINE      0     0     0
            c4t11d0 ONLINE      0     0     0
          mirror    ONLINE      0     0     0
            c3t18d0 ONLINE      0     0     0
            c4t22d0 ONLINE      0     0     0
          mirror    ONLINE      0     0     0
            c3t17d0 ONLINE      0     0     0
            c4t21d0 ONLINE      0     0     0
        cache
          c0t2d0    FAULTED     0     0     0  corrupted data

# nxvol1 was active here before ...
n...@nex1:/$ show volume nxvol1 status
volume: nxvol1
state: ONLINE
scrub: none requested
config:
        NAME        STATE    READ WRITE CKSUM
        nxvol1      ONLINE      0     0     0
          mirror    ONLINE      0     0     0
            c3t15d0 ONLINE      0     0     0
            c4t18d0 ONLINE      0     0     0
          mirror    ONLINE      0     0     0
            c3t14d0 ONLINE      0     0     0
            c4t17d0 ONLINE      0     0     0
          mirror    ONLINE      0     0     0
            c3t13d0 ONLINE      0     0     0
            c4t16d0 ONLINE      0     0     0
          mirror    ONLINE      0     0     0
            c3t12d0 ONLINE      0     0     0
            c4t15d0 ONLINE      0     0     0
          mirror    ONLINE      0     0     0
            c3t11d0 ONLINE      0     0     0
            c4t14d0 ONLINE      0     0     0
        cache
          c0t2d0    ONLINE      0     0     0

So this happens with and without autoreplace, and with different guids on the devices.
Re: [zfs-discuss] zero out block / sectors
Mike Gerdts wrote:
On Fri, Jan 22, 2010 at 1:00 PM, John Hoogerdijk john.hoogerd...@sun.com wrote:
Is there a way to zero out unused blocks in a pool? I'm looking for ways to shrink the size of an OpenSolaris VirtualBox VM, and using the compact subcommand will remove zeroed sectors.

I've long suspected that you should be able to just use mkfile or dd if=/dev/zero ... to create a file that consumes most of the free space, then delete that file. Certainly it is not an ideal solution, but it seems quite likely to be effective.

I tried this with mkfile - no joy.
jmh
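The dd-based variant of the approach described above can be sketched like this. The `TARGET` path and the small `count=` are assumptions for demonstration only; in real use you would point TARGET at a directory inside the pool and drop `count=` so dd runs until the filesystem is full:

```shell
# Fill free space with zeros, flush to disk, then delete, so the
# VM's "compact" step can reclaim the zeroed sectors.
TARGET=${TARGET:-/tmp/zerofill-demo}
mkdir -p "$TARGET"
# demo writes only 8 MiB; drop count= to fill the filesystem
dd if=/dev/zero of="$TARGET/zeros" bs=1M count=8 2>/dev/null
sync
rm "$TARGET/zeros"
sync
```

One caveat, which may explain the "no joy" above: ZFS compression (if enabled) stores runs of zeros as holes, so the zeros may never reach the underlying virtual disk at all.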
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Reading through your post brought back many memories of how I used to manage my data. I also found SuperDuper and Carbon Copy Cloner great for making a duplicate of my Mac's boot drive, which also contained my data. After juggling around with cloning boot/data drives and using non-redundant Time Machine backups etc., plus some manual copies here and there, I said 'there must be a better way', and so the long search ended with the idea of having fairly 'dumb' boot drives containing the OS and apps for each desktop PC, and moving the data itself onto a redundant RAID NAS using ZFS. I won't bore you with the details any more -- see the link below if it's interesting. BTW, I still use SuperDuper for cloning my boot drive and it IS terrific.

Regardless of where the data is, one still needs to do backups, like you say. Indeed, I know all about scrub and do that regularly; it is a great tool to guard against silent failure, aka bit rot. Once your data is centralised, making data backups becomes easier, although other problems like the human factor still come into play :)

If I left my backup system switched on 24/7, it would in theory be fairly easy to (1) automate NAS snapshots and then (2) automate zfs sends of the incremental differences between snapshots, but I don't want to spend the money on electricity for that. And when buying drives every few years, I always try to take advantage of Moore's law.

Cheers,
Simon
http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
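The snapshot-then-incremental-send scheme mentioned above is only a few lines of shell. Dataset, hostname, and snapshot-naming convention here are assumptions:

```shell
# Nightly job sketch: take a snapshot, then send only the delta
# since the previous snapshot to the backup box.
PREV=20100122
TODAY=$(date +%Y%m%d)

zfs snapshot -r tank/data@$TODAY
zfs send -i tank/data@$PREV tank/data@$TODAY | \
    ssh backuphost zfs receive -F backup/data
```

A real script would record the last successfully sent snapshot name (rather than hard-coding `PREV`) and prune old snapshots on both sides.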
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Sat, 23 Jan 2010, Simon Breden wrote:
I just took a look at customer feedback on this drive here. 36% rate it with one star, which I would consider alarming. ...snip... Seagate Barracuda LP ST31500541AS 1.5TB 5900 RPM. Is this the model you mean? If so, I might look at some other alternative possibilities.

This looks like a really good drive for use with zfs. Be sure to use a mirror configuration, and keep in mind that zfs supports an arbitrary number of mirrors, so you can run six or ten of these drives in parallel so that there are enough working drives remaining to keep up with the RMAed units. Be sure to mark any failed drive using a sledgehammer so that you don't accidentally use it again by mistake.

Bob
Re: [zfs-discuss] zero out block / sectors
On Sat, Jan 23, 2010 at 11:55 AM, John Hoogerdijk john.hoogerd...@sun.com wrote:
...snip...
I tried this with mkfile - no joy.

Let me ask a couple of the questions that come just after "are you sure your computer is plugged in?" Did you wait enough time for the data to be flushed to disk (or do sync and wait for it to complete) prior to removing the file? You did mkfile $huge /var/tmp/junk, not mkfile -n $huge /var/tmp/junk, right?

If not, I suspect that zpool replace to a thin-provisioned disk is going to be your best bet (as suggested in another message).

-- Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] zero out block / sectors
Mike Gerdts wrote:
...snip...
Let me ask a couple of the questions that come just after "are you sure your computer is plugged in?"

:-)

Did you wait enough time for the data to be flushed to disk (or do sync and wait for it to complete) prior to removing the file?

Yep - on the order of minutes, and synced on the host as well. The host is a Mac.

You did mkfile $huge /var/tmp/junk not mkfile -n $huge /var/tmp/junk, right?

Yep.

If not, I suspect that zpool replace to a thin-provisioned disk is going to be your best bet (as suggested in another message).

Next on my list.
Re: [zfs-discuss] ZFS filesystem lock after running auto-replicate.ksh - how to clear?
On Sat, Jan 23, 2010 at 8:44 AM, Fletcher Cocquyt fcocq...@stanford.edu wrote:
...snip...
I used zfs get/set to clear the attribute and I was able to replicate the initial dataset - still working on the incrementals! thanks!

As the person who put in the original code for the ZFS lock/depend checks: the script is relatively simple, and it seems Infrageeks added some better documentation, which is very helpful. You'll want to make sure your remote side doesn't differ, i.e. has the same current snapshots as the sender side. If the replication fails for some reason, unlock both sides with 'zfs set'. What problems are you experiencing with the incrementals?

-- Brent Jones
br...@servuhome.net
Re: [zfs-discuss] nfs mounts don't follow child filesystems?
Frank Cusack wrote: I thought with NFS4 *on solaris* that clients would follow the zfs filesystem hierarchy and mount sub-filesystems. That doesn't seem to be happening and I can't find any documentation on it (either way). Did I only dream up this feature or does it actually exist? I am using s10_u8.

Hi Frank, Solaris Nevada does this in build 77 or later, but it has not been backported to a Solaris 10 update.

Rob T
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Just to jump in: did you guys ever consider short-stroking a larger SATA disk? I'm not familiar with this, but have read a lot about it. Since the drive cache gets larger on the bigger drives, bringing a disk back to roughly 25% of its capacity would give a better cache ratio and less seek time. So 2TB would become 500GB, but better than a normal 500GB SATA. (Or in your case, swing it down to 750GB.)

Regards,
Armand

- Original Message -
From: Simon Breden sbre...@gmail.com
To: zfs-discuss@opensolaris.org
Sent: Saturday, January 23, 2010 7:53 PM
Subject: Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

Hi Bob,
Why do you consider that model a good drive? Why do you like to use mirrors instead of something like RAID-Z2 / RAID-Z3? And how many drives do you (recommend to) use within each mirror vdev?
Cheers,
Simon
http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Sat, 23 Jan 2010, Simon Breden wrote:
Why do you consider that model a good drive?

This is a good model of drive to test zfs's redundancy/resiliency support. It is surely recommended for anyone who does not have the resources to simulate drive failure.

Why do you like to use mirrors instead of something like RAID-Z2 / RAID-Z3?

Because raidz3 only supports triple redundancy, but mirrors can support much more.

And how many drives do you (recommend to) use within each mirror vdev?

Ten, for this model of drive.

Bob
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Sat, 23 Jan 2010, A. Krijgsman wrote:
Just to jump in. Did you guys ever consider short-stroking a larger SATA disk? Since the drive cache gets larger on the bigger drives, bringing a disk back to roughly 25% of its capacity would give a better cache ratio and less seek time.

Consider that a drive cache may be 16MB, but the ZFS ARC cache can span up to 128GB of RAM in current servers, or much larger if SSDs are used to add an L2ARC. It seems to me that once the drive cache is large enough to contain a full drive track, it is big enough. Perhaps a large drive cache may help with write performance, but GB beats MB any day of the week.

So 2TB would become 500GB, but better than a normal 500GB SATA. (Or in your case, swing it down to 750GB.)

Or you could buy a smaller enterprise drive, which is short-stroked by design.

Bob
Re: [zfs-discuss] nfs mounts don't follow child filesystems?
On Sat, 23 Jan 2010, Frank Cusack wrote:
I thought with NFS4 *on solaris* that clients would follow the zfs filesystem hierarchy and mount sub-filesystems. That doesn't seem to be happening and I can't find any documentation on it (either way). Did I only dream up this feature or does it actually exist? I am using s10_u8.

The Solaris 10 automounter should handle this for you:

% cat /etc/auto_home
# Home directory map for automounter
#
#+auto_home
*       myserver:/export/home/&

Notice that the referenced path is subordinate to the exported zfs filesystem.

Bob
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Jan 23, 2010, at 12:12 PM, Bob Friesenhahn wrote:
...snip...
Consider that a drive cache may be 16MB but the ZFS ARC cache can span up to 128GB of RAM in current servers, or much larger if SSDs are used to add a L2ARC.

Wimpy servers! To rewrite for 2010: Consider that a drive cache may be 16MB, but the ZFS ARC cache can span up to 4 TB of RAM in current servers, or much larger if SSDs are used to add an L2ARC. :-)

-- richard
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Jan 23, 2010, at 8:04 AM, R.G. Keen wrote: Interesting question. The answer I came to, perhaps through lack of information and experience, is that there isn't a best 1.5TB drive. I decided that 1.5TB is too big, and that it's better to use more and smaller devices so I could get to raidz3. My theory is that drives cost $100. When the price is above $100, the drive is in manufacture. When the price is below $100, the drive is EOL and the manufacturer is flushing the inventory. Recently, 1.5 TB drives went below $100. So, if you consider avoiding the leading edge by buying EOL product, then it might not sound like such a good idea :-) -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
How does a previously highly rated drive that cost more than $100 suddenly become substandard when it costs less than $100? I can think of possible reasons, but they might not be printable here ;-) Cheers, Simon http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Trends in pool configuration
My main server doubles as both a development system and web server for my work and a media server for home. When I built it in the early days of ZFS, drive prices were about four times current (500GB was the bleeding edge) and affordable SSDs were a way off, so I opted for a stripe of four 2-way 320GB mirrors. This gave me good small read/write performance for compilation and enough capacity for media. Now I'm looking at replacing the box with something quieter and cooler. With the improvements in ZFS and media, I'm looking at splitting the pool: a mirror of 2TB drives for media and a mirror of SSDs for build workspaces. Which leads to my question: are people splitting pools based on workload, or opting for a single pool and cache devices? -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Sat, Jan 23, 2010 at 12:30:01PM -0800, Simon Breden wrote: And regarding mirror vdevs etc, I can see the usefulness of being able to build a mirror vdev of multiple drives for cases where you have really critical data -- e.g. a single 4-drive mirror vdev. I suppose regular backups can help with critical data too.

Multi-way mirrors have lots of uses:

- seek independence, for heavily read-biased loads (writes tend to kill this quickly by forcing all drives to seek together)
- faster resilver times with less impact on production load (resilver reads are a particular case of the above)
- capacity upgrades without losing redundancy in the process (note this is inherently n+1; proof by induction for arbitrary mirrors)
- lots of variations of the attach-another-mirror, sync, and detach workflow that zpool split was created to support, whether for backup or reporting or remote replication or test systems or ..
- burning in or qualifying new drives, to work out early failures before putting them in service (an easy way to amplify a test workload by, say, 10x). Watch for slow units, as well as bad data/scrub fails. Just as good for amplifying a test workload for controllers and other components.

and.. um..

- testing dedup (make an n-way mirror out of n zvols on the same dedup'ed pool; comstar optional :)

More seriously, though, it's for some of these scenarios that the zfs limitation of not being able to layer pool types (easily) is most irritating (raidz of mirrors, mirror of raidz). Again, that's in part because of practices developed previously; zfs may eventually offer even better solutions, but not yet. -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
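The attach/sync/detach workflow mentioned in that list looks roughly like this in practice. This is a sketch only: the pool and device names are placeholders, and the resilver must complete before the detach or the copy is incomplete.

```shell
# Attach a third side to an existing two-way mirror containing c1t0d0:
zpool attach tank c1t0d0 c2t0d0
# Watch the resilver and wait for it to finish before trusting the new copy:
zpool status tank
# Then peel the synced copy off, e.g. for backup or a test system:
zpool detach tank c2t0d0
```

With `zpool split` (where available), the detached side comes away as a separately importable pool rather than a blank half, which is what makes the backup/replication variations practical.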
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Sat, Jan 23, 2010 at 09:04:31AM -0800, Simon Breden wrote: For resilvering to be required, I presume this will occur mostly in the event of a mechanical failure. Soft failures like bad sectors will presumably not require resilvering of the whole drive, as these types of error are probably easily fixable by re-writing the bad sector(s) elsewhere using available parity data in redundant arrays. So in this case larger capacities and resilvering time shouldn't become an issue, right?

Correct. However, consider that it's actually the *heads* that contribute most to errors accumulating; over time they lose the ability to read with the same sensitivity, for example. Of course this shows up first in some areas of the platter that already had slightly more marginal surface quality. This is why SMART and similar systems consider both the absolute number of bad sectors, as well as the rate of discovery, as predictors of device failure.

What might be a good idea for a backup box is to use a large case to house all your old drives using multiple matched drive-capacity redundant vdevs. This way, each time you upgrade, you can still make use of your old drives in your backup machine, without disturbing the backup pool - i.e. simply adding a new vdev each time...

This is basically my scheme at home - current generation drives are in service, the previous generation go in the backup pool, and the set before that become backup tapes. Every so often the same thing happens with the servers/chassis/controller/housing stuff, too. It's coming up to time for exactly one of those changeovers now. I always have a bunch of junk data in the main pool that really isn't worth backing up, which helps deal with the size difference. There's no need to constantly add vdevs to the backup pool; just do replacement upgrades the same as you did with your primary pool. I, too, will admit to not being as diligent at putting the scheme into regular practice as theory would demand.
I may also relocate the backup pool to a neighbour's house soon (or, really, trade backup pool space with him). -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Sat, Jan 23, 2010 at 11:41 AM, Frank Cusack fcus...@fcusack.com wrote: On January 23, 2010 8:04:50 AM -0800 R.G. Keen k...@geofex.com wrote: The answer I came to, perhaps through lack of information and experience, is that there isn't a best 1.5tb drive. I decided that 1.5tb is too big, and that it's better to use more and smaller devices so I could get to raidz3. Please explain. I don't understand how smaller devices gets you to raidz3. With smaller devices, you probably have less need for raidz3 as you have more redundancy? It's the larger drives that forces you to add more parity. -frank Smaller devices get you to raid-z3 because they cost less money. Therefore, you can afford to buy more of them. -- --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On January 23, 2010 5:17:16 PM -0600 Tim Cook t...@cook.ms wrote: Smaller devices get you to raid-z3 because they cost less money. Therefore, you can afford to buy more of them. I sure hope you aren't ever buying for my company! :) :) Smaller devices cost more $/GB; i.e. they are more expensive. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On January 23, 2010 1:20:13 PM -0800 Richard Elling My theory is that drives cost $100. Obviously you're not talking about Sun drives. :) -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Hey Dan, Thanks for the reply. Yes, I'd forgotten that it's often the heads that degrade -- something like lubricant buildup, IIRC. As well as SMART data, which I must admit to never looking at, presumably scrub errors are also a good indication of looming trouble due to head problems etc? As I've seen zero read/write/checksum errors after regular scrubs over 2 years, hopefully this is a reasonably good sign of r/w head health. Good to see you're already using a backup solution like the one I have envisaged. It seems to make sense: making use of old kit for backups helps preserve the ROI on drive purchases -- even, no, especially, for home users. Cheers, Simon http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Sat, Jan 23, 2010 at 5:39 PM, Frank Cusack fcus...@fcusack.com wrote: On January 23, 2010 5:17:16 PM -0600 Tim Cook t...@cook.ms wrote: Smaller devices get you to raid-z3 because they cost less money. Therefore, you can afford to buy more of them. I sure hope you aren't ever buying for my company! :) :) Smaller devices cost more $/GB; i.e. they are more expensive. First off, smaller devices don't necessarily cost more $/GB, but that's not really the point. For instance, the cheapest drive per GB today is a 1.5TB drive. The third cheapest is a 1TB drive. 2TB drives aren't even in the top ten. Regardless, that's a great theory when you have an unlimited budget. When you've got a home system and X amount of dollars to spend, $/GB means absolutely nothing when you need a certain number of drives to have the redundancy you require. -- --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Jan 23, 2010, at 3:47 PM, Frank Cusack wrote: On January 23, 2010 1:20:13 PM -0800 Richard Elling My theory is that drives cost $100. Obviously you're not talking about Sun drives. :) Don't confuse cost with price :-) -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] customizing zfs list with less typing
It might be nice if zfs list would check an environment variable for a default list of properties to show (same as the comma-separated list used with the -o option). If not set, it would use the current default list; if set, it would use the value of that environment variable as the list. I find there are a lot of times I want to see the same one additional property when using zfs list; an environment variable would mean a one-time edit of .profile rather than typing the -o option with the default list modified by whatever I want to add. Along those lines, pseudo properties that were abbreviated (constant-length) versions of some real properties might help. For instance, sharenfs can be on, off, or a rather long list of nfs sharing options. A pseudo property with a related name and a value of on, off, or spec (with spec implying some arbitrary list of applicable options) would have a constant length. Given two potentially long properties (mountpoint and the pseudo property name), output lines are already close to cumbersome (that assumes one at the beginning of the line and one at the end). Additional potentially long properties in the output would tend to make it unreadable. Both of those, esp. together, would make quickly checking or familiarizing oneself with a server that much more civilized, IMO. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
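A minimal sketch of the proposed behaviour as a shell wrapper. `ZFS_LIST_PROPS`, `ZFS_BIN`, and the `zfs_ls` name are all invented for illustration; a complete wrapper would also need to skip the injection when the user passes `-o` explicitly, which is exactly the parsing headache a built-in environment variable would avoid.

```shell
# Hypothetical wrapper: honour ZFS_LIST_PROPS if set, else the stock columns.
zfs_ls() {
    # ZFS_BIN lets the demo below dry-run without a real zfs binary.
    ${ZFS_BIN:-zfs} list -o "${ZFS_LIST_PROPS:-name,used,available,referenced,mountpoint}" "$@"
}

# Dry-run demo: substitute echo for the real binary to show the command line.
ZFS_BIN="echo zfs" ZFS_LIST_PROPS="name,used,compressratio" zfs_ls tank
# prints: zfs list -o name,used,compressratio tank
```

A one-time `export ZFS_LIST_PROPS=...` in `.profile` then customizes every listing, which is the convenience the post is asking for.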
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Sat, Jan 23, 2010 at 06:39:25PM -0500, Frank Cusack wrote: On January 23, 2010 5:17:16 PM -0600 Tim Cook t...@cook.ms wrote: Smaller devices get you to raid-z3 because they cost less money. Therefore, you can afford to buy more of them. I sure hope you aren't ever buying for my company! :) :) Smaller devices cost more $/GB; i.e. they are more expensive.

Usually, other than the very largest (most recent) drives, which are still at a premium price. However, it all depends on your budget considerations. Budget applies not only to currency. You may be more constrained by available controller ports, motherboard slots, case drive bays, noise, power, heat or other factors. Even if it still comes back to currency units, adding more ports or drive bays can easily outweigh the cost of the drives to go on/in them, especially in the consumer market. There's usually a big step where just one more drive means a totally different solution. If you're targeting total available space, small drives really do cost more for the same space, when all these factors are counted. That's what sells the bigger drives, despite the premium. The other constraint is redundancy - I need N drives (raidz3 in the OP's case), the smaller size is big enough and maybe the only way to also be cheap enough. -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] nfs mounts don't follow child filesystems?
On Sat, 23 Jan 2010, Frank Cusack wrote: Notice that the referenced path is subordinate to the exported zfs filesystem. Well, assuming there is a single home zfs filesystem and not a filesystem-per-user. For filesystem-per-user your example simply mounts the correct shared filesystem. Even for a single home filesystem, the above doesn't actually mount /export/home and then also mount /export/home/USER, so it's not following the zfs filesystem hierarchy. I am using a filesystem-per-user on my system. Are you saying that my system is wrong? These per-user filesystems are NFS exported due to the inheritance of zfs properties from their parent directory. The property is only set in one place. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
I just took a look at customer feedback on this drive here. 36% rate it with one star, which I would consider alarming. Take a look here, ordered from lowest rating to highest rating. Note the recency of the comments and the descriptions:

People vote in different ways on the same things. A lot of the 1-star reviews are about DOA drives. Maybe Newegg had a bad batch (it happens sometimes). Bad news propagates much faster than good news, and an angry user is more likely to post a review than a happy user. 120 reviews are a tiny sample on which to make a decision. Look at it the other way: 50% are 4-5 stars, so half are very happy. As I've said, every brand has problems at the moment, and at 1.5TB there's little choice. For me the RMA service plays an important role, because I expect to use it; I don't expect to see five HDDs run flawlessly for 3 years. Samsung has no direct RMA service in my country, so the turnaround is a shot in the dark: a few weeks, maybe a month. Picking a rock-solid product is more a matter of luck. I have a 13GB Deskstar DTLA, running strong after nearly 11 years or so! Maybe it's the last one alive; reading around the internet, it should have died 12 years ago. If the failure rate really were 36%, Seagate would be toast. The Barracuda LP can't be the right drive for everyone, but if you are looking for a cheap 1.5TB consumer drive that runs cool and quiet, it deserves strong consideration. It's better on paper than the WD Green (standard sectors, no start/stop cycle after 8 seconds of inactivity), Hitachi has no 1.5TB at the moment, Samsung has a recent new model, and the 7200.11 is an old product on 4 platters. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] experience with sata p-m's?
As I said in another post, it's coming time to build a new storage platform at home. I'm revisiting all the hardware options and permutations again, for current kit. Build 125 added something I was very eager for earlier: sata port-multiplier support. Since then, I've seen very little, if any, comment or reports of success or trouble using them. Maybe they're on some other mailing list I'm not aware of? References and pointers welcome. Otherwise, I'd be keen to learn of any practical experiences from others on the list. Vendors, models, types, compatible controllers, etc. Finally, does anyone know of one of those 5-in-3 drive mounting bays that has the PM built into the backplane? That would eliminate a whole mess of cabling, but I haven't found any (closest is the backplanes used by backblaze with their case design). -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool
AIUI, this works as designed. I think the best practice will be to add the L2ARC to syspool (née rpool). However, for current NexentaStor releases, you cannot add cache devices to syspool. Earlier I mentioned that this made me nervous. I no longer hold any reservation against it. It should work just fine as-is. -- richard

On Jan 23, 2010, at 9:53 AM, Lutz Schumann wrote: Hi, i found some time and was able to test again: - verify with unique uid of the device - verify with autoreplace = off. Indeed autoreplace was set to yes for the pools. So I disabled the autoreplace:

VOL     PROPERTY     VALUE  SOURCE
nxvol2  autoreplace  off    default

Erased the labels on the cache disk and added it again to the pool. Now both cache disks have different guid's:

# cache device in node1
r...@nex1:/volumes# zdb -l -e /dev/rdsk/c0t2d0s0
LABEL 0
    version=14
    state=4
    guid=15970804704220025940

# cache device in node2
r...@nex2:/volumes# zdb -l -e /dev/rdsk/c0t2d0s0
LABEL 0
    version=14
    state=4
    guid=2866316542752696853

GUIDs are different. However, after switching the pool nxvol2 to node1 (where nxvol1 was active), the disks were picked up as cache devs:

# nxvol2 switched to this node ...
  volume: nxvol2
   state: ONLINE
   scrub: none requested
  config:

        NAME         STATE    READ WRITE CKSUM
        nxvol2       ONLINE      0     0     0
          mirror     ONLINE      0     0     0
            c3t10d0  ONLINE      0     0     0
            c4t13d0  ONLINE      0     0     0
          mirror     ONLINE      0     0     0
            c3t9d0   ONLINE      0     0     0
            c4t12d0  ONLINE      0     0     0
          mirror     ONLINE      0     0     0
            c3t8d0   ONLINE      0     0     0
            c4t11d0  ONLINE      0     0     0
          mirror     ONLINE      0     0     0
            c3t18d0  ONLINE      0     0     0
            c4t22d0  ONLINE      0     0     0
          mirror     ONLINE      0     0     0
            c3t17d0  ONLINE      0     0     0
            c4t21d0  ONLINE      0     0     0
        cache
          c0t2d0     FAULTED     0     0     0  corrupted data

# nxvol1 was active here before ...
n...@nex1:/$ show volume nxvol1 status
  volume: nxvol1
   state: ONLINE
   scrub: none requested
  config:

        NAME         STATE    READ WRITE CKSUM
        nxvol1       ONLINE      0     0     0
          mirror     ONLINE      0     0     0
            c3t15d0  ONLINE      0     0     0
            c4t18d0  ONLINE      0     0     0
          mirror     ONLINE      0     0     0
            c3t14d0  ONLINE      0     0     0
            c4t17d0  ONLINE      0     0     0
          mirror     ONLINE      0     0     0
            c3t13d0  ONLINE      0     0     0
            c4t16d0  ONLINE      0     0     0
          mirror     ONLINE      0     0     0
            c3t12d0  ONLINE      0     0     0
            c4t15d0  ONLINE      0     0     0
          mirror     ONLINE      0     0     0
            c3t11d0  ONLINE      0     0     0
            c4t14d0  ONLINE      0     0     0
        cache
          c0t2d0     ONLINE      0     0     0

So this is true with and without autoreplace, and with different guid's of the devices. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On January 23, 2010 6:09:49 PM -0600 Tim Cook t...@cook.ms wrote: When you've got a home system and X amount of dollars to spend, $/GB means absolutely nothing when you need a certain number of drives to have the redundancy you require. Don't you generally need a certain amount of GB? I know I plan my storage based on how much data I have, even my home systems. And THEN add in the overhead for redundancy. If we're talking about such a small amount of storage (home) that the $/GB is not a factor (ie, even with the most expensive $/GB drives we won't exceed the budget and we don't have better things to spend the money on anyway) then raidz3 seems unnecessary. I mean, just do a triple mirror of the 1.5TB drives rather than say (6) .5TB drives in a raidz3. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
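Frank's comparison can be checked with a little arithmetic. The drive counts and sizes are the ones from the thread; the helper names are mine:

```python
# Usable capacity for the two layouts discussed above.
def mirror_usable_tb(drive_tb, ways):
    # An n-way mirror stores n copies of one drive's worth of data.
    return drive_tb

def raidz_usable_tb(drive_tb, drives, parity):
    # raidzN spends N drives' worth of space on parity.
    return drive_tb * (drives - parity)

three_way_mirror = mirror_usable_tb(1.5, ways=3)           # 3x 1.5TB drives
raidz3_layout = raidz_usable_tb(0.5, drives=6, parity=3)   # 6x 500GB drives

print(three_way_mirror, raidz3_layout)
# Same usable space either way; the fault tolerance differs: the 3-way
# mirror survives any 2 drive failures, the raidz3 survives any 3.
```

So the two layouts land on identical usable capacity with half the drive count for the mirror, which is why the $/GB argument alone does not settle the choice.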
Re: [zfs-discuss] nfs mounts don't follow child filesystems?
On January 23, 2010 6:53:26 PM -0600 Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Sat, 23 Jan 2010, Frank Cusack wrote: Notice that the referenced path is subordinate to the exported zfs filesystem. Well, assuming there is a single home zfs filesystem and not a filesystem-per-user. For filesystem-per-user your example simply mounts the correct shared filesystem. Even for a single home filesystem, the above doesn't actually mount /export/home and then also mount /export/home/USER, so it's not following the zfs filesystem hierarchy. I am using a filesystem-per-user on my system. Are you saying that my system is wrong? These per-user fileystems are NFS exported due to the inheritance of zfs properties from their parent directory. The property is only set in one place. You have misunderstood the problem. Of course, or rather I understand, that zfs child filesystems inherit the sharenfs property from the parent similar to how they inherit other properties. (And even if they didn't, clients can still mount subdirectories of the directory that is shared unless the server explicitly disallows that option. Regardless of underlying filesystem.) With zfs filesystems, when you have a directory which is a subordinate filesystem, as in filesystem-per-user, then if the NFS client mounts the parent filesystem, when it crosses the child filesystem boundary it does not see into the child filesystem as it would if it were local. server: export export/home export/home/user client mounts server:/export/home on /home. the client can see (e.g.) /home/user, but as an empty directory. when the client enters that directory it is writing into the export/home filesystem on the server (and BTW those writes are not visible on the server since they are obscured by the child filesystem.) NFS4 has a mechanism to follow and mount the child filesystem. Your example doesn't do that, it simply mounts the child filesystem directly. 
-frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] customizing zfs list with less typing
On January 23, 2010 4:33:59 PM -0800 Richard L. Hamilton rlha...@smart.net wrote: It might be nice if zfs list would check an environment variable for a default list of properties to show (same as the comma-separated list used with the -o option). If not set, it would use the current default list; if set, it would use the value of that environment variable as the list. I find there are a lot of times I want to see the same one additional property when using zfs list; an environment variable would mean a one-time edit of .profile rather than typing the -o option with the default list modified by whatever I want to add. ... Both of those, esp. together, would make quickly checking or familiarizing oneself with a server that much more civilized, IMO. Just make 'zfs' an alias to your version of it. A one-time edit of .profile can update that alias. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On Sat, Jan 23, 2010 at 7:57 PM, Frank Cusack fcus...@fcusack.com wrote: On January 23, 2010 6:09:49 PM -0600 Tim Cook t...@cook.ms wrote: When you've got a home system and X amount of dollars to spend, $/GB means absolutely nothing when you need a certain number of drives to have the redundancy you require. Don't you generally need a certain amount of GB? I know I plan my storage based on how much data I have, even my home systems. And THEN add in the overhead for redundancy. If we're talking about such a small amount of storage (home) that the $/GB is not a factor (ie, even with the most expensive $/GB drives we won't exceed the budget and we don't have better things to spend the money on anyway) then raidz3 seems unnecessary. I mean, just do a triple mirror of the 1.5TB drives rather than say (6) .5TB drives in a raidz3. -frank I bet you'll get the same performance out of 3x1.5TB drives you get out of 6x500GB drives too. Are you really trying to argue people should never buy anything but the largest drives available? I hope YOU aren't ever buying for MY company. -- --Tim ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] hard drive choice, TLER/ERC/CCTL
On Fri, Jan 22, 2010 at 04:12:48PM -0500, Miles Nordin wrote: http://www.csc.liv.ac.uk/~greg/projects/erc/ dead link?

Works for me - this is someone who's written patches for smartctl to set this feature; these are standardised/documented commands, no reverse engineering of DOS tools required.

IMHO it is just a sucker premium because the feature is worthless anyway.

There are two points here:

- is the feature worth paying the premium for raid edition drives, assuming it's the main difference between them? If there are other differences, they have to factor into the assessment. For me and others here, the answer is clearly no.
- for two otherwise comparable drives, at a comparable price, would I choose the one with this feature? That's a very different question, and for me and others here, the answer is clearly yes.

From the discussion I've read here, the feature is designed to keep drives which are *reporting failures* to still be considered *GOOD*, and to not drop out of RAIDsets in RAID-on-a-card implementations with RAID-level timeouts of under 60 seconds.

No. It is designed to make drives report errors at all, and within predictable time limits, rather than going non-responsive for indeterminate times and possibly reporting an error eventually. The rest of the response process, whether from a raid card or zfs+driver stack, and whether based on timeouts or error reports, is a separate issue. (more on which, below)

Consider that a drive that goes totally failed and unresponsive can only be removed by timeout; this lets you tell the difference between failure modes, and know what's a sensible timeout to consider the drive really-failed.

The solaris timeout, because of m * n * o multiplicative layered speculative retry nonsense, is 60 seconds or 180 seconds or many hours, so solaris is IMHO quite broken in this regard but also does not benefit from the TLER workaround: the long-TLER drives will not drop out of RAIDsets on ZFS even if they report an error now and then.
There's enough truth in here to make an interesting rant, as always with Miles. I did enjoy it, because I do share the frustration. However, the key point is that concrete reported errors are definitive events to which zfs can respond, rather than relying on timeouts, however abstract or hidden or layered or frustrating. Making the system more deterministic is worthwhile.

What's really needed for ZFS or RAID in general is (a) for drives to never spend more than x% of their time attempting recovery

Sure. When you find where we can buy such a drive, please let us all know. Because a 7-second-per-read drive will fuck your pool just as badly as a 70-second-per-read drive: you're going to have to find and unplug it before the pool will work again.

Agreed, to a point. If the drive repeatedly reports errors, zfs can and will respond by taking it offline. Even if it doesn't and you have to manually take it offline, at least you will know which drive is having difficulty. -- Dan. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
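For reference, the smartctl patches mentioned above were eventually merged into smartmontools as the `scterc` option, so on drives that implement SCT Error Recovery Control the limit can be set directly. A sketch of the usage (the device path is a placeholder, times are in tenths of a second, and note that many drives forget the setting on a power cycle, matching the HDAT2 persistence problem discussed earlier in the thread):

```shell
# Assumes smartmontools and a drive with SCT ERC support; /dev/sda is
# a placeholder for your device.
smartctl -l scterc /dev/sda          # show current read/write ERC limits
smartctl -l scterc,70,70 /dev/sda    # cap both at 7.0 seconds
smartctl -l scterc,0,0 /dev/sda      # disable the limit (desktop behaviour)
```

Because the setting is usually volatile, a boot-time script (rc script or SMF service) re-applying it is the common workaround, much like the run-it-after-boot idea proposed for the HDAT2 approach.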
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Mirko wrote: Well, I've purchased 5 Barracuda LP 1.5TB. They run very quiet and cool, 5 in a cage, and the vibration is nearly zero. Reliability? Well, every HDD is unreliable and every major brand has problems at this time, so go for the best bang for the bucks. In my country Seagate has the best RMA service, with turnaround in about 1 week or so; WD is 3-4 weeks. Samsung has no direct RMA service, and Hitachi, well, has a foot out of the HDD business IMHO, with no attractive product at the moment.

I really wonder why the Hitachi 2TB is the cheapest in Singapore. Seagate and WD price the 1TB around S$125 and the 2TB around S$305, however Hitachi's 1TB is around S$125 and the 2TB around S$248, quite a steal. Since this is an anomaly, does anybody know what technology difference lets Hitachi hit that price? Or did I miss a prophecy of disaster related to their business or technology? All of them come with 3-year warranties. Thinking of replacing my (8+1) x 500GB Seagate Barracudas with the 2TB disks. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] customizing zfs list with less typing
Just make 'zfs' an alias to your version of it. A one-time edit of .profile can update that alias.

Sure; write a shell function, and add an alias to it. And use a quoted command name (or full path) within the function to get to the real command. Been there, done that. But to do a good job of it means parsing the command line the same way the real command would, so that it only adds -o ${ZFS_LIST_PROPS:-name,used,available,referenced,mountpoint} or perhaps better ${ZFS_LIST_PROPS:+-o ${ZFS_LIST_PROPS}} to zfs list (rather than to other subcommands), and only if the -o option wasn't given explicitly. That's not only a bit of a pain, but anytime one is duplicating parsing, it's begging for trouble: in case they don't really handle it the same way, or in case the underlying command is changed. And unless that sort of thing is handled with extreme care (quoting all shell variable references, just for starters), it can turn out to be a security problem.

And that's just the implicit-options part of what I want; the other part would take optionally filtering to modify the command output as well. That's starting to get nuts, IMO. Heck, I could grab a copy of the source for the zfs command, modify it, and recompile it (without building all of OpenSolaris) faster than I could write a really good shell wrapper that does the same thing. But then I'd have to maintain my own divergent implementation, unless I can interest someone else in the idea... OTOH, by the time the hoop-jumping for getting something accepted is over, it's definitely more bother than gain... -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] What is the normal operating temperature for consumer SATA drive?
I am curious to know what is the normal operating temperature of a consumer SATA drive, and what is considered the maximum limit I need to watch out for? This is my disks' SMART output under FreeNAS 0.7RC2, where the ambient temperature is 28 °C without air conditioning:

ad4    476941MB  ST3500418AS/CC38                  10.36 KiB/t, 0 tps, 0.00 MiB/s  34 °C  ONLINE
ad6    476941MB  ST3500418AS/CC34                   9.70 KiB/t, 0 tps, 0.00 MiB/s  32 °C  ONLINE
ad8    476941MB  ST3500320AS/SD1A                  13.12 KiB/t, 0 tps, 0.00 MiB/s  31 °C  ONLINE
ad10   476941MB  ST3500320AS/SD1A                  12.63 KiB/t, 0 tps, 0.00 MiB/s  31 °C  ONLINE
ad12   953870MB  Hitachi HDT721010SLA360/ST6OA31B  51.87 KiB/t, 6 tps, 0.30 MiB/s  45 °C  ONLINE
ad14   953870MB  Hitachi HDS721010CLA332/JP4OA25C  53.38 KiB/t, 6 tps, 0.30 MiB/s  42 °C  ONLINE
da0    476941MB  USB External 500GB                 5.75 KiB/t, 0 tps, 0.00 MiB/s  n/a    ONLINE

As you can see from the above, the Hitachi drives (2 x mirror) are always hotter than the Seagates (4 x raidz), although they are on the same cooling mechanism. Thanks in advance, Dedhi ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] (snv_129, snv_130) can't import zfs pool
I'd like to thank Tim and Cindy at Sun for providing me with a new zfs binary file that fixed my issue. I was able to get my zpool back! Hurray! Thank You. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] sharesmb name not working
I can't get sharesmb=name= to work. It worked in b130. I'm not sure if it's broken in 131 or if my machine is being a pain. Anyway, when I try to do this:

zfs set sharesmb=name=wonslung tank/nas/Wonslung

I get this:

cannot set property for 'tank/nas/Wonslung': 'sharesmb' cannot be set to invalid options

I've googled this and it seems to pop up a lot, but so far I can't find any solutions. It's really driving me nuts. Also, when I try to create a NEW share, the same thing happens when I use -o. Please help ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] hard drive choice, TLER/ERC/CCTL
On Jan 23, 2010, at 5:06 AM, Simon Breden wrote: [...] However, the big questions still remain:

1. Does ZFS benefit from shorter error reporting times?

In general, any system which detects and acts upon faults would like to detect faults sooner rather than later.

2. Does having shorter error reporting times provide any significant data safety through, for example, preventing ZFS from kicking a drive from a vdev?

On Solaris, ZFS doesn't kick out drives; FMA does.
You can see the currently loaded diagnosis engines using pfexec fmadm config:

MODULE               VERSION  STATUS  DESCRIPTION
cpumem-retire        1.1      active  CPU/Memory Retire Agent
disk-transport       1.0      active  Disk Transport Agent
eft                  1.16     active  eft diagnosis engine
ext-event-transport  0.1      active  External FM event transport
fabric-xlate         1.0      active  Fabric Ereport Translater
fmd-self-diagnosis   1.0      active  Fault Manager Self-Diagnosis
io-retire            2.0      active  I/O Retire Agent
sensor-transport     1.1      active  Sensor Transport Agent
snmp-trapgen         1.0      active  SNMP Trap Generation Agent
sysevent-transport   1.0      active  SysEvent Transport Agent
syslog-msgs          1.0      active  Syslog Messaging Agent
zfs-diagnosis        1.0      active  ZFS Diagnosis Engine
zfs-retire           1.0      active  ZFS Retire Agent

Diagnosis engines relevant to ZFS include:
disk-transport: diagnoses SMART reports
fabric-xlate: translates PCI, PCI-X, PCI-E, and bridge reports
zfs-diagnosis: notifies FMA when checksum, I/O, and probe-failure errors are found by ZFS activity. It also properly handles errors resulting from device removal.
zfs-retire: manages hot spares for ZFS pools
io-retire: retires a device which was diagnosed as faulty (NB: this may happen at next reboot)
snmp-trapgen: you do configure SNMP traps, right? :-)

Drivers, such as sd/ssd, can send FMA telemetry which will feed the diagnosis engines.

Those are the questions I would like to hear somebody give an authoritative answer to.

This topic is broader than ZFS. For example, a disk which has both a UFS and a ZFS file system could be diagnosed by UFS activity and retired, which would also affect the ZFS pool that uses the disk. Similarly, the disk-transport agent can detect overtemp errors, for which retirement is a corrective action. For more info, visit the FMA community: http://hub.opensolaris.org/bin/view/Community+Group+fm/ As for an authoritative answer, UTSL.
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/fm/modules/common -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What is the normal operating temperature for consumer SATA drive?
On Sun, Jan 24 at 11:44, Dedhi Sujatmiko wrote: I am curious to know what is the normal operating temperature of a consumer SATA drive, and what is considered the maximum limit I need to watch out for? [...] As you can see from the above, the Hitachi drives (2 x mirror) are always hotter than the Seagates (4 x raidz), although they are on the same cooling mechanism. Those temperatures are fine; I wouldn't worry about it. If you get much above 45 °C you might want to get a bit more airflow over the Hitachi drives. It's possible the mirror drives are doing more seeking, thus driving their temps up as well. -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
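[Editor's note] The 45 °C rule of thumb above can be turned into a quick check against a status listing like the one in Dedhi's message. This is only a sketch: flag_hot_drives is an illustrative name, and the awk field positions assume this FreeNAS-style output where each line ends with the temperature, "°C", and the status:

```shell
# Flag drives at or above a temperature limit in FreeNAS-style status output.
# Assumes each line ends "...<temp> °C <STATUS>"; drives reporting "n/a" for
# temperature are skipped because they never match the "°C" field test.
flag_hot_drives() {
  awk -v limit=45 '$(NF-1) == "°C" && $(NF-2)+0 >= limit { print $1, $(NF-2) "°C" }'
}
```

Piping the listing above through this would report only ad12 at 45 °C.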
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
These days, I've switched to 2.5" SATA laptop drives for large-storage requirements. They're going to cost more per GB than 3.5" drives, but they're still not horrible ($100 for a 500GB/7200rpm Seagate Momentus). It's also easier to cram large numbers of them into smaller spaces, so it's easier to get a larger number of spindles in the same case. Not to mention they're lower-power than equivalent 3.5" drives. My sole problem is finding well-constructed high-density 2.5" hot-swap bay/chassis setups. If anyone has a good recommendation for a 1U or 2U JBOD chassis for 2.5" drives, that would really be helpful. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss