Re: [zfs-discuss] Case study/recommended ZFS setup for home file server
Florin Iucha wrote:
> On Wed, Jul 09, 2008 at 08:42:37PM -0700, Bohdan Tashchuk wrote:
> > I cannot use OpenSolaris 2008.05 since it does not recognize the SATA disks attached to the southbridge. A fix for this problem went into build 93. Which forum/mailing list discusses SATA issues like the above?
>
> #opensolaris on freenode.net. I booted from the OpenSolaris LiveCD/installer and, noticing the lack of available disks, I cried for help on IRC. There were a few helpful people who gave me some commands to run to try to get this going. After their efforts failed, I googled for Solaris and SB600 (this is the ATI southbridge chip) and found a forum posting from another user back in February, and a hit in the bug database pointing to the resolution of the bug, with the target being snv_93.

For the reference of other people, the bug in question is

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6665032
6665032 ahci driver doesn't work for ATI SB600 AHCI chipset (ASUS M2A-VM)

which is fixed in snv_93.

James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] slog device
Yes, but that talks about flash systems, and the end of the year. My concern is whether Sun will also be releasing flash add-on cards that we can make use of elsewhere, including on already-purchased Sun kit. Much as I'd love to see Sun add a lightning-fast, flash-boosted server to their x64 range, that's not going to help me with ZFS and my existing hardware. I'd really like to know if Sun have any plans for a PCI-X or PCI-E flash card with Solaris drivers, and if they don't have any plans along those lines, I'd love to see some Solaris drivers for the Fusion-io product.

Ross
Re: [zfs-discuss] Case study/recommended ZFS setup for home file server
My recommendation: buy a small, cheap 2.5" SATA hard drive (or 1.8" SSD) and use that as your boot volume; I'd even bolt it to the side of your case if you have to. Then use the whole of your three large disks as a raid-z set. If I were in your shoes I would also have bought 4 drives for ZFS instead of 3, and gone for raid-z2. And finally, I don't know how much room you have in your current case, but if you're ever looking for one that takes more drives I can highly recommend the Antec P182. I've got 6x 1TB drives in my home server and in that case it's so quiet I can't even hear it turn on. My watch ticking easily drowns out this server.

PS. If you're going to be using CIFS, avoid build 93.
Re: [zfs-discuss] Case study/recommended ZFS setup for home file server
Brandon High wrote:
> On Wed, Jul 9, 2008 at 3:37 PM, Florin Iucha [EMAIL PROTECTED] wrote:
> > The question is, how should I partition the drives, and what tuning parameters should I use for the pools and file systems? From reading the best practices guides [1], [2], it seems that I cannot have the root file system on a RAID-5 pool, but it has to be a separate storage pool. This seems to be slightly at odds with the suggestion of using whole disks for ZFS, not just slices/partitions.
>
> The reason for using a whole disk is that ZFS will turn on the drive's cache. When using slices, the cache is normally disabled. If all slices are using ZFS, you can turn the drive cache back on. I don't think it happens by default right now, but you can set it manually.

As I recall, using a whole disk for ZFS also changes the disk label to EFI, meaning you can't boot from it.

> Another alternative is to use an IDE to CompactFlash adapter, and boot off of flash.

Just curious, what will that flash contain? E.g. will it be similar to Linux's /boot, or will it contain the full Solaris root? How do you manage redundancy (e.g. a mirror) for that boot device?

> > My plan right now is to create a 20 GB and a 720 GB slice on each disk, then create two storage pools, one RAID-1 (20 GB) and one RAID-5 (1.44 TB). Create the root, var, usr and opt file systems in the first pool, and home, library and photos in the second.
>
> Good plan.
>
> > I hope I won't need swap, but I could create three 1 GB slices (one on each disk) for that.
>
> If you have enough memory (say 4 GB) you probably won't need swap. I believe swap can live in a ZFS pool now too, so you won't necessarily need another slice. You'll just have RAID-Z protected swap.

Really? I think Solaris still needs non-ZFS swap for the default dump device.

Regards,
Fajar
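[As a rough illustration of the two-pool layout Florin describes, here is a minimal sketch. The pool names (rpool, tank), device names (c1t0d0 etc.) and slice numbers are only placeholders for whatever his system actually shows; making the small pool bootable involves more steps than this, so the sketch only shows the pool layout itself.]

   # mirrored pool for root/var/usr/opt, built from the small 20 GB slices
   zpool create rpool mirror c1t0d0s0 c1t1d0s0 c1t2d0s0

   # raidz ("RAID-5") pool for home/library/photos, built from the large slices
   zpool create tank raidz c1t0d0s1 c1t1d0s1 c1t2d0s1

   # example file systems in the data pool
   zfs create tank/home
   zfs create tank/photos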
Re: [zfs-discuss] X4540
I think it's a cracking upgrade Richard. I was hoping Sun would do something like this, so it's great to see it arrive. As others have said though, I think Sun are missing a trick by not working with Vmetro or Fusion-io to add nvram cards to the range now. In particular, if Sun were to work with Fusion-io and add Solaris drivers for the ioDrive, you'd be in a position right now to offer a 48TB server with 64GB of read cache and 80GB of write cache. You could even offer the same card on the smaller x4240. Can you imagine how well those machines would work as NFS servers? Either one would make a superb NFS storage platform for VMware: you've got incredible performance, ZFS snapshots for backups, and ZFS send/receive to replicate the data elsewhere. NetApp and EMC charge a small fortune for a NAS that can do all that, and they don't offer anywhere near that amount of fast cache. Both servers would take InfiniBand too, which is dirt cheap these days at $125 a card, is supported by VMware, and, particularly on the smaller server, is way faster than anything EMC or NetApp offer. As an NFS storage platform, you'd be beating EMC and NetApp on price, spindle count, features and performance. I really hope somebody at Sun considers this, and thinks about expanding the "What can you do with an x4540" section on the website to include VMware.

Ross
Re: [zfs-discuss] slog device
The problem with that is that I'd need to mirror them to guard against failure, I'd lose storage capacity, and the peak throughput would be horrible when compared to the array. I'd be sacrificing streaming speed for random write speed, whereas with a PCIe nvram card I can have my cake and eat it.

ps. Yes, I'm greedy :D
Re: [zfs-discuss] previously mentioned J4000 released
Heh, I like the way you think Tim. I'm sure Sun hate people like us. The first thing I tested when I had an x4500 on trial was to make sure an off-the-shelf 1TB disk worked in it :)
Re: [zfs-discuss] previously mentioned J4000 released
.. And the answer was yes, I hope. We are seriously thinking of buying 48 1 TB disks to replace those in a 1-year-old Thumper, so please confirm it again :)

2008/7/10, Ross [EMAIL PROTECTED]:
> Heh, I like the way you think Tim. I'm sure Sun hate people like us. The first thing I tested when I had an x4500 on trial was to make sure an off-the-shelf 1TB disk worked in it :)

--
Tommaso Boccali
INFN Pisa
Re: [zfs-discuss] Case study/recommended ZFS setup for home file server
Fajar A. Nugraha wrote:
> > If you have enough memory (say 4 GB) you probably won't need swap. I believe swap can live in a ZFS pool now too, so you won't necessarily need another slice. You'll just have RAID-Z protected swap.
>
> Really? I think Solaris still needs non-ZFS swap for the default dump device.

No longer true; you can swap and dump to a ZVOL (but not the same one). This change came in after the OpenSolaris 2008.05 LiveCD/Install was cut, so it doesn't take advantage of that. There was a big long thread cross-posted to this list about it just recently. The current SX:CE installer (i.e. Nevada) uses ZVOLs for swap and dump.

--
Darren J Moffat
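[For anyone who wants to try this on a build that supports it, a minimal sketch might look like the following. The pool name "rpool" and the 2 GB sizes are only examples, not a recommendation.]

   # create separate ZVOLs for swap and dump
   zfs create -V 2G rpool/swap
   zfs create -V 2G rpool/dump

   # point the system at them
   swap -a /dev/zvol/dsk/rpool/swap
   dumpadm -d /dev/zvol/dsk/rpool/dump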
[zfs-discuss] proposal partial/relative paths for zfs(1)
I regularly create new zfs filesystems or snapshots and I find it annoying that I have to type the full dataset name in all of those cases. I propose we allow zfs(1) to infer the part of the dataset name up to the current working directory. For example:

Today:
$ zfs create cube/builds/darrenm/bugs/6724478

With this proposal:
$ pwd
/cube/builds/darrenm/bugs
$ zfs create 6724478

Both of these would result in a new dataset cube/builds/darrenm/bugs/6724478.

This will need some careful thought about how to deal with cases like this:

$ pwd
/cube/builds/
$ zfs create 6724478/test

What should that do? Should it create cube/builds/6724478 and cube/builds/6724478/test? Or should it fail? -p already provides some capabilities in this area. Maybe the easiest way out of the ambiguity is to add a flag to zfs create for the partial dataset name, eg:

$ pwd
/cube/builds/darrenm/bugs
$ zfs create -c 6724478

Why -c? -c for "current directory". -p (partial) is already taken to mean create all non-existing parents, and -r (relative) is already used consistently as recurse in other zfs(1) commands (as well as lots of other places).

Alternately:

$ pwd
/cube/builds/darrenm/bugs
$ zfs mkdir 6724478

Which would act like mkdir does (including allowing a -p and -m flag with the same meaning as mkdir(1)) but creates datasets instead of directories.

Thoughts? Is this useful for anyone else? My above examples are some of the shorter dataset names I use; ones in my home directory can be even deeper.

--
Darren J Moffat
[zfs-discuss] We have a driver for the MM-5425CN
Hey everybody,

Well, my pestering paid off. I have a Solaris driver which you're welcome to download, but please be aware that it comes with NO SUPPORT WHATSOEVER. I'm very grateful to the chap who provided this driver; please don't abuse his generosity by calling Micro Memory or Vmetro if you have any problems. I've no idea which version of Solaris this was developed for, how many other cards it works with, or if it even works in the current version of Solaris. Use at your own risk.

http://www.averysilly.com/Micro_Memory_MM-5425CN.zip

Ross
Re: [zfs-discuss] proposal partial/relative paths for zfs(1)
On Thu, 2008-07-10 at 11:42 +0100, Darren J Moffat wrote:
> I regularly create new zfs filesystems or snapshots and I find it annoying that I have to type the full dataset name in all of those cases. I propose we allow zfs(1) to infer the part of the dataset name up to the current working directory. For example:
>
> Today:
> $ zfs create cube/builds/darrenm/bugs/6724478
>
> With this proposal:
> $ pwd
> /cube/builds/darrenm/bugs
> $ zfs create 6724478
>
> Both of these would result in a new dataset cube/builds/darrenm/bugs/6724478.

I find this annoying as well. Another way that would help (but is fairly orthogonal to your suggestion) would be to write a completion module for zsh/bash/whatever that could tab-complete options to the z* commands, including zfs filesystems.

-M
Re: [zfs-discuss] proposal partial/relative paths for zfs(1)
On Thu, 10 Jul 2008, Mark Phalan wrote:
> I find this annoying as well. Another way that would help (but is fairly orthogonal to your suggestion) would be to write a completion module for zsh/bash/whatever that could tab-complete options to the z* commands, including zfs filesystems.

You mean something like this?

http://www.sun.com/bigadmin/jsp/descFile.jsp?url=descAll/bash_tabcompletion_

Regards,
markm
Re: [zfs-discuss] proposal partial/relative paths for zfs(1)
On Thu, 2008-07-10 at 07:12 -0400, Mark J Musante wrote:
> On Thu, 10 Jul 2008, Mark Phalan wrote:
> > I find this annoying as well. Another way that would help (but is fairly orthogonal to your suggestion) would be to write a completion module for zsh/bash/whatever that could tab-complete options to the z* commands, including zfs filesystems.
>
> You mean something like this?
> http://www.sun.com/bigadmin/jsp/descFile.jsp?url=descAll/bash_tabcompletion_

Yes! Exactly! Now I just need to re-write it for zsh...

Thanks,
-M
Re: [zfs-discuss] proposal partial/relative paths for zfs(1)
On Thu, 10 Jul 2008, Tim Foster wrote:
> Mark Musante (famous for recently beating the crap out of lu)

Heh. Although at this point it's hard to tell who's the beat-er and who's the beat-ee...

Regards,
markm
Re: [zfs-discuss] X4540
Oh god, I hope not. A patent on fitting a card in a PCI-E slot, or on using nvram with RAID (which raid controllers have been doing for years), would just be ridiculous. This is nothing more than cache, and even with the American patent system I'd have thought it hard to get that past the obviousness test.
Re: [zfs-discuss] Case study/recommended ZFS setup for home file server
On Thu, Jul 10, 2008 at 12:47:26AM -0700, Ross wrote:
> My recommendation: buy a small, cheap 2.5" SATA hard drive (or 1.8" SSD) and use that as your boot volume; I'd even bolt it to the side of your case if you have to. Then use the whole of your three large disks as a raid-z set.

Yup, I'm going with 4GB of mirrored flash for root/var/usr and I'll keep the main spindles only for data.

> If I were in your shoes I would also have bought 4 drives for ZFS instead of 3, and gone for raid-z2.

No room - Antec NSK-2440 - and too much power draw. My server idles at 57-64 W (under Linux) and I'd like to keep it that way.

> And finally, I don't know how much room you have in your current case, but if you're ever looking for one that takes more drives I can highly recommend the Antec P182. I've got 6x 1TB drives in my home server and in that case it's so quiet I can't even hear it turn on. My watch ticking easily drowns out this server.

Heh - I do have the P180 as my workstation case. But I don't have that much room for servers 8^)

> PS. If you're going to be using CIFS, avoid build 93.

Can you please give a link to the discussion, or a bug id?

Thanks,
florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163
[zfs-discuss] ZFS send/receive questions
Hi all,

I'm a little (ok, a lot) confused on the whole zfs send/receive commands. I've seen mention of using zfs send between two different machines, but no good howto explaining how to make it work. I have one try-n-buy x4500 that we are trying to move data from onto a new x4500 that we've purchased. Right now I'm using rsync over ssh (via a 1 Gb/s network) to copy the data, but it is almost painfully slow (700GB over 24 hours). Yeah, it's a load of small files for the most part. Anyway, would zfs send/receive work better? Do you have to set up a service on the receiving machine in order to receive the zfs stream? The machine is an x4500 running Solaris 10 u5.

Thanks
Dave

David Glaser
Systems Administrator
LSA Information Technology
University of Michigan
Re: [zfs-discuss] proposal partial/relative paths for zfs(1)
Darren J Moffat wrote:
> Today:
> $ zfs create cube/builds/darrenm/bugs/6724478
>
> With this proposal:
> $ pwd
> /cube/builds/darrenm/bugs
> $ zfs create 6724478
>
> Both of these would result in a new dataset cube/builds/darrenm/bugs/6724478.
> ...
> Maybe the easiest way out of the ambiguity is to add a flag to zfs create for the partial dataset name, eg:
>
> $ pwd
> /cube/builds/darrenm/bugs
> $ zfs create -c 6724478
>
> Why -c? -c for "current directory". -p (partial) is already taken to mean create all non-existing parents, and -r (relative) is already used consistently as recurse in other zfs(1) commands (as well as lots of other places).

Why not zfs create $PWD/6724478? Works today, traditional UNIX behaviour, no coding required. Unless you're in some bizarroland shell (like csh?)...

--
Carson
Re: [zfs-discuss] X4540
On Jul 10, 2008, at 7:05 AM, Ross wrote:
> Oh god, I hope not. A patent on fitting a card in a PCI-E slot, or on using nvram with RAID (which raid controllers have been doing for years), would just be ridiculous. This is nothing more than cache, and even with the American patent system I'd have thought it hard to get that past the obviousness test.

How quickly they forget. Take a look at the Prestoserve User's Guide for a refresher...

http://docs.sun.com/app/docs/doc/801-4896-11
Re: [zfs-discuss] previously mentioned J4000 released
Tommaso Boccali wrote:
> .. And the answer was yes, I hope. We are seriously thinking of buying 48 1 TB disks to replace those in a 1-year-old Thumper, so please confirm it again :)

In my 15 years' experience with Sun products, I've never known one to care about drive brand, model, or firmware. If it was standards compliant for both physical interface and protocol, the machine would use it, in my experience. This was mainly with host-attached JBOD though (which the x4500 and x4540 are). In RAID arrays my guess is that it wouldn't care then either, though you'd be opening yourself up to weird interactions between the array and the drive firmware if you didn't use a tested combination.

The drive carriers were a different story though. Some were easy to get. Others extremely hard. There was one carrier that we couldn't get separately even when I worked at Sun.

-kyle
Re: [zfs-discuss] X4540
Spencer Shepler wrote:
> On Jul 10, 2008, at 7:05 AM, Ross wrote:
> > Oh god, I hope not. A patent on fitting a card in a PCI-E slot, or on using nvram with RAID (which raid controllers have been doing for years), would just be ridiculous. This is nothing more than cache, and even with the American patent system I'd have thought it hard to get that past the obviousness test.
>
> How quickly they forget. Take a look at the Prestoserve User's Guide for a refresher...
> http://docs.sun.com/app/docs/doc/801-4896-11

Or Fast Write Cache

http://docs.sun.com/app/docs/coll/fast-write-cache2.0
Re: [zfs-discuss] proposal partial/relative paths for zfs(1)
Carson Gaspar wrote:
> Darren J Moffat wrote:
> > $ pwd
> > /cube/builds/darrenm/bugs
> > $ zfs create -c 6724478
> >
> > Why -c? -c for "current directory". -p (partial) is already taken to mean create all non-existing parents, and -r (relative) is already used consistently as recurse in other zfs(1) commands (as well as lots of other places).
>
> Why not zfs create $PWD/6724478? Works today, traditional UNIX behaviour, no coding required. Unless you're in some bizarroland shell (like csh?)...

Because the zfs dataset mountpoint may not be the same as the zfs pool name. This makes things a bit complicated for the initial request. Personally, I haven't played with datasets where the mountpoint is different. If you have a zpool tank mounted on /tank and /tank/homedirs with mountpoint=/export/home, do you create the next dataset as /tank/homedirs/carson, or /export/home/carson? And does the mountpoint get inherited in the obvious (vs. the simple vs. not at all) way? I don't know.

Also, $PWD has a leading / in this example.

--Joe
Re: [zfs-discuss] X4540
On Thu, 10 Jul 2008, Ross wrote:
> As an NFS storage platform, you'd be beating EMC and NetApp on price, spindle count, features and performance. I really hope somebody at Sun considers this, and thinks about expanding the "What can you do with an x4540" section on the website to include VMware.

I expect that Sun is realizing that it is already undercutting much of the rest of its product line. These minor updates would allow the X4540 to compete against much more expensive StorageTek SAN hardware. How can other products remain profitable when competing against such a star performer?

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] proposal partial/relative paths for zfs(1)
Carson Gaspar wrote:
> Why not zfs create $PWD/6724478? Works today, traditional UNIX behaviour, no coding required. Unless you're in some bizarroland shell

Did you actually try that?

braveheart# echo $PWD
/tank/p2/2/1
braveheart# zfs create $PWD/44
cannot create '/tank/p2/2/1/44': leading slash in name

It doesn't work because zfs create takes a dataset name, but $PWD will give you a pathname starting with /, and dataset names don't start with /. Also, this assumes that your mountpoint hierarchy is identical to your dataset name hierarchy (other than the leading /), which isn't necessarily true, i.e. if any of the datasets have a non-default mountpoint property.

--
Darren J Moffat
Re: [zfs-discuss] ZFS send/receive questions
Glaser, David wrote:
> Hi all, I'm a little (ok, a lot) confused on the whole zfs send/receive commands. I've seen mention of using zfs send between two different machines, but no good howto explaining how to make it work.

The zfs(1) man page, Examples 12 and 13, shows how to use send/receive with ssh. What isn't clear about them?

> Do you have to set up a service on the receiving machine in order to receive the zfs stream?

No.

--
Darren J Moffat
Re: [zfs-discuss] proposal partial/relative paths for zfs(1)
On Thu, Jul 10, 2008 at 5:42 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
> Thoughts? Is this useful for anyone else? My above examples are some of the shorter dataset names I use; ones in my home directory can be even deeper.

Quite usable, and it should be done. The key problem I see is how to deal with ambiguity.

# zpool create pool
# zfs create pool/home
# zfs set mountpoint=/home pool/home
# zfs create pool/home/adams    (for Dilbert's master)
...
# zfs create pool/home/gerdts   (for me)
...
# zfs create pool/home/pool     (for Ms. Pool)
...
# cd /home
# zfs snapshot [EMAIL PROTECTED]

What just got snapshotted? My vote would be that it would try the traditional match first, then try to do it by resolving the path. That is, if it would have failed in the past, it should see if the specified path is the root (mountpoint?) of a dataset. That way things like the following should work unambiguously:

# zfs snapshot ./[EMAIL PROTECTED]
# zfs snapshot `pwd`/[EMAIL PROTECTED]

--
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] ZFS send/receive questions
On Thu, 10 Jul 2008, Glaser, David wrote:
> x4500 that we've purchased. Right now I'm using rsync over ssh (via a 1 Gb/s network) to copy the data but it is almost painfully slow (700GB over 24 hours). Yeah, it's a load of small files for the most part. Anyway, would zfs send/receive work better? Do you have to set up a service on the receiving machine in order to receive the zfs stream?

You don't need to set up a service on the remote machine. You can use ssh to invoke the zfs receive and pipe the data across the ssh connection, which is similar to what rsync is doing. For example (from the zfs docs):

zfs send tank/[EMAIL PROTECTED] | ssh newsys zfs recv sandbox/[EMAIL PROTECTED]

For a fresh copy, the bottleneck is quite likely ssh itself. Ssh uses fancy encryption algorithms which take lots of CPU time and really slow things down. The blowfish algorithm seems to be the fastest, so passing

-c blowfish

as an ssh option can significantly speed things up. For example, this is how you can tell rsync to use ssh with your own options:

--rsh='/usr/bin/ssh -c blowfish'

In order to achieve even more performance (but without encryption), you can use netcat as the underlying transport. See http://netcat.sourceforge.net/. Lastly, if you have much more CPU available than bandwidth, then it is worthwhile to install and use the 'lzop' compression program, which compresses very quickly to a format only about 30% less compressed than what gzip achieves, but fast enough for real-time data transmission. It is easy to insert lzop into the pipeline so that less data is sent across the network.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
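[Putting Bob's suggestions together, a hedged sketch of the full pipeline might look like this; the host name (newsys), pool names and snapshot name are placeholders, and it assumes lzop is installed on both ends.]

   zfs send tank/fs@snap1 | lzop -c | ssh -c blowfish newsys 'lzop -dc | zfs recv sandbox/fs'

The lzop step only helps when the data compresses well and the CPUs are faster than the wire; for already-compressed data it can simply be dropped.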
Re: [zfs-discuss] ZFS send/receive questions
On Thu, Jul 10, 2008 at 09:02:35AM -0700, Tim Spriggs wrote:
> > The zfs(1) man page, Examples 12 and 13, shows how to use send/receive with ssh. What isn't clear about them?
>
> I found that the overhead of SSH really hampered my ability to transfer data between thumpers as well. When I simply ran a set of sockets and a pipe, things went much faster (filled a 1G link). Essentially I used netcat instead of SSH.

You can use blowfish [0] or arcfour [1], as they are faster than the default algorithm (3des).

Cheers,
florin

0: ssh(1) man page
1: http://www.psc.edu/networking/projects/hpn-ssh/theory.php

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163
Re: [zfs-discuss] previously mentioned J4000 released
Kyle McDonald wrote:
> Tommaso Boccali wrote:
> > .. And the answer was yes, I hope. We are seriously thinking of buying 48 1 TB disks to replace those in a 1-year-old Thumper, so please confirm it again :)
>
> In my 15 years' experience with Sun products, I've never known one to care about drive brand, model, or firmware. If it was standards compliant for both physical interface and protocol, the machine would use it, in my experience. This was mainly with host-attached JBOD though (which the x4500 and x4540 are). In RAID arrays my guess is that it wouldn't care then either, though you'd be opening yourself up to weird interactions between the array and the drive firmware if you didn't use a tested combination.

In general, yes, industry standard drives should be industry standard. We do favor the enterprise-class drives, mostly because they are lower cost over time -- it costs real $$ to answer the phone for a field replacement request. Usually, there is a Sun-specific label because, though we source from many vendors, products like hardware RAID controllers get upset when the replacement disk reports a different size.

> The drive carriers were a different story though. Some were easy to get. Others extremely hard. There was one carrier that we couldn't get separately even when I worked at Sun.

Drive carriers are a different ballgame. AFAIK, there is no industry standard carrier that meets our needs. We require service LEDs for many of our modern disk carriers, so there is a little bit of extra electronics there. You will see more electronics for some of the newer products, as I explain here:
http://blogs.sun.com/relling/entry/this_ain_t_your_daddy

I won't get into the support issue... it hurts my brain.
-- richard
Re: [zfs-discuss] ZFS send/receive questions
Florin Iucha wrote:
> On Thu, Jul 10, 2008 at 09:02:35AM -0700, Tim Spriggs wrote:
> > > The zfs(1) man page, Examples 12 and 13, shows how to use send/receive with ssh. What isn't clear about them?
> >
> > I found that the overhead of SSH really hampered my ability to transfer data between thumpers as well. When I simply ran a set of sockets and a pipe, things went much faster (filled a 1G link). Essentially I used netcat instead of SSH.
>
> You can use blowfish [0] or arcfour [1], as they are faster than the default algorithm (3des).

The default algorithm for ssh on Solaris is not 3des, it is aes128-ctr.

--
Darren J Moffat
Re: [zfs-discuss] proposal partial/relative paths for zfs(1)
Mike Gerdts wrote:
> On Thu, Jul 10, 2008 at 5:42 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
> > Thoughts? Is this useful for anyone else? My above examples are some of the shorter dataset names I use; ones in my home directory can be even deeper.
>
> Quite usable, and it should be done. The key problem I see is how to deal with ambiguity.
>
> # zpool create pool
> # zfs create pool/home
> # zfs set mountpoint=/home pool/home
> # zfs create pool/home/adams    (for Dilbert's master)
> ...
> # zfs create pool/home/gerdts   (for me)
> ...
> # zfs create pool/home/pool     (for Ms. Pool)
> ...
> # cd /home
> # zfs snapshot [EMAIL PROTECTED]
>
> What just got snapshotted?

The dataset named pool only. I don't see how that could be ambiguous now or with what I proposed. If you said zfs snapshot -r [EMAIL PROTECTED] then all of them.

--
Darren J Moffat
Re: [zfs-discuss] ZFS send/receive questions
Is that faster than blowfish?

Dave

> The default algorithm for ssh on Solaris is not 3des, it is aes128-ctr.
>
> --
> Darren J Moffat
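[The thread doesn't answer this directly; one hedged way to find out on your own hardware is to time a bulk transfer with each cipher. "newsys" is a placeholder host, and the exact cipher names available depend on the ssh build in use (e.g. "blowfish" vs "blowfish-cbc").]

   # run each of these under time(1) and compare; ~500 MB of zeros per run
   dd if=/dev/zero bs=1024k count=500 | ssh -c aes128-ctr newsys 'cat > /dev/null'
   dd if=/dev/zero bs=1024k count=500 | ssh -c blowfish   newsys 'cat > /dev/null'
   dd if=/dev/zero bs=1024k count=500 | ssh -c arcfour    newsys 'cat > /dev/null'

Arcfour is usually the fastest of the three, but it is also the weakest cryptographically.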
Re: [zfs-discuss] X4540
On Thu, Jul 10, 2008 at 10:20 AM, Bob Friesenhahn [EMAIL PROTECTED] wrote:
> I expect that Sun is realizing that it is already undercutting much of the rest of its product line. These minor updates would allow the X4540 to compete against much more expensive StorageTek SAN hardware. How can other products remain profitable when competing against such a star performer?

Because at the end of the day, the x4540 still isn't *there* (and probably never will be) for 24/7 SAN/LUN access. AFAIK, nothing in the StorageTek line-up is worth a damn as far as NAS goes that would compete with this. I honestly don't believe anyone looking at a home-grown x4540 is TRULY in the market for a high-end STK SAN anyway. It's the same reason you don't see HDS or EMC rushing to adjust the price of the SYM or USP-V based on Sun releasing the Thumpers.

--Tim
Re: [zfs-discuss] proposal partial/relative paths for zfs(1)
On Thu, Jul 10, 2008 at 11:31 AM, Darren J Moffat [EMAIL PROTECTED] wrote:
> Mike Gerdts wrote:
> > # zfs create pool/home/pool     (for Ms. Pool)
> > ...
> > # cd /home
> > # zfs snapshot [EMAIL PROTECTED]
> >
> > What just got snapshotted?
>
> The dataset named pool only. I don't see how that could be ambiguous now or with what I proposed. If you said zfs snapshot -r [EMAIL PROTECTED] then all of them.

Which dataset named pool? The one at /pool (the root of the zpool, if you will) or the one at /home/pool (Ms. Pool's home directory), which happens to be `pwd`/pool?

--
Mike Gerdts
http://mgerdts.blogspot.com/
Re: [zfs-discuss] ZFS send/receive questions
I guess what I was wondering was whether there is a direct method, rather than the overhead of ssh.

Dave
Re: [zfs-discuss] proposal partial/relative paths for zfs(1)
Mike Gerdts wrote:
> Which dataset named pool? The one at /pool (the root of the zpool, if you will) or the one at /home/pool (Ms. Pool's home directory), which happens to be `pwd`/pool?

Ah, sorry, I missed that your third dataset ended in pool. The answer is still the same, though, if the proposal to use a new flag for partial paths is taken. Which is why I suggested that: it is ambiguous in the example you gave if zfs(1) commands other than create can take relative paths too [which would be useful].

--
Darren J Moffat
Re: [zfs-discuss] ZFS send/receive questions
Glaser, David wrote:
> I guess what I was wondering was whether there is a direct method, rather than the overhead of ssh.

As others have suggested, use netcat (/usr/bin/nc); however, you get no over-the-wire data confidentiality or integrity and no strong authentication with that. If you need those, then a combination of netcat and IPsec might help.

--
Darren J Moffat
Re: [zfs-discuss] ZFS send/receive questions
Thankfully, right now it's over a private IP network between the two machines. I'll play with it a bit and let folks know if I can't get it to work.

Thanks,
Dave
Re: [zfs-discuss] ZFS send/receive questions
On Thu, Jul 10, 2008 at 12:43, Glaser, David [EMAIL PROTECTED] wrote:
> I guess what I was wondering was whether there is a direct method, rather than the overhead of ssh.

On the receiving machine:

nc -l 12345 | zfs recv mypool/[EMAIL PROTECTED]

and on the sending machine:

zfs send sourcepool/[EMAIL PROTECTED] | nc othermachine.umich.edu 12345

You'll need to build your own netcat, but this is fairly simple. If you run into trouble let me know and I'll post an x86 package.

Will
Re: [zfs-discuss] ZFS send/receive questions
Could I trouble you for the x86 package? I don't seem to have much in the way of software on this try-n-buy system...

Thanks,
Dave
Re: [zfs-discuss] X4540
Torrey McMahon wrote:
> Spencer Shepler wrote:
> > On Jul 10, 2008, at 7:05 AM, Ross wrote:
> > > Oh god, I hope not. A patent on fitting a card in a PCI-E slot, or on using nvram with RAID (which raid controllers have been doing for years), would just be ridiculous. This is nothing more than cache, and even with the American patent system I'd have thought it hard to get that past the obviousness test.
> >
> > How quickly they forget. Take a look at the Prestoserve User's Guide for a refresher...
> > http://docs.sun.com/app/docs/doc/801-4896-11
>
> Or Fast Write Cache
> http://docs.sun.com/app/docs/coll/fast-write-cache2.0

Yeah, the J-shaped scar just below my right shoulder blade... For the benefit of the alias, these sorts of products have a very limited market because they store state inside the server and use batteries. RAS guys hate batteries, especially those which are sitting on non-hot-pluggable I/O cards. While there are some specific cards which do allow hardware-assisted remote replication (a previous Sun technology called reflective memory, as used by VAXclusters), most of the issues are with serviceability and not availability. It is really bad juju to leave state in the wrong place during a service event.

Where I think the jury is deadlocked is whether these are actually faster than RAID cards like
http://www.sun.com/storagetek/storage_networking/hba/raid/

But from a performability perspective, the question is whether or not such cards perform significantly better than SSDs. Thoughts?
-- richard
Re: [zfs-discuss] Case study/recommended ZFS setup for home file server
Fajar A. Nugraha wrote:
> Brandon High wrote:
> > Another alternative is to use an IDE to CompactFlash adapter, and boot off of flash.
>
> Just curious, what will that flash contain? E.g. will it be similar to Linux's /boot, or will it contain the full Solaris root? How do you manage redundancy (e.g. a mirror) for that boot device?

zfs set copies=2 :-)

Hmm... I need to dig up my notes on that and blog it...
-- richard
Re: [zfs-discuss] ZFS send/receive questions
Will Murnane wrote:
> On the receiving machine:
>
> nc -l 12345 | zfs recv mypool/[EMAIL PROTECTED]
>
> and on the sending machine:
>
> zfs send sourcepool/[EMAIL PROTECTED] | nc othermachine.umich.edu 12345
>
> You'll need to build your own netcat, but this is fairly simple. If you run into trouble let me know and I'll post an x86 package.

If you are running Nexenta you can also apt-get install sunwnetcat.
Re: [zfs-discuss] ZFS send/receive questions
On Thu, Jul 10, 2008 at 13:05, Glaser, David [EMAIL PROTECTED] wrote:
> Could I trouble you for the x86 package? I don't seem to have much in the way of software on this try-n-buy system...

No problem. Packages are posted at http://will.incorrige.us/solaris-packages/ . You'll need gettext and iconv as well as netcat, as it links against libiconv. Download the gzip files, decompress them with gzip -d, then pkgtrans $packagefile $tempdir and run pkgadd -d $tempdir. Files will be installed in the /usr/site hierarchy. The executable is called netcat, not nc, because that's what it builds as by default. I believe I got all the dependencies, but if not I'll be glad to post whatever is missing as well. If you'd rather have spec files and sources (which you can assemble with pkgbuild) than binaries, I can provide those instead.

Will
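[Spelled out as commands, the install steps Will describes might look like the following; the file name and temporary directory are placeholders, and the procedure repeats for each package downloaded.]

   gzip -d SITEnetcat-x86.pkg.gz
   pkgtrans SITEnetcat-x86.pkg /tmp/pkgs
   pkgadd -d /tmp/pkgs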
Re: [zfs-discuss] previously mentioned J4000 released
On Thu, Jul 10, 2008 at 4:13 PM, Kyle McDonald [EMAIL PROTECTED] wrote:
> In my 15 years' experience with Sun products, I've never known one to care about drive brand, model, or firmware. If it was standards compliant for both physical interface and protocol, the machine would use it, in my experience. This was mainly with host-attached JBOD though (which the x4500 and x4540 are). In RAID arrays my guess is that it wouldn't care then either, though you'd be opening yourself up to weird interactions between the array and the drive firmware if you didn't use a tested combination.

My experience with RAID arrays (mostly Sun's) has been that they're incredibly picky about the drives they talk to, firmware in particular. You pretty much have to have one of the few supported configurations for it to work. If you're lucky the array will update the firmware for you. I've also seen the intelligent controllers in some of Sun's JBOD units (the S1, and the 3000 series) fail to recognize drives that work perfectly well elsewhere.

I'm slightly disappointed that there wasn't a model for 2.5-inch drives in there, though.

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Re: [zfs-discuss] proposal partial/relative paths for zfs(1)
Moore, Joe wrote:
> Carson Gaspar wrote:
> > Why not zfs create $PWD/6724478? Works today, traditional UNIX behaviour, no coding required. Unless you're in some bizarroland shell (like csh?)...
>
> Because the zfs dataset mountpoint may not be the same as the zfs pool name. This makes things a bit complicated for the initial request.
>
> The leading slash will be a problem with the current code.

I forgot about that... make that ${PWD#/} (or change the code to ignore the leading slash...). That is, admittedly, more typing than a single-character option, but not much. And yes, if your mount name and pool names don't match, extra code would be required to determine the parent pool/fs of the path passed. But no more code than magic CWD goo... I really don't like special-case options whose sole purpose is to shorten command line length.

--
Carson
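[A quick sketch of the ${PWD#/} variant, reusing Darren's earlier braveheart example; the same caveats apply, i.e. it only works while the dataset hierarchy matches the mountpoint hierarchy and you're in a Bourne/ksh/bash-style shell.]

   braveheart# echo ${PWD#/}
   tank/p2/2/1
   braveheart# zfs create ${PWD#/}/44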
[zfs-discuss] ZFS Mirroring - Scenario
I have a scenario (tray failure) that I am trying to predict how ZFS will behave in, and am looking for some input. Coming from the world of SVM, ZFS is WAY different ;)

Say we have 2 racks containing 4 trays each, and 2 6540s that present 8D RAID-5 LUNs to the OS/ZFS, and through ZFS we set up a mirror config such that (I'm oversimplifying here, but...):

Rack 1 - Tray 1 = lun 0    Rack 2 - Tray 1 = lun 4
Rack 1 - Tray 2 = lun 1    Rack 2 - Tray 2 = lun 5
Rack 1 - Tray 3 = lun 2    Rack 2 - Tray 3 = lun 6
Rack 1 - Tray 4 = lun 3    Rack 2 - Tray 4 = lun 7

so the zpool command would be (just for ease of explanation, using the supposed lun numbers):

zpool create somepool mirror 0 4 mirror 1 5 mirror 2 6 mirror 3 7

so a status output would look similar to:

somepool
  mirror
    0
    4
  mirror
    1
    5
  mirror
    2
    6
  mirror
    3
    7

Now in the VERY unlikely event that we lost the first tray in each rack, which contain 0 and 4 respectively:

somepool
  mirror   <--- bye bye
    0
    4
  mirror
    1
    5
  mirror
    2
    6
  mirror
    3
    7

Would the entire somepool zpool die? Would it affect ALL users in this pool or a portion of the users? Is there a way in ZFS to tell what individual users are hosed (my group is a bunch of control freaks ;)? How would ZFS react to something like this? Also, any feedback on a better way to do this is more than welcome. Please keep in mind I am a ZFS noob, so detailed explanations would be awesome.

Thanks in advance
Robb
Re: [zfs-discuss] ZFS Mirroring - Scenario
On Thu, 10 Jul 2008, Robb Snavely wrote:
> Now in the VERY unlikely event that we lost the first tray in each rack, which contain 0 and 4 respectively... Would the entire somepool zpool die? Would it affect ALL users in this pool or a portion of the users? Is there a way in ZFS to tell what individual users are hosed?

ZFS load-shares the pool over the VDEVs (a mirror is a type of VDEV), so your entire pool would become dead and unusable until the VDEV is restored. You want your mirrors to be based on hardware which is as distinct as possible. If necessary you could consider a triple mirror.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
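[A hedged sketch of the triple-mirror idea, sticking with the simplified lun numbering from Robb's post and assuming a third, independent set of LUNs (8-11) is available; every vdev then survives the loss of any one tray or rack. This only helps, of course, if the third set of LUNs does not share the failure domains of the first two.]

   zpool create somepool \
     mirror 0 4 8 \
     mirror 1 5 9 \
     mirror 2 6 10 \
     mirror 3 7 11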
[zfs-discuss] ZFS/Install related question
Hi there,

I'm currently setting up a new system in my lab. Four SATA drives would be turned into the main file system (ZFS?) running on a soft raid (raid-z?). My main target is reliability; my experience with Linux soft RAID was catastrophic, and the array could not be restored after some testing simulating power failures (thank god I did the tests before relying on that...).

From what I've seen so far, Solaris cannot boot from a raid-z system. Is that correct? In this case, what needs to be out of the array? For example, on a Linux system I could put /boot on an old 256MB USB flash drive (as long as the boot loader and kernel were out of the array, the system would boot). What are the requirements for booting from the USB device but loading the system from the array? Second, how do I proceed during the install process? I know it's a little bit weird, but I must confess I'm doing it on purpose. :-)

I thank you in advance
[zfs-discuss] please help with raid / failure / rebuild calculations
I am building a 14-disk RAID-6 array with 1 TB Seagate AS (non-enterprise) drives. So there will be 14 disks total, 2 of them will be parity, 12 TB of space available. My drives have a BER of 1 in 10^14.

I am quite scared by my calculations - it appears that if one drive fails and I do a rebuild, I will perform:

13 * 8 * 10^12 = 104 * 10^12 reads.

But my BER is smaller: 10^14 = 100 * 10^12. So I am (theoretically) guaranteed to lose another drive on the raid rebuild. Then the calculation for _that_ rebuild is:

12 * 8 * 10^12 = 96 * 10^12

So no longer guaranteed, but 96% isn't good. I have looked all over, and these seem to be the accepted calculations - which means if I ever have to rebuild, I'm toast.

But here is the question - the part I am having trouble understanding: the 13 * 8 * 10^12 operations required for the first rebuild - isn't that the number for _the entire array_? Any given 1 TB disk only has 8 * 10^12 bits on it _total_. So why would I ever do more than 8 * 10^12 operations on any one disk? It seems very odd to me that a raid controller would have to access any given bit more than once to do a rebuild... and the total number of bits on a drive is 8 * 10^12, which is far below the 10^14 BER number.

So I guess my question is - why are we all doing this calculation, wherein we apply the total operations across an entire array rebuild to a single drive's BER number?

Thanks.
Re: [zfs-discuss] please help with raid / failure / rebuild calculations
User Name wrote:
> I am building a 14-disk RAID-6 array with 1 TB Seagate AS (non-enterprise) drives. So there will be 14 disks total, 2 of them will be parity, 12 TB of space available. My drives have a BER of 1 in 10^14. I am quite scared by my calculations - it appears that if one drive fails and I do a rebuild, I will perform 13 * 8 * 10^12 = 104 * 10^12 reads. But my BER is smaller: 10^14 = 100 * 10^12. So I am (theoretically) guaranteed to lose another drive on the raid rebuild. Then the calculation for _that_ rebuild is 12 * 8 * 10^12 = 96 * 10^12. So no longer guaranteed, but 96% isn't good. I have looked all over, and these seem to be the accepted calculations - which means if I ever have to rebuild, I'm toast.

If you were using RAID-5, you might be concerned. For RAID-6, or at least raidz2, you could recover from an unrecoverable read during the rebuild of one disk.

> But here is the question - the part I am having trouble understanding: the 13 * 8 * 10^12 operations required for the first rebuild - isn't that the number for _the entire array_? Any given 1 TB disk only has 8 * 10^12 bits on it _total_. So why would I ever do more than 8 * 10^12 operations on any one disk?

Actually, ZFS only rebuilds the data. So you need to multiply by the space utilization of the pool, which will usually be less than 100%.

> It seems very odd to me that a raid controller would have to access any given bit more than once to do a rebuild... and the total number of bits on a drive is 8 * 10^12, which is far below the 10^14 BER number. So I guess my question is - why are we all doing this calculation, wherein we apply the total operations across an entire array rebuild to a single drive's BER number?

You might also be interested in this blog:
http://blogs.zdnet.com/storage/?p=162

A couple of things seem to be at work here. I study field data failure rates. We tend to see unrecoverable read failure rates at least an order of magnitude better than the specifications. This is a good thing, but it simply points out that the specifications are often sand-bagged -- they are not a guarantee. However, you are quite right in your intuition that if you have a lot of bits of data, then you need to pay attention to the bit-error rate (BER) of unrecoverable reads on disks. This sort of model can be used to determine a mean time to data loss (MTTDL), as I explain here:
http://blogs.sun.com/relling/entry/a_story_of_two_mttdl

Perhaps it would help if we changed the math to show the risk as a function of the amount of data given the protection scheme? Hmmm, something like probability of data loss per year for N TBytes with configuration XYZ. Would that be more useful for evaluating configurations?
-- richard
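[As a rough worked example of why the linear "96%" figure overstates the risk, assuming independent bit errors at exactly the quoted BER and ignoring the lower real-world rates Richard mentions: the chance of at least one unrecoverable read while reading n bits is 1 - (1 - 1/10^14)^n rather than n/10^14. For the 12-disk second rebuild:]

   n = 12 * 8 * 10^12 = 9.6 * 10^13 bits
   P(at least one URE) = 1 - (1 - 10^-14)^n  ~  1 - e^(-0.96)  ~  0.62

So roughly a 62% chance rather than 96%, and a single such error is still survivable under raidz2 when only one disk is missing.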