Re: [zfs-discuss] Re: Re: can I use zfs on just a partition?
Hi,

> When you do the initial install, how do you do the slicing? Just create something like: / 10G, swap 2G, /altroot 10G, /zfs rest-of-disk?

Yes.

> Or do you just create the first three slices and leave the rest of the disk untouched? I understand the concept at this point, just trying to explain to a third party exactly what they need to do to prep the system disk for me :)

No. You need to be able to tell ZFS what to use. Hence, if your pool is created at the slice level, you need to create a slice for it. So the above is the way to go. And yes, you should only do this on laptops and other machines where you have only 1 disk or are otherwise very disk-limited :).

Best regards,
Constantin

--
Constantin Gonzalez, Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91  http://blogs.sun.com/constantin/
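A sketch of the resulting prep and pool creation, assuming the leftover space ended up in slice 7 (disk and slice names are illustrative):

  # prtvtoc /dev/rdsk/c0t0d0s2    # confirm s7 covers the remaining cylinders
  # zpool create tank c0t0d0s7    # build the pool on the slice, not the whole disk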
Re: [zfs-discuss] high density SAS
Well, Solaris SAS isn't there yet, but anyway, I just found some interesting high-density SAS/SATA enclosures: http://xtore.com/product_list.asp?cat=JBOD

The XJ 2000 is like the X4500 in that it holds 48 drives; however, with the XJ 2000, 2 drives are on each carrier and you can get to them from the front. I don't like Xtore in general, but the 24-bay (2.5" SAS) and 48-bay JBODs are interesting.

> How badly can you mess up a JBOD?

Two words: vibration, cooling.

Casper
Re: [zfs-discuss] Re: How much do we really want zpool remove?
On 25/01/07, Brian Hechinger [EMAIL PROTECTED] wrote:
> The other point is, how many other volume management systems allow you to remove disks? I bet if the answer is not zero, it's not large. ;)

Even Linux LVM can do this (with pvmove) - slow, but you can do it online.

--
Rasputin :: Jack of All Trades - Master of Nuns
http://number9.hellooperator.net/
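A sketch of the LVM equivalent (volume group and device names are illustrative):

  # pvmove /dev/sdc1          # migrate all extents off the disk, online
  # vgreduce vg0 /dev/sdc1    # then remove it from the volume group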
[zfs-discuss] multihosted ZFS
Hi!

I've been testing ZFS, and would like to use it on SAN-attached disks in our production environment, where multiple machines can see the same zpools. I'm having some concerns about importing/exporting pools in possible failure situations.

If a box that was using some zpool crashes (for example, sending a break to the host when testing this), I would like to import that pool on some other host right away. Of course I'll have to use import -f because the pool was not exported. Now the other host is serving the disk, no problem there, but when I boot the crashed host again, it wants to keep using the pools it previously had and doesn't realize that the pool is now in use by the other host. That leads to two systems using the same zpool, which is not nice.

Is there any solution to this problem, or do I have to get Sun Cluster 3.2 if I want to serve the same zpools from many hosts? We may try Sun Cluster anyway, but I'd like to know if this can be solved without it.

--
Ari-Pekka Oksavuori aoksavuo at cs.tut.fi
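The failover step in question, sketched (pool name is illustrative):

  # zpool import -f tank    # on the standby host; -f because the pool was never exported

The problem then follows: when the crashed host boots, it re-opens the same pool from its own configuration, with no check that another host has imported it in the meantime.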
Re: [zfs-discuss] multihosted ZFS
Ari-Pekka Oksavuori wrote:
> [...] That leads to two systems using the same zpool, which is not nice.

s/not nice/really really bad/ :-)

> Is there any solution to this problem, or do I have to get Sun Cluster 3.2 if I want to serve the same zpools from many hosts? We may try Sun Cluster anyway, but I'd like to know if this can be solved without it.

You can't do it *safely* without the protection of a high-availability framework such as SunCluster.

best regards,
James C. McPherson
--
Solaris kernel software engineer
Sun Microsystems
Re: [zfs-discuss] multihosted ZFS
James C. McPherson wrote:
> You can't do it *safely* without the protection of a high-availability framework such as SunCluster.

Thanks for the fast reply. :) We'll have a look into the Cluster solution.

--
Ari-Pekka Oksavuori aoksavuo at cs.tut.fi
[zfs-discuss] bug id 6343667
Hello zfs-discuss,

Is anyone working on that bug (scrub/resilver restarting when a snapshot is taken)? Any progress? It's a real PITA on the X4500 when one wants/needs snapshots regularly and resilvering of bad disks can take many days...

--
Best regards,
Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com
[zfs-discuss] ZFS or UFS - what to do?
Hi Folks,

I am currently in the midst of setting up a completely new file server using a pretty well loaded Sun T2000 (8x1GHz, 16GB RAM) connected to an Engenio 6994 product (I work for LSI Logic so Engenio is a no-brainer). I have configured a couple of zpools from volume groups on the Engenio box - 1x2.5TB and 1x3.75TB. I then created sub zfs systems below that, set quotas, and sharenfs'd them so that it appears that these file systems are dynamically shrinkable and growable. It looks very good... I can see the correct file system sizes on all types of machines (Linux 32/64-bit and of course Solaris boxes), and if I resize the quota it's picked up in NFS right away. But I would be the first in our organization to use this in an enterprise system, so I definitely have some concerns that I'm hoping someone here can address.

1. How stable is ZFS? The Engenio box is completely configured for RAID5 with hot spares, and the write cache (8GB) has battery backup, so I'm not too concerned from the hardware side. I'm looking for an idea of how stable ZFS itself is in terms of corruptibility, uptime and OS stability.

2. Recommended config. Above, I have a fairly simple setup. In many of the examples the granularity is home-directory level, and when you have many many users that could get to be a bit of a nightmare administratively. I am really only looking for high-level dynamic size adjustability and am not interested in its built-in RAID features. But given that, any real-world recommendations?

3. Caveats? Anything I'm missing that isn't in the docs that could turn into a BIG gotcha?

4. Since all data access is via NFS, we are concerned that 32-bit systems (mainly Linux and Windows via Samba) will not be able to access all the data areas of a 2TB+ zpool even if the zfs quota on a particular share is less than that. Can anyone comment?

The bottom line is that with anything new there is cause for concern, especially if it hasn't been tested within our organization. But the convenience/functionality factors are way too hard to ignore.

Thanks,
Jeff
Re: [zfs-discuss] ZFS or UFS - what to do?
Hello Jeffery,

Friday, January 26, 2007, 3:16:44 PM, you wrote:

JM> 1. How stable is ZFS? The Engenio box is completely configured for RAID5 with hot spares, and the write cache (8GB) has battery backup [...] I'm looking for an idea of how stable ZFS itself is in terms of corruptibility, uptime and OS stability.

When it comes to uptime, OS stability or corruptibility - no problems here. However, if you give ZFS entire LUNs on Engenio devices, IIRC with those arrays, when ZFS issues a write-cache flush the array actually flushes, and this can hurt performance. There's a way to set up the array to ignore flush commands, or you can put ZFS on SMI-labeled slices. You'd have to check whether this problem was actually with Engenio - I'm not sure. However, depending on workload, consider doing RAID in ZFS instead of on the array, especially because you then get self-healing from ZFS. At the least, doing a stripe between several RAID5 LUNs would be a good idea.

JM> 2. Recommended config. [...] But given that, any real-world recommendations?

Depending on how many users you have, consider creating a file system for each user, or at least for a group of users if you can group them.

JM> 3. Caveats? Anything I'm missing that isn't in the docs that could turn into a BIG gotcha?

The WRITE CACHE problem I mentioned above - but check whether it was really Engenio - anyway, there are simple workarounds. There are some performance issues in corner cases; hope you won't hit one. Use at least S10U3 or Nevada (there are some people using Nevada in production :)).

JM> 4. Since all data access is via NFS, we are concerned that 32-bit systems (mainly Linux and Windows via Samba) will not be able to access all the data areas of a 2TB+ zpool even if the zfs quota on a particular share is less than that. Can anyone comment?

If there's a quota on a file system, then the NFS client will see that quota as the file system size, IIRC, so it shouldn't be a problem. But that means a file system for each user.

JM> The bottom line is that with anything new there is cause for concern. [...]

ZFS is new, that's right. There are some problems, mostly related to performance and hot-spare support (when doing RAID in ZFS).
Other than that, you should be OK. Quite a lot of people are using ZFS in production. I myself have had ZFS in production for years, right now with well over 100TB of data on it using different storage arrays, and I'm still migrating more and more data. Never lost any data on ZFS - at least none that I know about :)

--
Best regards,
Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com
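A sketch of the stripe-over-RAID5-LUNs idea plus a per-group file system (pool, LUN, and file system names are illustrative):

  # zpool create tank c2t0d0 c2t1d0 c2t2d0    # dynamic stripe across three array RAID5 LUNs
  # zfs create tank/eng
  # zfs set quota=500g tank/eng
  # zfs set sharenfs=on tank/eng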
Re: [zfs-discuss] ZFS or UFS - what to do?
On Fri, 2007-01-26 at 06:16 -0800, Jeffery Malloch wrote:
> 1. How stable is ZFS? The Engenio box is completely configured for RAID5 with hot spares

That partly defeats the purpose of ZFS. ZFS offers raid-z and raid-z2 (double parity) with all the advantages of raid-5 or raid-6 but without several of the raid-5 issues. It also has features that a raid-5 controller could never provide: ensuring data integrity from the kernel to the disk, and self-correction.

> and the write cache (8GB) has battery backup, so I'm not too concerned from the hardware side.

Whereas the cache/battery backup is a requirement if you run raid-5, it is not for zfs.

> I'm looking for an idea of how stable ZFS itself is in terms of corruptibility, uptime and OS stability.

Since Solaris 10 U3, it is rock solid. No issue here. 1.3TB or so currently assigned in FC drives, in production without any issues. We switched after losing some data to hardware mirroring. Our sysadmin is ecstatic with zfs. Some of the filesystems have compression enabled, and that even increases throughput, if you have the cpu/ram available.

> 2. Recommended config.

The most reliable setup is a JBOD + zfs. But if you have cache on your box, there might be some magic setup you have to do for that box, and I'm sure somebody on the list will help you with that. I don't have an Engenio.

Francois
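A minimal sketch of the JBOD + zfs setup (disk names are illustrative):

  # zpool create tank raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0   # double-parity raid-z2
  # zfs set compression=on tank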
[zfs-discuss] Re: multihosted ZFS
If you _boot_ the original machine then it should see that the pool now is owned by the other host and ignore it (you'd have to do a zpool import -f again I think). Not tested though so don't take my word for it... However if you simply type go and let it continue from where it was then things definitely will not be pretty... :-)
Re: [zfs-discuss] ZFS or UFS - what to do?
On Fri, 2007-01-26 at 06:16 -0800, Jeffery Malloch wrote:
> 2. Recommended config.

1) Since this is a system that many users will depend on, use zfs-managed redundancy, either mirroring or raid-z, between the LUNs exported by the storage system. You may think your storage system is perfect, but are you sure? With a non-redundant zfs, over time, you'll know for sure, but you might find this out at a very inconvenient time. With zfs-managed redundancy, if bit rot happens, you have an excellent chance of slogging through without any application-visible impact.

2) Enable compression. For the software development workloads I'm seeing, this generally recovers the space lost to redundancy.

- Bill
[zfs-discuss] Re: ZFS or UFS - what to do?
I've used ZFS since July/August 2006 when Sol 10 Update 2 came out (the first release to integrate ZFS). I've used it on three servers (an E25K domain, and 2 E2900s) extensively; two of them are production. I've had over 3TB of storage from an EMC SAN under ZFS management for no less than 6 months. Like your configuration, we've deferred data redundancy to the SAN. My observations are:

1. ZFS is stable to a very large extent. There are two known issues that I'm aware of:

a. You can end up in an endless 'reboot' cycle when you've got a corrupt zpool. I came across this when I had data corruption due to an HBA mismatch with the EMC SAN. This mismatch injected data corruption in transit, and the EMC faithfully wrote the bad data; upon reading this bad data, ZFS threw up all over the floor for that pool. There is a documented workaround to snap out of the 'reboot' cycle; I've not checked whether this is fixed in the 11/06 update 3.

b. Your server will hang when one of the underlying disks disappears. In our case we had a T2000 running 11/06 with a mirrored zpool on two internal drives. When we pulled one of the drives abruptly, the server simply hung. I believe this is a known bug; workaround?

2. When you've got I/O operations that either request fsync or open files with the O_DSYNC option, coupled with high I/O, ZFS will choke. It won't crash, but filesystem I/O runs like molasses on a cold morning.

All my feedback is based on Solaris 10 Update 2 (aka 06/06), and I've no comments on NFS. I strongly recommend that you use ZFS data redundancy (raid-z, raid-z2, or mirror) and simply delegate the Engenio to stripe the data for performance.
Re: [zfs-discuss] Re: multihosted ZFS
On Jan 26, 2007, at 7:17, Peter Eriksson wrote:
> If you _boot_ the original machine then it should see that the pool now is owned by the other host and ignore it (you'd have to do a zpool import -f again I think). Not tested though so don't take my word for it...

Conceptually, that's about right, but in practice it's not quite as simple as that. We had to do a lot of work in Cluster to ensure that the zpool would never be imported on more than one node at a time.

> However if you simply type go and let it continue from where it was then things definitely will not be pretty... :-)

Yes, but that's only one of the bad scenarios.

--Ed
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Fri, Jan 26, 2007 at 08:06:46AM -0800, Anantha N. Srirama wrote:
> b. Your server will hang when one of the underlying disks disappears. In our case we had a T2000 running 11/06 with a mirrored zpool on two internal drives. When we pulled one of the drives abruptly, the server simply hung. I believe this is a known bug; workaround?

This was just covered here, and it looks like the fix will make it into U4 (I think it's in snv_48?). The workaround is to do a 'zpool offline' whenever possible before removing a disk. Yes, this is not always possible (in the case of disk death), but it will help in some situations. I can't wait for U4. :)

-brian
--
The reason I don't use Gnome: every single other window manager I know of is very powerfully extensible, where you can switch actions to different mouse buttons. Guess which one is not, because it might confuse the poor users? Here's a hint: it's not the small and fast one. --Linus
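The workaround, sketched (pool and disk names are illustrative):

  # zpool offline tank c1t3d0     # tell ZFS the disk is going away
  # ...swap the drive...
  # zpool replace tank c1t3d0     # resilver onto the new disk in the same slot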
Re: [zfs-discuss] Re: multihosted ZFS
Peter Eriksson wrote:
> If you _boot_ the original machine then it should see that the pool now is owned by the other host and ignore it (you'd have to do a zpool import -f again I think). Not tested though so don't take my word for it...

I tested this; it's the same thing with a reboot.

--
Ari-Pekka Oksavuori aoksavuo at cs.tut.fi
Re: [zfs-discuss] Re: How much do we really want zpool remove?
> - We need to avoid customers thinking Veritas can shrink, ZFS can't. That is wrong. ZFS _filesystems_ grow and shrink all the time, it's just the pools below them that can just grow. And Veritas does not even have pools.

I'm sure that this issue is different for different environments, but I assure you it wasn't raised because we're looking at a spec chart and someone saw a missing check in the ZFS column. The ability to deallocate in-use storage without having to migrate the existing data is used today by many administrators. We'll live with this not being possible in ZFS at the moment, but the limitation is real, and the flexibility of filesystems within the pool doesn't alleviate it.

> Sorry if I'm stating the obvious or stuff that has been discussed before, but the more I think about zpool remove, the more I think it's a question of willingness to plan/work/script/provision vs. a real show stopper.

Whether it's a show stopper would depend on the environment. It's certainly not that in many places. I agree that if I could plan all my storage perfectly in advance, then several of the ways it would be really useful would be reduced. However, one of the reasons to have it is precisely because it is so difficult to get good predictions for storage use.

I know just a touch of the internals of ZFS - enough to understand why remove/split/evacuate is much more difficult than it might be in simpler volume managers. I'm happy we've got what we have today and that people have already thought up ways of attacking this problem to make ZFS even better.

--
Darren Dunham [EMAIL PROTECTED]
Senior Technical Consultant TAOS  http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
This line left intentionally blank to confuse you.
Re: [zfs-discuss] Re: multihosted ZFS
Ed Gould wrote:
> Conceptually, that's about right, but in practice it's not quite as simple as that. We had to do a lot of work in Cluster to ensure that the zpool would never be imported on more than one node at a time.

Didn't VxVM use the hostid on disks to check where the disk groups were last used, and refuse to automatically import groups with a different id on disk? Would something like this be hard to implement?

--
Ari-Pekka Oksavuori aoksavuo at cs.tut.fi
Re: [zfs-discuss] Re: How much do we really want zpool remove?
[EMAIL PROTECTED] wrote on 01/26/2007 03:00:13 AM:
> Hi, I do agree that zpool remove is a _very_ desirable feature, no doubt about that. Here are a couple of thoughts and workarounds, in random order, that might give us some more perspective: - My home machine has 4 disks and a big zpool across them. Fine. But what if a controller fails or worse, a CPU? Right, I need a second machine, if I'm really honest with myself and serious about my data. Don't laugh, ZFS on a Solaris server is becoming my mission-critical home storage solution that is supposed to last beyond CDs and DVDs and other vulnerable media. So, if I were an enterprise, I'd be willing to keep enough empty LUNs available to facilitate at least the migration of one or more filesystems if not complete pools. With a little bit of scripting, this can be done quite easily and efficiently through zfs send/receive and some LUN juggling. If I were an enterprise's server admin and the storage guys didn't have enough space for migrations, I'd be worried.

I think you may find in practice that many medium to large enterprise IT departments are in this exact situation -- we do not have LUNs sitting stagnant just waiting for data migrations of our largest data sets. We have been sold (and rightly so, because it works, is cost effective, and has no downtime) on being able to move LUNs around at will without duplicating (to tape or disk) and dumping. Are you really expecting the storage guys to have 40TB of disk just sitting collecting dust for when you want to pull 10 disks out of a 44TB system? This type of thinking may very well be why Sun has had a hard time in the last few years (although zfs and recent products show that the tide is turning).

> - We need to avoid customers thinking Veritas can shrink, ZFS can't. That is wrong. ZFS _filesystems_ grow and shrink all the time, it's just the pools below them that can just grow. And Veritas does not even have pools.

Sorry, that is silly. Can we compare if we call them both volumes or filesystems (or any virtualization of each) which are reserved for data in which we want to remove and add disks online? vxfs can grow and shrink, and the volumes can grow and shrink. Pools may blur the line of volume/fs, but they still present the same constraints to administrators trying to admin these boxes and the disks attached to them.

> People have started to follow a One-pool-to-store-them-all approach, which I think is not always appropriate. Some alternatives: - One pool per zone might be a good idea if you want to migrate zones across systems, which then becomes easy through zpool export/import in a SAN. - One pool per service level (mirror, RAID-Z2, fast, slow, cheap, expensive) might be another idea. Keep some cheap mirrored storage handy for your pool migration needs and you could wiggle your life around zpool remove.

You went from one pool to share data (the major advantage of the pool concept) to a bunch of constrained pools. Also, how does this resolve the issue of LUN migration online?

> Switching between Mirror, RAID-Z, RAID-Z2 then becomes just a zfs send/receive pair. Shrinking a pool requires some more zfs send/receiving and maybe some scripting, but these are IMHO less painful than living without ZFS' data integrity and the other joys of ZFS.

Oh, never mind, dump to tape and restore (err, disk) -- you do realize that the industry has been selling products that have made this behavior obsolete for close to 10 years now?

> Sorry if I'm stating the obvious or stuff that has been discussed before, but the more I think about zpool remove, the more I think it's a question of willingness to plan/work/script/provision vs. a real show stopper.

No, it is a specific workflow that requires disks to stay online while allowing for economically sound use of resources -- this is not about laziness (that is how I am reading your view) or not wanting to script up solutions.

> Best regards, Constantin
> P.S.: Now with my big mouth I hope I'll survive a customer confcall next week with a customer asking for exactly zpool remove :).

I hope so; you may want to rethink the script-it-and-go-back-in-sysadmin-time-10-years approach. ZFS buys a lot and is a great filesystem, but there are places such as this that are still weak and need fixing for many environments to be able to replace vxvm/vxfs or other solutions. Sure, you will find people that are viewing this new pooled filesystem with old eyes, but there are admins on this list that actually understand what they are missing and the other options for working around these issues. We don't look at this as a feature tickmark, but as a feature that we know is missing and that we really need before we can consider moving some of our systems from vxvm/fs to zfs.

-Wade Stuart
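The send/receive migration being debated, sketched (pool and filesystem names are illustrative):

  # zfs snapshot tank/data@migrate
  # zfs send tank/data@migrate | zfs receive tank2/data
  # zfs snapshot tank/data@migrate2    # later: catch-up incremental before cutover
  # zfs send -i tank/data@migrate tank/data@migrate2 | zfs receive tank2/data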
Re: [zfs-discuss] Re: multihosted ZFS
Ari-Pekka Oksavuori wrote:
> Didn't VxVM use the hostid on disks to check where the disk groups were last used, and refuse to automatically import groups with a different id on disk? Would something like this be hard to implement?

Yes, it does. There was a long thread on this not too long ago. Something similar will be added to ZFS. It won't be a full cluster solution, but it would aid in hand-failover situations like this.

--
Darren Dunham [EMAIL PROTECTED]
Senior Technical Consultant TAOS  http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
This line left intentionally blank to confuse you.
[zfs-discuss] UFS on zvol: volblocksize and maxcontig
Hi all!

First off, if this has been discussed, please point me in that direction. I have searched high and low and really can't find much info on the subject.

We have a large-ish (200GB) UFS file system on a Sun Enterprise 250 that is being shared with samba (lots of files, mostly random I/O). The OS is Solaris 10u3. The disk set is 7x36GB 10k SCSI, 4 internal, 3 external. For several reasons we currently need to stay on UFS and can't switch to ZFS proper. So instead we have opted to do UFS on a zvol using raid-z, in lieu of UFS on SVM using raid5 (we want/need raid protection). This decision was made because of the ease of disk-set portability of zpools, and also the [assumed] performance benefit vs SVM.

Anyways, I've been pondering the volblocksize parameter and trying to figure out how it interacts with UFS. When the zvol was set up, I took the default 8k size. Since UFS uses an 8k block size, this seemed to be a reasonable choice. I've been thinking more about it lately, and have also read that UFS will do R/W in bigger than 8k blocks when it can, up to maxcontig (default of 16, i.e. 128k). This presented me with several questions:

Would a volblocksize of 128k and maxcontig 16 provide better UFS performance? Overall, or only in certain situations (i.e. only for sequential I/O)?

Would increasing maxcontig beyond 16 make any difference (good, bad or indifferent) if the underlying device is limited to 128k blocks?

What exactly does volblocksize control? My observations thus far indicate that it simply sets a max block size for the [virtual] zvol device. Changing volblocksize does NOT seem to have an impact on I/Os to the underlying physical disks, which always seem to float in the 50-110k range.

How does volblocksize affect I/O that is not of a set block size?

Finally, why does volblocksize only affect raidz and mirror devices? It seems to have no effect on 'simple' devices, even though I presume striping is still used there. That is also assuming that volblocksize interacts with striping.

Any answers or input is greatly appreciated. Thanks much!

-Brian

--
---
Brian H. Nelson Youngstown State University
System Administrator Media and Academic Computing
bnelson[at]cis.ysu.edu
---
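For reference, the knobs in question, sketched (pool, zvol name, and sizes are illustrative):

  # zfs create -V 200g -b 128k tank/ufsvol      # zvol with 128k volblocksize
  # newfs /dev/zvol/rdsk/tank/ufsvol
  # tunefs -a 16 /dev/zvol/rdsk/tank/ufsvol     # maxcontig 16 x 8k blocks = 128k clusters
  # mount /dev/zvol/dsk/tank/ufsvol /export/share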
Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig
Brian H. Nelson wrote:
> For several reasons we currently need to stay on UFS and can't switch to ZFS proper. So instead we have opted to do UFS on a zvol using raid-z,

Can you state what those reasons are, please? I know that isn't answering the question you are asking, but it is worth making sure you have the correct info. I'd also like to understand why UFS works for you but ZFS as a filesystem does not.

--
Darren J Moffat
Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig
Darren J Moffat wrote:
> Can you state what those reasons are, please? [...] I'd also like to understand why UFS works for you but ZFS as a filesystem does not.

I knew someone would ask that :)

The primary reason is that our backup software (EMC/Legato Networker 7.2) does not appear to support zfs. We don't have the funds currently to upgrade to the new version that does.

The other reason is that the machine has been around for years, already using UFS and quotas extensively. Over winter break we had time to upgrade to Solaris 10 and migrate the volume from SVM to a zvol, but not much more. There are a few thousand users on the machine. The thought of transitioning to that many zfs 'partitions' in order to have per-user quotas seemed daunting, not to mention the administrative re-training needed (edquota doesn't work. du is reporting 3000 filesystems?! etc).

IMO, the quota-per-file-system approach seems inconvenient when you get past a handful of file systems. Unless I'm really missing something, it just seems like a nightmare to have to deal with such a ridiculous number of file systems.

-Brian

--
---
Brian H. Nelson Youngstown State University
System Administrator Media and Academic Computing
bnelson[at]cis.ysu.edu
---
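For scale, the per-user transition being described would look something like this (pool and user names are illustrative):

  # zfs create tank/home
  # for u in user0001 user0002 user0003; do    # ...repeated for a few thousand users
  >   zfs create tank/home/$u
  >   zfs set quota=1g tank/home/$u
  > done

...after which every one of those filesystems shows up in df output.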
[zfs-discuss] Re: ZFS or UFS - what to do?
Oh yep, I know that churning feeling in the stomach that there's got to be a GOTCHA somewhere... it can't be *that* simple!
Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig
> The other reason is that the machine has been around for years, already using UFS and quotas extensively. [...] The thought of transitioning to that many zfs 'partitions' in order to have per-user quotas seemed daunting, not to mention the administrative re-training needed (edquota doesn't work. du is reporting 3000 filesystems?! etc).

I'm assuming df? I think that the problem you are describing is a symptom of how existing tools and methods fall apart when confronted with huge numbers of filesystems, but only because more information is presented by df than before. I'd love to have an option to df which only reported pools, not filesystems (rather than having to type df -F ufs; zpool list). The same problem exists with automounted home directories (but only active directories are shown; again, this is something ZFS may want to emulate).

> IMO, the quota-per-file-system approach seems inconvenient when you get past a handful of file systems. Unless I'm really missing something, it just seems like a nightmare to have to deal with such a ridiculous number of file systems.

Why? What additional per-filesystem overhead from a maintenance perspective are you seeing?

Casper
Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig
Brian H. Nelson wrote:
> IMO, the quota-per-file-system approach seems inconvenient when you get past a handful of file systems. Unless I'm really missing something, it just seems like a nightmare to have to deal with such a ridiculous number of file systems.

Seconded -- is there any chance BSD-style quotas will be implemented in ZFS? I notice there's an RFE: http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6501037

Jim
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Fri, Jan 26, 2007 at 09:33:40AM -0800, Akhilesh Mritunjai wrote:
> ZFS Rule #0: You gotta have redundancy. ZFS Rule #1: Redundancy shall be managed by zfs, and zfs alone. Whatever you have, junk it. Let ZFS manage mirroring and redundancy. ZFS doesn't forgive even single-bit errors!

How does this work in an environment with storage that's centrally managed and shared between many servers? I'm putting together a new IMAP server that will eventually use 3TB of space from our Netapp via an iSCSI SAN. The Netapp provides all of the disk management and redundancy that I'll ever need. The server will only see a virtual disk (a LUN). I want to use ZFS on that LUN because it's superior to UFS in this application, even without the redundancy. There's no way to get the Netapp to behave like a JBOD. Are you saying that this configuration isn't going to work?

--
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig
[EMAIL PROTECTED] wrote:
> Why? What additional per-filesystem overhead from a maintenance perspective are you seeing?

The obvious example would be /var/mail. UFS quotas are easy. Doing the same thing with ZFS would be (I think) impossible. You would have to completely convert an existing system to a maildir or home-directory mail storage setup.

Other filesystem-specific software could also have issues. Networker, for instance, does backups per filesystem. In that situation I could then possibly have ~3000 backup sets DAILY for a single machine (worst case, if each file system has changes). Granted, that may not be better or worse, just 'different' and not what I'm used to. On the other hand, I could certainly see where that could add a ton of overhead to backup processing.

Don't get me wrong, zfs quotas are a good thing, and could certainly be useful in many situations. I just don't think I agree that they are a one-to-one replacement for ufs quotas in terms of usability in all situations.

-Brian

--
---
Brian H. Nelson Youngstown State University
System Administrator Media and Academic Computing
bnelson[at]cis.ysu.edu
---
Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig
[EMAIL PROTECTED] wrote on 01/26/2007 12:20:17 PM:
> Don't get me wrong, zfs quotas are a good thing, and could certainly be useful in many situations. I just don't think I agree that they are a one-to-one replacement for ufs quotas in terms of usability in all situations.

Yes, there is an RFE out there for this that has been dispatched. In many cases the zfs quotas work very well and are actually a godsend (after getting over the initial shock of seeing screens of df output), but they fail to cover the case where a filesystem or directory tree must be shared by multiple users and each user needs limits on the disk space they may use -- think department folders, or your example of mail.

The RFE does not go into details about HOW this would be done when implemented. User-level quotas don't need to exactly match ufs quotas -- they can be rethunk for zfs. Are zfs-style user quotas: per zfs fs? Per zfs fs and all of its children (recursive)? Affected by snapshot data usage? Applied to lists of filesystems and summed (username:100,000:/tank/home/username;/tank/departments/usersdepartment # allow 100,000 bytes to be used in total between these two unrelated filesystems by this user)?

I have faith that user quotas are going to come sometime; these "how" questions are interesting to me...

-Wade Stuart
[zfs-discuss] A little different look at filesystems ... Just looking for ideas
Here's something I've been noodling around for a while. I'd like to run this by some of you in this forum and see what you think. If I'm off topic, I apologize.

ZFS gives large companies the ability to have huge amounts of data available to the desktop user. Moving the user data from a locally installed system to a ZFS-based server makes a huge amount of sense, not only saving money in the base configuration of the desktop, but also in maintenance, backups, etc. It makes mobility within the company easier, as they can just access the files from anywhere IN the company. That's wonderful.

Only there is one small problem. Many companies that are having major issues with mobility are giving more and more employees laptops. Some of the data they need can be gotten off the company's portal, but it still requires the OS and applications to be installed locally, and user data to be on the local disk as well. As more and more laptops are purchased, the issues simply multiply, and the company now has the same maintenance issues they started with!

What if something like the old CacheFS was revived, using ZFS as the base file system instead of UFS? Using the ZFS filesystems on servers as the master systems, the laptop builds a cache of files used in the last month or so. These could be applications or user data; it would not matter. If the system was disconnected from the network, say on an airplane, the data and applications would still be available. Using the copy-on-write method in ZFS, the local cache of user data would then update the master when connected back to the server. If anything happened to the system, the only files actually lost would be what was done since the last update to the master.

This model could be used at the desktop as well. It would effectively reduce the bandwidth needed for NFS-mounted clients, and could handle far more clients than without a cache. In the end, the only thing different between a laptop and a desktop might be the size of the cache. If there was a problem, simply clean the cache out and start over. This also would commonize the maintenance models between laptop and desktop.

Could this be a good thing, or am I way off base???

Gary A. Ross
Network Operations Architect, Ford Motor Company
[EMAIL PROTECTED] Phone: (313) 390-4313
Re: [zfs-discuss] multihosted ZFS
[EMAIL PROTECTED] said:
> . . . realize that the pool is now in use by the other host. That leads to two systems using the same zpool which is not nice. Is there any solution to this problem, or do I have to get Sun Cluster 3.2 if I want to serve the same zpools from many hosts? We may try Sun Cluster anyway, but I'd like to know if this can be solved without it.

Perhaps I'm stating the obvious, but here goes: You could use SAN zoning of the affected LUNs to keep multiple hosts from seeing the zpool. When failover time comes, you change the zoning to make the LUNs visible to the new host, then import. When the old host reboots, it won't find any zpool. Better safe than sorry.

Regards, Marion
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Jan 26, 2007, at 9:42, Gary Mills wrote:
> How does this work in an environment with storage that's centrally managed and shared between many servers? [...] I want to use ZFS on that LUN because it's superior to UFS in this application, even without the redundancy. There's no way to get the Netapp to behave like a JBOD. Are you saying that this configuration isn't going to work?

It will work, but if the storage system corrupts the data, ZFS will be unable to correct it. It will detect the error.

A number that I've been quoting, albeit without a good reference, comes from Jim Gray, who has been around the data-management industry for longer than I have (and I've been in this business since 1970); he's currently at Microsoft. Jim says that the controller/drive subsystem writes data to the wrong sector of the drive without notice about once per drive per year. In a 400-drive array, that's once a day. ZFS will detect this error when the file is read (one of the blocks' checksums will not match), but it can only correct the error if it manages the redundancy.

I would suggest exporting two LUNs from your central storage and letting ZFS mirror them. You can get a wider range of space/performance tradeoffs if you give ZFS a JBOD, but that doesn't sound like an option.

--Ed
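Ed's suggestion, sketched (LUN device names are illustrative):

  # zpool create tank mirror c4t0d0 c4t1d0   # two array LUNs, mirrored by ZFS so it can self-heal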
Re: [zfs-discuss] A little different look at filesystems ... Just looking for ideas
On Jan 26, 2007, at 10:57, Ross, Gary (G.A.) wrote:
> ... What if something like the old CacheFS was revived, using ZFS as the base file system instead of UFS? ... Could this be a good thing, or am I way off base???

Disconnected operation is a hard problem. One of the better research efforts in that area was CODA, at CMU. CODA was, as I recall, an extension to AFS, but it's probably reasonable to take some of those ideas and marry them with ZFS. CODA is now open-source; at least the BSDs have it.

--Ed
Re: [zfs-discuss] A little different look at filesystems ... Just looking for ideas
On Fri, Jan 26, 2007 at 11:11:13AM -0800, Ed Gould wrote:
> Disconnected operation is a hard problem. One of the better research efforts in that area was CODA, at CMU. [...] CODA is now open-source; at least the BSDs have it.

It's funny you should mention CODA. I've just recently started looking at it as a way to get davfs mounting support onto Solaris. It's not been easy. The CODA Solaris kernel module is several years old and looks like it hasn't been touched in at least 2 years. It does not cleanly build on snv_50. CODA itself has issues as well.

CODA certainly looks like an interesting option, as it makes it very easy to support filesystems under Solaris (we *still* lack smbfs, for pete's sake). It seems like a lot of work is going to be required to make it useful, however. NetBSD 3.1 is currently getting installed on my Ghetto Laptop, at which point I will start playing with CODA. If I like what I see, I'll probably look into spending some time trying to at least get the kernel module working.

-brian
--
The reason I don't use Gnome: every single other window manager I know of is very powerfully extensible, where you can switch actions to different mouse buttons. Guess which one is not, because it might confuse the poor users? Here's a hint: it's not the small and fast one. --Linus
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:
> It will work, but if the storage system corrupts the data, ZFS will be unable to correct it. It will detect the error. [...] ZFS will detect this error when the file is read (one of the blocks' checksums will not match), but it can only correct the error if it manages the redundancy.

Our Netapp does double-parity RAID. In fact, the filesystem design is remarkably similar to that of ZFS. Wouldn't that also detect the error? I suppose it depends on whether the 'wrong sector without notice' error is repeated each time. Or is it random?

> I would suggest exporting two LUNs from your central storage and letting ZFS mirror them. You can get a wider range of space/performance tradeoffs if you give ZFS a JBOD, but that doesn't sound like an option.

That would double the amount of disk that we'd require. I am actually planning on using two iSCSI LUNs and letting ZFS stripe across them. When we need to expand the ZFS pool, I'd like to just expand the two LUNs on the Netapp. If ZFS won't accommodate that, I can just add a couple more LUNs. This is all convenient and easily manageable.

--
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
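Gary's growth plan, sketched (pool and LUN names are illustrative):

  # zpool create tank c5t0d0 c5t1d0    # stripe across two iSCSI LUNs
  # zpool add tank c5t2d0 c5t3d0       # later: grow the pool by adding another pair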
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
Gary Mills wrote:
> Our Netapp does double-parity RAID. In fact, the filesystem design is remarkably similar to that of ZFS. Wouldn't that also detect the error? I suppose it depends on whether the 'wrong sector without notice' error is repeated each time. Or is it random?

The quote from Jim seems to be related to the leaves of the tree (disks). Anecdotally, now that we have ZFS at the trunk, we're seeing that the branches are also corrupting data. We've speculated that it would occur, but now we can measure it, and it is non-zero. See Anantha's post for one such anecdote.

We're having a debate related to this; data would be appreciated :-) Do you get small, random read performance equivalent to N-2 spindles for an N-way double-parity volume?

> That would double the amount of disk that we'd require. I am actually planning on using two iSCSI LUNs and letting ZFS stripe across them. When we need to expand the ZFS pool, I'd like to just expand the two LUNs on the Netapp. If ZFS won't accommodate that, I can just add a couple more LUNs. This is all convenient and easily manageable.

Sounds reasonable to me :-)
-- richard
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
Gary Mills wrote:
> Our Netapp does double-parity RAID. In fact, the filesystem design is remarkably similar to that of ZFS. Wouldn't that also detect the error? I suppose it depends on whether the 'wrong sector without notice' error is repeated each time.

If the wrong block is written by the controller, then you're out of luck. The filesystem would read the incorrect block and ... who knows. That's why the ZFS checksums are important.
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
[EMAIL PROTECTED] wrote on 01/26/2007 01:43:35 PM:
> Our Netapp does double-parity RAID. In fact, the filesystem design is remarkably similar to that of ZFS. Wouldn't that also detect the error? I suppose it depends on whether the 'wrong sector without notice' error is repeated each time. Or is it random?

I do not know; WAFL and other portions of NetApp backends are never really described in much technical detail -- even getting real IOPS numbers from them seems to be a hassle; much magic, little meat. To me, zfs has very well-defined behavior and methodology (you can even read the source to verify specifics), and this allows you to _know_ what the weak points are. NetApp, EMC and other disk vendors may have financial incentives for allowing edge cases such as the write hole or bit rot (x errors per disk are acceptable losses; after x errors, consider replacing the disk on a cost/benefit analysis -- will customers actually know a bit is flipped?). In EMC's case it is very common for a disk to have multiple read/write errors before EMC will swap out the disk; they even use a substantial portion of the disk for replacement sectors and parity bits (outside of RAID), so they offset or postpone the replacement costs on the customer.

The most detailed description of WAFL I was able to find last time I looked was: http://www.netapp.com/library/tr/3002.pdf

> That would double the amount of disk that we'd require. I am actually planning on using two iSCSI LUNs and letting ZFS stripe across them. [...]

If you do have bit errors coming from the Netapp, zfs will find them but will not be able to correct them in this case.
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Jan 26, 2007, at 12:13, Richard Elling wrote: On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote: A number that I've been quoting, albeit without a good reference, comes from Jim Gray, who has been around the data-management industry for longer than I have (and I've been in this business since 1970); he's currently at Microsoft. Jim says that the controller/drive subsystem writes data to the wrong sector of the drive without notice about once per drive per year. In a 400-drive array, that's once a day. ZFS will detect this error when the file is read (one of the blocks' checksum will not match). But it can only correct the error if it manages the redundancy. The quote from Jim seems to be related to the leaves of the tree (disks). Anecdotally, now that we have ZFS at the trunk, we're seeing that the branches are also corrupting data. We've speculated that it would occur, but now we can measure it, and it is non-zero. See Anantha's post for one such anecdote. Actually, Jim was referring to everything but the trunk. He didn't specify where from the HBA to the drive the error actually occurs. I don't think it really matters. I saw him give a talk a few years ago at the Usenix FAST conference; that's where I got this information. --Ed ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig
On 1/26/07, Darren J Moffat [EMAIL PROTECTED] wrote: Brian H. Nelson wrote: For several reasons we currently need to stay on UFS and can't switch to ZFS proper. So instead we have opted to do UFS on a zvol using raid-z, Can you state what those reasons are please ? I know that isn't answering the question you are asking but it is worth making sure you have the correct info. I'd also like to understand why UFS works for you but ZFS as a filesystem does not. Samba does not currently support ZFS ACLs. This thread caught my eye as I just recently considered a similar solution. Support is being worked on though, apparently, so I can wait: http://lists.samba.org/archive/samba-technical/2007-January/051123.html -- Eric Enright ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
Ed Gould wrote: On Jan 26, 2007, at 12:13, Richard Elling wrote: On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote: A number that I've been quoting, albeit without a good reference, comes from Jim Gray, who has been around the data-management industry for longer than I have (and I've been in this business since 1970); he's currently at Microsoft. Jim says that the controller/drive subsystem writes data to the wrong sector of the drive without notice about once per drive per year. In a 400-drive array, that's once a day. ZFS will detect this error when the file is read (one of the blocks' checksum will not match). But it can only correct the error if it manages the redundancy. Actually, Jim was referring to everything but the trunk. He didn't specify where from the HBA to the drive the error actually occurs. I don't think it really matters. I saw him give a talk a few years ago at the Usenix FAST conference; that's where I got this information. So this leaves me wondering how often the controller/drive subsystem reads data from the wrong sector of the drive without notice; is it symmetrical with respect to writing, and thus about once a drive/year, or are there factors which change this? Dana ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
Dana H. Myers wrote: Ed Gould wrote: On Jan 26, 2007, at 12:13, Richard Elling wrote: On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote: A number that I've been quoting, albeit without a good reference, comes from Jim Gray, who has been around the data-management industry for longer than I have (and I've been in this business since 1970); he's currently at Microsoft. Jim says that the controller/drive subsystem writes data to the wrong sector of the drive without notice about once per drive per year. In a 400-drive array, that's once a day. ZFS will detect this error when the file is read (one of the blocks' checksum will not match). But it can only correct the error if it manages the redundancy. Actually, Jim was referring to everything but the trunk. He didn't specify where from the HBA to the drive the error actually occurs. I don't think it really matters. I saw him give a talk a few years ago at the Usenix FAST conference; that's where I got this information. So this leaves me wondering how often the controller/drive subsystem reads data from the wrong sector of the drive without notice; is it symmetrical with respect to writing, and thus about once a drive/year, or are there factors which change this? It's not symmetrical. Oftentimes it's a firmware bug. Other times a spurious event causes one block to be read/written instead of another one. (Alpha particles, anyone?) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Jan 26, 2007, at 12:52, Dana H. Myers wrote: So this leaves me wondering how often the controller/drive subsystem reads data from the wrong sector of the drive without notice; is it symmetrical with respect to writing, and thus about once a drive/year, or are there factors which change this? My guess is that it would be symmetric, but I don't really know. --Ed ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: How much do we really want zpool remove?
So, if I was an enterprise, I'd be willing to keep enough empty LUNs available to facilitate at least the migration of one or more filesystems if not complete pools. You might be, but don't be surprised when the Financials folks laugh you out of their office. Large corporations do not make money by leaving wads of cash lying around, and that's exactly what a few terabytes of unused storage in a high-end SAN is. This is in addition to the laughter generated by the comment that, not a big deal if the Financials and HR databases are offline for three days while we do the migration. Good luck writing up a business case that justifies this sort of fiscal generosity. Sorry, this argument smacks a little too much of being out of touch with the fiscal (and time) restrictions of working in a typical corporation, as opposed to a well-funded research group. I hope I'm not sounding rude, but those of us working in medium to large corporations simply do not have the money for such luxuries. Period. Rainer This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
Torrey McMahon wrote: Dana H. Myers wrote: Ed Gould wrote: On Jan 26, 2007, at 12:13, Richard Elling wrote: On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote: A number that I've been quoting, albeit without a good reference, comes from Jim Gray, who has been around the data-management industry for longer than I have (and I've been in this business since 1970); he's currently at Microsoft. Jim says that the controller/drive subsystem writes data to the wrong sector of the drive without notice about once per drive per year. In a 400-drive array, that's once a day. ZFS will detect this error when the file is read (one of the blocks' checksum will not match). But it can only correct the error if it manages the redundancy. Actually, Jim was referring to everything but the trunk. He didn't specify where from the HBA to the drive the error actually occurs. I don't think it really matters. I saw him give a talk a few years ago at the Usenix FAST conference; that's where I got this information. So this leaves me wondering how often the controller/drive subsystem reads data from the wrong sector of the drive without notice; is it symmetrical with respect to writing, and thus about once a drive/year, or are there factors which change this? It's not symmetrical. Oftentimes it's a firmware bug. Other times a spurious event causes one block to be read/written instead of another one. (Alpha particles, anyone?) I would tend to expect these spurious events to impact read and write equally; more specifically, the chance of any one read or write being mis-addressed is about the same. Since, AFAIK, there are many more reads from a disk typically than writes, this would seem to suggest that there would be more mis-addressed reads in a drive/year than mis-addressed writes. Is this the reason for the asymmetry? (I'm surely waving my hands here) Dana ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig
Eric Enright wrote: Samba does not currently support ZFS ACLs. Yes, but this just means you can't get/set your ACLs from a CIFS client. ACLs will be enforced just fine once set locally on the server; you may also be able to get/set them from an NFS client. You may know this, but I know some are confused by this and think you lose ACL protection. Rob T ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
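For anyone trying this, a small sketch of setting an ACL locally on the Solaris server (the user and path here are made up); the NFSv4-style ACE syntax of chmod(1) works on ZFS even though Samba can't manage the ACLs from the CIFS side:

    # Grant user alice read and write on one file, server-side
    chmod A+user:alice:read_data/write_data:allow /tank/share/report.doc

    # ls -v prints the resulting ACL, one ACE per line
    ls -v /tank/share/report.doc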
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Jan 26, 2007, at 13:16, Dana H. Myers wrote: I would tend to expect these spurious events to impact read and write equally; more specifically, the chance of any one read or write being mis-addressed is about the same. Since, AFAIK, there are many more reads from a disk typically than writes, this would seem to suggest that there would be more mis-addressed reads in a drive/year than mis-addressed writes. Is this the reason for the asymmetry? Jim's once per drive per year number was not very precise. I took it to be just one significant digit. I don't recall if he distinguished reads from writes. --Ed ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
It would be good to have real data and not only guesses or anecdotes. This story about wrong blocks being written by RAID controllers sounds like the anti-terrorism propaganda we are living in: exaggerate the facts to catch everyone's attention. It's going to take more than that to prove RAID controllers have been doing a bad job for the last 30 years. Let's get real stories with hard facts first. s. On 1/26/07, Ed Gould [EMAIL PROTECTED] wrote: On Jan 26, 2007, at 12:52, Dana H. Myers wrote: So this leaves me wondering how often the controller/drive subsystem reads data from the wrong sector of the drive without notice; is it symmetrical with respect to writing, and thus about once a drive/year, or are there factors which change this? My guess is that it would be symmetric, but I don't really know. --Ed ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Jan 26, 2007, at 13:29, Selim Daoud wrote: it would be good to have real data and not only guess ot anecdots Yes, I agree. I'm sorry I don't have the data that Jim presented at FAST, but he did present actual data. Richard Elling (I believe it was Richard) has also posted some related data from ZFS experience to this list. There is more than just anecdotal evidence for this. --Ed ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
Rainer Heilke wrote: So, if I was an enterprise, I'd be willing to keep enough empty LUNs available to facilitate at least the migration of one or more filesystems if not complete pools. You might be, but don't be surprised when the Financials folks laugh you out of their office. Large corporations do not make money by leaving wads of cash lying around, and that's exactly what a few terabytes of unused storage in a high-end SAN is. This is in addition to the laughter generated by the comment that, not a big deal if the Financials and HR databases are offline for three days while we do the migration. Good luck writing up a business case that justifies this sort of fiscal generosity. To be fair, you can replace vdevs with same-sized or larger vdevs online. The issue is that you cannot replace with smaller vdevs nor can you eliminate vdevs. In other words, I can migrate data around without downtime, I just can't shrink or eliminate vdevs without send/recv. This is where the philosophical disconnect lies. Every time we descend into this rathole, we stir up more confusion :-( If you consider your pool of storage as a zpool, then the management of subparts of the pool is done at the file system level. This concept is different than other combinations of devices and file systems such as SVM+UFS. When answering the ZFS shrink question, you need to make sure you're not applying the old concepts to the new model. Personally, I've never been in the situation where users ask for less storage, but maybe I'm just the odd guy out? ;-) Others have offered cases where a shrink or vdev restructuring could be useful. But I still see some confusion with file system management (including zvols) and device management. The shrink feature is primarily at the device management level. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
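A sketch of the online replacement Richard describes, with hypothetical device names; zpool replace attaches the new LUN, resilvers onto it, and detaches the old one, all while the pool stays in service:

    # Swap one vdev of pool 'tank' for a same-sized or larger LUN
    zpool replace tank c2t0d0 c3t0d0

    # Watch the resilver finish before retiring the old LUN
    zpool status tank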
RE: [zfs-discuss] Re: ZFS or UFS - what to do?
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ed Gould Sent: Friday, January 26, 2007 3:38 PM Yes, I agree. I'm sorry I don't have the data that Jim presented at FAST, but he did present actual data. Richard Elling (I believe it was Richard) has also posted some related data from ZFS experience to this list. This seems to be from Jim and on point: http://www.usenix.org/event/fast05/tech/gray.pdf paul ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] UFS on zvol: volblocksize and maxcontig
On 1/26/07, Robert Thurlow [EMAIL PROTECTED] wrote: Eric Enright wrote: Samba does not currently support ZFS ACLs. Yes, but this just means you can't get/set your ACLs from a CIFS client. ACLs will be enforced just fine once set locally on the server; you may also be able to get/set them from an NFS client. You may know this, but I know some are confused by this and think you lose ACL protection. Quite right. Getting them set poses a problem for my specific case, unfortunately. -- Eric Enright ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] bug id 6343667
On Jan 26, 2007, at 6:02 AM, Robert Milkowski wrote: Hello zfs-discuss, Is anyone working on that bug? Any progress? For bug: 6343667 scrub/resilver has to start over when a snapshot is taken I believe that is on Matt and Mark's radar, and they have made some progress. eric ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
Dana H. Myers wrote: Torrey McMahon wrote: Dana H. Myers wrote: Ed Gould wrote: On Jan 26, 2007, at 12:13, Richard Elling wrote: On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote: A number that I've been quoting, albeit without a good reference, comes from Jim Gray, who has been around the data-management industry for longer than I have (and I've been in this business since 1970); he's currently at Microsoft. Jim says that the controller/drive subsystem writes data to the wrong sector of the drive without notice about once per drive per year. In a 400-drive array, that's once a day. ZFS will detect this error when the file is read (one of the blocks' checksum will not match). But it can only correct the error if it manages the redundancy. Actually, Jim was referring to everything but the trunk. He didn't specify where from the HBA to the drive the error actually occurs. I don't think it really matters. I saw him give a talk a few years ago at the Usenix FAST conference; that's where I got this information. So this leaves me wondering how often the controller/drive subsystem reads data from the wrong sector of the drive without notice; is it symmetrical with respect to writing, and thus about once a drive/year, or are there factors which change this? It's not symmetrical. Oftentimes it's a firmware bug. Other times a spurious event causes one block to be read/written instead of another one. (Alpha particles, anyone?) I would tend to expect these spurious events to impact read and write equally; more specifically, the chance of any one read or write being mis-addressed is about the same. Since, AFAIK, there are many more reads from a disk typically than writes, this would seem to suggest that there would be more mis-addressed reads in a drive/year than mis-addressed writes. Is this the reason for the asymmetry? (I'm surely waving my hands here) For the spurious events, yes, I would expect reads and writes to be impacted symmetrically. That is, if you could figure out what spurious event occurred. In most cases the spurious errors are caught only at read time and you're left wondering. Was it an incorrect read? Was the data written incorrectly? You end up throwing your hands up and saying, Let's hope that doesn't happen again. It's much easier to unearth a firmware bug in a particular disk drive operating in certain conditions and fix it. Now that we're checksumming things I'd expect to find more errors, and hopefully be in a condition to fix them, than we have in the past. We will also start getting customer complaints like, We moved to ZFS and now we are seeing media errors more often. Why is ZFS broken? This is similar to the StorADE issues we had in NWS - Ahhh, the good old days - when we started doing a much better job discovering issues and reporting them when in the past we were blissfully silent. We used to have some data on that with nice graphs but I can't find them lying about. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS or UFS - what to do?
Hi Jeff, We're running a FLX210 which I believe is an Engenio 2884. In our case it also is attached to a T2000. ZFS has run VERY stably for us with no data integrity issues at all. We did have a significant latency problem caused by ZFS flushing the write cache on the array after every write, but that can be fixed by configuring your array to ignore cache flushes. The instructions for Engenio products are here: http://blogs.digitar.com/jjww/?itemid=44 We use the config for a production database, so I can't speak to the NFS issues. All I would mention is to watch the RAM consumption by ZFS. Does anyone on the list have a recommendation for ARC sizing with NFS? Best Regards, Jason On 1/26/07, Jeffery Malloch [EMAIL PROTECTED] wrote: Hi Folks, I am currently in the midst of setting up a completely new file server using a pretty well loaded Sun T2000 (8x1GHz, 16GB RAM) connected to an Engenio 6994 product (I work for LSI Logic so Engenio is a no brainer). I have configured a couple of zpools from Volume groups on the Engenio box - 1x2.5TB and 1x3.75TB. I then created sub zfs systems below that and set quotas and sharenfs'd them so that it appears that these file systems are dynamically shrinkable and growable. It looks very good... I can see the correct file system sizes on all types of machines (Linux 32/64bit and of course Solaris boxes) and if I resize the quota it's picked up in NFS right away. But I would be the first in our organization to use this in an enterprise system so I definitely have some concerns that I'm hoping someone here can address. 1. How stable is ZFS? The Engenio box is completely configured for RAID5 with hot spares and write cache (8GB) has battery backup so I'm not too concerned from a hardware side. I'm looking for an idea of how stable ZFS itself is in terms of corruptability, uptime and OS stability. 2. Recommended config. Above, I have a fairly simple setup. In many of the examples the granularity is home directory level and when you have many many users that could get to be a bit of a nightmare administratively. I am really only looking for high level dynamic size adjustability and am not interested in its built in RAID features. But given that, any real world recommendations? 3. Caveats? Anything I'm missing that isn't in the docs that could turn into a BIG gotchya? 4. Since all data access is via NFS we are concerned that 32 bit systems (Mainly Linux and Windows via Samba) will not be able to access all the data areas of a 2TB+ zpool even if the zfs quota on a particular share is less then that. Can anyone comment? The bottom line is that with anything new there is cause for concern. Especially if it hasn't been tested within our organization. But the convenience/functionality factors are way too hard to ignore. Thanks, Jeff This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
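For reference, the quota-and-share scheme Jeffery describes reduces to something like the following sketch (pool and filesystem names are hypothetical). Since the quota is just a filesystem property, resizing a share is one command, and NFS clients see the new size immediately:

    # One filesystem per share, sized by quota instead of partitioning
    zfs create tank/eng
    zfs set quota=500g tank/eng
    zfs set sharenfs=rw tank/eng

    # Growing (or shrinking) the share later is a property change
    zfs set quota=750g tank/eng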
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
On Fri, 26 Jan 2007, Rainer Heilke wrote: So, if I was an enterprise, I'd be willing to keep enough empty LUNs available to facilitate at least the migration of one or more filesystems if not complete pools. reformatted ... You might be, but don't be surprised when the Financials folks laugh you out of their office. Large corporations do not make money by leaving wads of cash lying around, and that's exactly what a few terabytes of unused storage in a high-end SAN is. This is in addition to the laughter But this is exactly where ZFS disrupts large corporations' thinking. You're talking about (for example) 2 terabytes on a high-end SAN which costs (what ?) per GB (including the capital cost of the hi-end SAN) versus a dual Opteron box with 12 * 500Gb SATA disk drives that gives you 5TB of storage for (in round numbers) a total of ~ $6k. And how much are your ongoing monthlies on that hi-end SAN box? (Don't answer) So - aside from the occasional use of the box for data migration, this ZFS storage box has 1,001 other uses. Pick any two (uses), based on your knowledge of big corporation thinking, and it's an easy sell to management. Now your accounting folks are going to be asking you to justify the purchase of that hi-end SAN box and why you're not using ZFS everywhere. :) Oh - and the accounting folks love it when you tell them there's no ongoing cost of ownership - because Joe Screwdriver can swap out a failed Seagate 500Gb SATA drive after he picks up a replacement from Frys on his lunch break! generated by the comment that, not a big deal if the Financials and HR databases are offline for three days while we do the migration. Good Again - sounds like more legacy thinking. With multiple gigabit ethernet connections you can move terabytes of information in an hour, instead of in 24 hours - using legacy tape systems etc. This can be easily handled during scheduled downtime. luck writing up a business case that justifies this sort of fiscal generosity. Sorry, this argument smacks a little too much of being out of touch with the fiscal (and time) restrictions of working in a typical corporation, as opposed to a well-funded research group. I hope I'm not sounding rude, but those of us working in medium to large corporations simply do not have the money for such luxuries. Period. On the contrary - if you're not thinking ZFS, you're wasting a ton of IT $s and hurting the competitiveness of your business. Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005 OpenSolaris Governing Board (OGB) Member - Feb 2006 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
To be fair, you can replace vdevs with same-sized or larger vdevs online. The issue is that you cannot replace with smaller vdevs nor can you eliminate vdevs. In other words, I can migrate data around without downtime, I just can't shrink or eliminate vdevs without send/recv. This is where the philosophical disconnect lies. Everytime we descend into this rathole, we stir up more confusion :-( We did just this to move off RAID-5 LUNs that were the vdevs for a pool, to RAID-10 LUNs. Worked very well, and as Richard said was done all on-line. Doesn't really address the shrinking issue though. :-) Best Regards, Jason ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] multihosted ZFS
You could use SAN zoning of the affected LUN's to keep multiple hosts from seeing the zpool. When failover time comes, you change the zoning to make the LUN's visible to the new host, then import. When the old host reboots, it won't find any zpool. Better safe than sorry. Or change the LUN masking on the array. Depending on your switch that can be less disruptive, and depending on your storage array it might be able to be scripted. Best Regards, Jason ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
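A sketch of that failover sequence, assuming a pool named 'tank' and that the zoning or masking change has already hidden the LUNs from the failed host:

    # On the standby host, once it can see the LUNs:
    zpool import          # lists pools found on this host's devices
    zpool import -f tank  # -f because the crashed host never exported it

    # On the recovered host, never force-import while the standby still
    # owns the pool; re-zone first so its LUNs are not even visible there.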
Re: [zfs-discuss] Project Proposal: Availability Suite
Could the replication engine eventually be integrated more tightly with ZFS? That would be a slick alternative to send/recv. Best Regards, Jason On 1/26/07, Jim Dunham [EMAIL PROTECTED] wrote: Project Overview: I propose the creation of a project on opensolaris.org, to bring to the community two Solaris host-based data services; namely volume snapshot and volume replication. These two data services exist today as the Sun StorageTek Availability Suite, a Solaris 8, 9 & 10 unbundled product set, consisting of Instant Image (II) and Network Data Replicator (SNDR). Project Description: Although Availability Suite is typically known as just two data services (II & SNDR), there is an underlying Solaris I/O filter driver framework which supports these two data services. This framework provides the means to stack one or more block-based, pseudo device drivers on to any pre-provisioned cb_ops structure, [ http://www.opensolaris.org/os/article/2005-03-31_inside_opensolaris__solaris_driver_programming/#datastructs ], thereby shunting all cb_ops I/O into the top of a developed filter driver, (for driver specific processing), then out the bottom of this filter driver, back into the original cb_ops entry points. Availability Suite was developed to interpose itself on the I/O stack of a block device, providing a filter driver framework with the means to intercept any I/O originating from an upstream file system, database or application layer I/O. This framework provided the means for Availability Suite to support snapshot and remote replication data services for UFS, QFS, VxFS, and more recently the ZFS file system, plus various databases like Oracle, Sybase and PostgreSQL, and also application I/Os. By providing a filter driver at this point in the Solaris I/O stack, it allows for any number of data services to be implemented, without regard to the underlying block storage that they will be configured on. Today, as a snapshot and/or replication solution, the framework allows both the source and destination block storage device to not only differ in physical characteristics (DAS, Fibre Channel, iSCSI, etc.), but also logical characteristics such as in RAID type, volume managed storage (i.e., SVM, VxVM), lofi, zvols, even ram disks. Community Involvement: By providing this filter-driver framework, two working filter drivers (II & SNDR), and an extensive collection of supporting software and utilities, it is envisioned that those individuals and companies that adopt OpenSolaris as a viable storage platform, will also utilize and enhance the existing II & SNDR data services, plus have offered to them the means in which to develop their own block-based filter driver(s), further enhancing the use and adoption of OpenSolaris. A very timely example that is very applicable to Availability Suite and the OpenSolaris community, is the recent announcement of the Project Proposal: lofi [ compression & encryption ] - http://www.opensolaris.org/jive/click.jspa?messageID=26841. By leveraging both the Availability Suite and the lofi OpenSolaris projects, it would be highly probable to not only offer compression & encryption to lofi devices (as already proposed), but by collectively leveraging these two projects, creating the means to support file systems, databases and applications, across all block-based storage devices. 
Since Availability Suite has strong technical ties to storage, please look for email discussion for this project at: storage-discuss at opensolaris dot org A complete set of Availability Suite administration guides can be found at: http://docs.sun.com/app/docs?p=coll%2FAVS4.0 Project Lead: Jim Dunham http://www.opensolaris.org/viewProfile.jspa?username=jdunham Availability Suite - New Solaris Storage Group This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote: A number that I've been quoting, albeit without a good reference, comes from Jim Gray, who has been around the data-management industry for longer than I have (and I've been in this business since 1970); he's currently at Microsoft. Jim says that the controller/drive subsystem writes data to the wrong sector of the drive without notice about once per drive per year. In a 400-drive array, that's once a day. ZFS will detect this error when the file is read (one of the blocks' checksum will not match). But it can only correct the error if it manages the redundancy. My only qualification to enter this discussion is that I once wrote a floppy disk format program for minix. I recollect, however, that each sector on the disk is accompanied by a block that contains the sector address and a CRC. In order to write to the wrong sector, both of these items would have to be read incorrectly. Otherwise, the controller would never find the wrong sector. Are we just talking about a CRC failure here? That would be random, but the frequency of CRC errors would depend on the signal quality. -- -Gary Mills--Unix Support--U of M Academic Computing and Networking- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On 26-Jan-07, at 7:29 PM, Selim Daoud wrote: It would be good to have real data and not only guesses or anecdotes. This story about wrong blocks being written by RAID controllers sounds like the anti-terrorism propaganda we are living in: exaggerate the facts to catch everyone's attention. It's going to take more than that to prove RAID controllers have been doing a bad job for the last 30 years. It does happen. Hard numbers are available if you look. This sounds a bit like the RAID expert I bumped into who just couldn't see the paradigm had shifted under him -- the implications of end to end. Let's get real stories with hard facts first. Related links: https://www.gelato.unsw.edu.au/archives/comp-arch/2006-September/003008.html http://www.lockss.org/locksswiki/files/3/30/Eurosys2006.pdf [A Fresh Look at the Reliability of Long-term Digital Storage, 2006] http://www.ecsl.cs.sunysb.edu/tr/rpe19.pdf [Challenges of Long-Term Digital Archiving: A Survey, 2006] http://www.cs.wisc.edu/~vijayan/vijayan-thesis.pdf [IRON File Systems, 2006] http://www.tcs.hut.fi/~hhk/phd/phd_Hannu_H_Kari.pdf [Latent Sector Faults and Reliability of Disk Arrays, 1997] --T On 1/26/07, Ed Gould [EMAIL PROTECTED] wrote: On Jan 26, 2007, at 12:52, Dana H. Myers wrote: So this leaves me wondering how often the controller/drive subsystem reads data from the wrong sector of the drive without notice; is it symmetrical with respect to writing, and thus about once a drive/year, or are there factors which change this? My guess is that it would be symmetric, but I don't really know. --Ed ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
Oh - and the accounting folks love it when you tell them there's no ongoing cost of ownership - because Joe Screwdriver can swap out a failed Seagate 500Gb SATA drive after he picks up a replacement from Frys on his lunch break! Why do people think this will work? I never could figure it out. There's many a slip 'twixt cup and lip. You need the spare already sitting there. --T ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
My only qualification to enter this discussion is that I once wrote a floppy disk format program for minix. I recollect, however, that each sector on the disk is accompanied by a block that contains the sector address and a CRC. You'd have to define the layer you're talking about. I presume something like this occurs between a dumb disk and an intelligent controller, or even within the encoding parameters of a disk, but I don't think it does between say a SCSI/FC controller and a disk. So if the drive itself put the head in the wrong sector, maybe it could figure that out. But perhaps the scsi controller had a bug and sent the wrong address to the drive. I don't think there's anything at that layer that would notice (unless the application/file system is encoding intent into the data). Corrections about my assumption with SCSI/FC/ATA appreciated. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOS http://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Project Proposal: Availability Suite
Jason J. W. Williams wrote: Could the replication engine eventually be integrated more tightly with ZFS? Not in its present form. The architecture and implementation of Availability Suite is driven off block-based replication at the device level (/dev/rdsk/...), something that allows the product to replicate any Solaris file system, database, etc., without any knowledge of what it is actually replicating. To pursue ZFS replication in the manner of Availability Suite, one needs to see what replication looks like from an abstract point of view. So simplistically, remote replication is like the letter 'h', where the left side of the letter is the complete I/O path on the primary node, the horizontal part of the letter is the remote replication network link, and the right side of the letter is only the bottom half of the complete I/O path on the secondary node. Next ZFS would have to have its functional I/O path split into two halves, a top and bottom piece. Next we configure replication, the letter 'h', between two given nodes, running both a top and bottom piece of ZFS on the source node, and just the bottom half of ZFS on the secondary node. Today, the SNDR component of Availability Suite works like the letter 'h', where we split the Solaris I/O stack into a top and bottom half. The top half is that software (file system, database or application I/O) that directs its I/Os to the bottom half (raw device, volume manager or block device). So all that needs to be done is to design and build a new variant of the letter 'h', and find the place to separate ZFS into two pieces. - Jim Dunham That would be a slick alternative to send/recv. Best Regards, Jason On 1/26/07, Jim Dunham [EMAIL PROTECTED] wrote: Project Overview: I propose the creation of a project on opensolaris.org, to bring to the community two Solaris host-based data services; namely volume snapshot and volume replication. These two data services exist today as the Sun StorageTek Availability Suite, a Solaris 8, 9 & 10 unbundled product set, consisting of Instant Image (II) and Network Data Replicator (SNDR). Project Description: Although Availability Suite is typically known as just two data services (II & SNDR), there is an underlying Solaris I/O filter driver framework which supports these two data services. This framework provides the means to stack one or more block-based, pseudo device drivers on to any pre-provisioned cb_ops structure, [ http://www.opensolaris.org/os/article/2005-03-31_inside_opensolaris__solaris_driver_programming/#datastructs ], thereby shunting all cb_ops I/O into the top of a developed filter driver, (for driver specific processing), then out the bottom of this filter driver, back into the original cb_ops entry points. Availability Suite was developed to interpose itself on the I/O stack of a block device, providing a filter driver framework with the means to intercept any I/O originating from an upstream file system, database or application layer I/O. This framework provided the means for Availability Suite to support snapshot and remote replication data services for UFS, QFS, VxFS, and more recently the ZFS file system, plus various databases like Oracle, Sybase and PostgreSQL, and also application I/Os. By providing a filter driver at this point in the Solaris I/O stack, it allows for any number of data services to be implemented, without regard to the underlying block storage that they will be configured on. 
Today, as a snapshot and/or replication solution, the framework allows both the source and destination block storage device to not only differ in physical characteristics (DAS, Fibre Channel, iSCSI, etc.), but also logical characteristics such as in RAID type, volume managed storage (i.e., SVM, VxVM), lofi, zvols, even ram disks. Community Involvement: By providing this filter-driver framework, two working filter drivers (II & SNDR), and an extensive collection of supporting software and utilities, it is envisioned that those individuals and companies that adopt OpenSolaris as a viable storage platform, will also utilize and enhance the existing II & SNDR data services, plus have offered to them the means in which to develop their own block-based filter driver(s), further enhancing the use and adoption of OpenSolaris. A very timely example that is very applicable to Availability Suite and the OpenSolaris community, is the recent announcement of the Project Proposal: lofi [ compression & encryption ] - http://www.opensolaris.org/jive/click.jspa?messageID=26841. By leveraging both the Availability Suite and the lofi OpenSolaris projects, it would be highly probable to not only offer compression & encryption to lofi devices (as already proposed), but by collectively leveraging these two projects, creating the means to support file systems, databases and applications, across all block-based storage devices. Since Availability Suite has strong technical ties to storage, please look for email discussion for this project at: storage-discuss at opensolaris dot org
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
On Fri, 26 Jan 2007, Toby Thain wrote: Oh - and the accounting folks love it when you tell them there's no ongoing cost of ownership - because Joe Screwdriver can swap out a failed Seagate 500Gb SATA drive after he picks up a replacement from Frys on his lunch break! Why do people think this will work? I never could figure it out. There's many a slip 'twixt cup and lip. You need the spare already sitting there. Agreed. I remember years ago, when a Sun service tech came onsite at a fortune 100 company I was working in at the time and we stopped him, handed him a disk drive in an anti-static bag and said - don't unpack your tools - it was a bad disk, we replaced it from our spares, here's the bad one - please replace it under the service agreement. He thought about this for about 5 Seconds and said; I wish all my customers were like you guys. Then he was gone! :) Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005 OpenSolaris Governing Board (OGB) Member - Feb 2006 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
On Fri, 26 Jan 2007, Torrey McMahon wrote: Al Hopper wrote: Now your accounting folks are going to be asking you to justify the purchase of that hi-end SAN box and why you're not using ZFS everywhere. :) Oh - and the accounting folks love it when you tell them there's no ongoing cost of ownership - because Joe Screwdriver can swap out a failed Seagate 500Gb SATA drive after he picks up a replacement from Frys on his lunch break! Because ZFS doesn't run everywhere. Because most low end JBODs are low end for a reason. They aren't as reliable, have crappy monitoring, etc. Agreed. There will never be one screwdriver that fits everything. I was simply trying to reinforce my point. Fix those two things when you get a chance. ;) Have a good weekend Torrey (and zfs-discuss). Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005 OpenSolaris Governing Board (OGB) Member - Feb 2006 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
Richard Elling wrote: Personally, I've never been in the situation where users ask for less storage, but maybe I'm just the odd guy out? ;-) You just realized that JoeSysadmin allocated ten LUNs to the zpool when he really only should have allocated one. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?
Al Hopper wrote: On Fri, 26 Jan 2007, Torrey McMahon wrote: Al Hopper wrote: Now your accounting folks are going to be asking you to justify the purchase of that hi-end SAN box and why you're not using ZFS everywhere. :) Oh - and the accounting folks love it when you tell them there's no ongoing cost of ownership - because Joe Screwdriver can swap out a failed Seagate 500Gb SATA drive after he picks up a replacement from Frys on his lunch break! Because ZFS doesn't run everywhere. Because most low end JBODs are low end for a reason. They aren't as reliable, have crappy monitoring, etc. Agreed. There will never be one screwdriver that fits everything. I was simply trying to reinforce my point. It's a good point. We just need to make sure we don't forget that part. People love to pull email threads out of context... or google for that matter. ;) Fix those two things when you get a chance. ;) Have a good weekend Torrey (and zfs-discuss). Same to you Al. (and zfs-discuss). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
Toby Thain wrote: On 26-Jan-07, at 7:29 PM, Selim Daoud wrote: It would be good to have real data and not only guesses or anecdotes. This story about wrong blocks being written by RAID controllers sounds like the anti-terrorism propaganda we are living in: exaggerate the facts to catch everyone's attention. It's going to take more than that to prove RAID controllers have been doing a bad job for the last 30 years. It does happen. Hard numbers are available if you look. This sounds a bit like the RAID expert I bumped into who just couldn't see the paradigm had shifted under him -- the implications of end to end. It happens. As long as we look at the numbers in context and don't run around going, Hey... have you seen these numbers? What have we been doing for the last 35 years!?!?, we're OK. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs rewrite?
What do you guys think about implementing 'zfs/zpool rewrite' command? It'll read every block older than the date when the command was executed and write it again (using the standard ZFS COW mechanism, similar to how resilvering works, but the data is read from the same disk it is written to). #1 How do you control I/O overhead? #2 Snapshot blocks are never rewritten at the moment. Most of your suggestions seem to imply working on the live data, but doing that for snapshots as well might be tricky. 3. I created a file system with a huge amount of data, where most of the data is read-only. I change my server from an intel to a sparc64 machine. Adaptive endianness only changes byte order to native on write, and because the file system is mostly read-only, it'll need to byteswap all the time. And here comes 'zfs rewrite'! It's only the metadata that is modified anyway, not the file data. I would hope that this could be done more easily than a full tree rewrite (and again the issue with snapshots). Also, the overhead there probably isn't going to be very high (since the metadata will be cached in most cases). Other than that, I'm guessing something like this will be necessary to implement disk evacuation/removal. If you have to rewrite data from one disk to elsewhere in the pool, then rewriting the entire tree shouldn't be much harder. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOS http://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs rewrite?
On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote: Hi. What do you guys think about implementing 'zfs/zpool rewrite' command? It'll read every block older than the date when the command was executed and write it again (using the standard ZFS COW mechanism, similar to how resilvering works, but the data is read from the same disk it is written to). I see a few situations where it might be useful: 1. My file system is almost full (or not) and I'd like to enable compression on it. Unfortunately compression will work from now on and I'd also like to compress already stored data. Here comes 'zfs rewrite'! 2. I was a bad boy and turned off checksumming. Now I suspect something corrupts my data and I'd really like to checksum everything. Ok, here comes 'zfs rewrite'! In this case you deserve what you get. 3. I created a file system with a huge amount of data, where most of the data is read-only. I change my server from an intel to a sparc64 machine. Adaptive endianness only changes byte order to native on write, and because the file system is mostly read-only, it'll need to byteswap all the time. And here comes 'zfs rewrite'! Why would this help? (Obviously file data is never 'swapped'). --T 4. Not sure how ZFS traverses the blocks tree; if it is done based on files, it may be used to move data from one file closer together, which will reduce seek times. Because of the way ZFS works, the data may become fragmented and 'zfs rewrite' could be used for defragmentation. 5. Once file system encryption is implemented, this mechanism can be used to encrypt an existing file system and also to change the encryption key. What do you think? -- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
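Until something like 'zfs rewrite' exists, the only userland approximation is to copy files over themselves so that new blocks are allocated under the current property settings, e.g. after enabling compression. This is a crude sketch only (the dataset name is hypothetical): it is not atomic, it briefly doubles each file's space usage, and it cannot touch blocks pinned by snapshots:

    zfs set compression=on tank/data
    # Rewrite top-level plain files only; extend with find(1) as needed
    for f in /tank/data/*; do
        [ -f "$f" ] || continue
        cp -p "$f" "$f.tmp" && mv "$f.tmp" "$f"
    done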
[zfs-discuss] Re: Re: Re: How much do we really want zpool remove?
Richard Elling wrote: Rainer Heilke wrote: So, if I was an enterprise, I'd be willing to keep enough empty LUNs available to facilitate at least the migration of one or more filesystems if not complete pools. You might be, but don't be surprised when the Financials folks laugh you out of their office. Large corporations do not make money by leaving wads of cash lying around, and that's exactly what a few terabytes of unused storage in a high-end SAN is. This is in addition to the laughter generated by the comment that, not a big deal if the Financials and HR databases are offline for three days while we do the migration. Good luck writing up a business case that justifies this sort of fiscal generosity. To be fair, you can replace vdevs with same-sized or larger vdevs online. The issue is that you cannot replace with smaller vdevs nor can you eliminate vdevs. In other words, I can migrate data around without downtime, I just can't shrink or eliminate vdevs without send/recv. This is where the philosophical disconnect lies. Every time we descend into this rathole, we stir up more confusion :-( If you consider your pool of storage as a zpool, then the management of subparts of the pool is done at the file system level. This concept is different than other combinations of devices and file systems such as SVM+UFS. When answering the ZFS shrink question, you need to make sure you're not applying the old concepts to the new model. Personally, I've never been in the situation where users ask for less storage, but maybe I'm just the odd guy out? ;-) Others have offered cases where a shrink or vdev restructuring could be useful. But I still see some confusion with file system management (including zvols) and device management. The shrink feature is primarily at the device management level. -- richard I understand these arguments, and the differences (and that most users will never ask for less storage), but there are many instances where storage needs to move around, even between systems, and in that case, unless a whole zpool of storage is going, how do you do it? You need to give back two LUNs in a 6-LUN zpool. Oh, wait. You can't shrink a zpool. Many people here are giving examples of where this capability is needed. We need to agree that different users' needs vary, and that there are real reasons for this. Rainer This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
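For the give-back-two-LUNs case, the workaround today is migration rather than shrinking; a sketch with hypothetical pool and filesystem names, sending each filesystem individually and destroying the old pool once everything is across:

    # Build the target pool on the four LUNs being kept
    zpool create newtank mirror c3t0d0 c3t1d0 mirror c3t2d0 c3t3d0

    # Copy one filesystem; repeat for each filesystem in the pool
    zfs snapshot tank/fs@migrate
    zfs send tank/fs@migrate | zfs receive newtank/fs

    # When everything is across, retire the old pool, freeing its LUNs
    zpool destroy tank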
[zfs-discuss] Re: Re: Re: How much do we really want zpool remove?
Al Hopper wrote: On Fri, 26 Jan 2007, Rainer Heilke wrote: So, if I was an enterprise, I'd be willing to keep enough empty LUNs available to facilitate at least the migration of one or more filesystems if not complete pools. reformatted ... You might be, but don't be surprised when the Financials folks laugh you out of their office. Large corporations do not make money by leaving wads of cash lying around, and that's exactly what a few terabytes of unused storage in a high-end SAN is. This is in addition to the laughter But this is exactly where ZFS disrupts large corporations' thinking. Yes and no. A corporation has a SAN for reasons that have been valid for years; you won't turn that ship around on a skating rink. You're talking about (for example) 2 terabytes on a high-end SAN which costs (what ?) per GB (including the capital cost of the hi-end SAN) versus a dual Opteron box with 12 * 500Gb SATA disk drives that gives you 5TB of storage for (in round numbers) a total of ~ $6k. And how much are your ongoing monthlies on that hi-end SAN box? (Don't answer) So - aside from the occasional use of the box for data migration, this ZFS storage box has 1,001 other uses. Pick any two (uses), based on your knowledge of big corporation thinking, and it's an easy sell to management. Now your accounting folks are going to be asking you to justify the purchase of that hi-end SAN box and why you're not using ZFS everywhere. :) No, they're going to be asking me why I want to run a $400K server holding all of our inventory and financials data on a cheap piece of storage I picked up at Pa's Pizza Parlor and Computer Parts. There are values (real and imagined, perhaps) that a SAN offers. And, when the rest of the company is running on the SAN, why aren't you? As a side-note, if your company has a mainframe (yes, they still exist!), when will ZFS run on it? We'll need the SAN for a while, yet. generated by the comment that, not a big deal if the Financials and HR databases are offline for three days while we do the migration. Good Again - sounds like more legacy thinking. With multiple gigabit ethernet connections you can move terabytes of information in an hour, instead of in 24 hours - using legacy tape systems etc. This can be easily handled during scheduled downtime. If your company is graced with being able to cost-justify the rip-and-replace of the entire 100Mb network, more power to you. Someone has to pay for all of this, and good luck fobbing it all off on some client. Sorry, this argument smacks a little too much of being out of touch with the fiscal (and time) restrictions of working in a typical corporation, as opposed to a well-funded research group. I hope I'm not sounding rude, but those of us working in medium to large corporations simply do not have the money for such luxuries. Period. On the contrary - if you're not thinking ZFS, you're wasting a ton of IT $s and hurting the competitiveness of your business. But you can't write off the investment of the old gear in six months and move on. I wish life worked like that, but it doesn't. At least, not where I work. :-( Regards, Al Hopper Rainer This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs rewrite?
On January 27, 2007 12:27:17 AM -0200 Toby Thain [EMAIL PROTECTED] wrote:

On 26-Jan-07, at 11:34 PM, Pawel Jakub Dawidek wrote:
"3. I created a file system with a huge amount of data, where most of the data is read-only. I change my server from an intel machine to a sparc64 machine. Adaptive endianness only changes byte order to native on write, and because the file system is mostly read-only, it'll need to byteswap all the time. And here comes 'zfs rewrite'!"

Why would this help? (Obviously file data is never 'swapped'.) Metadata (incl. checksums?) still has to be byte-swapped.

Or would atime updates also force a metadata update? Or am I totally mistaken?

-frank
[zfs-discuss] Re: ZFS or UFS - what to do?
"1. How stable is ZFS?"

It's a new file system; there will be bugs. It appears to be well tested, though. There are a few known issues; for instance, a write failure can panic the system under some circumstances. UFS has known issues too.

"2. Recommended config. Above, I have a fairly simple setup. In many of the examples the granularity is home directory level, and when you have many, many users that could get to be a bit of a nightmare administratively."

Do you need user quotas? If so, you need a file system per user with ZFS. That may be an argument against it in some environments, but in my experience it tends to be more important in academic settings than in corporations.

"4. Since all data access is via NFS we are concerned that 32-bit systems (mainly Linux and Windows via Samba) will not be able to access all the data areas of a 2TB+ zpool even if the zfs quota on a particular share is less than that. Can anyone comment?"

Not a problem. NFS doesn't really deal with volumes, just files, so the offsets are always file-relative and the volume can be as large as desired.

Anton
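A minimal sketch of the file-system-per-user approach, assuming a pool called tank and hypothetical users (names and quota sizes are made up):

  zfs create tank/home
  zfs create tank/home/alice
  zfs create tank/home/bob
  zfs set quota=10g tank/home/alice
  zfs set quota=10g tank/home/bob

Each user gets their own dataset, so each can carry its own quota (and snapshots); the administrative cost is that you now manage one file system per user, which is the nightmare the question alludes to.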
[zfs-discuss] Re: high density SAS
"How badly can you mess up a JBOD? Two words: vibration, cooling."

Three more: power, signal quality. I've seen even individual drive cases with bad enough signal quality to cause bit errors.
Re: [zfs-discuss] zfs rewrite?
On Fri, Jan 26, 2007 at 10:57:19PM -0800, Frank Cusack wrote:
"... Or would atime updates also force a metadata update? Or am I totally mistaken?"

You're all correct. File data is never byte-swapped. Most metadata needs to be byte-swapped, but it's generally only 1-2% of your space, so the overhead shouldn't be significant even if you never rewrite.

An atime update will indeed cause a znode rewrite (unless you run with zfs set atime=off), so znodes will get rewritten by reads.

The only other non-trivial metadata is the indirect blocks. All files up to 128k are stored in a single block: ZFS has a variable blocksize from 512 bytes to 128k, so a 35k file consumes exactly 35k (not, say, 40k as it would with a fixed 8k blocksize). Single-block files have no indirect blocks, and hence no metadata other than the znode. So all that remains is the indirect blocks for files larger than 128k -- which is to say, not very much.

Jeff
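For the read-mostly, endian-migrated case Pawel describes, the knob Jeff mentions looks like this (the dataset name is a placeholder):

  zfs set atime=off tank/readonly   # reads no longer rewrite znodes
  zfs get atime tank/readonly       # confirm the setting

With atime off, the (small) metadata simply keeps getting byte-swapped on each read, which per Jeff's 1-2% estimate is cheap; with atime on, reads gradually rewrite znodes in the native byte order anyway.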