Re: [zfs-discuss] Re: Re: simple Raid-Z question
Eric Haycraft wrote:
> Since no one seems to believe that you can expand a raidz pool, I have attached the following output from Solaris 11/06 showing me doing just that. The first expansion is with like-sized disks, and the second expansion is with larger disks. I realize that the documentation only has examples using mirrors, but raidz and raidz2 are fully supported for adding disk space.

No one has said that you can't increase the size of a zpool. What can't be increased is the size of a RAID-Z vdev (except by increasing the size of all of the components of the RAID-Z). You have created additional RAID-Z vdevs and added them to the pool.

--Ed
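To make the distinction concrete, here is a minimal sketch (the pool name "tank" and the device names are illustrative, not taken from Eric's output): adding a second RAID-Z vdev grows the pool, but the original RAID-Z vdev keeps the same width.

  # zpool create tank raidz c1t0d0 c1t1d0 c1t2d0   (one 3-disk RAID-Z vdev)
  # zpool add tank raidz c2t0d0 c2t1d0 c2t2d0      (pool grows; a second top-level vdev)
  # zpool status tank                              (now lists two raidz vdevs)

The pool's capacity goes up, but neither RAID-Z vdev has changed size or gained a column.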
Re: [zfs-discuss] Re: update on zfs boot support
Richard Elling wrote:
> warning: noun/verb overload. In my context, swap is a verb.

It is also a common shorthand for swap space.

--Ed
Re: [zfs-discuss] Re: multihosted ZFS
On Jan 26, 2007, at 7:17, Peter Eriksson wrote:
> If you _boot_ the original machine then it should see that the pool now is owned by the other host and ignore it (you'd have to do a zpool import -f again I think). Not tested though so don't take my word for it...

Conceptually, that's about right, but in practice it's not quite as simple as that. We had to do a lot of work in Cluster to ensure that the zpool would never be imported on more than one node at a time.

> However if you simply type go and let it continue from where it was then things definitely will not be pretty... :-)

Yes, but that's only one of the bad scenarios.

--Ed
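For reference, a minimal sketch of the forced import being discussed (the pool name "tank" is illustrative):

  host2# zpool import -f tank    (force the import, ignoring the other host's ownership claim)

In the clean case you would run "zpool export tank" on the first host before importing on the second; -f bypasses that safeguard, which is exactly why having the pool imported on two hosts at once is so dangerous.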
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Jan 26, 2007, at 9:42, Gary Mills wrote:
> How does this work in an environment with storage that's centrally managed and shared between many servers? I'm putting together a new IMAP server that will eventually use 3TB of space from our Netapp via an iSCSI SAN. The Netapp provides all of the disk management and redundancy that I'll ever need. The server will only see a virtual disk (a LUN). I want to use ZFS on that LUN because it's superior to UFS in this application, even without the redundancy. There's no way to get the Netapp to behave like a JBOD. Are you saying that this configuration isn't going to work?

It will work, but if the storage system corrupts the data, ZFS will detect the error but will be unable to correct it.

A number that I've been quoting, albeit without a good reference, comes from Jim Gray, who has been around the data-management industry for longer than I have (and I've been in this business since 1970); he's currently at Microsoft. Jim says that the controller/drive subsystem writes data to the wrong sector of the drive without notice about once per drive per year. In a 400-drive array, that's about once a day. ZFS will detect this error when the file is read (one of the blocks' checksums will not match), but it can only correct the error if it manages the redundancy.

I would suggest exporting two LUNs from your central storage and letting ZFS mirror them. You can get a wider range of space/performance tradeoffs if you give ZFS a JBOD, but that doesn't sound like an option.

--Ed
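A minimal sketch of that two-LUN suggestion (the pool name "mail" and the device names are illustrative; the Netapp LUNs appear to Solaris as ordinary disks). With a mirrored pool, ZFS has redundancy it manages itself, so a block whose checksum fails on one side can be repaired from the good copy on the other:

  # zpool create mail mirror c2t0d0 c2t1d0    (two iSCSI LUNs, mirrored by ZFS)
  # zpool scrub mail                          (read and verify every block's checksum)
  # zpool status -v mail                      (report any checksum errors found and repaired)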
Re: [zfs-discuss] A little different look at filesystems ... Just looking for ideas
On Jan 26, 2007, at 10:57, Ross, Gary (G.A.) wrote:
> ... What if something like the old CacheFS was revived, using ZFS as the base file system instead of UFS? ... Could this be a good thing, or am I way off base???

Disconnected operation is a hard problem. One of the better research efforts in that area was CODA, at CMU. CODA was, as I recall, an extension to AFS, but it's probably reasonable to take some of those ideas and marry them with ZFS. CODA is now open-source; at least the BSDs have it.

--Ed
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Jan 26, 2007, at 12:13, Richard Elling wrote:
> On Fri, Jan 26, 2007 at 11:05:17AM -0800, Ed Gould wrote:
>> A number that I've been quoting, albeit without a good reference, comes from Jim Gray, who has been around the data-management industry for longer than I have (and I've been in this business since 1970); he's currently at Microsoft. Jim says that the controller/drive subsystem writes data to the wrong sector of the drive without notice about once per drive per year. In a 400-drive array, that's once a day. ZFS will detect this error when the file is read (one of the blocks' checksums will not match). But it can only correct the error if it manages the redundancy.
>
> The quote from Jim seems to be related to the leaves of the tree (disks). Anecdotally, now that we have ZFS at the trunk, we're seeing that the branches are also corrupting data. We've speculated that it would occur, but now we can measure it, and it is non-zero. See Anantha's post for one such anecdote.

Actually, Jim was referring to everything but the trunk. He didn't specify where between the HBA and the drive the error actually occurs; I don't think it really matters. I saw him give a talk a few years ago at the USENIX FAST conference; that's where I got this information.

--Ed
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Jan 26, 2007, at 12:52, Dana H. Myers wrote:
> So this leaves me wondering how often the controller/drive subsystem reads data from the wrong sector of the drive without notice; is it symmetrical with respect to writing, and thus about once a drive/year, or are there factors which change this?

My guess is that it would be symmetric, but I don't really know.

--Ed
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Jan 26, 2007, at 13:16, Dana H. Myers wrote:
> I would tend to expect these spurious events to impact read and write equally; more specifically, the chance of any one read or write being mis-addressed is about the same. Since, AFAIK, there are many more reads from a disk typically than writes, this would seem to suggest that there would be more mis-addressed reads in a drive/year than mis-addressed writes. Is this the reason for the asymmetry?

Jim's once per drive per year number was not very precise. I took it to be just one significant digit. I don't recall if he distinguished reads from writes.

--Ed
Re: [zfs-discuss] Re: ZFS or UFS - what to do?
On Jan 26, 2007, at 13:29, Selim Daoud wrote:
> it would be good to have real data and not only guesses or anecdotes

Yes, I agree. I'm sorry I don't have the data that Jim presented at FAST, but he did present actual data. Richard Elling (I believe it was Richard) has also posted some related data from ZFS experience to this list. There is more than just anecdotal evidence for this.

--Ed
Re: [zfs-discuss] External drive enclosures + Sun Server for mass storage
Shannon Roddy wrote:
> For Sun to charge 4-8 times street price for hard drives that they order, just as I do, from the same manufacturers that I order from is infuriating.

Are you sure they're really the same drives? Mechanically, they probably are, but last I knew (I don't work in the Storage part of Sun, so I have no particular knowledge about current practices), Sun and other systems vendors (I know both Apple and DEC did) had custom firmware in the drives they resell. One reason for this is that the systems vendors qualified the drives with a particular firmware load, and did not buy just the latest firmware that the drive manufacturer wanted to ship, for quality-control reasons. At least some of the time, there were custom functionality changes as well.

--Ed
Re: [zfs-discuss] Distributed FS
Ivan wrote:
> Hi, is ZFS comparable to PVFS2? Could it also be used as a distributed filesystem at the moment, or are there any plans for this in the future?

I don't know anything at all about PVFS2, so I can't comment on that point. As for ZFS being used as a distributed file system, it cannot be used as such today, but it is something we would like to develop. Do you have a specific use case in mind for a distributed file system?

--Ed
Re: [zfs-discuss] Re: !
On Dec 22, 2006, at 09:50, Anton B. Rang wrote:
> Phantom writes and/or misdirected reads/writes: I haven't seen probabilities published on this; obviously the disk vendors would claim zero, but we believe they're slightly wrong. ;-) That said, 1 in 10^8 bits would mean we’d have an error in every 12 megabytes written! That’s clearly far too low. 1 in 10^8 blocks would be an error in every 46 gigabytes written; that is also clearly far too low. (At 1 GB/second that would be a phantom write every minute.)

Jim Gray (a well-known and respected database expert, currently at Microsoft) claims that the drive/controller combination will write data to the wrong place on the drive at a rate of about one incident/drive/year. In a 400-drive array (JBOD or RAID, doesn't matter), that would be about once a day. This is a kind of error that (so far, at least) can only be detected (and potentially corrected, given redundancy) by ZFS.

--Ed
Re: [zfs-discuss] A Plea for Help: Thumper/ZFS/NFS/B43
On Dec 9, 2006, at 8:59, Jim Mauro wrote:
> Anyway, I'm feeling rather naive here, but I've seen the "NFS enforced synchronous semantics" phrase kicked around many times as the explanation for suboptimal performance for metadata-intensive operations when ZFS is the underlying file system, but I never really understood what's unsynchronous about doing the same thing to a local ZFS.

If I remember correctly, the difference is that NFS requires that the operation be committed to stable storage before the return to the client. This is definitely a heavier operation than the local case, where the return to the caller may happen as soon as the operation is cached. If there's a crash, the local system does not guarantee to the caller that the operation is on disk, but NFS does. Both guarantee consistency, but NFS makes stronger guarantees of completeness.

--Ed
Re: [zfs-discuss] Mirrored Raidz
On Oct 20, 2006, at 0:48, Torrey McMahon wrote:
> Anthony Miller wrote:
>> I want to create a raidz on one array and have it mirrored to the other array.
>
> Do you think this will get you more availability compared to a simple mirror? I'm curious as to why you would want to do this.

This configuration will survive the failure of one drive in either RAID-Z *plus* the failure of any number of drives in the other (up to and including that entire half of the mirror). That may or may not be valuable enough to choose, but it will survive more failures than just a mirror.

--Ed
Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
On Sep 8, 2006, at 9:33, Richard Elling - PAE wrote:
> I was looking for a new AM2 socket motherboard a few weeks ago. All of the ones I looked at had 2xIDE and 4xSATA with onboard (SATA) RAID. All were less than $150. In other words, the days of having a JBOD-only solution are over except for single disk systems. 4x750 GBytes is a *lot* of data (and video).

It's not clear to me that JBOD is dead. The (S)ATA RAID cards I've seen are really software RAID solutions that know just enough in the controller to let the BIOS boot off a RAID volume. None of the expensive RAID stuff is in the controller.

--Ed
Re: [zfs-discuss] Can a zfs storage pool be imported readonly?
oab wrote:
> I'm new to ZFS, so I was wondering if it is possible to concurrently share a ZFS storage pool between two separate machines. I am currently evaluating Sybase IQ running on ZFS rather than raw devices (initial performance tests look very promising) and now need to evaluate whether the IQ query-server functionality will work.

Not today. We are very early on in the process of defining a project to make ZFS a shared file system for Sun Cluster. When this project is complete (no idea when that might be, yet, sorry), it will be possible to share a zpool among nodes of a cluster.

We have support for fail-over ZFS (HA-ZFS) in the next release of Sun Cluster, due out later this year. That doesn't provide concurrent access, but it does provide high availability.

> This simply means that the query server will have read-only access to the data (no lock manager required as far as I can tell), so that is why I would need to import the storage pool read-only, or at least without exporting it.

It's conceivable that it might work if *all* the nodes imported the pool read-only, but that's not tested and certainly not supported.

--Ed
Re: [zfs-discuss] Proposal: zfs create -o
Brian Hechinger wrote:
> Could you mix and match by keeping the current style assuming there are no -o options present?
>
>   # zfs create pool/fs
>
> If you need to specify options, then they should all be options:
>
>   # zfs create -o name=pool/fs -o mountpoint=/bar -o etc

I would be tempted to have two forms of the command. One (as exists today) takes no options. The other takes only options, but doesn't need the -o markers. For example:

  zfs create pool/fs

or

  zfs create mountpoint=/bar name=pool/fs compression=on

In the second form, options may appear in any order.

--Ed
[zfs-discuss] Re: PSARC 2006/288 zpool history
On May 3, 2006, at 15:21, eric kustarz wrote:
> There are basically two writes that need to happen: one for the time and one for the subcommand string. The kernel just needs to make sure that if a write completes, the data is parseable (has a delimiter). It's then up to the userland parser (zpool history) to figure out if there are incomplete records, but the kernel guarantees it is parseable. The userland command only prints out complete records, nothing partial. So the userland command needs to handle the case where, say, the time entry for a record was written but the subcommand was not.

I'm not clear on how the parser knows enough to do that. I believe I saw that a record looked like:

  \0 followed by an arbitrary number of NUL-terminated strings

If this is correct, how can the parser know if a string (or part of one) got dropped? I think this might be a case where a structured record (like the compact XML suggestion made earlier) would help. At least having distinguished start and end markers (whether they be one byte each, or XML constructs) for a record looks necessary to me.

--Ed