Re: [zfs-discuss] This mailing list EOL???
mail-archive.com is an independent third party. This is one of their FAQs (http://www.mail-archive.com/faq.html#duration):

    The Mail Archive has been running since 1998. Archiving services are planned to continue indefinitely. We do not plan on ever needing to remove archived material. Do not, however, misconstrue these intentions with a warranty of any kind. We reserve the right to discontinue service at any time.

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Deirdre Straughan
Sent: Wednesday, March 20, 2013 5:16 PM
To: Cindy Swearingen; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] This mailing list EOL???

Will the archives of all the lists be preserved? I don't think we've seen a clear answer on that (it's possible you haven't, either!).

On Wed, Mar 20, 2013 at 2:14 PM, Cindy Swearingen <cindy.swearin...@oracle.com> wrote:

> Hi Ned,
>
> This list is migrating to java.net and will not be available in its current form after March 24, 2013. The archive of this list is available here:
>
> http://www.mail-archive.com/zfs-discuss@opensolaris.org/
>
> I will provide an invitation to the new list shortly. Thanks for your patience.
>
> Cindy
>
> On 03/20/13 15:05, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:
>> I can't seem to find any factual indication that opensolaris.org mailing lists are going away, and I can't even find the reference to whoever said it was EOL in a few weeks ... a few weeks ago. So ... are these mailing lists going bye-bye?

--
best regards,
Deirdré Straughan
Community Architect, SmartOS
illumos Community Manager
cell 720 371 4107
[zfs-discuss] This mailing list EOL???
I can't seem to find any factual indication that opensolaris.org mailing lists are going away, and I can't even find the reference to whoever said it was EOL in a few weeks ... a few weeks ago. So ... are these mailing lists going bye-bye?
Re: [zfs-discuss] What would be the best tutorial cum reference doc for ZFS
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Hans J. Albertsson
>
> I'm looking for something that would make me afterwards understand what, say, commands like zpool import ... or zfs send ... actually do, and some idea as to why, so I can begin to understand ZFS in a way that allows me to make educated guesses on how to perform tasks I haven't tried before.

man zpool
man zfs

And the ZFS Best Practices Guide. And the ZFS Evil Tuning Guide (just search for "evil" and you'll find it). But almost everything is literally in the man pages.
Re: [zfs-discuss] partioned cache devices
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki
>
> muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
> Password:
> cannot open '/dev/dsk/c25t10d1p2': I/O error
> muslimwookie@Pyzee:~$
>
> I have two SSDs in the system, I've created an 8gb partition on each drive for use as a mirrored write cache. I also have the remainder of the drive partitioned for use as the read only cache. However, when attempting to add it I get the error above.

Sounds like you're probably running into confusion about how to partition the drive. If you create fdisk partitions, they will be accessible as p0, p1, p2, but I think p0 unconditionally refers to the whole drive, so the first partition is p1 and the second is p2. If you create one big solaris fdisk partition and then slice it via "partition", where s2 is typically the encompassing slice and people usually use s0, s1, and s6 for actual slices, then they will be accessible via s0, s1, s6.

Generally speaking, it's inadvisable to split the slog/cache devices anyway. Because: if you're splitting them, evidently you're focusing on the wasted space. Buying an expensive 128G device where you couldn't possibly ever use more than 4G or 8G in the slog. But that's not what you should be focusing on. You should be focusing on the speed (that's why you bought it in the first place). The slog is write-only, and the cache is a mixture of read/write, where it should hopefully be doing more reads than writes. But regardless of your actual success with the cache device, your cache device will be busy most of the time, and competing against the slog.

You have a mirror, you say. You should probably drop both the cache & log. Use one whole device for the cache, use one whole device for the log. The only risk you'll run is this: since a slog is write-only (except during mount, typically at boot), it's possible to have a failure mode where you think you're writing to the log, but the first time you go back and read, you discover an error, and discover the device has gone bad. In other words, without ever doing any reads, you might not notice when/if the device goes bad.

Fortunately, there's an easy workaround. You could periodically (say, once a month) script the removal of your log device, create a junk pool, write a bunch of data to it, scrub it (thus verifying it was written correctly), and in the absence of any scrub errors, destroy the junk pool and re-add the device as a slog to the main pool.

I've never heard of anyone actually being that paranoid, and I've never heard of anyone actually experiencing the aforementioned possible undetected device failure mode. So this is all mostly theoretical. Mirroring the slog device really isn't necessary in the modern age.
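For what it's worth, that monthly check might look roughly like this. An untested sketch; "tank" and c25t10d0 are placeholders for your pool and device:

  #!/bin/sh
  # Pull the slog out of the main pool (log device removal requires a
  # reasonably recent zpool version).
  zpool remove tank c25t10d0
  # Build a throwaway pool on the device and write some data to it.
  zpool create junk c25t10d0
  dd if=/dev/urandom of=/junk/testfile bs=1024k count=2048
  # Scrub to verify everything written reads back correctly. zpool scrub
  # returns immediately, so poll until it finishes.
  zpool scrub junk
  while zpool status junk | grep -q "scrub in progress"; do
      sleep 60
  done
  zpool status -x junk     # expect the pool to be reported healthy
  # Tear down and put the device back as a dedicated log.
  zpool destroy junk
  zpool add tank log c25t10d0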
Re: [zfs-discuss] maczfs / ZEVO
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> Tim, Simon, Volker, Chris, and Erik - How do you use it? I am making the informed guess, that you're using it primarily on non-laptops, which have second hard drives, and you're giving the entire disk to the zpool. Right?

Perhaps it works fine for whole disks, or even partitions, but with my file-backed pool, the performance was terrible. Everything else I could work around ... lack of zvol, inability to import during reboot ... but the performance problem was significant enough for me to scrap it and go back to normal. Oh well.
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
> From: Tim Cook [mailto:t...@cook.ms]
>
> We can agree to disagree. I think you're still operating under the auspices of Oracle wanting to have an open discussion. This is patently false.

I'm just going to respond to this by saying thank you, Cindy, Casper, Neil, and others, for all the help over the years. I think we all agree it was cooler when opensolaris was open, but things are beyond our control, so be it. Moving forward, I don't expect Oracle to be any more open than MS or Apple or Google, which is to say, I understand there's stuff you can't talk about, and support you can't give freely or openly. But to the extent you're still able to discuss publicly known things, thank you.
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
> From: Tim Cook [mailto:t...@cook.ms]
>
> Why would I spend all that time and energy participating in ANOTHER list controlled by Oracle, when they have shown they have no qualms about eliminating it with basically 0 warning, at their whim?

From an open source, community perspective, I understand and agree with this sentiment. If OSS projects behave this way, they die. The purpose of an oracle-hosted mailing list is not for the sake of being open in any way. It's for the sake of allowing public discussions about their product. While a certain amount of knowledge will exist with or without the list (people can still download solaris 11 for evaluation purposes and test it out on the honor system), there will be less oracle-specific knowledge in existence without the list. For anyone who's 100% dedicated to OSS and/or illumos and doesn't care about oracle-specific stuff, there's no reason to use that list. But for those of us who are sysadmins, developers using eval-licensed solaris, or in any way not completely closed to the possibility of using oracle zfs / solaris... for those of us, it makes sense.

Guess what, I formerly subscribed to netapp-toasters as well. Until zfs came along and I was able to happily put netapp in my past. Perhaps someday I'll leave zfs behind in favor of btrfs. But not yet.

Guess what also, there is a very active thriving Microsoft forum out there too. And they don't even let you download MS Office or Windows for evaluation purposes - they're even more closed than Oracle in this regard. They learned their lesson about piracy and the honor system. ;-)
Re: [zfs-discuss] maczfs / ZEVO
> From: Tim Cook [mailto:t...@cook.ms]
> Sent: Friday, February 15, 2013 11:14 AM
>
> I have a few coworkers using it. No horror stories and it's been in use about 6 months now. If there were any showstoppers I'm sure I'd have heard loud complaints by now :)

So, I have discovered a *couple* of unexpected problems.

At first, I thought it would be nice to split my HD into 2 partitions, use the 2nd partition for a zpool, and use a vmdk wrapper around a zvol raw device. So I started partitioning my HD. As it turns out, there's a bug in Disk Utility... As long as you partition your hard drive and *format* the second partition with hfs+, it works very smoothly. But then I couldn't find any way to dismount the second partition (there is no eject)... If I go back, I think maybe I'll figure it out, but I didn't try too hard... I resized back to normal, and then split again, selecting the "Free Space" option for the second partition. Bad idea. Disk Utility horked the partition tables, and I had to restore from Time Machine. I thought maybe it was just a fluke, so I repeated the whole process a second time... try to split the disk, try to make the second half "Free Space", and be forced to restore the system. Lesson learned: don't try to create an unused partition on the mac HD.

So then I just created one big honking file via "dd" and used it for the zpool store. Tried to create a zvol. Unfortunately ZEVO doesn't do zvols. Ok, no problem. Windows can run NTFS inside a vmdk file inside a zfs filesystem inside an hfs+ file inside the hfs+ filesystem. (Yuk.) But it works. Unfortunately, because it's a file in the backend, ZEVO doesn't find the pool on reboot. It doesn't seem to do the equivalent of a zpool.cache. I've asked a question in their support forum to see if there's some way to solve that problem, but I don't know yet.

Tim, Simon, Volker, Chris, and Erik - how do you use it? I am making the informed guess that you're using it primarily on non-laptops, which have second hard drives, and you're giving the entire disk to the zpool. Right?
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
> From: cindy swearingen [mailto:cindy.swearin...@gmail.com]
>
> This was new news to us too and we were just talking over some options yesterday afternoon, so please give us a chance to regroup and provide some alternatives. This list will be shut down but we can start a new one on java.net.

Thanks Cindy - I, for one, am in favor of another list on java.net, because the development is basically split into oracle & illumos. While illumos users might have a small aversion to using another oracle list, I think oracle users will likely have a much larger aversion to using a non-oracle list. So I think there's room for both lists, as well as just cause for both lists.

If at all possible, I would advise preserving the history of these mailing lists. Extremely useful sometimes, when referencing past conversations and stuff, and searching for little tidbits via google.

I would also advise making some sort of announcement on any of the other opensolaris mailing lists that happen to be active.
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
> From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us]
>
> Good for you. I am sure that Larry will be contacting you soon.

hehehehehe... he knows better. ;-)

> Previously Oracle announced and invited people to join their discussion forums, which are web-based and virtually dead.

Invited people with paid support contracts.
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
> From: sriram...@gmail.com [mailto:sriram...@gmail.com] On Behalf Of Sriram Narayanan
>
> Or, given that this is a weekend, we assume that someone within Oracle would see this mail only on Monday morning Pacific Time, then send out some mails within, and be able to respond in public only by Wednesday evening Pacific Time at best.

I remembered to take that into account. Question was posted Friday morning, EST. And not every oracle employee subscribes here with their work email address. Nor does everyone limit themselves to conversing in the community during only business hours. Don't forget Monday's a holiday.
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
> From: Tim Cook [mailto:t...@cook.ms]
>
> That would be the logical decision, yes. Not to poke fun, but did you really expect an official response after YEARS of nothing from Oracle? This is the same company that refused to release any Java patches until the DHS issued a national warning suggesting that everyone uninstall Java.

Well, yes. We do have oracle employees who contribute to this mailing list. It is not accurate or fair to stereotype the whole company. Oracle by itself is as large as some cities or countries.

I can understand a company policy of secrecy about development direction and stuff like that. I would think somebody would be able to officially confirm or deny that this mailing list is going to stop. At least one of their system administrators lurks here...
Re: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL
In the absence of any official response, I guess we just have to assume this list will be shut down, right? So I guess we just have to move to the illumos mailing list, as Deirdre suggests?

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
Sent: Friday, February 15, 2013 11:00 AM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] zfs-discuss mailing list & opensolaris EOL

So, I hear, in a couple weeks' time, opensolaris.org is shutting down. What does that mean for this mailing list? Should we all be moving over to something at illumos or something? I'm going to encourage somebody in an official capacity at opensolaris to respond... I'm going to discourage unofficial responses, like, illumos enthusiasts etc simply trying to get people to jump this list. Thanks for any info ...
[zfs-discuss] maczfs / ZEVO
Anybody using maczfs / ZEVO? Have good or bad things to say, in terms of reliability, performance, features?

My main reason for asking is this: I have a mac, I use Time Machine, and I have VM's inside. Time Machine, while great in general, has the limitation of being unable to intelligently identify changed bits inside a VM file. So you have to exclude the VM from Time Machine, and you have to run backup software inside the VM. I would greatly prefer, if it's reliable, to let the VM reside on ZFS and use zfs send to backup my guest VM's.

I am not looking to replace HFS+ as the primary filesystem of the mac; although that would be cool, there's often a reliability benefit to staying on the supported, beaten path, standard configuration. But if ZFS can be used to hold the guest VM storage reliably, I would benefit from that. Thanks...
[zfs-discuss] zfs-discuss mailing list & opensolaris EOL
So, I hear, in a couple weeks' time, opensolaris.org is shutting down. What does that mean for this mailing list? Should we all be moving over to something at illumos or something? I'm going to encourage somebody in an official capacity at opensolaris to respond... I'm going to discourage unofficial responses, like, illumos enthusiasts etc simply trying to get people to jump this list. Thanks for any info ...
Re: [zfs-discuss] how to know available disk space
> From: Pasi Kärkkäinen [mailto:pa...@iki.fi]
>
> What's the correct way of finding out what actually uses/reserves that 1023G of FREE in the zpool?

Maybe this isn't exactly what you need, but maybe:

  for fs in `zfs list -H -o name` ; do
      echo $fs
      zfs get reservation,refreservation,usedbyrefreservation $fs
  done

> At this point the filesystems are full, and it's not possible to write to them anymore.

You'll have to either reduce your reservations, or destroy old snapshots. Or add more disks.
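For what it's worth, "zfs get" also takes -r, so the same report comes out of one command. A sketch; "tank" is a placeholder pool name:

  # Recursively print the reservation-related properties for every
  # dataset in the pool, trimmed to the useful columns.
  zfs get -r -o name,property,value \
      reservation,refreservation,usedbyrefreservation tank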
Re: [zfs-discuss] how to know available disk space
> From: Gregg Wonderly [mailto:gregg...@gmail.com]
>
> This is one of the greatest annoyances of ZFS. I don't really understand how a zvol's space can not be accurately enumerated from top to bottom of the tree in 'df' output etc. Why does a "zvol" divorce the space used from the root of the volume?

The way I would say that is: intuitively, I think people expect reservations to count against Alloc and Used.
[zfs-discuss] how to know available disk space
I have a bunch of VM's, and some samba shares, etc, on a pool. I created the VM's using zvol's, specifically so they would have an appropriate refreservation and never run out of disk space, even with snapshots. Today, I ran out of disk space, and all the VM's died. So obviously it didn't work.

When I used "zpool list" after the system crashed, I saw this:

  NAME      SIZE  ALLOC   FREE  EXPANDSZ   CAP  DEDUP  HEALTH  ALTROOT
  storage   928G   568G   360G         -   61%  1.00x  ONLINE  -

I did some cleanup, so I could turn things back on ... freed up about 4G. Now, when I use "zpool list" I see this:

  NAME      SIZE  ALLOC   FREE  EXPANDSZ   CAP  DEDUP  HEALTH  ALTROOT
  storage   928G   564G   364G         -   60%  1.00x  ONLINE  -

When I use "zfs list storage" I see this:

  NAME      USED  AVAIL  REFER  MOUNTPOINT
  storage   909G  4.01G  32.5K  /storage

So I guess the lesson is (a) refreservation and zvol alone aren't enough to ensure your VM's will stay up, and (b) if you want to know how much room is *actually* available, as in "usable," as in "how much can I write before I run out of space," you should use "zfs list" and not "zpool list".
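To make lesson (b) concrete: the two commands disagree because "zpool list" reports raw pool allocation, while "zfs list" accounts for reservations. A sketch, with a placeholder pool name:

  zpool list tank   # ALLOC/FREE ignore unwritten reservations
  zfs list tank     # AVAIL subtracts reservations/refreservations; this
                    # is the real "how much can I still write" number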
Re: [zfs-discuss] Scrub performance
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> I can tell you I've had terrible everything rates when I used dedup.

So, the above comment isn't fair, really. The truth is here:
http://mail.opensolaris.org/pipermail/zfs-discuss/2011-July/049209.html
Re: [zfs-discuss] Scrub performance
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Koopmann, Jan-Peter
>
> all I can tell you is that I've had terrible scrub rates when I used dedup.

I can tell you I've had terrible everything rates when I used dedup.

> The DDT was a bit too big to fit in my memory (I assume according to some very basic debugging).

This is more or less irrelevant, because the system doesn't load it into memory anyway. It will cache a copy in ARC just like everything else in the pool. It gets evicted just as quickly as everything else.

> Only two of my datasets were deduped. On scrubs and resilvers I noticed that sometimes I had terrible rates with < 10MB/sec. Then later it rose up to < 70MB/sec. After upgrading some discs (same speeds observed) I got rid of the deduped datasets (zfs send/receive them) and guess what: all of a sudden scrub goes to 350MB/sec steady and only takes a fraction of the time.

Are you talking about scrub rates for the complete scrub? Because if you sit there and watch it, from minute to minute, it's normal for it to bounce really low for a long time, and then really high for a long time, etc. The only measurement that has any real meaning is time to completion.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: Robert Milkowski [mailto:rmilkow...@task.gda.pl]
>
> That is one thing that always bothered me... so it is ok for others, like Nexenta, to keep stuff closed and not in open, while if Oracle does it they are bad?

Oracle, like Nexenta, and my own company CleverTrove, and Microsoft, and Netapp, has every right to close source development, if they believe it's beneficial to their business. For all we know, Oracle might not even have a choice about it - it might have been in the terms of settlement with NetApp (because open source ZFS definitely hurt NetApp business.)

The real question is: in which situations is it beneficial to your business to be closed source, as opposed to open source? There's the whole redhat/centos dichotomy. At first blush, it would seem redhat gets screwed by centos (or oracle linux), but then you realize how many more redhat-derived systems are out there, compared to suse, etc. By allowing people to use it for free, it actually gains popularity, and then redhat actually has a successful support business model as compared to suse, which tanked.

But it's useless to argue about whether oracle's making the right business choice, whether open or closed source is better for their business. Cuz it's their choice, regardless who agrees. Arguing about it here isn't going to do any good. Those of us who gained something and no longer count on having that benefit moving forward have a tendency to say "You gave it to me for free before, now I'm pissed off because you're not giving it to me for free anymore," instead of "thanks for what you gave before." The world moves on. There's plenty of time to figure out which solution is best for you, the consumer, in the future product offerings: commercial closed source product offering, open source product offering, or something completely different such as btrfs.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: Gary Mills [mailto:gary_mi...@fastmail.fm]
>
> > In solaris, I've never seen it swap out idle processes; I've only
> > seen it use swap for the bad bad bad situation. I assume that's all
> > it can do with swap.
>
> You would be wrong. Solaris uses swap space for paging. Paging out unused portions of an executing process from real memory to the swap device is certainly beneficial. Swapping out complete processes is a desperation move, but paging out most of an idle process is a good thing.

You seem to be emphasizing the distinction between swapping and paging. My point, though, is that I've never seen the swap usage (which is being used for paging) on any solaris derivative go nonzero for the sake of keeping something in cache. It seems to me that solaris will always evict all cache memory before it swaps (pages) out even the most idle process memory.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Nico Williams
>
> As for swap... really, you don't want to swap. If you're swapping you have problems.

For clarification, the above is true in Solaris and derivatives, but it's not universally true for all OSes. I'll cite linux as the example, because I know it. If you provide swap to a linux kernel, it considers this a degree of freedom when choosing to evict data from the cache, versus swapping out idle processes (or zombie processes.) As long as you swap out idle process memory that is colder than some cache memory, swap actually improves performance. But of course, if you have any active process starved of ram and consequently thrashing swap actively, you're right. It's bad bad bad to use swap that way.

In solaris, I've never seen it swap out idle processes; I've only seen it use swap for the bad bad bad situation. I assume that's all it can do with swap.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: Darren J Moffat [mailto:darr...@opensolaris.org]
>
> Support for SCSI UNMAP - both issuing it and honoring it when it is the backing store of an iSCSI target.

When I search for scsi unmap, I come up with all sorts of documentation that ... is ... like reading a medical journal when all you want to know is the conversion from 98.6F to C. Would you mind momentarily describing what SCSI UNMAP is used for? If I were describing it to a customer (CEO, CFO), I'm not going to tell them about SCSI UNMAP; I'm going to say the new system has a new feature that enables ... or solves the ___ problem... The customer doesn't *necessarily* have to be as clueless as a CEO/CFO. Perhaps just another IT person, or whatever.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: Sašo Kiselkov [mailto:skiselkov...@gmail.com]
>
> as far as incompatibility among products, I've yet to come across it

I was talking about ... install solaris 11, and it's using a new version of zfs that's incompatible with anything else out there. And vice-versa. (Not sure if feature flags is the default, or zpool 28 is the default, in various illumos-based distributions. But my understanding is that once you upgrade to feature flags, you can't go back to 28. Which means, mutually, anything >28 is incompatible with each other.) You have to typically make a conscious decision and plan ahead, and intentionally go to zpool 28 and no higher, if you want compatibility between systems.

> Let us know at z...@lists.illumos.org how that goes, perhaps write a blog post about your observations. I'm sure the BTRFS folks came up with some neat ideas which we might learn from.

Actually - I've written about it before (but it'll be difficult to find, and nothing earth shattering, so not worth the search.) I don't think there's anything that zfs developers don't already know. Basic stuff like fsck, and the ability to shrink and remove devices; those are the things btrfs has and zfs doesn't. (But there's lots more stuff that zfs has and btrfs doesn't. Just making sure my previous comment isn't seen as a criticism of zfs, or a judgement in favor of btrfs.)

And even with a new evaluation, the conclusion can't be completely clear, nor immediate. The last evaluation started about 10 months ago, and we kept it in production for several weeks or a couple of months, because it appeared to be doing everything well. (Except for features that were known to be not-yet implemented, such as read-only snapshots (aka quotas) and the btrfs equivalent of "zfs send.") Problem was, the system was unstable, crashing about once a week. No clues why. We tried all sorts of things in kernel, hardware, drivers, with and without support, to diagnose and capture the cause of the crashes. Then one day, I took a blind stab in the dark (for the ninetieth time) and reformatted the storage volume ext4 instead of btrfs. After that, no more crashes. That was approx 8 months ago.

I think the only things I could learn upon a new evaluation are:

#1 I hear "btrfs send" is implemented now. I'd like to see it with my own eyes before I believe it.
#2 I hear quotas (read-only snapshots) are implemented now. Again, I'd like to see it before I believe it.
#3 Proven stability. Never seen it yet with btrfs. Want to see it with my own eyes and stand the test of time before it earns my trust.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: Richard Elling [mailto:richard.ell...@gmail.com]
>
> I disagree that ZFS is developmentally challenged.

As an IT consultant, 8 years ago, before I heard of ZFS, it was always easy to sell Ontap, as long as it fit into the budget. 5 years ago, whenever I told customers about ZFS, it was always a quick easy sell. Nowadays, anybody who's heard of it says they don't want it, because they believe it's a dying product, and they're putting their bets on linux instead. I try to convince them otherwise, but I'm trying to buck the word on the street. They don't listen, however much sense I make. I can only sell ZFS to customers nowadays who have still never heard of it.

"Developmentally challenged" doesn't mean there is no development taking place. It means the largest development effort is working closed-source, and not available for free (except for some purposes), so some consumers are going to follow that path, while others are going to follow the open source illumos branch, which means both disunity amongst developers and disunity amongst consumers, and incompatibility amongst products. So far, in the illumos branch, I've only seen bugfixes introduced since zpool 28, no significant introduction of new features. (Unlike the oracle branch, which is just as easy to sell as ontap.) Which presents a challenge. Hence the term "challenged."

Right now, ZFS is the leading product as far as I'm concerned. Better than MS VSS, better than Ontap, better than BTRFS. It is my personal opinion that one day BTRFS will eclipse ZFS, due to oracle's unsupportive strategy causing disparity and lowering consumer demand for zfs, but of course that's just a personal prediction for the future, which has yet to be seen. So far, every time I evaluate BTRFS, it fails spectacularly, but the last time I did was about a year ago. I'm due for a BTRFS re-evaluation now.
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Nico Williams
>
> To decide if a block needs dedup one would first check the Bloom filter, then if the block is in it, use the dedup code path, else the non-dedup codepath and insert the block in the Bloom filter.

Sorry, I didn't know what a Bloom filter was before I replied before - now I've read the wikipedia article and am consequently an expert. *sic* ;-)

It sounds like, what you're describing... The first time some data gets written, it will not produce a hit in the Bloom filter, so it will get written to disk without dedup. But now it has an entry in the Bloom filter. So the second time the data block gets written (the first duplicate), it will produce a hit in the Bloom filter, and consequently get a dedup DDT entry. But since the system didn't dedup the first one, it means the second one still needs to be written to disk independently of the first one. So in effect, you'll always "miss" the first duplicated block write, but you'll successfully dedup n-1 duplicated blocks. Which is entirely reasonable, although not strictly optimal.

And sometimes you'll get a false positive out of the Bloom filter, so sometimes you'll be running the dedup code on blocks which are actually unique, but with some intelligently selected parameters such as Bloom table size, you can get this probability to be reasonably small, like less than 1%.

In the wikipedia article, they say you can't remove an entry from the Bloom filter table, which would over time cause a consistent increase of false positive probability (approaching 100% false positives) from the Bloom filter, and consequently a high probability of dedup'ing blocks that are actually unique; but with even a minimal amount of thinking about it, I'm quite sure that's a solvable implementation detail. Instead of storing a single bit for each entry in the table, store a counter. Every time you create a new entry in the table, increment the different locations; every time you remove an entry from the table, decrement. Obviously a counter requires more bits than a bit, but it's a linear increase of size, exponential increase of utility, and within the implementation limits of available hardware. But there may be a more intelligent way of accomplishing the same goal. (Like I said, I've only thought about this minimally.)

Meh, well. Thanks for the interesting thought. For whatever it's worth.
Re: [zfs-discuss] iSCSI access patterns and possible improvements?
> From: Richard Elling [mailto:richard.ell...@gmail.com]
> Sent: Saturday, January 19, 2013 5:39 PM
>
> the space allocation more closely resembles a variant of mirroring, like some vendors call "RAID-1E"

Awesome, thank you. :-)
Re: [zfs-discuss] RFE: Un-dedup for unique blocks
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Nico Williams
>
> I've wanted a system where dedup applies only to blocks being written that have a good chance of being dups of others.
>
> I think one way to do this would be to keep a scalable Bloom filter (on disk) into which one inserts block hashes.
>
> To decide if a block needs dedup one would first check the Bloom filter, then if the block is in it, use the dedup code path,

How is this different or better than the existing dedup architecture? If you found that some block about to be written in fact matches the hash of an existing block on disk, then you've already determined it's a duplicate block, exactly as you would if you had dedup enabled. In that situation, gosh, it sure would be nice to have the extra information, like reference count and pointer to the duplicate block, which exists in the dedup table. In other words, exactly the way existing dedup is already architected.

> The nice thing about this is that Bloom filters can be sized to fit in main memory, and will be much smaller than the DDT.

If you're storing all the hashes of all the blocks, how is that going to be smaller than the DDT storing all the hashes of all the blocks?
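As an aside, for anyone who wants to see how big the DDT actually is on a real pool, zdb will report it. A sketch; the pool name is a placeholder:

  zdb -DD tank    # DDT statistics: entry counts, on-disk and in-core sizes
  zdb -S tank     # simulate dedup statistics on a pool without enabling dedup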
Re: [zfs-discuss] Resilver w/o errors vs. scrub with errors
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Jim Klimov
>
> And regarding the "considerable activity" - AFAIK there is little way for ZFS to reliably read and test "TXGs newer than X"

My understanding is like this: when you make a snapshot, you're just creating a named copy of the present latest TXG. When you zfs send incrementally from one snapshot to another, you're creating the delta between two TXG's that happen to have names. So when you break a mirror and resilver, it's exactly the same operation as an incremental zfs send; it needs to calculate the delta between the latest (older) TXG on the previously UNAVAIL device, up to the latest TXG on the current pool. Yes, this involves examining the meta tree structure, and yes, the system will be very busy while that takes place. But the work load is very small relative to whatever else you're likely to do with your pool during normal operation, because that's the nature of the meta tree structure ... very small relative to the rest of your data.
Re: [zfs-discuss] Resilver w/o errors vs. scrub with errors
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Stephan Budach
>
> I am always experiencing chksum errors while scrubbing my zpool(s), but I never experienced chksum errors while resilvering. Does anybody know why that would be?

When you resilver, you're not reading all the data on all the drives. Only just enough to resilver, which doesn't include all the data that was previously in-sync (maybe a little of it, but mostly not). Even if you have a completely failed drive, replaced with a completely new empty drive, if you have a 3-way mirror, you only need to read one good copy of the data in order to write the resilver'd data onto the new drive. So you could still be failing to detect cksum errors on the *other* side of the mirror, which wasn't read during the resilver.

What's more, when you resilver, the system is just going to write the target disk. Not go back and verify every written block of the target disk. So, think of a scrub as a "complete, thorough resilver," whereas "resilver" is just a lightweight version, doing only the parts that are known to be out of sync, and without subsequent read verification.

> This happens on all of my servers, Sun Fire 4170M2, Dell PE 650 and on any FC storage that I have.

While you apparently have been able to keep the system in production for a while, consider yourself lucky. You have a real problem, and solving it probably won't be easy. Your problem is either hardware, firmware, or drivers. If you have a support contract on the Sun, I would recommend starting there. Because the Dell is definitely a configuration that you won't find official support for - just a lot of community contributors, who will likely not provide a super awesome answer for you super soon.
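Since scrub is the only operation that reads and verifies everything, it's worth running on a schedule. A sketch cron entry for root's crontab; the pool name is a placeholder:

  # Scrub the pool at 02:00 every Sunday.
  0 2 * * 0 /usr/sbin/zpool scrub tank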
Re: [zfs-discuss] iSCSI access patterns and possible improvements?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bob Friesenhahn
>
> If almost all of the I/Os are 4K, maybe your ZVOLs should use a volblocksize of 4K? This seems like the most obvious improvement.

Oh, I forgot to mention - the above logic only makes sense for mirrors and stripes. Not for raidz (or raid-5/6/dp in general).

If you have a pool of mirrors or stripes, the system isn't forced to subdivide a 4k block onto multiple disks, so it works very well. But if you have a pool block size of 4k and, let's say, a 5-disk raidz (capacity of 4 disks), then the 4k block gets divided into 1k on each data disk plus 1k of parity on the parity disk. Now, since the hardware only supports block sizes of 4k ... you can see there's a lot of wasted space, and if you do a bunch of it, you'll also have a lot of wasted time waiting for seeks/latency.
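For a pool of mirrors or stripes, matching the zvol to the 4K I/O pattern is one flag at creation time. A sketch; the names are hypothetical, and note volblocksize can't be changed after the zvol is created:

  # Create a zvol whose block size matches the initiator's 4K I/Os.
  zfs create -b 4k -V 100G tank/iscsivol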
Re: [zfs-discuss] poor CIFS and NFS performance
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Eugen Leitl
>
> I have a pool of 8x ST31000340AS on an LSI 8-port adapter as a raidz3 (no compression nor dedup) with reasonable bonnie++ 1.03 values, e.g. 145 MByte/s Seq-Write @ 48% CPU and 291 MByte/s Seq-Read @ 53% CPU.

For 8-disk raidz3 (effectively 5 disks) I would expect approx 640MB/s for both seq read and seq write. The first halving (from 640 down to 291) could maybe be explained by bottlenecking through a single HBA or something like that, so I wouldn't be too concerned about that. But the second halving, from 291 down to 145 ... a single disk should do 128MB/sec no problem, so the whole pool writing at only 145MB/sec sounds wrong to me. But as you said, this isn't the area of complaint... moving on; you can start a new discussion about this later if you want.

> My problem is pretty poor network throughput. An NFS mount on 12.04 64 bit Ubuntu (mtu 9000) or CIFS are read at about 23 MBytes/s. Windows 7 64 bit (also jumbo frames) reads at about 65 MBytes/s. The highest transfer speed on Windows just touches 90 MByte/s, before falling back to the usual 60-70 MBytes/s.
>
> Does anyone have any suggestions on how to debug/optimize throughput?

The first thing I would do is build another openindiana box and try NFS / CIFS to/from it. See how it behaves. Whenever I've seen this sort of problem before, it was version incompatibility requiring tweaks between the client and server. I don't know which version of samba / solaris cifs is being used ... but at some point in history (win7), windows transitioned from NTLMv1 to v2, and at that point all the older servers became 4x slower with the new clients, but if you built a new server with the new clients, then the old version was 4x slower than the new. Not to mention, I've had times when I couldn't even get linux & solaris to *talk* to each other over NFS, due to version differences, never mind tweaking all the little performance knobs.

So my advice is to first eliminate any question about version / implementation differences, and see where that takes you.
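One more cheap diagnostic, to separate the disks from the network. A hedged sketch; nc option syntax varies between netcat builds, so check your man page first:

  # Local sequential read straight off the pool (rules the disks in or out):
  dd if=/tank/bigfile of=/dev/null bs=1024k
  # Raw TCP throughput, no filesystem or NFS/CIFS involved:
  #   on the server:  nc -l 9999 > /dev/null
  #   on the client:  dd if=/dev/zero bs=1024k count=1024 | nc server 9999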
Re: [zfs-discuss] zfs receive options (was S11 vs illumos zfs compatiblity)
> From: Cindy Swearingen [mailto:cindy.swearin...@oracle.com]
>
> Which man page are you referring to? I see the zfs receive -o syntax in the S11 man page.

Oh ... it's the latest openindiana. So I suppose it must be a new feature post-rev-28 in the non-open branch... But it's no big deal. I found that if I "zfs create" and then "zfs set" a few times, and then "zfs receive", I get the desired behavior.
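Spelled out, that workaround looks roughly like this. The dataset names are the placeholders from earlier in the thread; -F lets the full stream overwrite the freshly created (empty) dataset, and locally-set properties take precedence over received ones:

  zfs create biz/baz
  zfs set compression=on biz/baz
  zfs set sync=disabled biz/baz
  zfs send foo/bar@42 | zfs receive -F biz/baz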
Re: [zfs-discuss] zfs receive options (was S11 vs illumos zfs compatiblity)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> zfs send foo/bar@42 | zfs receive -o compression=on,sync=disabled biz/baz
>
> I have not yet tried this syntax. Because you mentioned it, I looked for it in the man page, and because it's not there, I hesitate before using it.

Also, readonly=on ... and ... bummer. When I try zfs receive with -o, I get the message:

  invalid option 'o'
[zfs-discuss] zfs receive options (was S11 vs illumos zfs compatiblity)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of bob netherton
>
> You can, with recv, override any property in the sending stream that can be set from the command line (ie, a writable).
>
> # zfs send repo/support@cpu-0412 | zfs recv -o version=4 repo/test
> cannot receive: cannot override received version

Are you sure you can do this with other properties? It's not in the man page. I would like to set the compression & sync on the receiving end:

  zfs send foo/bar@42 | zfs receive -o compression=on,sync=disabled biz/baz

I have not yet tried this syntax. Because you mentioned it, I looked for it in the man page, and because it's not there, I hesitate before using it.
Re: [zfs-discuss] S11 vs illumos zfs compatiblity
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bob Netherton
>
> At this point, the only thing would be to use 11.1 to create a new pool at 151's version (-o version=) and top level dataset (-O version=). Recreate the file system hierarchy and do something like an rsync. I don't think there is anything more elegant, I'm afraid.

Is that right? You can't use zfs send | zfs receive to send from a newer version and receive on an older version?
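For reference, the pool-creation incantation quoted above would look something like this. The version numbers shown are the usual pre-feature-flags pairing (zpool version 28 goes with zfs filesystem version 5); the device name is hypothetical:

  # Create a pool, and its top-level dataset, at older on-disk versions
  # so an older system can still import it.
  zpool create -o version=28 -O version=5 tank c0t0d0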
Re: [zfs-discuss] The format command crashes on 3TB disk but zpool create ok
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of sol
>
> I added a 3TB Seagate disk (ST3000DM001) and ran the 'format' command but it crashed and dumped core.
>
> However the zpool 'create' command managed to create a pool on the whole disk (2.68 TB space).
>
> I hope that's only a problem with the format command and not with zfs or any other part of the kernel.

Suspicion and conjecture only: I think format uses an fdisk label, which has a 2T limit. Normally it's advised to give the whole disk directly to zpool anyway, so hopefully that's a good solution for you.
Re: [zfs-discuss] any more efficient way to transfer snapshot between two hosts than ssh tunnel?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Fred Liu
>
> BTW, anyone played NDMP in solaris? Or is it feasible to transfer snapshot via NDMP protocol?

I've heard you could, but I've never done it. Sorry I'm not much help, except as a cheer leader. You can do it! I think you can! Don't give up! heheheheh

Please post back whatever you find, or if you have to figure it out for yourself, then blog about it and post that.
Re: [zfs-discuss] Remove disk
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Freddie Cash
>
> On Thu, Dec 6, 2012 at 12:35 AM, Albert Shih wrote:
> > Le 01/12/2012 à 08:33:31-0700, Jan Owoc a écrit:
> > > 2) replace the disks with larger ones one-by-one, waiting for a
> > > resilver in between
> >
> > This is the point I don't see how to do it. I've 48 disk actually from
> > /dev/da0 -> /dev/da47 (I'm under FreeBSD 9.0) lets say 3To.

You have 12 x 2T disks in a raidz2, and you want to replace those disks with 4T each. Right?

Start with a scrub. Wait for it to complete. Ensure you have no errors.

  sudo format -e < /dev/null > before.txt

Then "zpool offline" one disk. Pull it out and stick a new 4T disk in its place. "devfsadm -Cv" to recognize the new disk.

  sudo format -e < /dev/null > after.txt
  diff before.txt after.txt

You should see one device disappeared, and a new one was created. Now "zpool replace" to replace the old disk with the new disk. "zpool status" should show the new drive resilvering. Wait for the resilver to finish.

Repeat 11 more times. Replace each disk, one at a time, with a resilver in between. When you're all done, it might expand to the new size automatically, or you might need to play with the "autoexpand" property to make use of the new storage space.

What percentage full is your pool? When you're done, please write back to tell us how much time this takes. I predict it will take a very long time, and I'm curious to know exactly how much. Before you start, I'm going to guess ... 80% full, and 7-10 days to resilver each drive. So the whole process will take you a few months to complete. (That's the disadvantage of a bunch of disks in a raidzN configuration.)
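Condensed, the per-disk cycle is below. A sketch only: device names are hypothetical, and the rescan step differs by OS (devfsadm -Cv on Solaris; camcontrol rescan on FreeBSD):

  zpool scrub tank            # verify pool health before starting
  zpool offline tank da0      # take the old disk out of service
  # ...physically swap the drive, rescan the bus...
  zpool replace tank da0      # resilver onto the new disk in the same slot
  zpool status tank           # wait here for the resilver to finish
  # repeat for every disk, then let the pool grow into the new capacity:
  zpool set autoexpand=on tank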
Re: [zfs-discuss] query re disk mirroring
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Enda o'Connor - Oracle Ireland
>
> Say I have an ldoms guest that is using zfs root pool that is mirrored, and the two sides of the mirror are coming from two separate vds servers, that is
>
>   mirror-0
>     c3d0s0
>     c4d0s0
>
> where c3d0s0 is served by one vds server, and c4d0s0 is served by another vds server. Now if for some reason, this physical rig loses power, then how do I know which side of the mirror to boot off, ie which side is most recent.

If one storage host goes down, it should be no big deal; one side of the mirror becomes degraded, and later, when it comes up again, it resilvers.

If one storage host goes down, and the OS continues running for a while and then *everything* goes down, later you bring up both sides of the storage and bring up the OS, and the OS will know which side is more current because of the higher TXG. So the OS will resilver the old side.

If one storage host goes down, and the OS continues running for a while and then *everything* goes down... later you bring up only one half of the storage, and bring up the OS. Then the pool will refuse to mount, because with missing devices, it doesn't know if maybe the other side is more current.

As long as one side of the mirror disappears and reappears while the OS is still running, no problem. As long as all the devices are present during boot, no problem. The only problem is when you try to boot from one side of a broken mirror. If you need to do this, you should mark the broken mirror as broken before shutting down - certainly detach would do the trick. Perhaps "offline" might also do the trick.

Does that answer it?
Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Jim Klimov
>
> this is the part I am not certain about - it is roughly as cheap to READ the gzip-9 datasets as it is to read lzjb (in terms of CPU decompression).

Nope. I know LZJB is not LZO, but I'm starting from a point of saying that LZO is specifically designed to be super-fast, low-memory for decompression. (As claimed all over the LZO webpage, as well as wikipedia, and supported by my own personal experience using lzop.) So for comparison to LZJB, see here: http://denisy.dyndns.org/lzo_vs_lzjb/

LZJB is, at least according to these guys, even faster than LZO. So I'm confident concluding that lzjb (default) decompression is significantly faster than zlib (gzip) decompression.
Re: [zfs-discuss] Question about degraded drive
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Chris Dunbar - Earthside, LLC
>
> # zpool replace tank c11t4d0
> # zpool clear tank

I would expect this to work, or detach/attach. You should scrub periodically, and ensure no errors after the scrub. But the really good question is: why does the device go offline?
Re: [zfs-discuss] Question about degraded drive
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Freddie Cash
>
> And you can try 'zpool online' on the failed drive to see if it comes back online.

Be cautious here - I have an anecdote, which might represent a trend in best practice, or it might just be an anecdote. At least once, I had an iscsi device go offline, and then I "zpool online"d the device, and it seemed to work - resilvered successfully, zpool status showed clean, I'm able to zfs send and zfs receive. But for normal usage (go in and actually use the files in the pool) it was never usable again. I don't know the root cause right now. Maybe it's iscsi related.
Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Jim Klimov
>
> I really hope someone better versed in compression - like Saso - would chime in to say whether gzip-9 vs. lzjb (or lz4) sucks in terms of read-speeds from the pools. My HDD-based assumption is in general that the less data you read (or write) on platters - the better, and the spare CPU cycles can usually take the hit.

Oh, I can definitely field that one. The lzjb compression (the default, as long as you just turn compression on without specifying any other detail) is very fast compression, similar to lzo. It generally has no noticeable CPU overhead, but it saves you a lot of time and space for highly repetitive things like text files (source code) and sparse zero-filled files and stuff like that. I personally always enable this: "compression=on".

zlib (gzip) is more powerful, but *way* slower. Even the fastest level, gzip-1, uses enough CPU cycles that you probably will be CPU limited rather than IO limited. There are very few situations where this option is better than the default lzjb.

Some data (anything that's already compressed: zip, gz, etc, video files, jpg's, encrypted files, etc) is totally uncompressible with these algorithms. If this is the type of data you store, you should not use compression.

Probably not worth mentioning, but what the heck: if you normally have uncompressible data and then one day you're going to do a lot of stuff that's compressible (or vice versa)... the compression flag is only used during writes. Once it's written to the pool, compressed or uncompressed, it stays that way, even if you change the flag later.
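In practice it's one property, and compressratio tells you what you actually gained. A sketch, with a placeholder dataset:

  zfs set compression=on tank/src            # default algorithm (lzjb here)
  zfs get compression,compressratio tank/src
  # Per the caveat above, only blocks written after the change get
  # compressed; existing data stays however it was originally written.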
Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Eugen Leitl
>
> can I make e.g. LSI SAS3442E directly do SSD caching (it says something about CacheCade, but I'm not sure it's an OS-side driver thing), as it is supposed to boost IOPS? Unlikely shot, but probably somebody here would know.

Depending on the type of work you will be doing, the best performance thing you could do is to disable the zil (zfs set sync=disabled) and use SSD's for cache. But don't go *crazy* adding SSD's for cache, because they still have some in-memory footprint. If you have 8G of ram and 80G SSD's, maybe just use one of them for cache, and let the other 3 do absolutely nothing. Better yet, put your OS on a pair of SSD's in a mirror, then use a pair of HDD's in a mirror for the storage pool, and one SSD for cache. Then you have one SSD unused, which you could optionally add as a dedicated log device to your storage pool.

There are specific situations where it's ok or not ok to disable the zil; look around and ask here if you have any confusion about it.

Don't do redundancy in hardware. Let ZFS handle it.
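That layout, as commands. A sketch only: device names are placeholders, and mind the caveats above about disabling the zil:

  zpool create tank mirror c0t2d0 c0t3d0   # HDD mirror for the storage pool
  zpool add tank cache c0t4d0              # one SSD as read cache (L2ARC)
  zpool add tank log c0t5d0                # optional: spare SSD as dedicated slog
  zfs set sync=disabled tank               # only if your workload tolerates it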
Re: [zfs-discuss] Directory is not accessible
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Sami Tuominen > > How can one remove a directory containing corrupt files or a corrupt file > itself? For me rm just gives input/output error. I was hoping to see somebody come up with an answer for this ... I would expect rm to work... Maybe you have to rm the parent of the thing you're trying to rm? But I kinda doubt it. Maybe you need to verify you're rm'ing the right thing? I believe, if you scrub the pool, it should tell you the name of the corrupt things. Or maybe you're not experiencing a simple cksum mismatch, maybe you're experiencing a legitimate IO error. The "rm" solution could only possibly work to clear up a cksum mismatch. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
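For what it's worth, a sketch of identifying the damaged paths before reaching for rm (pool name hypothetical):

    zpool scrub tank
    # once the scrub completes, permanent errors are listed with file paths where possible
    zpool status -v tank
    # if rm of a listed path succeeds, reset the error counters afterwards
    zpool clear tank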
Re: [zfs-discuss] ZFS Appliance as a general-purpose server question
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > > I wonder if it would make weird sense to get the boxes, forfeit the > cool-looking Fishworks, and install Solaris/OI/Nexenta/whatever to > get the most flexibility and bang for a buck from the owned hardware... This is what we decided to do at work, and this is the reason why. But we didn't buy the appliance-branded boxes; we just bought normal servers running solaris. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Woeful performance from an iSCSI pool
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Ian Collins > > I look after a remote server that has two iSCSI pools. The volumes for > each pool are sparse volumes and a while back the target's storage > became full, causing weird and wonderful corruption issues until they > managed to free some space. > > Since then, one pool has been reasonably OK, but the other has terrible > performance receiving snapshots. Despite both iSCSI devices using the > same IP connection, iostat shows one with reasonable service times while > the other shows really high (up to 9 seconds) service times and 100% > busy. This kills performance for snapshots with many random file > removals and additions. > > I'm currently zero filling the bad pool to recover space on the target > storage to see if that improves matters. > > Has anyone else seen similar behaviour with previously degraded iSCSI > pools? This sounds exactly like the behavior I was seeing with my attempt at two machines zpool mirror'ing each other via iscsi. In my case, I had two machines that are both targets and initiators. I made the initiator service dependent on the target service, and I made the zpool mount dependent on the initiator service, and I made the virtualbox guest start dependent on the zpool mount. Everything seemed fine for a while, including some reboots. But then on one reboot, one of my systems stayed down too long, and when it finally came back up, both machines started choking. So far I haven't found any root cause, and so far the only solution I've found was to reinstall the OS. I tried everything I know in terms of removing, forgetting, recreating the targets, initiators, and pool, but somehow none of that was sufficient. I recently (yesterday) got budgetary approval to dig into this more, so hopefully maybe I'll have some insight before too long, but don't hold your breath. I could fail, and even if I don't, it's likely to be weeks or months. What I want to know from you is: Which machines are your solaris machines? Just the targets? Just the initiators? All of them? You say you're having problems just with snapshots. Are you sure you're not having trouble with all sorts of IO, and not just snapshots? What about import / export? In my case, I found I was able to zfs send, zfs receive, zpool status, all fine. But when I launched a guest VM, there would be a massive delay - you said up to 9 seconds - I was sometimes seeing over 30s - sometimes crashing the host system. And the guest OS was acting like it was getting IO errors, without actually displaying an error message indicating an IO error. I would attempt, and sometimes fail, to power off the guest vm (kill -KILL VirtualBox). After the failure began, zpool status still works (and reports no errors), but if I try to do things like export/import, they fail indefinitely, and I need to power cycle the host. While in the failure mode, I can zpool iostat, and I sometimes see 0 transactions with nonzero bandwidth. Which defies my understanding. Did you ever see the iscsi targets "offline" or "degraded" in any way? Did you do anything like "online" or "clear?" My systems are openindiana - the latest, I forget if that's 151a5 or a6 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
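For anyone comparing per-device service times the way described above, the numbers come from something like this on the Solaris side (5 is the sampling interval in seconds):

    # asvc_t = active service time in ms, %b = percent of time busy
    iostat -xn 5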
Re: [zfs-discuss] zvol wrapped in a vmdk by Virtual Box and double writes?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > > As for ZIL - even if it is used with the in-pool variant, I don't > think your setup needs any extra steps to disable it (as Edward likes > to suggest), and most other setups don't need to disable it either. No, no - I know I often suggest disabling the zil, because so many people rule it out on principle (the evil tuning guide says "disable the zil (don't!)") But in this case, I was suggesting precisely the opposite of disabling it. I was suggesting making it more aggressive. But now that you mention it - if he's looking for maximum performance, perhaps disabling the zil would be best for him. ;-) Nathan, it will do you some good to understand when it's ok or not ok to disable the zil. (zfs set sync=disabled) If this is a guest VM on your laptop or something like that, then it's definitely safe. If the guest VM is a database server, with a bunch of external clients (on the LAN or network or whatever) then it's definitely *not* safe. Basically if anything external of the VM is monitoring or depending on the state of the VM, then it's not ok. But, if the VM were to crash and go back in time by a few seconds ... If there are no clients that would care about that ... then it's safe to disable ZIL. And that is the highest performance thing you can possibly do. > It also shouldn't add much to your writes - the in-pool ZIL blocks > are then referenced as userdata when the TXG commit happens (I think). I would like to get some confirmation of that - because it's the opposite of what I thought. I thought the ZIL was used like a circular buffer. The same blocks would be overwritten repeatedly. But if there's a sync write over a certain size, then it skips the ZIL and writes immediately to main zpool storage, so it doesn't have to get written twice. > I also think that with a VM in a raw partition you don't get any > snapshots - neither ZFS as underlying storage ('cause it's not), > not hypervisor snaps of the VM. So while faster, this is also some > trade-off :) Oh - But not faster than zvol. I am currently a fan of wrapping zvol inside vmdk, so I get maximum performance and also snapshots. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
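A sketch of scoping this per-dataset rather than pool-wide, so only the disposable guest images give up sync semantics (dataset names are hypothetical):

    zfs set sync=disabled tank/vms/scratch
    # everything else keeps normal ZIL behavior; verify with:
    zfs get -r sync tank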
Re: [zfs-discuss] zvol wrapped in a vmdk by Virtual Box and double writes?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Nathan Kroenert > > I chopped into a few slices - p0 (partition table), p1 128GB, p2 60gb. > > As part of my work, I have used it both as a RAW device (cxtxdxp1) and > wrapped partition 1 with a virtualbox created VMDK linkage, and it works > like a champ. :) Very happy with that. > > I then tried creating a new zpool using partition 2 of the disk (zpool > create c2d0p2) and then carved a zvol out of that (30GB), and wrapped > *that* in a vmdk. Why are you partitioning, then creating a zpool, and then creating a zvol? I think you should make the whole disk a zpool unto itself, and then carve out the 128G zvol and 60G zvol. For that matter, why are you carving out multiple zvol's? Does your Guest VM really want multiple virtual disks for some reason? Side note: Assuming you *really* just want a single guest to occupy the whole disk and run as fast as possible... If you want to snapshot your guest, you should make the whole disk one zpool, and then carve out a zvol which is significantly smaller than 50%, say perhaps 40% or 45% might do the trick. The zvol will immediately reserve all the space it needs, and if you don't have enough space leftover to completely replicate the zvol, you won't be able to create the snapshot. If your pool ever gets over 90% used, your performance will degrade, so a 40% zvol is what I would recommend. Back to the topic: Given that you're on the SSD, there is no faster nonvolatile storage you can use for a ZIL log device. So you should leave the default ZIL inside the pool... Don't try adding any separate slice or anything as a log device... But as you said, sync writes will hit the disk twice. I would have to guess it's a good idea for you to tune ZFS to immediately flush transactions whenever there's a sync write. I forget how this is done - there's some tunable that indicates any sync write over a certain size should be immediately flushed... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
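A sketch of the whole-disk alternative suggested above (device name, sizes, and paths are hypothetical):

    # one pool on the whole disk, no manual slicing
    zpool create ssdpool c2d0
    # a zvol at roughly 40-45% of capacity, so snapshots have room to diverge
    zfs create -V 80G ssdpool/guest1
    VBoxManage internalcommands createrawvmdk \
        -filename /home/someuser/guest1.vmdk -rawdisk /dev/zvol/rdsk/ssdpool/guest1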
Re: [zfs-discuss] zvol access rights - chown zvol on reboot / startup / boot
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Edward Ned Harvey > > An easier event to trigger is the starting of the virtualbox guest. Upon vbox > guest starting, check the service properties for that instance of vboxsvc, and > chmod if necessary. But vboxsvc runs as non-root user... > > I like the idea of using zfs properties, if someday the functionality is > going to > be built into ZFS, and we can simply scrap the SMF chown service. But these > days, ZFS isn't seeing a lot of public development. I just built this into simplesmf (http://code.google.com/p/simplesmf/): support to execute the zvol chown immediately prior to launching the guest VM. I know Jim is also building it into vboxsvc, but I haven't tried that yet. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) > > > Found quite a few posts on > > various > > forums of people complaining that RDP with external auth doesn't work (or > > not reliably), > > Actually, it does work, and it works reliably, but the setup is very much not > straightforward. I'm likely to follow up on this later today, because as > coincidence would have it, this is on my to-do for today. I just published "simplesmf" http://code.google.com/p/simplesmf/ which includes a lot of the work I've done in the last month. Relevant to this discussion, the step-by-step instructions to enable VBoxHeadless external authentication, and connect the RDP client to it. http://code.google.com/p/simplesmf/source/browse/trunk/samples/virtualbox-guest-control/headless-hints.txt ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zvol access rights - chown zvol on reboot / startup / boot
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > > Well, as a simple stone-age solution (to simplify your SMF approach), > you can define custom attributes on dataset, zvols included. I think > a custom attr must include a colon ":" in the name, and values can be > multiline if needed. Simple example follows: > > # zfs set owner:user=jim pool/rsvd > > Then you can query the zvols for such attribute values and use them > in chmod, chown, ACL settings, etc. from your script. This way the > main goal is reached: the ownership config data stays within the pool. Given that zfs doesn't already have built-in support for these properties at mount time, given the necessity to poll for these values using an as-yet-unwritten SMF service, I'm not necessarily in agreement that zfs properties are a better solution than using a conf file to list these properties on a per-zvol basis. Either way, an SMF service manages it, and it's difficult or impossible to trigger an SMF service to run on every mount, and only on every mount. So the SMF service would have to be either a one-time shot at bootup or a manual refresh (and consequently miss anything mounted later), or it would have to continuously poll all the filesystems and volumes in the system. An easier event to trigger is the starting of the virtualbox guest. Upon vbox guest starting, check the service properties for that instance of vboxsvc, and chmod if necessary. But vboxsvc runs as non-root user... I like the idea of using zfs properties, if someday the functionality is going to be built into ZFS, and we can simply scrap the SMF chown service. But these days, ZFS isn't seeing a lot of public development. If we assume the SMF service is the thing that will actually be used, from now until someday when BTRFS eventually eclipses ZFS, then I would rather see a conf file or SMF service property, so the SMF service doesn't constantly scan all the filesystems and volumes for their zfs properties. It just checks the conf file and knows instantly which ones need to be chown'd. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
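For illustration, a sketch of the property-driven variant as a boot-time script; the owner:user property name and the loop are assumptions, not an existing tool:

    #!/bin/ksh
    # chown each zvol to the owner recorded in its owner:user property, if any
    for vol in $(zfs list -H -t volume -o name); do
        owner=$(zfs get -H -o value owner:user "$vol")
        [ "$owner" != "-" ] && chown "$owner" "/dev/zvol/rdsk/$vol"
    done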
Re: [zfs-discuss] zvol access rights - chown zvol on reboot / startup / boot
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Geoff Nordli > > Instead of using vdi, I use comstar targets and then use vbox built-in scsi > initiator. Based on my recent experiences, I am hesitant to use the iscsi ... I don't know if it was the iscsi initiator or target that was unstable, or the combination of both running on the same system, or some other characteristic... Plus when I think about the complexity of creating the zvol and configuring the target, with iscsi and IP overhead... As compared to just creating the zvol and using it directly... Maybe there is unavoidable complexity around the chown, but it seems like the chown should be easier and simpler than the iscsi solution... But in any event, thanks for the suggestion. It's nice to know there's at *least* one alternative option. ;-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zvol access rights - chown zvol on reboot / startup / boot
When I google around for anyone else who cares and may have already solved the problem before I came along - it seems we're all doing the same thing for the same reason. If by any chance you are running VirtualBox on a solaris / opensolaris / openindiana / whatever ZFS host, you could of course use .vdi files for the VM virtual disks, but a lot of us are using zvol instead, for various reasons. To do the zvol, you first create the zvol (sudo zfs create -V) and then chown it to the user who runs VBox (sudo chown someuser /dev/zvol/rdsk/...) and then create a rawvmdk that references it (VBoxManage internalcommands createrawvmdk -filename /home/someuser/somedisk.vmdk -rawdisk /dev/zvol/rdsk/...) The problem is - during boot / reboot, or anytime the zpool or zfs filesystem is mounted or remounted, exported, imported... the zvol ownership reverts back to root:root. So you have to repeat your "sudo chown" before the guest VM can start. And the question is ... Obviously I can make an SMF service which will chown those devices automatically, but that's kind of a crappy solution. Is there any good way to assign the access rights, or persistently assign ownership of zvol's? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
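For anyone landing here from a search, the full sequence described above looks roughly like this (dataset, size, and paths are hypothetical):

    sudo zfs create -V 20G tank/disk0
    sudo chown someuser /dev/zvol/rdsk/tank/disk0
    VBoxManage internalcommands createrawvmdk \
        -filename /home/someuser/disk0.vmdk -rawdisk /dev/zvol/rdsk/tank/disk0
    # ...and after every reboot or remount, the chown has to be repeated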
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Dan Swartzendruber > > Well, I think I give up for now. I spent quite a few hours over the last > couple of days trying to get gnome desktop working on bare-metal OI, > followed by virtualbox. I would recommend installing OI desktop, not OI server, because I, too, tried to get gnome working in OI server, to no avail. But if you install OI desktop, it simply goes in, brainlessly simple. > Found quite a few posts on > various > forums of people complaining that RDP with external auth doesn't work (or > not reliably), Actually, it does work, and it works reliably, but the setup is very much not straightforward. I'm likely to follow up on this later today, because as coincidence would have it, this is on my to-do for today. Right now, I'll say this much: When you RDP from a windows machine to a windows machine, you get prompted for a password. Nice, right? Seems pretty obvious. ;-) But the VirtualBox RDP server doesn't have that capability. You need to enter the username & password into the RDP client, and save it, before attempting the connection. > The final straw was when I > rebooted the OI server as part of cleaning things up, and... It hung. Bummer. That might be some unsupported hardware for running OI. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Eugen Leitl > > On Thu, Nov 08, 2012 at 04:57:21AM +0000, Edward Ned Harvey > (opensolarisisdeadlongliveopensolaris) wrote: > > > Yes you can, with the help of Dell, install OMSA to get the web interface > > to manage the PERC. But it's a pain, and there is no equivalent option for > > most HBA's. Specifically, on my systems with 3ware, I simply installed the > > solaris 3ware utility to manage the HBA. Which would not be possible on > > ESXi. This is important because the systems are in a remote datacenter, > and > > it's the only way to check for red blinking lights on the hard drives. ;-) > > I thought most IPMI came with full KVM, and also SNMP, and some ssh built- > in. Depends. So, one possible scenario: You power up the machine for the first time, you enter the ILOM console, you create a username & password & static IP address. From now on, you're able to get the remote console, awesome, great. No need for ipmitool in the OS. Another scenario, that I encounter just as often: You inherit some system from the previous admin. They didn't set up IPMI or ILOM. They installed ESXi, and now the only thing you can do is power off the system to do it. But in the situation where I inherit a Linux / Solaris machine from a previous admin who didn't config ipmi... I don't need to power down. I can config the ipmi via ipmitool. Going a little further down these trails... If you have a basic IPMI device, then all it does is *true* ipmi, which is a standard protocol. You have to send it ipmi signals via the ipmitool command on your laptop (or another server). It doesn't use SSL; it uses either no encryption, or a preshared key. The preshared key is a random 20-character hex string. If you configure that at boot time (as in the first situation mentioned above) then you have to type in at the physical console at first boot: new username, new password, new static IP address etc, and the new encryption key. But if you're running a normal OS, you can skip all that, boot the new OS, and paste all that stuff in via ssh, using the local ipmitool to config the local ipmi device. If you have a newer, more powerful ILOM device, then you probably only need to assign an IP address to the ilom. Then you can browse to it via https and do whatever else you need to do. Make sense? Long story short, "Depends." ;-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
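A sketch of the in-OS configuration path described above; channel 1 and user id 2 are common defaults but vary by machine, so treat them as assumptions:

    # assign a static address to the BMC from the running OS
    ipmitool lan set 1 ipsrc static
    ipmitool lan set 1 ipaddr 192.168.10.50
    ipmitool lan set 1 netmask 255.255.255.0
    ipmitool lan set 1 defgw ipaddr 192.168.10.1
    # set the password for a user slot (id 2 is often the default admin)
    ipmitool user set password 2 'newpassword'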
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: Karl Wagner [mailto:k...@mouse-hole.com] > > If I was doing this now, I would probably use the ZFS aware OS bare metal, > but I still think I would use iSCSI to export the ZVols (mainly due to the > ability > to use it across a real network, hence allowing guests to be migrated simply) Yes, if your VM host is some system other than your ZFS baremetal storage server, then exporting the zvol via iscsi is a good choice, or exporting your storage via NFS. Each one has their own pros/cons, and I would personally be biased in favor of iscsi. But if you're going to run the guest VM on the same machine that is the ZFS storage server, there's no need for the iscsi. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
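For completeness, a sketch of the iscsi-export path with COMSTAR (names and the GUID are hypothetical placeholders):

    # one-time setup: enable the COMSTAR framework and iSCSI target service
    svcadm enable stmf
    svcadm enable -r svc:/network/iscsi/target:default
    zfs create -V 100G tank/luns/vm0
    stmfadm create-lu /dev/zvol/rdsk/tank/luns/vm0   # prints the new LU's GUID
    stmfadm add-view 600144f0xxxxxxxxxxxxxxxxxxxxxxxx   # the GUID from the previous step
    itadm create-target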
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: Dan Swartzendruber [mailto:dswa...@druber.com] > > I have to admit Ned's (what do I call you?)idea is interesting. I may give > it a try... Yup, officially Edward, most people call me Ned. I contributed to the OI VirtualBox instructions. See here: http://wiki.openindiana.org/oi/VirtualBox Jim's vboxsvc is super powerful - But at first I found it overwhelming, mostly due to unfamiliarity with SMF. One of these days I'm planning to contribute a "Quick Start" guide to vboxsvc, but for now, if you find it confusing in any way, just ask for help here. (Right Jim?) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: Dan Swartzendruber [mailto:dswa...@druber.com] > > Now you have me totally confused. How does your setup get data from the > guest to the OI box? If thru a wire, if it's gig-e, it's going to be > 1/3-1/2 the speed of the other way. If you're saying you use 10gig or > some-such, we're talking about a whole different animal. Sorry - In the old setup, I had ESXi host, with solaris 10 guest, exporting NFS back to the host. So ESXi created the other guests inside the NFS storage pool. In this setup, the bottleneck is the virtual LAN that maxes out around 2-3 Gbit, plus TCP/IP and NFS overhead that degrades the usable performance a bit more. In the new setup, I have openindiana running directly on the hardware (OI is the host) and virtualization is managed by VirtualBox. I would use zones if I wanted solaris/OI guests, but it just so happens I want linux & windows guests. There is no bottleneck. My linux guest can read 6Gbit/sec and write 3Gbit/sec (I'm using 3 disks mirrored with another 3 disks, each can read/write 1 Gbit/sec). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Karl Wagner > > I am just wondering why you export the ZFS system through NFS? > I have had much better results (albeit spending more time setting up) using > iSCSI. I found that performance was much better, A couple years ago, I tested and benchmarked both configurations on the same system. I found that the performance was equal both ways (which surprised me because I expected NFS to be slower due to FS overhead.) I cannot say if CPU utilization was different - but the IO measurements were the same. At least, indistinguishably different. Based on those findings, I opted to use NFS for several weak reasons. If I wanted to, I could export NFS to more different systems. I know everything nowadays supports iscsi initiation, but it's not as easy to set up as an NFS client. If you want to expand the guest disk, in iscsi, ... I'm not completely sure you *can* expand a zvol, but if you can, you at least have to shut everything down, then expand and bring it all back up and then have the iscsi initiator expand to occupy the new space. But in NFS, the client can simply expand, no hassle. I like being able to look in a filesystem and see the guests listed there as files. Knowing I could, if I wanted to, copy those things out to any type of storage I wish. Someday, perhaps I'll want to move some guest VM's over to a BTRFS server instead of ZFS. But it would be more difficult with iscsi. For what it's worth, in more recent times, I've opted to use iscsi. And here are the reasons: When you create a guest file in a ZFS filesystem, it doesn't automatically get a refreservation. Which means, if you run out of disk space thanks to snapshots and stuff, the guest OS suddenly can't write to disk, and it's a hard guest crash/failure. Yes you can manually set the refreservation, if you're clever, but it's easy to get wrong. If you create a zvol, by default, it has an appropriately sized refreservation that guarantees the guest will always be able to write to disk. Although I got the same performance using iscsi or NFS with ESXi... I did NOT get the same result using VirtualBox. In Virtualbox, if I use a *.vdi file... The performance is *way* slower than using a *.vmdk wrapper for a physical device (zvol). (using VBoxManage internalcommands createrawvmdk) The only problem with the zvol / vmdk idea in virtualbox is that every reboot (or remount) the zvol becomes owned by root again. So I have to manually chown the zvol for each guest each time I restart the host. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
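The refreservation behavior mentioned above is easy to check directly; a sketch with hypothetical names:

    # a plain zvol gets a refreservation sized to guarantee future writes
    zfs create -V 20G tank/vm0
    zfs get refreservation tank/vm0
    # a sparse zvol (-s) gets no such guarantee - same failure mode as a file on a full pool
    zfs create -s -V 20G tank/vm1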
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > > the VM running "a ZFS OS" enjoys PCI-pass-through, so it gets dedicated > hardware access to the HBA(s) and harddisks at raw speeds, with no > extra layers of lags in between. Ah. But even with PCI pass-thru, you're still limited by the virtual LAN switch that connects ESXi to the ZFS guest via NFS. When I connected ESXi and a guest this way, obviously the bandwidth between the host & guest is purely CPU and memory limited. Because you're not using a real network interface; you're just emulating the LAN internally. I streamed data as fast as I could between ESXi and a guest, and found only about 2-3 Gbit. That was over a year ago so I forget precisely how I measured it ... NFS read/write perhaps, or wget or something. I know I didn't use ssh or scp, because those tend to slow down network streams quite a bit. The virtual network is a bottleneck (unless you're only using 2 disks, in which case 2-3 Gbit is fine.) I think THIS is where we're disagreeing: I'm saying "Only 2-3 gbit" but I see Dan's email said " since the traffic never leaves the host (I get 3gb/sec or so usable thruput.)" and "No offense, but quite a few people are doing exactly what I describe and it works just fine..." It would seem we simply have different definitions of "fine" and "abysmal." ;-) > Also, VMWare does not (AFAIK) use ext3, but their own VMFS which is, > among other things, cluster-aware (same storage can be shared by > several VMware hosts). I didn't know vmfs3 had extensions - I think vmfs3 is based on ext3. At least, all the performance characteristics I've ever observed are on-par with ext3. But it makes sense they would extend it in some way. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) > > Stuff like that. I could go on, but it basically comes down to: With > openindiana, you can do a lot more than you can with ESXi. Because it's a > complete OS. You simply have more freedom, better performance, less > maintenance, less complexity. IMHO, it's better in every way. Oh - I just thought of an important one - make that two, three... On ESXi, you can't run ipmitool. Which means, if you're configuring ipmi, you have to do it at power-on, by hitting the BIOS key, and then you have to type in your encryption key by hand (20 hex chars). Whereas, with a real OS, you run ipmitool and paste at the ssh prompt. (Even if you enable the ssh prompt on ESXi, you won't get ipmitool running there.) I have two systems that have 3ware HBA's, and I have some systems with Dell PERC. Yes you can, with the help of Dell, install OMSA to get the web interface to manage the PERC. But it's a pain, and there is no equivalent option for most HBA's. Specifically, on my systems with 3ware, I simply installed the solaris 3ware utility to manage the HBA. Which would not be possible on ESXi. This is important because the systems are in a remote datacenter, and it's the only way to check for red blinking lights on the hard drives. ;-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: Dan Swartzendruber [mailto:dswa...@druber.com] > > I'm curious here. Your experience is 180 degrees opposite from mine. I > run an all in one in production and I get native disk performance, and > ESXi virtual disk I/O is faster than with a physical SAN/NAS for the NFS > datastore, since the traffic never leaves the host (I get 3gb/sec or so > usable thruput.) What is all in one? I wonder if we crossed wires somehow... I thought Tiernan said he was running Nexenta inside of ESXi, where Nexenta exports NFS back to the ESXi machine, so ESXi will have the benefit of ZFS underneath its storage. That's what I used to do. When I said performance was abysmal, I meant, if you dig right down and pressure the system for throughput to disk, you've got a Linux or Windows VM inside of ESX, which is writing to a virtual disk, which ESX is then wrapping up inside NFS and TCP, talking on the virtual LAN to the ZFS server, which unwraps the TCP and NFS, pushes it all through the ZFS/Zpool layer, writing back to the virtual disk that ESX gave it, which is itself a layer on top of Ext3, before it finally hits disk. Based purely on CPU and memory throughput, my VM guests were seeing a max throughput of around 2-3 Gbit/sec. That's not *horrible* abysmal. But it's bad to be CPU/memory/bus limited if you can just eliminate all those extra layers, and do the virtualization directly inside a system that supports zfs. > > I have abandoned ESXi in favor of openindiana or solaris running as the > host, with virtualbox running the guests. I am SO much happier now. > It takes a higher level of expertise than running ESXi, but the results are > much better. > > > in what respect? due to the 'abysmal performance'? No - mostly just the fact that I am no longer constrained by ESXi. In ESXi, you have such limited capabilities of monitoring, storage, and how you interface it ... You need a windows client, you only have a few options in terms of guest autostart and so forth. If you manage all that in a shell script (or whatever) you can literally do anything you want. Start up one guest, then launch something that polls the first guest for the operational XMPP interface (or whatever service you happen to care about) before launching the second guest, etc. Obviously you can still do brain-dead timeouts or monitoring for the existence of late-boot-cycle services such as vmware-tools too, but that's no longer your only option. Of particular interest, I formerly had ESXi running a guest that was a DHCP and DNS server, and everything else had to wait for it. Now I run DHCP and DNS directly inside of the host openindiana. (So I eliminated one VM). I am now able to connect to guest consoles via VNC or RDP (ok on mac and linux), whereas with ESXi your only choice is to connect via VSphere from windows. In ESXi, you cannot use a removable USB drive to store your removable backup storage. I was using an eSATA drive, and I needed to reboot the whole system every time I rotated backups offsite. But with openindiana as the host, I can add/remove removable storage, perform my zpool imports / exports, etc, all without any rebooting. Stuff like that. I could go on, but it basically comes down to: With openindiana, you can do a lot more than you can with ESXi. Because it's a complete OS. You simply have more freedom, better performance, less maintenance, less complexity. IMHO, it's better in every way. I say "less complexity" but maybe not. It depends. 
I have greater complexity in the host OS, but I have less confusion and less VM dependencies, so to me that's less complexity. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Tiernan OToole > > I have a Dedicated server in a data center in Germany, and it has 2 3TB > drives, > but only software RAID. I have got them to install VMWare ESXi and so far > everything is going ok... I have the 2 drives as standard data stores... ESXi doesn't do software raid, so ... what are you talking about? > But i am paranoid... So, i installed Nexenta as a VM, gave it a small disk to > boot off and 2 1Tb disks on separate physical drives... I have created a > mirror > pool and shared it with VMWare over NFS and copied my ISOs to this share... I formerly did exactly the same thing. Of course performance is abysmal because you're booting a guest VM to share storage back to the host where the actual VM's run. Not to mention, there's the startup dependency, which is annoying to work around. But yes it works. > 1: If you where given the same hardware, what would you do? (RAID card is > an extra EUR30 or so a month, which i don't really want to spend, but could, > if > needs be...) I have abandoned ESXi in favor of openindiana or solaris running as the host, with virtualbox running the guests. I am SO much happier now. It takes a higher level of expertise than running ESXi, but the results are much better. > 2: should i mirror the boot drive for the VM? Whenever possible, you should always give more than one storage device to ZFS and let it do redundancy of some kind, be it mirror or raidz. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Strange mount -a problem in Solaris 11.1
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Ian Collins > > >> ioctl(3, ZFS_IOC_OBJECT_STATS, 0xF706BBB0) > >> > >> The system boots up fine in the original BE. The root (only) pool is a > >> single drive. > >> > >> Any ideas? > > devfsadm -Cv > > rm /etc/zfs/zpool.cache > > init 6 > > > > That was a big enough stick to fix it. Nasty bug none the less. I wonder what caused it? The ioctl error suggests inability to access some device. Hence the devfsadm and rm zpool.cache, to force the system to search for devices anew. What did you upgrade from? Perhaps in your old system, you had pools made of c0t0d0 and so forth, while in sol 11, the devices all became multipath? If so, I would expect the upgrader to be smart enough to do the devfsadm for you, and rebuild the zpool.cache. Anyway, glad you got out of the woods. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Strange mount -a problem in Solaris 11.1
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Ian Collins > > Have have a recently upgraded (to Solaris 11.1) test system that fails > to mount its filesystems on boot. > > Running zfs mount -a results in the odd error > > #zfs mount -a > internal error > Invalid argument > > truss shows the last call as > > ioctl(3, ZFS_IOC_OBJECT_STATS, 0xF706BBB0) > > The system boots up fine in the original BE. The root (only) pool is a > single drive. > > Any ideas? devfsadm -Cv rm /etc/zfs/zpool.cache init 6 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Scrub and checksum permutations
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > > I tend to agree that parity calculations likely > are faster (even if not all parities are simple XORs - that would > be silly for double- or triple-parity sets which may use different > algos just to be sure). Even though parity calculation is faster than fletcher, which is faster than sha256, it's all irrelevant, except in the hugest of file servers. Go write to disk or read from disk as fast as you can, and see how much CPU you use. Even on moderate fileservers that I've done this on (a dozen disks in parallel) the cpu load is negligible. If you ever get up to a scale where the cpu load becomes significant, you solve it by adding more cpu's. There is a limit somewhere, but it's huge. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Zpool LUN Sizes
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Fajar A. Nugraha > > So my > suggestion is actually just present one huge 25TB LUN to zfs and let > the SAN handle redundancy. Oh - no. Definitely let ZFS handle the redundancy. Because ZFS is doing the checksumming, if it finds a cksum error, it needs access to the redundant copy in order to correct it. If you let the SAN handle the redundancy, then when zfs finds a cksum error, your data is unrecoverable. (Just the file in question, not the whole pool or anything like that.) The answer to Morris's question, about size of LUNs and so forth... It really doesn't matter what size the LUNs are. Just choose based on your redundancy and performance requirements. Best would be to go JBOD, or if that's not possible, create a bunch of 1-disk volumes and let ZFS handle them as if they're JBOD. Performance is much better if you use mirrors instead of raid. (Sequential performance is just as good either way, but sequential IO is unusual for most use cases. Random IO is much better with mirrors, and that includes scrubs & resilvers.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
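A sketch of presenting the 1-disk volumes (or JBOD disks) to ZFS as mirror pairs, with hypothetical device names:

    zpool create tank \
        mirror c5t0d0 c5t1d0 \
        mirror c5t2d0 c5t3d0 \
        mirror c5t4d0 c5t5d0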
Re: [zfs-discuss] Zpool LUN Sizes
> From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) > > Performance is much better if you use mirrors instead of raid. (Sequential > performance is just as good either way, but sequential IO is unusual for most > use cases. Random IO is much better with mirrors, and that includes scrubs & > resilvers.) Even if you think you use sequential IO... If you use snapshots... Thanks to the nature of snapshot creation & deletion & the nature of COW, you probably don't have much sequential IO in your system, after a couple months of actual usage. Some people use raidzN, but I always use mirrors. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Scrub and checksum permutations
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > > Logically, yes - I agree this is what we expect to be done. > However, at least with the normal ZFS reading pipeline, reads > of redundant copies and parities only kick in if the first > read variant of the block had errors (HW IO errors, checksum > mismatch). I haven't read or written the code myself personally, so I'm not authoritative. But I certainly know I've heard it said on this list before, that when you read a mirror, it only reads one side (as you said) unless there's an error; this allows a mirror to read 2x faster than a single disk (which I confirm by benchmarking.) However, a scrub reads both sides, all redundant copies of the data. I'm personally comfortably confident assuming this is true also for reading the redundant copies of raidzN data. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Scrub and checksum permutations
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Karl Wagner > > I can only speak anecdotally, but I believe it does. > > Watching zpool iostat it does read all data on both disks in a mirrored > pair. > > Logically, it would not make sense not to verify all redundant data. > The point of a scrub is to ensure all data is correct. Same for me. Think about it: When you write some block, it computes parity bits, and writes them to the redundant parity disks. When you later scrub the same data, it wouldn't make sense to do anything other than repeating this process, to verify all the disks including parity. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send to older version
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Karl Wagner > > The only thing I think Oracle should have done differently is to allow > either a downgrade or creating a send stream in a lower version > (reformatting the data where necessary, and disabling features which > weren't present). However, this would not be a simple addition, and it > is probably not worth it for Oracle's intended customers. So you have a backup server in production, that has storage and does a zfs send to removable media on a periodic basis. (I know I do.) So you buy a new server, and it comes with a new version of zfs. Now you can't back up your new server. Or maybe you upgrade some other machine, and now you can't back *it* up. The ability to either downgrade a pool, or send a stream that's compatible with an older version, seems like a pretty obvious missing feature. I will comment on the irony that, right now, there's another thread on this list seeing a lot of attention, regarding how to receive a 'zfs send' data stream on non-ZFS systems. But there is no discussion about receiving on older zfs systems. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send to older version
> From: Richard Elling [mailto:richard.ell...@gmail.com] > > At some point, people will bitterly regret some "zpool upgrade" with no way > back. > > uhm... and how is that different than anything else in the software world? > > No attempt at backward compatibility, and no downgrade path, not even by > going back to an older snapshot before the upgrade. > > ZFS has a stellar record of backwards compatibility. The only break with > backwards > compatibility I can recall was a bug fix in the send stream somewhere around > opensolaris b34. > > Perhaps you are confusing backwards compatibility with forwards > compatibility? Semantics. New version isn't compatible with old version, or old version isn't compatible with new version. Either way, same end result. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What is L2ARC write pattern?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Jim Klimov > >One idea I have is that a laptop which only has a single HDD slot, > often has SD/MMC cardreader slots. If populated with a card for L2ARC, > can it be expected to boost the laptop's ZFS performance? You won't find that type of card with performance that's worth a damn. Worse yet, it will likely be extremely unreliable. In a SSD, all the performance and reliability come from intelligence in the controller, which emulates SATA HDD on one side, and manages Flash memory on the other side. Things like wear leveling, block mapping, garbage collection, etc, that's where all the performance comes from. You're not going to get it in a USB stick or a SD card. You're only going to get it in full size SSD's that consume power, and to some extent, the good stuff will cost more. (But of course, there's no way for the consumer to distinguish between paying for quality, and paying for marketing and margin, without trying it.) Even if you do try it, most likely you won't know the difference until a month later, having two identical systems with identical workload side-by-side. This is NOT to say the difference is insignificant; it's very significant, but without a point of reference, you don't have any comparison. All the published performance specs are fudged - but not lies - they represent optimal conditions, which are unrealistic. All the mfgrs are going to publish comparable specs, and none of them represent real life usage. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What happens when you rm zpool.cache?
> From: Jim Klimov [mailto:jimkli...@cos.ru] > Sent: Monday, October 22, 2012 7:26 AM > > Are you sure that the system with failed mounts came up NOT in a > read-only root moment, and that your removal of /etc/zfs/zpool.cache > did in fact happen (and that you did not then boot into an earlier > BE with the file still in it)? I'm going to take your confusion and disbelief in support of my confusion and disbelief. So it's not that I didn't understand what to expect ... it's that I somehow made a mistake, but I don't know what (and I don't care enough to try reproducing the same circumstance.) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What happens when you rm zpool.cache?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Edward Ned Harvey > > If you rm /etc/zfs/zpool.cache and reboot... The system is smart enough (at > least in my case) to re-import rpool, and another pool, but it didn't figure out > to re-import some other pool. > > How does the system decide, in the absence of zpool.cache, which pools it's > going to import at boot? So, in this thread, I haven't yet got the answer that I expect or believe. Because, the behavior I observed was: I did a "zfs send" from one system to another, received onto /localpool/backups. Side note, the receiving system has three pools: rpool, localpool, and iscsipool. Unfortunately, I sent the zfs properties with it, including the mountpoint. Naturally, there was already something mounted on / and /exports and /exports/home, so the zfs receive failed to mount on the receiving system, but I didn't notice that. Later, I rebooted. During reboot, of course, rpool mounted correctly on /, but then the system found the localpool/backups filesystems, and mounted /exports, /exports/home and so forth. So when it tried to mount rpool/exports, it failed. Then, iscsipool was unavailable, so the system failed to boot up completely. I was able to log in to the console as myself, but I had no home directory, so I su'd to root. I tried to change the mountpoints of localpool/backups/exports and so forth - but it failed. Filesystem is in use, or filesystem busy or something like that. (Because I logged in, obviously.) I tried to export localpool, and again failed. So I wanted some way to prevent localpool from importing or mounting next time, although I couldn't make it unmount or change mountpoints this time. rm /etc/zfs/zpool.cache ; init 6 This time, the system came up, and iscsipool was not imported (as expected.) But I was surprised - localpool was imported. Fortunately, this time the system mounted filesystems in the right order - rpool/exports was mounted under /exports, and I was able to log in as myself, and export/import / change mountpoints of the localpool filesystems. One more reboot just to be sure, and voila, no problem. Point in question is - After I removed the zpool.cache file, I expected rpool to be the only pool imported upon reboot. That's not what I observed, and I was wondering how the system knew to import localpool? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
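One way to avoid the mountpoint collision that started all this is to receive without mounting; a sketch, with hypothetical snapshot and host names:

    # -u leaves received filesystems unmounted, so inherited mountpoints like
    # /exports can't shadow the receiving system's own filesystems
    zfs send -R rpool/export@backup | ssh backuphost zfs receive -u -d localpool/backups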
Re: [zfs-discuss] What happens when you rm zpool.cache?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Gary Mills > > On Sun, Oct 21, 2012 at 11:40:31AM +0200, Bogdan Ćulibrk wrote: > >Follow up question regarding this: is there any way to disable > >automatic import of any non-rpool on boot without any hacks of > removing > >zpool.cache? > > Certainly. Import it with an alternate cache file. You do this by > specifying the `cachefile' property on the command line. The `zpool' > man page describes how to do this. You can also specify cachefile=none ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
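Concretely (pool and device names hypothetical):

    # never record this pool in the default cache, so it won't auto-import at boot
    zpool create -o cachefile=none tank c3t0d0
    # or for an existing pool
    zpool set cachefile=none tank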
Re: [zfs-discuss] vm server storage mirror
> From: Timothy Coalson [mailto:tsc...@mst.edu] > Sent: Friday, October 19, 2012 9:43 PM > > A shot in the dark here, but perhaps one of the disks involved is taking a long > time to return from reads, but is returning eventually, so ZFS doesn't notice > the problem? Watching 'iostat -x' for busy time while a VM is hung might tell > you something. Oh yeah - this is also bizarre. I watched "zpool iostat" for a while. It was showing me: operations (read and write) consistently 0; bandwidth (read and write) consistently non-zero, but something small, like 1k-20k or so. Maybe that is normal to someone who uses zpool iostat more often than I do. But to me, zero operations resulting in non-zero bandwidth defies logic. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] What happens when you rm zpool.cache?
If you rm /etc/zfs/zpool.cache and reboot... The system is smart enough (at least in my case) to re-import rpool, and another pool, but it didn't figure out to re-import some other pool. How does the system decide, in the absence of zpool.cache, which pools it's going to import at boot? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send to older version
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Richard Elling > >> At some point, people will bitterly regret some "zpool upgrade" with no way >> back. > > uhm... and how is that different than anything else in the software world? No attempt at backward compatibility, and no downgrade path, not even by going back to an older snapshot before the upgrade. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] vm server storage mirror
Yikes, I'm back at it again, and so frustrated. For about 2-3 weeks now, I had the iscsi mirror configuration in production, as previously described. Two disks on system 1 mirror against two disks on system 2, everything done via iscsi, so you could zpool export on machine 1, and then zpool import on machine 2 for a manual failover. Created the dependency - initiator depends on target, and created a new smf service to mount the iscsi zpool after the initiator is up (and consequently export the zpool before the initiator shuts down.) Able to reboot, everything working perfectly. Until today. Today I rebooted one system for some maintenance, and it stayed down longer than expected, so those disks started throwing errors on the second machine. First system eventually came up again, second system resilvered, everything looked good. I zpool clear'd the pool on the second machine just to make the counters look pretty again. But it wasn't pretty at all. This is so bizarre - Throughout the day, the VM's on system 2 kept choking. I had to powercycle system 2 about half a dozen times due to unresponsiveness. Exactly the type of behavior you expect for IO error - but nothing whatsoever appears in the system log, and the zpool status still looks clean. Several times, I destroyed the pool and recreated it completely from backup. zfs send and zfs receive both work fine. But strangely - when I launch a VM, the IO grinds to a halt, and I'm forced to powercycle (usually) the host. You might try to conclude it's something wrong with virtualbox - but it's not. I literally copied & pasted the zfs send | zfs receive commands that restored the pool from backup, but this time restored it onto local storage. The only difference is local disk versus iscsi pool. And then it finally worked without any glitches. During the day, trying to get the iscsi pool up again - this is so bizarre - I did everything I could think of, to get back to a pristine state. I removed iscsi targets, I removed lun's (lu's), I removed the static discovery and re-added it, got new device names, I wiped the disks (zpool destroy & zpool create) re-created lu's, re-created static discovery, re-created targets, re-created zpools... The behavior was the same no matter what I did. I can create the pool, import it, zfs receive onto it no problem, but then when I launch the VM, the whole system grinds to a halt. VirtualBox will be in a "sleep" state, Virtualbox shows the green light on the hard drive indicating it's trying to read, meanwhile if I try to X it out, it won't die, and gnome gives me the "Force Quit" dialog, meanwhile I can sudo kill -KILL VirtualBox, and VirtualBox *still* won't die. Any "zpool" or "zfs" command I type in hangs indefinitely (even time-slider daemon or zfs auto snapshot are hung). I can poke around the system in other areas - on other pools and stuff - but the only way out of it is power cycle. It's so weird, that once the problem happens once, I have not yet found any way to recover from it except to reformat and reinstall the OS for the whole system. I cannot, for the life of me, think of *any*thing that could be storing state like this, preventing me from getting back into a usable iscsi mirror pool. One thing I haven't tried yet - It appears, I think, that when you make a disk, let's say c2t4d0 an iscsi target, let's say c6t7blahblahblahd0... It appears, I think, that c6t7blahblahblahd0 is actually c2t4d0s2. 
I could create a pool using c2t4d0, and/or zero the whole disk, completely obliterating any semblance of partition tables inside there, or old redundant copies of old uberblocks or anything like that. But seriously, I'm grasping at straws here, just trying to find *any* place where some bad state is stored that I haven't thought of yet. I shouldn't need to reformat the host. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
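If the suspicion about stale labels on the underlying slice is right, one blunt (and destructive - this erases the disk's contents) way to rule it out is to overwrite the label regions; the device name is hypothetical:

    # ZFS keeps two 256K labels at the front of the device and two at the back;
    # zeroing the first few MB kills the front pair - the back pair needs the
    # device size to compute an offset, or just zero the entire disk
    dd if=/dev/zero of=/dev/rdsk/c2t4d0s2 bs=1024k count=4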
Re: [zfs-discuss] Changing rpool device paths/drivers
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of James C. McPherson > > As far as I'm aware, having an rpool on multipathed devices > is fine. Even a year ago, a new system I bought from Oracle came with multipath devices for all devices by default. Granted, there weren't any multiple paths on that system... But it was using the multipath device names. I expect this is the new default for everything moving forward. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send to older version
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Ian Collins > > You have to create pools/filesystems with the older versions used by the > destination machine. Apparently you might want to do "zpool create -d -o version=28" on the new system... (I just wrote 28, but make sure it matches the latest version supported by your receiving system.) You might have to do something similar to use an older ZFS version too. And then you should be able to send from the new to the old system. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
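A sketch of that, with hypothetical names; 28 is an assumption - match whatever "zpool upgrade -v" reports on the receiving system:

    # -d plus an explicit version keeps the new pool old enough to send from
    zpool create -d -o version=28 tank mirror c0t0d0 c0t1d0
    # confirm what you got
    zpool get version tank
    zfs get version tank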
[zfs-discuss] openindiana-1 filesystem, time-slider, and snapshots
Can anyone explain to me what the openindiana-1 filesystem is all about? I thought it was the "backup" copy of the openindiana filesystem created when you apply OS updates, but that doesn't seem to be the case...

I have time-slider enabled for rpool/ROOT/openindiana. It has a daily snapshot (amongst others). But every day when the new daily snap is taken, the old daily snap rotates into the rpool/ROOT/openindiana-1 filesystem. This is messing up my cron-scheduled "zfs send" script - it detects that the rpool/ROOT/openindiana filesystem no longer has the old daily snapshot, and therefore has no snapshot in common with the receiving system, and therefore sends a new full backup every night.

To make matters more confusing, when I run "mount" and when I run "zfs get all | grep -i mount", I see / mounted on rpool/ROOT/openindiana-1. It would seem I shouldn't be backing up openindiana, but instead openindiana-1? I would have sworn that out-of-the-box there was no openindiana-1. Am I simply wrong?

My expectation is that rpool/ROOT/openindiana should have lots of snaps available... 3 frequent (one every 15 mins), 23 hourly (one every hour), 6 daily (one every day), 4 weekly (one every 7 days), etc. I checked to ensure the auto-snapshot service is enabled. I checked svccfg to ensure I understood the correct interval, keep, and period (as described above). I see the expected behavior (expected according to my expectations, as described) on rpool/export/home/eharvey... But the behavior is different on rpool/ROOT/openindiana, even though, as far as I can tell, I have the same settings for both. That is, simply, com.sun:auto-snapshot=true.

One more comment - I recall, when I first configured time-slider, there was a threshold, default 80% of the pool used, before it automatically bumps off old snapshots (or stops taking new snaps, I'm not sure which). I don't see that setting anywhere I look, using svccfg or zfs get. My pools are pretty much empty right now - nowhere near the 80% limit. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
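For anyone wanting to compare notes, here is roughly how I've been checking - service names are the usual OpenSolaris/OI auto-snapshot FMRIs, so adjust if yours differ:

# is auto-snapshot enabled on the dataset in question?
zfs get com.sun:auto-snapshot rpool/ROOT/openindiana

# which auto snapshots actually exist under rpool/ROOT?
zfs list -t snapshot -r rpool/ROOT | grep zfs-auto-snap

# interval / keep / period for the daily schedule
svccfg -s svc:/system/filesystem/zfs/auto-snapshot:daily listprop zfs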
Re: [zfs-discuss] Fixing device names after disk shuffle
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Paul van der Zwan
>
> What was c5t2 is now c7t1 and what was c4t1 is now c5t2.
> Everything seems to be working fine, it's just a bit confusing.

That ... doesn't make any sense. Did you reshuffle these while the system was powered on or something?

sudo devfsadm -Cv
sudo zpool export datapool
sudo zpool export homepool
sudo zpool import -a
sudo reboot -p

The normal behavior is: during the import, or during the reboot when the filesystem gets mounted, zfs searches the available devices in the system for components of a pool. I don't see any way the devices reported by "zpool status" wouldn't match the devices reported by "format." Unless, as you say, it's somehow overridden by the cache file. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
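If the cache file is the culprit, something like this ought to force a rediscovery (pool name from the post; the cachefile property needs a reasonably recent zpool version):

# stop trusting the stale cachefile and re-scan /dev/dsk for the pool's devices
sudo zpool set cachefile=none datapool
sudo zpool export datapool
sudo zpool import -d /dev/dsk datapool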
Re: [zfs-discuss] ZFS best practice for FreeBSD?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> A solid point. I don't.
>
> This doesn't mean you can't - it just means I don't.

That response was kind of long-winded, so here's a simpler version. Suppose 6 disks in a system, each 2T: c0t0d0 through c0t5d0.

rpool is a mirror:
mirror c0t0d0p1 c0t1d0p1
c0t0d0p2 = 1.9T, unused (Extended, unused)
c0t1d0p2 = 1.9T, unused (Extended, unused)

Now partition all the other disks the same, and create datapool:

zpool create datapool \
  mirror c0t0d0p2 c0t1d0p2 \
  mirror c0t2d0p1 c0t3d0p1 \
  mirror c0t2d0p2 c0t3d0p2 \
  mirror c0t4d0p1 c0t5d0p1 \
  mirror c0t4d0p2 c0t5d0p2

Add a spare? Take a seventh disk, c0t6d0, and partition it the same way. Then:

zpool add datapool spare c0t6d0p1 c0t6d0p2
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS best practice for FreeBSD?
> From: Ian Collins [mailto:i...@ianshome.com]
>
> On 10/13/12 02:12, Edward Ned Harvey
> (opensolarisisdeadlongliveopensolaris) wrote:
> > There are at least a couple of solid reasons *in favor* of partitioning.
> >
> > #1 It seems common, at least to me, that I'll build a server with let's say, 12
> disk slots, and we'll be using 2T disks or something like that. The OS itself
> only takes like 30G which means if I don't partition, I'm wasting 1.99T on each
> of the first two disks. As a result, when installing the OS, I always partition
> rpool down to ~80G or 100G, and I will always add the second partitions of
> the first disks to the main data pool.
>
> How do you provision a spare in that situation?

A solid point. I don't. This doesn't mean you can't - it just means I don't.

If I'm not mistaken, if you have a pool with multiple different sizes of devices in it, you only need to add a spare of the larger size. If a smaller device fails, I believe the pool will use the larger spare device rather than go without a spare. So, if I'm not mistaken, you can add a spare to your pool exactly the same way, regardless of whether you have partitions.

If I'm wrong - if the pool won't use a larger spare device in place of a smaller failed device (partition) - then you would likely need to add one spare for each different size of device used in your pool. In particular, this means:

Option 1: Given that you partition your first 2 disks, 80G for the OS and 1.99T for data, you would likely want to partition *all* your disks the same, including the disk designated as a spare. Then you could add your spare 80G partition as a spare device, and your spare 1.99T partition as a spare device.

Option 2: Suppose you partition your first disks and don't want the hassle on all the rest (this is my case). Or you have physically different sizes of devices - a pool that was originally made of 1T disks but has since been extended with a bunch of 2T disks, or something like that. It's conceivable you would want a spare of each different size, which could in some cases mean you use two spares (one partitioned and one not) in a pool where you might otherwise have only one spare. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
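Concretely, Option 1 might look something like this - device names hypothetical, matching the layout in my other message, and untested:

# spare for the 80G OS partitions
sudo zpool add rpool spare c0t6d0p1
# spare for the 1.99T data partitions
sudo zpool add datapool spare c0t6d0p2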
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
Jim, I'm trying to contact you off-list, but it doesn't seem to be working. Can you please contact me off-list? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS best practice for FreeBSD?
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of andy thomas
>
> According to a Sun document called something like 'ZFS best practice' I
> read some time ago, best practice was to use the entire disk for ZFS and
> not to partition or slice it in any way. Does this advice hold good for
> FreeBSD as well?

I'm not going to address the FreeBSD question. I know others have made some comments on the "best practice" on solaris, but here goes: there are two reasons for the "best practice" of not partitioning, and I disagree with them both.

First, by default, the on-disk write cache is disabled, but if you use the whole disk in a zpool, then zfs enables the cache. If you partition a disk and use it only for zpools, then you might want to manually enable the cache yourself. This is a fairly straightforward scripting exercise. You may use this if you want (no warranty, etc - it will probably destroy your system if you don't read, understand, and rewrite it yourself before attempting to use it): https://dl.dropbox.com/u/543241/dedup%20tests/cachecontrol/cachecontrol.zip If you do that, you'll need to re-enable the cache once on each boot (or zfs mount).

The second reason is that when you "zpool import", it doesn't automatically check all the partitions of all the devices - it only scans whole devices. So if you are forced to move your disks to a new system and you try to import, you get an error message, panic, and destroy your disks. To overcome this problem, you just need to be good at remembering that the disks were partitioned - perhaps you should make a habit of partitioning *all* of your disks, so you'll *always* remember. On zpool import, you need to specify the partitions to scan for zpools. I believe this is the "zpool import -d" option.

And finally, there are at least a couple of solid reasons *in favor* of partitioning.

#1 It seems common, at least to me, that I'll build a server with, let's say, 12 disk slots, and we'll be using 2T disks or something like that. The OS itself only takes like 30G, which means if I don't partition, I'm wasting 1.99T on each of the first two disks. As a result, when installing the OS, I always partition rpool down to ~80G or 100G, and I always add the second partitions of the first disks to the main data pool.

#2 A long time ago, there was a bug where you couldn't attach a mirror unless the two devices had precisely the same geometry. That was addressed in a bugfix a couple of years ago. (I had a failed SSD mirror, and Sun shipped me a new SSD with a different firmware rev; the size of the replacement device was off by 1 block, so I couldn't replace the failed SSD.) After the bugfix, a mirror can be attached if there's a little bit of variation in the sizes of the two devices. But the tolerance isn't quite big enough. As recently as 2 weeks ago, I tried to mirror two devices that were nominally the same size, but couldn't, because one came out slightly smaller. One of them was a local device, and the other was an iscsi target. So I guess iscsi must require a little bit of space, and that was enough to make the devices un-mirror-able without partitioning. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
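For reference, the two workarounds look roughly like this - menu paths and names from memory, so verify against your own release:

# enable the on-disk write cache by hand, per disk (format expert mode, interactive):
sudo format -e
#   -> select the disk -> cache -> write_cache -> enable

# when importing a pool that lives in partitions, point import at the device nodes:
zpool import -d /dev/dsk
zpool import -d /dev/dsk datapool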
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
> From: Richard Elling [mailto:richard.ell...@gmail.com]
>
> Pedantically, a pool can be made in a file, so it works the same...

A pool can only be made in a file by a system that is able to create a pool. The point is, his receiving system runs linux and doesn't have any zfs. His receiving system is remote from his sending system, and it has been suggested that he might consider making an iscsi target available, so the sending system could "zpool create" and "zfs receive" directly into a file or device on the receiving system - but it doesn't seem as if that's going to be possible for him; he's expecting to transport the data over ssh. So he's looking for a way to do a "zfs receive" on a linux system, transported over ssh. Suggested answers so far include building a VM on the receiving side to run openindiana (or whatever), or using zfs-fuse-on-linux.

He is currently writing his "zfs send" datastream into a series of files on the receiving system, but this has a few disadvantages compared to doing "zfs receive" on the receiving side - namely, increased risk of data loss and less granularity for restores. For these reasons, it's been suggested he find a way to receive via "zfs receive", and he's exploring how to improve upon this situation. Namely, how to "zfs receive" on a remote linux system via ssh, instead of cat'ing or redirecting into a series of files.

There, I think I've recapped the whole thread now. ;-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
> From: Richard Elling [mailto:richard.ell...@gmail.com]
>
> Read it again, he asked, "On that note, is there a minimal user-mode zfs thing
> that would allow
> receiving a stream into an image file?" Something like:
> zfs send ... | ssh user@host "cat > file"

He didn't say he wanted to cat to a file - but it doesn't matter. It was only clear from context, responding to the advice of "zfs receive"ing into a zpool-in-a-file, that he was asking about doing a "zfs receive" into a file, not just cat. If you weren't paying close attention to the thread, it would be easy to misunderstand what he was asking for. When he asked for "minimal user-mode", he meant something less than a full-blown OS installation just for the purpose of zfs receive. He went on to say he was considering zfs-fuse-on-linux. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Directory is not accessible
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Sami Tuominen
>
> Unfortunately there aren't any snapshots.
> The version of zpool is 15. Is it safe to upgrade that?
> Is zpool clear -F supported or of any use here?

The only thing that will restore your data is a backup. To forget about the lost data and make the error message go away, simply rm the bad directory (and/or its parent).

You're probably wondering: you have redundancy and no faulted devices, so how could this happen? There are a few possible explanations, but they all have one thing in common: at some point, the data got corrupted before it was written, so both the primary and the redundant copy were written corrupted. It might be a CPU error, a parity error in non-ECC ram, a bus glitch, or bad firmware in the HBA, for example. The fact remains that something was written corrupted, and the redundant copy was also written corrupted. All you can do is restore from a snapshot, restore from a backup, or accept it for what it is and make the error go away. Sorry to hear it... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
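If you do decide to just delete and move on, the sequence is something like this - path and pool name hypothetical, and if rm itself errors out, remove the parent directory instead, as noted above:

rm -rf /tank/baddir    # remove the corrupt directory
zpool clear tank       # reset the error counters
zpool scrub tank       # re-verify everything else in the pool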
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Richard Elling
>
> >> If the recipient system doesn't support "zfs receive," [...]
> >
> > On that note, is there a minimal user-mode zfs thing that would allow
> > receiving a stream into an image file? No need for file/directory access
> > etc.
>
> cat :-)

He was asking if it's possible to do "zfs receive" on a system that doesn't natively support zfs. The answer is no, unless you want to consider fuse or similar. I can't speak about zfs on fuse - except that I personally wouldn't trust it. There are differences even between zfs on solaris versus freebsd versus whatever, all of which are fully supported - much better than zfs on fuse. But different people use and swear by all of these things, so maybe it would actually be a good solution for you. The direction I would personally go is an openindiana virtual machine to do the zfs receive.

> > I was thinking maybe the zfs-fuse-on-linux project may have suitable bits?
>
> I'm sure most Linux distros have cat

hehe. Anyway. Answered above. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How many disk in one pool
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Albert Shih
>
> I'm actually running ZFS under FreeBSD. I have a question about how many
> disks I can have in one pool.
>
> At this moment I'm running one server (FreeBSD 9.0) with 4 MD1200
> (Dell), meaning 48 disks. I've configured 4 raidz2 vdevs in the pool (one on
> each MD1200)
>
> From what I understand I can add more MD1200s. But if I lose one MD1200
> for any reason I lose the entire pool.
>
> In your experience, what's the limit? 100 disks?
>
> How does FreeBSD manage 100 disks? /dev/da100?

Correct - if you lose one storage tray, you lose the pool. Ideally you would span your redundancy across trays as well as across disks, but in your situation - 12 disks per raidz2 and 4 trays - it's just not realistic. You would have to significantly increase cost (not to mention rebuild the pool) in order to keep the same available disk space and gain that redundancy. Go ahead and add more trays. I've never heard of any limit on the number of disks you can have in ZFS. I'm sure there is one, but whatever it is, you're nowhere near it. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
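For what it's worth, if rebuilding *were* an option, tray-spanning redundancy could look something like this: six 8-disk raidz2 vdevs, two disks from each tray per vdev, so losing a whole tray costs each vdev only 2 disks and the pool survives. Device names here are hypothetical (tray 1 = da0-da11, tray 2 = da12-da23, and so on):

zpool create tank \
  raidz2 da0 da1 da12 da13 da24 da25 da36 da37 \
  raidz2 da2 da3 da14 da15 da26 da27 da38 da39 \
  raidz2 da4 da5 da16 da17 da28 da29 da40 da41 \
  raidz2 da6 da7 da18 da19 da30 da31 da42 da43 \
  raidz2 da8 da9 da20 da21 da32 da33 da44 da45 \
  raidz2 da10 da11 da22 da23 da34 da35 da46 da47

Note the cost: 12 parity disks instead of 8, which is exactly the price increase mentioned above.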
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Frank Cusack
>
> On Fri, Oct 5, 2012 at 3:17 AM, Ian Collins wrote:
> I do have to suffer a slow, glitchy WAN to a remote server and rather than
> send stream files, I broke the data on the remote server into a more fine
> grained set of filesystems than I would do normally. In this case, I made the
> directories under what would have been the leaf filesystems, filesystems
> themselves.
>
> Meaning you also broke the data on the LOCAL server into the same set of
> more granular filesystems? Or is it now possible to zfs send a subdirectory of
> a filesystem?

"zfs create" instead of "mkdir". As Ian said - he didn't zfs send subdirs, he made filesystems where he otherwise would have used subdirs. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
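In other words, something like this at creation time (names hypothetical):

# instead of: mkdir -p /tank/projects/alpha
zfs create -p tank/projects/alpha
# now projects/alpha is its own filesystem, so it can be snapshotted
# and zfs-sent independently of its siblings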
Re: [zfs-discuss] vm server storage mirror
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>
> I must be missing something - I don't see anything above that indicates any
> required vs optional dependencies.

Ok, I see that now (thanks to the SMF FAQ). A dependency may have grouping optional_all, require_any, or require_all. Mine is require_all, and I figured out the problem: I had my automatic zpool import/export script dependent on the initiator... but it wasn't the initiator going down first. It was the target going down first. So the solution is like this:

sudo svccfg -s svc:/network/iscsi/initiator:default
svc:/network/iscsi/initiator:default> addpg iscsi-target dependency
svc:/network/iscsi/initiator:default> setprop iscsi-target/grouping = astring: "require_all"
svc:/network/iscsi/initiator:default> setprop iscsi-target/restart_on = astring: "none"
svc:/network/iscsi/initiator:default> setprop iscsi-target/type = astring: "service"
svc:/network/iscsi/initiator:default> setprop iscsi-target/entities = fmri: "svc:/network/iscsi/target:default"
svc:/network/iscsi/initiator:default> exit
sudo svcadm refresh svc:/network/iscsi/initiator:default

And additionally, create the SMF service dependent on the initiator, which will import/export the iscsi pools automatically. http://nedharvey.com/blog/?p=105 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
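To sanity-check the result, something like this should now show the target among the initiator's dependencies:

# show the service's full status, including its dependency list
svcs -l svc:/network/iscsi/initiator:default
# or list exactly the services this one depends on
svcs -d svc:/network/iscsi/initiator:default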
Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- > boun...@opensolaris.org] On Behalf Of Tiernan OToole
>
> I am in the process of planning a system which will have 2 ZFS servers, one on
> site, one off site. The on site server will be used by workstations and servers
> in house, and most of that will stay in house. There will, however, be data i
> want backed up somewhere else, which is where the offsite server comes
> in... This server will be sitting in a Data Center and will have some storage
> available to it (the whole server currently has 2 3Tb drives, though they are
> not dedicated to the ZFS box, they are on VMware ESXi). There is then some
> storage (currently 100Gb, but more can be requested) of SFTP enabled
> backup which i plan to use for some snapshots, but more on that later.
>
> Anyway, i want to confirm my plan and make sure i am not missing anything
> here...
>
> * build server in house with storage, pools, etc...
> * have a server in data center with enough storage for its reason, plus the
> extra for offsite backup
> * have one pool set as my "offsite" pool... anything in here should be backed
> up off site also...
> * possibly have another set as "very offsite" which will also be pushed to the
> SFTP server, but not sure...
> * give these pools out via SMB/NFS/iSCSI
> * every 6 or so hours take a snapshot of the 2 offsite pools.
> * do a ZFS send to the data center box
> * nightly, on the very offsite pool, do a ZFS send to the SFTP server
> * if anything goes wrong (my server dies, DC server dies, etc), Panic,
> download, pray... the usual... :)
>
> Anyway, I want to make sure i am doing this correctly... Is there anything on
> that list that sounds stupid or am i doing anything wrong? am i missing
> anything?
>
> Also, as a follow up question, but slightly unrelated, when it comes to the ZFS
> Send, i could use SSH to do the send, directly to the machine... Or i could
> upload the compressed, and possibly encrypted dump to the server... Which,
> for resume-ability and speed, would be suggested? And if i were to go with
> an upload option, any suggestions on what i should use?

It is recommended, whenever possible, that you pipe the "zfs send" directly into a "zfs receive" on the receiving system, for two solid reasons:

#1 If a single bit is corrupted, the whole stream checksum is wrong, and therefore the whole stream is rejected. If this occurs during send | receive, you detect it (in the form of one incremental failing) and correct it (in the form of the next incremental succeeding). Whereas if you store your streams as files on storage, the corruption goes undetected, and every stream after that point is broken.

#2 If you need to restore from a stream stored on storage, your only choice is to restore the whole stream. You cannot look inside and just get one file. But if you had been doing send | receive, you obviously can look inside the receiving filesystem and extract individual files.

If the recipient system doesn't support "zfs receive," you might consider exporting an iscsi device and letting the sender system deal with it directly. Or share a filesystem (such as NFS) with the sender system, and let the sender create a recipient filesystem inside a file container, so the sender can deal with it directly. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
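In practice the recommended pipe looks something like this (host and dataset names hypothetical):

# nightly incremental, piped straight into receive on the offsite box
zfs send -i tank/offsite@2012-10-08 tank/offsite@2012-10-09 | \
  ssh backup@offsite.example.com zfs receive -F backuppool/offsite

A resumable alternative over a glitchy link - stage the stream as a file, but still *apply* it with zfs receive on the far side rather than keeping the file around:

zfs send -i tank/offsite@2012-10-08 tank/offsite@2012-10-09 > /var/tmp/offsite.zstream
rsync --partial /var/tmp/offsite.zstream backup@offsite.example.com:/var/tmp/
ssh backup@offsite.example.com 'zfs receive -F backuppool/offsite < /var/tmp/offsite.zstream && rm /var/tmp/offsite.zstream'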
Re: [zfs-discuss] Making ZIL faster
> From: Neil Perrin [mailto:neil.per...@oracle.com] > > In general - yes, but it really depends. Multiple synchronous writes of any > size > across multiple file systems will fan out across the log devices. That is > because there is a separate independent log chain for each file system. > > Also large synchronous writes (eg 1MB) within a specific file system will be > spread out. > The ZIL code will try to allocate a block to hold all the records it needs to > commit up to the largest block size - which currently for you should be 128KB. > Anything larger will allocate a new block - on a different device if there are > multiple devices. > > However, lots of small synchronous writes to the same file system might not > use more than one 128K block and benefit from multiple slog devices. That is an awesome explanation. Thank you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
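So, to actually benefit from that fan-out, the log devices should be added as independent (non-mirrored) top-level log devices, something like this (device names hypothetical):

# two separate slog devices - log blocks can spread across both
zpool add tank log c4t0d0 c4t1d0
# contrast with "zpool add tank log mirror c4t0d0 c4t1d0",
# which gives a single mirrored slog instead
zpool status tank    # the devices show up under a "logs" section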