Re: [zfs-discuss] ZFS - Sudden decrease in write performance
arc-discuss doesn't have anything specifically to do with ZFS; in particular, it has nothing to do with the ZFS ARC. Just an unfortunate overlap of acronyms. Cross-posted to zfs-discuss, where this probably belongs.

Hey all! Recently I've decided to implement OpenSolaris as a target for BackupExec. The server I've converted into a storage appliance is an IBM x3650 M2 with ~4TB of on-board storage via ~10 local SATA drives, and I'm using OpenSolaris snv_134. I'm using a QLogic 4Gb FC HBA with the qlt driver, and presented an 8TB sparse volume to the host because dedup and compression are turned on for the zpool.

When writes begin, I see anywhere from 4.5GB/min to 5.5GB/min and then it drops off quickly (I mean down to 1GB/min or less). I've already swapped out the card, cable, and port with no results. I have since ensured that every piece of equipment on the box had its firmware updated. While doing so, I installed Windows Server 2008 to flash all the firmware (IBM doesn't have a Solaris installer). While in Server 2008, I decided to just attempt a backup via a share on the 1Gb/s copper connection. I saw speeds of up to 5.5GB/min consistently, and they were sustained throughout 3 days of testing.

Today I decided to move back to OpenSolaris with confidence. All writes began at 5.5GB/min and quickly dropped off. In my troubleshooting efforts, I have also dropped the fiber connection and made it an iSCSI target with no performance gains. I have let the on-board RAID controller do the RAID portion instead of creating a zpool of multiple disks, with no performance gains. And I have created the target LUN using both rdsk and dsk paths.

I did notice today, though, that there is a direct correlation between the ARC memory usage and speed. Using arcstat.pl, as soon as arcsz hits 1G (half of the c column, the ARC target size), my throughput hits the floor (i.e. 600MB/min or less). I can't figure it out. I tried every configuration possible.
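As a rough sketch of the kind of experiment that would confirm or rule out the ARC ceiling as the culprit (the 4GB figure below is only a placeholder, not a recommendation), one can pin the ARC's maximum size in /etc/system, reboot, and then watch the size and target kstats while a backup runs:

  # /etc/system: cap the ARC at 4 GB (value in bytes); takes effect after a reboot
  set zfs:zfs_arc_max=0x100000000

  # after reboot, watch the current ARC size and target
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c

If throughput still collapses at half of whatever target you set, the ARC size itself probably isn't the limiting factor.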
Re: [zfs-discuss] zfs under medium load causes SMB to delay writes
This is not the appropriate group/list for this message. Crossposting to zfs-discuss (where it perhaps primarily belongs) and to cifs-discuss, which also relates.

Hi, I have an I/O load issue and after days of searching wanted to know if anyone has pointers on how to approach this. My zfs system (raidz3, 8 2TB drives, all OK), stable for a year, just started to cause problems when I introduced a new backup script that puts a medium I/O load on it. This script simply tars up a few filesystems and md5sums the tarball, to copy to another system for off-system backup. The simple commands are:

  tar -c /tank/[filesystem]/.zfs/snapshot/[snapshot] > /tank/[otherfilesystem]/file.tar
  md5sum -b /tank/[otherfilesystem]/file.tar > /tank/[otherfilesystem]/file.md5sum

These two commands obviously cause high read/write I/O because the 8 drives are directly reading and writing a large amount of data as fast as the system can go. This is OK and OpenSolaris functions fine. The problem is I host VMware images on another PC which access their images on this zfs box over SMB, and during this high I/O period, these VMware guests are crashing. What I think is happening is that during the backup with high I/O, zfs is delaying reads/writes to the VMware images. This is quickly causing VMware to freeze the guest machines. When the backup script is not running, the VMware guests are fine, and have been fine for a year (the setup has been rock solid).

Any idea how to address this? I'd thought of putting the relevant filesystem (tank/vmware) on a higher priority for reads/writes, but haven't figured out how. Another way is to deprioritize the backup somehow. Any pointers would be appreciated. Thanks, Tom
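Not a real fix for the I/O-scheduling side, but as a crude sketch of deprioritizing the backup itself (this only helps if CPU contention is part of the problem; the paths here are placeholders), the tar can be run in the fixed-priority scheduling class at the lowest priority:

  # run the backup at FX priority 0 so it never competes with normal work for CPU
  priocntl -e -c FX -m 0 -p 0 tar -c /tank/fs/.zfs/snapshot/snap > /tank/backup/file.tar

If the stall really is zfs delaying the VMware writes behind the backup's writes, throttling the backup's read rate may matter more than its CPU priority.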
Re: [zfs-discuss] sharesmb should be ignored if filesystem is not mounted
> On 10/28/10 08:40 AM, Richard L. Hamilton wrote:
>> I have sharesmb=on set for a bunch of filesystems, including three that weren't mounted. Nevertheless, all of those are advertised. Needless to say, the one that isn't mounted can't be accessed remotely, even though, since it's advertised, it looks like it could be.
>
> When you say advertised, do you mean that it appears in /etc/dfs/sharetab when the dataset is not mounted, and/or that you can see it from a client with 'net view'? I'm using a recent build and I see the smb share disappear from both when the dataset is unmounted.

I could see it in Finder on a Mac client; presumably were I on a Windows client, it would have appeared with net view. I've since turned off the sharesmb property on those filesystems, so I may need to reboot (which I'd much rather not) to re-create the problem. But if recent builds don't have the problem, that's the main thing.
[zfs-discuss] sharesmb should be ignored if filesystem is not mounted
I have sharesmb=on set for a bunch of filesystems, including three that weren't mounted. Nevertheless, all of those are advertised. Needless to say, the one that isn't mounted can't be accessed remotely, even though, since it's advertised, it looks like it could be.

  # zfs list -o name,mountpoint,sharesmb,mounted | awk '$(NF-1)!="off" && $(NF-1)!="-" && $NF!="yes"'
  NAME                MOUNTPOINT                  SHARESMB  MOUNTED
  rpool/ROOT          legacy                      on        no
  rpool/ROOT/snv_129  /                           on        no
  rpool/ROOT/snv_93   /tmp/.alt.luupdall.22709    on        no
  #

So I think that if a zfs filesystem is not mounted, sharesmb should be ignored. This is in snv_97 (SXCE; with a pending LU BE not yet activated, and an old one no longer active); I don't know if it's still a problem in current builds that unmounted filesystems are advertised, but if it is, I can see how it could confuse clients. So I thought I'd mention it.
Re: [zfs-discuss] sharesmb should be ignored if filesystem is not mounted
PS obviously these are home systems; in a real environment, I'd only be sharing out filesystems with user or application data, and not local system filesystems! But since it's just me, I somewhat trust myself not to shoot myself in the foot.
Re: [zfs-discuss] tagged ACL groups: let's just keep digging until we come out the other side
On Thu, Sep 30, 2010 at 08:14:24PM -0400, Miles Nordin wrote:
>> Can the user in (3) fix the permissions from Windows?
> no, not under my proposal.

Then your proposal is a non-starter. Support for multiple remote filesystem access protocols is key for ZFS and Solaris. The impedance mismatches between these various protocols mean that we need to make some trade-offs. In this case I think the business (as well as the engineers involved) would assert that being a good SMB server is critical, and that being able to authoritatively edit file permissions via SMB clients is part of what it means to be a good SMB server.

Now, you could argue that we should bring aclmode back and let the user choose which trade-offs to make. And you might propose new values for aclmode or enhancements to the groupmask setting of aclmode.

> but it sounds like currently people cannot ``fix'' permissions through the quirky autotranslation anyway, certainly not to the point where neither unix nor windows users are confused: windows users are always confused, and unix users don't get to see all the permissions.

Thus the current behavior is the same as the old aclmode=discard setting.

> Now what? set the unix perms to 777 as a sign to the unix people to either (a) leave it alone, or (b) learn to use 'chmod A...'. This will actually work: it's not a hand-waving hypothetical that just doesn't play out.

That's not an option, not for a default behavior anyway.

Nico

One question: Casper, where are you? The guy that did fine-grained permissions IMO ought to have an idea of how to do something with ACLs that's both safe and unsurprising for the various sorts of clients.
Re: [zfs-discuss] Please warn a home user against OpenSolaris under VirtualBox under WinXP ; )
Hmm...according to http://www.mail-archive.com/vbox-users-commun...@lists.sourceforge.net/msg00640.html that's only needed before VirtualBox 3.2, or for IDE. >= 3.2, non-IDE should honor flush requests, if I read that correctly. Which is good, because I haven't seen an example of how to enable flushing for SAS (which is the emulation I usually use because it's supposed to have better performance).
[zfs-discuss] fs root inode number?
Typically on most filesystems, the inode number of the root directory of the filesystem is 2, with 0 being unused and 1 historically once invisible and used for bad blocks (no longer done, but kept reserved so as not to invalidate assumptions implicit in ufsdump tapes). However, my observation seems to be (at least back at snv_97) that the inode number of ZFS filesystem root directories (including at the top level of a zpool) is 3, not 2.

If there's any POSIX/SUS requirement for the traditional number 2, I haven't found it. So maybe there's no reason founded in official standards for keeping it the same. But there are bound to be programs that make what was, with other filesystems, a safe assumption. Perhaps a warning is in order, if there isn't already one.

Is there some _reason_ why the inode number of filesystem root directories in ZFS is 3 rather than 2?
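Easy enough for anyone to check on their own system; the paths below are only examples, and -d with -i makes ls report the inode number of the directory itself rather than of its contents:

  ls -di /tank /tank/home        # ZFS filesystem roots: prints 3, per the observation above
  ls -di /ufs_mountpoint         # a UFS root directory: prints 2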
Re: [zfs-discuss] Debunking the dedup memory myth
Even the most expensive decompression algorithms generally run significantly faster than I/O to disk -- at least when real disks are involved. So, as long as you don't run out of CPU and have to wait for CPU to be available for decompression, the decompression will win.

The same concept is true for dedup, although I don't necessarily think of dedup as a form of compression (others might reasonably do so though). Effectively, dedup is a form of compression of the filesystem rather than of any single file, but one oriented toward not interfering with access to any of what may be sharing blocks. I would imagine that if it's read-mostly, it's a win, but otherwise it costs more than it saves. Even conventional compression tends to be more resource-intensive on the compress side than on the decompress side...

What I'm wondering is when dedup is a better value than compression. Most obviously, when there are a lot of identical blocks across different files; but I'm not sure how often that happens, aside from maybe blocks of zeros (which may well be sparse anyway).
Re: [zfs-discuss] Legality and the future of zfs...
> Losing ZFS would indeed be disastrous, as it would leave Solaris with only the Veritas File System (VxFS) as a semi-modern filesystem, and a non-native FS at that (i.e. VxFS is a 3rd-party for-pay FS, which severely inhibits its uptake). UFS is just way too old to be competitive these days.

Having come to depend on them, the absence of some of the features would certainly be significant. But how come everyone forgets about QFS?
http://www.sun.com/storage/management_software/data_management/qfs/index.xml
http://en.wikipedia.org/wiki/QFS
http://hub.opensolaris.org/bin/view/Project+samqfs/WebHome
Re: [zfs-discuss] Legality and the future of zfs...
On Tue, 13 Jul 2010, Edward Ned Harvey wrote:
> It is true there's no new build published in the last 3 months. But you can't use that to assume they're killing the community.

Hmm, the community seems to think they're killing the community:
http://developers.slashdot.org/story/10/07/14/1448209/OpenSolaris-Governing-Board-Closing-Shop?from=rss

ZFS is great. It's pretty much the only reason we're running Solaris. But I don't have much confidence Oracle Solaris is going to be a product I'm going to want to run in the future. We barely put our ZFS stuff into production last year, but quite frankly I'm already on the lookout for something to replace it. No new version of OpenSolaris (which we were about to start migrating to). No new update of Solaris 10. *Zero* information about what the hell's going on...

Presumably if you have a maintenance contract or some other formal relationship, you could get an NDA briefing. Not having been to one yet myself, I don't know what that would tell you, but presumably more than without it. Still, the silence is quite unhelpful, and the apparent lack of anyone willing to recognize that, and with the authority to do anything about it, is troubling.

ZFS will surely live on as the filesystem under the hood in the doubtlessly forthcoming Oracle database appliances, and I'm sure they'll keep selling their NAS devices. But for home users? I doubt it. I was about to build a big storage box at home running OpenSolaris; I froze that project. Oracle is all about the money. Which I guess is why they're succeeding and Sun failed to the point of having to sell out to them.

My home use wasn't exactly going to make them a profit, but on the other hand, the philosophy that led to my not using the product at home is a direct cause of my lack of desire to continue using it at work, and while we're not exactly a huge client, we've dropped a decent penny or two in Sun's wallet over the years. FWIW, you're not the only one that's tried to make that point!

Who knows, maybe Oracle will start to play ball before August 16th and the OpenSolaris Governing Board won't shut themselves down. But I wouldn't hold my breath. Postponement of respiration pending hypothetical actions by others is seldom an effective survival strategy.

Nevertheless, the zfs on my Sun Blade 2000 currently running SXCE snv_97 (pending luactivate and reboot to switch to snv_129) is doing just fine with what is presently 3TB of redundant storage, and will eventually grow to 9TB as I populate the rest of the slots in my JBOD (8 slots; 2 x 1TB mirror for root; presently also 2 x 2TB mirror for data, but that will change to 5 x 2TB raidz + 1 x 2TB hot spare when I can afford four more 2TB drives). I have a spare power supply and some other odds and ends for the Sun Blade 2000, so, with fingers crossed, it will run (and heat my house :-) for quite some time to come, regardless of availability of future software updates. If not, I'm sure I have an ISO of SXCE 129 or so for x86 somewhere too, which I could put on any cheap x86 box with a PCIx slot for my SAS controller, and just import the zpools and go.
Re: [zfs-discuss] Legality and the future of zfs...
> never make it any better. Just for a record: Solaris 9 and 10 from Sun was a plain crap to work with, and still is inconvenient conservative stagnationware. They won't build a free cool tools

Everybody but geeks _wants_ stagnationware, if by that you mean something that just runs. Even my old Sun Blade 100 at home still has Solaris 9 on it, because I haven't had a day to kill to split the mirror, load something newer like the last SXCE, and get everything on there working on it. (My other SPARC is running a semi-recent SXCE, and pending activation of an already-installed most recent SXCE. Sitting at a Sun, I still prefer CDE to GNOME, and the best graphics card I have for that box won't work with the newer Xorg server, so I can't see putting OpenSolaris on it.)

For instance, recent enough Solaris 10 updates to be able to do zfs root are pretty decent; you get into the habit of doing live upgrades even for patching, so you can minimize downtime. Hardly stagnant, considering that the initial release of Solaris 10 didn't even have zfs in it yet.

> for Solaris, hence the whole thing will turned to be a dry job for trained monkeys wearing suits in a corporations. Nothing more. That's a philosophy of last decade, but IT now is very changing and is very different. That is why Oracle's idea to kill community is totally stupid. And that's why IBM will win, because you run the same Linux on their hardware as you run at your home. Yes, Oracle will run good for a while, using the inertia of a hype (and latest their financial report proves that), but soon people will realize that Oracle is just another evil mean beast with great marketing and the same sh*tty products as they always had. Buy Solaris for any single little purpose? No way ever! I may buy support and/or security patches, updates. But not the OS itself. If that is the only option, then I'd rather stick to Linux from other vendor, i.e. RedHat. That will lead me to no more talk to Oracle about software at OS level, only applications (if I am an idiot enough to jump into APEX or something like that). Hence, if all I can do is talk only about hardware (well, not really, because no more hardware-only support!!!), then I'd better talk to IBM, if I need a brand and I consider myself too dumb to get SuperMicro instead. IBM System x3550 M3 is still better by characteristics than equivalent from Oracle, it is OEM if somebody needs that at first place and is still cheaper than Oracle's similar class. And IBM stuff just works great (at least if we talk about hardware).

I'm not going to say you're wrong, because in part I agree with you. Systems people can run at home, desktops, laptops, those are all what get future mindshare and eventually get people with big bucks spending them. But the simple fact that Sun went down suggests that just being all lovey-dovey (and plenty of people thought that Sun wasn't lovey-dovey _enough_?) won't keep you in business either.

> [...] But for home users? I doubt it. I was about to build a big storage box at home running OpenSolaris, I froze that project.

Mine's running SXCE, and unless I can find a solution to getting decent graphics working with Xorg on it, probably always will be. But the big (well, target 9TB redundant; presently 3TB redundant) storage is doing just fine. Being super latest and greatest just isn't necessary for that.

> Same here. A lot of nice ideas and potential open-source tools basically frozen and I think gonna be dumped. We (geeks) won't build stuff for Larry just for free. We need OS back opened in reward. So I think OpenSolaris is pretty much game over, thanks to the Oracle. Some Oracle fanboys might call it a plain FUD, hope to get updates etc, but the reality is that Oracle to OpenSolaris is pretty much the same what Palm did for BeOS. Enjoy your last snv_134 build.

I can't rule out that possibility, but I see some reasons to think that it's worth being patient for a couple more months. As it is, I find myself updating my Mac and Windows every darn week, so I'm pretty much past getting a kick out of updating just to see what's kewl.
Re: [zfs-discuss] Exporting iSCSI - it's still getting all the ZFS protect
AFAIK, zfs should be able to protect against (if the pool is redundant), or at least detect, corruption from the point that it is handed the data to the point that the data is written to permanent storage, _provided_that_ the system has ECC RAM (so it can detect and often correct random background-radiation-caused memory errors), and that, if zfs controls the whole disk and the disk has a write cache, the disk correctly honors requests to flush the write cache to permanent storage. That should be just as true for a zvol as for a regular zfs file.

What I'm trying to say is that zfs should give you a lot of protection in your situation, but that it can do nothing about it if it is handed bad data: for example, if the client is buggy and sends corrupt data, if somehow a network error goes undetected (unlikely, given that AFAIK iSCSI runs over TCP and at least thus far never over UDP, and TCP always checksums (UDP might not)), if the iSCSI server software corrupts data before writing it to disk, etc.

In other words, zfs probably gives more protection to a larger portion of the data path than just about anything else, but in the case of a remote client, whether iSCSI, NFS, CIFS, or whatever, the data path is longer and distributed, and the verification that zfs does only covers part of that. What I'm saying would _not_ apply if the client were doing zfs onto iSCSI storage; in that case, the client's zfs would also be looking after data integrity. So the closer to the data-generating application that the integrity from that point on is provided, the fewer places something bad can happen without being at least detected.

Note: I can't guarantee that any of what I said is correct, although I would be willing to risk my own data as if it were.
Re: [zfs-discuss] why both dedup and compression?
> I've googled this for a bit, but can't seem to find the answer. What does compression bring to the party that dedupe doesn't cover already? Thank you for your patience and answers.

That almost sounds like a classroom question. Pick a simple example: large text files, of which each is unique, maybe lines of data or something. Not likely to be much in the way of duplicate blocks to share, but very likely to be highly compressible. Contrast that with binary files, which might have blocks of zero bytes in them (without being strictly sparse, sometimes). With deduping, one such block is all that's actually stored (along with all the references to it, of course).

In the 30 seconds or so I've been thinking about it to type this, I would _guess_ that one might want one or the other, but rarely both, since compression might tend to work against deduplication. So given the availability of both, and how lightweight zfs filesystems are, one might want to create separate filesystems within a pool with one or the other as appropriate, and separate the data according to which would likely work better on it. Also, one might as well put compressed video, audio, and image formats in a filesystem that was _not_ compressed, since compressing an already-compressed file seldom gains much if anything more.
Re: [zfs-discuss] why both dedup and compression?
Another thought is this: _unless_ the CPU is the bottleneck on a particular system, compression (_when_ it actually helps) can speed up overall operation, by reducing the amount of I/O needed. But storing already-compressed files in a filesystem with compression is likely to result in wasted effort, with little or no gain to show for it.

Even deduplication requires some extra effort. Looking at the documentation, it implies a particular checksum algorithm _plus_ verification (if the checksum or digest matches, then make sure by doing a byte-for-byte compare of the blocks, since nothing shorter than the data itself can _guarantee_ that they're the same, just like no lossless compression can possibly work for all possible bitstreams). So doing either of these where the success rate is likely to be too low is probably not helpful.

There are stats that show the savings for a filesystem due to compression or deduplication. What I think would be interesting is some advice as to how much (percentage) savings one should be getting to expect to come out ahead not just on storage, but on overall system performance. Of course, no such guidance would exactly fit any particular workload, but I think one might be able to come up with some approximate numbers, or at least a range, below which those features probably represented a waste of effort unless space was at an absolute premium.
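For what it's worth, the per-dataset and per-pool numbers I have in mind can be pulled with something like the following (pool and dataset names are only examples):

  zfs get compressratio tank/data     # compression savings for one filesystem
  zpool get dedupratio tank           # pool-wide dedup ratio
  zdb -DD tank                        # dedup table histogram, plus its in-core/on-disk size

The zdb output is also the quickest way I know of to see how big the dedup table has grown, which bears directly on whether dedup is costing more than it saves.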
Re: [zfs-discuss] zpool rename?
> [...] To answer Richard's question, if you have to rename a pool during import due to a conflict, the only way to change it back is to re-import it with the original name. You'll have to either export the conflicting pool, or (if it's rpool) boot off of a LiveCD which doesn't use an rpool to do the rename.

Thanks. The latter is what I ended up doing (well, off of the SXCE install CD image that I'd used to set up that disk image in VirtualBox in the first place).
[zfs-discuss] zpool rename?
One can rename a zpool on import:

  zpool import -f pool_or_id newname

Is there any way to rename it (back again, perhaps) on export? (I had to rename rpool in an old disk image to access some stuff in it, and I'd like to put it back the way it was so it's properly usable if I ever want to boot off of it.) But I suppose there must be other scenarios where that would be useful too...
Re: [zfs-discuss] Mapping inode numbers to file names
> [...] There is a way to do this kind of object-to-name mapping, though there's no documented public interface for it. See the zfs_obj_to_path() function and the ZFS_IOC_OBJ_TO_PATH ioctl. I think it should also be possible to extend it to handle multiple names (in case of multiple hardlinks) in some way, as the id of the parent directory is recorded at the time of link creation in znode attributes.

To add a bit: these sorts of things are _not_ required by any existing standard, and may be limited to use by root (since they bypass directory permissions). So they're typically private, undocumented, and subject to change without notice. Some other examples:

UFS _FIOIO ioctl: obtain a read-only file descriptor given an existing file descriptor on the file system (to make the ioctl on) and the inode number and generation number (which keeps inode numbers from being reused too quickly, mostly to make NFS happy I think) in an argument to the ioctl.

Mac OS X /.vol directory: allows pre-OS X style access by volume-ID/folder-ID/name triplet.

Those are all hidden behind a particular library or application that is the only supported way of using them. It is perhaps unfortunate that there is no generic root-only way to look up fsid/inode (problematic though, due to hard links) or fsid/dir_inode/name (could fail if the name has been moved to another directory on the same filesystem), but implementing a generic solution would likely be a lot of work (requiring support from every filesystem, most of which were _not_ designed to do a reverse lookup, i.e. from inode back to name), and the use cases seem to be very few indeed. (As an example of that, /.vol on a Mac is said to only work for HFS or HFS+ volumes, not old UFS volumes (Macs used to support their own flavor of UFS, apparently; no doubt one considerably different from on Solaris, so don't go there). In fact, I'm not sure that /.vol works at all on the latest Mac OS X.)
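Along the same unsupported lines, zdb can usually do the object-number-to-path mapping after the fact, as long as you're root and don't mind an interface that can change at any time; the dataset name and object number below are placeholders:

  # find the object (inode) number of some file
  ls -i /tank/fs/some/file
  # dump that object's metadata, which includes a "path" line
  zdb -dddd tank/fs 12345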
[zfs-discuss] customizing zfs list with less typing
It might be nice if zfs list would check an environment variable for a default list of properties to show (the same as the comma-separated list used with the -o option). If not set, it would use the current default list; if set, it would use the value of that environment variable as the list. I find there are a lot of times I want to see the same one additional property when using zfs list; an environment variable would mean a one-time edit of .profile rather than typing the -o option with the default list modified by whatever I want to add.

Along those lines, pseudo-properties that were abbreviated (constant-length) versions of some real properties might help. For instance, sharenfs can be on, off, or a rather long list of nfs sharing options. A pseudo-property with a related name and a value of on, off, or spec (with spec implying some arbitrary list of applicable options) would have a constant length. Given two potentially long properties (mountpoint and the dataset name), output lines are already close to cumbersome (that assumes one at the beginning of the line and one at the end). Additional potentially long properties in the output would tend to make it unreadable.

Both of those, esp. together, would make quickly checking or familiarizing oneself with a server that much more civilized, IMO.
Re: [zfs-discuss] customizing zfs list with less typing
> Just make 'zfs' an alias to your version of it. A one-time edit of .profile can update that alias.

Sure; write a shell function, and add an alias to it. And use a quoted command name (or full path) within the function to get to the real command. Been there, done that. But to do a good job of it means parsing the command line the same way the real command would, so that it only adds

  -o ${ZFS_LIST_PROPS:-name,used,available,referenced,mountpoint}

or perhaps better

  ${ZFS_LIST_PROPS:+-o ${ZFS_LIST_PROPS}}

to zfs list (rather than to other subcommands), and only if the -o option wasn't given explicitly. That's not only a bit of a pain, but anytime one is duplicating parsing, it's begging for trouble: in case they don't really handle it the same, or in case the underlying command is changed. And unless that sort of thing is handled with extreme care (quoting all shell variable references, just for starters), it can turn out to be a security problem.

And that's just the implicit-options part of what I want; the other part would take optionally filtering to modify the command output as well. That's starting to get nuts, IMO. Heck, I can grab a copy of the source for the zfs command, modify it, and recompile it (without building all of OpenSolaris) faster than I can write a really good shell wrapper that does the same thing. But then I have to maintain my own divergent implementation, unless I can interest someone else in the idea... OTOH by the time the hoop-jumping for getting something accepted is over, it's definitely been more bother than gain...
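A stripped-down version of the wrapper, for what it's worth (ksh93/bash syntax; it deliberately ignores the hard part, i.e. noticing whether -o was already given, which is exactly the parsing I'd rather not duplicate):

  zfs() {
      if [ "$1" = "list" ]; then
          shift
          # add the property list only for the list subcommand
          command zfs list ${ZFS_LIST_PROPS:+-o "$ZFS_LIST_PROPS"} "$@"
      else
          command zfs "$@"
      fi
  }

  # one-time edit in .profile:
  ZFS_LIST_PROPS=name,used,available,referenced,mountpoint,compressratio
  export ZFS_LIST_PROPS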
Re: [zfs-discuss] high read iops - more memory for arc?
FYI, the arc and arc-discuss lists or forums are not appropriate for this. There are two "arc" acronyms:

* the Architecture Review Committee (the arc list is for cases being considered, arc-discuss is for other discussion; non-committee business is most unwelcome on the arc list), and
* the ZFS Adaptive Replacement Cache, which is what you are talking about.

The zfs-discuss list is appropriate for that subject; storage-discuss and database-discuss _may_ relate, but rather than sending to every list that _might_ relate, I'd suggest starting with the most appropriate first, and reading enough of the posts already on a list to get some idea of what's appropriate there and what isn't, before just adding it as an additional CC in the hope that someone might answer.

Very few people are likely to be responding here at this time, insofar as the largest part of the people that might are probably observing (at least socially) the Christmas holiday right now (their families might not appreciate them being distracted by anything else!), and many of the rest aren't interacting much because of how many are not around right now. Don't expect too much until the first Monday after 1 January. And anyway, discussion lists are not a place where anyone is _obligated_ to answer. Those with support contracts presumably have other ways of getting help.

Now... I probably couldn't answer your question even if I had all the information you left out, but maybe someone could, eventually. Some of the information they might need:

* what are you running (uname -a will do)? ZFS is constantly being improved; problems get fixed (and sometimes introduced) in just about every build
* what system, how is it configured, exactly what disk models, etc?

Free memory is _supposed_ to be low. Free memory is wasted memory, except that a little is kept free to quickly respond to requests for more. Most memory not otherwise being used for mappings, kernel data structures, etc., is used as either additional VM page cache of pages that might be used again, or by the ZFS ARC. The tools to report on just how memory is used behave differently on Solaris (and even on different versions) than they do on other OSs, because Solaris tries really hard to make best use of all RAM. The uname -a information would also help someone (more knowledgeable than I, although I might be able to look it up) suggest which tools would best help to understand your situation.

So while free memory alone doesn't tell you much, there's a good chance that more would help unless there's some specific problem that's involved. There's also a good chance that your problem is known, recognizable, and probably has a fix in a newer version or a workaround, if you provide enough information to help someone find that for you.
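As a starting point, the output of a few of the usual commands would give whoever answers something concrete to work with (::memstat needs root, and can take a little while on a large-memory box):

  uname -a
  prtconf | grep Memory                            # physical RAM
  echo ::memstat | mdb -k                          # where that RAM is going
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c    # current ARC size and target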
[zfs-discuss] Why is st_size of a zfs directory equal to the number of entries?
Cute idea, maybe. But very inconsistent with the size in blocks (reported by ls -dls dir). Is there a particular reason for this, or is it one of those "just for the heck of it" things?

Granted that it isn't necessarily _wrong_. I just checked SUSv3 for stat() and sys/stat.h, and it appears that st_size is only well-defined for regular files and symlinks. So I suppose it could be (a) undefined, or (b) whatever is deemed to be useful, for directories, device files, etc.

This is of course inconsistent with the behavior on other filesystems. On UFS (a bit of a special case perhaps, in that it still allows read(2) on a directory, for compatibility), the st_size seems to reflect the actual number of bytes used by the implementation to hold the directory's current contents. That may well also be the case for tmpfs, but from user-land one can't tell, since it (reasonably enough) disallows read(2) on directories. Haven't checked any other filesystems. Don't have anything else (pcfs, hsfs, udfs, ...) mounted at the moment to check.

(Other stuff: ISTR that devices on Solaris will give a size if applicable, but for non-LF-aware 32-bit, that may be capped at MAXOFF32_T rather than returning an error; I think maybe for pipes, one sees the number of bytes available to be read. None of which is portable or should necessarily be depended on...)

Cool ideas are fine, but IMO, if one does wish to make something nominally undefined have some particular behavior, I wonder why one wouldn't at least try for consistency...
Re: [zfs-discuss] Why is st_size of a zfs directory equal to the
> Richard L. Hamilton rlha...@smart.net wrote:
>> Cute idea, maybe. But very inconsistent with the size in blocks (reported by ls -dls dir). Is there a particular reason for this, or is it one of those "just for the heck of it" things? Granted that it isn't necessarily _wrong_. I just checked SUSv3 for stat() and sys/stat.h, and it appears that st_size is only well-defined for regular files and symlinks. So I suppose it could be (a) undefined, or (b) whatever is deemed to be useful, for directories, device files, etc.
>
> You could also return 0 for st_size for all directories and would still be POSIX compliant.
>
> Jörg

Yes, some do IIRC (certainly for empty directories, maybe always; I forget what OS I saw that on). Heck, undefined means it wouldn't be _wrong_ to return a random number. Even a _negative_ number wouldn't necessarily be wrong (although it would be a new low in rudeness, perhaps).

I did find the earlier discussion on the subject (someone e-mailed me that there had been such). It seemed to conclude that some apps are statically linked with old scandir() code that (incorrectly) assumed that the number of directory entries could be estimated as st_size/24; and worse, that some such apps might be seeing the small st_size that zfs offers via NFS, so they might not even be something that could be fixed on Solaris at all. But I didn't see anything in the discussion that suggested that this was going to be changed. Nor did I see a compelling argument for leaving it the way it is, either. In the face of "undefined", all arguments end up as pragmatism rather than principle, IMO.

Maybe it's not a bad thing to go and break incorrect code. But if that code has worked for a long time (maybe long enough for the source to have been lost), I don't know that it's helpful to just remind everyone that st_size is only defined for certain types of objects, and directories aren't one of them. (Now if one wanted to write something to break code depending on 32-bit time_t _now_ rather than waiting for 2038, that might be a good deed in terms of breaking things. But I'll be 80 then (if I'm still alive), and I probably won't care.)
Re: [zfs-discuss] Mac Mini (OS X 10.5.4) with globalSAN
On Wed, 13 Aug 2008, Richard L. Hamilton wrote:
>> Reasonable enough guess, but no, no compression, nothing like that; nor am I running anything particularly demanding most of the time. I did have the volblocksize set down to 512 for that volume, since I thought that for the purpose, that reflected hardware-like behavior. But maybe there's some reason that's not a good idea.
>
> Yes, that would normally be a very bad idea. The default is 128K. The main reason to want to reduce it is if you have an application doing random-access I/O with small block sizes (e.g. 8K is common for applications optimized for UFS). In that case the smaller block sizes decrease overhead since zfs reads and updates whole blocks. If the block size is 512 then that means you are normally performing more low-level I/Os, doing more disk seeks, and wasting disk space. The hardware itself does not really deal with 512 bytes any more, since buffering on the disk drive is sufficient to buffer entire disk tracks, and when data is read, it is common for the disk drive to read the entire track into its local buffer. A hardware RAID controller often extends that 512 bytes to a somewhat larger value for its own purposes.
>
> Bob

Ok, but that leaves the question of what a better value would be. I gather that HFS+ operates in terms of 512-byte sectors but larger allocation units; however, unless those allocation units are a power of two between 512 and 128k inclusive _and_ are accordingly aligned within the device (or actually, with the choice of a proper volblocksize, can be made to correspond to blocks in the underlying zvol), it seems to me that a larger volblocksize would not help; it might well mean that a one-a.u. write by HFS+ equated to two blocks read and written by zfs, because the alignment didn't match, whereas at least with the smallest volblocksize, there should never be a need to read/merge/write.

I'm having trouble figuring out how to get the info to make a better choice on the HFS+ side; maybe I'll just fire up Wireshark and see if it knows how to interpret iSCSI, and/or run truss on iscsitgtd to see what it actually is reading from/writing to the zvol; if there is a consistent least common aligned blocksize, I would expect the latter especially to reveal it, and probably the former to confirm it.

I did string Ethernet; I think that sped things up a bit, but it didn't change the annoying pauses. In the end, I found a 500GB USB drive on sale for $89.95 (US), and put that on the Mac, with one partition for backups, and one each for possible future [Open]Solaris x86, Linux, and Windows OSs, assuming they can be booted from a USB drive on a Mac Mini.

Still, I want to know if the pausing with iscsitgtd is in part something I can tune down to being non-obnoxious, or is (as I suspect) in some sense a real bug. cc-ing zfs-discuss, since I suspect the problem might be there at least as much as with iscsitgtd (not that the latter is a prize-winner, having core-dumped with an assert() somewhere a number of times).
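If anyone wants to experiment along these lines: volblocksize can only be set when the zvol is created, so trying a different value means making a new volume and re-initializing it from the Mac side (the name, size, and 4k value below are just placeholders):

  # create a 50 GB zvol with a 4K volume block size
  zfs create -V 50g -o volblocksize=4k tank/maczvol
  zfs get volblocksize,volsize tank/maczvol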
Re: [zfs-discuss] memory hog
Hmm... my SB2K, 2GB RAM, 2x 1050MHz UltraSPARC III Cu CPUs, seems to freeze momentarily for a couple of seconds every now and then in a zfs root setup on snv_90, which it never did with mostly ufs on snv_81; that despite having much faster disks now (an LSI SAS 3800X and a pair of Seagate 1TB SAS drives (mirrored), vs. the 2x internal 73GB FC drives; the SAS drives, at a mere 7200 RPM, can sustain a sequential transfer rate about 2.5x that of the 10K RPM FC drives!). Then again, between the hardware differences and any other software differences, as well as the configuration change, I'm not absolutely ready to blame any particular one of those for those annoying pauses... but my suspicions are on zfs...
Re: [zfs-discuss] Boot from mirrored vdev
Are you using set md:mirrored_root_flag=1 in /etc/system? See the entry for md:mirrored_root_flag on http://docs.sun.com/app/docs/doc/819-2724/chapter2-156?a=view keeping in mind all the cautions...
Re: [zfs-discuss] Can't rm file when No space left on device...
I wonder if one couldn't reduce (but probably not eliminate) the likelihood of this sort of situation by setting refreservation significantly lower than reservation?

Along those lines, I don't see any property that would restrict the number of concurrent snapshots of a dataset :-( I think that would be real handy, along with one that would say whether to refuse another when the limit was reached, or to automatically delete the oldest snapshot. Yes, one can script the rotation of snapshots, but it might be nice to just make it policy for a given dataset instead, particularly together with delegated snapshot permission (provided that that didn't also delegate the ability to change the maximum number of allowed snapshots).
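Until something like that exists as a property, the rotation does have to live in a script; a bare-bones sketch of the sort of thing I mean (the dataset name and keep-count are arbitrary, ksh/bash assumed):

  #!/bin/ksh
  fs=tank/data    # dataset to snapshot (example)
  keep=8          # number of snapshots to retain

  zfs snapshot $fs@auto-$(date +%Y%m%d%H%M%S)

  # count this dataset's auto- snapshots and destroy the oldest beyond the limit
  n=$(zfs list -H -t snapshot -o name | grep -c "^$fs@auto-")
  if [ "$n" -gt "$keep" ]; then
      zfs list -H -t snapshot -o name -s creation | grep "^$fs@auto-" | \
          head -$((n - keep)) | xargs -n1 zfs destroy
  fi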
Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?
On Sat, 7 Jun 2008, Mattias Pantzare wrote:
>> If I need to count usage I can use du. But if you can implement space usage info on a per-uid basis you are not far from quota per uid...
>
> That sounds like quite a challenge. UIDs are just numbers and new ones can appear at any time. Files with existing UIDs can have their UIDs switched from one to another at any time. The space used per UID needs to be tallied continuously and needs to track every change, including real-time file growth and truncation. We are ultimately talking about 128-bit counters here. Instead of having one counter per filesystem we now have potentially hundreds of thousands, which represents substantial memory.

But if you already have the ZAP code, you ought to be able to do quick lookups of arbitrary byte sequences, right? Just assume that a value not stored is zero (or infinity, or uninitialized, as applicable), and you have the same functionality as the sparse quota file on ufs, without the problems.

Besides, uid/gid/sid quotas would usually make more sense at the zpool level than at the individual filesystem level, so perhaps it's not _that_ bad. Which is to say, you want user X to have an n GB quota over the whole zpool, and you probably don't so much care whether the filesystem within the zpool corresponds to his home directory or to some shared directory.

> Multicore systems have the additional challenge that this complex information needs to be effectively shared between cores. Imagine if you have 512 CPU cores, all of which are running some of the ZFS code and have their own caches which become invalidated whenever one of those counters is updated. This sounds like a no-go for an almost infinite-sized pooled last-word filesystem like ZFS. ZFS is already quite lazy at evaluating space consumption. With ZFS, 'du' does not always reflect true usage since updates are delayed.

Whatever mechanism can check at block allocation/deallocation time to keep track of per-filesystem space (vs. a filesystem quota, if there is one) could surely also do something similar against per-uid/gid/sid quotas. I suspect a lot of existing functions and data structures could be reused or adapted for most of it. Just one more piece of metadata to update, right? Not as if ufs quotas had zero runtime penalty if enabled. And you only need counters and quotas in-core for identifiers applicable to in-core znodes, not for every identifier used on the zpool.

Maybe I'm off base on the details. But in any event, I expect that it's entirely possible to make it happen, scalably. Just a question of whether it's worth the cost of designing, coding, testing, documenting. I suspect there may be enough scenarios for sites with really high numbers of accounts (particularly universities, which are not only customers in their own right, but a chance for future mindshare) that it might be worthwhile, but I don't know that to be the case.

IMO, even if no one sort of site using existing deployment architectures would justify it, given the future blurring of server, SAN, and NAS (think recent SSD announcement + COMSTAR + iSCSI initiator + separate device for zfs zil cache + in-kernel CIFS + enterprise authentication with Windows interoperability + Thumper + ...), the ability to manage all that storage in all sorts of as-yet unforeseen deployment configurations _by user or other identity_ may well be important across a broad base of customers. Maybe identity-based, as well as filesystem-based, quotas should be part of that.
Re: [zfs-discuss] SSD reliability, wear levelling, warranty period
>> btw: it seems to me that this thread is a little bit OT.
>
> I don't think it's OT - because SSDs make perfect sense as ZFS log and/or cache devices. If I did not make that clear in my OP then I failed to communicate clearly. In both these roles (log/cache) reliability is of the utmost importance.

Older SSDs (before cheap and relatively high-cycle-limit flash) were RAM cache + battery + hard disk. Surely RAM + battery + flash is also possible; the battery only needs to keep the RAM alive long enough to stage to the flash. That keeps the write count on the flash down, and the speed up (RAM being faster than flash). Such a device would of course cost more, and be less dense (given having to have battery + charging circuits and RAM as well as flash), than a pure flash device. But with more limited write rates needed, and no moving parts, _provided_ it has full ECC and maybe radiation-hardened flash (if that exists), I can't imagine why such a device couldn't be exceedingly reliable and have quite a long lifetime (with the battery, hopefully replaceable, being more of a limitation than the flash). It could be a matter of paying for how much quality you want...

As for reliability, from zpool(1m):

  log
    A separate intent log device. If more than one log device is specified, then writes are load-balanced between devices. Log devices can be mirrored. However, raidz and raidz2 are not supported for the intent log. For more information, see the "Intent Log" section.

  cache
    A device used to cache storage pool data. A cache device cannot be mirrored or part of a raidz or raidz2 configuration. For more information, see the "Cache Devices" section.

  [...]

  Cache Devices
    Devices can be added to a storage pool as "cache devices." These devices provide an additional layer of caching between main memory and disk. For read-heavy workloads, where the working set size is much larger than what can be cached in main memory, using cache devices allows much more of this working set to be served from low-latency media. Using cache devices provides the greatest performance improvement for random read-workloads of mostly static content.

    To create a pool with cache devices, specify a "cache" vdev with any number of devices. For example:

      # zpool create pool c0d0 c1d0 cache c2d0 c3d0

    Cache devices cannot be mirrored or part of a raidz configuration. If a read error is encountered on a cache device, that read I/O is reissued to the original storage pool device, which might be part of a mirrored or raidz configuration. The content of the cache devices is considered volatile, as is the case with other system caches.

That tells me that the zil can be mirrored and zfs can recover from cache errors. I think that means that these devices don't need to be any more reliable than regular disks, just much faster. So... expensive ultra-reliability SSD, or much less expensive SSD plus mirrored zil? Given what zfs can do with cheap SATA, my bet is on the latter...
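And concretely, adding a mirrored slog and an (unmirrorable, but harmless-to-lose) cache device to an existing pool looks like this; the device names are only placeholders:

  # mirrored separate intent log
  zpool add tank log mirror c4t0d0 c4t1d0
  # L2ARC cache device; on a read error zfs falls back to the main pool
  zpool add tank cache c4t2d0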
Re: [zfs-discuss] Growing root pool ?
On Tue, Jun 10, 2008 at 11:33:36AM -0700, Wyllys Ingersoll wrote:
>> I'm running build 91 with ZFS boot. It seems that ZFS will not allow me to add an additional partition to the current root/boot pool because it is a bootable dataset. Is this a known issue that will be fixed or a permanent limitation?
>
> The current limitation is that a bootable pool is limited to one disk, or one disk and a mirror. When your data is striped across multiple disks, that makes booting harder. From a post to zfs-discuss about two months ago:
>
>   ... we do have plans to support booting from RAID-Z. The design is still being worked out, but it's likely that it will involve a new kind of dataset which is replicated on each disk of the RAID-Z pool, and which contains the boot archive and other crucial files that the booter needs to read. I don't have a projected date for when it will be available. It's a lower priority project than getting the install support for zfs boot done.
>
> - Darren

If I read you right, with little or nothing extra, that would enable growing rpool as well, since what it would really do is ensure /boot (and whatever if anything else) was mirrored even though the rest of the zpool was raidz or raidz2; which would also ensure that those critical items were _not_ spread across the stripe that would result from adding devices to an existing zpool.

Of course installation and upgrade would have to be able to recognize and deal with such exotica too. Which seems to pose a problem, since having one dataset in the zpool mirrored while the rest is raidz and/or extended by a stripe implies to me that some space is more or less reserved for that purpose, or that such a dataset couldn't be snapshotted, or both; so I suppose there might be a smaller-than-total-capacity limit on the number of BEs possible. http://en.wikipedia.org/wiki/TANSTAAFL ...
Re: [zfs-discuss] Growing root pool ?
> I'm not even trying to stripe it across multiple disks, I just want to add another partition (from the same physical disk) to the root pool. Perhaps that is a distinction without a difference, but my goal is to grow my root pool, not stripe it across disks or enable raid features (for now). Currently, my root pool is using c1t0d0s4 and I want to add c1t0d0s0 to the pool, but can't.
>
> -Wyllys

Right, that's how it is right now (which the other guy seemed to be suggesting might change eventually, but nobody knows when, because it's just not that important compared to other things). AFAIK, if you could shrink the partition whose data is after c1t0d0s4 on the disk, you could grow c1t0d0s4 by that much, and I _think_ zfs would pick up the growth of the device automatically. (ufs partitions can be grown like that, or by being on an SVM or VxVM volume that's grown, but then one has to run a command specific to ufs to grow the filesystem to use the additional space.) I think zpools are supposed to grow automatically if SAN LUNs are grown, and this should be a similar situation, anyway. But if you can do that, and want to try it, just be careful. And of course you couldn't shrink it again, either.
Re: [zfs-discuss] SATA controller suggestion
I don't presently have any working x86 hardware, nor do I routinely work with x86 hardware configurations. But it's not hard to find previous discussion on the subject: http://www.opensolaris.org/jive/thread.jspa?messageID=96790 for example...

Also, remember that SAS controllers can usually also talk to SATA drives; they're usually more expensive, of course, but sometimes you can find a deal. I have an LSI SAS 3800X, and I paid a heck of a lot less than list for it (eBay), I'm guessing because someone bought the bulk package and sold off whatever they didn't need (new board, sealed, but no docs). That was a while ago, and being around US $100, it might still not have been what you'd call cheap. If you want to stay around $50, you might have better luck looking at the earlier discussion. But I suspect to some extent you get what you pay for; the throughput on the higher-end boards may well be a good bit higher, although for one disk (or even two, to mirror the system disk), it might not matter so much.
Re: [zfs-discuss] Can't rm file when No space left on device...
On Thu, Jun 05, 2008 at 09:13:24PM -0600, Keith Bierman wrote:
> On Jun 5, 2008, at 8:58 PM, Brad Diggs wrote:
>> Hi Keith, Sure you can truncate some files but that effectively corrupts the files in our case and would cause more harm than good. The only files in our volume are data files.
>
> So an rm is ok, but a truncation is not? Seems odd to me, but if that's your constraint so be it.

Neither will help, since before the space can be freed a transaction must be written, which in turn requires free space. (So you say let ZFS save some just-in-case space for this, but, how much is enough?) If you make it a parameter, that's the admin's problem.

Although since each rm of a file also present in a snapshot just increases the divergence, only an rm of a file _not_ present in a snapshot would actually recover space, right? So in some circumstances, even if it's the admin's problem, there might be no amount that's enough to do what one wants to do without removing a snapshot.

Specifically, take a snapshot of a filesystem that's very nearly full, and then use dd or whatever to create a single new file that fills up the filesystem. At that point, only removing that single new file will help, and even that's not possible without a just-in-case reserve of enough to handle the worst-case metadata (including system attributes, if any) update + transaction log + any other fudge I forgot, for at least one file's worth. Maybe that's a simplistic view of the scenario, I dunno...
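A quick way to see which snapshots are actually pinning space, before counting on any given rm to help (the dataset name is just an example):

  zfs list -r -t snapshot -o name,used,referenced tank/data

The "used" column is the space that would come back if only that snapshot were destroyed, which is also roughly the space an rm can't give back while the snapshot still references the file.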
Re: [zfs-discuss] zfs incremental-forever
If I read the man page right, you might only have to keep a minimum of two on each side (maybe even just one on the receiving side), although I might be tempted to keep an extra just in case; say near current, 24 hours old, and a week old (space permitting for the larger interval of the last one). Adjust frequency, spacing, and number according to available space, keeping in mind that the more COW-ing between snapshots (the longer interval if activity is more or less constant), the more space required. (assuming my head is more or less on straight right now...) Of course if you get messed up, you can always resync with a non-incremental transfer, so if you could live with that occasionally, there may be no need for more than two. Your script would certainly have to be careful to check for successful send _and_ receive before removing old snapshots on either side. ssh remotehost exit 1 seems to have a return code of 1 (cool). rsh does _not_ have that desirable property. But that still leaves the problem of how to check the exit status of the commands on both ends of a pipeline; maybe someone has solved that? Anyway, correctly verifying successful completion of the commands on both ends might be a bit tricky, but is critical if you don't want failures or the need for frequent non-incremental transfers. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
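To make the shell-dependent part concrete: bash (and recent ksh) expose the per-stage exit codes of a pipeline, which is enough to decide whether it's safe to prune old snapshots. A rough sketch, assuming dataset tank/data on the sender and backup/data on the receiver (all names are examples):

    #!/bin/bash
    zfs snapshot tank/data@new
    zfs send -i tank/data@prev tank/data@new | \
        ssh backuphost /usr/sbin/zfs receive backup/data
    status=("${PIPESTATUS[@]}")     # capture both stages before running anything else
    if [ "${status[0]}" -eq 0 ] && [ "${status[1]}" -eq 0 ]; then
        zfs destroy tank/data@prev
        ssh backuphost /usr/sbin/zfs destroy backup/data@prev
        # (next run sends from @new, so rename or track snapshot names accordingly)
    else
        echo "send/receive failed; keeping tank/data@prev" >&2
    fi

That still isn't bulletproof, but it covers the "both ends of the pipe" problem: ssh passes back the remote command's exit status, and PIPESTATUS holds the status of each pipeline stage.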
Re: [zfs-discuss] Per-user home filesystems and OS-X Leopard anomaly
I encountered an issue that people using OS-X systems as NFS clients need to be aware of. While not strictly a ZFS issue, it may be encountered most often by ZFS users since ZFS makes it easy to support and export per-user filesystems. The problem I encountered was when using ZFS to create exported per-user filesystems and the OS-X automounter to perform the necessary mount magic. OS-X creates hidden .DS_Store files in every directory which is accessed (http://en.wikipedia.org/wiki/.DS_Store). OS-X decided that it wanted to create the path /home/.DS_Store and it would not take `no' for an answer. First it would try to create /home/.DS_Store and then it would try an alternate name. Since the automounter was used, there would be an automount request for /home/.DS_Store, which does not exist on the server so the mount request would fail. Since OS-X does not take 'no' for an answer, there would subsequently be thousands of back-to-back mount requests. The end result was that 'mountd' was one of the top three resource consumers on my system, there would be bursts of high network traffic (1500 packets/second), and the affected OS-X system would operate more strangely than normal. The simple solution was to create a /home/.DS_Store directory on the server so that the mount request would succeed. Too bad it appears to be non-obvious how to do loopback mounts (a mount of one local directory onto another, without having to be an NFS server) on Darwin/MacOS X; then you could mount the /home/.DS_Store locally from a directory elsewhere (e.g. /export/home/.DS_Store) on each machine, rather than bothering the server with it. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
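For anyone hitting the same thing, the server-side band-aid really is that small; assuming home directories are exported from /export/home and automounted as /home (paths are examples):

    # on the NFS server: give the automounter something that can actually be
    # looked up and mounted, so the client stops retrying
    mkdir /export/home/.DS_Store

Ugly, but it turns thousands of failing mount attempts into one successful, idle mount.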
Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?
[...] That's not to say that there might not be other problems with scaling to thousands of filesystems. But you're certainly not the first one to test it. For cases where a single filesystem must contain files owned by multiple users (/var/mail being one example), old fashioned UFS quotas still solve the problem where the alternative approach with ZFS doesn't. A single /var/mail doesn't work well for 10,000 users either. When you start getting into that scale of service provisioning, you might look at how the big boys do it... Apple, Verizon, Google, Amazon, etc. You should also look at e-mail systems designed to scale to large numbers of users which implement limits without resorting to file system quotas. Such e-mail systems actually tell users that their mailbox is too full rather than just failing to deliver mail. So please, when we start having this conversation again, let's leave /var/mail out. I'm not recommending such a configuration; I quite agree that it is neither scalable nor robust. Its only merit is that it's an obvious example of where one would have potentially large files owned by many users necessarily on one filesystem, inasmuch as they were in one common directory. But there must be other examples where the ufs quota model is a better fit than the zfs quota model with potentially one filesystem per user. In terms of the limitations they can provide, zfs filesystem quotas remind me of DG/UX control point directories (presumably a relic of AOS/VS) - like regular directories except they could have a quota bound to them restricting the sum of the space of the subtree rooted there (the native filesystem on DG/UX didn't have UID-based quotas). Given restricted chown (non-root can't give files away), per-UID*filesystem quotas IMO make just as much sense as per-filesystem quotas themselves do on zfs, save only that per-UID*filesystem quotas make the filesystem less lightweight. For zfs, perhaps an answer might be if it were possible to have per-zpool uid/gid/projid/zoneid/sid quotas too? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
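Purely as an illustration of what a per-UID*filesystem model might look like from the admin's chair (hypothetical syntax, not something you can type into current bits):

    # hypothetical: a user quota scoped to one filesystem, alongside the
    # existing per-filesystem quota
    zfs set userquota@alice=5g tank/mail
    zfs get userquota@alice tank/mail

That's more or less the ufs edquota model grafted onto a dataset, which is what /var/mail-style shared directories seem to want.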
Re: [zfs-discuss] Filesystem for each home dir - 10,000 users?
Hi All, I'm new to ZFS but I'm intrigued by the possibilities it presents. I'm told one of the greatest benefits is that, instead of setting quotas, each user can have their own 'filesystem' under a single pool. This is obviously great if you've got 10 users but what if you have 10,000? Are the overheads too great and do they outweigh the potential benefits? I've got a test system running with 5,000 dummy users which seems to perform fine, even if my 'df' output is a little sluggish :-) . Any advice or experiences would be greatly appreciated. I think sharemgr was created to speed up the case of sharing out very high numbers of filesystems on NFS servers, which otherwise took quite a long time. That's not to say that there might not be other problems with scaling to thousands of filesystems. But you're certainly not the first one to test it. For cases where a single filesystem must contain files owned by multiple users (/var/mail being one example), old fashioned UFS quotas still solve the problem where the alternative approach with ZFS doesn't. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
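If it helps anyone else who wants to kick the tires, the brute-force way to generate that kind of test setup is just a loop over zfs create; a sketch, with pool name and quota value as examples only:

    zfs create -o mountpoint=/export/home tank/home
    zfs set sharenfs=on tank/home             # children inherit the share setting
    for u in $(getent passwd | cut -d: -f1); do
        # (filter out system accounts in real life)
        zfs create -o quota=2g tank/home/$u
    done

The sluggish df mentioned above is mostly just the cost of statting a few thousand mounts; zfs list is usually a kinder way to look at them.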
Re: [zfs-discuss] system backup and recovery
Hi list, for windows we use ghost to backup system and recovery. can we do similar thing for solaris by ZFS? I want to create a image and install to another machine, So that the personal configuration will not be lost. Since I don't do Windows, I'm not familiar with ghost, but I gather from Wikipedia that it's more a disk cloning tool (bare metal backup/restore) than a conventional backup program, although some people may well use it for backups too. Zfs has send and receive commands, which more or less correspond to ufsdump and ufsrestore for ufs, except that the names send and receive are perhaps more appropriate, since the zfs(1m) man page says: The format of the stream is evolving. No backwards compatibility is guaranteed. You may not be able to receive your streams on future versions of ZFS. which means to me that it's not a really good choice for archiving or long-term backups, but it should be ok for transferring zfs filesystems between systems that are the same OS version (or at any rate, close enough that the format of the zfs send/receive datastream is compatible). There are of course also generic archiving utilities that can be used for backup/restore, like tar (or star), pax, cpio, and so on. But as far as I know, there's no bare metal backup/restore facility that comes with Solaris, although there are some commercial (and probably quite expensive) products that do that. But there's probably nothing at all that's quite equivalent to Norton Ghost. One can of course use dd to copy entire raw disk partitions, but that won't set up the partitions, nor will it work as expected unless all disk sizes are identical (for filesystems that don't have the OS on them), or if the OS is on there, all hardware is identical. Depending on just what personal configuration you mean, you may not necessarily need to back up the whole system anyway. Which is another way of saying that I'm not sure your post was specific enough about what you're doing to make it possible to suggest the best available (and preferably free) solution. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
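To make the send/receive suggestion concrete, a rough sketch of pushing a filesystem tree to another machine (pool, dataset, and host names are examples; this moves data, not bootability):

    # on the source machine
    zfs snapshot -r tank/export@migrate
    zfs send -R tank/export@migrate | ssh newhost /usr/sbin/zfs receive -d newpool

    # generic archive tools work too, e.g.
    cd /export && tar cf - . | ssh newhost 'cd /export && tar xf -'

Note the caveat from the man page quoted above: the stream format can change between releases, so this is a transfer mechanism, not an archive format.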
Re: [zfs-discuss] system backup and recovery
On Thu, 2008-06-05 at 15:44 +0800, Aubrey Li wrote: for windows we use ghost to backup system and recovery. can we do similar thing for solaris by ZFS? How about flar ? http://docs.sun.com/app/docs/doc/817-5668/flash-24?a=view [ I'm actually not sure if it's supported for zfs root though ] cheers, tim Oops, forgot about that one... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] More USB Storage Issues
Nathan Kroenert wrote: For what it's worth, I started playing with USB + flash + ZFS and was most unhappy for quite a while. I was suffering with things hanging, going slow or just going away and breaking, and thought I was witnessing something zfs was doing as I was trying to do mirror recovery and all that sort of stuff. On a hunch, I tried doing UFS and RAW instead and saw the same issues. It's starting to look like my USB hubs. Once they are under any reasonable read/write load, they just make bunches of things go offline. Yep - They are powered and plugged in. So, at this stage, I'll be grabbing a couple of 'better' USB hubs (Mine are pretty much the cheapest I could buy) and see how that goes. For gags, take ZFS out of the equation and validate that your hardware is actually providing a stable platform for ZFS... Mine wasn't... That's my experience too. USB HUBs are cheap [ expletive deleted ] mostly... What do you expect? They're mostly consumer-grade, which is to say garbage, rather than datacenter-grade. And it's not just USB hubs - I've got a consumer-grade external modem, and I swear it must have little or no ECC and/or watchdog, because I have to power-cycle it every so often. Wish I had a lead box to put it in to shield it from the cosmic rays, maybe that would help... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] new install - when is zfs root offered? (snv_90)
A Darren Dunham [EMAIL PROTECTED] writes: On Tue, Jun 03, 2008 at 05:56:44PM -0700, Richard L. Hamilton wrote: How about SPARC - can it do zfs install+root yet, or if not, when? Just got a couple of nice 1TB SAS drives, and I think I'd prefer to have a mirrored pool where zfs owns the entire drives, if possible. (I'd also eventually like to have multiple bootable zfs filesystems in that pool, corresponding to multiple versions.) Are they just under 1TB? I don't believe there's any boot support in Solaris for EFI labels, which would be required for 1TB+. ISTR that I saw an ARC case go past about a week ago about extending SMI labels to allow 1TB disks, for exactly this reason. Thanks. Just searched, that's http://www.opensolaris.org/jive/thread.jspa?messageID=237603 (approved) Since format didn't choke, and since a close reading suggests the actual older limit is 1TiB (or maybe 1TiB - 1 sector), I should be fine on that score. The LSI SAS 3800x is supposed to have fcode boot support. And snv_90 is supposed to have zfs boot install working on both SPARC and x86. So I guess I'll just have to try it. That only leaves me wondering whether I should attempt a live upgrade from SXCE snv_81, or just do the text install off a DVD onto one of the new disks (hoping the installer takes care of setting up the disk however it needs to be to be bootable), and then adding identical partitioning to the other disk, attaching a suitable partition on the 2nd disk to the zpool, and using LVM (Disk Suite) to mirror any non-zfs partitions the installation created. Never having used live upgrade myself (although having read about it), I suppose it would be an educational experience either way. Time was once, I'd have looked forward to that...must be getting tired... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
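In case it's useful to anyone trying the same thing, the manual "second disk" half of that plan usually boils down to something like this (device names, slice layout, and metadevice numbers are examples only):

    # copy the label/partitioning from the installed disk to the second disk
    prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2

    # mirror the root pool and make the second disk bootable (SPARC)
    zpool attach rpool c1t0d0s0 c1t1d0s0
    installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0

    # mirror any remaining ufs slices with SVM
    metadb -a -f -c 3 c1t0d0s7 c1t1d0s7
    metainit d11 1 1 c1t0d0s5
    metainit d12 1 1 c1t1d0s5
    metainit d10 -m d11
    metattach d10 d12

(Whether the snv_90 installer lays things out exactly like this is exactly the sort of thing I'll find out by trying it.)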
Re: [zfs-discuss] new install - when is zfs root offered? (snv_90)
On Tue, Jun 03, 2008 at 05:56:44PM -0700, Richard L. Hamilton wrote: How about SPARC - can it do zfs install+root yet, or if not, when? Just got a couple of nice 1TB SAS drives, and I think I'd prefer to have a mirrored pool where zfs owns the entire drives, if possible. (I'd also eventually like to have multiple bootable zfs filesystems in that pool, corresponding to multiple versions.) Are they just under 1TB? I don't believe there's any boot support in Solaris for EFI labels, which would be required for 1TB+. Don't know about Solaris or the on-disk bootloader (I would think they ought to have that eventually if not already), but since it's been a while since I've seen a new firmware update for the SB2K, I doubt the firmware could handle EFI labels. But format is perfectly happy putting either Sun or EFI labels on these drives, so that shouldn't be a problem. SCSI read capacity shows 1953525168 (512-byte) sectors, which multiplied out is 1,000,204,886,016 bytes; more than 10^12 (1TB), but less than 2^40 (1TiB). This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] new install - when is zfs root offered? (snv_90)
P.S. the ST31000640SS drives, together with the LSI SAS 3800x controller (in a 64-bit 66MHz slot) gave me, using dd with a block size of either 1024k or 16384k (1MB or 16MB) and a count of 1024, a sustained read rate that worked out to a shade over 119MB/s, even better than the nominal sustained transfer rate of 116MB/s documented for the drives. Even at a miserly 7200 RPM, that was more than 2 1/2 times faster than the internal 10,000 RPM 73GB FC-AL (2Gb/s) drives, which impressed the heck out of me. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
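(For reference, the test was nothing fancier than dd against the raw device; the device name below is an example:

    dd if=/dev/rdsk/c2t0d0s2 of=/dev/null bs=16384k count=1024
    # 1024 x 16 MB read sequentially; divide by elapsed time for MB/s

so it's measuring streaming reads, which is about the kindest workload there is for a 7200 RPM drive.)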
Re: [zfs-discuss] new install - when is zfs root offered? (snv_90)
How about SPARC - can it do zfs install+root yet, or if not, when? Just got a couple of nice 1TB SAS drives, and I think I'd prefer to have a mirrored pool where zfs owns the entire drives, if possible. (I'd also eventually like to have multiple bootable zfs filesystems in that pool, corresponding to multiple versions.) Is/will all that be possible? Would it be ok to pre-create the pool, and if so, any particular requirements? Currently running snv_81 on a Sun Blade 2000; SAS/SATA controller is an LSI Logic SAS 3800X 8-port, in the 66MHz slot. I chose SAS drives for the first two (of 8) trusting SCSI support to probably be more mature and functional than SATA support, but the rest (as I'm willing to part with the $$) will probably be SATA for price. The current two SAS drives are Seagate ST31000640SS (which I just used smartctl to confirm have SMART support including temperature reporting). Enclosure is an Enhance E8-ML (no enclosure services support). This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] The ZFS inventor and Linus sitting in a tree?
On Mon, May 19, 2008 at 10:06 PM, Bill McGonigle [EMAIL PROTECTED] wrote: On May 18, 2008, at 14:01, Mario Goebbels wrote: I mean, if the Linux folks want it, fine. But if Sun's actually helping with such a possible effort, then it's just shooting itself in the foot here, in my opinion. [...] they're quick to do it - they threatened to sue me when they couldn't figure out how to take back a try-out server). There's a story contained within that for sure! :) You brought a smile to this subscriber when I read it. Having ZFS as a de facto standard lifts all boats, IMHO. It's still hard to believe (in one sense) that the entire world isn't beating a path to Sun's door and PLEADING for ZFS. This is (if y'all will forgive the colloquialism) a kick-ass amazing piece of software. It appears to defy all the rules, a bit like levitation in a way, or perhaps it just rewrites those rules. There are days I still can't get my head around what ZFS really is. In general, licensing issues just make my brain bleed, but one hopes that the licensing gurus can get their heads together and find a way to get this done. I don't personally believe that Open Solaris *OR* Solaris will lose if ZFS makes its way over the fence to Linux, I think that this is a big enough tent for everyone. Sure hope so anyway, it would be immensely sad to see technology like this not being adopted/ported/migrated/whatever more widely because of damn lawyers and the morass called licensing. Perhaps (gazing into a cloudy crystal ball that hasn't been cleaned in a while) Solaris/Open Solaris can manage to hold onto ZFS-on-boot which is perhaps *the* most mind bending accomplishment within the zfs concept, and let the rest procreate elsewhere. That could contribute to the must-have/must-install cachet of Solaris/OpenSolaris. Umm, I think it's too late for that; as I recall, the bits needed for read-only access had to be made dual CDDL/GPL to be linked with GRUB. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS for write-only media?
Dana H. Myers [EMAIL PROTECTED] wrote: Bob Friesenhahn wrote: Are there any plans to support ZFS for write-only media such as optical storage? It seems that if mirroring or even zraid is used that ZFS would be a good basis for long term archival storage. I'm just going to assume that write-only here means write-once, read-many, since it's far too late for an April Fool's joke. I know two write-only device types: WOM Write-only media WORN Write-once read never (this one is often used for backups ;-) Jörg Save $$ (or €€) - use /dev/null instead. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] utf8only-property
So, I set utf8only=on and try to create a file with a filename that is a byte array that can't be decoded to text using UTF-8. What's supposed to happen? Should fopen(), or whatever syscall 'touch' uses, fail? Should the syscall somehow escape utf8-incompatible bytes, or maybe replace them with ?s or somesuch? Or should it automatically convert the filename from the active locale's fs-encoding (LC_CTYPE?) to UTF-8? First, utf8only can AFAIK only be set when a filesystem is created. Second, use the source, Luke: http://src.opensolaris.org/source/search?q=&defs=&refs=z_utf8&path=%2Fonnv%2Fonnv-gate%2Fusr%2Fsrc%2Futs%2Fcommon%2Ffs%2Fzfs%2Fzfs_vnops.c&hist=&project=%2Fonnv Looks to me like lookups, file create, directory create, creating symlinks, and creating hard links will all fail with error EILSEQ (Illegal byte sequence) if utf8only is enabled and they are presented with a name that is not valid UTF-8. Thus, on a filesystem where it is enabled (since creation), no such names can be created or would ever be there to be found anyway. So in that case, the system is refusing non UTF-8 compatible byte strings and there's no need to escape anything. Further, your last sentence suggests that you might hold the incorrect idea that the kernel knows or cares what locale an application is running in: it does not. Nor indeed does the kernel know about environment variables at all, except as the third argument passed to execve(2); it doesn't interpret them, or even validate that they are of the usual name=value form, they're typically handled pretty much the same as the command line args, and the only illusion of magic is that with the more widely used variants of exec that don't explicitly pass the environment, they internally call execve(2) with the external variable environ as the last arg, thus passing the environment automatically. There have been Unix-like OSs that make the environment available to additional system calls (give or take what's a true system call in the example I'm thinking of, namely variant links (symlinks with embedded environment variable references) in the now defunct Apollo Domain/OS), but AFAIK, that's not the case in those that are part of the historical Unix source lineage. (I have no idea off the top of my head whether or not Linux, or oddballs like OSF/1 might make environment variables implicitly available to syscalls other than execve(2).) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
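A quick way to see the EILSEQ behavior for yourself (dataset name is an example; the byte string just needs to be invalid UTF-8):

    zfs create -o utf8only=on tank/strict
    touch "/tank/strict/$(printf 'bad\377name')"
    # expected result on a utf8only filesystem:
    # touch: cannot create /tank/strict/bad?name: Illegal byte sequence

whereas the same touch on a filesystem created with utf8only=off just gives you a file with an ugly name.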
Re: [zfs-discuss] vxfs vs ufs vs zfs
Hello, I have just done comparison of all the above filesystems using the latest filebench. If you are interested: http://przemol.blogspot.com/2008/02/zfs-vs-vxfs-vs-ufs-on-x4500-thumper.html Regards przemol I would think there'd be a lot more variation based on workload, such that the overall comparison may fall far short of telling the whole story. For example, IIRC, VxFS is more or less extent-based (like mainframe storage), so serial I/O for large files should be perhaps its strongest point, while other workloads may do relatively better with the other filesystems. The free basic edition sounds cool, though - downloading now. I could use a bit of practice with VxVM/VxFS; it's always struck me as very good when it was good (online reorgs of storage and such), and an utter terror to untangle when it got messed up, not to mention rather more complicated than DiskSuite/SVM (and of course _waay_ more complicated than zfs :-) Any idea if it works with reasonably recent OpenSolaris (build 81)? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 'du' is not accurate on zfs
On Sat, 16 Feb 2008, Richard Elling wrote: ls -l shows the length. ls -s shows the size, which may be different than the length. You probably want size rather than du. That is true. Unfortunately 'ls -s' displays in units of disk blocks and does not also consider the 'h' option in order to provide a value suitable for humans. Bob ISTR someone already proposing to make ls -h -s work in a way one might hope for. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
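In other words (file name is an example; exact block units vary with ls flags and locale):

    ls -l bigfile     # length in bytes, i.e. the offset after the last byte
    ls -s bigfile     # blocks actually allocated; compression and sparse
                      # regions make this smaller than the length on zfs
    du -h bigfile     # same allocation-based view, already humanized

so du and ls -s agree with each other and disagree with ls -l by design, not by accident.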
Re: [zfs-discuss] sharenfs with over 10000 file systems
New, yes. Aware - probably not. That users would create many filesystems, given how cheap they are, was an easy guess, but I somehow don't think anybody envisioned that users would be creating tens of thousands of filesystems. ZFS - too good for its own good :-p IMO (and given mails/posts I've seen typically by people using or wanting to use zfs at large universities and the like, for home directories) this is frequently driven by the need for per-user quotas. Since zfs doesn't have per-uid quotas, this means they end up creating (at least) one filesystem per user. That means a share per user, and locally a mount per user, which will never scale as well as (locally) a single share of /export/home, and a single mount (although there would of course be automounts to /home on demand, but they wouldn't slow down bootup). sharemgr and the like may be attempts to improve the situation, but they mitigate rather than eliminate the consequences of exploding what used to be a single large filesystem into a bunch of relatively small ones, simply based on the need to have per-user quotas with zfs. And there are still situations where a per-uid quota would be useful, such as /var/mail (although I could see that corrupting mailboxes in some cases) or other sorts of shared directories. OTOH, the implementation could certainly vary a little. The equivalent of the quotas file should be automatically created when quotas are enabled, and invisible; and unless quotas are not only disabled but purged somehow, it should maintain per-uid use statistics even for uids with no quotas, to eliminate the need for quotacheck (initialization of quotas might well be restricted to filesystem creation time, to eliminate the need for a cumbersome pass through existing data, at least at first; but that would probably be wanted too, since people don't always plan ahead). But other quota-related functionality could IMO remain, although the implementations might have to get smarter, and there ought to be some alternative to the method presently used with ufs of simply reading the quotas file to iterate through the available stats. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 7zip compression?
Hello Marc, Sunday, July 29, 2007, 9:57:13 PM, you wrote: MB MC rac at eastlink.ca writes: Obviously 7zip is far more CPU-intensive than anything in use with ZFS today. But maybe with all these processor cores coming down the road, a high-end compression system is just the thing for ZFS to use. MB I am not sure you realize the scale of things here. Assuming the worst case: MB that lzjb (default ZFS compression algorithm) performs as bad as lha in [1], MB 7zip would compress your data only 20-30% better at the cost of being 4x-5x MB slower ! MB Also, in most cases, the bottleneck in data compression is the CPU, so MB switching to 7zip would reduce the I/O throughput by about 4x. 1. it depends on a specific case - sometimes it's cpu sometimes not 2. sometimes you don't really care about cpu - you have hundreds TBs of data rarely used and then squeezing 20-30% more space is a huge benefit - especially when you only read those files once they are written * disks are probably cheaper than CPUs * it looks to me like 7z may also be RAM-hungry; and there are probably better ways to use the RAM, too No doubt it's an option that would serve _someone_ well despite its shortcomings. But are there enough such someones to make it worthwhile? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
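For scale, the knobs that exist today are per-dataset and cheap to experiment with (dataset name is an example; the gzip levels are only there if your build has the gzip compression support):

    zfs set compression=lzjb tank/archive      # the fast default
    zfs set compression=gzip-9 tank/archive    # much heavier CPU, better ratio
    zfs get compressratio tank/archive         # see what you actually gained

so anyone curious whether the extra 20-30% matters for their data can measure it on a copy before arguing about 7zip.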
Re: [zfs-discuss] Cluster File System Use Cases
Bringing this back towards ZFS-land, I think that there are some clever things we can do with snapshots and clones. But the age-old problem of arbitration rears its ugly head. I think I could write an option to expose ZFS snapshots to read-only clients. But in doing so, I don't see how to prevent an ill-behaved client from clobbering the data. To solve that problem, an arbiter must decide who can write where. The SCSI protocol has almost nothing to assist us in this cause, but NFS, QFS, and pxfs do. There is room for cleverness, but not at the SCSI or block level. -- richard Yeah; ISTR that IBM mainframe complexes with what they called shared DASD (DASD==Direct Access Storage Device, i.e. disk, drum, or the like) depended on extent reserves. IIRC, SCSI dropped extent reserve support, and indeed it was never widely nor reliably available anyway. AFAIK, all SCSI offers is reserves of an entire LUN; that doesn't even help with slices, let alone anything else. Nor (unlike either the VTOC structure on MVS or VxFS) is ZFS extent-based anyway; so even if extent reserves were available, they'd only help a little. Which means, as he says, some sort of arbitration. I wonder whether the hooks for putting the ZIL on a separate device will be of any use for the cluster filesystem problem; it almost makes me wonder if there could be any parallels between pNFS and a refactored ZFS. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: ZFS - SAN and Raid
Victor Engle wrote: Roshan, As far as I know, there is no problem at all with using SAN storage with ZFS and it does look like you were having an underlying problem with either powerpath or the array. Correct. A write failed. The best practices guide on opensolaris does recommend replicated pools even if your backend storage is redundant. There are at least 2 good reasons for that. ZFS needs a replica for the self healing feature to work. Also there is no fsck like tool for ZFS so it is a good idea to make sure self healing can work. Yes, currently ZFS on Solaris will panic if a non-redundant write fails. This is known and being worked on, but there really isn't a good solution if a write fails, unless you have some ZFS-level redundancy. Why not? If O_DSYNC applies, a write() can still fail with EIO, right? And if O_DSYNC does not apply, an app could not assume that the written data was on stable storage anyway. Or the write() can just block until the problem is corrected (if correctable) or the system is rebooted. In any case, IMO there ought to be some sort of consistent behavior possible short of a panic. I've seen UFS based systems stay up even with their disks incommunicado for awhile, although they were hardly useful like that except insofar as activity strictly involving reading already cached pages was involved. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: OT: extremely poor experience with Sun Download
Well, I just grabbed the latest SXCE, and just for the heck of it, fooled around until I got the Java Web Start to work. Basically, one's browser needs to know the following (how to do that depends on the browser): MIME Type: application/x-java-jnlp-file File Extension: jnlp Open With: /usr/bin/javaws I got that working with both firefox and opera without inordinate difficulty. Once that was done, after clicking accept and selecting the three files, I clicked on the 'download with sdm' box, it started sdm, and passed all three files to it. I think I also had to click start on sdm. That's it...not so bad after all. sdm has a major advantage over typical downloads done directly by browsers for such large files: if the server supports it (needs to be able to handle requests for portions of files rather than just an entire file), it can restart failed transfers more or less automatically; and they can even be paused and resumed more or less arbitrarily. I've used that in the past to download the entire Solaris 10 CD set over a _dialup_. Took a week (well, 8 hours a day connected), but it worked. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: OT: extremely poor experience with Sun Download
Intending to experiment with ZFS, I have been struggling with what should be a simple download routine. Sun Download Manager leaves a great deal to be desired. In the Online Help for Sun Download Manager there's a section on troubleshooting, but if it causes *anyone* this much trouble http://fuzzy.wordpress.com/2007/06/14/sundownloadmanagercrap/ then it should, surely, be fixed. Sun Download Manager -- a FORCED step in an introduction to downloadable software from Sun -- should be PROBLEM FREE in all circumstances. It gives an extraordinarily poor first impression. If it can't assuredly be fixed, then we should not be forced to use it. (True, I might have ordered rather than downloaded a DVD, but Sun Download Manager has given such a poor impression that right now I'm disinclined to pay.) For trying out zfs, you could always request the free Starter Kit DVD at http://www.opensolaris.org/kits/ which contains the SXCE, Nexenta, Belenix and Schillix distros (all newer than Solaris 10). Beyond that, while I'm sure you're right about that providing a poor first impression, I guess I'm too old to have much sympathy for something taking minutes rather than seconds of attention being a barrier to entry. Yes, the download experience should be vastly improved, but if you let that stop you, I wonder if you're all that interested in the first place. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: Re: Re: Re: ZFS consistency guarantee
I wish there was a uniform way whereby applications could register their ability to achieve or release consistency on demand, and if registered, could also communicate back that they had either achieved consistency on-disk, or were unable to do so. That would allow backup procedures to automatically talk to apps capable of such functions, to get them to a known state on-disk before taking a snapshot. That would allow one to for example not stop a DBMS, but simply have it seem to pause for a moment while achieving consistency and until told that the snapshot was complete; thus providing minimum impact while still having fully usable backups (and without needing to do the database backups _through_ the DBMS). Something I heard once leads me to believe that some such facility or convention for how to communicate such issues with e.g. database server processes exists on Windows. If they've got it, we really ought to have something even better, right? :-) (That's of course not specific to ZFS, but would be useful with any filesystem that can take snapshots.) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
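Today that handshake has to be scripted by hand per application; a minimal sketch, assuming a hypothetical dbadmin command that can quiesce and resume the database, and a dataset tank/db (both are stand-ins, not real interfaces):

    dbadmin quiesce                              # ask the app for an on-disk-consistent state
    zfs snapshot tank/db@backup-`date +%Y%m%d`
    dbadmin resume                               # app carries on; back up the snapshot at leisure

The wish above is essentially for a registration mechanism so backup tools could do this dance generically instead of per-application.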
[zfs-discuss] Re: shareiscsi is cool, but what about sharefc or sharescsi?
I'd love to be able to serve zvols out as SCSI or FC targets. Are there any plans to add this to ZFS? That would be amazingly awesome. Can one use a spare SCSI or FC controller as if it were a target? Even if the hardware is capable, I don't see what you describe as a ZFS thing really; it isn't for iSCSI, except that ZFS supports a shareiscsi option (and property?) by knowing how to tell the iSCSI server to do the right thing. That is, there would have to be something like an iSCSI server except that it listened on an otherwise unused SCSI or FC interface. I think that would require not just the daemon but probably new driver facilities as well. Given that one can run IP over FC, it seems to me that in principle it ought to be possible, at least for FC. Not so sure about SCSI. Also not sure about performance. I suspect even high-end SAN controllers have a bit more latency than the underlying drives. And this is a general-purpose OS we're talking about doing this to; I don't know that it would be acceptably close, or as robust (depending on the hardware) as a high-end FC SAN, although it might be possible to be a good deal cheaper. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
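For comparison, the iSCSI version of this is already just a property on a zvol (names and size are examples):

    zfs create -V 100g tank/lun0
    zfs set shareiscsi=on tank/lun0
    iscsitadm list target        # confirm the target the iSCSI daemon now exports

A sharefc/sharescsi property would presumably need an analogous target-mode driver underneath before it could mean anything.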
[zfs-discuss] Re: storage type for ZFS
Well, no; his quote did say software or hardware. The theory is apparently that ZFS can do better at detecting (and with redundancy, correcting) errors if it's dealing with raw hardware, or as nearly so as possible. Most SANs _can_ hand out raw LUNs as well as RAID LUNs, the folks that run them are just not used to doing it. Another issue that may come up with SANs and/or hardware RAID: supposedly, storage systems with large non-volatile caches will tend to have poor performance with ZFS, because ZFS issues cache flush commands as part of committing every transaction group; this is worse if the filesystem is also being used for NFS service. Most such hardware can be configured to ignore cache flushing commands, which is safe as long as the cache is non-volatile. The above is simply my understanding of what I've read; I could be way off base, of course. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
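The host-side tunable usually mentioned alongside that advice is zfs_nocacheflush; it's only sensible when every pool on the host sits behind genuinely non-volatile cache, since it turns off the flushes globally:

    # /etc/system (takes effect at next boot)
    set zfs:zfs_nocacheflush = 1

    # or poke it live with mdb, same caveats apply
    echo zfs_nocacheflush/W0t1 | mdb -kw

Configuring the array itself to ignore SYNCHRONIZE CACHE, where the vendor supports it, is the more targeted fix.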
[zfs-discuss] Re: Testing of UFS, VxFS and ZFS
# zpool create pool raidz d1 … d8 Surely you didn't create the zfs pool on top of SVM metadevices? If so, that's not useful; the zfs pool should be on top of raw devices. Also, because VxFS is extent based (if I understand correctly), not unlike how MVS manages disk space I might add, _it ought_ to blow the doors off of everything for sequential reads, and probably sequential writes too, depending on the write size. OTOH, if a lot of files are created and deleted, it needs to be defragmented (although I think it can do that automatically; but there's still at least some overhead while a defrag is running). Finally, don't forget complexity. VxVM+VxFS is quite capable, but it doesn't always recover from problems as gracefully as one might hope, and it can be a real bear to get untangled sometimes (not to mention moderately tedious just to set up). SVM, although not as capable as VxVM, is much easier IMO. And zfs on top of raw devices is about as easy as it gets. That may not matter _now_, when whoever sets these up is still around; but when their replacement has to troubleshoot or rebuild, it might help to have something that's as easy as possible. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
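In other words, something along these lines, straight onto the disks (device names are examples):

    zpool create tank raidz c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
                            c1t5d0 c1t6d0 c1t7d0 c1t8d0

Layering the pool on SVM metadevices just adds another volume manager in the data path and hides the real disks from zfs's own error handling.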
[zfs-discuss] Re: FreeBSD's system flags.
So you're talking about not just reserving something for on-disk compatibility, but also maybe implementing these for Solaris? Cool. Might be fairly useful for hardening systems (although as long as someone had raw device access, or physical access, they could of course still get around it; that would have to be taken into account in the overall design for it to make much of a difference). Other problems: from a quick look at the header files there's no room left in the 64-bit version of the stat structure to add something in which to retrieve the flags; that may mean a new and incompatible (with other chflags(2) supporting systems) system call? Also, there's no provision in pkgmap(4) for file flags; could that be extended compatibly? This message posted from opensolaris.org ___ zfs-discuss mailing list [EMAIL PROTECTED] http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
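For anyone who hasn't met them, this is roughly how the flags behave where they already exist (FreeBSD shown; the file name is an example):

    chflags schg /etc/master.passwd     # set the system-immutable flag
    ls -lo /etc/master.passwd           # -o adds a flags column to the listing
    chflags noschg /etc/master.passwd   # clearing it is refused at raised securelevels

which is why the raw-device and securelevel questions above matter as much as the on-disk bits themselves.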
[zfs-discuss] Re: ZFS and Linux
I hope this isn't turning into a License flame war. But why do Linux contributors not deserve the right to retain their choice of license as equally as Sun, or any other copyright holder, does? The anti-GPL kneejerk just witnessed on this list is astonishing. The BSD license, for instance, is fundamentally undesirable to many GPL licensors (myself included). Nothing wrong with GPL as an abstract ideology. But when ideology trumps practicality (which it does when code can't be as widely reused as possible), I have a problem with that. As far as I'm concerned, GPL is to open licenses as political correctness is to free speech. Of course, anyone who writes something is free to use any license they please. And anyone else is free to choose an incompatible license, either for reasons that have nothing specifically to do with being incompatible, or because they just don't want the sucking sound of their goodies being adopted and very little being returned (which strikes me as a major element of the relationship between Linux and *BSD; although to be sure, there is some two-way cooperation). I have zero problem with Linux using GPLv2 (and as some have said, perhaps being stuck with it at this point). I'm not sure I'd want their code anyway, and even if I did, I darn sure wouldn't want the we don't need no steekin' DDI 'cause we're source based philosophy that comes with it, because to my mind that ends up justifying a lot of poor design and engineering discipline in the name of not being limited by backwards compatibility. So if, having chosen a license based on the ideology of being a lever to free other software (but on their terms, since everyone else is expected to become license-compatible with them), the Linux folks now have to re-invent equivalents of ZFS and Dtrace, it serves them right, IMO. And as someone else also mentioned, competition is good anyway. Not as if a lot of ideas don't cross-pollinate. But if every free OS used compatible licenses, I think 20 years later, the result would resemble the result of inbreeding...not pretty, and a shallower meme pool overall. This message posted from opensolaris.org ___ zfs-discuss mailing list [EMAIL PROTECTED] http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: Re: How big a write to a regular file is atomic?
On Wed, Mar 28, 2007 at 06:55:17PM -0700, Anton B. Rang wrote: It's not defined by POSIX (or Solaris). You can rely on being able to atomically write a single disk block (512 bytes); anything larger than that is risky. Oh, and it has to be 512-byte aligned. File systems with overwrite semantics (UFS, QFS, etc.) will never guarantee atomicity for more than a disk block, because that's the only guarantee from the underlying disks. I thought UFS and others have a guarantee of atomicity for O_APPEND writes vis-a-vis other O_APPEND writes up to some write size. (Of course, NFS does not have true O_APPEND support, so this wouldn't apply to NFS.) That's mainly what I was thinking of, since the overwrite case would get more complicated. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] How big a write to a regular file is atomic?
and does it vary by filesystem type? I know I ought to know the answer, but it's been a long time since I thought about it, and I must not be looking at the right man pages. And also, if it varies, how does one tell? For a pipe, there's fpathconf() with _PC_PIPE_BUF, but how about for a regular file? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] missing features? Could/should zfs support a new ioctl, constrained if needed
_FIOSATIME - why doesn't zfs support this (assuming I didn't just miss it)? Might be handy for backups. Could/should zfs support a new ioctl, constrained if needed to files of zero size, that sets an explicit (and fixed) blocksize for a particular file? That might be useful for performance in special cases when one didn't necessarily want to specify (or depend on the specification of perhaps) the attribute at the filesystem level. One could imagine a database that was itself tunable per-file to a similar range of blocksizes, which would almost certainly benefit if it used those sizes for the corresponding files. Additional capabilities that might be desirable: setting the blocksize to zero to let the system return to default behavior for a file; being able to discover the file's blocksize (does fstat() report this?) as well as whether it was fixed at the filesystem level, at the file level, or in default state. Wasn't there some work going on to add real per-user (and maybe per-group) quotas, so one doesn't necessarily need to be sharing or automounting thousands of individual filesystems (slow)? Haven't heard anything lately though... This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
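For what it's worth, the closest existing knob is per-filesystem rather than per-file, which is why carving a dataset per database works today (names and sizes are examples):

    zfs create tank/db
    zfs set recordsize=8k tank/db     # match the database's own block size
    zfs get recordsize tank/db

The per-file ioctl being wished for above would let a mixed directory get the same effect without splitting things into separate datasets.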
[zfs-discuss] mirror question
If I create a mirror, presumably if possible I use two or more identically sized devices, since it can only be as large as the smallest. However, if later I want to replace a disk with a larger one, and detach the mirror (and anything else on the disk), replace the disk (and if applicable repartition it), since it _is_ a larger disk (and/or the partitions will likely be larger since they mustn't be smaller, and blocks per cylinder will likely differ, and partitions are on cylinder boundaries), once I reattach everything, I'll now have two different sized devices in the mirror. So far, the mirror is still the original size. But what if I later replace the other disks with ones identical to the first one I replaced? With all the devices within the mirror now the larger size, will the mirror and the zpool of which it is a part expand? And if that won't happen automatically, can it (without inordinate trickery, and online, i.e. without backup and restore) be forced to do so? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
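The sequence being described, with made-up device names:

    zpool replace tank c1t2d0 c1t4d0     # swap in the first larger disk, wait for the resilver
    zpool replace tank c1t3d0 c1t5d0     # then the second

In my (limited) understanding, once every device in the vdev is the larger size the extra space should show up, though on some builds it reportedly takes an export/import of the pool before the new size is picked up; that would be the thing to test on scratch disks before relying on it.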
[zfs-discuss] Re: UFS on zvol: volblocksize and maxcontig
I hope there will be consideration given to providing compatibility with UFS quotas (except that inode limits would be ignored). At least to the point of having edquota(1m), quot(1m), quota(1m), repquota(1m), rquotad(1m), and possibly quotactl(7i) work with zfs (with the exception previously mentioned). OTOH, quotaon(1m)/quotaoff(1m)/quotacheck(1m) may not be needed for support of per-user quotas in zfs (since it will presumably have its own ways of enabling these, and will simply never mess up?) None of which need preclude new interfaces with greater functionality (like both user and group quotas), but where there is similar functionality, IMO it would be easier for a lot of folks if quota maintenance (esp. edquota and reporting) could be done the same way for ufs and zfs. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: zpool split
...such that a snapshot (cloned if need be) won't do what you want? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: A versioning FS
What would a version FS buy us that cron+ zfs snapshots doesn't? Some people are making money on the concept, so I suppose there are those who perceive benefits: http://en.wikipedia.org/wiki/Rational_ClearCase (I dimly remember DSEE on the Apollos; also some sort of versioning file type on (probably long-dead) Harris VOS real-time OS.) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Re: ZFS + rsync, backup on steroids.
Are both of you doing a umount/mount (or export/import, I guess) of the source filesystem before both the first and second test? Otherwise, there might still be a fair bit of cached data left over from the first test, which would give the 2nd an unfair advantage. I'm fairly sure unmounting a filesystem invalidates all cached pages associated with files on that filesystem, as well as any cached [iv]node entries, all of which is needed to ensure both tests are starting from the most similar situation possible. Ideally, all this would even be done in single-user mode, so that nothing else could interfere. If there were a list of precautions to take that would put comparisons like this on firmer ground, it might provide a good starting point for such comparisons to be more than anecdotes, saving time for all concerned, both those attempting to replicate a prior casual observation for reporting, and those looking at the report. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
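One checklist item for that list: bounce the filesystem (or the whole pool) between runs so the second pass can't ride on the first pass's cache. Something like (dataset/pool names are examples):

    zfs umount tank/src && zfs mount tank/src     # drops cached file data for that filesystem
    # or, more thoroughly
    zpool export tank && zpool import tank

plus unmounting/remounting the other filesystem in the comparison the same way, so both start cold.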
[zfs-discuss] Re: Re: Re: SCSI synchronize cache cmd
Filed as 6462690. If our storage qualification test suite doesn't yet check for support of this bit, we might want to get that added; it would be useful to know (and gently nudge vendors who don't yet support it). Is either the test suite, or at least a list of what it tests (which it looks like may more or less track what Solaris requires) publicly available, or could it be made so? Seems to me that if people can independently discover problem hardware, that might make your job easier insofar as they're smarter before they start asking you questions; even more so if they feed back what they find (not unlike the do-it-yourself x86 compatibility testing). This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss