RE: [zfs-discuss] Automounting ? (idea ?)
> So recently, I decided to test out some of the ideas I've been toying
> with, and decided to create 50,000 and 100,000 filesystems. The test
> machine was a nice V20Z with dual 1.8GHz Opterons and 4GB RAM,
> connected to a SCSI 3310 RAID array via two SCSI controllers.

I did a similar test a couple of months ago, albeit on a smaller system, and 'only' 10,000 users. I saw a similar delay at boot time, but also saw a large amount of memory utilisation.

> So ... how about an automounter? Is this even possible? Does
> it exist ?

Around the same time, Casper Dik mentioned the possibility of automounting zfs datasets, as well as the possibility of cool stuff like *creating* zfs datasets with the automounter.

One thing that hasn't been touched on is how one would back up a system when some (or most) filesystems are unmounted most of the time. Is it possible to make a backup and/or take a snapshot of an unmounted dataset (and if not, is that a future possibility)?

Steve.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
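For anyone wanting to repeat the experiment, the dataset creation itself is just a loop. A minimal sketch - the pool and dataset names ('tank', 'tank/users') are invented, and this obviously needs real storage behind it:

```shell
#!/bin/sh
# Sketch of the bulk-creation test: make N child datasets under a
# parent. Assumes an existing pool called 'tank' (name invented).
N=50000
zfs create tank/users
i=1
while [ "$i" -le "$N" ]; do
    zfs create "tank/users/u$i"
    i=$((i + 1))
done
```

Even at a few datasets per second, 50,000 creations take hours, and every one of them is a mount at boot time - which is where the delay and memory use come from.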
RE: [zfs-discuss] Proposal: multiple copies of user data
Darren said:
> Right, that is a very important issue. Would a
> ZFS "scrub" framework do copy on write?
> As you point out, if it doesn't then we still need
> to do something about the old clear text blocks,
> because strings(1) over the raw disk will show them.
>
> I see the desire to have a knob that says "make this
> encrypted now" but I personally believe that it is
> actually better if you can make this choice at the
> time you create the ZFS data set.

I'm not sure that that gets rid of the problem at all. If I have an existing filesystem that I want to encrypt, but I need to create a new dataset to do so, I'm going to create my new, encrypted dataset, then copy my data onto it, then (maybe) delete the old one. If both datasets are in the same pool (which is likely), I'll still not be able to securely erase the blocks that have all my cleartext data on them. The only way to do the job properly would be to overwrite the entire pool, which is likely to be pretty inconvenient in most cases.

So, how about some way to securely erase freed blocks? It could be implemented as a one-off operation that acts on an entire pool, e.g.

    zfs shred tank

which would walk the free block list and overwrite it with random data some number of times. Or it might be more useful to have it as a per-dataset option:

    zfs set shred=32 tank/secure

which could overwrite blocks with random data as they are freed.

I have no idea how expensive this might be (both in development time and in performance hit), but its use might be a bit wider than just dealing with encryption and/or rekeying. I guess that deletion of a snapshot might get a bit expensive, but maybe there's some way that blocks awaiting shredding could be queued up and dealt with at a lower priority...

Steve.
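As a rough illustration of the per-file version of this idea - not the proposed pool-level feature - a userland sketch that overwrites a (non-empty) file in place before unlinking it. Note that on a copy-on-write filesystem this cannot reach blocks that have already been freed or relocated, which is exactly why a 'zfs shred' would need to live in the pool layer:

```shell
#!/bin/sh
# Userland sketch only: rewrite a file in place with pseudo-random
# data a few times, then unlink it. On ZFS, COW means earlier copies
# of the blocks may survive elsewhere - hence the pool-level proposal.
shred_file() {
    f=$1
    passes=$2
    size=$(wc -c < "$f")
    n=1
    while [ "$n" -le "$passes" ]; do
        # conv=notrunc keeps the file the same size on each pass.
        dd if=/dev/urandom of="$f" bs="$size" count=1 conv=notrunc 2>/dev/null
        n=$((n + 1))
    done
    rm -f "$f"
}
```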
RE: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic
> Dunno about eSATA jbods, but eSATA host ports have
> appeared on at least two HDTV-capable DVRs for storage
> expansion (looks like one model of the Scientific Atlanta
> cable box DVRs as well as on the shipping-any-day-now
> Tivo Series 3).
>
> It's strange that they didn't go with firewire since it's
> already widely used for digital video.

Cost? If you use eSATA it's pretty much just a physical connector onto the board, whereas I guess firewire needs a 1394 interface (a couple of dollars?) plus a royalty to all the patent holders. It's probably not much, but I can't see how there can be *any* margin in consumer electronics these days...

Steve.
RE: [zfs-discuss] ZFS Boot Disk
Lori said:
> The limitation is mainly about the *number* of disks
> that can be accessed at one time.
> ...
> But with straight mirroring, there's no such problem
> because any disk in the mirror can supply all of the
> disk blocks needed to boot.

Does that mean that these restrictions will go away once replication can be varied on a per-dataset (or per-file) basis? You could have all your 'essential to boot' files mirrored across all disks, then raidz2 the rest...

Steve.
RE: [zfs-discuss] How to best layout our filesystems
Eric said:
> For U3, these are the performance fixes:
> 6424554 full block re-writes need not read data in
> 6440499 zil should avoid txg_wait_synced() and use dmu_sync()
>         to issue parallel IOs when fsyncing
> 6447377 ZFS prefetch is inconsistant
> 6373978 want to take lots of snapshots quickly ('zfs snapshot -r')
>
> you could perhaps include these two as well:
> 4034947 anon_swap_adjust() should call kmem_reap() if
>         availrmem is low.
> 6416482 filebench oltp workload hangs in zfs
>
> There won't be anything in U3 that isn't already in nevada...

Hi Eric,

Do S10U2 users have to wait for U3 to get these fixes, or are they going to be released as patches before then? I'm presuming that U3 is scheduled for early 2007...

Steve.
RE: [zfs-discuss] Expanding raidz2
Jeff Bonwick said:
> RAID-Z takes a different approach. We were designing a filesystem
> as well, so we could make the block pointers as semantically rich
> as we wanted. To that end, the block pointers in ZFS contain data
> layout information. One nice side effect of this is that we don't
> need fixed-width RAID stripes. If you have 4+1 RAID-Z, we'll store
> 128k as 4x32k plus 32k of parity, just like any RAID system would.
> But if you only need to store 3 sectors, we won't do a partial-stripe
> update of an existing 5-wide stripe; instead, we'll just allocate
> four sectors, and store the data and its parity. The stripe width
> is variable on a per-block basis. And, although we don't support it
> yet, so is the replication model. The rule for how to reconstruct
> a given block is described explicitly in the block pointer, not
> implicitly by the device configuration.

Thanks for the explanation - a great help in understanding how all this stuff fits together. Unfortunately I'm now less sure about why you cannot 'just' add another disk to a RAID-Z pool. Is this just a policy decision for the sake of keeping it simple, rather than a technical restriction?

> If your free disk space might be used for single-copy data,
> or might be used for mirrored data, then how much free space
> do you have? Questions like that need to be answered, and
> answered in ways that make sense.

They need to be answered, but as the storage is scaled up we don't need any extra accuracy - knowing that a filesystem is somewhere around 80% full is just fine. I really don't need to care precisely how many blocks are free, and it actually hinders me if I'm given the exact information (I have to scale it into a number of GB, or a percentage of space used).
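Jeff's 4+1 example can be captured in a toy allocation model - this is just my reading of the post, not actual ZFS code:

```shell
#!/bin/sh
# Toy model of variable-width RAID-Z allocation: for a block needing
# 'data' sectors on an 'ndisks'-wide single-parity group, one parity
# sector is allocated per row of up to (ndisks - 1) data sectors.
raidz_alloc() {
    data=$1
    ndisks=$2
    row=$((ndisks - 1))
    parity=$(( (data + row - 1) / row ))   # ceil(data / row)
    echo $((data + parity))
}
```

For 3 sectors on a 5-disk group this allocates 4 sectors (3 data + 1 parity); for 4 units of 32k it allocates 5 units, matching the 128k example.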
The fact that we then pretty much ignore exact block counts leads me to think that we don't actually need to care about exactly how many blocks are free on a disk - so if I store N blocks of data, it's acceptable for the number of free blocks to change by something different from N. And once data starts to be compressed, the direct correlation between the size of a file and the amount of disk space it uses goes away in any case.

All pretty exciting - how long are we going to have to wait for this?

Steve.
RE: [zfs-discuss] Expanding raidz2
> > I guess that could be made to work, but then the data on
> > the disk becomes much (much much) more difficult to
> > interpret because you have some rows which are effectively
> > one width and others which are another (ad infinitum).
>
> How do rows come into it? I was just assuming that each
> (existing) in-use disk block was pointed to by a FS block,
> which was tracked by other structures. I was guessing that
> adding space (effectively extending the "rows") wasn't
> going to be noticed for accessing old data.

That was my assumption too. I had the impression from the initial information (I nearly said hype) about ZFS that the distinctions between RAID levels were to become less clear, i.e. that you could have some files stored with higher resilience than others.

Maybe this is a dumb question (I've never written a filesystem), but is there a fundamental reason why you cannot have some files mirrored, others as raidz, and others with no resilience? This would allow a pool to initially exist on one disk, then gracefully change between different resilience strategies as you add disks and the requirements change.

Apologies if this is pie in the sky.

Steve.
RE: [zfs-discuss] ZFS needs a viable backup mechanism
Mike said:
> 3) ZFS ability to recognize duplicate blocks and store only one copy.
> I'm not sure the best way to do this, but my thought was to have ZFS
> remember what the checksums of every block are. As new blocks are
> written, the checksum of the new block is compared to known checksums.
> If there is a match, a full comparison of the block is performed. If
> it really is a match, the data is not really stored a second time. In
> this case, you are still backing up and restoring 50 TB.

I've done a limited version of this on a disk-to-disk backup system that we use - I use rsync with --link-dest to preserve multiple copies in a space-efficient way, but I found that glitches occasionally caused the links to be lost, so I have a job I run from time to time that looks for identical files and hard links them to each other.

The ability to get this done in ZFS would be pretty neat, and presumably COW would ensure that there was no danger of a change to one copy affecting any others. Even if there were severe restrictions on how it worked - e.g. only files with the same relative paths were considered, or it was batch-only instead of live and continuous - it would still be pretty powerful.

Steve.
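For what it's worth, the batch hard-linking job I described amounts to something like this - a sketch only, which assumes filenames without whitespace and uses plain cksum where a real job would want a cryptographic hash:

```shell
#!/bin/sh
# Batch file-level dedup sketch: checksum every regular file, do a
# full byte-for-byte comparison on a checksum+size match (as Mike
# suggests for blocks), then replace the duplicate with a hard link.
dedup_dir() {
    dir=$1
    find "$dir" -type f -exec cksum {} + | sort -n |
    awk '{ key = $1 " " $2
           if (key == prev) print keep, $3
           else { prev = key; keep = $3 } }' |
    while read -r keep dup; do
        # Checksums can collide; verify the contents really match.
        if cmp -s "$keep" "$dup"; then
            ln -f "$keep" "$dup"
        fi
    done
}
```

After a run, identical files under the directory share an inode, so the space is only consumed once - much like --link-dest, but repairable after the fact.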
RE: [zfs-discuss] ZFS needs a viable backup mechanism
> If you are going to use Veritas NetBackup why not use the
> native Solaris client ?

I don't suppose anyone knows if Networker will become zfs-aware at any point? e.g.

  - backing up properties
  - backing up an entire pool as a single save set
  - efficient incrementals (something similar to "zfs send -i")

The ability to back stuff up well would make widespread adoption easier, especially if thumper lives up to expectations.

Steve.
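For reference, the native incremental mechanism that a zfs-aware Networker would be competing with looks roughly like this - pool, dataset and host names are invented, and it needs real pools on both ends:

```shell
# Hypothetical names throughout.
zfs snapshot tank/home@monday
# ...a day's worth of changes...
zfs snapshot tank/home@tuesday
# Send only the blocks that changed between the two snapshots:
zfs send -i tank/home@monday tank/home@tuesday | \
    ssh backuphost zfs receive backup/home
```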
[zfs-discuss] what to put on zfs
A slightly different tack now... which filesystems is it a good (or bad) idea to put on ZFS?

  root - NO (not yet anyway)
  home - YES (although the huge number of mounts still scares me a bit)
  /usr - possible?
  /var - possible?
  swap - no?

Is there any advantage in having multiple zpools over just having one and allocating all filesystems out of it? Obviously if you wanted (for example) /export/home to be raidz and /usr to be a mirror you would have to, but are there other considerations beyond that?

I'm thinking that ZFS frees me up from getting the sizing 'right' at install time, i.e. big enough that I don't have to resize later, which inevitably means at least one filesystem being far bigger than it needs to be.

Steve.
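To make the 'one pool, many filesystems' point concrete, a sketch with invented device and pool names - not runnable without real disks:

```shell
# Illustrative only - device, pool and dataset names are invented.
# One raidz pool for bulk data, plus a separate mirror where a
# different replication level is wanted:
zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0
zpool create fast mirror c2t0d0 c2t1d0
# Filesystems are carved out of the shared pool with no up-front
# sizing; quotas and reservations can be adjusted at any time:
zfs create tank/export
zfs create tank/export/home
zfs set quota=10g tank/export/home
```

This is exactly the sizing freedom mentioned above: no filesystem owns a fixed slice, so nothing has to be over-provisioned at install time.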
RE: [zfs-discuss] Re: Re: Supporting ~10K users on ZFS
Casper said:
> You can have composite mounts (multiple nested mounts)
> but that is essentially a single automount entry so it
> can't be overly long, I believe.

I've seen that in the man page, but I've never managed to find a use for it! What I'd *like* to be able to do is have a map that amounts to:

  00  -ro \
      /   keck:/export/home/00   /*  -rw  /export/home/00/&
  01  -ro \
      /   keck:/export/home/01   /*  -rw  /export/home/01/&
  ...

This doesn't work - I think it's beyond the capabilities of automountd. I don't even think an executable map would help.

I can see that I could use an executable map to preserve the /export/home/NN/username layout on the server, but have /home/username on the client - we were considering this on a different system here (where we're encountering similar problems with a Panasas fileserver).

Thanks

Steve.
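An executable-map sketch of that /export/home/NN idea: automountd runs the map script with the map key (the username) as $1 and reads the entry from stdout. The server name 'keck' and the checksum-mod-100 choice of NN bucket are invented for illustration - a real site would more likely look NN up in a flat file or a directory service:

```shell
#!/bin/sh
# Executable automount map sketch for /home. Given a username, emit a
# map entry pointing at the server-side NN bucket. The hash-based
# bucketing below is purely an assumption.
gen_map_entry() {
    user=$1
    sum=$(printf '%s' "$user" | cksum | awk '{print $1}')
    nn=$(printf '%02d' $((sum % 100)))
    echo "-rw keck:/export/home/$nn/$user"
}

if [ -n "$1" ]; then
    gen_map_entry "$1"
fi
```

The client then sees /home/username while the server keeps its two-level /export/home/NN/username layout.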