Re: [zfs-discuss] Server upgrade
On Thu, February 16, 2012 11:18, Paul Kraus wrote:
> On Thu, Feb 16, 2012 at 11:42 AM, David Dyer-Bennet wrote:
>> I'm seriously thinking of going Nexenta, as I think it would let me be
>> a little less of a sysadmin. Solaris 11 express is tempting in its own
>> way though, if I decide the price is tolerable.
>
> I looked at the Nexenta route, and while it is _very_ attractive, I
> need my home server to function as DHCP and DNS server as well (and a
> couple other services would be nice as well). Since Nexenta is a
> storage appliance, I could not go that route and get what I needed
> without hacking into it.

Ah, that might be a problem. Not those specific services currently, but I
do now and then run things. MRTG, maybe Nagios, are on the list to do
(though it's so much harder to get anything like that going on Solaris,
I'm tempted to run a Linux virtual server; that would be on the same box
though, so still a problem).

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Server upgrade
On Wed, February 15, 2012 18:06, Brandon High wrote:
> On Wed, Feb 15, 2012 at 9:16 AM, David Dyer-Bennet wrote:
>> Is there an upgrade path from (I think I'm running Solaris Express) to
>> something modern? (That could be an Oracle distribution, or the free
>
> There *was* an upgrade path from snv_134 to snv_151a (Solaris 11
> Express) but I don't know if Oracle still supports it. There was an
> intermediate step or two along the way (snv_134b I think?) to move
> from OpenSolaris to Oracle Solaris.
>
> As others mentioned, you could jump to OpenIndiana from your current
> version. You may not be able to move between OI and S11 in the future,
> so it's a somewhat important decision.

Thanks. Given the pricing for commercial Solaris versions, I don't think
moving to them is likely to ever be important to me. It looks like OI and
Nexenta are the viable choices I have to look at.
Re: [zfs-discuss] Server upgrade
On Thu, February 16, 2012 13:31, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of David Dyer-Bennet
>>
>> This is already getting useful; "which has never worked for me" for
>> example is the sort of observation I find informative, since I've been
>> seeing your name around here for some time and have the general
>> impression that you're not stupid or incompetent.
>
> Just because I talk a lot doesn't mean I'm not stupid or incompetent. ;-)

I resemble that remark! But, slightly more seriously, I've read what you
said, not just noticed the volume :-).

> "Never worked for me," in this case, basically means I tried upgrading
> from one opensolaris to another... which went horribly wrong... And
> even when applying system updates (paid commercial solaris 10 support,
> applying security patches etc) those often cause problems too. But I
> wouldn't call them "horribly wrong."

I've gotten at least that to work a few times. But for me, keeping up
with OS upgrades is one of the most important sysadmin tasks. Otherwise,
you're leaving unpatched vulnerabilities sitting around.

>> I was going to say the commercial version wasn't an option -- but on
>> consideration, I haven't done the research to determine that. So
>> that's a task (how hard can it be to find out how much they want?).
>
> You mean, how much it costs? http://oracle.com click on "Store," and
> "Solaris." Looks like $1,000 per socket per year for 1-4 sockets.

You beat me to it. And if that's the order of magnitude, then I was right
the first time: the commercial versions are completely out of the
question. I might, if I felt really friendly towards Oracle, consider a
one-shot payment of 1/10 that, or maybe a little more :-).
Re: [zfs-discuss] Server upgrade
On Thu, February 16, 2012 08:54, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of David Dyer-Bennet
>>
>> While I'm not in need of upgrading my server at an emergency level,
>> I'm starting to think about it -- to be prepared (and an upgrade could
>> be triggered by a failure at this point; my server dates to 2006).
>
> There are only a few options for you to consider. I don't know which
> ones support encryption, or which ones offer an upgrade path from your
> version of opensolaris, but I figure you can probably easily evaluate
> each of the options for your own purposes.
>
> No matter which you use, I assume you will be exporting the data pool,
> and later importing it. But the OS will either need to be wiped and
> reinstalled from scratch, or obviously, follow your upgrade path (which
> has never worked for me; I invariably end up wiping the OS and
> reinstalling. Good thing I keep documentation about how I configure my
> OS.)

This is already getting useful; "which has never worked for me" for
example is the sort of observation I find informative, since I've been
seeing your name around here for some time and have the general
impression that you're not stupid or incompetent.

Yeah, I'll try to export and import the pool. AND I'll have three current
backups on external drives, at least one out of the house and at least
one in the house :-). I'm kind of fond of this data, and wouldn't like
anything to happen to it (I could recover some of the last decade of
photography from optical disks, with a lot of work, and the online copies
would remain, but those aren't high-res).

> Nexenta, OpenIndiana, Solaris 11 Express (free version only permitted
> for certain uses, no regular updates available), or commercial Solaris.
>
> If you consider paying for solaris - at Oracle, you just pay them for
> "An OS" and they don't care which one you use. Could be oracle linux,
> solaris, or solaris express. I would recommend solaris 11 express
> based on personal experience. It gets bugfixes and new features sooner
> than commercial solaris.

I was going to say the commercial version wasn't an option -- but on
consideration, I haven't done the research to determine that. So that's a
task (how hard can it be to find out how much they want?).

Listing the options is extremely useful, in fact. Even though I've heard
of all of them, seeing how you group things helps me too.

I'm seriously thinking of going Nexenta, as I think it would let me be a
little less of a sysadmin. Solaris 11 express is tempting in its own way
though, if I decide the price is tolerable.
[zfs-discuss] Server upgrade
While I'm not in need of upgrading my server at an emergency level, I'm
starting to think about it -- to be prepared (and an upgrade could be
triggered by a failure at this point; my server dates to 2006).

I'm actually more concerned with software than hardware. My load is
small, and the current hardware is handling it with no problem. I don't
see myself as a candidate for dedup, so I don't need to add huge
quantities of RAM. I'm handling compression on backups just fine (the USB
external disks are the choke-point, so compression actually speeds up the
backups).

I'd like to be on a current software stream that I can easily update with
bug-fixes and new features. The way I used to do that got broken in the
Oracle takeover.

I'm interested in encryption for my backups, if that's functional (and
safe) in current software versions. I take copies off-site, so that's a
useful precaution.

Whatever I do, I'll of course make sure my backups are ALL up-to-date and
at least one is back off-site before I do anything drastic.

Is there an upgrade path from (I think I'm running Solaris Express) to
something modern? (That could be an Oracle distribution, or the free
software fork, or some Nexenta distribution; my current data pool is
1.8T, and I don't expect it to grow terribly fast, so the fully-featured
free version fits my needs, for example.) Upgrading might perhaps save me
from changing all the user passwords (half a dozen, not a huge problem)
and the software packages I've added. (uname -a says "SunOS fsfs 5.11
snv_134 i86pc i386 i86pc".)

Or should I just export my pool and do a from-scratch install of
something? (Then recreate the users and install any missing software.
I've got some cron jobs, too.)

AND, what "something" should I upgrade to or install? I've tried a couple
of times to figure out the alternatives and it's never really clear to me
what my good options are.
Re: [zfs-discuss] slow zfs send/recv speed
On Tue, November 15, 2011 17:05, Anatoly wrote:
> Good day,
>
> The speed of send/recv is around 30-60 MBytes/s for initial send and
> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
> to 100+ disks in pool. But the speed doesn't vary in any degree. As I
> understand 'zfs send' is a limiting factor. I did tests by sending to
> /dev/null. It worked out too slow and absolutely not scalable.
> None of cpu/memory/disk activity were in peak load, so there is room
> for improvement.

What you're probably seeing with incremental sends is that the disks
being read are hitting their IOPS limits. zfs send does random reads all
over the place -- every block that's changed since the last incremental
send is read, in TXG order. So that's essentially random reads all over
the disk.
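The send-to-/dev/null test can be paired with physical I/O stats to check this. A rough sketch (pool and snapshot names `tank`, `tank@a`, `tank@b` are examples, not from the thread):

```shell
# Baseline: how fast can a full stream be generated, network aside?
time zfs send tank@b > /dev/null

# Incremental: reads every block changed between @a and @b, in TXG
# (write) order, so expect near-random reads across the whole pool:
time zfs send -i tank@a tank@b > /dev/null

# Meanwhile, in another terminal, watch per-device physical I/O.
# If %b (busy) is pegged while kr/s is low, the send is IOPS-bound:
iostat -xn 5
```

Comparing the two `time` results against `iostat` output during each run should show whether the incremental case is seek-limited rather than bandwidth-limited.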
Re: [zfs-discuss] Remove corrupt files from snapshot
On Tue, November 15, 2011 10:07, sbre...@hotmail.com wrote:
> Would it make sense to do "zfs scrub" regularly and have a report sent,
> i.e. once a day, so discrepancy would be noticed beforehand? Is there
> anything readily available in the FreeBSD ZFS package for this?

If you're not scrubbing regularly, you're losing out on one of the key
benefits of ZFS. In nearly all fileserver situations, a good amount of
the content is essentially archival: infrequently accessed, but important
now and then. (In my case it's my collection of digital and digitized
photos.)

A weekly scrub combined with a decent backup plan will detect bit-rot
before the backups with the correct data cycle into the trash (and, with
redundant storage like mirroring or RAID, the scrub will probably be able
to fix the error without resorting to restoring files from backup).
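A weekly scrub-and-report job can be a few lines of cron-driven shell. This is only a sketch: the pool name, script path, and use of the system `mail` command are assumptions, and the polling loop is crude but portable. (Note the command is `zpool scrub`, not `zfs scrub`.)

```shell
#!/bin/sh
# Hypothetical /root/scrub-report.sh; run from root's crontab, e.g.
#   0 3 * * 0  /root/scrub-report.sh
POOL=tank                       # example pool name -- adjust

zpool scrub "$POOL"

# Poll until the scrub finishes; a home-sized pool typically
# completes within hours, so a 10-minute poll interval is fine.
while zpool status "$POOL" | grep -q "scrub in progress"; do
    sleep 600
done

# Mail the full status (including any checksum errors found/repaired).
zpool status -v "$POOL" | mail -s "weekly scrub: $POOL" root
```

Any errors the scrub repairs from redundancy show up in the CKSUM column of the mailed report, which is exactly the early warning the poster is asking about.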
Re: [zfs-discuss] slow zfs send/recv speed
On Tue, November 15, 2011 20:08, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Anatoly
>>
>> The speed of send/recv is around 30-60 MBytes/s for initial send and
>> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
>
> I suggest watching zpool iostat before, during, and after the send to
> /dev/null. Actually, I take that back - zpool iostat seems to measure
> virtual IOPS, as I just did this on my laptop a minute ago, I saw 1.2k
> ops, which is at least 5-6x higher than my hard drive can handle, which
> can only mean it's reading a lot of previously aggregated small blocks
> from disk, which are now sequentially organized on disk. How do you
> measure physical iops? Is it just regular iostat? I have seriously put
> zero effort into answering this question (sorry.)
>
> I have certainly noticed a delay in the beginning, while the system
> thinks about stuff for a little while to kick off an incremental... And
> it's acknowledged and normal that incrementals are likely fragmented
> all over the place so you could be IOPS limited (hence watching the
> iostat).
>
> Also, whenever I sit and watch it for long times, I see that it varies
> enormously. For 5 minutes it will be (some speed), and for 5 minutes it
> will be 5x higher...
>
> Whatever it is, it's something we likely are all seeing, but probably
> just ignoring. If you can find it in your heart to just ignore it too,
> then great, no problem. ;-) Otherwise, it's a matter of digging in and
> characterizing to learn more about it.

I see rather variable I/O stats while sending incremental backups. The
receiver is a USB disk, so fairly slow, but I get 30MB/s in a good
stretch. I'm compressing the ZFS filesystem on the receiving end, but
much of my content is already-compressed photo files, so it doesn't make
a huge difference. Helps some, though, and at 30MB/s there's no shortage
of CPU horsepower to handle the compression.

The raw files are around 12MB each, probably not fragmented much (they're
just copied over from memory cards). For a small number of the files,
there's a Photoshop file that's much bigger (sometimes more than 1GB, if
it's a stitched panorama with layers of changes). And then there are
sidecar XMP files, mostly two per image, and for most of them
web-resolution images, around 100kB.
Re: [zfs-discuss] Adding mirrors to an existing zfs-pool]
On Tue, July 26, 2011 09:55, Cindy Swearingen wrote:
>
> Subject: Re: [zfs-discuss] Adding mirrors to an existing zfs-pool
> Date: Tue, 26 Jul 2011 08:54:38 -0600
> From: Cindy Swearingen
> To: Bernd W. Hennig
> References: <342994905.11311662049567.JavaMail.Twebapp@sf-app1>
>
> Hi Bernd,
>
> If you are talking about attaching 4 new disks to a non-redundant pool
> with 4 disks, and then you want to detach the previous disks, then yes,
> this is possible and a good way to migrate to new disks.
>
> The new disks must be the equivalent size or larger than the original
> disks.
>
> See the hypothetical example below.
>
> If you mean something else, then please provide your zpool status
> output.
>
> Thanks,
>
> Cindy
>
> # zpool status tank
>   pool: tank
>  state: ONLINE
>   scan: resilvered 1018K in 0h0m with 0 errors on Fri Jul 22 15:54:52 2011
> config:
>
>         NAME      STATE     READ WRITE CKSUM
>         tank      ONLINE       0     0     0
>           c4t1d0  ONLINE       0     0     0
>           c4t2d0  ONLINE       0     0     0
>           c4t3d0  ONLINE       0     0     0
>           c4t4d0  ONLINE       0     0     0
>
> # zpool attach tank c4t1d0 c6t1d0
> # zpool attach tank c4t2d0 c6t2d0
> # zpool attach tank c4t3d0 c6t3d0
> # zpool attach tank c4t4d0 c6t4d0
>
> The above syntax will create 4 mirrored pairs of disks.

I was somewhat surprised when I first learned of this. In my head, I now
remember it as "a single disk in ZFS seems to be treated as a one-disk
mirror". Previously, in my head, single disks were very different objects
from mirrors!

I'm still impressed by the ability to attach and detach arbitrary numbers
of disks to mirrors. It makes upgrading mirrored disks very, very safe,
since I can perform the entire procedure without ever reducing redundancy
below my starting point (using the classic attach-new, resilver,
detach-old sequence, repeated for however many disks were in the original
mirror).
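The attach-new, resilver, detach-old upgrade mentioned in the thread, sketched for one mirrored pair (device names follow Cindy's hypothetical example; `c4t1d0` is the old disk, `c6t1d0` the new one):

```shell
# Attach the new disk alongside the old one; the vdev becomes a
# three-way mirror while resilvering, so redundancy never drops
# below the starting two-way level.
zpool attach tank c4t1d0 c6t1d0

# Wait: do not detach until "scan:" reports the resilver completed.
zpool status tank

# Only now drop the old disk; the pair is back to a two-way mirror,
# entirely on new hardware.
zpool detach tank c4t1d0
```

Repeat per disk in the original mirror. Once every member of a vdev has been replaced with a larger disk, the vdev's capacity can grow (on older releases this happened on export/import; newer ones have an autoexpand property).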
Re: [zfs-discuss] SSD vs "hybrid" drive - any advice?
On Mon, July 25, 2011 10:03, Orvar Korvar wrote:
> "There is at least a common perception (misperception?) that devices
> cannot process TRIM requests while they are 100% busy processing other
> tasks."
>
> Just to confirm; SSD disks can do TRIM while processing other tasks?

"Processing" the request just means flagging the blocks, though, right?
And the actual benefits only accrue if the garbage collection / block
reshuffling background tasks get a chance to run?
Re: [zfs-discuss] Quick zfs send -i performance questions
On Tue, May 3, 2011 19:39, Rich Teer wrote:
> I'm playing around with nearline backups using zfs send | zfs recv.
> A full backup made this way takes quite a lot of time, so I was
> wondering: after the initial copy, would using an incremental send
> (zfs send -i) make the process much quicker, because only the stuff
> that had changed between the previous snapshot and the current one
> would be copied? Is my understanding of incremental zfs send correct?

Yes, that works. In my setup, a full backup takes 6 hours (about 800GB of
data to an external USB 2 drive); the incremental takes maybe 20 minutes,
even if I've added several gigabytes of images.

> Also related to this is a performance question. My initial test
> involved copying a 50 MB zfs file system to a new disk, which took 2.5
> minutes to complete. That strikes me as being a bit high for a mere
> 50 MB; are my expectations realistic, or is it just because of my very
> budget-conscious set up? If so, where's the bottleneck?

In addition to issues others have mentioned, the way incremental send
works, it follows the order the blocks were written in rather than disk
order, so that can sometimes be bad.
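The nearline-backup cycle being discussed can be sketched in a few commands. Pool and snapshot names (`tank`, `backup`, the dates) are examples, not from the thread:

```shell
# One-time full copy: snapshot everything and replicate it.
zfs snapshot -r tank@2011-05-03
zfs send -R tank@2011-05-03 | zfs recv -Fdu backup

# Each later run sends only blocks changed between the two snapshots,
# which is why it finishes in minutes instead of hours.
zfs snapshot -r tank@2011-05-04
zfs send -R -i tank@2011-05-03 tank@2011-05-04 | zfs recv -du backup
```

The `-R` replication flag carries descendant filesystems, snapshots, and properties; `-d` on the receive side maps source dataset names under the backup pool, and `-u` keeps the received filesystems from being mounted over anything.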
Re: [zfs-discuss] Network video streaming [Was: Re: X4540 no next-gen product?]
On 04/08/2011 07:22 PM, J.P. King wrote:
> No, I haven't tried a S7000, but I've tried other kinds of network
> storage and from a design perspective, for my applications, it doesn't
> even make a single bit of sense. I'm talking about high-volume
> real-time video streaming, where you stream 500-1000 (x 8Mbit/s) live
> streams from a machine over UDP. Having to go over the network to fetch
> the data from a different machine is kind of like building a proxy
> which doesn't really do anything - if the data is available from a
> different machine over the network, then why the heck should I just put
> another machine in the processing path? For my applications, I need a
> machine with as few processing components between the disks and network
> as possible, to maximize throughput, maximize IOPS and minimize latency
> and jitter.

Amusing history here -- the "Thumper" was developed at Kealia
specifically for their streaming video server. Sun then bought them, and
continued the video server project until Oracle ate them (the Sun
Streaming Video Server). That product supported 80,000 (not a typo)
4-megabit/sec video streams if fully configured. (Not off a single
Thumper, though, I don't believe.)

However, there was a custom hardware board handling streaming, into
multiple line cards with multiple 10G optical ethernet interfaces. And a
LOT of buffer memory; the card could support 2TB of RAM, though I believe
real installations were using 512GB.

Data got from the Thumpers to the streaming board over Ethernet, though.
In big chunks -- 10MB maybe? (Been a while; I worked on the user
interface level, but had little to do with the streaming hardware.)
Re: [zfs-discuss] ZFS send/receive to Solaris/FBSD/OpenIndiana/Nexenta VM guest?
On Tue, April 5, 2011 14:38, Joe Auty wrote:
> Migrating to a new machine I understand is a simple matter of ZFS
> send/receive, but reformatting the existing drives to host my existing
> data is an area I'd like to learn a little more about. In the past I've
> asked about this and was told that it is possible to do a send/receive
> to accommodate this, and IIRC this doesn't have to be to a ZFS server
> with the same number of physical drives?

The internal structure of the pool (how many vdevs, and what kind) is
irrelevant to zfs send / receive. So I routinely send from a pool of 3
mirrored pairs of disks to a pool of one large drive, for example (it's
how I do my backups). I've also gone the other way once :-( (It's good to
have backups). I'm not 100.00% sure I understand what you're asking; does
that answer it?

Mind you, this can be slow. On my little server (under 1TB filled) the
full backup takes about 7 hours (largely because the single large
external drive is a USB drive; the bottleneck is the USB). Luckily an
incremental backup is rather faster.

> How about getting a little more crazy... What if this entire server
> temporarily hosting this data was a VM guest running ZFS? I don't
> foresee this being a problem either, but with so much at stake I
> thought I would double check :) When I say temporary I mean simply
> using this machine as a place to store the data long enough to wipe the
> original server, install the new OS to the original server, and restore
> the data using this VM as the data source.

I haven't run ZFS extensively in VMs (mostly just short-lived small test
setups). From my limited experience, and what I've heard on the list,
it's solid and reliable, though, which is what you need for that
application.

> Also, more generally, is ZFS send/receive mature enough that when you
> do data migrations you don't stress about this? Piece of cake? The
> difficulty of this whole undertaking will influence my decision and the
> whole timing of all of this.

A full send / receive has been reliable for a long time. With a real
(large) data set, it's often a long run. It's often done over a network,
and any network outage can break the run, and at that point you start
over, which can be annoying. If the servers themselves can't stay up for
10 or 20 hours, you presumably aren't ready to put them into production
anyway :-).

> I'm also thinking that a ZFS VM guest might be a nice way to maintain a
> remote backup of this data, if I can install the VM image on a
> drive/partition large enough to house my data. This seems like it would
> be a little less taxing than rsync cronjobs?

I'm a big fan of rsync, in cronjobs or wherever. What it won't do is
properly preserve ZFS ACLs and ZFS snapshots, though. I moved from using
rsync to using zfs send/receive for my backup scheme at home, and had
considerable trouble getting that all working (using incremental
send/receive when there are dozens of snapshots new since last time). But
I did eventually get up to recent enough code that it's working reliably
now.

If you can provision big enough data stores for your VM to hold what you
need, that seems a reasonable approach to me, but I haven't tried
anything much like it, so my opinion is, if you're very lucky, maybe
worth what you paid for it.
Re: [zfs-discuss] [illumos-Developer] zfs incremental send?
On Tue, March 29, 2011 07:39, Richard Elling wrote:
> On Mar 29, 2011, at 3:10 AM, Roy Sigurd Karlsbakk wrote:
>
>> ----- Original Message -----
>>> On 2011-Mar-29 02:19:30 +0800, Roy Sigurd Karlsbakk wrote:
>>>> Is it (or will it) be possible to do a partial/resumable zfs
>>>> send/receive? If having 30TB of data and only a gigabit link, such
>>>> transfers takes a while, and if interrupted, will require a
>>>> re-transmit of all the data.
>>>
>>> zfs send/receive works on snapshots: The smallest chunk of data that
>>> can be sent/received is the delta between two snapshots. There's no
>>> way to do a partial delta - defining the endpoint of a partial
>>> transfer or the starting point for resumption is effectively a
>>> snapshot.
>>
>> I know that's how it works, I'm merely pointing out that changing this
>> to something resumable would be rather nice, since an initial transfer
>> of 30 or 300 terabytes may easily be interrupted.
>
> In the UNIX tradition, the output and input are pipes. This allows you
> to add whatever transport you'd like for moving the bits. There are
> many that offer protection against network interruptions. Look for
> more, interesting developments in this area soon...

Name three :-). I don't happen to have run into any that I can remember.

And in any case, that doesn't actually help my situation, where I'm
running both processes on the same box (the receive is talking to an
external USB disk that I disconnect and take off-site after the receive
is complete). A system crash (or power shutdown, or whatever) during this
process seems to make the receiving pool unimportable. Possibly I could
use recovery tricks to step back a TXG or two until I get something
valid, then manually remove the snapshots added to get back to the
initial state, and then start the incremental again; in practice, I
haven't made that work, and I just do another full send to start over (7
hours, not too bad really).

Anyway, the incremental send/receive seems to be the fragile point in my
backup scheme as well.
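For what the "add a transport" suggestion is worth, one tool often cited on this list for network sends is mbuffer. A hedged sketch (host, port, and sizes are examples): note that mbuffer decouples the sender's bursty reads from the receiver's writes and survives brief stalls, but it does not make a broken stream truly resumable; a failed transfer still restarts from the snapshot boundary.

```shell
# Receiver: listen on a TCP port, buffer 1GB in RAM, feed zfs recv.
mbuffer -s 128k -m 1G -I 9090 | zfs recv -Fdu backup

# Sender: pipe the incremental stream into mbuffer toward the receiver
# (hypothetical host name "recvhost").
zfs send -i tank@a tank@b | mbuffer -s 128k -m 1G -O recvhost:9090
```

This does nothing for the same-box USB case above, where both ends share one failure domain anyway.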
Re: [zfs-discuss] Good SLOG devices?
On Tue, March 1, 2011 10:35, Garrett D'Amore wrote:
> a) do you need an SLOG at all? Some workloads (asynchronous ones) will
> never benefit from an SLOG.

I've been fighting the urge to maybe do something about ZIL (which is
what we're talking about here, right?). My load is CIFS, not NFS (so not
synchronous, right?), but there are a couple of areas that are
significant to me where I do decent-size (100MB to 1GB) sequential writes
(to newly-created files).

On the other hand, when those writes seem to me to be going slowly, the
disk access lights aren't mostly on, suggesting that the disk may not be
what's holding me up. I can test that by saving to local disk and
comparing times, also maybe running zpool iostat.

This is a home system, lightly used; the performance issue is me sitting
waiting while big Photoshop files save. So it's of some interest to me
personally, and not at ALL like what performance issues on NAS usually
look like. It's on a UPS, so I'm not terribly worried about losses on
power failure; and I'd just lose my work since the last save, generally,
at worst.

I might not believe the disk access lights on the box (Chenbro chassis,
with two 4-drive hot-swap bays for the data disks; driven off the
motherboard SATA plus a Supermicro 8-port SAS controller with SAS-to-SATA
cables). In doing a drive upgrade just recently, I got rather confusing
results with the lights; perhaps the controller or the drive model made a
difference in when the activity lights came on.

The VDEVs in the pool are mirror pairs. It's been expanded twice by
adding VDEVs and once by replacing devices in one VDEV. So the load is
probably fairly unevenly spread across them just now.

My desktop connects to this server over gigabit ethernet (through one
switch; the boxes sit next to each other on a shelf over my desk).

I'll do more research before spending money. But as a question of general
theory, should a decent separate intent log device help for a single-user
sequential write sequence in the 100MB to 1GB size range?
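For reference, if testing did show a sync-write bottleneck, adding (and backing out) a separate log device is cheap to try. A sketch with example pool and device names:

```shell
# Attach a dedicated log device (slog) to the pool.
zpool add tank log c7t0d0

# Or a mirrored slog, so a log-device failure can't lose in-flight
# sync writes:
# zpool add tank log mirror c7t0d0 c7t1d0

# Log devices can be removed again on pool version 19 and later,
# so the experiment is reversible:
zpool remove tank c7t0d0
```

Since async writes (the normal CIFS case) are committed from RAM at transaction-group intervals, the slog would only help if the client or server is forcing synchronous semantics on those big saves.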
Re: [zfs-discuss] Good SLOG devices?
On Tue, March 1, 2011 16:32, Rocky Shek wrote:
> David,
>
> The STEC/DataON ZeusRAM (Z4RZF3D-8UC-DNS) SSD is now available to users
> in the channel.
>
> It is an 8GB DDR3 RAM-based SAS SSD protected by a supercapacitor and
> 16GB of NVRAM.
>
> It is designed for ZFS ZIL, with low latency.
>
> http://dataonstorage.com/zeusram

Says "call for price". I know what that means: it means "If you have to
ask, you can't afford it."
Re: [zfs-discuss] Drive i/o anomaly
On Wed, February 9, 2011 04:51, Matt Connolly wrote:
> Nonetheless, I still find it odd that the whole io system effectively
> hangs up when one drive's queue fills up. Since the purpose of a mirror
> is to continue operating in the case of one drive's failure, I find it
> frustrating that the system slows right down so much because one
> drive's i/o queue is full.

I see what you're saying. But I don't think mirror systems really try to
handle asymmetric performance. They either treat the drives
equivalently, or else they decide one of them is "broken" and don't use
it at all.
Re: [zfs-discuss] ZFS/Drobo (Newbie) Question
On 2011-02-08 21:39, Brandon High wrote:
> On Tue, Feb 8, 2011 at 12:53 PM, David Dyer-Bennet wrote:
>> Wait, are you saying that the handling of errors in RAIDZ and mirrors
>> is completely different? That it dumps the mirror disk immediately,
>> but keeps trying to get what it can from the RAIDZ disk? Because
>> otherwise, your assertion doesn't seem to hold up.
>
> I think he meant that if one drive in a mirror dies completely, then
> any single read error on the remaining drive is not recoverable. With
> raidz2 (or a 3-way mirror for that matter), if one drive dies
> completely, you still have redundancy.

Sure, a 2-way mirror has only 100% redundancy; if one dies, no more
redundancy. Same for a RAIDZ -- if one dies, no more redundancy. But a
4-drive RAIDZ has roughly twice the odds of a 2-drive mirror of having a
drive die.

And sure, a RAIDZ2 has more redundancy -- as does a 3-way mirror. Or a
48-way mirror (I read a report from somebody who mirrored all the drives
in a Thumper box, just to see if he could).
Re: [zfs-discuss] ZFS/Drobo (Newbie) Question
On Tue, February 8, 2011 13:03, Roy Sigurd Karlsbakk wrote:
>> Or you could stick strictly to mirrors; 4 pools 2x2T, 2x2T, 2x750G, 2x1.5T. Mirrors are more flexible, give you more redundancy, and are much easier to work with.
>
> Easier to work with, yes, but a RAIDz2 will statistically be safer than a set of mirrors, since in many cases you lose a drive and then, during resilver, find bad sectors on another drive in the same VDEV, resulting in data corruption. With RAIDz2 (or 3), the chance of these errors being in the same place on all drives is quite minimal. With a (striped?) mirror, a single bitflip on the 'healthy' drive will involve data corruption.

Wait, are you saying that the handling of errors in RAIDZ and mirrors is completely different? That it dumps the mirror disk immediately, but keeps trying to get what it can from the RAIDZ disk? Because otherwise, your assertion doesn't seem to hold up.
Re: [zfs-discuss] ZFS/Drobo (Newbie) Question
On Mon, February 7, 2011 14:59, David Dyer-Bennet wrote:
> On Sat, February 5, 2011 11:54, Gaikokujin Kyofusho wrote:
>> Thank you kebabber. I will try out indiana and virtual box to play around with it a bit.
>>
>> Just to make sure I understand your example, if I say had 4x2tb drives, 2x750gb, 2x1.5tb drives etc then i could make 3 groups (perhaps 1 raidz1 + 1 mirrored + 1 mirrored), in terms of accessing them would they just be mounted like 3 partitions or could it all be accessed like one big partition?
>
> A ZFS pool can contain many vdevs; you could put the three groups you describe into one pool, and then assign one (or more) file-systems to that pool. Putting them all in one pool seems to me the natural way to handle it; they're all similar levels of redundancy. It's more flexible to have everything in one pool, generally.
>
> (You could also make separate pools; my experience, for what it's worth, argues for making pools based on redundancy and performance (and only worry about BIG differences), and assigning file-systems to pools based on needs for redundancy and performance. And for my home system I just have one big data pool, currently consisting of 1x1TB, 2x400GB, 2x400GB, plus 1TB hot spare.)

Typo; I don't in fact have a non-redundant vdev in my main data pool! It's *2*x1TB at the start of that list.
Re: [zfs-discuss] Understanding directio, O_DSYNC and zfs_nocacheflush on ZFS
On Mon, February 7, 2011 14:49, Yi Zhang wrote:
> On Mon, Feb 7, 2011 at 3:14 PM, Bill Sommerfeld wrote:
>> On 02/07/11 11:49, Yi Zhang wrote:
>>> The reason why I tried that is to get the side effect of no buffering, which is my ultimate goal.
>>
>> ultimate = "final". You must have a goal beyond the elimination of buffering in the filesystem.
>>
>> if the writes are made durable by zfs when you need them to be durable, why does it matter that it may buffer data while it is doing so?
>>
>> - Bill
>
> If buffering is on, the running time of my app doesn't reflect the actual I/O cost. My goal is to accurately measure the time of I/O. With buffering on, ZFS would batch up a bunch of writes and change both the original I/O activity and the time.

I'm not sure I understand what you're trying to measure (which seems to be your top priority). Achievable performance with ZFS would be better using suitable caching; normally that's the benchmark statistic people would care about.
Re: [zfs-discuss] ZFS/Drobo (Newbie) Question
On Sat, February 5, 2011 11:54, Gaikokujin Kyofusho wrote:
> Thank you kebabber. I will try out indiana and virtual box to play around with it a bit.
>
> Just to make sure I understand your example, if I say had 4x2tb drives, 2x750gb, 2x1.5tb drives etc then i could make 3 groups (perhaps 1 raidz1 + 1 mirrored + 1 mirrored), in terms of accessing them would they just be mounted like 3 partitions or could it all be accessed like one big partition?

A ZFS pool can contain many vdevs; you could put the three groups you describe into one pool, and then assign one (or more) file-systems to that pool. Putting them all in one pool seems to me the natural way to handle it; they're all similar levels of redundancy. It's more flexible to have everything in one pool, generally.

(You could also make separate pools; my experience, for what it's worth, argues for making pools based on redundancy and performance (and only worry about BIG differences), and assigning file-systems to pools based on needs for redundancy and performance. And for my home system I just have one big data pool, currently consisting of 1x1TB, 2x400GB, 2x400GB, plus 1TB hot spare.)

Or you could stick strictly to mirrors; 4 pools 2x2T, 2x2T, 2x750G, 2x1.5T. Mirrors are more flexible, give you more redundancy, and are much easier to work with.
Re: [zfs-discuss] ZFS Newbie question
On Sat, February 5, 2011 03:54, Gaikokujin Kyofusho wrote:
> From what I understand using ZFS one could setup something like RAID 6 (RAID-Z2?) but with the ability to use drives of varying sizes/speeds/brands and able to add additional drives later. Am I about right? If so I will continue studying up on this if not then I guess I need to continue exploring different options. Thanks!!

IMHO, your best bet for this kind of configuration is to use mirror pairs, not RAIDZ*. Because...

Things you can't do with RAIDZ*: You cannot remove a vdev from a pool. You cannot make a RAIDZ* vdev smaller (fewer disks). You cannot make a RAIDZ* vdev larger (more disks). To increase the storage capacity of a RAIDZ* vdev you need to replace all the drives, one at a time, waiting for resilver between replacements (resilver times can be VERY long with big modern drives). And during each resilver, your redundancy will be reduced by 1 -- meaning a RAIDZ1 array would have NO redundancy during the resilver. (And activity in the pool is high during a resilver -- meaning the chances of any marginal drive crapping out are higher than normal during the resilver.)

With mirrors, you can add new space by simply adding two drives (a new mirror vdev). You can upgrade an existing mirror by replacing only two drives. You can upgrade an existing mirror without ever reducing redundancy below your starting point -- you attach a new drive, wait for the resilver to complete (at this point you have a three-way mirror), then detach one of the original drives; repeat for another new drive and the other original drive.

Obviously, using mirrors requires you to buy more drives for any given amount of usable space. I must admit that my 8-bay hot-swap ZFS server cost me a LOT more than a Drobo (but then I bought in 2006, too).
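That mirror-upgrade dance looks roughly like this (a sketch from memory -- the pool and device names here are made up; check the zpool man page before running anything on a real pool):

  # grow a mirror in place without ever losing redundancy
  zpool attach tank c0t0d0 c0t4d0   # attach first new, larger disk; wait for resilver
  zpool attach tank c0t1d0 c0t5d0   # attach second new disk; wait again
  zpool detach tank c0t0d0          # now drop the old, smaller disks
  zpool detach tank c0t1d0
  zpool set autoexpand=on tank      # let the vdev grow to the new disk size
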
Re: [zfs-discuss] zfs-discuss Digest, Vol 64, Issue 13
On Sun, February 6, 2011 13:01, Michael Armstrong wrote:
> Additionally, the way I do it is to draw a diagram of the drives in the system, labelled with the drive serial numbers. Then when a drive fails, I can find out from smartctl which drive it is and remove/replace without trial and error.

Having managed to muddle through this weekend without loss (though with a certain amount of angst and duplication of effort), I'm in the mood to label things a bit more clearly on my system :-).

smartctl doesn't seem to be on my system, though. I'm running snv_134. I'm still pretty badly lost in the whole repository / package thing with Solaris; most of my brain cells were already occupied with Red Hat, Debian, and Perl package information :-( . Where do I look?

Are the controller port IDs, the "c9t3d0" things that ZFS likes, reasonably stable? They won't change just because I add or remove drives, right; only maybe if I change controller cards?
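For anyone hitting the same wall later: the IPS tools can at least tell you whether a package exists in the configured repository before you go hunting (commands from memory; "smartmontools" as a package name is a guess, and it may simply not be in the snv_134 repository at all):

  pkg search -r smartctl          # search the remote repository for the binary
  pkg list -a | grep -i smart     # see what the system already knows about
  pkg install smartmontools       # only if the search actually finds it
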
Re: [zfs-discuss] Replace block devices to increase pool size
On Sun, February 6, 2011 08:41, Achim Wolpers wrote:
> I have a zpool built up from two vdevs (one mirror and one raidz). The raidz is built up from 4x1TB HDs. When I successively replace each 1TB drive with a 2TB drive will the capacity of the raidz double after the last block device is replaced?

You may have to manually set the property autoexpand=on; I found yesterday that I had to (in my case on a mirror that I was upgrading). It probably depends on what version you created things at and/or what version you're running now.

I replaced the drives in one of the three mirror vdevs in my main pool over this last weekend, and it all went quite smoothly, but I did have to turn on autoexpand at the end of the process to see the new space.
Re: [zfs-discuss] Identifying drives (SATA), question about hot spare allocation
Following up to myself, I think I've got things sorted, mostly.

1. The thing I was most sure of, I was wrong about. Some years back, I must have split the mirrors so that they used different brand disks. I probably did this, maybe even accidentally, when I had to restore from backups at one point. I suppose I could have physically labeled the carriers...no, that's crazy talk!

2. The dd trick doesn't produce reliable activity-light activation in my system. I think some of the drives and/or controllers only turn on the activity light for writes.

3. However, in spite of all this, I have replaced the disks in mirror-0 with the bigger disks (via attach-new, resilver, detach-old), and added the third drive I bought as a hot spare. All without having to restore from backups.

4. AND I know which physical drive the detached 400GB drive is. It occurs to me I could make that a second hot spare -- there are 4 remaining 400GB drives in the pool, so it's useful for 2/3 of the failures by drive count.

Leading to a new question -- is ZFS smart about hot spare sizes? Will it skip over too-small drives? Will it, even better, prefer smaller drives to larger so long as they are big enough (thus leaving the big drives for bigger failures)?
Re: [zfs-discuss] Identifying drives (SATA)
On 2011-02-06 05:58, Orvar Korvar wrote:
> Will this not ruin the zpool? If you overwrite one of the discs in the zpool, won't the zpool go broke, so you need to repair it?

Without quoting I can't tell what you think you're responding to, but from my memory of this thread, I THINK you're forgetting how dd works. The dd commands being proposed to create drive traffic are all read-only accesses, so they shouldn't damage anything.
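To be concrete, the sort of command that was proposed reads a big sequential chunk from a disk and throws the data away while you watch which activity light comes on (the device name is just an example -- double-check which disk you point it at):

  dd if=/dev/rdsk/c5t0d0p0 of=/dev/null bs=1M count=1024

The if= side is the only disk involved; of=/dev/null discards the data, so nothing is ever written to any pool disk.
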
[zfs-discuss] Drive id confusion
Solaris and/or ZFS are badly confused about drive IDs. The "c5t0d0" names are very far removed from the real world, and possibly they've gotten screwed up somehow. Is devfsadm supposed to fix those, or does it only delete excess?

Reason I believe it's confused: zpool status shows mirror-0 on c9t3d0, c9t2d0, and c9t5d0. But format shows the one remaining Seagate 400GB drive at c5t0d0 (my initial pool was two of those; I replaced one with a Samsung 1TB earlier today). Now the mirror with three drives in it is my very first mirror, which has to have the one remaining Seagate drive in it (given that I removed one Seagate drive; otherwise I could be confused about order of creation vs. mirror numbering).

I'm thinking either Solaris' appalling mess of device files is somehow scrod, or else ZFS is confused in its reporting (perhaps because of cache file contents?). Is there anything I can do about either of these? Does devfsadm really create the appropriate /dev/dsk and etc. files based on what's present? Would deleting the cache file while the pool is exported, and then searching for and importing the pool, help? How worried should I be? (I've got current backups.)
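The sequence I have in mind, for the record (untested by me as of this writing; zp1 is my pool, and I'd want confirmation from someone who's done it before trying this on a pool I cared about):

  zpool export zp1
  devfsadm -Cv                    # -C cleans up dangling /dev links, -v is verbose
  mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.old
  zpool import zp1                # scans devices afresh instead of trusting the cache
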
[zfs-discuss] /dev/dsk files missing
And devfsadm doesn't create them. Am I looking at the wrong program, or what?
[zfs-discuss] Identifying drives (SATA)
I've got a small home fileserver, Chenowith case with 8 hot-swap bays. Of course, at this level, I don't have cute little lights next to each drive that the OS knows about and can control to indicate things to me.

The configuration I think I have is three mirror pairs. I've got motherboard SATA connections, and an add-in SAS card with SAS-to-SATA cabling (all drives are SATA), and I've tried to wire it so each mirror is split across the two controllers.

However -- the old disks were already a pool before. So if I put them in the "wrong" physical slots, when I imported the pool it would have still found them. So I could have the disks in slots that aren't what I expected, without knowing it.

I'm planning to upgrade the first mirror by attaching new, larger, drives, letting the resilver finish, and eventually detaching the old drives. I just installed the first new drive, located what controller it was on, and typed an attach command that did what I wanted:

bash-4.0$ zpool status zp1
  pool: zp1
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h4m, 3.13% done, 2h5m to go
config:

        NAME        STATE     READ WRITE CKSUM
        zp1         ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c9t3d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c9t5d0  ONLINE       0     0     0  14.0G resilvered
          mirror-1  ONLINE       0     0     0
            c9t4d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
          mirror-2  ONLINE       0     0     0
            c9t2d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0

errors: No known data errors

As you can see, the new drive being resilvered is in fact associated with the first mirror, as I had intended. (The old drives in the first mirror are older than in the second two, and all three are the same size, so that's definitely the one to replace first.)

HOWEVER...the activity lights on the drives aren't doing what I expect. The activity light on the new drive is on pretty solidly (that I expected), but the OTHER activity puzzles me.
(User activity is so close to nil that I'm quite confident that's not confusing me; 95%+ of the access right now is the resilver. Besides, usage could light up other drives, but it couldn't turn off the lights on the ones being resilvered.)

At first, I saw the second drive in the rack light up. I believe that to be c5t1d0, the second disk in mirror-0, and it's the drive I specified for the old drive in the attach command. However, soon I started seeing the fourth drive in the rack light up. I believe that to be c6t1d0; part of mirror-1, and thus having no place in this resilver. It remained active. And after a while, the second drive's activity light went off. For some minutes now, I've been seeing activity ONLY on the new drive, and on drive 4 (the one I don't think is part of mirror 0).

The activity lights aren't connected by separate cables, so I don't see how I could have them hooked up differently from the disks. It's clear from zpool status that I have attached the new drive to the right mirror. So things are fine for now; I can let the resilver run to completion. I can detach one of the old drives fine, because that's done with logical names, and those are shown in zpool status, so I have no doubt which logical names are the old drives in mirror 0.

However, eventually it will be time to physically remove the old drives. If I remove only one at a time, I "shouldn't" cause a disaster even if I pull the wrong one, and I can tell by checking zpool status right away whether I pulled the right or wrong one. But this gets me into what I regard as risky territory -- if I pull a live drive, I'm going to suddenly need to know the commands needed to reattach it. Can somebody point me at clear examples of that (or post them)?

I just found zpool iostat -v; now that I'm seeing traffic on the individual drives in the pool, it's clearly reading from both the old drives, and writing to the new drive, exactly as expected.
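My best guess at the recovery commands, assembled from the man pages (untested, and the device names are just whichever disk got pulled in my pool -- corrections very welcome):

  # after plugging the drive back in:
  cfgadm -al                        # find the SATA port; (re)configure it if needed
  zpool online zp1 c5t0d0           # tell ZFS the device is back
  zpool status zp1                  # confirm it resilvers and rejoins the mirror

  # if ZFS has already dropped it from the mirror, reattach explicitly:
  zpool attach zp1 c9t2d0 c5t0d0    # attach alongside its surviving mirror partner
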
But only one activity light is lit on any of the old drives. Is there a clever way to figure out which drive is which? And if I have to fall back on removing a drive I think is right, and seeing if that's true, what admin actions will I have to perform to get the pool back to safety? (I've got backups, but it's a pain to restore, of course.) (Hmmm; in single-user mode, use dd to read huge chunks of one disk, and see which lights come on? Do I even need to be in single-user mode for that?)
Re: [zfs-discuss] BOOT, ZIL, L2ARC one one SSD?
On Thu, December 23, 2010 22:45, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bill Werner
>>
>> on a single 60GB SSD drive, use FDISK to create 3 physical partitions, a 20GB for boot, a 30GB for L2ARC and a 10GB for ZIL? Or is 3 physical Solaris partitions on a disk not considered the entire disk as far as ZFS is concerned?
>
> You can do that. Other people have before. But IMHO, it demonstrates a faulty way of thinking.
>
> "SSD's are big and cheap now, so I can buy one of these high performance things, and slice it up!" In all honesty, GB availability is not your limiting factor. Speed is your limiting factor. That's the whole point of buying the thing in the first place. If you have 3 SSD's, they're each able to talk 3Gbit/sec at the same time. But if you buy one SSD which is 3x larger, you save money but you get 1/3 the speed.

Boot, at least, largely doesn't overlap with any significant traffic to ZIL, for example.

And where I come from, even at work, money doesn't grow on trees. Sure, three separate SSDs will clearly perform better. They will also cost 3x as much. (Or more, if you don't have three free bays and controller ports.) The question we often have to address is, "what's the biggest performance increase we can get for $500?" I considered multiple rotating disks vs. one SSD for that reason, for example.

Yeah, anybody quibbling about $500 isn't building top-performance enterprise-grade storage. We do know this. It's still where a whole lot of us live -- especially those running a home NAS.

> That's not to say there's never a situation where it makes sense. Other people have done it, and maybe it makes sense for you. But probably not.

Yeah, okay, maybe we're not completely disagreeing.
[zfs-discuss] file-inherit and dir-inherit at toplevel of ZFS CIFS share
It looks like permissions don't descend properly from the top-level share in CIFS; I had to set them on the next level down to get the intended results (including on lower levels; they seem to inherit properly from the second level, just not from the top). Is this a known behavior, or am I confused and setting myself up for trouble later?

More broadly, is there anything good written about "best practices" for using ACLs with ZFS and CIFS shares? For example, there are so many defined attributes, some of them with the same short-form letter (I think one is for directories and one is for files in that case, but that's not documented anywhere I can find), that I find myself wondering what "standard bundles" of permissions would be useful. Is it generally better to have separate permissions to inherit for files and directories, or can most things you want be accomplished with just one?

Back to specifics again -- I was running into a problem where a user on the Solaris box could rename a file or directory, but an XP box authenticating as the same user could not. This was the one that seemed to be solved by setting the permissions again one level down (dunno what happens with new top-level items yet). Is this normal behavior or something that makes sense? It's terribly weird.

(In Windows, I could right-click and create the "new directory" or whatever, but when I then filled in the name I wanted and hit enter, I got a permission error. I could just leave it named "new directory", though. And I could rename it on the Linux side as the same user that failed to rename it from the Windows side.)
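For reference, the kind of thing I've been setting uses the Solaris chmod ACL syntax on the share root; something like the following (the username and path are made up, and I'm not claiming this exact set of permissions is right -- it's the shape of the command that matters):

  chmod A=owner@:full_set:file_inherit/dir_inherit:allow,user:fred:read_data/write_data/add_file/add_subdirectory/delete_child:file_inherit/dir_inherit:allow /tank/share

  ls -V /tank/share    # shows the ACL back in the same compact form
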
Re: [zfs-discuss] Balancing LVOL fill?
On Wed, October 20, 2010 04:24, Tuomas Leikola wrote:
> I wished for a more aggressive write balancer but that may be too much to ask for.

I don't think it can be too much to ask for. Storage servers have long enough lives that adding disks to them is a routine operation; to the extent that that's a problem, it really needs to be fixed. However, it's not the sort of thing one should hold one's breath waiting for!
Re: [zfs-discuss] Finding corrupted files
On Fri, October 8, 2010 04:47, Stephan Budach wrote:
> So, I decided to give tar a whirl, after zfs send encountered the next corrupted file, resulting in an I/O error, even though scrub ran successfully w/o any errors.

I must say that this concept -- of scrub running w/o error when corrupted files, detectable to zfs send, apparently exist -- is very disturbing. Background scrubbing, with the block checksums that make it more meaningful than just reading the disk blocks, was the key thing that drew me into ZFS, and this seems to suggest that it doesn't work.

Does your sequence of tests happen to provide evidence that the problem isn't new errors appearing, sometimes after a scrub and before the send? For example, have you done 1) scrub finds no error, 2) send finds error, 3) scrub finds no error? (With nothing in between that could have cleared or fixed the error.)
Re: [zfs-discuss] Increase size of 2-way mirror
On Wed, October 6, 2010 14:14, Tony MacDoodle wrote:
> Is it possible to add 2 disks to increase the size of the pool below?
>
>         NAME        STATE     READ WRITE CKSUM
>         testpool    ONLINE       0     0     0
>           mirror-0  ONLINE       0     0     0
>             c1t2d0  ONLINE       0     0     0
>             c1t3d0  ONLINE       0     0     0
>           mirror-1  ONLINE       0     0     0
>             c1t4d0  ONLINE       0     0     0
>             c1t5d0  ONLINE       0     0     0

You have two ways to increase the size of this pool (sanely).

First, you can add a third mirror vdev. I think that's what you're specifically asking about. You do this with the "zpool add ..." command; see the man page.

Second, you can add (zpool attach) two larger disks to one of the existing mirror vdevs, wait until the resilvers have finished, and then detach the two original (smaller) disks. At that point (with recent versions; with older versions you have to set a property) the vdev will expand to use the full capacity of the new larger disks, and that space will become available in the pool.
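Spelled out as commands (the new device names here are placeholders I invented; adjust to your system, and double-check against the man page):

  # first way: add a third mirror vdev
  zpool add testpool mirror c1t6d0 c1t7d0

  # second way: grow mirror-0 in place with two larger disks
  zpool attach testpool c1t2d0 c2t0d0   # wait for resilver to finish
  zpool attach testpool c1t3d0 c2t1d0   # wait again
  zpool detach testpool c1t2d0
  zpool detach testpool c1t3d0
  zpool set autoexpand=on testpool      # the property I mentioned, on newer versions
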
Re: [zfs-discuss] TLER and ZFS
On Tue, October 5, 2010 16:47, casper@sun.com wrote:
>> My immediate reaction to this is "time to avoid WD drives for a while"; until things shake out and we know what's what reliably.
>>
>> But, um, what do we know about say the Seagate Barracuda 7200.12 ($70), the SAMSUNG Spinpoint F3 1TB ($75), or the HITACHI Deskstar 1TB 3.5" ($70)?
>
> I've seen several important features when selecting a drive for a mirror:
>
> TLER (the ability of the drive to timeout a command)

I went and got what detailed documentation I could on a couple of the Seagate drives last night, and I couldn't find anything on how they behave in that sort of error case. (I believe TLER is a WD-specific term, but I didn't just search, I read them through.) So that's inconvenient. How do we find out about that sort of thing?

> sector size (native vs virtual)

Richard Elling said ZFS now handles drives with 4K physical sectors that emulate 512-byte sectors okay in default setups; but somebody immediately asked for version info, so I'm still watching this one.

> power use (specifically at home)

Hadn't thought about that. But when I'm upgrading drives, I figure I'm always going to come out better on power than when I started.

> performance (mostly for work)

I can't bring myself to buy below 7200 RPM, but it's probably foolish (except that other obnoxious features tend to come in the "green" drives).

> price

Yeah, well. I'm cheap.

> I've heard scary stories about a mismatch of the native sector size and unaligned Solaris partitions (4K sectors, unaligned cylinder).

So have I. Sounds like you get read-modify-write actions for non-aligned accesses. I hope the next generation of drives admit to being 4K sectors, and that ZFS will be prepared to use them sensibly. But I'm not sure I'm willing to wait for that; the oldest drives in my box are now 4 years old, and I'm about ready for the next capacity upgrade.
> I was pretty happy with the WD drives (except for the one with a seriously broken cache) but I see the reasons not to pick WD drives above the 1TB range.

And the big ones are what pretty much everybody is using at home. Capacity and price are vastly more important than performance for most of us.
Re: [zfs-discuss] TLER and ZFS
On Tue, October 5, 2010 17:20, Richard Elling wrote:
> On Oct 5, 2010, at 2:06 PM, Michael DeMan wrote:
>> On Oct 5, 2010, at 1:47 PM, Roy Sigurd Karlsbakk wrote:
>>> Well, here it's about 60% up and for 150 drives, that makes a wee difference...
>> Understood on 1.6 times cost, especially for quantity 150 drives.
> One service outage will consume far more in person-hours and downtime than this little bit of money. Penny-wise == Pound-foolish?

That looks to be true, yes (going back to the actual prices, 150 drives would cost $6000 extra for the enterprise versions). It's still quite annoying to be jerked around by people charging 60% extra for changing a timeout in the firmware, and carefully making it NOT user-alterable.

Also, the non-TLER versions are a constant threat to anybody running home systems, who might quite reasonably think they could put them in a home server. (Yeah, I know the enterprise versions have other differences. I'm not nearly so sure I CARE about the other differences, in the size servers I'm working with.)
Re: [zfs-discuss] TLER and ZFS
On Tue, October 5, 2010 15:30, Roy Sigurd Karlsbakk wrote:
> I just discovered WD Black drives are rumored not to be set to allow TLER. Does anyone know how much performance impact the lack of TLER might have on a large pool? Choosing Enterprise drives will cost about 60% more, and on a large install, that means a lot of money...

My immediate reaction to this is "time to avoid WD drives for a while", until things shake out and we know what's what reliably.

But, um, what do we know about, say, the Seagate Barracuda 7200.12 ($70), the SAMSUNG Spinpoint F3 1TB ($75), or the HITACHI Deskstar 1TB 3.5" ($70)?

This is not a completely theoretical question to me; it's getting on towards time to at least consider replacing my oldest mirrored pair; those are 400GB Seagates, I think, dating from 2006. I'd want something at least twice as big (to make the space upgrade worthwhile), and I'm expecting to buy three of them rather than just two, because I think it's time to add a hot spare to the system (currently 3 pairs of data disks, and I've got two more bays; I think a hot spare is a better use for them than a fourth pair; safety of the data is very important, performance is adequate, and I need a modest capacity upgrade, but the whole pool is currently 1.2TB usable, not large).

On the third hand, there's the Barracuda 7200.11 1.5TB for only $75, which is a really small price increment for a big space increment.

The WD RE3 1TB is $130 (all these prices are from Newegg just now). That's very close to TWICE the price of the competing 1TB drives.
Re: [zfs-discuss] When Zpool has no space left and no snapshots
On Wed, September 29, 2010 15:17, Matt Cowger wrote: > You can truncate a file: > > echo "" > bigfile > > That will free up space without the 'rm' Copy-on-write: the new version gets written to disk before the old version is released; it doesn't just overwrite in place. AND, if the file is in any snapshots, the old version doesn't get released at all. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
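The truncate-in-place trick quoted above can be sketched generically; the file name here is hypothetical, and note that on a ZFS pool that is literally 100% full even the truncating write can fail, since copy-on-write needs free space for the new metadata to land in:

```shell
# Generic illustration of truncate-in-place (the file name is made up).
dd if=/dev/zero of=bigfile bs=1024 count=64 2>/dev/null  # make a 64 KB scratch file
: > bigfile        # POSIX truncation, no 'rm' needed
wc -c < bigfile    # now reports 0 bytes
```

If the file is also referenced by a snapshot, the truncation frees nothing: the snapshot still holds the old blocks, exactly as described above.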
Re: [zfs-discuss] When Zpool has no space left and no snapshots
On Wed, September 22, 2010 21:25, Aleksandr Levchuk wrote: > I ran out of space, consequently could not rm or truncate files. (It > makes sense because it's copy-on-write and any transaction needs to > be written to disk. It worked out really well - all I had to do is > destroy some snapshots.) > > If there are no snapshots to destroy, how to prepare for a situation > when a ZFS pool loses its last free byte? Add more space when the pool reaches somewhere around 90% full, or earlier :-). If you do get stuck, you can add another vdev when full, too. Just remember that you're stuck with whatever you add "forever", since there's no way to remove a vdev from a pool. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] non-ECC Systems and ZFS for home users (was: Please warn a home user against OpenSolaris under VirtualBox under WinXP ; ))
On Thu, September 23, 2010 01:33, Alexander Skwar wrote: > Hi. > > 2010/9/19 R.G. Keen > >> and last-generation hardware is very, very cheap. > > Yes, of course, it is. But, actually, is that a true statement? I've read > that it's *NOT* advisable to run ZFS on systems which do NOT have ECC > RAM. And those cheapo last-gen hardware boxes quite often don't have > ECC, do they? Last-generation server hardware supports ECC, and was usually populated with ECC. Last-generation desktop hardware rarely supports ECC, and was even more rarely populated with ECC. The thing is, last-generation server hardware is, um, marvelously adequate for most home setups (the problem *I* see with it, for many home setups, is that it's *noisy*). So, if you can get it cheap in a sound-level that fits your needs, that's not at all a bad choice. I'm running a box I bought new as a home server, but it's NOW at least last-generation hardware (2006), and it's still running fine; in particular the CPU load remains trivial compared to what the box supports (not doing compression or dedup on the main data pool, though I do compress the backup pools on external USB disks). (It does have ECC; even before some of the cases leading to that recommendation were explained on that list, I just didn't see the percentage in not protecting the memory.) -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] resilver = defrag?
On Thu, September 16, 2010 14:04, Miles Nordin wrote: >>>>>> "dd" == David Dyer-Bennet writes: > > dd> Sure, if only a single thread is ever writing to the disk > dd> store at a time. > > video warehousing is a reasonable use case that will have small > numbers of sequential readers and writers to large files. virtual > tape library is another obviously similar one. basically, things > which used to be stored on tape. which are not uncommon. Haven't encountered those kinds of things first-hand, so I didn't think of them. Yes, those sound like they'd have lower numbers of simultaneous users by a lot for one reason or another. > AIUI ZFS does not have a fragmentation problem for these cases unless > you fill past 96%, though I've been trying to keep my pool below 80% > because . As various people have said recently, we have no way to measure it that we know of. I don't feel I have a problem in my own setup, but it's so low-stress that if ZFS doesn't work there, it wouldn't work anywhere. > dd> This situation doesn't exist with any kind of enterprise disk > dd> appliance, though; there are always multiple users doing > dd> stuff. > > the point's relevant, but I'm starting to tune out every time I hear > the word ``enterprise.'' seems it often decodes to: Picked the phrase out of an orifice; trying to distinguish between storage for key corporate data assets, and other uses. > (1) ``fat sacks and no clue,'' or > > (2) ``i can't hear you i can't hear you i have one big hammer in my > toolchest and one quick answer to all questions, and everything's > perfect! perfect, I say. unless you're offering an even bigger > hammer I can swap for this one, I don't want to hear it,'' or > > (3) ``However of course I agree that hammers come in different > colors, and a wise and experienced craftsman will always choose > the color of his hammer based on the color of the nail he's > hitting, because the interface between hammers and nails doesn't > work well otherwise. 
We all know here how to match hammer and > nail colors, but I don't want to discuss that at all because it's > a private decision to make between you and your salesdroid. > > ``However, in this forum here we talk about GREEN NAILS ONLY. If > you are hitting green nails with red hammers and finding they go > into the wood anyway then you are being very unprofessional > because that nail might have been a bank transaction. --posted > from opensolaris.org'' #3 is particularly amusing! -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] resilver = defrag?
On Wed, September 15, 2010 16:18, Edward Ned Harvey wrote: > For example, if you start with an empty drive, and you write a large > amount > of data to it, you will have no fragmentation. (At least, no significant > fragmentation; you may get a little bit based on random factors.) As life > goes on, as long as you keep plenty of empty space on the drive, there's > never any reason for anything to become significantly fragmented. Sure, if only a single thread is ever writing to the disk store at a time. This situation doesn't exist with any kind of enterprise disk appliance, though; there are always multiple users doing stuff. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] resilver = defrag?
The difference between multi-user thinking and single-user thinking is really quite dramatic in this area. I came up the time-sharing side (PDP-8, PDP-11, DECSYSTEM-20); TOPS-20 didn't have any sort of disk defragmenter, and nobody thought one was particularly desirable, because the normal access pattern of a busy system was spread all across the disk packs anyway. On a desktop workstation, it makes some sense to think about loading big executable files fast -- that's something the user is sitting there waiting for, and there's often nothing else going on at that exact moment. (There *could* be significant things happening in the background, but quite often there aren't.) Similarly, loading a big "document" (single-file book manuscript, bitmap image, or whatever) happens at a point where the user has requested it and is waiting for it right then, and there's mostly nothing else going on. But on really shared disk space (either on a timesharing system, or a network file server serving a good-sized user base), the user is competing for disk activity (either bandwidth or IOPs, depending on the access pattern of the users). Generally you don't get to load your big DLL in one read -- and to the extent that you don't, it doesn't matter much how it's spread around the disk, because the head won't be in the same spot when you get your turn again. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Configuration questions for Home File Server (CPU cores, dedup, checksum)?
On Tue, September 7, 2010 15:58, Craig Stevenson wrote: > 3. Should I consider using dedup if my server has only 8Gb of RAM? Or, > will that not be enough to hold the DDT? In which case, should I add > L2ARC / ZIL or am I better to just skip using dedup on a home file server? I would not consider using dedup in the current state of the code. I hear too many horror stories. Also, why do you think you'd get much benefit? It takes pretty big blocks of exact bit-for-bit duplication to actually trigger the code, and you're not going to find them in compressed image (including motion picture / video) or audio files, for example (the main things that take up much space on most home servers). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
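A rough way to see why 8 GB can be tight for dedup: every dedup-table entry costs RAM. A back-of-the-envelope sketch, assuming roughly 320 bytes per DDT entry and 128 KB average blocks (both figures are ballpark assumptions from contemporary list discussion, not specifications):

```shell
# Ballpark DDT sizing: assume ~320 bytes of RAM per dedup-table entry and
# 128 KB average blocks; both figures are rough assumptions, not specs.
pool_bytes=4398046511104     # a 4 TB pool (4 * 2^40)
block_bytes=131072           # 128 KB average block size
entry_bytes=320              # assumed RAM cost per DDT entry
ddt_bytes=$(( pool_bytes / block_bytes * entry_bytes ))
echo "$(( ddt_bytes / 1073741824 )) GiB of RAM to keep the DDT in core"  # prints: 10 GiB ...
```

On that estimate, even a modest multi-terabyte pool wants more dedicated RAM (or L2ARC) than a typical 8 GB home box can spare, which matches the advice above.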
Re: [zfs-discuss] resilver = defrag?
On Mon, September 13, 2010 07:14, Edward Ned Harvey wrote: >> From: Richard Elling [mailto:rich...@nexenta.com] >> >> This operational definition of "fragmentation" comes from the single- >> user, >> single-tasking world (PeeCees). In that world, only one thread writes >> files >> from one application at one time. In those cases, there is a reasonable >> expectation that a single file's blocks might be contiguous on a single >> disk. >> That isn't the world we live in, where have RAID, multi-user, or multi- >> threaded >> environments. > > I don't know what you're saying, but I'm quite sure I disagree with it. > > Regardless of multithreading, multiprocessing, it's absolutely possible to > have contiguous files, and/or file fragmentation. That's not a > characteristic which depends on the threading model. > > Also regardless of raid, it's possible to have contiguous or fragmented > files. The same concept applies to multiple disks. The attitude that it *matters* seems to me to have developed, and be relevant only to, single-user computers. Regardless of whether a file is contiguous or not, by the time you read the next chunk of it, in the multi-user world some other user is going to have moved the access arm of that drive. Hence, it doesn't matter if the file is contiguous or not. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Storage server hardwae
On Thu, August 26, 2010 13:58, Tom Buskey wrote: > I usually see 17 MB/s max on an external USB 2.0 drive. Interesting; I routinely see 27 MB/s, peaking at 30 MB/s, on the cheap WD 1TB external drives I use for backups. (Backup is probably the best case; the only user of that drive is a zfs receive process.) -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opensolaris is apparently dead
On Mon, August 16, 2010 15:35, Joerg Schilling wrote: > I know of ext* performance checks where people did run gtar to unpack a > linux > kernel archive and these people did nothing but metering the wall clock > time > for gtar. > > I repeated this test and it turned out, that Linux did not even start to > write > to the disk when gtar finished. As a test of ext? performance, that does seem to be lacking something! I guess it's a consequence of the low sound levels of modern disk drives; if you go back enough years, that error couldn't have passed unnoticed :-). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opensolaris is apparently dead
On Mon, August 16, 2010 12:36, Bob Friesenhahn wrote: > Can someone provide a link to the requisite source files so that we > can see the copyright statements? It may well be that Oracle assigned > the copyright to some other party. 2 * Copyright (C) 2007 Oracle. All rights reserved. 3 * 4 * This program is free software; you can redistribute it and/or 5 * modify it under the terms of the GNU General Public 6 * License v2 as published by the Free Software Foundation. <http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-unstable.git;a=blob;f=fs/btrfs/root-tree.c;h=2d958be761c84556b39c60afa3b0f3fd75d6;hb=HEAD> -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opensolaris is apparently dead
On Sat, August 14, 2010 16:26, Andrej Podzimek wrote: > Well, a typical conversation about speed and stability usually boils down > to this: > > A: I've heard that XYZ is unstable and slow. > B: Are you sure? Have you tested XYZ? What are your benchmark results? > Have you had any issues? > A: No. I *have* *not* *tested* XYZ. I think XYZ is so unstable and slow > that it's not worth testing. Yes indeed! I can't afford to test everything carefully. Like most people, I read published reports and listen to conversations places like this, and form an impression of what performs how. Then I do some testing to verify that something I'm seriously considering produces satisfactory performance. The key there is "satisfactory"; I'm not looking for the "best", I'm looking for something that fits in and is satisfactory. The more unusual my requirements, and the better defined, the less I can gain from studying outside test reports. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opensolaris is apparently dead
On Sun, August 15, 2010 09:19, David Magda wrote: > On Aug 14, 2010, at 14:54, Edward Ned Harvey wrote: > >> From: Russ Price > >> >>> For me, Solaris had zero mindshare since its beginning, on account of >>> being prohibitively expensive. >> >> I hear that a lot, and I don't get it. $400/yr does move it out of >> peoples' >> basements generally, and keeps sol10 out of enormous clustering >> facilities >> that don't have special purposes or free alternatives. But I >> wouldn't call >> it prohibitively expensive, for a whole lot of purposes. > > But that US$ 400 was only if you wanted support. For the last little > while you could run Solaris 10 legally without a support contract > without issues. Looks like there are prices for "service" for things that could legitimately be called RedHat Enterprise Linux from $80/year up into at least the mid thousands; this may account for the range of impressions people have. The 24/7 Premium subscription for a two-socket server is $1299/year. The business-hours plan is $799. <https://www.redhat.com/wapps/store/catalog.html> Your point that "free" has been important is very true. I'm not sure that what Oracle says they're doing with Solaris 11 Express won't cover that at least for business customers, though. (I do think that they'll lose out on the extensive testing we've been providing.) -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opensolaris is apparently dead
On Mon, August 16, 2010 11:01, Joerg Schilling wrote: > "David Dyer-Bennet" wrote: > >> >> As such, they'll need to continue to comply with GPLv2 requirements. >> > >> > No, there is definitely no need for Oracle to comply with the GPL as >> they >> > own the code. >> >> Ray's point is, how long would BTRFS remain in the Linux kernel in that >> case? > > Such a license change can happen at any time. The Linux folks have no > grant > that it would not happen. And they have every right to stop including BTRFS in the kernel whenever they wish. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opensolaris is apparently dead
On Mon, August 16, 2010 10:43, Joerg Schilling wrote: > "David Dyer-Bennet" wrote: > >> >> On Sun, August 15, 2010 20:44, Peter Jeremy wrote: >> >> > Irrespective of the above, there is nothing requiring Oracle to >> release >> > any future btrfs or ZFS improvements (or even bugfixes). They can't >> > retrospectively change the license on already released code but they >> > can put a different (non-OSS) license on any new code. >> >> That's true. >> >> However, if Oracle makes a binary release of BTRFS-derived code, they >> must >> release the source as well; BTRFS is under the GPL. > > This claim would only be true in case that Oracle does not own the > copyright > on its' code... Oops, yeah, you're right there; the copyright holder can grant additional licenses and do things itself. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opensolaris is apparently dead
On Mon, August 16, 2010 10:48, Joerg Schilling wrote: > Ray Van Dolson wrote: > >> > I absolutely guarantee Oracle can and likely already has >> > dual-licensed BTRFS. >> >> Well, Oracle obviously would want btrfs to stay as part of the Linux >> kernel rather than die a death of anonymity outside of it... >> >> As such, they'll need to continue to comply with GPLv2 requirements. > > No, there is definitely no need for Oracle to comply with the GPL as they > own the code. Ray's point is, how long would BTRFS remain in the Linux kernel in that case? -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Opensolaris is apparently dead
On Sun, August 15, 2010 20:44, Peter Jeremy wrote: > Irrespective of the above, there is nothing requiring Oracle to release > any future btrfs or ZFS improvements (or even bugfixes). They can't > retrospectively change the license on already released code but they > can put a different (non-OSS) license on any new code. That's true. However, if Oracle makes a binary release of BTRFS-derived code, they must release the source as well; BTRFS is under the GPL. So, if they're going to use it in any way as a product, they have to release the source. If they want to use it just internally they can do anything they want, of course. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] New Supermicro SAS/SATA controller: AOC-USAS2-L8e in SOHO NAS and HD HT
On Aug 12, 2010, at 7:03 PM, valrh...@gmail.com wrote: > Has anyone bought one of these cards recently? It seems to list for > around $170 at various places, which seems like quite a decent deal. But > no well-known reputable vendor I know seems to sell these, and I want to > be able to have someone backing the sale if something isn't perfect. > Where do you all recommend buying this card from? I put something very similar in -- same number with an 'i' suffix instead of the 'e'. I remember seeing both existed at the time, and that the i was what I needed. I'm using SATA cables, and no expanders (each cable goes directly to a drive), maybe the 'e' has more advanced features (that I knew I didn't need). I can't imagine the retailer would be of any value for support on such a card; perhaps, in the worst case, they might possibly take it back. Selling it on Ebay is often more profitable, since the buyer pays shipping :-). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Problems with big ZFS send/receive in b134
On Wed, August 11, 2010 15:11, Paul Kraus wrote: > On Wed, Aug 11, 2010 at 10:36 AM, David Dyer-Bennet wrote: >>>> Am I looking for too much here? I *thought* I was doing something >>>> that >>>> should be simple and basic and frequently used nearly everywhere, and >>>> hence certain to work. "What could go wrong?", I thought :-). If I'm >>>> doing something inherently dicey I can try to find a way to back off; >>>> as >>>> my primary backup process, this needs to be rock-solid. > > It looks like you are trying to do a full send every time, what about > a first full then incremental (which should be much faster) ? The > first full might run afoul of the 2 hour snapshots (and deletions), > but I would not expect the incremental to. I am syncing about 20 TB of > data between sites this way every 4 hours over a 100 Mb link. I put > the snapshot management and the site to site replication in the same > script to keep them from fighting :-) What I'm working on is, in fact, the first backup. I intended from the start to use incrementals; they just didn't work in earlier versions, and I was reduced to doing full backups only. And I need a successful full backup to start the series, and to initialize any new backup media, and so forth. So I think I have to solve this problem, even if most of the backups will be incrementals. Mostly the incrementals should be quite fast -- but I can come home from a weekend away with 30 GB or so of photos, which would appear on the server all at once. Still, that's well under 2 hours. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
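Paul's full-then-incremental scheme looks roughly like this; the pool, dataset, and snapshot names are hypothetical, and the commands are only echoed, so this sketch never touches a real pool:

```shell
# Dry-run sketch of a full-then-incremental send/receive backup.
SRC=zp1                    # source pool
DST=bup-wrack/fsfs/zp1     # backup target dataset
PREV=bup-20100809          # newest snapshot both sides share; empty on first run
CURR=bup-20100810          # snapshot being backed up now

if [ -z "$PREV" ]; then
    # First backup (and each freshly initialized backup disk): full send.
    echo "zfs send -R ${SRC}@${CURR} | zfs recv -Fud ${DST}"
else
    # Every later backup: replicate only the delta since the common snapshot.
    echo "zfs send -R -I @${PREV} ${SRC}@${CURR} | zfs recv -Fud ${DST}"
fi
```

With -I the stream also carries the intermediate snapshots between the two, which is what keeps the snapshot history intact on the backup pool.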
Re: [zfs-discuss] Problems with big ZFS send/receive in b134
On Tue, August 10, 2010 16:41, Dave Pacheco wrote: > David Dyer-Bennet wrote: >> If that turns out to be the problem, that'll be annoying to work around >> (I'm making snapshots every two hours and deleting them after a couple >> of >> weeks). Locks between admin scripts rarely end well, in my experience. >> But at least I'd know what I had to work around. >> >> Am I looking for too much here? I *thought* I was doing something that >> should be simple and basic and frequently used nearly everywhere, and >> hence certain to work. "What could go wrong?", I thought :-). If I'm >> doing something inherently dicey I can try to find a way to back off; as >> my primary backup process, this needs to be rock-solid. > > > It's certainly a reasonable thing to do and it should work. There have > been a few problems around deleting and renaming snapshots as they're > being sent, but the delete issues were fixed in build 123 by having > zfs_send hold snapshots being sent (as long as you've upgraded your pool > past version 18), and it sounds like you're not doing renames, so your > problem may be unrelated. AHA! You may have nailed the issue -- I've upgraded from 111b to 134, but have not yet upgraded my pool. Checking...yes, the pool I'm sending from is V14. (I don't instantly upgrade pools; I need to preserve the option of falling back to older software for a while after an upgrade.) So, I should try either turning off my snapshot creator/deleter during the backup, or upgrade the pool. Will do! (I will eventually upgrade the pool of course, but I think I'll try the more reversible option first. I can have the deleter check for the pid file the backup already creates to avoid two backups running at once.) Thank you very much! This is extremely encouraging. 
-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Problems with big ZFS send/receive in b134
On Tue, August 10, 2010 23:13, Ian Collins wrote: > On 08/11/10 03:45 PM, David Dyer-Bennet wrote: >> cannot receive incremental stream: most recent snapshot of >> bup-wrack/fsfs/zp1/ddb does not >> match incremental source > That last error occurs if the snapshot exists, but has changed, it has > been deleted and a new one with the same name created. So for testing purposes at least, I need to shut down everything I have that creates or deletes snapshots. (I don't, though, have anything that would delete one and create one with the same name. I create snapshots with various names (2hr, daily, weekly, monthly, yearly) and a current timestamp, and I delete old ones (many days old at a minimum).) And I think I'll abstract the commands from my backup script into a simpler dedicated test script, so I'm sure I'm doing exactly the same thing each time (that should cause me to hit on a combination that works right away :-) ). Is there anything stock in b134 that messes with snapshots that I should shut down to keep things stable, or am I only worried about my own stuff? Are other people out there not using send/receive for backups? Or not trying to preserve snapshots while doing it? Or, are you doing what I'm doing, and not having the problems I'm having? -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
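The snapshot scheme described above (a class name plus a timestamp, with old ones deleted after their window expires) can be modeled in a few lines; the class names, retention windows, function names, and epoch timestamps here are illustrative assumptions, not the actual script:

```shell
# Toy model of a class-plus-timestamp snapshot retention policy.
now=1281484800                  # "current" time, as epoch seconds (hypothetical)
keep_days() {                   # retention window for a snapshot class
    case "$1" in
        2hr)   echo 14 ;;
        daily) echo 90 ;;
        *)     echo 100000 ;;   # unknown class: effectively keep forever
    esac
}
should_delete() {               # args: class, creation time (epoch seconds)
    age_days=$(( (now - $2) / 86400 ))
    if [ "$age_days" -gt "$(keep_days "$1")" ]; then echo yes; else echo no; fi
}

should_delete 2hr   1279584000  # 22 days old, 14-day window -> prints yes
should_delete 2hr   1281398400  # 1 day old                  -> prints no
should_delete daily 1275350400  # 71 days old, 90-day window -> prints no
```

Nothing in a policy like this ever reuses a snapshot name, which is why a delete-then-recreate collision of the kind Ian describes shouldn't arise from the retention script itself.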
Re: [zfs-discuss] Problems with big ZFS send/receive in b134
On 10-Aug-10 13:46, David Dyer-Bennet wrote: On Tue, August 10, 2010 13:23, Dave Pacheco wrote: David Dyer-Bennet wrote: My full backup still doesn't complete. However, instead of hanging the entire disk subsystem as it did on 111b, it now issues error messages. Errors at the end. [...] cannot receive incremental stream: most recent snapshot of bup-wrack/fsfs/zp1/ddb does not match incremental source bash-4.0$ The bup-wrack pool was newly-created, empty, before this backup started. The backup commands were: zfs send -Rv "$srcsnap" | zfs recv -Fudv "$BUPPOOL/$HOSTNAME/$FS" I don't see how anything could be creating snapshots on bup-wrack while this was running. That pool is not normally mounted (it's on a single external USB drive, I plug it in for backups). My script for doing regular snapshots of zp1 and rpool doesn't reference any of the bup-* pools. I don't see how this snapshot mismatch can be coming from anything but the send/receive process. There are quite a lot of snapshots; dailys for some months, 2-hour ones for a couple of weeks. Most of them are empty or tiny. Next time I will try WITHOUT -v on both ends, and arrange to capture the expanded version of the command with all the variables filled in, but I don't expect any different outcome. Any other ideas? Is it possible that snapshots were renamed on the sending pool during the send operation? I don't have any scripts that rename a snapshot (in fact I didn't know it was possible until just now), and I don't have other users with permission to make snapshots (either delegated or by root access). I'm not using the Sun auto-snapshot thing, I've got a much-simpler script of my own (hence I know what it does). So I don't at the moment see how one would be getting renamed. It's possible that a snapshot was *deleted* on the sending pool during the send operation, however. 
Also that snapshots were created (however, a newly created one would be after the one specified in the zfs send -R, and hence should be irrelevant). (In fact it's certain that snapshots were created, and I'm nearly certain some were deleted.) More information. The test I started this morning errored out somewhat similarly, and one set of errors is clearly deleted snapshots (they're 2hr snapshots, some of which get deleted every 2 hours). There are also errors relating to "incremental streams", which is strange since I'm not using -I or -i at all. Here are the commands again, and all the output. + zfs create -p bup-wrack/fsfs/zp1 + zfs send -Rp z...@bup-20100810-154542gmt + zfs recv -Fud bup-wrack/fsfs/zp1 warning: cannot send 'zp1/d...@bup-2hr-20100731-12cdt': no such pool or dataset warning: cannot send 'zp1/d...@bup-2hr-20100731-14cdt': no such pool or dataset warning: cannot send 'zp1/d...@bup-2hr-20100731-16cdt': no such pool or dataset warning: cannot send 'zp1/d...@bup-20100731-213303gmt': incremental source (@bup-2hr-20100731-16CDT) does not exist warning: cannot send 'zp1/d...@bup-2hr-20100731-18cdt': no such pool or dataset warning: cannot send 'zp1/d...@bup-2hr-20100731-20cdt': incremental source (@bup-2hr-20100731-18CDT) does not exist cannot receive incremental stream: most recent snapshot of bup-wrack/fsfs/zp1/ddb does not match incremental source Afterward, bash-4.0$ zpool list NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT bup-wrack 928G 687G 241G 73% 1.00x ONLINE /backups/bup-wrack rpool 149G 10.0G 139G 6% 1.00x ONLINE - zp1 1.09T 743G 373G 66% 1.00x ONLINE - So quite a lot did get transferred, but not all. So it appears clear that snapshots being deleted during the zfs send -R cause warnings. A warning is fine: since they're not there it can't send them, and they were there when the command was given, so it makes sense for it to try. That last message, which is not tagged as either warning or error, worries me though.
And wondering how complete the transfer is; I believe the backup copy is compressed whereas the zp1 copy isn't, so the ALLOC being that different isn't clear-cut evidence of anything. I'll try to guess a few things that should be recent and see if they in fact got into the backup. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Problems with big ZFS send/receive in b134
On Tue, August 10, 2010 13:23, Dave Pacheco wrote:
> David Dyer-Bennet wrote:
>> My full backup still doesn't complete. However, instead of hanging the entire disk subsystem as it did on 111b, it now issues error messages. Errors at the end.
> [...]
>> cannot receive incremental stream: most recent snapshot of bup-wrack/fsfs/zp1/ddb does not match incremental source
>> bash-4.0$
>>
>> The bup-wrack pool was newly-created, empty, before this backup started.
>>
>> The backup commands were:
>>
>> zfs send -Rv "$srcsnap" | zfs recv -Fudv "$BUPPOOL/$HOSTNAME/$FS"
>>
>> I don't see how anything could be creating snapshots on bup-wrack while this was running. That pool is not normally mounted (it's on a single external USB drive, I plug it in for backups). My script for doing regular snapshots of zp1 and rpool doesn't reference any of the bup-* pools.
>>
>> I don't see how this snapshot mismatch can be coming from anything but the send/receive process.
>>
>> There are quite a lot of snapshots; dailies for some months, 2-hour ones for a couple of weeks. Most of them are empty or tiny.
>>
>> Next time I will try WITHOUT -v on both ends, and arrange to capture the expanded version of the command with all the variables filled in, but I don't expect any different outcome.
>>
>> Any other ideas?
>
> Is it possible that snapshots were renamed on the sending pool during the send operation?

I don't have any scripts that rename a snapshot (in fact I didn't know it was possible until just now), and I don't have other users with permission to make snapshots (either delegated or by root access). I'm not using the Sun auto-snapshot thing, I've got a much simpler script of my own (hence I know what it does). So I don't at the moment see how one would be getting renamed.

It's possible that a snapshot was *deleted* on the sending pool during the send operation, however.
It's also possible that snapshots were created (however, a newly created one would postdate the one specified in the zfs send -R, and hence should be irrelevant). (In fact it's certain that snapshots were created, and I'm nearly certain some were deleted.)

If that turns out to be the problem, that'll be annoying to work around (I'm making snapshots every two hours and deleting them after a couple of weeks). Locks between admin scripts rarely end well, in my experience. But at least I'd know what I had to work around.

Am I looking for too much here? I *thought* I was doing something that should be simple and basic and frequently used nearly everywhere, and hence certain to work. "What could go wrong?", I thought :-). If I'm doing something inherently dicey I can try to find a way to back off; as my primary backup process, this needs to be rock-solid. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
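For what it's worth, the mutual exclusion between the snapshot cron job and the backup script doesn't have to be elaborate. A minimal sketch using a mkdir-based lock (mkdir is atomic, so whichever script runs second simply loses; all names here are hypothetical, not taken from the actual scripts):

```shell
#!/bin/sh
# Sketch: serialize the 2-hour snapshot job and the backup job so
# snapshots can't be destroyed mid-send.  The snapshot script skips a
# cycle rather than blocking, since another one comes in two hours.
LOCKDIR="${TMPDIR:-/tmp}/zp1-admin.lock"   # hypothetical lock path

take_lock() {           # returns success only if we got the lock
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR"
}

# In the snapshot script (the backup script would do the same around
# its send/recv pipeline, but wait-and-retry instead of skipping):
if take_lock; then
    echo "would run: zfs snapshot -r zp1@bup-2hr-$(date -u +%Y%m%d-%H%M)"
    release_lock
else
    echo "backup in progress, skipping this snapshot cycle"
fi
```

The usual caveat applies: a crash between take_lock and release_lock leaves a stale lock directory, so a real version would want a timestamp or PID check.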
Re: [zfs-discuss] Problems with big ZFS send/receive in b134
Additional information. I started another run, and captured the exact expanded commands. These SHOULD BE the exact commands used in the last run except for the snapshot name (this script makes a recursive snapshot just before it starts a backup). In any case they ARE the exact commands used in this new run, and we'll see what happens at the end of this run. (These are from a bash trace as produced by "set -x".)

+ zfs create -p bup-wrack/fsfs/zp1
+ zfs send -Rp z...@bup-20100810-154542gmt
+ zfs recv -Fud bup-wrack/fsfs/zp1

(The send and the receive are source and sink in a pipeline.) As you can see, the destination filesystem is new in the bup-wrack pool. The "-R" on the send should, as I understand it, create a replication stream which will "replicate the specified filesystem, and all descendent file systems, up to the named snapshot. When received, all properties, snapshots, descendent file systems, and clones are preserved." This should send the full state of zp1 up to the snapshot, and the receive should receive it into bup-wrack/fsfs/zp1.

Isn't this how a "full backup" should be made using zfs send/receive? (Once this is working, I intend to use -I to send incremental streams to update it regularly.)

bash-4.0$ zpool list
NAME        SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
bup-wrack   928G  4.62G   923G   0%  1.00x  ONLINE  /backups/bup-wrack
rpool       149G  10.0G   139G   6%  1.00x  ONLINE  -
zp1        1.09T   743G   373G  66%  1.00x  ONLINE  -

zp1 is my primary data pool. It's not very big (physically it's 3 2-way mirrors of 400GB drives). It has 743G of data in it. bup-wrack is the backup pool; it's a single 1TB external USB drive. This was taken shortly after starting the second try at a full backup (since the b134 upgrade), so bup-wrack is still mostly empty. None of the pools have shown any errors of any sort in months. zp1 and rpool are scrubbed weekly.
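For reference, the full-then-incremental cycle described above can be sketched as a pair of command builders. This is only a dry-run sketch: it prints the pipelines rather than running them, and the snapshot names are illustrative (the variable scheme loosely mirrors the $BUPPOOL/$HOSTNAME/$FS script, but is otherwise made up):

```shell
#!/bin/sh
# Sketch of the backup cycle: a full -R replication stream the first
# time, then -I incrementals from the previous backup snapshot.
# Only builds the command strings; pipe a line through sh to execute.

full_backup_cmd() {     # $1 = source snapshot, $2 = destination fs
    echo "zfs send -R '$1' | zfs recv -Fud '$2'"
}

incr_backup_cmd() {     # $1 = previous snap, $2 = new snap, $3 = dest fs
    echo "zfs send -R -I '$1' '$2' | zfs recv -Fud '$3'"
}

full_backup_cmd 'zp1@bup-20100810-154542GMT' 'bup-wrack/fsfs/zp1'
incr_backup_cmd '@bup-20100810-154542GMT' 'zp1@bup-20100812-0200GMT' 'bup-wrack/fsfs/zp1'
```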
-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Problems with big ZFS send/receive in b134
My full backup still doesn't complete. However, instead of hanging the entire disk subsystem as it did on 111b, it now issues error messages. Errors at the end.

sending from @bup-daily-20100726-10CDT to zp1/d...@bup-daily-20100727-10cdt
received 3.80GB stream in 136 seconds (28.6MB/sec)
receiving incremental stream of zp1/d...@bup-daily-20100727-10cdt into bup-wrack/fsfs/zp1/d...@bup-daily-20100727-10cdt
sending from @bup-daily-20100727-10CDT to zp1/d...@bup-daily-20100728-11cdt
received 192MB stream in 10 seconds (19.2MB/sec)
receiving incremental stream of zp1/d...@bup-daily-20100728-11cdt into bup-wrack/fsfs/zp1/d...@bup-daily-20100728-11cdt
sending from @bup-daily-20100728-11CDT to zp1/d...@bup-daily-20100729-10cdt
received 170MB stream in 9 seconds (18.9MB/sec)
receiving incremental stream of zp1/d...@bup-daily-20100729-10cdt into bup-wrack/fsfs/zp1/d...@bup-daily-20100729-10cdt
sending from @bup-daily-20100729-10CDT to zp1/d...@bup-2hr-20100729-22cdt
warning: cannot send 'zp1/d...@bup-2hr-20100729-22cdt': no such pool or dataset
sending from @bup-2hr-20100729-22CDT to zp1/d...@bup-2hr-20100730-00cdt
warning: cannot send 'zp1/d...@bup-2hr-20100730-00cdt': no such pool or dataset
sending from @bup-2hr-20100730-00CDT to zp1/d...@bup-2hr-20100730-02cdt
warning: cannot send 'zp1/d...@bup-2hr-20100730-02cdt': no such pool or dataset
sending from @bup-2hr-20100730-02CDT to zp1/d...@bup-2hr-20100730-04cdt
warning: cannot send 'zp1/d...@bup-2hr-20100730-04cdt': incremental source (@bup-2hr-20100730-02CDT) does not exist
sending from @bup-2hr-20100730-04CDT to zp1/d...@bup-2hr-20100730-06cdt
sending from @bup-2hr-20100730-06CDT to zp1/d...@bup-2hr-20100730-08cdt
sending from @bup-2hr-20100730-08CDT to zp1/d...@bup-daily-20100730-10cdt
sending from @bup-daily-20100730-10CDT to zp1/d...@bup-2hr-20100730-10cdt
sending from @bup-2hr-20100730-10CDT to zp1/d...@bup-2hr-20100730-12cdt
sending from @bup-2hr-20100730-12CDT to zp1/d...@bup-2hr-20100730-14cdt
sending from @bup-2hr-20100730-14CDT to zp1/d...@bup-2hr-20100730-16cdt
sending from @bup-2hr-20100730-16CDT to zp1/d...@bup-2hr-20100730-18cdt
sending from @bup-2hr-20100730-18CDT to zp1/d...@bup-2hr-20100730-20cdt
sending from @bup-2hr-20100730-20CDT to zp1/d...@bup-2hr-20100730-22cdt
received 162MB stream in 9 seconds (18.0MB/sec)
receiving incremental stream of zp1/d...@bup-2hr-20100730-06cdt into bup-wrack/fsfs/zp1/d...@bup-2hr-20100730-06cdt
cannot receive incremental stream: most recent snapshot of bup-wrack/fsfs/zp1/ddb does not match incremental source
bash-4.0$

The bup-wrack pool was newly-created, empty, before this backup started.

The backup commands were:

zfs send -Rv "$srcsnap" | zfs recv -Fudv "$BUPPOOL/$HOSTNAME/$FS"

I don't see how anything could be creating snapshots on bup-wrack while this was running. That pool is not normally mounted (it's on a single external USB drive, I plug it in for backups). My script for doing regular snapshots of zp1 and rpool doesn't reference any of the bup-* pools.

I don't see how this snapshot mismatch can be coming from anything but the send/receive process.

There are quite a lot of snapshots; dailies for some months, 2-hour ones for a couple of weeks. Most of them are empty or tiny.

Next time I will try WITHOUT -v on both ends, and arrange to capture the expanded version of the command with all the variables filled in, but I don't expect any different outcome.

Any other ideas? -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Directory tree renaming -- disk usage
If I have a directory with a bazillion files in it (or, let's say, a directory subtree full of raw camera images, about 15MB each, totalling say 50GB) on a ZFS filesystem, and take daily snapshots of it (without altering it), the snapshots use almost no extra space, I know. If I now rename that directory, and take another snapshot, what happens? Do I get two copies of the unchanged data now, or does everything still reference the same original data (file content)? Seems like the new directory tree contains the "same old files", same inodes and so forth, so it shouldn't be duplicating the data as I understand it; is that correct? This would, obviously, be fairly easy to test; and, if I removed the snapshots afterward, wouldn't take space permanently (have to make sure that the scheduler doesn't do one of my permanent snapshots during the test). But I'm interested in the theoretical answer in any case. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
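The intuition above can be checked on any filesystem without touching the snapshots at all: a rename rewrites only the directory entry, so the files keep their inodes, and (as I understand ZFS's copy-on-write model) a post-rename snapshot shares all the file-content blocks with the earlier one; only the directory metadata differs. A throwaway sketch, with made-up paths:

```shell
#!/bin/sh
# Quick illustration: renaming a directory moves only the directory
# entry; the files inside keep their inodes (and hence, on a COW
# filesystem, their data blocks).  Uses a scratch dir under /tmp.
d="${TMPDIR:-/tmp}/rename-demo.$$"
mkdir -p "$d/photos"
echo "raw image bytes" > "$d/photos/img001.raw"

before=$(ls -i "$d/photos/img001.raw" | awk '{print $1}')
mv "$d/photos" "$d/photos-renamed"
after=$(ls -i "$d/photos-renamed/img001.raw" | awk '{print $1}')

echo "inode before: $before  after: $after"
[ "$before" = "$after" ] && echo "same inode -- no file data copied"
rm -rf "$d"
```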
[zfs-discuss] Errors upgrading 2009.06 to dev build 134
Last night I upgraded from 2009.06 to b134 from the dev branch. I haven't tried to boot the resulting BE yet, because I got the following errors:

PHASE                                ACTIONS
Removal Phase                    16199/21806
Warning - directory etc/sma/snmp/mibs not empty - contents preserved in /tmp/tmp3udGdj/var/pkg/lost+found/etc/sma/snmp/mibs-20100805T075635Z
Removal Phase                    21806/21806
Install Phase                    79042/80017
The 'pcieb' driver shares the alias 'pciexclass,060400' with the 'pcie_pci' driver, but the system cannot determine how the latter was delivered. Its entry on line 2 in /etc/driver_aliases has been commented out. If this driver is no longer needed, it may be removed by booting into the 'opensolaris-3' boot environment and invoking 'rem_drv pcie_pci' as well as removing line 2 from /etc/driver_aliases or, before rebooting, mounting the 'opensolaris-3' boot environment and running 'rem_drv -b pcie_pci' and removing line 2 from /etc/driver_aliases.
The 'pcieb' driver shares the alias 'pciexclass,060401' with the 'pcie_pci' driver, but the system cannot determine how the latter was delivered. Its entry on line 3 in /etc/driver_aliases has been commented out. If this driver is no longer needed, it may be removed by booting into the 'opensolaris-3' boot environment and invoking 'rem_drv pcie_pci' as well as removing line 3 from /etc/driver_aliases or, before rebooting, mounting the 'opensolaris-3' boot environment and running 'rem_drv -b pcie_pci' and removing line 3 from /etc/driver_aliases.
Install Phase                    80017/80017
Update Phase                     27721/27760
driver (aggr) upgrade (removal of policy 'read_priv_set=net_rawaccess write_priv_set=net_rawaccess') failed: minor node spec required.
Update Phase                     27725/27760
driver (softmac) upgrade (removal of policy 'read_priv_set=net_rawaccess write_priv_set=net_rawaccess') failed: minor node spec required.
Update Phase                     27726/27760
driver (vnic) upgrade (removal of policy 'read_priv_set=net_rawaccess write_priv_set=net_rawaccess') failed: minor node spec required.
Update Phase                     27736/27760
driver (ibd) upgrade (removal of policy 'read_priv_set=net_rawaccess write_priv_set=net_rawaccess') failed: minor node spec required.
Update Phase                     27743/27760
driver (dnet) upgrade (removal of policy 'read_priv_set=net_rawaccess write_priv_set=net_rawaccess') failed: minor node spec required.
Update Phase                     27744/27760
driver (elxl) upgrade (removal of policy 'read_priv_set=net_rawaccess write_priv_set=net_rawaccess') failed: minor node spec required.
Update Phase                     27745/27760
driver (iprb) upgrade (removal of policy 'read_priv_set=net_rawaccess write_priv_set=net_rawaccess') failed: minor node spec required.
U

Do these look familiar to anybody? Can they, I hope, be ignored? Or does anybody have any ideas what needs to be fixed? I didn't install any drivers beyond what the earlier installers figured out for themselves that I needed, and I didn't mess with driver config that I recall.

I know I can probably fall back to what I'm running now if this new install fails to run, and I'll eventually just try it. I've got a couple of bootable CDs with "recovery consoles" that at least get me single user, and one with full LiveCD capability, so I should be able to unwind the mess if necessary.

I guess technically this has no business on zfs-discuss; apologies for that, but all the prior discussion of this upgrade, and the motivation for it, is that I need a more current ZFS, and everybody I know is on this list, not over in the install-discuss list. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Upgrading 2009.06 to something current
What's a good choice for a decently stable upgrade? I'm unable to run backups because ZFS send/receive won't do full-pool replication reliably; it hangs better than 2/3 of the time, and people here have told me later versions (later than 111b) fix this. I was originally waiting for the "spring" release, but okay, I've kind of given up on that.

This is a home "production" server; it's got all my photos on it. And the backup isn't as current as I'd like, and I'm having trouble getting a better backup. (I'll do *something* before I risk the upgrade; maybe brute force, rsync to an external drive, to at least give me a clean copy of the current state; I can live without ACLs.)

I find various blogs with instructions for how to do such an upgrade, and they don't agree, and each one has posts from people for whom it didn't work, too. Is there any kind of consensus on what the best way to do this is? -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
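The brute-force rsync fallback mentioned above is simple enough to sketch. The paths are hypothetical, and this is a command builder (dry run) rather than a live copy; as noted, ACLs are not preserved:

```shell
#!/bin/sh
# Sketch of the pre-upgrade safety copy: plain rsync of the live data
# to the external drive.  -a preserves times/owners/permissions (but
# not NFSv4 ACLs), --delete keeps the copy an exact mirror on re-runs,
# -x stays within one filesystem.
SRC=/zp1/                      # hypothetical source mountpoint
DST=/backups/rsync-copy/zp1/   # hypothetical external-drive path

rsync_cmd() {
    echo "rsync -ax --delete '$1' '$2'"
}

rsync_cmd "$SRC" "$DST"
```

(The trailing slash on the source matters to rsync: it copies the directory's contents rather than the directory itself.)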
Re: [zfs-discuss] Legality and the future of zfs...
On Fri, July 16, 2010 14:07, Frank Cusack wrote:
> On 7/16/10 12:02 PM -0500 David Dyer-Bennet wrote:
>>> It would be nice to have applications request to be notified before a snapshot is taken, and when those that have requested notification have acknowledged that they're ready, the snapshot would be taken; and then another notification sent that it was taken. Prior to indicating they were ready, the apps could have achieved a logically consistent on-disk state. That would eliminate the need for (for example) separate database backups, if you could have a snapshot with the database on it in a consistent state.
>>
>> Any software dependent on cooperating with the filesystem to ensure that the files are consistent in a snapshot fails the cord-yank test (which is equivalent to the "processor explodes" test and the "power supply bursts into flames" test and the "disk drive shatters" test and so forth). It can't survive unavoidable physical-world events.
>
> It can, if said software can roll back to the last consistent state. That may or may not be "recent" wrt a snapshot. If an application is very active, it's possible that many snapshots may be taken, none of which are actually in a state the application can use to recover from. Rendering snapshots much less effective.

Wait, if the application can in fact survive the "cord pull" test, then by definition of "survive", all the snapshots are useful. They'll contain everything consistent that was committed to disk by the time of the yank (or snapshot); which, it seems to me, is the very best that anybody could hope for.

> Also, just administratively, and perhaps legally, it's highly desirable to know that the time of a snapshot is the actual time that application state can be recovered to or referenced to.

Maybe, but since that's not achievable for your core corporate asset (the database), I think of it as a pipe dream rather than a goal.

> Also, if an application cannot survive a cord-yank test, it might be even more highly desirable that snapshots be a stable state from which the application can be restarted.

If it cannot survive a cord-yank test, it should not be run, ever, by anybody, for any purpose more important than playing a game. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Legality and the future of zfs...
On Fri, July 16, 2010 08:39, Richard L. Hamilton wrote:
>>> It'd be handy to have a mechanism where applications could register for snapshot notifications. When one is about to happen, they could be told about it and do what they need to do. Once all the applications have acknowledged the snapshot alert--and/or after a pre-set timeout--the file system would create the snapshot, and then notify the applications that it's done.
>>
>> Why would an application need to be notified? I think you're under the misconception that something happens when a ZFS snapshot is taken. NOTHING happens when a snapshot is taken (OK, well, there is the snapshot reference name created). Blocks aren't moved around, we don't copy anything, etc. Applications have no need to "do anything" before a snapshot is taken.
>
> It would be nice to have applications request to be notified before a snapshot is taken, and when those that have requested notification have acknowledged that they're ready, the snapshot would be taken; and then another notification sent that it was taken. Prior to indicating they were ready, the apps could have achieved a logically consistent on-disk state. That would eliminate the need for (for example) separate database backups, if you could have a snapshot with the database on it in a consistent state.

Any software dependent on cooperating with the filesystem to ensure that the files are consistent in a snapshot fails the cord-yank test (which is equivalent to the "processor explodes" test and the "power supply bursts into flames" test and the "disk drive shatters" test and so forth). It can't survive unavoidable physical-world events.

Conversely, any scheme for a program writing to its files that PASSES those tests will be fine with arbitrary snapshots, too.
For that matter, remember that the "snapshot" may be taken on a ZFS server on another continent which is making the storage available via iSCSI; there's currently no notification channel to tell the software the snapshot is happening. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Legality and the future of zfs...
On Thu, July 15, 2010 09:29, Tim Cook wrote:
> On Thu, Jul 15, 2010 at 9:09 AM, David Dyer-Bennet wrote:
>> On Wed, July 14, 2010 23:51, Tim Cook wrote:
>>> You're clearly talking about something completely different than everyone else. Whitebox works GREAT if you've got 20 servers. Try scaling it to 10,000. "A couple extras" ends up being an entire climate controlled warehouse full of parts that may or may not be in the right city. Not to mention you've then got full-time staff on-hand to constantly be replacing parts. Your model doesn't scale for 99% of businesses out there. Unless they're google, and they can leave a dead server in a rack for years, it's an unsustainable plan. Out of the fortune 500, I'd be willing to bet there's exactly zero companies that use whitebox systems, and for a reason.
>>
>> You might want to talk to Google about that; as I understand it they decided that buying expensive servers was a waste of money precisely because of the high numbers they needed. Even with the good ones, some will fail, so they had to plan to work very well through server failures, so they can save huge amounts of money on hardware by buying cheap servers rather than expensive ones.
>
> Obviously someone was going to bring up google, whose business model is unique, and doesn't really apply to anyone else. Google makes it work because they order so many thousands of servers at a time, they can demand custom made parts for the servers, that are built to their specifications.

Certainly they're one of the most unusual setups out there, in several ways (size, plus details of what they do with their computers).

> Furthermore, the clustering and filesystem they use wouldn't function at all for 99% of the workloads out there. Their core application: search, is what makes the hardware they use possible. If they were serving up a highly transactional database that required millisecond latency it would be a different story.

Again, I'm not at all convinced of that "99%" bit. Obviously low-latency transactional database applications are about the polar opposite of what Google does. However, transactional database applications are nearer 1% than 99% of the workloads out there, at every shop I've worked at or seen detailed descriptions of. Big email farms, for example, don't generally have that kind of database at all. Big web farms probably do have some databases used that way -- but not for that high a percentage of their traffic, and generally running on one big server while the web is spread across hundreds of servers. Akamai is more like Google in a bunch of ways than most places. Wikipedia and ebay and amazon have huge web front-ends, while also needing transactional database support.

Um, maybe I'm getting really too far afield from ZFS. I'll shut up now :-). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Legality and the future of zfs...
On Wed, July 14, 2010 23:51, Tim Cook wrote:
> On Wed, Jul 14, 2010 at 9:27 PM, BM wrote:
>> On Thu, Jul 15, 2010 at 12:49 AM, Edward Ned Harvey wrote:
>>> I'll second that. And I think this is how you can tell the difference: With supermicro, do you have a single support number to call and a 4hour onsite service response time?
>>
>> Yes.
>>
>> BTW, just for the record, people potentially have a bunch of other supermicros in a stock, that they've bought for the rest of the money that left from a budget that was initially estimated to get shiny Sun/Oracle hardware. :) So normally you put them online in a cluster and don't really worry that one of them gone just power that thing down and disconnect from the whole grid.
>>
>>> When you pay for the higher prices for OEM hardware, you're paying for the knowledge of parts availability and compatibility. And a single point vendor who supports the system as a whole, not just one component.
>>
>> What exactly kind of compatibility you're talking about? For example, if I remove my broken mylar air shroud for X8 DP with a MCP-310-18008-0N number because I step on it accidentally :-D, pretty much I think I am gonna ask them to replace exactly THAT thing back. Or you want to let me tell you real stories how OEM hardware is supported and how many emails/phonecalls it involves? One of the very latest (just a week ago): Apple Support reported me that their engineers in US has no green idea why Darwin kernel panics on their XServe, so they suggested me replace mother board TWICE and keep OLDER firmware and never upgrade, since it will cause crash again (although identical server works just fine with newest firmware)! I told them NNN times that traceback of Darwin kernel was yelling about ACPI problem and gave them logs/tracebacks/transcripts etc, but they still have no idea where is the problem. Do I need such "support"? No. Not at all.
>>
>> --
>> Kind regards, BM
>>
>> Things, that are stupid at the beginning, rarely ends up wisely.
>
> You're clearly talking about something completely different than everyone else. Whitebox works GREAT if you've got 20 servers. Try scaling it to 10,000. "A couple extras" ends up being an entire climate controlled warehouse full of parts that may or may not be in the right city. Not to mention you've then got full-time staff on-hand to constantly be replacing parts. Your model doesn't scale for 99% of businesses out there. Unless they're google, and they can leave a dead server in a rack for years, it's an unsustainable plan. Out of the fortune 500, I'd be willing to bet there's exactly zero companies that use whitebox systems, and for a reason.

You might want to talk to Google about that; as I understand it they decided that buying expensive servers was a waste of money precisely because of the high numbers they needed. Even with the good ones, some will fail, so they had to plan to work very well through server failures, so they can save huge amounts of money on hardware by buying cheap servers rather than expensive ones.

And your juxtaposition of "fortune 500" and "99% of businesses" is significant; possibly the Fortune 500, other than Google, use expensive proprietary hardware; but 99% of businesses out there are NOT in the Fortune 500, and mostly use whitebox systems (and not rackmount at all; they'll have one or at most two tower servers). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] preparing for future drive additions
On Wed, July 14, 2010 14:58, Daniel Taylor wrote:
> I'm about to build an opensolaris NAS system; currently we have two drives and are planning on adding two more at a later date (2TB enterprise level HDD are a bit expensive!).

Do you really need them? Now? Maybe 1TB drives are good now, and then add a pair of 2TB in a year?

> Whats the best configuration for setting up these drives bearing in mind I want to expand in the future?

Mirror now (pool consisting of one two-way mirror vdev). Add a second mirror vdev to the pool when you need to expand.

> I was thinking of mirroring the drives and then converting to raidz some how?

No way to convert to raidz. (That is, no magic simple way; you can of course put in new drives for the raidz and copy the data across.)

> It will only be a max of 4 drives, the second two of which will be bought later.

5 drives would be a lot better. You could keep a hot spare -- and you could expand mirror vdevs safely (never dropping below your normal redundancy level), too.

You can add new vdevs to a pool. This is very useful for a growing system (until you run out of drive slots).

You can expand an existing vdev by replacing all the drives (one at a time). It's a lot cleaner and safer with mirror vdevs than with raidz[23] vdevs. In a raidz vdev, you can replace drives individually and wait for them to resilver. When each drive is done, replace the next. When you have replaced all of the drives, the vdev will then make the new space available. HOWEVER, doing this takes away a level of redundancy -- you take away a live drive. For a RAIDZ, that means no redundancy during the resilver (which takes a while on a 2TB drive, if you haven't noticed). And the resilver is stressing the drives, so if there's any incipient failure, it's more likely to show up during the resilver. Scary! (RAIDZ2 is better in that you still have one layer of redundancy when you take one drive out; but in a 4-drive chassis, forget it!)
In a mirror vdev, you can be much cleverer, IF you can connect the new drive while the old drives are all still present. Attach the new bigger drive as a THIRD drive to the mirror vdev, and wait for the resilver. You now have a three-way mirror, and you never dropped below a two-way mirror at any time during the process. Detach one small drive and attach a new big drive, and wait again. And detach the last small drive, and you have now expanded your mirror vdev without ever dropping below your normal redundancy. (There are variants on this; the key point is that a mirror vdev can be an n-way mirror for any value of n your hardware can support.) If your backups are good and your uptime requirements aren't really strict, of course the risks can be tolerated better. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
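The attach-then-detach expansion described above, as a dry-run sketch. Pool and device names here are hypothetical, the commands are printed rather than executed, and in practice you wait for each resilver to finish (watch zpool status) before detaching anything:

```shell
#!/bin/sh
# Dry-run sketch of expanding a two-way mirror vdev onto bigger drives
# without ever dropping below two-way redundancy.  Drop the echo in
# run() to execute for real, once the names match your system.
run() { echo "would run: $*"; }

POOL=zp1
OLD1=c0t2d0; OLD2=c0t3d0     # existing small drives (hypothetical)
NEW1=c0t4d0; NEW2=c0t5d0     # new big drives (hypothetical)

run zpool attach "$POOL" "$OLD1" "$NEW1"   # now a 3-way mirror
run zpool status "$POOL"                   # wait here until resilver completes
run zpool detach "$POOL" "$OLD1"           # back to 2-way, one big drive in
run zpool attach "$POOL" "$OLD2" "$NEW2"   # 3-way again
run zpool status "$POOL"                   # wait for the second resilver
run zpool detach "$POOL" "$OLD2"           # vdev is now all big drives
```

(On recent builds the pool still needs the autoexpand property on, or an export/import, before the extra capacity shows up; that detail is from my understanding rather than the post above.)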
Re: [zfs-discuss] zfs send/recv hanging in 2009.06
On Fri, July 9, 2010 18:42, Giovanni Tirloni wrote:
> On Fri, Jul 9, 2010 at 6:49 PM, BJ Quinn wrote:
>> I have a couple of systems running 2009.06 that hang on relatively large zfs send/recv jobs. With the -v option, I see the snapshots coming across, and at some point the process just pauses, IO and CPU usage go to zero, and it takes a hard reboot to get back to normal. The same script running against the same data doesn't hang on 2008.05.
>
> There are issues running concurrent zfs receive in 2009.6. Try to run just one at a time.

He's doing the same thing I'm doing -- one send, one receive. (But incremental replication.)

> Switching to a development build (b134) is probably the answer until we've a new release.

Given that the "spring" stable release was my planned solution, I'm starting to think about doing something else myself. Does anybody have any idea what's up with the stable release, though? Has anything been said about the plans that I've maybe missed? -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/recv hanging in 2009.06
On Fri, July 9, 2010 16:49, BJ Quinn wrote: > I have a couple of systems running 2009.06 that hang on relatively large > zfs send/recv jobs. With the -v option, I see the snapshots coming > across, and at some point the process just pauses, IO and CPU usage go to > zero, and it takes a hard reboot to get back to normal. The same script > running against the same data doesn't hang on 2008.05. > > There are maybe 100 snapshots, 200GB of data total. Just trying to send > to a blank external USB drive in one case, and in the other, I'm restoring > from a USB drive to a local drive, but the behavior is the same. > > I see that others have had a similar problem, but there doesn't seem to be > any answers - > > https://opensolaris.org/jive/thread.jspa?messageID=384540 > http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg34493.html > http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg37158.html > > I'd like to stick with a "released" version of OpenSolaris, so I'm hoping > that the answer isn't to switch to the dev repository and pull down b134. I still have this problem (I was msg34493 there). My original plan was to wait for the Spring release, to get me to a stable release on more recent code. I'm still following that plan, i.e. haven't done anything else yet. At the time the "March" release was expected to actually appear by April. Other than trying more recent code, I don't recall any useful ideas coming through the list. It seems like the thing people recommend as the backup scheme for ZFS simply doesn't work yet. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
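For context, the workload being described (one send piped to one receive, using incremental replication streams to an external drive) is typically invoked along these lines; pool, snapshot, and target names are hypothetical, and nothing here works around the 2009.06 hang itself:

```shell
# Initial full replication of a pool, up to a recursive snapshot:
zfs snapshot -r tank@2010-07-09
zfs send -R tank@2010-07-09 | zfs receive -F -d backup

# Subsequent runs send only the changes between snapshots
# (-I includes the intermediate snapshots as well):
zfs snapshot -r tank@2010-07-10
zfs send -R -I tank@2010-07-09 tank@2010-07-10 | zfs receive -F -d backup
```

This is the shape of job that reportedly stalls on 2009.06 with a couple hundred gigabytes and ~100 snapshots.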
Re: [zfs-discuss] Crucial RealSSD C300 and cache flush?
On Thu, June 24, 2010 08:58, Arne Jansen wrote: > Cross check: we pulled also while writing with cache enabled, and it lost > 8 writes. I'm SO pleased to see somebody paranoid enough to do that kind of cross-check doing this benchmarking! "Benchmarking is hard!" > So I'd say, yes, it flushes its cache on request. Starting to sound pretty convincing, yes. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Complete Linux Noob
On Tue, June 15, 2010 14:13, CarlPalmer wrote: > I have been researching different types of raids, and I happened across > raidz, and I am blown away. I have been trying to find resources to > answer some of my questions, but many of them are either over my head in > terms of details, or foreign to me as I am a linux noob, and I have to > admit I have never even looked at Solaris. Heh; caught another one :-) . > Are the Parity drives just that, a drive assigned to parity, or is the > parity shared over several drives? No drives are formally designated for "parity"; all n drives in the RAIDZ vdev are used together in such a way that you can lose one drive without loss of data, but exactly which bits are "data" and which bits are "parity" and where they are stored is not something the admin has to think about or know (and in fact cannot know). > I understand that you can build a raidz2 that will have 2 parity disks. > So in theory I could lose 2 disks and still rebuild my array so long as > they are not both the parity disks correct? Any two disks out of a raidz2 vdev can be lost. Lose a third before the recover completes and your data is toast. > I understand that you can have Spares assigned to the raid, so that if a > drive fails, it will immediately grab the spare and rebuild the damaged > drive. Is this correct? Yes, RAIDZ (including z2 and z3) and mirror vdevs will grab a "hot spare" if one is assigned and needed, and start the resilvering operation immediately. > Now I can not find anything on how much space is taken up in the raidz1 or > raidz2. If all the drives are the same size, does a raidz2 take up the > space of 2 of the drives for parity, or is the space calculation > different? That's the right calculation. > I get that you can not expand a raidz as you would a normal raid, by > simply slapping on a drive. Instead it seems that the preferred method is > to create a new raidz. 
> Now Lets say that I want to add another raidz1 to > my system, can I get the OS to present this as one big drive with the > space from both raid pools? You can't expand a normal RAID, either, anywhere I've ever seen. A "pool" can contain multiple "vdevs". You can add additional vdevs to a pool and the new space becomes immediately available to the pool, and hence to anything (like a filesystem) drawing from that pool. (The zpool command will attempt to stop you from mixing vdevs of different redundancy in the same pool, but you can force it to let you. Mixing a RAIDZ vdev and a RAIDZ3 vdev in the same pool is a silly thing to do, since you don't control where in the pool any new data goes, and it's likely to be striped across the vdevs in the pool.) You can also replace all the drives in a vdev, serially (waiting for the resilver to complete at each step before continuing to the next drive), and if the new drives are larger than the old drives, when you've replaced all of them the new space will be usable in that vdev. This is particularly useful with mirrors, where there are only two drives to replace. (Well, actually, ZFS mirrors can have any number of drives. To avoid the risk of loss when upgrading the drives in a mirror, attach the new bigger drive FIRST, wait for the resilver, and THEN detach one of the smaller original drives, repeat for the second drive, and you will never go to a redundancy lower than 2. You can even attach BOTH new disks at once, if you have the slots and controller space, and have a 4-way mirror for a while. Somebody reported configuring ALL the drives in a 'Thumper' as a mirror, a 48-way mirror, just to see if it worked. It did.) > How do I share these types of raid pools across the network. Or more > specifically, how do I access them from Windows based systems? Is there > any special trick? Nothing special. In-kernel CIFS is better than SAMBA, and supports full NTFS ACLs. 
I hear it also attaches to AD cleanly, but I haven't done that, don't run AD at home. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
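The two expansion paths described in the reply above -- adding a second vdev to the pool, versus serially replacing an existing vdev's drives with larger ones -- can be sketched as follows (all pool and device names hypothetical):

```shell
# Path 1: grow the POOL by adding another raidz vdev; the new space
# becomes available to the pool immediately.
zpool add tank raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0
# (zpool will warn if the new vdev's redundancy doesn't match the
# existing vdevs; -f overrides, which is usually a bad idea.)

# Path 2: grow an EXISTING vdev by replacing its drives one at a time
# with larger ones, waiting for each resilver to finish first.
zpool replace tank c1t0d0 c4t0d0
# ... wait for 'zpool status tank' to show the resilver complete,
# then repeat for each remaining drive in the vdev ...
```

With mirrors, Path 2 can also be done as attach-then-detach, which never drops below normal redundancy.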
Re: [zfs-discuss] Please trim posts
On Thu, June 10, 2010 12:26, patto...@yahoo.com wrote: > It's getting downright ridiculous. The digest people will kiss you. But those reading via individual message email quite possibly will not. Quoting at least what you're actually responding to is crucial to making sense out here. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Depth of Scrub
On Fri, June 4, 2010 03:29, sensille wrote: > Hi, > > I have a small question about the depth of scrub in a raidz/2/3 > configuration. > I'm quite sure scrub does not check spares or unused areas of the disks > (it > could check if the disks detects any errors there). > But what about the parity? Obviously it has to be checked, but I can't > find > any indications for it in the literature. The man page only states that > the > data is being checksummed and only if that fails the redundancy is being > used. > Please tell me I'm wrong ;) I believe you're wrong. Scrub checks all the blocks used by ZFS, regardless of what's in them. (It doesn't check free blocks.) > But what I'm really targeting with my question: How much coverage can be > reached with a find | xargs wc in contrast to scrub? It misses the > snapshots, but anything beyond that? Your find script misses the redundant data; scrub checks it all. It may well miss some of the metadata as well, and probably misses the redundant copies of metadata. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
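To make concrete what the find-based check actually exercises, here is a minimal runnable sketch (plain POSIX shell, nothing ZFS-specific): it reads each regular file's data exactly once through the filesystem layer, which is why it can never touch parity, redundant copies, or snapshot-only blocks the way scrub does.

```shell
# Read every regular file under a directory once, the way the
# 'find | xargs wc' check does.  Only one copy of the data travels up
# through the filesystem; any redundant copies or parity on the disks
# below are never exercised.
tmp=$(mktemp -d)
printf 'hello\n' > "$tmp/a.txt"
printf 'world\n' > "$tmp/b.txt"

# Total bytes of file data actually read (two 6-byte files):
total=$(find "$tmp" -type f -print0 | xargs -0 cat | wc -c | tr -d ' ')
echo "read $total bytes of file data"   # prints: read 12 bytes of file data

rm -rf "$tmp"
```

A scrub, by contrast, walks every allocated block in the pool, redundancy included, and verifies checksums: `zpool scrub tank`.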
Re: [zfs-discuss] one more time: pool size changes
On Thu, June 3, 2010 12:03, Bob Friesenhahn wrote: > On Thu, 3 Jun 2010, David Dyer-Bennet wrote: >> >> In an 8-bay chassis, there are other concerns, too. Do I keep space >> open >> for a hot spare? There's no real point in a hot spare if you have only >> one vdev; that is, 8-drive RAIDZ3 is clearly better than 7-drive RAIDZ2 >> plus a hot spare. And putting everything into one vdev means that for >> any >> upgrade I have to replace all 8 drives at once, a financial problem for >> a >> home server. > > It is not so clear to me that an 8-drive raidz3 is clearly better than > 7-drive raidz2 plus a hot spare. From a maintenance standpoint, I > think that it is useful to have a spare drive or even an empty spare > slot so that it is easy to replace a drive without needing to > physically remove it from the system. A true hot spare allows > replacement to start automatically right away if a failure is > detected. But is having a RAIDZ2 drop to single redundancy, with replacement starting instantly, actually as good or better than having a RAIDZ3 drop to double redundancy, with actual replacement happening later? The "degraded" state of the RAIDZ3 has the same redundancy as the "healthy" state of the RAIDZ2. Certainly having a spare drive bay to play with is often helpful; though the scenarios that most immediately spring to mind are all mirror-related and hence don't apply here. > With only 8-drives, the reliability improvement from raidz3 is > unlikely to be borne out in practice. Other potential failures modes > will completely drown out the on-paper reliability improvement > provided by raidz3. I wouldn't give up much of anything to add Z3 on 8 drives, no. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] one more time: pool size changes
On Thu, June 3, 2010 13:04, Garrett D'Amore wrote: > On Thu, 2010-06-03 at 11:49 -0500, David Dyer-Bennet wrote: >> hot spares in place, but I have the bays reserved for that use. >> >> In the latest upgrade, I added 4 2.5" hot-swap bays (which got the >> system >> disks out of the 3.5" hot-swap bays). I have two free, and that's the >> form-factor SSDs come in these days, so if I thought it would help I >> could >> add an SSD there. Have to do quite a bit of research to see which uses >> would actually benefit me, and how much. It's not obvious that either >> l2arc or zil on SSD would help my program loading, image file loading, >> or >> image file saving cases that much. There may be more other stuff than I >> really think of though. > > It really depends on the working sets these programs deal with. > > zil is useful primarily when doing lots of writes, especially lots of > writes to small files or to data scattered throughout a file. I view it > as a great solution for database acceleration, and for accelerating the > filesystems I use for hosting compilation workspaces. (In retrospect, > since by definition the results of compilation are reproducible, maybe I > should just turn off synchronous writes for build workspaces... provided > that they do not contain any modifications to the sources themselves. > I'm going to have to play with this.) I suspect there are more cases here than I immediately think of. For example, sitting here thinking, I wonder if the web cache would benefit a lot? And all those email files? RAW files from my camera are 12-15MB, and the resulting Photoshop files are around 50MB (depending on compression, and they get bigger fast if I add layers). Those aren't small, and I don't read the same thing over and over lots. For build spaces, definitely should be reproducible from source. A classic production build starts with checking out a tagged version from source control, and builds from there. 
> l2arc is useful for data that is read back frequently but is too large > to fit in buffer cache. I can imagine that it would be useful for > hosting storage associated with lots of programs that are called > frequently. You can think of it as a logical extension of the buffer > cache in this regard... if your working set doesn't fit in RAM, then > l2arc can prevent going back to rotating media. I don't think I'm going to benefit much from this. > All other things being equal, I'd increase RAM before I'd worry too much > about l2arc. The exception to that would be if I knew I had working > sets that couldn't possibly fit in RAM... 160GB of SSD is a *lot* > cheaper than 160GB of RAM. :-) I just did increase RAM, same upgrade as the 2.5" bays and the additional controller and the third mirrored vdev. I increased it all the way to 4GB! And I can't increase it further feasibly (4GB sticks of ECC RAM being hard to find and extremely pricey; plus I'd have to displace some of my existing memory). Since this is a 2006 system, in another couple of years it'll be time to replace MB and processor and memory, and I'm sure it'll have a lot more memory next time. I'm desperately waiting for Solaris 2010.$Q2 ("Q2" since it was pointed out last time that "Spring" was wrong on half the Earth), since I hope it will resolve my backup problems so I can get incremental backups happening nightly (intention is to use zfs send/receive with incremental replication streams, to keep external drives up-to-date with data and all snapshots). The age of the system and especially the drives makes this more urgent, though of course it's important in general. I do manage a full backup that completes now and then, anyway, and they'll complete overnight if they don't hang. Problem is, if they hang, I have to reboot the Solaris box and every Windows box using it. 
-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] one more time: pool size changes
On Thu, June 3, 2010 10:50, Garrett D'Amore wrote: > On Thu, 2010-06-03 at 10:35 -0500, David Dyer-Bennet wrote: >> On Thu, June 3, 2010 10:15, Garrett D'Amore wrote: >> > Using a stripe of mirrors (RAID0) you can get the benefits of multiple >> > spindle performance, easy expansion support (just add new mirrors to >> the >> > end of the raid0 stripe), and 100% data redundancy. If you can >> afford >> > to pay double for your storage (the cost of mirroring), this is IMO >> the >> > best solution. >> >> Referencing "RAID0" here in the context of ZFS is confusing, though. >> Are >> you suggesting using underlying RAID hardware to create virtual volumes >> to >> then present to ZFS, or what? > > RAID0 is basically the default configuration of a ZFS pool -- its a > concatenation of the underlying vdevs. In this case the vdevs should > themselves be two-drive mirrors. > > This of course has to be done in the ZFS layer, and ZFS doesn't call it > RAID0, any more than it calls a mirror RAID1, but effectively that's > what they are. Kinda mostly, anyway. I thought we recently had this discussion, and people were pointing out things like the striping wasn't physically the same on each drive and such. >> > Note that this solution is not quite as resilient against hardware >> > failure as raidz2 or raidz3. While the RAID1+0 solution can tolerate >> > multiple drive failures, if both both drives in a mirror fail, you >> lose >> > data. >> >> In a RAIDZ solution, two or more drive failures lose your data. In a >> mirrored solution, losing the WRONG two drives will still lose your >> data, >> but you have some chance of surviving losing a random two drives. So I >> would describe the mirror solution as more resilient. >> >> So going to RAIDZ2 or even RAIDZ3 would be better, I agree. > > From a data resiliency point, yes, raidz2 or raidz3 offers better > protection. At a significant performance cost. 
The place I care about performance is almost entirely sequential read/write -- loading programs, and loading and saving large image files. I don't know a lot of home users that actually need high IOPS. > Given enough drives, one could probably imagine using raidz3 underlying > vdevs, with RAID0 striping to spread I/O across multiple spindles. I'm > not sure how well this would perform, but I suspect it would perform > better than straight raidz2/raidz3, but at a significant expense (you'd > need a lot of drives). Might well work that way; it does sound about right. >> In an 8-bay chassis, there are other concerns, too. Do I keep space >> open >> for a hot spare? There's no real point in a hot spare if you have only >> one vdev; that is, 8-drive RAIDZ3 is clearly better than 7-drive RAIDZ2 >> plus a hot spare. And putting everything into one vdev means that for >> any >> upgrade I have to replace all 8 drives at once, a financial problem for >> a >> home server. > > This is one of the reasons I don't advocate using raidz (any version) > for home use, unless you can't afford the cost in space represented by > mirroring and a hot spare or two. (The other reason ... for my use at > least... is the performance cost. I want to use my array to host > compilation workspaces, and for that I would prefer to get the most > performance out of my solution. I suppose I could add some SSDs... but > I still think multiple spindles are a good option when you can do it.) > > In an 8 drive chassis, without any SSDs involved,I'd configure 6 of the > drives as a 3 vdev stripe consisting of mirrors of 2 drives, and I'd > leave the remaining two bays as hot spares. Btw, using the hot spares > in this way potentially means you can use those bays later to upgrade to > larger drives in the future, without offlining anything and without > taking too much of a performance penalty when you do so. And the three 2-way mirrors is exactly where I am right now. 
I don't have hot spares in place, but I have the bays reserved for that use. In the latest upgrade, I added 4 2.5" hot-swap bays (which got the system disks out of the 3.5" hot-swap bays). I have two free, and that's the form-factor SSDs come in these days, so if I thought it would help I could add an SSD there. Have to do quite a bit of research to see which uses would actually benefit me, and how much. It's not obvious that either l2arc or zil on SSD would help my program loading, image file loading, or image file saving cases that much. There may be more other stuff than I really think of though. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] one more time: pool size changes
On Thu, June 3, 2010 10:50, Marty Scholes wrote: > David Dyer-Bennet wrote: >> My choice of mirrors rather than RAIDZ is based on >> the fact that I have >> only 8 hot-swap bays (I still think of this as LARGE >> for a home server; >> the competition, things like the Drobo, tends to have >> 4 or 5), that I >> don't need really large amounts of storage (after my >> latest upgrade I'm >> running with 1.2TB of available data space), and that >> I expected to need >> to expand storage over the life of the system. With >> mirror vdevs, I can >> expand them without compromising redundancy even >> temporarily, by attaching >> the new drives before I detach the old drives; I >> couldn't do that with >> RAIDZ. Also, the fact that disk is now so cheap >> means that 100% >> redundancy is affordable, I don't have to compromise >> on RAIDZ. > > Maybe I have been unlucky too many times doing storage admin in the 90s, > but simple mirroring still scares me. Even with a hot spare (you do have > one, right?) the rebuild window leaves the entire pool exposed to a single > failure. No hot spare currently. And now running on 4-year-old disks, too. For me, mirroring is a big step UP from bare single drives. That's my "default state". Of course, I'm a big fan of multiple levels of backup. > One of the nice things about zfs is that allows, "to each his own." My > home server's main pool is 22x 73GB disks in a Sun A5000 configured as > RAIDZ3. Even without a hot spare, it takes several failures to get the > pool into trouble. Yes, it's very flexible, and while there are no doubt useless degenerate cases here and there, lots of the cases are useful for some environment or other. That does seem like rather an extreme configuration. > At the same time, there are several downsides to a wide stripe like that, > including relatively poor iops and longer rebuild windows. As noted > above, until bp_rewrite arrives, I cannot change the geometry of a vdev, > which kind of limits the flexibility. 
There are a LOT of reasons to want bp_rewrite, certainly. > As a side rant, I still find myself baffled that Oracle/Sun correctly > touts the benefits of zfs in the enterprise, including tremendous > flexibility and simplicity of filesystem provisioning and nondisruptive > changes to filesystems via properties. > > These forums are filled with people stating that the enterprise demands > simple, flexibile and nondisruptive filesystem changes, but no enterprise > cares about simple, flexibile and nondisruptive pool/vdev changes, e.g. > changing a vdev geometry or evacuating a vdev. I can't accept that zfs > flexibility is critical and zpool flexibility is unwanted. We could certainly use that level of pool-equivalent flexibility at work; we don't currently have it (not ZFS, not high-end enterprise storage units). -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] one more time: pool size changes
On Thu, June 3, 2010 10:15, Garrett D'Amore wrote: > Using a stripe of mirrors (RAID0) you can get the benefits of multiple > spindle performance, easy expansion support (just add new mirrors to the > end of the raid0 stripe), and 100% data redundancy. If you can afford > to pay double for your storage (the cost of mirroring), this is IMO the > best solution. Referencing "RAID0" here in the context of ZFS is confusing, though. Are you suggesting using underlying RAID hardware to create virtual volumes to then present to ZFS, or what? > Note that this solution is not quite as resilient against hardware > failure as raidz2 or raidz3. While the RAID1+0 solution can tolerate > multiple drive failures, if both both drives in a mirror fail, you lose > data. In a RAIDZ solution, two or more drive failures lose your data. In a mirrored solution, losing the WRONG two drives will still lose your data, but you have some chance of surviving losing a random two drives. So I would describe the mirror solution as more resilient. So going to RAIDZ2 or even RAIDZ3 would be better, I agree. In an 8-bay chassis, there are other concerns, too. Do I keep space open for a hot spare? There's no real point in a hot spare if you have only one vdev; that is, 8-drive RAIDZ3 is clearly better than 7-drive RAIDZ2 plus a hot spare. And putting everything into one vdev means that for any upgrade I have to replace all 8 drives at once, a financial problem for a home server. > If you're clever, you'll also try to make sure each side of the mirror > is on a different controller, and if you have enough controllers > available, you'll also try to balance the controllers across stripes. I did manage to split the mirrors across controllers (I have 6 SATA on the motherboard and I added an 8-port SAS card with SAS-SATA cabling). > One way to help with that is to leave a drive or two available as a hot > spare. 
> > Btw, the above recommendation mirrors what Jeff Bonwick himself (the > creator of ZFS) has advised on his blog. I believe that article directly influenced my choice, in fact. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] one more time: pool size changes
On Wed, June 2, 2010 17:54, Roman Naumenko wrote: > Recently I talked to a co-worker who manages NetApp storages. We discussed > size changes for pools in zfs and aggregates in NetApp. > > And some time before I had suggested to a my buddy zfs for his new home > storage server, but he turned it down since there is no expansion > available for a pool. I set up my home fileserver with ZFS (in 2006) BECAUSE zfs could expand the pool for me, and nothing else I had access to could do that (home fileserver, little budget). My server is currently running with one data pool, three vdevs. Each of the data vdevs is a two-way mirror. I started with one, expanded to two, then expanded to three. Rather than expanding to four when this fills up, I'm going to attach a larger drive to the first mirror vdev, and then a second one, and then remove the two current drives, thus expanding the vdev without ever compromising the redundancy. My choice of mirrors rather than RAIDZ is based on the fact that I have only 8 hot-swap bays (I still think of this as LARGE for a home server; the competition, things like the Drobo, tends to have 4 or 5), that I don't need really large amounts of storage (after my latest upgrade I'm running with 1.2TB of available data space), and that I expected to need to expand storage over the life of the system. With mirror vdevs, I can expand them without compromising redundancy even temporarily, by attaching the new drives before I detach the old drives; I couldn't do that with RAIDZ. Also, the fact that disk is now so cheap means that 100% redundancy is affordable, I don't have to compromise on RAIDZ. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Can I change copies to 2 *after* I have copied a bunch of files?
On Fri, May 28, 2010 11:04, Thanassis Tsiodras wrote: > I've read on the web that copies=2 affects only the files copied *after* I > have changed the setting That is correct. Rewriting existing datasets in place (bp_rewrite) is a feature desired for future versions; it would make a LOT of things work, including shrinking pools and applying compression or extra redundancy to data written earlier. Nobody has promised a date for it that I recall. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
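A short sketch of the behavior being confirmed, with a hypothetical filesystem name: the property applies only to blocks written after it is set, so existing files pick up the extra copy only when they are rewritten.

```shell
# copies=2 affects only blocks written AFTER the property is set.
zfs set copies=2 tank/photos

# Files that already existed still have a single copy; until rewriting
# in place (bp_rewrite) exists, the only way to get the extra copy is
# to rewrite the file yourself, e.g.:
cp /tank/photos/old.jpg /tank/photos/old.jpg.tmp
mv /tank/photos/old.jpg.tmp /tank/photos/old.jpg
```

(Keep snapshots in mind: blocks referenced by older snapshots keep their original copies setting regardless.)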
Re: [zfs-discuss] send/recv over ssh
On Fri, May 21, 2010 12:59, Brandon High wrote: > On Fri, May 21, 2010 at 7:12 AM, David Dyer-Bennet wrote: >> >> On Thu, May 20, 2010 19:44, Freddie Cash wrote: >>> And you can always patch OpenSSH with HPN, thus enabling the NONE >>> cipher, >>> which disable encryption for the data transfer (authentication is >>> always >>> encrypted). And twiddle the internal buffers that OpenSSH uses to >>> improve >>> transfer rates, especially on 100 Mbps or faster links. >> >> Ah! I've been wanting that for YEARS. Very glad to hear somebody has >> done it. > > ssh-1 has had the 'none' cipher from day one, though it looks like > openssh has removed it at some point. Fixing the buffers seems to be a > nice tweak though. I thought I remembered a "none" cipher, but couldn't find it the other year and decided I must have been wrong. I did use ssh-1, so maybe I really WAS remembering after all. >> With the common use of SSH for for moving bulk data (under rsync as >> well), >> this is a really useful idea. Of course one should think about where >> one > > I think there's a certain assumption that using ssh = safe, and by > enabling a none cipher you break that assumption. All of us know > better, but less experienced admins may not. Seems a high price to pay to try to protect idiots from being idiots. Anybody who doesn't understand that "encryption = none" means it's not encrypted and hence not safe isn't safe as an admin anyway. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?
On Fri, May 21, 2010 10:19, Bob Friesenhahn wrote: > On Fri, 21 May 2010, Miika Vesti wrote: > >> AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile >> NAND >> grid. Whether it respects or ignores the cache flush seems irrelevant. >> >> There has been previous discussion about this: >> http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702 >> >> "I'm pretty sure that all SandForce-based SSDs don't use DRAM as their >> cache, but take a hunk of flash to use as scratch space instead. Which >> means that they'll be OK for ZIL use." >> >> So, OCZ Vertex 2 seems to be a good choice for ZIL. > > There seem to be quite a lot of blind assumptions in the above. The > only good choice for ZIL is when you know for a certainty and not > assumptions based on 3rd party articles and blog postings. Otherwise > it is like assuming that if you jump through an open window that there > will be firemen down below to catch you. Just how DOES one know something for a certainty, anyway? I've seen LOTS of people mess up performance testing in ways that gave them very wrong answers; relying solely on your own testing is as foolish as relying on a couple of random blog posts. To be comfortable (I don't ask for "know for a certainty"; I'm not sure that exists outside of "faith"), I want a claim by the manufacturer and multiple outside tests in "significant" journals -- which could be the blog of somebody I trusted, as well as actual magazines and such. Ideally, certainly if it's important, I'd then verify the tests myself. There aren't enough hours in the day, so I often get by with less. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] send/recv over ssh
On Thu, May 20, 2010 19:44, Freddie Cash wrote: > And you can always patch OpenSSH with HPN, thus enabling the NONE > cipher, > which disables encryption for the data transfer (authentication is always > encrypted). And twiddle the internal buffers that OpenSSH uses to improve > transfer rates, especially on 100 Mbps or faster links. Ah! I've been wanting that for YEARS. Very glad to hear somebody has done it. With the common use of SSH for moving bulk data (under rsync as well), this is a really useful idea. Of course one should think about where one is moving one's data unencrypted; but the precise cases where the performance hit of encryption will show are the safe ones, such as between my desktop and server which are plugged into the same switch; no data would leave that small LAN segment. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Performance drop during scrub?
On Mon, May 3, 2010 17:02, Richard Elling wrote: > On May 3, 2010, at 2:38 PM, David Dyer-Bennet wrote: >> On Sun, May 2, 2010 14:12, Richard Elling wrote: >>> On May 1, 2010, at 1:56 PM, Bob Friesenhahn wrote: >>>> On Fri, 30 Apr 2010, Freddie Cash wrote: >>>>> Without a periodic scrub that touches every single bit of data in the >>>>> pool, how can you be sure >>>>> that 10-year files that haven't been opened in 5 years are still >>>>> intact? >>>> >>>> You don't. But it seems that having two or three extra copies of the >>>> data on different disks should instill considerable confidence. With >>>> sufficient redundancy, chances are that the computer will explode >>>> before >>>> it loses data due to media corruption. The calculated time before >>>> data >>>> loss becomes longer than even the pyramids in Egypt could withstand. >>> >>> These calculations are based on fixed MTBF. But disk MTBF decreases >>> with >>> age. Most disks are only rated at 3-5 years of expected lifetime. >>> Hence, >>> archivists >>> use solutions with longer lifetimes (high quality tape = 30 years) and >>> plans for >>> migrating the data to newer media before the expected media lifetime is >>> reached. >>> In short, if you don't expect to read your 5-year lifetime rated disk >>> for >>> another 5 years, >>> then your solution is uhmm... shall we say... in need of improvement. >> >> Are they giving tape that long an estimated life these days? They >> certainly weren't last time I looked. > > Yes. > http://www.oracle.com/us/products/servers-storage/storage/tape-storage/036556.pdf > http://www.sunstarco.com/PDF%20Files/Quantum%20LTO3.pdf Yep, they say 30 years. That's probably in the same "years" where the MAM gold archival DVDs are good for 200, I imagine. (i.e. based on accelerated testing, with the lab knowing what answer the client wants). Although we may know more about tape aging, the accelerated tests may be more valid for tapes? 
But LTO-3 is a 400GB tape that costs, hmmm, maybe $40 each (maybe less with better shopping, that's a quick Amazon price rounded down). (I don't factor in compression in my own analysis because my data is overwhelmingly image files and MP3 files, which don't compress further very well.) Plus a $1000 drive, or $2000 for a 3-tape changer (and that's barely big enough to back up my small server without manual intervention, might not be by the end of the year). Tape is a LOT more expensive than my current hard-drive based backup scheme, even if I use the backup drives only three years (and since they spin less than 10% of the time, they should last pretty well). Also, I lose my snapshots in a tape backup, whereas I keep them on my hard drive backups. (Or else I'm storing a ZFS send stream on tape and hoping it will actually restore.) >> And I basically don't trust tape; too many bad experiences (ever since I >> moved off of DECTape, I've been having bad experiences with tape). The >> drives are terribly expensive and I can't afford redundancy, and in >> thirty >> years I very probably could not buy a new drive for my old tapes. >> >> I started out a big fan of tape, but the economics have been very much >> against it in the range I'm working (small; 1.2 terabytes usable on my >> server currently). >> >> I don't expect I'll keep my hard disks for 30 years; I expect I'll >> upgrade >> them periodically, probably even within their MTBF. (Although note >> that, >> though tests haven't been run, the MTBF of a 5-year disk after 4 years >> is >> nearly certainly greater than 1 year.) > > Yes, but MTBF != expected lifetime. MTBF is defined as Mean Time Between > Failures (a rate), not Time Until Death (a lifetime). If your MTBF was 1 > year, > then the probability of failing within 1 year would be approximately 63%, > assuming an exponential distribution. Yeah, sorry, I stumbled into using the same wrong figures lots of people were. 
-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
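[Editor's note: the "approximately 63%" figure quoted above falls directly out of the exponential failure model; a quick check with plain awk, no ZFS assumptions involved.]

```shell
# For an exponential failure distribution, P(fail by time t) = 1 - exp(-t/MTBF).
# With MTBF = 1 year and t = 1 year this is 1 - 1/e, roughly 63%.
awk 'BEGIN { mtbf = 1.0; t = 1.0; printf "%.3f\n", 1 - exp(-t / mtbf) }'
# -> 0.632
```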
Re: [zfs-discuss] Performance drop during scrub?
On Sun, May 2, 2010 14:12, Richard Elling wrote: > On May 1, 2010, at 1:56 PM, Bob Friesenhahn wrote: >> On Fri, 30 Apr 2010, Freddie Cash wrote: >>> Without a periodic scrub that touches every single bit of data in the >>> pool, how can you be sure >>> that 10-year files that haven't been opened in 5 years are still >>> intact? >> >> You don't. But it seems that having two or three extra copies of the >> data on different disks should instill considerable confidence. With >> sufficient redundancy, chances are that the computer will explode before >> it loses data due to media corruption. The calculated time before data >> loss becomes longer than even the pyramids in Egypt could withstand. > > These calculations are based on fixed MTBF. But disk MTBF decreases with > age. Most disks are only rated at 3-5 years of expected lifetime. Hence, > archivists > use solutions with longer lifetimes (high quality tape = 30 years) and > plans for > migrating the data to newer media before the expected media lifetime is > reached. > In short, if you don't expect to read your 5-year lifetime rated disk for > another 5 years, > then your solution is uhmm... shall we say... in need of improvement. Are they giving tape that long an estimated life these days? They certainly weren't last time I looked. And I basically don't trust tape; too many bad experiences (ever since I moved off of DECTape, I've been having bad experiences with tape). The drives are terribly expensive and I can't afford redundancy, and in thirty years I very probably could not buy a new drive for my old tapes. I started out a big fan of tape, but the economics have been very much against it in the range I'm working (small; 1.2 terabytes usable on my server currently). I don't expect I'll keep my hard disks for 30 years; I expect I'll upgrade them periodically, probably even within their MTBF. 
(Although note that, though tests haven't been run, the MTBF of a 5-year disk after 4 years is nearly certainly greater than 1 year.) -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Performance drop during scrub?
On Fri, April 30, 2010 13:44, Freddie Cash wrote: > On Fri, Apr 30, 2010 at 11:35 AM, Bob Friesenhahn < > bfrie...@simple.dallas.tx.us> wrote: > >> On Thu, 29 Apr 2010, Tonmaus wrote: >> >> Recommending to not using scrub doesn't even qualify as a workaround, >> in >>> my regard. >>> >> >> As a devoted believer in the power of scrub, I believe that after the >> OS, >> power supplies, and controller have been verified to function with a >> good >> scrubbing, if there is more than one level of redundancy, scrubs are not >> really warranted. With just one level of redundancy it becomes much >> more >> important to verify that both copies were written to disk correctly. >> > Without a periodic scrub that touches every single bit of data in the > pool, > how can you be sure that 10-year files that haven't been opened in 5 years > are still intact? > > Self-healing only comes into play when the file is read. If you don't > read > a file for years, how can you be sure that all copies of that file haven't > succumbed to bit-rot? Yes, that's precisely my point. That's why it's especially relevant to archival data -- it's important (to me), but not frequently accessed. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Performance drop during scrub?
On Thu, April 29, 2010 17:35, Bob Friesenhahn wrote: > In my opinion periodic scrubs are most useful for pools based on > mirrors, or raidz1, and much less useful for pools based on raidz2 or > raidz3. It is useful to run a scrub at least once on a well-populated > new pool in order to validate the hardware and OS, but otherwise, the > scrub is most useful for discovering bit-rot in singly-redundant > pools. I've got 10 years of photos on my disk now, and it's growing at faster than one year per year (since I'm scanning backwards slowly through the negatives). Many of them don't get accessed very often; they're archival, not current use. Scrub was one of the primary reasons I chose ZFS for the fileserver they live on -- I want some assurance, 20 years from now, that they're still valid. I needed something to check them periodically, and something to check *against*, and block checksums and scrub seemed to fill the bill. So, yes, I want to catch bit rot -- on a pool of mirrored VDEVs. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
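[Editor's note: for archival pools like the one described above, the periodic check is simply a scheduled scrub. A minimal sketch; the pool name `tank` and the schedule are placeholders.]

```shell
# Start a scrub by hand, then check on it; progress and any checksum
# errors found appear in the status output.
zpool scrub tank
zpool status -v tank

# A cron entry makes it periodic, e.g. monthly at 2am on the 1st:
# 0 2 1 * * /usr/sbin/zpool scrub tank
```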
Re: [zfs-discuss] Performance drop during scrub?
On Wed, April 28, 2010 10:16, Eric D. Mudama wrote: > On Wed, Apr 28 at 1:34, Tonmaus wrote: >>> Zfs scrub needs to access all written data on all >>> disks and is usually >>> disk-seek or disk I/O bound so it is difficult to >>> keep it from hogging >>> the disk resources. A pool based on mirror devices >>> will behave much >>> more nicely while being scrubbed than one based on >>> RAIDz2. >> >> Experience seconded entirely. I'd like to repeat that I think we >> need more efficient load balancing functions in order to keep >> housekeeping payload manageable. Detrimental side effects of scrub >> should not be a decision point for choosing certain hardware or >> redundancy concepts in my opinion. > > While there may be some possible optimizations, i'm sure everyone > would love the random performance of mirror vdevs, combined with the > redundancy of raidz3 and the space of a raidz1. However, as in all > systems, there are tradeoffs. The situations being mentioned are much worse than what seem reasonable tradeoffs to me. Maybe that's because my intuition is misleading me about what's available. But if the normal workload of a system uses 25% of its sustained IOPS, and a scrub is run at "low priority", I'd like to think that during a scrub I'd see a little degradation in performance, and that the scrub would take 25% or so longer than it would on an idle system. There's presumably some inefficiency, so the two loads don't just add perfectly; so maybe another 5% lost to that? That's the big uncertainty. I have a hard time believing in 20% lost to that. Do you think that's a reasonable outcome to hope for? Do you think ZFS is close to meeting it? People with systems that live at 75% all day are obviously going to have more problems than people who live at 25%! 
-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
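[Editor's note: assuming the scrub really is confined to the idle headroom and the two loads add perfectly, the back-of-envelope arithmetic from the message above works out as follows; the 25% load figure is the one used in that message.]

```shell
# If the steady workload uses 25% of the pool's IOPS and a low-priority
# scrub gets the remaining 75%, the scrub takes 1/0.75 of its
# idle-system duration -- about 33% longer, in the same ballpark as
# (slightly above) the 25% hoped for above.
awk 'BEGIN { load = 0.25; printf "%.2f\n", 1 / (1 - load) }'
# -> 1.33
```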
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
On Tue, April 27, 2010 11:17, Bob Friesenhahn wrote: > On Tue, 27 Apr 2010, David Dyer-Bennet wrote: >> >> I don't think I understand your scenario here. The docs online at >> <http://docs.sun.com/app/docs/doc/819-5461/gazgd?a=view> describe uses >> of >> zpool replace that DO run the array degraded for a while, and don't seem >> to mention any other. >> >> Could you be more detailed? > > If a disk has failed, then it makes sense to physically remove the old > disk, insert a new one, and do 'zpool replace tank c1t1d0'. However > if the disk has not failed, then you can install a new disk in another > location and use the two argument form of replace like 'zpool replace > tank c1t1d0 c1t1d7'. If I understand things correctly, this allows > you to replace one good disk with another without risking the data in > your pool. I don't see any reason to think the old device remains in use until the new device is resilvered; if it doesn't, then you're down one level of redundancy the instant the old device goes out of service. I don't have a RAIDZ group to test with, but if this were tried while there's significant load on the group, it should be easy to see whether there's traffic on the old drive after the resilver starts. If there is, that would be evidence that ZFS continues using the old drive while resilvering to the new one, which would be good. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
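[Editor's note: the experiment suggested in the message above would look roughly like this. Untested sketch; the pool and device names are placeholders from Bob's example.]

```shell
# Two-argument replace: resilver onto c1t1d7 while c1t1d0 is still
# attached. Watching per-device traffic during the resilver shows
# whether the outgoing disk is still being read -- i.e. whether
# redundancy is preserved during the swap.
zpool replace tank c1t1d0 c1t1d7
zpool iostat -v tank 5
```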
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
On Tue, April 27, 2010 10:38, Bob Friesenhahn wrote: > On Tue, 27 Apr 2010, David Dyer-Bennet wrote: >> >> Hey, you know what might be helpful? Being able to add redundancy to a >> raid vdev. Being able to go from RAIDZ2 to RAIDZ3 by adding another >> drive >> of suitable size. Also being able to go the other way. This lets you >> do >> the trick of temporarily adding redundancy to a vdev while swapping out >> devices one at a time to eventually upgrade the size (since you're >> deliberately creating a fault situation, increasing redundancy before >> you >> do it makes loads of sense!). > > You can already replace one drive with another (zpool replace) so as > long as there is space for the new drive, it is not necessary to > degrade the array and lose redundancy while replacing a device. As > long as you can physically add a drive to the system (even > temporarily) it is not necessary to deliberately create a fault > situation. I don't think I understand your scenario here. The docs online at <http://docs.sun.com/app/docs/doc/819-5461/gazgd?a=view> describe uses of zpool replace that DO run the array degraded for a while, and don't seem to mention any other. Could you be more detailed? -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?
On Mon, April 26, 2010 17:21, Edward Ned Harvey wrote: > Also, if you've got all those disks in an array, and their MTBF is ... > let's say 25,000 hours ... then 3 yrs later when they begin to fail, they > have a tendency to all fail around the same time, which increases the > probability of exceeding your designed level of redundancy. It's useful to consider this when doing mid-life upgrades. Unfortunately there's not too much useful to be done right now with RAID setups. With mirrors, when adding some disks mid-life (seems like a common though by no means universal scenario to not fully populate the chassis at first, and add more 1/3 to 1/2 way through the projected life), with some extra trouble one can attach a new disk as an n+1st disk in an existing mirror, wait for the resilver, and detach an old disk. That mirror is now one new disk and one old disk, rather than two disks of the same age. Then build a new mirror out of the freed disk plus another new disk. Now you've got both mirrors consisting of disks of different ages, less prone to failing at the same time. (Of course this doesn't work when you're using bigger drives for the mid-life kicker, and most of the time it would make sense to do so.) Even buying different (mixed) brands initially doesn't help against aging; only against batch or design problems. Hey, you know what might be helpful? Being able to add redundancy to a raid vdev. Being able to go from RAIDZ2 to RAIDZ3 by adding another drive of suitable size. Also being able to go the other way. This lets you do the trick of temporarily adding redundancy to a vdev while swapping out devices one at a time to eventually upgrade the size (since you're deliberately creating a fault situation, increasing redundancy before you do it makes loads of sense!). > I recently bought 2x 1Tb disks for my sun server, for $650 each. This was > enough to make me do the analysis, "why am I buying sun branded overpriced > disks?" 
Here is the abridged version: No argument that, in the existing market, with various levels of need, this is often the right choice. I find it deeply frustrating and annoying that this dilemma exists entirely due to bad behavior by the disk companies, though. First they sell deliberately-defective drives (lie about cache flush, for example) and then they (in conspiracy with an accomplice company) charge us many times the cost of the physical hardware for fixed versions. This MUST be stopped. This is EXACTLY what standards exist for -- so we can buy known-quantity products in a competitive market. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
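[Editor's note: the mid-life mirror rotation described in the message above, sketched as commands. All device names are placeholders, and this assumes a pool of mirrored pairs; untested.]

```shell
# Existing mirror: c0t0d0 + c0t1d0, both old. Grow it to a 3-way
# mirror with a new disk, let the resilver finish, then detach one old
# disk and pair it with a second new disk. Each mirror ends up with
# one old and one new drive, so the pair is unlikely to age out together.
zpool attach tank c0t1d0 c0t5d0      # 3-way mirror while resilvering
zpool status tank                    # wait until resilver completes
zpool detach tank c0t1d0             # drop one old disk
zpool add tank mirror c0t1d0 c0t6d0  # new vdev: freed old disk + new disk
```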
Re: [zfs-discuss] Which build is the most stable, mainly for NAS (zfs)?
On 14-Apr-10 22:44, Ian Collins wrote: > On 04/15/10 06:16 AM, David Dyer-Bennet wrote: >> Because 132 was the most current last time I paid much attention :-). As >> I say, I'm currently holding out for 2010.$Spring, but knowing how to get >> to a particular build via package would be potentially interesting for >> the future still. > I hope it's 2010.$Autumn, I don't fancy waiting until October. Hint: the > southern hemisphere does exist! I've even been there. But the month/season relationship is too deeply built into too many things I follow (like the Christmas books come out of the publisher's fall list; for that matter, like that Christmas is in the winter) to go away at all easily. California doesn't have seasons anyway. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Which build is the most stable, mainly for NAS (zfs)?
On Wed, April 14, 2010 15:28, Miles Nordin wrote: >>>>>> "dd" == David Dyer-Bennet writes: > > dd> Is it possible to switch to b132 now, for example? > > yeah, this is not so bad. I know of two approaches: Thanks, I've filed and flagged this for reference. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Suggestions about current ZFS setup
On Wed, April 14, 2010 12:29, Bob Friesenhahn wrote: > On Wed, 14 Apr 2010, David Dyer-Bennet wrote: >>>> >>>> Not necessarily for a home server. While mine so far is all mirrored >>>> pairs of 400GB disks, I don't even think about "performance" issues, I >>>> never come anywhere near the limits of the hardware. >>> >>> I don't see how the location of the server has any bearing on required >>> performance. If these 2TB drives are the new 4K sector variety, even >>> you might notice. >> >> The location does not, directly, of course; but the amount and type of >> work being supported does, and most home servers see request streams >> very >> different from commercial servers. > > If it was not clear, the performance concern is primarily for writes > since zfs will load-share the writes across the available vdevs using > an algorithm which also considers the write queue/backlog for each > vdev. If a vdev is slow, then it may be filled more slowly than the > other vdevs. This is also the reason why zfs encourages that all > vdevs use the same organization. As I said, I don't think of performance issues on mine. So I wasn't thinking of that particular detail, and it's good to call it out explicitly. If the performance of the new drives isn't adequate, then the performance of the entire pool will become inadequate, it looks like. I expect it's routine to have disks of different generations in the same pool at this point (and if it isn't now, it will be in 5 years), just due to what's available, replacing bad drives, and so forth. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
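[Editor's note: one way to observe the write load-sharing Bob describes above is per-vdev iostat; if a slow vdev is dragging the pool, its write column lags the others. The pool name is a placeholder.]

```shell
# Per-vdev bandwidth and ops sampled every 10 seconds; persistently
# uneven write rates across same-organization vdevs hint at a slow device.
zpool iostat -v tank 10
```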
Re: [zfs-discuss] Which build is the most stable, mainly for NAS (zfs)?
On Wed, April 14, 2010 11:51, Tonmaus wrote: >> >> On Wed, April 14, 2010 08:52, Tonmaus wrote: >> > safe to say: 2009.06 (b111) is unusable for the >> purpose, and CIFS is dead >> > in this build. >> >> That's strange; I run it every day (my home Windows >> "My Documents" folder >> and all my photos are on 2009.06). >> >> >> -bash-3.2$ cat /etc/release >> OpenSolaris 2009.06 snv_111b >> X86 >> Copyright 2009 Sun Microsystems, Inc. All >> Rights Reserved. >> Use is subject to license >> terms. >> Assembled 07 May 2009 > > > I would be really interested how you got past this > http://defect.opensolaris.org/bz/show_bug.cgi?id=11371 > which I was so badly bitten by that I considered giving up on OpenSolaris. I don't get random hangs in normal use; so I haven't done anything to "get past" this. I DO get hangs when funny stuff goes on, which may well be related to that problem (at least they require a reboot). Hmmm; I get hangs sometimes when trying to send a full replication stream to an external backup drive, and I have to reboot to recover from them. I can live with this, in the short term. But now I'm feeling hopeful that they're fixed in what I'm likely to be upgrading to next. >> not sure if this is best choice. I'd like to >> hear from others as well. >> Well, it's technically not a stable build. >> >> I'm holding off to see what 2010.$Spring ends up >> being; I'll convert to >> that unless it turns into a disaster. >> >> Is it possible to switch to b132 now, for example? I >> don't think the old >> builds are available after the next one comes out; I >> haven't been able to >> find them. > There are methods to upgrade to any dev build by pkg. Can't tell you from > the top of my head, but I have done it with success. > > I wouldn't know why to go to 132 instead of 133, though. 129 seems to be > an option. Because 132 was the most current last time I paid much attention :-). 
As I say, I'm currently holding out for 2010.$Spring, but knowing how to get to a particular build via package would be potentially interesting for the future still. Having been told it's possible helps, makes it worth looking harder. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Suggestions about current ZFS setup
On Wed, April 14, 2010 12:06, Bob Friesenhahn wrote: > On Wed, 14 Apr 2010, David Dyer-Bennet wrote: >>> It should be "safe" but chances are that your new 2TB disks are >>> considerably slower than the 1TB disks you already have. This should >>> be as much cause for concern (or more so) than the difference in raidz >>> topology. >> >> Not necessarily for a home server. While mine so far is all mirrored >> pairs of 400GB disks, I don't even think about "performance" issues, I >> never come anywhere near the limits of the hardware. > > I don't see how the location of the server has any bearing on required > performance. If these 2TB drives are the new 4K sector variety, even > you might notice. The location does not, directly, of course; but the amount and type of work being supported does, and most home servers see request streams very different from commercial servers. The last server software I worked on was able to support 80,000 simultaneous HD video streams. Coming off Thumpers, in fact (well, coming out of a truly obscene amount of DRAM buffer on the streaming board, which was in turn loaded from Thumpers); this was the thing that Thumper was originally designed for, known when I worked there as the Sun Streaming System I believe. You don't see loads like that on home servers :-). And a big database server would have an equally extreme but totally different access pattern. -- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss