Re: [zfs-discuss] Slow write speed to ZFS pool (via NFS)
> So that leaves us with a Samba vs. NFS issue (not related to ZFS). We
> know that NFS is able to create files at a rate of _at most_ one file
> per server I/O latency. Samba appears to do better, and this is what we
> need to investigate. It might be better in a way that NFS can borrow
> (maybe through some better NFSv4 delegation code), or Samba might be
> better by being careless with data. If we find such an NFS improvement
> it will help all backend filesystems, not just ZFS.

Just curious: was this NFS-vs-Samba ghost ever caught and sent back to
the spirit realm? :)
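(If anyone wants to chase it, a quick way to see that ceiling -- a
sketch only, paths are illustrative -- is to time a burst of small-file
creates on the NFS mount and again on the Samba mount of the same pool;
over NFS the total should be close to 100 x the server's I/O latency,
since each create is a synchronous round trip:

    cd /mnt/nfs/testdir
    time sh -c 'for i in 0 1 2 3 4 5 6 7 8 9; do
                  for j in 0 1 2 3 4 5 6 7 8 9; do touch f$i$j; done
                done'

Repeat under the Samba mount point and compare.)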
Re: [zfs-discuss] Force ditto block on different vdev?
On August 10, 2007 2:20:30 PM +0300 Tuomas Leikola [EMAIL PROTECTED] wrote:
>>>> We call that a mirror :-)
>>> Mirror and raidz suffer from the classic blockdevice abstraction
>>> problem in that they need disks of equal size.
>> Not that I'm aware of. Mirror and raid-z will simply use the smallest
>> size of your available disks.
> Exactly. The rest is not usable.

Well, I don't understand how you suggest to use it if you want
redundancy.

-frank
Re: [zfs-discuss] Force ditto block on different vdev?
>>> We call that a mirror :-)
>> Mirror and raidz suffer from the classic blockdevice abstraction
>> problem in that they need disks of equal size.
> Not that I'm aware of. Mirror and raid-z will simply use the smallest
> size of your available disks.

Exactly. The rest is not usable.
Re: [zfs-discuss] Is there _any_ suitable motherboard?
Alec Muffett wrote:
>> Does anyone on this list have experience with a recent board with 6 or
>> more SATA ports that they know is supported?
> Well so far I have only populated 5 of the ports I have available, but
> my writeup on my 9-port SATA ASUS mobo is at:
> http://www.crypticide.com/dropsafe/article/2091
> ...and I hope to run a few more tests this weekend, time permitting.
> But you know this, since you've already commented there. :-)
> - alec

In fact, any of the recent Intel chipset motherboards will do:

Server class:  ESB-2 southbridge
Desktop class: ICH-8 and ICH-9 southbridges, i.e. motherboards based on
               the i965 and Intel P35 chipsets

The above all support AHCI and typically have up to 6 SATA connectors
driven by the southbridge. On ICH-9, all six are in the southbridge and
are supported/tested running Solaris. On some i965-class motherboards,
only 4 of the SATA ports may work with Solaris, since the other 2 may or
may not be present, and may hang off a third-party SATA controller chip
with no Solaris device driver. So if you want all six SATA ports for
Solaris in AHCI mode, go with a desktop board using the P35 chipset
(ICH-9 southbridge) or a server board based on the ESB-2 southbridge
(Intel S-5000 series server boards).

Hope this helps.

Neal
Re: [zfs-discuss] Force ditto block on different vdev?
On 8/10/07, Darren Dunham [EMAIL PROTECTED] wrote:
> For instance, it might be nice to create a mirror with a 100G disk and
> two 50G disks. Right now someone has to create slices on the big disk
> manually and feed them to zpool. Letting ZFS handle everything itself
> might be a win for some cases.

Especially performance-wise. AFAIK ZFS doesn't understand that the two
vdevs actually share a physical disk and therefore should not be used as
raid0-like stripes.
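(For the record, the manual workaround being discussed looks roughly
like this -- a sketch only, with made-up device names, after first
carving two 50G slices s0 and s1 out of the big disk with format(1M):

    # pair each 50G slice of the big disk with one of the 50G disks
    zpool create tank mirror c0t0d0s0 c1t0d0 mirror c0t0d0s1 c2t0d0

That yields 100G of mirrored space, but ZFS will then stripe across two
vdevs that share one spindle, which is exactly the performance problem
above.)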
Re: [zfs-discuss] Force ditto block on different vdev?
On 8/10/07, Darren J Moffat [EMAIL PROTECTED] wrote:
>>>>> We call that a mirror :-)
>>>> Mirror and raidz suffer from the classic blockdevice abstraction
>>>> problem in that they need disks of equal size.
>>> Not that I'm aware of. Mirror and raid-z will simply use the smallest
>>> size of your available disks.
>> Exactly. The rest is not usable.
>
> For what you are asking, forcing ditto blocks on to separate vdevs, to
> work you effectively end up with the same restriction as mirroring.

In theory, correct. In practice, administration is much simpler when
there are multiple devices. Simplicity of administration really is the
point here - sorry I didn't make that clear at first. I'm skipping the
two-disk example as trivial - which it is. However: administration
becomes a real mess when you have multiple (say, 10) disks, all of
differing sizes, and want to use all the space - think of the home user
with a constrained budget, or just a huge pile of random oldish disks
lying around. It is possible to merge disks before (or after) setting up
the mirrors, but it is a tedious job, especially when you start
replacing small disks one by one with larger ones, etc. This can be -
relatively easily - automated by zfs block allocation strategies, and
that is why I consider it a worthwhile feature.

> However I suspect you will say that unlike mirroring only some of your
> datasets will have ditto blocks turned on.

That's one good point. Maybe I don't want to decide in advance how much
mirrored storage I really need - or I want to use all the free mirrored
space for non-mirrored temporary storage. I'd call this flexibility.

> The only way I could see this working is if *all* datasets that have
> copies > 1 were quotaed down to the size of the smallest disk.

Admittedly, in the two-disk scenario the benefit is relatively low, but
in most multi-disk scenarios the disks can be practically full before
you run out of ditto locations - minus the last block(s). (This holds
for copies=2 if the largest disk <= the sum of the others.)

> Which basically ends up back at a real mirror or a really hard to
> understand system IMO.

I find the volume manager mess hard to understand - and it is a mess in
the multidisk scenario when you start adding and removing disks. For a
real-world use case, I'll present my home fileserver: 11 disks, with
sizes varying between 80 and 400 gigabytes. The disks are concatenated
into 6 stacks that are raid6'd together - with only 40G or so of wasted
space. I had to write a program to optimize the disk arrangement. Raid6
isn't exactly mirroring, but the administrative hurdles are the same.
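(For concreteness, a layout of that shape could be built with Linux md,
sketched here with made-up device names -- he doesn't name the volume
manager he actually uses:

    # concatenate unequal disks into equal-sized "stacks"...
    mdadm --create /dev/md1 --level=linear --raid-devices=2 /dev/sdb /dev/sdc
    mdadm --create /dev/md2 --level=linear --raid-devices=2 /dev/sdd /dev/sde
    # ...build md3..md6 the same way, then raid6 across the six stacks:
    mdadm --create /dev/md10 --level=6 --raid-devices=6 \
        /dev/md1 /dev/md2 /dev/md3 /dev/md4 /dev/md5 /dev/md6

Getting each stack to come out the same size is the bin-packing problem
his program solves.)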
Re: [zfs-discuss] Force ditto block on different vdev?
On 8/10/07, Moore, Joe [EMAIL PROTECTED] wrote:
> Wishlist: It would be nice to put the whole redundancy definition into
> the zfs filesystem layer (rather than the pool layer): Imagine being
> able to set copies=5+2 for a filesystem... (requires a 7-VDEV pool, and
> stripes via RAIDz2, otherwise the zfs create/set fails)

Yes please ;) This is practically the holy grail of dynamic raid - the
ability to use different redundancy settings on a per-directory level,
and to use a mix of different-sized devices, adding and removing them at
will. I guess one would call this feature a ditto-block setting of
stripe+parity. It's doable, but probably requires large(ish) changes to
on-disk structures, as the block pointer will look different. James, did
you look at this?

With vdev removal (which I suppose will be implemented with some kind of
rewrite-block type code) in place, reshape and rebalance functionality
would probably be relatively small improvements.

BTW, here are more wishlist items now that we're at it:
- copies=max+2 (use as many stripes as possible, with the border case
  being a 3-way mirror)
- minchunk=8kb (don't spread stripes smaller than this - a performance
  optimization)
- checksum on every disk independently (instead of on the full stripe) -
  fixes raidz random read performance

And one crazy idea just popped into my head: fs-level raid could be
implemented with separate parity blocks instead of the ditto mechanism.
Say, when data is first written, a normal ditto block is used. Then
later, asynchronously, the block is combined with some other blocks
(which may be unrelated), the parity is written to a new allocation, and
the ditto block(s) are freed. When data blocks are freed (by COW), the
parity needs to be recalculated before the data block can actually be
forgotten. This can be thought of as combining a number of ditto blocks
into a parity block. That may be easier or more complicated to implement
than saving the block as stripe+parity in the first place; it depends on
the data structures, which I don't yet know intimately.

Come to think of it, it's probably best to get all these ideas out there
_before_ I start looking into the code - knowing the details has a
tendency to kill all the crazy ideas :)
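(To make the wishlist concrete, the knobs might look like this - purely
hypothetical syntax; none of these property values exist in ZFS today:

    zfs set copies=5+2 tank/archive   # hypothetical: 5 data + 2 parity pieces per block
    zfs set copies=max+2 tank/media   # hypothetical: as wide as the pool allows, double parity
    zfs set minchunk=8k tank/db       # hypothetical: don't split blocks smaller than 8k

The point is that each setting would be per-dataset, while the pool
stays one big bag of differently-sized disks.)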
Re: [zfs-discuss] Is there _any_ suitable motherboard?
On Fri, Aug 10, 2007 at 10:23:49AM -0700, Neal Pollack wrote:
> Server class:  ESB-2 southbridge
> Desktop class: ICH-8 and ICH-9 southbridges, i.e. motherboards based on
>                the i965 and Intel P35 chipsets

Are the i975 chipset boards any less likely to run Solaris? 4 SATA is
fine, but I need to know before I buy the board, otherwise I'll end up
putting Linux on it, and I *really* don't want to do that. :)

-brian
--
Perl can be fast and elegant as much as J2EE can be fast and elegant. In
the hands of a skilled artisan, it can and does happen; it's just that
most of the shit out there is built by people who'd be better suited to
making sure that my burger is cooked thoroughly. -- Jonathan Patschke
Re: [zfs-discuss] Force ditto block on different vdev?
>>>>> We call that a mirror :-)
>>>> Mirror and raidz suffer from the classic blockdevice abstraction
>>>> problem in that they need disks of equal size.
>>> Not that I'm aware of. Mirror and raid-z will simply use the smallest
>>> size of your available disks.
>> Exactly. The rest is not usable.
> Well I don't understand how you suggest to use it if you want
> redundancy.

Well, it is possible if you 'slice' the disks up as Tuomas suggested
previously (or do something slightly cleverer -- though equivalent --
under the hood). See my recent post on zfs-code:
http://mail.opensolaris.org/pipermail/zfs-code/2007-August/000583.html

I made this work as a university project, though as it currently stands
you can't replace disks with my implementation -- which I'm hoping to
solve, along with adding disks to the RAID-Z, when I get some more free
time. Sadly that's not going to come immediately, as I'm now working
full time -- but it's certainly a fun project, as the ZFS guys have
higher-priority items on their list.

James
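(For anyone wondering what slicing buys you with RAID-Z, here is the
manual version as a sketch -- device names made up; say two 100G disks
each split into 50G slices s0/s1, plus two whole 50G disks:

    # both slices of a given disk go into *different* raidz vdevs, so
    # one disk failure costs each vdev at most one member:
    zpool create tank \
        raidz c0t0d0s0 c1t0d0s0 c2t0d0 \
        raidz c0t0d0s1 c1t0d0s1 c3t0d0

That uses all 300G of raw space (200G usable), where a plain raidz over
the four unequal disks would treat every disk as 50G and waste 100G.
Doing the equivalent chunk placement automatically, under the hood, is
the idea.)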
Re: [zfs-discuss] Is there _any_ suitable motherboard?
> Does anyone on this list have experience with a recent board with 6 or
> more SATA ports that they know is supported?

Well, so far I have only populated 5 of the ports I have available, but
my writeup on my 9-port SATA ASUS mobo is at:
http://www.crypticide.com/dropsafe/article/2091

...and I hope to run a few more tests this weekend, time permitting. But
you know this, since you've already commented there. :-)

- alec
--
Alec Muffett
http://www.google.com/search?q=alec+muffett
Re: [zfs-discuss] Force ditto block on different vdev?
Tuomas Leikola wrote:
>>>> We call that a mirror :-)
>>> Mirror and raidz suffer from the classic blockdevice abstraction
>>> problem in that they need disks of equal size.
>> Not that I'm aware of. Mirror and raid-z will simply use the smallest
>> size of your available disks.
> Exactly. The rest is not usable.

For what you are asking - forcing ditto blocks onto separate vdevs - to
work, you effectively end up with the same restriction as mirroring. For
example, let's say you have a two-disk pool of 50 and 100 sized disks.
If ZFS only ever put a ditto block onto a separate vdev from the
original block, you could still only use 50, not 100. What do you do
when the disk of size 50 is full yet you have more ditto blocks to
write? I can see only two options:

1) Fail the write due to lack of space. Which is basically the same as
   mirroring today.
2) Break the requirement that the ditto must be on an alternate vdev.

If you break the requirement you are back to what the current design
does, which is to *try* to use an alternate vdev for the ditto.

However, I suspect you will say that unlike mirroring, only some of your
datasets will have ditto blocks turned on. The only way I could see this
working is if *all* datasets that have copies > 1 were quotaed down to
the size of the smallest disk. Which basically ends up back at a real
mirror, or a really hard to understand system, IMO.

--
Darren J Moffat
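(For reference, the current best-effort behaviour being described is the
plain copies property - a minimal sketch, device names illustrative:

    zpool create tank c0t0d0 c1t0d0   # 50G and 100G disks, no mirror
    zfs create tank/docs
    zfs set copies=2 tank/docs        # ZFS *tries* to place the second
                                      # copy on the other vdev, but falls
                                      # back to the same disk if it must

The thread is about whether that fallback should instead be a hard
"disk full" failure.)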
Re: [zfs-discuss] Force ditto block on different vdev?
On 8/9/07, Richard Elling [EMAIL PROTECTED] wrote:
>> What I'm looking for is a disk full error if ditto cannot be written
>> to different disks. This would guarantee that a mirror is written on a
>> separate disk - and the entire filesystem can be salvaged from a full
>> disk failure.
> We call that a mirror :-)

Mirror and raidz suffer from the classic blockdevice abstraction problem
in that they need disks of equal size. Not really a problem for most
people, but inconvenient for everyone. Isn't flexibility and ease of
administration the zfs way? ;)
Re: [zfs-discuss] Force ditto block on different vdev?
On 8/9/07, Mario Goebbels [EMAIL PROTECTED] wrote:
> If you're that bent on having maximum redundancy, I think you should
> consider implementing real redundancy. I'm also biting the bullet and
> going mirrors (cheaper than RAID-Z for home, fewer disks needed to
> start with).

Currently I am, and as I'm stuck with different-sized disks I first have
to slice them up into similarly sized chunks and... well, you get the
idea. It's a pain.

> The problem here is that the filesystem, especially with a considerable
> fill factor, can't guarantee the necessary allocation balance across
> the vdevs (that is, maintaining the necessary free space) to spread the
> ditto blocks as optimally as you'd like. Implementing the required code
> would increase the overhead a lot. Not to mention that ZFS may have to
> defrag on the fly more often than not to make sure the ditto spread can
> be kept balanced.

I feel that for most purposes this could be fixed with an allocator
strategy option, like "prefer vdevs with the most free space" (which is
not that good a default, as it has performance implications).

> And then snapshots on top of that, which are supposed to be physically
> and logically immovable (unless you execute commands affecting the
> pool, like a vdev remove, I suppose), just increase the existing
> complexity, where all that would have to be hammered into.

I'm not that familiar with the code, but I get the feeling that if vdev
remove is a given, rebalance would not be a huge step? The code to
migrate data blocks would already be there.
Re: [zfs-discuss] Force ditto block on different vdev?
On August 10, 2007 12:34:23 PM +0300 Tuomas Leikola [EMAIL PROTECTED] wrote:
> On 8/9/07, Richard Elling [EMAIL PROTECTED] wrote:
>>> What I'm looking for is a disk full error if ditto cannot be written
>>> to different disks. This would guarantee that a mirror is written on
>>> a separate disk - and the entire filesystem can be salvaged from a
>>> full disk failure.
>> We call that a mirror :-)
> Mirror and raidz suffer from the classic blockdevice abstraction
> problem in that they need disks of equal size.

Not that I'm aware of. Mirror and raid-z will simply use the smallest
size of your available disks.

-frank
Re: [zfs-discuss] Force ditto block on different vdev?
> This is practically the holy grail of dynamic raid - the ability to
> dynamically use different redundancy settings on a per-directory level,
> and to use a mix of different sized devices and add/remove them at
> will.

Well, I suspect that arbitrary redundancy configuration is not something
we'll see anytime soon, nor is it something we should necessarily want.
The main reason is that it's very difficult to see it being used
effectively; it's difficult enough to reason about data-loss
characteristics as it is. (And that's not even considering the
implementation complexity.) If you really need different guarantees on
integrity, you could create separate specialized pools of mirrors -- or
use the ditto blocks feature. As Richard Elling points out
(http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl),
some redundancy (RAID-Z) is much better than none, mirroring your data
increases the MTTDL by another 5-6 orders of magnitude (though ditto'ing
isn't quite doing that), and interestingly the RAID data points clump
together.

I think the important thing is that the system should be a little more
flexible than it is currently (allow variably sized disks, and
adding/removing them), but not so much so that it's a completely
different system.

James
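(For the curious, the usual back-of-envelope models behind such numbers
- my paraphrase of the standard single-parity formulas, not taken from
the blog entry itself; MTBF is per disk, MTTR is the resilver time:

    $\mathrm{MTTDL}_{\text{2-way mirror}} \approx \mathrm{MTBF}^2 / (2 \cdot \mathrm{MTTR})$
    $\mathrm{MTTDL}_{\text{raidz},\,N\ \text{disks}} \approx \mathrm{MTBF}^2 / (N(N-1) \cdot \mathrm{MTTR})$

Fewer disks per redundancy group and faster resilvers both raise MTTDL,
which is why mirrors beat wide RAID-Z stripes by orders of magnitude.)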
Re: [zfs-discuss] Force ditto block on different vdev?
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Frank Cusack
> Sent: Friday, August 10, 2007 7:26 AM
>
> On August 10, 2007 2:20:30 PM +0300 Tuomas Leikola [EMAIL PROTECTED] wrote:
>>>>> We call that a mirror :-)
>>>> Mirror and raidz suffer from the classic blockdevice abstraction
>>>> problem in that they need disks of equal size.
>>> Not that I'm aware of. Mirror and raid-z will simply use the smallest
>>> size of your available disks.
>> Exactly. The rest is not usable.
>
> Well I don't understand how you suggest to use it if you want
> redundancy.

Since copies=N is a per-filesystem setting, you fail writes to
/tank/important_documents (copies=2) when you run out of ditto blocks on
another VDEV, but still allow /tank/torrentcache (copies=1) to use the
remaining space. With disks of 100 and 50 GB mirrored,
/tank/torrentcache would be more redundant than necessary, and you'd run
out of capacity too soon.

Wishlist: It would be nice to put the whole redundancy definition into
the zfs filesystem layer (rather than the pool layer): Imagine being
able to set copies=5+2 for a filesystem... (requires a 7-VDEV pool, and
stripes via RAIDz2; otherwise the zfs create/set fails.)

--Joe
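(The per-filesystem half of this already works today; a minimal sketch,
using Joe's dataset names and made-up devices:

    zpool create tank mirror c0t0d0 c1t0d0
    zfs create tank/important_documents
    zfs set copies=2 tank/important_documents   # every block stored twice
    zfs create tank/torrentcache                # copies=1 is the default
    zfs get copies tank/important_documents tank/torrentcache

What doesn't exist is the copies=5+2 stripe+parity form - that's the
wishlist part.)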
[zfs-discuss] Is there _any_ suitable motherboard?
Hi, I am wondering what the readers of this list are using to control
their ZFS RAID-Z arrays. A quote from an under-answered comment on the
OpenSolaris device driver forum
(http://www.opensolaris.org/jive/thread.jspa?threadID=32610&tstart=0):

"I'm having a hard time finding any decent motherboard that will work
with OpenSolaris. I don't have high requirements, only:

- 6 (or more) SATA ports (with the future in mind I guess some AHCI
  controller would be best)
- a gigabit port (or two)
- 4 DDR2 @ 800MHz slots
- some PCI-Express slots

As you can see, these are very modest specs, except that most mobos
still come with only 4 SATA ports, although 6 isn't very uncommon either
(a mobo with 4 SATA ports and a cheap PCI-E card with at least 2 SATA
ports would be OK, too). One would think that it'd be quite easy to find
at least one suitable, cheap, mainstream (i.e., not super-expensive
server hardware) mobo that OpenSolaris would work on, but this task
seems to be impossible. I've been looking through these forums (and
Google) for several tens of hours, but without luck. I've only found
people recommending obsolete hardware (yesterday's tech is quite
expensive and hard to find), or people saying stuff like "after tweaking
a gazillion undocumented parameters in a patch to the driver and
recompiling it I finally got it working, but it's slow and it might be
causing some short freezes or lockups" (not quite what one would want),
or "go out and buy stuff, and then when you get it you can let us know
what works" (as if everyone has the time to test and the money to blow
on loads of mobos to find one that works), or "this particular revision
of this particular chip works well" (and then it turns out to be
impossible to find any decent product that actually has that chip in
it). I personally know several people who want to buy a mobo for running
OpenSolaris on, but don't know which one to buy. If I were to buy one, I
would only accept relatively cheap mobos that are widely available, so
that my friends could easily buy the same hardware when I let them know
that mine works well. So, does anyone have, or know of, some suitable
motherboard?"

As it happens, I too am looking for such a board and getting the same
answers. The OpenSolaris HCL (http://www.sun.com/bigadmin/hcl/index.html)
is outdated and not really helpful. The AHCI driver project
(http://www.opensolaris.org/os/community/device_drivers/projects/AHCI/),
which appears to be the crux of the question, has not been updated
recently (perhaps it has never been updated) and promises compatibility
with an old, short-lived Intel part and a VIA part I am unfamiliar with.
Does anyone on this list have experience with a recent board with 6 or
more SATA ports that they know is supported? Thanks!
Re: [zfs-discuss] Force ditto block on different vdev?
>>>> Mirror and raidz suffer from the classic blockdevice abstraction
>>>> problem in that they need disks of equal size.
>>> Not that I'm aware of. Mirror and raid-z will simply use the smallest
>>> size of your available disks.
>> Exactly. The rest is not usable.
> Well I don't understand how you suggest to use it if you want
> redundancy.

With more than two disks involved, you might have the space available,
but not in simple 1:1 configurations. For instance, it might be nice to
create a mirror with a 100G disk and two 50G disks. Right now someone
has to create slices on the big disk manually and feed them to zpool.
Letting ZFS handle everything itself might be a win for some cases.

--
Darren Dunham [EMAIL PROTECTED]
Senior Technical Consultant, TAOS http://www.taos.com/
Got some Dr Pepper? San Francisco, CA bay area
This line left intentionally blank to confuse you.
[zfs-discuss] zfs on entire disk?
Is it possible/recommended to create a zpool and zfs setup such that the OS itself (in root /) is in its own zpool?
[zfs-discuss] zfs send/recv partial filesystem?
Hi, I want to send pieces of a zfs filesystem to another system. Can zfs
send pieces of a snapshot? Say I only want to send over
/[EMAIL PROTECTED] and not include the /app/conf data, while /app/conf
is still a part of the /[EMAIL PROTECTED] snapshot? I use app/conf as an
example; it could be webserver/http-webservername/config or a piece of a
repository. I know once the receive -F option is added I'll be able to
blow away data on the receiving host's side, which isn't always what I
want. Yes, I know I can use rsync for this.

Thanks in advance,
~~sa
--
Shannon A. Fiume
System Administrator, Infrastructure and Lab Management, Java Tools
[zfs-discuss] solved: zfs on entire disk?
Thanks Cindy and Erik. The link to the boot page exactly answers my
question.

Russ

Cindy Swearingen wrote:
> Hi Russ,
> If you are asking whether you can create a ZFS file system for the root
> file system and boot from it, it is possible on an x86 system running
> the Nevada release. Not possible yet for a Solaris 10 release. The zfs
> boot/root support in Nevada is minimal. You'll find more information
> here: http://opensolaris.org/os/community/zfs/faq/
> and here: http://opensolaris.org/os/community/zfs/boot/
> Cindy
>
> ----- Original Message -----
> From: Russ Petruzzelli [EMAIL PROTECTED]
> Date: Friday, August 10, 2007 4:03 pm
> Subject: [zfs-discuss] zfs on entire disk?
> To: zfs-discuss@opensolaris.org
>
>> Is it possible/recommended to create a zpool and zfs setup such that
>> the OS itself (in root /) is in its own zpool?
Re: [zfs-discuss] zfs send/recv partial filesystem?
Shannon Fiume wrote:
> Hi, I want to send pieces of a zfs filesystem to another system. Can
> zfs send pieces of a snapshot? Say I only want to send over
> /[EMAIL PROTECTED] and not include the /app/conf data, while /app/conf
> is still a part of the /[EMAIL PROTECTED] snapshot? I use app/conf as
> an example; it could be webserver/http-webservername/config or a piece
> of a repository.

Nope. You'll have to make /app/conf a different filesystem to accomplish
your goal with zfs send|recv.

--matt
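(A sketch of that approach - pool, dataset, and host names made up;
since snapshots are per-dataset, a child filesystem's contents are
automatically excluded from the parent's snapshot:

    zfs create tank/app/conf      # split conf into its own filesystem
                                  # (copy any existing files in first)
    zfs snapshot tank/app@today   # no longer covers tank/app/conf
    zfs send tank/app@today | ssh otherhost zfs receive pool/app

The conf dataset can then be snapshotted and sent, or not, on its own
schedule.)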