Re: [zfs-discuss] hardware sizing for a zfs-based system?
Heya Kent,

Kent Watsen wrote:
>> It sounds good, that way, but (in theory), you'll see random I/O suffer a bit when using RAID-Z2: the extra parity will drag performance down a bit.
> I know what you are saying, but I wonder if it would be noticeable?

Well, "noticeable" again comes back to your workflow. As you point out to Richard, it's (theoretically) a 2x IOPS difference, which can be very significant for some people.

> I think my worst case scenario would be 3 myth frontends watching 1080p content while 4 tuners are recording 1080p content - with each 1080p stream being 27Mb/s, that would be 108Mb/s of writes and 81Mb/s of reads (all sequential I/O) - does that sound like it would even come close to pushing a 4(4+2) array?

I would say no, not even close to pushing it. Remember, we're measuring array performance in MBytes/s, while video throughput is measured in Mbit/s (and even then, I imagine that a 27 Mbit/s over-the-air stream is going to be pretty rare). So I'm figuring you're just scratching the surface of even a minimal array. Put it this way: can a single, modern hard drive keep up with an ADSL2+ (24 Mbit/s) connection? Throw 24 spindles at the problem, and I'd say you have headroom for a *lot* of streams.

>> The RAS guys will flinch at this, but have you considered 8*(2+1) RAID-Z1?
> That configuration showed up in the output of the program I posted back in July (http://mail.opensolaris.org/pipermail/zfs-discuss/2007-July/041778.html):
>
>   24 bays w/ 500 GB drives having MTBF=5 years
>    - can have 8 (2+1) w/ 0 spares providing 8000 GB with MTTDL of 95.05 years
>    - can have 4 (4+2) w/ 0 spares providing 8000 GB with MTTDL of 8673.50 years
>
> But it is 91 times more likely to fail, and this system will contain data that I don't want to risk losing.

I wasn't sure, with your workload. I know with mine, I'm seeing the data store as being mostly temporary. With that much data streaming in and out, are you planning on archiving *everything*? 'Cos that's only one month's worth of HD video.
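For what it's worth, both MTTDL figures in that program output can be reproduced from Richard Elling's MTTDL approximations. A minimal sketch - the 48-hour MTTR is my assumption, not something stated in the thread, though it happens to reproduce both posted numbers:

```python
# Sketch of the MTTDL math behind the quoted 24-bay figures.
# Assumes Elling-style approximations and a guessed MTTR of 48 hours.
HOURS_PER_YEAR = 24 * 365

def mttdl_raidz1(mtbf_years, mttr_hours, n_disks, group_size):
    # Single parity: data is lost when a second disk in the same group
    # fails before the first one finishes rebuilding.
    mttr_years = mttr_hours / HOURS_PER_YEAR
    return mtbf_years ** 2 / (n_disks * (group_size - 1) * mttr_years)

def mttdl_raidz2(mtbf_years, mttr_hours, n_disks, group_size):
    # Double parity: three overlapping failures are needed.
    mttr_years = mttr_hours / HOURS_PER_YEAR
    return mtbf_years ** 3 / (
        n_disks * (group_size - 1) * (group_size - 2) * mttr_years ** 2
    )

# 24 bays, MTBF = 5 years, assumed MTTR = 48 hours
print(round(mttdl_raidz1(5, 48, 24, 3), 2))  # 8*(2+1): 95.05 years
print(round(mttdl_raidz2(5, 48, 24, 6), 2))  # 4*(4+2): 8673.5 years
```

The ratio of the two results is ~91, which matches the "91 times more likely to fail" remark.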
I'd consider tuning a portion of the array for high throughput, and another for high redundancy as an archive for whatever you don't want to lose. Whether that's by setting copies=2, or by having a mirrored zpool (smart for an archive, because you'll be less sensitive to the write performance that suffers there), it's up to you... ZFS gives us a *lot* of choices. (But then you knew that, and it's what brought you to the list :)

I don't want to over-pimp my links, but I do think my blogged experiences with my server (also linked in another thread) might give you something to think about: http://lindsay.at/blog/archive/tag/zfs-performance/

> I see that you also set up a video server (myth?),

For the uncompressed HD test case, no. It'd be for storage/playout of Ultra-Grid-like streams, and really, that's there so our network guys can give their 10Gb links a little bit of a workout.

> From your blog, I think you are doing 5(2+1) (plus a hot-spare?) - this is what my program says about a 16-bay system:
>
>   16 bays w/ 500 GB drives having MTBF=5 years
>    - can have 5 (2+1) w/ 1 spares providing 5000 GB with MTTDL of 1825.00 years
>
> [snipped some interesting numbers]
>
> Note that your MTTDL isn't quite as bad as 8(2+1), since you have three fewer stripes.

I also committed to having at least one hot spare, which, after staring at relling's graphs for days on end, seems to be the cheapest, easiest way of upping the MTTDL for any array. I'd recommend it.

> Also, it's interesting for me to note that you have 5 stripes and my 4(4+2) setup would have just one less - so the question to answer is whether your extra stripe is better than my 2 extra disks in each RAID set?

As I understand it, 5(2+1) would scale to better IOPS performance than 4(4+2), and IOPS represents the performance baseline; as you ask the array to do more and more at once, it'll look more like random seeks. What you get from those bigger zvol groups of 4+2 is higher performance per zvol.
That said, my few datapoints on 4+1 RAID-Z groups (running on 2 controllers) suggest that that configuration runs into a bottleneck somewhere, and underperforms what's expected. Testing 16 disks locally, however, I do run into noticeable I/O bottlenecks, and I believe it's down to the top limits of the PCI-X bus.

> Yes, too bad Supermicro doesn't make a PCIe-based version... But still, the limit of a 64-bit, 133.3MHz PCI-X bus is 1067 MB/s, whereas a 64-bit, 100MHz PCI-X bus is 800MB/s - either way, it's much faster than my worst case scenario from above, where 7 1080p streams would be 189Mb/s...

Oh, the bus will far exceed your needs, I think. The exercise is to specify something that handles what you need without breaking the bank, no?

BTW, where are these HDTV streams coming from/going to? Ethernet? A capture card? (and which ones will work with Solaris?)

Still, though, take a look at
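To put numbers on that headroom, here's a quick back-of-envelope comparison of the worst-case streaming load against those bus limits (all figures are the ones quoted in the thread; the 27 Mbit/s per-stream rate is the peak assumed above):

```python
# Worst case from the thread: 4 tuners writing + 3 frontends reading 1080p.
stream_mbit = 27                    # Mbit/s per 1080p stream
writes_mbit = 4 * stream_mbit       # 108 Mbit/s of writes
reads_mbit = 3 * stream_mbit        # 81 Mbit/s of reads
total_mbyte = (writes_mbit + reads_mbit) / 8   # convert Mbit/s -> MByte/s

pci_x_133_mbyte = 1067              # 64-bit / 133.3 MHz PCI-X limit
pci_x_100_mbyte = 800               # 64-bit / 100 MHz PCI-X limit

print(total_mbyte)                    # 23.625 MByte/s total load
print(pci_x_133_mbyte / total_mbyte)  # ~45x headroom on the faster bus
print(pci_x_100_mbyte / total_mbyte)  # ~34x headroom on the slower bus
```

Even the slower PCI-X variant leaves the streaming workload more than an order of magnitude below the bus ceiling, which is the point being made.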
[zfs-discuss] PLOGI errors
Hello, today we made some tests with failed drives on a zpool. (SNV60, 2xHBA, 4xJBOD connected through 2 Brocade 2800)

In the log we found hundreds of the following errors:

Sep 16 12:04:23 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to 11dca failed state=Timeout, reason=Hardware Error
Sep 16 12:04:23 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to 11dca failed. state=c reason=1.
Sep 16 12:04:24 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 11dcc failed state=Timeout, reason=Hardware Error
Sep 16 12:04:24 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(1)::PLOGI to 11dcc failed. state=c reason=1.
Sep 16 12:04:43 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to 11d01 failed state=Timeout, reason=Hardware Error
Sep 16 12:04:43 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to 11d01 failed. state=c reason=1.
Sep 16 12:04:44 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 11dca failed state=Timeout, reason=Hardware Error
Sep 16 12:04:44 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(1)::PLOGI to 11dca failed. state=c reason=1.
Sep 16 12:05:04 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to 11dd6 failed state=Timeout, reason=Hardware Error
Sep 16 12:05:04 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to 11dd6 failed. state=c reason=1.
Sep 16 12:05:04 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 11dd6 failed state=Timeout, reason=Hardware Error

Could it be related to http://sunsolve.sun.com/search/document.do?assetkey=1-26-57773-1 ??

Gino

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] PLOGI errors
Gino, although these messages show some similarity to ones in the Sun Alert you are referring to, it looks like this is unrelated. Sun Alert 57773 describes symptoms of a problem seen in SAN configurations with specific switches (Brocade SilkWorm Switch 12000, 24000, 3250, 3850, 3900) running specific FabOS versions (prior to 4.4.0b).

Gino wrote:
> Hello, today we made some tests with failed drives on a zpool. (SNV60, 2xHBA, 4xJBOD connected through 2 Brocade 2800)

Your switch model is different, so I believe Sun Alert 57773 is not applicable here.

Hth,
Victor
Re: [zfs-discuss] hardware sizing for a zfs-based system?
> - can have 6 (2+2) w/ 0 spares providing 6000 GB with MTTDL of 28911.68 years

This should, of course, set off one's common-sense alert.

> it is 91 times more likely to fail and this system will contain data that I don't want to risk losing

If you don't want to risk losing data, you need multiple -- off-site -- copies. (Incidentally, I rarely see these discussions touch upon what sort of UPS is being used. Power fluctuations are a great source of correlated disk failures.)

Anton
Re: [zfs-discuss] [xen-discuss] hardware sizing for a zfs-based system?
> One option I'm still holding on to is to also use the ZFS system as a Xen server - that is, OpenSolaris would be running in Dom0... Given that the Xen hypervisor has a pretty small cpu/memory footprint, do you think it could share 2 cores + 4Gb with ZFS, or should I allocate 3 cores to Dom0 and bump the memory up 512MB?

A dom0 with 4G and 2 cores should be plenty to run ZFS and the support necessary for a reasonable number (16) of paravirtualised domains. If the guest domains end up using HVM then the dom0 load is higher, but we haven't done the work to quantify this properly yet.

dme.
--
David Edmondson, Solaris Engineering, http://dme.org
Re: [zfs-discuss] [xen-discuss] hardware sizing for a zfs-based system?
David Edmondson wrote:
>> One option I'm still holding on to is to also use the ZFS system as a Xen server - that is, OpenSolaris would be running in Dom0... Given that the Xen hypervisor has a pretty small cpu/memory footprint, do you think it could share 2 cores + 4Gb with ZFS, or should I allocate 3 cores to Dom0 and bump the memory up 512MB?
> A dom0 with 4G and 2 cores should be plenty to run ZFS and the support necessary for a reasonable number (16) of paravirtualised domains. If the guest domains end up using HVM then the dom0 load is higher, but we haven't done the work to quantify this properly yet.

A tasty insight - a million thanks! I think if I get 2 quad-cores and 16Gb of mem, I'd be able to stomach the overhead of 25% CPU and 25% mem going to the host, as having a dedicated SAN plus another totally-redundant Xen box would cost more.

Cheers!
Kent
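The 25% figures do check out for the proposed configuration. A trivial sanity check - the 2x quad-core / 16Gb split is the one quoted above, and the per-guest memory line is my extrapolation from the 16-domain figure, not something anyone in the thread committed to:

```python
# Assumed box: 2 quad-core CPUs (8 cores) and 16 GB RAM, with dom0
# getting 2 cores and 4 GB, as discussed in the thread.
cores_total, mem_total_gb = 8, 16
dom0_cores, dom0_mem_gb = 2, 4

print(dom0_cores / cores_total)    # 0.25 -> the "25% cpu" overhead
print(dom0_mem_gb / mem_total_gb)  # 0.25 -> the "25% mem" overhead

# What's left for the 16 paravirtualised guests mentioned above:
guests = 16
print((mem_total_gb - dom0_mem_gb) / guests)  # 0.75 GB per guest
```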
Re: [zfs-discuss] hardware sizing for a zfs-based system?
>> I know what you are saying, but I wonder if it would be noticeable?
> Well, "noticeable" again comes back to your workflow. As you point out to Richard, it's (theoretically) a 2x IOPS difference, which can be very significant for some people.

Yeah, but my point is whether it would be noticeable to *me* (yes, I am a bit self-centered).

> I would say no, not even close to pushing it. Remember, we're measuring array performance in MBytes/s, while video throughput is measured in Mbit/s (and even then, I imagine that a 27 Mbit/s over-the-air stream is going to be pretty rare). So I'm figuring you're just scratching the surface of even a minimal array. Put it this way: can a single, modern hard drive keep up with an ADSL2+ (24 Mbit/s) connection? Throw 24 spindles at the problem, and I'd say you have headroom for a *lot* of streams.

Sweet! I should probably hang up this thread now, but there are too many other juicy bits to respond to...

> I wasn't sure, with your workload. I know with mine, I'm seeing the data store as being mostly temporary. With that much data streaming in and out, are you planning on archiving *everything*? 'Cos that's only one month's worth of HD video.

Well, not to down-play the importance of my TV recordings - which is really a laugh, because I'm not really a big TV watcher - I simply don't want to ever have to think about this again after getting it set up.

> I'd consider tuning a portion of the array for high throughput, and another for high redundancy as an archive for whatever you don't want to lose. Whether that's by setting copies=2, or by having a mirrored zpool (smart for an archive, because you'll be less sensitive to the write performance that suffers there), it's up to you... ZFS gives us a *lot* of choices.
> (But then you knew that, and it's what brought you to the list :)

All true, but if 4(4+2) serves all my needs, I think it's simpler to administer, as I can arbitrarily allocate space as needed without worrying about what kind of space it is - all the space is good, fast space...

> I also committed to having at least one hot spare, which, after staring at relling's graphs for days on end, seems to be the cheapest, easiest way of upping the MTTDL for any array. I'd recommend it.

No doubt that a hot spare gives you a bump in MTTDL, but double parity trumps it big time - check out Richard's blog...

> As I understand it, 5(2+1) would scale to better IOPS performance than 4(4+2), and IOPS represents the performance baseline; as you ask the array to do more and more at once, it'll look more like random seeks. What you get from those bigger zvol groups of 4+2 is higher performance per zvol. That said, my few datapoints on 4+1 RAID-Z groups (running on 2 controllers) suggest that that configuration runs into a bottleneck somewhere, and underperforms what's expected.

Er? Can anyone fill in the missing blank here?

> Oh, the bus will far exceed your needs, I think. The exercise is to specify something that handles what you need without breaking the bank, no?

Bank, smank - I build a system every 5+ years and I want it to kick ass all the way until I build the next one - cheers!

> BTW, where are these HDTV streams coming from/going to? Ethernet? A capture card? (and which ones will work with Solaris?)

Glad you asked - for the list's sake, I'm using two HDHomeRun tuners (http://www.silicondust.com/wiki/products/hdhomerun) - actually, I bought 3 of them because I felt like I needed a spare :-D

> Yeah, perhaps I've been a bit too circumspect about it, but I haven't been all that impressed with my PCI-X bus configuration. Knowing what I know now, I might've spec'd something different.
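A hedged sketch of the IOPS-baseline point: for small random reads, a RAID-Z group delivers roughly the IOPS of a single member disk (every disk in the group participates in each full-stripe read), so the pool's random-read ceiling scales with the number of groups, not the number of disks. The ~80 IOPS/disk figure below is an assumed value for a 7200 rpm SATA drive, not a measurement from the thread:

```python
# Random-read IOPS of a RAID-Z pool scales with the number of groups
# (vdevs). 80 IOPS/disk is an assumed 7200 rpm SATA figure.
disk_iops = 80

pools = {
    "5(2+1), 16 bays": 5,   # five single-parity groups
    "4(4+2), 24 bays": 4,   # four double-parity groups
}
for name, groups in pools.items():
    print(name, "~", groups * disk_iops, "random-read IOPS")
```

On this crude model, 5(2+1) comes out about 25% ahead of 4(4+2) for random reads, even though it uses fewer, smaller disks per group.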
> Of all the suggestions that've gone out on the list, I was most impressed with Tim Cook's:
>> Won't come cheap, but this mobo comes with 6x pci-x slots... should get the job done :) http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE-X.cfm
> That has 3x 133MHz PCI-X slots, each connected to the Southbridge via a different PCIe bus, which sounds worthy of being the core of the demi-Thumper you propose.

Yeah, but getting back to PCIe, I see these tasty SAS/SATA HBAs from LSI: http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/lsisas3081er/index.html (note, LSI also sells matching PCI-X HBA controllers, in case you need to balance your mobo's architecture)

> ...But it all depends what you intend to spend. (This is what I was going to say in my next blog entry on the system:) We're talking about benchmarks that are really far past what you say is your most taxing workload. I say I'm disappointed with the contention on my bus putting limits on maximum throughputs, but really, what I have far outstrips my ability to get data into or out of the system.

So moving to the PCIe-based cards should fix that - no?

> So all of my disappointment is in theory.

Seems like this
Re: [zfs-discuss] hardware sizing for a zfs-based system?
Kent Watsen wrote:
> Glad you brought that up - I currently have an APC 2200XL (http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=SU2200XLNET) - it's rated for 1600 watts, but my current case selections are saying they have a 1500W 3+1 PSU - should I be worried?

Probably not; my box has 10 drives and two very thirsty FX74 processors, and it draws 450W max. At 1500W, I'd be more concerned about power bills and cooling than the UPS!

Ian
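The arithmetic behind Ian's "probably not": what matters is actual draw versus the UPS's rated output, not the PSU's nameplate capacity. A quick sketch using the figures from this exchange (the 450W draw is Ian's measurement of his own box, taken here as a stand-in for Kent's):

```python
# Compare actual load against UPS rating, not PSU nameplate.
ups_rated_w = 1600       # APC SU2200XL rated output (from the thread)
psu_nameplate_w = 1500   # what the case's 3+1 redundant PSU *could* supply
measured_draw_w = 450    # Ian's 10-drive, dual-FX74 box at max

# The UPS only has to carry what the box actually draws:
print(ups_rated_w / measured_draw_w)   # ~3.6x headroom
```

The 1500W nameplate exceeding the 1600W UPS rating only matters if the box ever came close to drawing its nameplate, which a measured 450W max suggests it won't.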