Re: [zfs-discuss] hardware sizing for a zfs-based system?

2007-09-16 Thread Adam Lindsay
Heya Kent,

Kent Watsen wrote:
 It sounds good, that way, but (in theory), you'll see random I/O 
 suffer a bit when using RAID-Z2: the extra parity will drag 
 performance down a bit.
 I know what you are saying, but I wonder if it would be noticeable?

Well, "noticeable" again comes back to your workflow. As you point out 
to Richard, it's (theoretically) a 2x IOPS difference, which can be very 
significant for some people.

 think my worst case scenario would be 3 myth frontends watching 1080p 
 content while 4 tuners are recording 1080p content - with each 1080p 
 stream being 27Mb/s, that would be 108Mb/s writes and 81Mb/s reads (all 
 sequential I/O) - does that sound like it would even come close to 
 pushing a 4(4+2) array?

I would say no, not even close to pushing it. Remember, we're measuring 
performance in MBytes/s, and video throughput is measured in Mbit/s (and 
even then, I imagine that a 27 Mbit/s stream over the air is going to be 
pretty rare). So I'm figuring you're just scratching the surface of even 
a minimal array.

Put it this way: can a single, modern hard drive keep up with an ADSL2+ 
(24 Mbit/s) connection?
Throw 24 spindles at the problem, and I'd say you have headroom for a 
*lot* of streams.
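
To put numbers on that, here's a quick back-of-the-envelope in Python; the 
60 MB/s per-disk sequential figure is only an assumed ballpark for a 
current SATA drive, not a measurement:

  # Worked numbers from the discussion above; per-disk throughput is assumed.
  stream_mbit = 27.0                            # one 1080p stream, Mbit/s
  writes_mbit = 4 * stream_mbit                 # 4 tuners recording   = 108 Mbit/s
  reads_mbit  = 3 * stream_mbit                 # 3 frontends watching =  81 Mbit/s
  total_MB_s  = (writes_mbit + reads_mbit) / 8  # ~23.6 MByte/s in total

  disk_MB_s = 60.0                              # assumed sequential MB/s for one drive
  print("worst case: %.1f MB/s vs ~%.0f MB/s from a single disk"
        % (total_MB_s, disk_MB_s))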

 The RAS guys will flinch at this, but have you considered 8*(2+1) 
 RAID-Z1?
 That configuration showed up in the output of the program I posted back 
 in July 
 (http://mail.opensolaris.org/pipermail/zfs-discuss/2007-July/041778.html):
 
 24 bays w/ 500 GB drives having MTBF=5 years
   - can have 8 (2+1) w/ 0 spares providing 8000 GB with MTTDL of 95.05 years
   - can have 4 (4+2) w/ 0 spares providing 8000 GB with MTTDL of 8673.50 years
 
 But it is 91 times more likely to fail and this system will contain data 
 that  I don't want to risk losing
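
For the curious, here is a minimal Python sketch of the textbook MTTDL 
approximations behind numbers like these; the 48-hour mean time to repair 
is an assumed value (it happens to reproduce the figures quoted above):

  mtbf = 5.0                  # drive MTBF in years, as above
  mttr = 48.0 / (24 * 365)    # assumed repair/resilver time, in years

  def mttdl_raidz1(g, sets):  # single parity: data lost on 2nd failure in a set
      return mtbf**2 / (g * (g - 1) * mttr) / sets

  def mttdl_raidz2(g, sets):  # double parity: data lost on 3rd failure in a set
      return mtbf**3 / (g * (g - 1) * (g - 2) * mttr**2) / sets

  z1 = mttdl_raidz1(3, 8)     # 8 x (2+1)  ->   ~95 years
  z2 = mttdl_raidz2(6, 4)     # 4 x (4+2)  -> ~8674 years
  print("%.2f vs %.2f years, ratio ~%.0fx" % (z1, z2, z2 / z1))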

I wasn't sure, with your workload. I know with mine, I'm seeing the data 
store as being mostly temporary. With that much data streaming in and 
out, are you planning on archiving *everything*? Cos that's only one 
month's worth of HD video.
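
A rough check of that one-month figure (Python; decimal gigabytes assumed, 
and it only counts the 27 Mbit/s streams described earlier):

  capacity_GB = 8000.0
  rate_GB_per_day = 27.0 / 8 / 1000 * 86400   # ~292 GB/day per 1080p stream
  print("one continuous stream: ~%.0f days" % (capacity_GB / rate_GB_per_day))
  print("all 4 tuners running flat out: ~%.0f days"
        % (capacity_GB / (4 * rate_GB_per_day)))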

I'd consider tuning a portion of the array for high throughput, and 
another for high redundancy as an archive for whatever you don't want to 
lose. Whether that's by setting copies=2, or by having a mirrored zpool 
(smart for an archive, because you'll be less sensitive to the write 
performance that suffers there), it's up to you...
ZFS gives us a *lot* of choices. (But then you knew that, and it's what 
brought you to the list :)
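
For a rough sense of the space cost of those two archive options, a small 
Python sketch; the 500 GB drives and the sizes chosen below are 
illustrative only, and metadata overhead is ignored:

  # copies=2 stores two copies of every block in that dataset, so the
  # archive dataset's effective capacity is roughly halved:
  archive_pool_space_GB = 2000
  print("copies=2 archive: ~%d GB usable" % (archive_pool_space_GB // 2))

  # a separate pool of 2-way mirrors keeps one usable copy per pair:
  mirror_pairs = 4
  print("mirror archive (%d pairs of 500 GB drives): ~%d GB usable"
        % (mirror_pairs, mirror_pairs * 500))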

 I don't want to over-pimp my links, but I do think my blogged 
 experiences with my server (also linked in another thread) might give 
 you something to think about:
  http://lindsay.at/blog/archive/tag/zfs-performance/
 I see that you also set up a video server (myth?), 

For the uncompressed HD test case, no. It'd be for storage/playout of 
Ultra-Grid-like streams, and really, that's there so our network guys 
can give their 10Gb links a little bit of a workout.

 from your blog, I think you are doing 5(2+1) (plus a hot-spare?) - this 
 is what my program says about a 16-bay system:
 
 16 bays w/ 500 GB drives having MTBF=5 years
   - can have 5 (2+1) w/ 1 spares providing 5000 GB with MTTDL of 1825.00 years
  [snipped some interesting numbers]
 Note that your MTTDL isn't as bad as 8(2+1)'s, since you have three 
 fewer stripes.

I also committed to having at least one hot spare, which, after staring 
at relling's graphs for days on end, seems to be the cheapest, easiest 
way of upping the MTTDL for any array. I'd recommend it.

Also, it's interesting for me to note that you have 5 stripes and my 
 4(4+2) setup would have just one less - so the question to answer is 
 whether your extra stripe is better than my 2 extra disks in each 
 raid-set?

As I understand it, 5(2+1) would scale to better IOPS performance than 
4(4+2), and IOPS represents the performance baseline; as you ask the 
array to do more and more at once, it'll look more like random seeks.
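
The rule of thumb behind that, as a small Python sketch: a RAID-Z group 
delivers roughly the random-read IOPS of a single member disk, so the pool 
scales with the number of groups. The ~80 IOPS per 7200 rpm disk is an 
assumed ballpark, not a measurement:

  disk_iops = 80   # assumed small random reads per second for one disk
  for label, groups in [("8 x (2+1)", 8), ("5 x (2+1)", 5), ("4 x (4+2)", 4)]:
      print("%s: ~%d random-read IOPS" % (label, groups * disk_iops))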

What you get from those bigger RAID-Z groups of 4+2 is higher performance 
per group. That said, my few datapoints on 4+1 RAID-Z groups (running on 
2 controllers) suggest that that configuration runs into a bottleneck 
somewhere, and underperforms what's expected.

 Testing 16 disks locally, however, I do run into noticeable I/O 
 bottlenecks, and I believe it's down to the top limits of the PCI-X bus.
 Yes, too bad Supermicro doesn't make a PCIe-based version...  But 
 still, the limit of a 64-bit, 133.3MHz PCI-X bus is 1067 MB/s whereas a 
 64-bit, 100MHz PCI-X bus is 800 MB/s - either way, it's much faster than 
 my worst case scenario from above where 7 1080p streams would be 189Mb/s...
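
For reference, the same arithmetic as a tiny Python check (these are 
theoretical bus peaks; a real PCI-X bus delivers somewhat less):

  def pcix_MB_s(width_bits, mhz):
      return width_bits / 8.0 * mhz   # bytes per transfer x transfers per second

  print("%.0f MB/s at 133 MHz" % pcix_MB_s(64, 133.33))   # ~1067 MB/s
  print("%.0f MB/s at 100 MHz" % pcix_MB_s(64, 100))      #   800 MB/s
  print("worst-case load: %.1f MB/s" % (7 * 27 / 8.0))    # 7 x 27 Mbit/s streams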

Oh, the bus will far exceed your needs, I think.
The exercise is to specify something that handles what you need without 
breaking the bank, no?

BTW, where are these HDTV streams coming from/going to? Ethernet? A 
capture card? (and which ones will work with Solaris?)

 Still, though, take a look at 

[zfs-discuss] PLOGI errors

2007-09-16 Thread Gino
Hello,
today we ran some tests with failed drives on a zpool.
(SNV60, 2xHBA, 4xJBOD connected through 2 Brocade 2800)
In the log we found hundreds of the following errors:

Sep 16 12:04:23 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to 11dca 
failed state=Timeout, reason=Hardware Error
Sep 16 12:04:23 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to 
11dca failed. state=c reason=1.
Sep 16 12:04:24 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 11dcc 
failed state=Timeout, reason=Hardware Error
Sep 16 12:04:24 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(1)::PLOGI to 
11dcc failed. state=c reason=1.
Sep 16 12:04:43 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to 11d01 
failed state=Timeout, reason=Hardware Error
Sep 16 12:04:43 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to 
11d01 failed. state=c reason=1.
Sep 16 12:04:44 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 11dca 
failed state=Timeout, reason=Hardware Error
Sep 16 12:04:44 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(1)::PLOGI to 
11dca failed. state=c reason=1.
Sep 16 12:05:04 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to 11dd6 
failed state=Timeout, reason=Hardware Error
Sep 16 12:05:04 svrt12 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to 
11dd6 failed. state=c reason=1.
Sep 16 12:05:04 svrt12 fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 11dd6 
failed state=Timeout, reason=Hardware Error

Could this be related to 
http://sunsolve.sun.com/search/document.do?assetkey=1-26-57773-1 ?

Gino
 
 


Re: [zfs-discuss] PLOGI errors

2007-09-16 Thread Victor Latushkin
Gino,

although these messages show some similarity to ones in the Sun Alert 
you are referring to, it looks like this is unrelated. Sun Alert 57773 
describes symptoms of a problem seen in SAN configurations with specific 
switches (Brocade SilkWorm Switch 12000, 24000, 3250, 3850, 3900) with 
specific FabOS version (prior to 4.4.0b).

Gino writes:
 Hello,
 today we ran some tests with failed drives on a zpool.
 (SNV60, 2xHBA, 4xJBOD connected through 2 Brocade 2800)

Your switch model is different, so I believe Sun Alert 57773 is not 
applicable here.

Hth,
Victor




Re: [zfs-discuss] hardware sizing for a zfs-based system?

2007-09-16 Thread Anton B. Rang
 - can have 6 (2+2) w/ 0 spares providing 6000 GB with MTTDL of 28911.68 years

This should, of course, set off one's common-sense alert.

 it is 91 times more likely to fail and this system will contain data 
 that I don't want to risk losing

If you don't want to risk losing data, you need multiple -- off-site -- copies.

(Incidentally, I rarely see these discussions touch upon what sort of UPS is 
being used. Power fluctuations are a great source of correlated disk failures.)

Anton
 
 


Re: [zfs-discuss] [xen-discuss] hardware sizing for a zfs-based system?

2007-09-16 Thread David Edmondson
 One option I'm still holding on to is to also use the ZFS system as a
 Xen-server - that is, OpenSolaris would be running in Dom0... Given that
 the Xen hypervisor has a pretty small cpu/memory footprint, do you think
 it could share 2 cores + 4Gb with ZFS, or should I allocate 3 cores to
 Dom0 and bump the memory up by 512MB?

A dom0 with 4G and 2 cores should be plenty to run ZFS and the support 
necessary for a reasonable number (16) of paravirtualised domains. If 
the guest domains end up using HVM then the dom0 load is higher, but we 
haven't done the work to quantify this properly yet.

dme.
-- 
David Edmondson, Solaris Engineering, http://dme.org




Re: [zfs-discuss] [xen-discuss] hardware sizing for a zfs-based system?

2007-09-16 Thread Kent Watsen
David Edmondson wrote:
 One option I'm still holding on to is to also use the ZFS system as a
 Xen-server - that is, OpenSolaris would be running in Dom0... Given that
 the Xen hypervisor has a pretty small cpu/memory footprint, do you think
 it could share 2 cores + 4Gb with ZFS, or should I allocate 3 cores to
 Dom0 and bump the memory up by 512MB?

 A dom0 with 4G and 2 cores should be plenty to run ZFS and the support 
 necessary for a reasonable number (16) of paravirtualised domains. If the 
 guest domains end up using HVM then the dom0 load is higher, but we 
 haven't done the work to quantify this properly yet.

A tasty insight - a million thanks!

I think if I get 2 quad-cores and 16GB of memory, I'd be able to stomach 
the overhead of 25% CPU and 25% memory going to the host, since having a 
dedicated SAN plus another, totally redundant Xen box would cost more.

Cheers!
Kent



Re: [zfs-discuss] hardware sizing for a zfs-based system?

2007-09-16 Thread Kent Watsen

 I know what you are saying, but I wonder if it would be noticeable?

 Well, "noticeable" again comes back to your workflow. As you point out 
 to Richard, it's (theoretically) a 2x IOPS difference, which can be very 
 significant for some people.
Yeah, but my point is whether it would be noticeable to *me* (yes, I am a 
bit self-centered).

 I would say no, not even close to pushing it. Remember, we're 
 measuring performance in MBytes/s, and video throughput is measured in 
 Mbit/s (and even then, I imagine that a 27 Mbit/s stream over the air 
 is going to be pretty rare). So I'm figuring you're just scratching 
 the surface of even a minimal array.

 Put it this way: can a single, modern hard drive keep up with an 
 ADSL2+ (24 Mbit/s) connection?
 Throw 24 spindles at the problem, and I'd say you have headroom for a 
 *lot* of streams.
Sweet!  I should probably hang up this thread now, but there are too 
many other juicy bits to respond to...

 I wasn't sure, with your workload. I know with mine, I'm seeing the 
 data store as being mostly temporary. With that much data streaming in 
 and out, are you planning on archiving *everything*? Cos that's only 
 one month's worth of HD video.
Well, not to downplay the importance of my TV recordings - which is 
really a laugh because I'm not a big TV watcher - I simply don't want to 
ever have to think about this again after getting it set up.

 I'd consider tuning a portion of the array for high throughput, and 
 another for high redundancy as an archive for whatever you don't want 
 to lose. Whether that's by setting copies=2, or by having a mirrored 
 zpool (smart for an archive, because you'll be less sensitive to the 
 write performance that suffers there), it's up to you...
 ZFS gives us a *lot* of choices. (But then you knew that, and it's 
 what brought you to the list :)
All true, but if 4(4+2) serves all my needs, I think it's simpler to 
administer, as I can allocate space arbitrarily as needed without 
worrying about what kind of space it is - all the space is good, fast 
space...

 I also committed to having at least one hot spare, which, after 
 staring at relling's graphs for days on end, seems to be the cheapest, 
 easiest way of upping the MTTDL for any array. I'd recommend it.
No doubt that a hot-spare gives you a bump in MTTDL, but double-parity 
trumps it big time - check out Richard's blog...

 As I understand it, 5(2+1) would scale to better IOPS performance than 
 4(4+2), and IOPS represents the performance baseline; as you ask the 
 array to do more and more at once, it'll look more like random seeks.

 What you get from those bigger RAID-Z groups of 4+2 is higher 
 performance per group. That said, my few datapoints on 4+1 RAID-Z 
 groups (running on 2 controllers) suggest that that configuration runs 
 into a bottleneck somewhere, and underperforms what's expected.
Er?  Can anyone fill in the blank here?


 Oh, the bus will far exceed your needs, I think.
 The exercise is to specify something that handles what you need 
 without breaking the bank, no?
Bank, smank - I build a system every 5+ years and I want it to kick ass 
all the way until I build the next one - cheers!


 BTW, where are these HDTV streams coming from/going to? Ethernet? A 
 capture card? (and which ones will work with Solaris?)
Glad you asked; for the list's sake, I'm using two HDHomeRun tuners 
(http://www.silicondust.com/wiki/products/hdhomerun) - actually, I 
bought 3 of them because I felt like I needed a spare :-D


 Yeah, perhaps I've been a bit too circumspect about it, but I haven't 
 been all that impressed with my PCI-X bus configuration. Knowing what 
 I know now, I might've spec'd something different. Of all the 
 suggestions that've gone out on the list, I was most impressed with 
 Tim Cook's:

 Won't come cheap, but this mobo comes with 6x pci-x slots... should 
 get the job done :)

 http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBE-X.cfm 


 That has 3x 133MHz PCI-X slots each connected to the Southbridge via a 
 different PCIe bus, which sounds worthy of being the core of the 
 demi-Thumper you propose.
Yeah, but getting back to PCIe, I see these tasty SAS/SATA HBAs from LSI: 
http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/lsisas3081er/index.html
(note: LSI also sells matching PCI-X HBA controllers, in case you need 
to balance your mobo's architecture)

 ...But it all depends on what you intend to spend. (This is what I 
 was going to say in my next blog entry on the system:) We're talking 
 about benchmarks that are really far past what you say is your most 
 taxing workload. I say I'm disappointed with the contention on my 
 bus putting limits on maximum throughput, but really, what I have far 
 outstrips my ability to get data into or out of the system.
So moving to the PCIe-based cards should fix that - no?

 So all of my disappointment is in theory.
Seems like this 

Re: [zfs-discuss] hardware sizing for a zfs-based system?

2007-09-16 Thread Ian Collins
Kent Watsen wrote:

 Glad you brought that up - I currently have an APC 2200XL 
 (http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=SU2200XLNET) 
 - it's rated for 1600 watts, but my current case selections are saying 
 they have a 1500W 3+1 power supply; should I be worried?

   
Probably not - my box has 10 drives and two very thirsty FX-74 
processors, and it draws 450W max.

At 1500W, I'd be more concerned about power bills and cooling than the UPS!
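
For a rough sense of scale, here's a quick Python estimate for a 24-drive 
box; every per-component wattage below is an assumption, not a measurement:

  drives  = 24 * 12   # ~12 W per spinning 3.5" SATA drive
  cpus    = 2 * 80    # two quad-core Xeons under load
  other   = 100       # motherboard, RAM, fans, HBAs
  total_W = drives + cpus + other
  # spin-up surge per drive can be several times higher, which is part of
  # what the PSU headroom is for
  print("~%d W steady state vs a 1600 W UPS / 1500 W PSU" % total_W)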

Ian