[zfs-discuss] Reliability WAS: LSI 3081 (1068) + expander + (bad) SATA disk?
On Sun, Apr 8, 2012 at 7:40 AM, Jim Klimov jimkli...@cos.ru wrote:
> 2012-04-08 6:06, Richard Elling wrote:
>> You can't get past the age-old idiom: you get what you pay for.
>
> True... but it can be somewhat countered with DrHouse-age idiom: people lie, even if they don't mean to ;)

I think both aspects oversimplify the issue, but that is generally true for idioms :-)

I see Moore's Law in effect in a bunch of ways here, not all of them good. The available _capacity_ for a given _price_ at a given _reliability_ and _performance_ increases with time. Two unintended side effects of this increase in capacity _at_the_same_reliability_ are that problems a user was statistically unlikely to run into now occur more frequently, and that the time to correct (rebuild) after a failure is longer.

It used to be that we looked at cost vs. capacity vs. performance; the old adage was pick two of the three. Today we need to add a fourth parameter, reliability. I'm not sure the relationship between the four is a linear one :-)

So if you have x dollars (or euros or yen or ...) you need to decide how much capacity / performance / reliability you can get for that x and what you need.

For my home server, where I am very cost conscious, I went with Seagate SATA drives, but I went for the ES2 series, which _are_ rated for 7x24 operation and have a (slightly) better reliability spec than the lower cost drives I compared them with. In the past, my need for more capacity drove the replacement of hard drives. This time I'm not sure if I'll run out of capacity before the drives reach the end of their practical service life and start failing.

--
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
- Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
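To put rough numbers on the capacity point above, here is a minimal Python sketch. The unrecoverable read error rate of 1 in 10^14 bits and the 100 MB/s sustained rate are illustrative assumptions, not figures from this thread, and the function names are made up for the example. It only shows the shape of the argument: at a fixed per-bit error rate, a bigger drive means both a higher chance of hitting an error on a full read and a longer best-case rebuild.

```python
import math

def expected_read_errors(capacity_tb, bit_error_rate=1e-14):
    """Expected unrecoverable read errors when reading the whole drive once.

    bit_error_rate is an assumed spec-sheet style value (errors per bit read).
    """
    bits = capacity_tb * 1e12 * 8
    return bits * bit_error_rate

def prob_at_least_one_error(capacity_tb, bit_error_rate=1e-14):
    """Chance of at least one error on a full read (Poisson approximation)."""
    return 1.0 - math.exp(-expected_read_errors(capacity_tb, bit_error_rate))

def rebuild_hours(capacity_tb, throughput_mb_s=100.0):
    """Best-case hours to rewrite a whole drive at an assumed sustained rate."""
    return capacity_tb * 1e12 / (throughput_mb_s * 1e6) / 3600.0

if __name__ == "__main__":
    for tb in (0.5, 1, 2, 3):
        print(f"{tb:>4} TB: P(error on full read) = "
              f"{prob_at_least_one_error(tb):.3f}, "
              f"rebuild >= {rebuild_hours(tb):.1f} h")
```

Real rebuilds on a busy pool run well below the sustained rate, so the rebuild figure is a lower bound at best.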
Re: [zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA
Out of curiosity, are there any third-party hardware vendors that make server/storage chassis (Supermicro et al) who make SATA backplanes with the SAS interposers soldered on? There doesn't seem to be much out there, though I haven't looked. Would that make sense, or be cheaper/more reliable than having extra junk between the disk and backplane connectors? (if I correctly understand what the talk is about? ;)

Honestly, probably not. Either the expander chip properly handles SATA-tunneling, or it doesn't. Shoving an interposer in just throws a band-aid on the problem IMO.

ZFS was very attractive at first because of the claim that it returns Inexpensive into raId and can do miracles with SATA disks. Reality has shown to many of us that many SATA implementations existing in the wild should be avoided... so we're back to good vendors' higher end expensive SATAs or better yet SAS drives. Not inexpensive anymore again :(

Many != all. Not that I've tried a whole bunch of them, mind you. However, I've found all of the SuperMicro SAS1 backplanes to be somewhat problematic with SATA drives, especially if you use the 1068-based controllers. It was horrible with the really old single-digit-Phase firmware. I find it... acceptable... with 2008-based controllers. I've finally settled on having one box based on 1068s (3081s) and I think it's up to 4 or 5 expanders of 16 1TB drives. Basically, the box hangs when certain drives die in certain ways - it eventually gets over it, mostly, but it can hang for a bit. I might see the occasional hang until you yank the bad disk, but drives don't die THAT often - even 3yr-old-seagate-cudas. Granted, most all the firmware on the 333ASes has finally been updated to the CC1H version.

I might note that that box represents most of the collection of 1068+SAS1-based expanders that I have.
It's an archival system that doesn't do much at all (well, the CPU pounds like hell running rtgpoll but that's a different matter having nothing to do with the ZFS pools). I also have a small pile of leftover 3081s and 3041s if anyone's interested. :)

Now, I suspect that there is improved LSI firmware available for the SAS1 expander chips. I could go chasing after it - SMC doesn't have it public, but LSI probably has it somewhere and I know an expert I could ask to go through and tweak my controllers. However, I hadn't met him 3 years ago, and now it just isn't worth my time (or worth paying him to do it).

On the other hand, I have two-hands-worth of CSE847-E26-RJBOD1s stuffed with 'cuda 2T and 3T SATA drives, connected to 9211-8es running the phase-10 firmware. One box is up to 170TB worth. It's fine. Nary an issue. Granted, I'm not beating the arrays to death - again, that's not what it's for, it's there to hang onto a bunch of data. But it does get used, and I'm writing 200-300GB/day to it. I have another such JBOD attached to a box with a pile of constellations, and it causes no issues.

Frankly, I would say that yes ZFS _does_ do miracles with inexpensive disks. I can trust 100s of TB to it and not worry about losing any of it. But it wasn't written by Jesus; it can't turn water into wine or deal with all of the terrible variations of Crap out there into enterprise-level replacements for your EMC arrays. Nor can it cope with having Any Old Random Crap you have laying around thrown at it - but it does surprisingly well, IMO. My home box is actually just a bunch of random drives tied onto 3 3041 controllers on an old overclocked Q6600 on an ASUS board and it's never had a problem, not ever.

So, is there really a fundamental requirement to avoid cheap hardware, and are there no good ways to work around its inherently higher instability and lack of dependability? Or is it just a harder goal (indefinitely far away on the project roadmap)?
ZFS is no replacement for doing your research, homework, and testing. God only knows I've gone through some crap - I had a 20pk of WD 2TB Blacks that I bought that turned out to work for $*%$. I suppose with enough patience and effort I could have made them work, but Seagate's firmware has just simply been more reliable, and the contents of that box have filtered their way into desktops. (Some of which are in that home machine mentioned above - attached directly to the controller, no problems at all.)

If you're going to do BYO for your enterprise needs, be prepared to fork over the additional cash you are going to need to spend on test kit - defined both as kit to test on, and kit you buy, test, and pitch because it don't work like Vendor said or A don't play nice with B. Which sometimes is much cheaper than fighting with Vendor A and Vendor B about why they can't work together. Not to mention the R&D time.

I don't avoid cheap hardware. Any number of my colleagues would say I am insane for running what they would say should be being run on EMC or NetApp on a handful of Solaris 10 fileservers built off raw SuperMicro boxes. But we
[zfs-discuss] J4400 question (not directly ZFS)
Sorry for the off topic post, but I figure there is experience here. I have a total of ten J4400 chassis all loaded with SATA drives. Has anyone noticed a tendency for drives in specific slots to fail more often than others? I have seen more drive failures in slot 20 than any other. I am wondering if there is something about slot 20 that may be causing drives to fail.

--
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
- Technical Advisor, Troy Civic Theatre Company
- Technical Advisor, RPI Players
Re: [zfs-discuss] J4400 question (not directly ZFS)
On Apr 9, 2012, at 7:10 AM, Paul Kraus wrote:
> Sorry for the off topic post, but I figure there is experience here. I have a total of ten J4400 chassis all loaded with SATA drives. Has anyone noticed a tendency for drives in specific slots to fail more often than others? I have seen more drive failures in slot 20 than any other. I am wondering if there is something about slot 20 that may be causing drives to fail.

It depends on the failure mode.

From an electrical perspective, not all slots are equal. Electrical issues can manifest themselves as noise or a higher than normal BER. This can often be seen in the link state counters, but you need to check all link ends: HBA, expanders, and disks. Symptoms can include timeouts, retries, intermittent data corruption, etc.

From a mechanical perspective, not all slots are equal either. Different slots have different vibration characteristics. However, these rarely result in failures, and there does not appear to be a correlation between vibration and disk failures.

 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
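Before chasing an electrical or mechanical cause, it may be worth checking whether the slot 20 count is beyond what chance alone would give. The sketch below is a plain binomial tail calculation in Python. The failure counts are hypothetical placeholders (the thread gives none), and 24 slots per chassis is an assumption about the J4400 layout, so substitute real numbers before drawing any conclusion.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical inputs, NOT taken from the thread.
SLOTS = 24             # assumed drive slots per chassis
total_failures = 30    # assumed failures across all ten chassis
slot20_failures = 5    # assumed failures seen in slot 20

# Null hypothesis: every slot is equally likely to host a failing drive.
p_single = binomial_tail(slot20_failures, total_failures, 1 / SLOTS)

# Slot 20 was singled out after looking at the data, so correct for having
# had 24 slots to pick from (Bonferroni bound).
p_any_slot = min(1.0, SLOTS * p_single)

if __name__ == "__main__":
    print(f"P(>= {slot20_failures} failures in one given slot) = {p_single:.4f}")
    print(f"P(some slot reaches that count), upper bound  = {p_any_slot:.4f}")
```

If the corrected probability is not small, the pattern may simply be noise; if it is small, the link counters on each end of that slot's path are the next thing to look at.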
Re: [zfs-discuss] Seagate Constellation vs. Hitachi Ultrastar
richard.ell...@richardelling.com said:
> We are starting to see a number of SAS HDDs that prefer logical-block to round-robin. I see this with late model Seagate and Toshiba HDDs.
>
> There is another, similar issue with recognition of multipathing by the scsi_vhci driver. Both of these are being tracked as https://www.illumos.org/issues/644 and there is an alternate scsi_vhci.conf file posted in that bugid.

Interesting, I just last week had a Toshiba come from Dell as a replacement for a Seagate 2TB SAS drive; on Solaris-10, the Toshiba insisted on showing up as 2 drives, so mpxio was not recognizing it. Fortunately I was able to swap the drive for a Seagate, but I'll stash away a copy of the scsi_vhci.conf entry for the future.

> We're considering making logical-block the default (as in above bugid) and we have not discovered a reason to keep round-robin. If you know of any reason why round-robin is useful, please add to the bugid.

Should be fine. When I first ran into this a couple years ago, I did a lot of tests and found logical-block to be slower than none (with those Seagate 2TB SAS drives in Dell MD1200's), but not a whole lot slower. I vaguely recall that round-robin was better for highly random, small I/O (IOPS-intensive) workloads.

I got the best results by manually load-balancing half the drives to one path and half the drives to the other path. But I decided it was not worth the effort. Maybe if there was a way to automatically do that (with a relatively static result).

Of course, this was all tested on Solaris-10, so your mileage may vary.

Regards,

Marion
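For anyone unfamiliar with the policies being compared, here is a simplified Python model of how each one might pick a path. It is a sketch, not the scsi_vhci driver's code: the selector functions are invented for illustration, and the region size of 2^18 blocks is assumed as the logical-block default. In this model round-robin changes path on every I/O, logical-block keeps a whole LBA region on one path, and the manual split pins each disk to a single path.

```python
from itertools import count

def round_robin(n_paths):
    """Selector that rotates through the paths on every I/O, ignoring the LBA."""
    counter = count()
    def select(lba):
        return next(counter) % n_paths
    return select

def logical_block(n_paths, region_size=18):
    """Selector that keeps every LBA in the same 2**region_size-block region
    on the same path (region_size=18 is an assumed default)."""
    def select(lba):
        return (lba >> region_size) % n_paths
    return select

def static_split(disk_index, n_paths):
    """Manual balancing as described above: pin each whole disk to one path."""
    return disk_index % n_paths

def path_switches(selector, lbas):
    """Count how often consecutive I/Os land on different paths."""
    paths = [selector(lba) for lba in lbas]
    return sum(a != b for a, b in zip(paths, paths[1:]))

# A sequential stream: 4096 I/Os of 256 blocks each, covering 2**20 blocks.
sequential = range(0, 1 << 20, 256)

if __name__ == "__main__":
    print("round-robin switches:  ", path_switches(round_robin(2), sequential))
    print("logical-block switches:", path_switches(logical_block(2), sequential))
    print("static split, 8 disks: ", [static_split(d, 2) for d in range(8)])
```

In this toy model a sequential stream stays on one path for long stretches under logical-block, which is one plausible reason some drives behave better with it, while round-robin spreads small random I/O evenly regardless of address.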
Re: [zfs-discuss] Seagate Constellation vs. Hitachi Ultrastar
Are these issues something to watch out for on Solaris 11 as well?

Thx in advance…

-Anh

On Apr 9, 2012, at 12:43 PM, Marion Hakanson wrote:
> richard.ell...@richardelling.com said:
>> We are starting to see a number of SAS HDDs that prefer logical-block to round-robin. I see this with late model Seagate and Toshiba HDDs.
>>
>> There is another, similar issue with recognition of multipathing by the scsi_vhci driver. Both of these are being tracked as https://www.illumos.org/issues/644 and there is an alternate scsi_vhci.conf file posted in that bugid.
>
> Interesting, I just last week had a Toshiba come from Dell as a replacement for a Seagate 2TB SAS drive; on Solaris-10, the Toshiba insisted on showing up as 2 drives, so mpxio was not recognizing it. Fortunately I was able to swap the drive for a Seagate, but I'll stash away a copy of the scsi_vhci.conf entry for the future.
>
>> We're considering making logical-block the default (as in above bugid) and we have not discovered a reason to keep round-robin. If you know of any reason why round-robin is useful, please add to the bugid.
>
> Should be fine. When I first ran into this a couple years ago, I did a lot of tests and found logical-block to be slower than none (with those Seagate 2TB SAS drives in Dell MD1200's), but not a whole lot slower. I vaguely recall that round-robin was better for highly random, small I/O (IOPS-intensive) workloads.
>
> I got the best results by manually load-balancing half the drives to one path and half the drives to the other path. But I decided it was not worth the effort. Maybe if there was a way to automatically do that (with a relatively static result).
>
> Of course, this was all tested on Solaris-10, so your mileage may vary.
>
> Regards,
>
> Marion