[zfs-discuss] Reliability WAS: LSI 3081 (1068) + expander + (bad) SATA disk?

2012-04-09 Thread Paul Kraus
On Sun, Apr 8, 2012 at 7:40 AM, Jim Klimov jimkli...@cos.ru wrote:
 2012-04-08 6:06, Richard Elling wrote:

 You can't get past the age-old idiom: you get what you pay for.


 True... but it can be somewhat countered with DrHouse-age idiom:
 people lie, even if they don't mean to ;)

I think both aspects oversimplify the issue, but that is
generally true of idioms :-)

I see Moore's Law in effect in a bunch of ways here, not all of them good.

The available _capacity_ for a given _price_ at a given
_reliability_ and _performance_ increases with time. Two unintended
side effects of this increase in capacity _at_the_same_reliability_ are
that problems a user was statistically unlikely to run into before
now occur more frequently, and that the time to correct (rebuild)
after a failure is longer.
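
To put rough numbers on that second point, here is a minimal
back-of-the-envelope sketch in Python. The capacities, the 100 MB/s
rebuild rate, and the quoted unrecoverable-read-error (URE) rates are
illustrative assumptions only; plug in your own drive specs:

# Rough sketch: how capacity growth stretches rebuild time and raises
# the odds of hitting an unrecoverable read error during a rebuild.
# All figures below are illustrative assumptions, not measurements.

def rebuild_hours(capacity_tb, rebuild_mb_per_s=100.0):
    # Best case: sequentially reading/writing the whole drive at a
    # sustained rate (1 TB ~= 1e6 MB, decimal, as vendors count it).
    return capacity_tb * 1e6 / rebuild_mb_per_s / 3600.0

def p_ure(capacity_tb, ure_rate):
    # Probability of at least one URE while reading the whole drive
    # once, assuming independent errors at the quoted rate
    # (e.g. 1 per 1e14 bits for desktop SATA, 1 per 1e15 for
    # "enterprise" SATA).
    bits = capacity_tb * 1e12 * 8
    return 1.0 - (1.0 - ure_rate) ** bits

for tb in (0.5, 1.0, 2.0, 3.0):
    print("%.1f TB: ~%.1f h rebuild at 100 MB/s, "
          "P(URE) ~%.2f at 1e-14, ~%.3f at 1e-15"
          % (tb, rebuild_hours(tb), p_ure(tb, 1e-14), p_ure(tb, 1e-15)))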

It used to be that we looked at cost vs. capacity vs. performance;
the old adage was pick two of the three. Today we need to add a
fourth parameter, reliability. I'm not sure the relationship between
the four is a linear one :-)

So if you have x dollars (or euros or yen or ...) you need to
decide what you need and how much capacity / performance / reliability
you can get for that x. For my home server, where I am very cost
conscious, I went with Seagate SATA drives, but I went for the ES.2
series, which _is_ rated for 7x24 operation and has a (slightly)
better reliability spec than the lower cost drives I compared it
against. In the past, my need for more capacity is what drove
replacement of hard drives. This time I'm not sure whether I'll run out
of capacity before the drives reach the end of their practical service
life and start failing.

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, RPI Players


Re: [zfs-discuss] What's wrong with LSI 3081 (1068) + expander + (bad) SATA disk?

2012-04-09 Thread Jeff Bacon
 Out of curiosity, are there any third-party hardware vendors
 that make server/storage chassis (Supermicro et al) who make
 SATA backplanes with the SAS interposers soldered on?

There doesn't seem to be much out there, though I haven't looked. 

 Would that make sense, or be cheaper/more reliable than
 having extra junk between the disk and backplane connectors?
 (if I correctly understand what the talk is about? ;)

Honestly, probably not. Either the expander chip properly handles
SATA tunneling (STP), or it doesn't. Shoving an interposer in just
throws a band-aid on the problem, IMO.

 ZFS was very attractive at first because of the claim that
 it puts the Inexpensive back into RAID and can do miracles
 with SATA disks. Reality has shown many of us that
 many SATA implementations existing in the wild should
 be avoided... so we're back to good vendors' higher end
 expensive SATA drives, or better yet SAS drives. Not inexpensive
 anymore :(

Many != all. Not that I've tried a whole bunch of them, mind
you. However, I've found all of the SuperMicro SAS1 backplanes
to be somewhat problematic with SATA drives, especially if
you use the 1068-based controllers. It was horrible with the
really old single-digit-Phase firmware. I find it...
acceptable... with the 2008-based controllers.

I've finally settled on having one box based on 1068s (3081s),
and I think it's up to 4 or 5 expanders of 16 1TB drives each.
Basically, the box hangs when certain drives die in certain
ways - it eventually gets over it, mostly, but it can hang
for a bit. I might see the occasional hang until I yank
the bad disk, but drives don't die THAT often - even
3-year-old Seagate 'Cudas. Granted, almost all the firmware on
the 333ASes has finally been updated to the CC1H version.

I might note that that box represents most of the collection
of 1068+SAS1-based expanders that I have. It's an archival
system that doesn't do much at all (well, the CPU pounds
like hell running rtgpoll but that's a different matter
having nothing to do with the ZFS pools). I also have a
small pile of leftover 3081s and 3041s if anyone's
interested. :)

Now, I suspect that there is improved LSI firmware available
for the SAS1 expander chips. I could go chasing after it -
SMC doesn't make it public, but LSI probably has it somewhere,
and I know an expert I could ask to go through and tweak my
controllers. However, I hadn't met him 3 years ago, and
now it just isn't worth my time (or worth paying him to do it).

On the other hand, I have two hands' worth of CSE847-E26-RJBOD1s
stuffed with 'Cuda 2TB and 3TB SATA drives, connected to 9211-8es
running the Phase 10 firmware. One box is up to 170TB worth.
It's fine. Nary an issue. Granted, I'm not beating the arrays
to death - again, that's not what it's for; it's there to hang
onto a bunch of data. But it does get used, and I'm writing
200-300GB/day to it. I have another such JBOD attached to
a box with a pile of Constellations, and it causes no issues.

Frankly, I would say that yes, ZFS _does_ do miracles with
inexpensive disks. I can trust 100s of TB to it and not
worry about losing any of it. But it wasn't written by Jesus;
it can't turn water into wine, or turn all of the terrible
variations of Crap out there into enterprise-level replacements
for your EMC arrays. Nor can it cope with having Any Old
Random Crap you have lying around thrown at it - but it does
surprisingly well, IMO. My home box is actually just a bunch
of random drives tied onto 3 3041 controllers on an old
overclocked Q6600 on an ASUS board, and it's never had
a problem, not ever.


 So, is there really a fundamental requirement to avoid
 cheap hardware, and are there no good ways to work around
 its inherently higher instability and lack of dependability?
 
 Or is it just a harder goal (indefinitely far away on the
 project roadmap)?

ZFS is no replacement for doing your research, homework,
and testing. God only knows I've gone through some crap -
I had a 20-pack of WD 2TB Blacks that I bought that turned out
not to work worth $*%$. I suppose with enough patience and
effort I could have made them work, but Seagate's firmware
has simply been more reliable, and the contents of that
box have filtered their way into desktops. (Some of which
are in that home machine mentioned above - attached directly
to the controller, no problems at all.)

If you're going to do BYO for your enterprise needs, be
prepared to fork over the additional cash you are going to
need to spend on test kit - defined both as kit to test
on, and kit you buy, test, and pitch because it doesn't
work like the vendor said, or A doesn't play nice with B.
That is sometimes much cheaper than fighting with
Vendor A and Vendor B about why they can't work together.
Not to mention the R&D time.

I don't avoid cheap hardware. Any number of my colleagues
would say I am insane for running what they think should
be run on EMC or NetApp on a handful of Solaris 10
fileservers built off raw SuperMicro boxes. But we 

[zfs-discuss] J4400 question (not directly ZFS)

2012-04-09 Thread Paul Kraus
Sorry for the off topic post, but I figure there is experience
here. I have a total of ten J4400 chassis all loaded with SATA drives.
Has anyone noticed a tendency for drives in specific slots to fail
more often than others? I have seen more drive failures in slot 20
than any other. I am wondering if there is something about slot 20
that may be causing drives to fail.

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, Troy Civic Theatre Company
- Technical Advisor, RPI Players


Re: [zfs-discuss] J4400 question (not directly ZFS)

2012-04-09 Thread Richard Elling
On Apr 9, 2012, at 7:10 AM, Paul Kraus wrote:

Sorry for the off topic post, but I figure there is experience
 here. I have a total of ten J4400 chassis all loaded with SATA drives.
 Has anyone noticed a tendency for drives in specific slots to fail
 more often than others? I have seen more drive failures in slot 20
 than any other. I am wondering if there is something about slot 20
 that may be causing drives to fail.

It depends on the failure mode. 

From an electrical perspective, not all slots are equal. Electrical issues can
manifest themselves as noise or a higher-than-normal BER. This can often be
seen in the link state counters, but you need to check all link ends: HBA,
expanders, and disks. Symptoms can include timeouts, retries, intermittent
data corruption, etc.
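
For a quick host-side look at that kind of transport noise on
Solaris/illumos, a rough sketch like the one below can help. It only
summarizes the sd driver's per-device error kstats (it assumes the
usual "sderr" kstat module is present); it does not read the PHY
link-error counters on the HBA or expander ends, which still need to
be checked with the HBA and expander tools:

#!/usr/bin/env python
# Rough sketch: summarize per-disk soft/hard/transport error counters
# from the Solaris "sderr" kstats, to spot a device or slot that is
# noisier than its neighbors. Not a substitute for checking the PHY
# counters on the HBA and expander ends of each link.
import subprocess
from collections import defaultdict

out = subprocess.check_output(["kstat", "-p", "sderr"]).decode()

errors = defaultdict(dict)
for line in out.splitlines():
    # kstat -p lines look like: sderr:0:sd0,err:Transport Errors<TAB>3
    name_part, _, value = line.partition("\t")
    fields = name_part.split(":")
    if len(fields) != 4:
        continue
    dev, stat = fields[2], fields[3]
    if stat in ("Soft Errors", "Hard Errors", "Transport Errors"):
        errors[dev][stat] = int(value)

for dev in sorted(errors):
    e = errors[dev]
    print("%-12s soft=%-6s hard=%-6s transport=%s"
          % (dev, e.get("Soft Errors", 0), e.get("Hard Errors", 0),
             e.get("Transport Errors", 0)))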

From a mechanical perspective, not all slots are equal. Different slots have
different vibration characteristics. However, these rarely result in failures, and
there does not appear to be a correlation between vibration and disk failures.
 -- richard


--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422








Re: [zfs-discuss] Seagate Constellation vs. Hitachi Ultrastar

2012-04-09 Thread Marion Hakanson
richard.ell...@richardelling.com said:
 We are starting to see a number of SAS HDDs that prefer logical-block to
 round-robin. I see this with late model Seagate and Toshiba HDDs.
 
 There is another, similar issue with recognition of multipathing by the
 scsi_vhci driver. Both of these are being tracked as
 https://www.illumos.org/issues/644, and there is an alternate scsi_vhci.conf
 file posted in that bugid.

Interesting, I just last week had a Toshiba come from Dell as a replacement
for a Seagate 2TB SAS drive; on Solaris 10, the Toshiba insisted on showing
up as two drives, so mpxio was not recognizing it.  Fortunately I was able to
swap the drive for a Seagate, but I'll stash away a copy of the scsi_vhci.conf
entry for the future.
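
For anyone else filing this away, the global knob lives in
/kernel/drv/scsi_vhci.conf. The lines below are just a sketch of the
general form, not the alternate file from the bugid; take per-device
overrides and exact syntax from illumos issue 644 and the
scsi_vhci.conf man page:

# /kernel/drv/scsi_vhci.conf (excerpt) -- general form only, not the
# bugid's file. load-balance can be "round-robin" (the old default),
# "logical-block", or "none".
load-balance="logical-block";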


 We're considering making logical-block the default (as in above bugid) and we
 have not discovered a reason to keep round-robin. If you know of any reason
 why round-robin is useful, please add to the bugid. 

Should be fine.  When I first ran into this a couple of years ago, I did a
lot of tests and found logical-block to be slower than "none" (with those
Seagate 2TB SAS drives in Dell MD1200's), but not a whole lot slower.
I vaguely recall that round-robin was better for highly random, small-I/O
(IOPS-intensive) workloads.

I got the best results by manually load-balancing half the drives to one
path and half the drives to the other path, but I decided it was not
worth the effort.  Maybe if there were a way to do that automatically
(with a relatively static result)...  Of course, this was all tested
on Solaris 10, so your mileage may vary.

Regards,

Marion




Re: [zfs-discuss] Seagate Constellation vs. Hitachi Ultrastar

2012-04-09 Thread Anh Quach
Are these issues something to watch out for on Solaris 11 as well? Thx in 
advance…

-Anh



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss