Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?
On 02/15/16 16:02, Karel Gardas wrote:
>> ..And therefore you need enterprise disks because they behave "cleanly", as
>> when using those only, essentially full softraid QoS is maintained at all
>> times.
>
> Interesting! I understood Nick's excellent email in a completely
> reversed sense. I understood it as "use consumer drives which fail
> really slowly and with degraded performance, which will give you a
> chance to notice it at all. With enterprise, your drives may fail too
> quickly, so there is a danger of a drive failing in an array which is
> just rebuilding after another drive failure a few hours ago".
>

And that's the way I meant it...

I've had maybe five drives do the "slow-fail" thing. Maybe. In 34
years, including selling and supporting thousands of computers at a very
successful store, working for a few very large companies, and working
with a lot of tiny companies. I'd file that under "it happens, don't
wait up, and certainly don't design around it".

In contrast, the number of "fast failures" I've seen on "Enterprise
grade" stuff is ... stunning. And, I think I've seen evidence of one
"event" taking multiple drives off-line at once, with predictable
results to the array. Fix? Remove and re-insert drive, and rebuild,
since there is really nothing wrong with the disk 80-90% of the time.
Oh, guess you need a hot-swap enclosure, then.

My experience can be summed up as: Simple systems have simple problems.
"Enterprise Grade" stuff that is never supposed to break or go
down...will (due to complexity) and will stay that way for amazing
periods of time (due to your lack of preparation, because you don't
believe it will happen). And when it comes to disk systems, IF
"enterprise grade" *disks* are any better (and I don't believe it), when
combined with enterprise grade enclosures and enterprise grade disk
controllers and firmware and fancy drivers...no question in my mind,
consumer grade SATA disks on dull interfaces win, hands down.

Remember, it isn't WHY you lost data that matters (be it hardware,
software or human error), just that you did.

(A common failure part in "enterprise grade" servers is the disk
backplane board. There's almost no active electronics on it, but they
fail often. They don't exist on a desktop PC. I suspect the vibration
of drives cracks the solder joints.)

My recommendation:
1) Plan for things to break.
2) Plan for ANYTHING to break.
3) Have an in-house way of dealing with whatever breaks.
4) Don't rely on others. It's not their business that is down.
5) The people you paid to bail you out of 1 & 2 so you don't have to
worry about 3 and 4 WILL let you down and will not live up to their
promises, and when you read the fine print, you will realize there isn't
a damn thing you can do about it, 'cept pay them again when the contract
comes up.

And after you do that, you will realize that obsessing over "enterprise
grade" parts is not part of the design.

NOTE WELL: That's my opinion based on *my* experience (including what
was almost a "controlled experiment" along those lines). Every
manufacturer out there says I'm wrong. Most of my coworkers say I'm
wrong. Every new technology (like SSDs) gives another opportunity to
"change everything" (and the results always seem to be the same, but
maybe THIS time will be different). If you follow my advice and things
blow up, you will look like an idiot, and I really don't want to hear
about it. If you follow the mainstream mindset, you can always say,
"That's what (almost) everyone said is the right way, not my fault!".

Blindly following the opinions of some crackpot on the internet may be
foolish. Blindly following the opinions of people who profit from what
they advise you will be expensive.

Nick.
Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?
Tue, 16 Feb 2016 10:57:38 -0800 Chris Cappuccio
> li...@wrant.com [li...@wrant.com] wrote:
> >
> > Plan for your use case, and consult the man page and respective source
> > code on implementation details. And flash storage disks are still
> > unreliable compared to spinning hard drives.
>
> Although I was a longtime proponent of read-only flash use, I've found
> the Samsung 845DC Pro and Samsung SM863 to be very durable in heavy
> write environments (heavily written-to monitoring database, mail
> server).

Thank you for the tip, I'll consider these in the future too. I've
found the Intel 35xx/37xx series to be the other option among the better
flash drives currently on the market. Yet, it's still not the same
class of reliability.

This is not related to OpenBSD, but my hard disks from the past 20+
years are still able to store and retrieve data, after their long and
useful production life. I cannot validate this for any flash or memory
based storage device.

In present understanding, data retention decay is still present in flash
devices, which cannot match spinning hard disks, and we all know that's
not going to change without improvement in battery ageing and the type
of cells used in the flash drives.

I insist on recommending that storage devices of any type be paired in
soft-RAID, without mixing device types in the same array, and I advise
the reliable parts despite hating the enterprise server tax for personal
use. Add to this advanced engineering knowledge based on technical
specifications and hardware documentation, to complement the incredibly
useful OpenBSD man pages and source code.

For kids: don't forget to make a copy of your important files.
Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?
li...@wrant.com [li...@wrant.com] wrote:
>
> Plan for your use case, and consult the man page and respective source
> code on implementation details. And flash storage disks are still
> unreliable compared to spinning hard drives.

Although I was a longtime proponent of read-only flash use, I've found
the Samsung 845DC Pro and Samsung SM863 to be very durable in heavy
write environments (heavily written-to monitoring database, mail
server).
Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?
Mon, 15 Feb 2016 22:03:13 +0100 Karel Gardas
> > ..And therefore you need enterprise disks because they behave "cleanly", as
> > when using those only, essentially full softraid QoS is maintained at all
> > times.
>
> Interesting! I understood Nick's excellent email in a completely
> reversed sense.

That does not reverse the advice, however. Read it again carefully, at
double-slow speed ;-)

> I understood it as "use consumer drives which fail
> really slowly and with degraded performance, which will give you a
> chance to notice it at all.

This is not the concept. It is more of an important technological
prerequisite that many people don't know exists in the hardware RAID
world.

> With enterprise, your drives may fail too
> quickly, so there is a danger of a drive failing in an array which is
> just rebuilding after another drive failure a few hours ago".

That's not the takeaway advice. That would be: keep in mind that some
controllers reject a drive which is still operational but does not meet
the controller timeout.

More like: hardware RAID controllers twist your arm into buying
enterprise class disks and replacing them more diligently, before they
actually reach the failed state, based on continuous-usage timing
parameters.

Plan for your use case, and consult the man page and respective source
code on implementation details. And flash storage disks are still
unreliable compared to spinning hard drives.
Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?
> ..And therefore you need enterprise disks because they behave "cleanly", as
> when using those only, essentially full softraid QoS is maintained at all
> times.

Interesting! I understood Nick's excellent email in a completely
reversed sense. I understood it as "use consumer drives which fail
really slowly and with degraded performance, which will give you a
chance to notice it at all. With enterprise, your drives may fail too
quickly, so there is a danger of a drive failing in an array which is
just rebuilding after another drive failure a few hours ago".
Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?
Constantine,

Just basically a follow-up to say that I agree with you.

On 2016-02-15 17:41, Constantine A. Murenin wrote:
> On 13 February 2016 at 08:50, Tinker wrote:
>> Hi,
>>
>> 1)
>> http://www.openbsd.org/papers/asiabsdcon2010_softraid/softraid.pdf page 3
>> "2.2 RAID 1" says that it reads "on a round-robin basis from all active
>> chunks", i.e. read operations are spread evenly across disks.
>
> Yes, that's still the case today:
>
> ..
>
> There are presently no optimisations in-tree, but the softraid
> policies are so simple that it's really easy to hack it up to do
> something else that you may want.

That is awesome.

>> Since then did anyone implement selective reading based on experienced read
>> operation time, or a user-specified device read priority order?
>
> That would make the code less readable! :-)

That is indeed an excellent reason for not adding an additional feature
- couldn't agree with you more. Added complexity is (the root of all)
'evil'.

>> That would allow Softraid RAID1 based on 1 SSD mirror + 1 SSD mirror + 1 HDD
>> mirror, which would give the best combination of IO performance and data
>> security OpenBSD would offer today.
>
> Not sure what'd be the practical point of such a setup. Your writes
> will still be limited by the slowest component, and IOPS specs are
> vastly different between SSDs and HDDs. (And modern SSDs are no
> longer considered nearly as unreliable as they once were.)

Yeah. I'm half-unwillingly starting to agree with that (discussed in
depth with Nick in the previous email).

>> 2)
>> Also if there's a read/write failure (or excessive time consumption for a
>> single operation, say 15 seconds), will Softraid RAID1 learn to take the
>> broken disk out of use?
>
> A failure in a softraid1 chunk will result in the chunk being taken
> offline. (What constitutes a failure is most likely outside of
> softraid's control.)

My best understanding today is that Nick clarified this in the previous
post: softraid doesn't actually have any IO operation timeouts, and IO
lag will not lead to softraid taking a disk out of use. Only a
disconnect, or a specific failure reported by the underlying disk (for
example via SMART), will cause that physical disk to be automatically
taken offline.

..And therefore you need enterprise disks because they behave "cleanly",
as when using those only, essentially full softraid QoS is maintained at
all times.

Best regards,
Tinker
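The behaviour summarised in this message (a chunk goes offline when the disk reports a failure, never because an operation was merely slow) can be illustrated with a small userland sketch in C. This is not softraid code and has not been checked against the driver: `struct chunk`, `struct io_result` and `complete_io` are hypothetical names invented for the illustration, and the rule itself is only the understanding stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chunk states, for illustration only. */
enum chunk_state { CHUNK_ONLINE, CHUNK_OFFLINE };

struct chunk {
	enum chunk_state state;
};

/* Hypothetical result of one completed I/O. */
struct io_result {
	bool error;		/* the device reported a failure */
	unsigned int millis;	/* how long the operation took */
};

/*
 * Update the chunk after an I/O completes. Only a reported error takes
 * the chunk offline; r->millis is deliberately ignored, which models
 * the "no I/O timeout" behaviour described in the thread.
 */
void
complete_io(struct chunk *c, const struct io_result *r)
{
	assert(c != NULL && r != NULL);

	if (r->error)
		c->state = CHUNK_OFFLINE;
}
```

Under this model a read that takes 15 seconds but eventually succeeds leaves the chunk online, which is exactly the "slow-fail" case discussed elsewhere in the thread.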
Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?
Dear Nick,

On 2016-02-15 05:29, Nick Holland wrote:
> On 02/13/16 11:49, Tinker wrote:
>> Hi,
>>
>> 1)
>> http://www.openbsd.org/papers/asiabsdcon2010_softraid/softraid.pdf page
>> 3 "2.2 RAID 1" says that it reads "on a round-robin basis from all
>> active chunks", i.e. read operations are spread evenly across disks.
>>
>> Since then did anyone implement selective reading based on experienced
>> read operation time, or a user-specified device read priority order?
>>
>> That would allow Softraid RAID1 based on 1 SSD mirror + 1 SSD mirror + 1
>> HDD mirror, which would give the best combination of IO performance and
>> data security OpenBSD would offer today.
>
> I keep flip-flopping on the merits of this.
> At one point, I was with you, thinking, "great idea! Back an expensive,
> fast disk with a cheap disk".
>
> Currently, I'm thinking, "REALLY BAD IDEA". Here's my logic:
>
> There's no such thing as an "expensive disk" anymore. A quick look
> ..
> of "fast" storage to make their very few business apps run better. No
> question in their mind, it was worth it.
>
> Now we do much more with our computers and it costs much less. The
> business value of our investment should be much greater than it was in
> 1982. And ignoring hardware, it is. Companies drop thousands of
> dollars on consulting and assistance and think nothing of it. And in a
> major computer project, a couple $1000 disks barely show as a blip on
> the budget. Hey, I'm all about being a cheap bastard whenever possible,
> but this just isn't a reasonable place to be cheap, so not somewhere I'd
> suggest spending developer resources.
>
> Also ... it's probably a bad idea for functional reasons. You can't
> just assume that "slower" is better than "nothing" -- very often, it's
> indistinguishable from "nothing". In many cases, computer systems that
> perform below a certain speed are basically non-functional, as tasks can
> pile up on them faster than they can produce results. Anyone who has
> dealt with an overloaded database server, mail server or firewall will
> know what I'm saying here -- at a certain load, they go from "running
> ok" to "death spiral", and they do it very quickly.
>
> If you /need/ the speed of an SSD, you can justify the cost of a pair of
> 'em. If you can't justify the cost, you are really working with a
> really unimportant environment, and you can either wait for two cheap
> slow disks or skip the RAID entirely. How fast do you need to get to
> your porn, anyway?

I technically agree with you - What led me to think about SSD+HDD was
the idea of having, on the same mountpoint, a hybrid SSD-HDD storage
where the "important stuff" would automatically be on the SSD and the
"less important" on the HDD.

This symmetry would mean that those two data sets could be stored within
one and the same directory structure, which would be really handy, and
archiving of unused files would be implicit.

I understand that ZFS is quite good at delivering this. LSI MegaRAID
cards are good at that as long as the "important stuff" is forever
<512GB, which is not the case, duh.

This whole idea has a really exotic, unpredictable, "stinking" edge to
it though. Your point that "slower" is generally as bad as "nothing",
combined with the market price situation, makes complete sense - So,
even if kind of unwillingly, I must agree with your reasoning.

> (now ... that being said, part of me would love a tmpfs / disk RAID1,
> one that would come up degraded, and the disk would populate the RAM
> disk, writes would go to both subsystems, reads would come from the RAM
> disk once populated. I could see this for some applications like CVS
> repositories or source directories where things are "read mostly", and
> typically smaller than a practical RAM size these days, and as there are
> still a few orders of magnitude greater performance in a RAM disk than
> an SSD and this will likely remain true for a while, there are SOME
> applications where this could be nice)

Wait.. you mean you would like OpenBSD to implement a read cache that is
"100% caching aggressive" rather than the current "buffer cache", which
has "dynamic caching aggressiveness" - I don't understand how this could
make sense, can you please clarify?

>> 2)
>> Also if there's a read/write failure (or excessive time consumption for
>> a single operation, say 15 seconds), will Softraid RAID1 learn to take
>> the broken disk out of use?
>
> As far as I am aware, Softraid (like most RAID systems, hw or sw) will
> deactivate a drive which reports a failure. Drives which go super slow
> (i.e., always manage to get the data BEFORE the X'th retry at which they
> would toss an error) never report an error back, so the drive is never
> deactivated.
>
> Sound implausible? Nope. It Happens. Frustrating as heck when you
> have this happen to you until you figure it out. In fact, one key
> feature of "enterprise" and "RAID" grade disks is that they hop
> off-line and throw an error fast and early, to prevent this problem
> (some "NAS" grade disks may do this. Or they may just see your credit
Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?
On 13 February 2016 at 08:50, Tinker wrote:
> Hi,
>
> 1)
> http://www.openbsd.org/papers/asiabsdcon2010_softraid/softraid.pdf page 3
> "2.2 RAID 1" says that it reads "on a round-robin basis from all active
> chunks", i.e. read operations are spread evenly across disks.

Yes, that's still the case today:

http://bxr.su/o/sys/dev/softraid_raid1.c#sr_raid1_rw

	rt = 0;
ragain:
	/* interleave reads */
	chunk = sd->mds.mdd_raid1.sr1_counter++ %
	    sd->sd_meta->ssdi.ssd_chunk_no;
	scp = sd->sd_vol.sv_chunks[chunk];
	switch (scp->src_meta.scm_status) {
	[...]
	case BIOC_SDOFFLINE:
	[...]
		if (rt++ < sd->sd_meta->ssdi.ssd_chunk_no)
			goto ragain;

There are presently no optimisations in-tree, but the softraid
policies are so simple that it's really easy to hack it up to do
something else that you may want.

>
> Since then did anyone implement selective reading based on experienced read
> operation time, or a user-specified device read priority order?

That would make the code less readable! :-)

>
>
> That would allow Softraid RAID1 based on 1 SSD mirror + 1 SSD mirror + 1 HDD
> mirror, which would give the best combination of IO performance and data
> security OpenBSD would offer today.

Not sure what'd be the practical point of such a setup. Your writes
will still be limited by the slowest component, and IOPS specs are
vastly different between SSDs and HDDs. (And modern SSDs are no
longer considered nearly as unreliable as they once were.)

>
> 2)
> Also if there's a read/write failure (or excessive time consumption for a
> single operation, say 15 seconds), will Softraid RAID1 learn to take the
> broken disk out of use?

A failure in a softraid1 chunk will result in the chunk being taken
offline. (What constitutes a failure is most likely outside of
softraid's control.)

C.
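The interleaving policy in the kernel excerpt quoted in this message can be tried outside the kernel with a minimal userland sketch in C. It is not the softraid source: `struct mirror`, `pick_read_chunk`, `MAX_CHUNKS` and the two-value `chunk_state` enum are hypothetical stand-ins for the driver's own structures and `BIOC_SD*` status values, and the loop replaces the `goto ragain` retry with a bounded `for`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk states; the real driver uses BIOC_SD* values. */
enum chunk_state { CHUNK_ONLINE, CHUNK_OFFLINE };

#define MAX_CHUNKS 8

struct mirror {
	enum chunk_state state[MAX_CHUNKS];
	unsigned int nchunks;	/* number of chunks in the mirror */
	unsigned int counter;	/* bumped on every read attempt */
};

/*
 * Pick the next chunk to read from, rotating through the chunks and
 * skipping any that are offline. Returns the chunk index, or -1 when
 * no chunk is online.
 */
int
pick_read_chunk(struct mirror *m)
{
	unsigned int tries, chunk;

	assert(m != NULL && m->nchunks > 0 && m->nchunks <= MAX_CHUNKS);

	for (tries = 0; tries < m->nchunks; tries++) {
		/* interleave reads */
		chunk = m->counter++ % m->nchunks;
		if (m->state[chunk] == CHUNK_ONLINE)
			return (int)chunk;
	}
	return -1;
}
```

With three chunks of which the middle one is offline, successive calls return 0, 2, 0, and so on. Every online chunk gets an equal share of reads regardless of how fast it is, which is why a slow HDD in the mirror would drag down reads as well as writes.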
Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?
On 02/13/16 11:49, Tinker wrote:
> Hi,
>
> 1)
> http://www.openbsd.org/papers/asiabsdcon2010_softraid/softraid.pdf page
> 3 "2.2 RAID 1" says that it reads "on a round-robin basis from all
> active chunks", i.e. read operations are spread evenly across disks.
>
> Since then did anyone implement selective reading based on experienced
> read operation time, or a user-specified device read priority order?
>
>
> That would allow Softraid RAID1 based on 1 SSD mirror + 1 SSD mirror + 1
> HDD mirror, which would give the best combination of IO performance and
> data security OpenBSD would offer today.

I keep flip-flopping on the merits of this.
At one point, I was with you, thinking, "great idea! Back an expensive,
fast disk with a cheap disk".

Currently, I'm thinking, "REALLY BAD IDEA". Here's my logic:

There's no such thing as an "expensive disk" anymore. A quick look
shows me that I can WALK INTO my local computer store and pick up a 2TB
SSD for under $1000US. Now, that looks like a lot of money, and as a
life-long cheapskate, when I get to four digits, I'm expecting at least
two wheels and an engine. But in the Big Picture? No. That's one heck
of a lot of stunningly fast storage for a reasonable chunk of change.

Thirty-four years ago when I started in this business, I was installing
10MB disks for $2000/ea as fast as we could get the parts (and at that
time, you could get a darned nice car for five of those drives, and a
new Corvette cost less than ten of them). Now sure, the price has
dropped a whole lot since then, and my first reaction would be "What
does that have to do with anything? I can buy 2TB disks for under $100,
that's a huge savings over the SSD!" In raw dollars, sure. Percentage?
Sure. In "value to business"? I don't think so. In 1982, people felt
the computers of the day were worth adding $2000 to to get a tiny amount
of "fast" storage to make their very few business apps run better. No
question in their mind, it was worth it.

Now we do much more with our computers and it costs much less. The
business value of our investment should be much greater than it was in
1982. And ignoring hardware, it is. Companies drop thousands of
dollars on consulting and assistance and think nothing of it. And in a
major computer project, a couple $1000 disks barely show as a blip on
the budget. Hey, I'm all about being a cheap bastard whenever possible,
but this just isn't a reasonable place to be cheap, so not somewhere I'd
suggest spending developer resources.

Also ... it's probably a bad idea for functional reasons. You can't
just assume that "slower" is better than "nothing" -- very often, it's
indistinguishable from "nothing". In many cases, computer systems that
perform below a certain speed are basically non-functional, as tasks can
pile up on them faster than they can produce results. Anyone who has
dealt with an overloaded database server, mail server or firewall will
know what I'm saying here -- at a certain load, they go from "running
ok" to "death spiral", and they do it very quickly.

If you /need/ the speed of an SSD, you can justify the cost of a pair of
'em. If you can't justify the cost, you are really working with a
really unimportant environment, and you can either wait for two cheap
slow disks or skip the RAID entirely. How fast do you need to get to
your porn, anyway?

(now ... that being said, part of me would love a tmpfs / disk RAID1,
one that would come up degraded, and the disk would populate the RAM
disk, writes would go to both subsystems, reads would come from the RAM
disk once populated. I could see this for some applications like CVS
repositories or source directories where things are "read mostly", and
typically smaller than a practical RAM size these days, and as there are
still a few orders of magnitude greater performance in a RAM disk than
an SSD and this will likely remain true for a while, there are SOME
applications where this could be nice)

> 2)
> Also if there's a read/write failure (or excessive time consumption for
> a single operation, say 15 seconds), will Softraid RAID1 learn to take
> the broken disk out of use?

As far as I am aware, Softraid (like most RAID systems, hw or sw) will
deactivate a drive which reports a failure. Drives which go super slow
(i.e., always manage to get the data BEFORE the X'th retry at which they
would toss an error) never report an error back, so the drive is never
deactivated.

Sound implausible? Nope. It Happens. Frustrating as heck when you
have this happen to you until you figure it out. In fact, one key
feature of "enterprise" and "RAID" grade disks is that they hop
off-line and throw an error fast and early, to prevent this problem
(some "NAS" grade disks may do this. Or they may just see your credit
limit hasn't been reached).

However, having done this for a looong time, and seen the problems from
both rapid-failure and "try and try" disks, I'll take the "try and try"
problem any
Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?
Hi,

1)
http://www.openbsd.org/papers/asiabsdcon2010_softraid/softraid.pdf page
3 "2.2 RAID 1" says that it reads "on a round-robin basis from all
active chunks", i.e. read operations are spread evenly across disks.

Since then did anyone implement selective reading based on experienced
read operation time, or a user-specified device read priority order?

That would allow Softraid RAID1 based on 1 SSD mirror + 1 SSD mirror + 1
HDD mirror, which would give the best combination of IO performance and
data security OpenBSD would offer today.

2)
Also if there's a read/write failure (or excessive time consumption for
a single operation, say 15 seconds), will Softraid RAID1 learn to take
the broken disk out of use?

Thanks,
Tinker
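To make the "user-specified device read priority order" in question 1 concrete, here is a minimal userland sketch in C of what such a policy could look like. It is purely hypothetical: the replies in this thread say no such feature exists in the tree, and `pick_by_priority`, the `prio` array and the `chunk_state` enum are names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk states, for illustration only. */
enum chunk_state { CHUNK_ONLINE, CHUNK_OFFLINE };

/*
 * Return the first online chunk in the caller's preference order, or
 * -1 if every listed chunk is offline. prio[] holds chunk indices,
 * most preferred first (for example, the SSDs before the HDD).
 */
int
pick_by_priority(const enum chunk_state *state, const unsigned int *prio,
    size_t nprio)
{
	size_t i;

	assert(state != NULL && prio != NULL);

	for (i = 0; i < nprio; i++) {
		if (state[prio[i]] == CHUNK_ONLINE)
			return (int)prio[i];
	}
	return -1;
}
```

Such a policy would only change where reads go. As pointed out in the replies, writes still have to reach every mirror, so they remain limited by the slowest device.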