Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?

2016-02-16 Thread Nick Holland
On 02/15/16 16:02, Karel Gardas wrote:
>> ..And therefore you need enterprise disks because they behave "cleanly", as
>> when using those only, essentially full softraid QoS is maintained at all
>> times.
>
> Interesting! I understood Nick's excellent email in completely the
> reverse sense. I read it as: "use consumer drives, which fail really
> slowly and with degraded performance, which at least gives you a
> chance to notice it. With enterprise drives, your drives may fail too
> quickly, so there is a danger of a failing drive in an array which is
> just rebuilding after another drive failure a few hours ago".
>

And that's the way I meant it...

I've had maybe five drives do the "slow-fail" thing.  Maybe.  In 34
years, including selling and supporting thousands of computers at a very
successful store, working for a few very large companies, and working
with a lot of tiny companies.  I'd file that under "it happens, don't
wait up, and certainly don't design around it".

In contrast, the number of "fast failures" I've seen on "Enterprise
grade" stuff is ... stunning.  And, I think I've seen evidence of one
"event" taking multiple drives off-line at once, with predictable
results to the array.  Fix?  Remove and re-insert drive, and rebuild,
since there is really nothing wrong with the disk 80-90% of the time.
Oh, guess you need a hot-swap enclosure, then.

My experience can be summed up as: Simple systems have simple problems.
 "Enterprise Grade" stuff that is never supposed to break or go
down...will (due to complexity) and will stay that way for amazing
periods of time (due to your lack of preparation, because you don't
believe it will happen).

And when it comes to disk systems, even IF "enterprise grade" *disks* are
any better (and I don't believe it), once they are combined with
enterprise grade enclosures and enterprise grade disk controllers and
firmware and fancy drivers...no question in my mind, consumer grade SATA
disks on dull interfaces win, hands down.  Remember, it isn't WHY you
lost data that matters (be it hardware, software or human error), just
that you did.  (A common failure part in "enterprise grade" servers is
the disk backplane board.  There's almost no active electronics on it,
but they fail often.  They don't exist on a desktop PC.  I suspect the
vibration of the drives cracks the solder joints.)
My recommendation:
1) Plan for things to break.
2) Plan for ANYTHING to break.
3) Have an in-house way of dealing with whatever breaks.
4) Don't rely on others.  It's not their business that is down.
5) The people you paid to bail you out of 1 & 2 so you don't have to
worry about 3 and 4 WILL let you down and will not live up to their
promises, and when you read the fine print, you will realize there isn't
a damn thing you can do about it, 'cept pay them again when the contract
comes up.

And after you do that, you will realize that obsessing over "enterprise
grade" parts is not part of the design.


NOTE WELL: That's my opinion based on *my* experience (including what
was almost a "controlled experiment" along those lines).  Every
manufacturer out there says I'm wrong.  Most of my coworkers say I'm
wrong.  Every new technology (like SSDs) gives another opportunity to
"change everything" (and the results always seem to be the same, but
maybe THIS time will be different).  If you follow my advice and things
blow up, you will look like an idiot, and I really don't want to hear
about it.  If you follow the mainstream mindset, you can always say,
"That's what (almost) everyone said is the right way, not my fault!".
Blindly following the opinions of some crackpot on the internet may be
foolish.  Blindly following the opinions of people who profit from what
they advise you will be expensive.

Nick.



Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?

2016-02-16 Thread lists
Tue, 16 Feb 2016 10:57:38 -0800 Chris Cappuccio 
> li...@wrant.com [li...@wrant.com] wrote:
> > 
> > Plan for your use case, and consult the man page and respective source
> > code on implementation details.  And flash storage disks are still
> > unreliable compared to spinning hard drives.  
> 
> Although I was a long proponent of read-only flash use, I've found the
> Samsung 845DC Pro and Samsung SM863 to be very durable in heavy write
> environments (heavily written-to monitoring database, mail server).

Thank you for the tip, I'll consider these in the future too.  I've
found the Intel 35xx/37xx series to be the other option among the
better flash drives currently on the market.

Yet, it's still not the same class of reliability.  This is not related
to OpenBSD, but my hard disks from 20+ years ago are still able to store
and retrieve data after their long and useful production lives.  I can
not validate this for any flash or memory based storage device.

As presently understood, data retention decay is still an issue in flash
devices and cannot match spinning hard disks, and we all know that's not
going to change without improvements in cell ageing and in the type of
cells used in the flash drives.

I still insist on recommending pairing any type of storage device in
soft-RAID, and on not mixing device types in the same array, and I
advise buying the reliable parts even though I hate the enterprise
server tax for personal use.

Add to that sound engineering judgement based on technical
specifications and hardware documentation, to complement the incredibly
useful OpenBSD man pages and source code.  For the kids: don't forget
to make a copy of your important files.



Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?

2016-02-16 Thread Chris Cappuccio
li...@wrant.com [li...@wrant.com] wrote:
> 
> Plan for your use case, and consult the man page and respective source
> code on implementation details.  And flash storage disks are still
> unreliable compared to spinning hard drives.

Although I was a long proponent of read-only flash use, I've found the
Samsung 845DC Pro and Samsung SM863 to be very durable in heavy write
environments (heavily written-to monitoring database, mail server).



Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?

2016-02-16 Thread lists
Mon, 15 Feb 2016 22:03:13 +0100 Karel Gardas 
> > ..And therefore you need enterprise disks because they behave "cleanly", as
> > when using those only, essentially full softraid QoS is maintained at all
> > times.  
> 
> Interesting! I understood Nick's excellent email in completely the
> reverse sense.

That does not reverse the advice, however.  Read it again, slowly and
carefully ;-)

> I read it as: "use consumer drives, which fail really
> slowly and with degraded performance, which at least gives you a
> chance to notice it.

That is not quite the concept.  It is more about an important
technological prerequisite that many people don't know exists in the
hardware RAID world.

> With enterprise drives, your drives may fail too
> quickly, so there is a danger of a failing drive in an array which is
> just rebuilding after another drive failure a few hours ago".

That's not the takeaway advice.  That would be: keep in mind that some
controllers reject a drive which is still operational but does not meet
the controller's timeout.  More like: hardware RAID controllers twist
your arm into buying enterprise class disks and into replacing them more
diligently, before they actually reach a failed state, based on
continuous usage timing parameters.

Plan for your use case, and consult the man page and respective source
code on implementation details.  And flash storage disks are still
unreliable compared to spinning hard drives.



Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?

2016-02-15 Thread Karel Gardas
> ..And therefore you need enterprise disks because they behave "cleanly", as
> when using those only, essentially full softraid QoS is maintained at all
> times.

Interesting! I understood Nick's excellent email in completely the
reverse sense. I read it as: "use consumer drives, which fail really
slowly and with degraded performance, which at least gives you a
chance to notice it. With enterprise drives, your drives may fail too
quickly, so there is a danger of a failing drive in an array which is
just rebuilding after another drive failure a few hours ago".



Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?

2016-02-15 Thread Tinker

Constantine,

Just basically followup to say that I agree with you.

On 2016-02-15 17:41, Constantine A. Murenin wrote:

> On 13 February 2016 at 08:50, Tinker  wrote:
>> Hi,
>>
>> 1)
>> http://www.openbsd.org/papers/asiabsdcon2010_softraid/softraid.pdf page 3
>> "2.2 RAID 1" says that it reads "on a round-robin basis from all active
>> chunks", i.e. read operations are spread evenly across disks.
>
> Yes, that's still the case today:
>
> ..
>
> There are presently no optimisations in-tree, but the softraid
> policies are so simple that it's really easy to hack it up to do
> something else that you may want.

That is awesome.

>> Since then did anyone implement selective reading based on experienced read
>> operation time, or a user-specified device read priority order?
>
> That would make the code less readable!  :-)

That is indeed an excellent reason for not adding an additional feature
- couldn't agree with you more.

Added complexity is (the root of all) 'evil'.

>> That would allow Softraid RAID1 based on 1 SSD mirror + 1 SSD mirror + 1 HDD
>> mirror, which would give the best combination of IO performance and data
>> security OpenBSD would offer today.
>
> Not sure what'd be the practical point of such a setup.  Your writes
> will still be limited by the slowest component, and IOPS specs are
> vastly different between SSDs and HDDs.  (And modern SSDs are no
> longer considered nearly as unreliable as they once were.)

Yeah. I'm half-unwillingly starting to agree with that (discussed in
depth with Nick in the previous email).

>> 2)
>> Also if there's a read/write failure (or excessive time consumption for a
>> single operation, say 15 seconds), will Softraid RAID1 learn to take the
>> broken disk out of use?
>
> A failure in a softraid1 chunk will result in the chunk being taken
> offline.  (What constitutes a failure is most likely outside of
> softraid's control.)

My best understanding today is that Nick clarified this in the previous
post, that is, he clarified that softraid doesn't actually have any IO
operation timeouts, and IO lag will not lead to softraid plugging out a
disk - only a disconnect or specific disk failure SMART command from the
underlying disk will have that effect on softraid (of causing that
respective physical disk to be automatically disconnected).

..And therefore you need enterprise disks because they behave "cleanly",
as when using those only, essentially full softraid QoS is maintained at
all times.


Best regards,
Tinker



Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?

2016-02-15 Thread Tinker

Dear Nick,

On 2016-02-15 05:29, Nick Holland wrote:

> On 02/13/16 11:49, Tinker wrote:
>> Hi,
>>
>> 1)
>> http://www.openbsd.org/papers/asiabsdcon2010_softraid/softraid.pdf page
>> 3 "2.2 RAID 1" says that it reads "on a round-robin basis from all
>> active chunks", i.e. read operations are spread evenly across disks.
>>
>> Since then did anyone implement selective reading based on experienced
>> read operation time, or a user-specified device read priority order?
>>
>> That would allow Softraid RAID1 based on 1 SSD mirror + 1 SSD mirror + 1
>> HDD mirror, which would give the best combination of IO performance and
>> data security OpenBSD would offer today.
>
> I keep flip-flopping on the merits of this.
> At one point, I was with you, thinking, "great idea!  Back an expensive,
> fast disk with a cheap disk".
>
> Currently, I'm thinking, "REALLY BAD IDEA".  Here's my logic:
>
> There's no such thing as an "expensive disk" anymore.  A quick look
>
> ..
>
> of "fast" storage to make their very few business apps run better.  No
> question in their mind, it was worth it.  Now we do much more with our
> computers and it costs much less.  The business value of our investment
> should be much greater than it was in 1982.
>
> And ignoring hardware, it is.  Companies drop thousands of dollars on
> consulting and assistance and think nothing of it.  And in a major
> computer project, a couple $1000 disks barely show as a blip on the
> budget.  Hey, I'm all about being a cheap bastard whenever possible, but
> this just isn't a reasonable place to be cheap, so not somewhere I'd
> suggest spending developer resources.
>
> Also ... it's probably a bad idea for functional reasons.  You can't
> just assume that "slower" is better than "nothing" -- very often, it's
> indistinguishable from "nothing".  In many cases, computer systems that
> perform below a certain speed are basically non-functional, as tasks can
> pile up on them faster than they can produce results.  Anyone who has
> dealt with an overloaded database server, mail server or firewall will
> know what I'm saying here -- at a certain load, they go from "running
> ok" to "death spiral", and they do it very quickly.
>
> If you /need/ the speed of an SSD, you can justify the cost of a pair of
> 'em.  If you can't justify the cost, you are really working with a
> really unimportant environment, and you can either wait for two cheap
> slow disks or skip the RAID entirely.
>
> How fast do you need to get to your porn, anyway?


I technically agree with you -

What led me to think about SSD+HDD was the idea of having, on the same
mountpoint, a hybrid SSD-HDD storage where the "important stuff" would be
automatically on the SSD and the "less important" on the HDD.

That symmetry would mean that those two data sets could be stored within
one and the same directory structure, which would be really handy, and
archiving of unused files would be implicit.

I understand that ZFS is quite good at delivering this.  LSI MegaRAID
cards are good at that as long as the "important stuff" stays forever
<512GB, which is not the case, duh.

This whole idea has a really exotic, unpredictable, ""stinking"" edge to
it though.  Your "slower is generally as bad as nothing" analogy,
combined with the market price situation, makes complete sense -


So, even if kind of unwillingly, I must agree with your reasoning.



> (now ... that being said, part of me would love a tmpfs / disk RAID1,
> one that would come up degraded, and the disk would populate the RAM
> disk, writes would go to both subsystems, reads would come from the RAM
> disk once populated.  I could see this for some applications like CVS
> repositories or source directories where things are "read mostly", and
> typically smaller than a practical RAM size these days, and as there are
> still a few orders of magnitude greater performance in a RAM disk than
> an SSD and this will likely remain true for a while, there are SOME
> applications where this could be nice)


Wait.. you mean you would like OpenBSD to implement a read cache that is
"100% caching aggressive" rather than the current "buffer cache", which
has "dynamic caching aggressiveness"?  I don't understand how this could
make sense, can you please clarify?



>> 2)
>> Also if there's a read/write failure (or excessive time consumption for
>> a single operation, say 15 seconds), will Softraid RAID1 learn to take
>> the broken disk out of use?
>
> As far as I am aware, Softraid (like most RAID systems, hw or sw) will
> deactivate a drive which reports a failure.  Drives which go super slow
> (i.e., always manage to get the data BEFORE the X'th retry at which they
> would toss an error) never report an error back, so never deactivate the
> drive.
>
> Sound implausible?  Nope.  It Happens.  Frustrating as heck when you
> have this happen to you until you figure it out.  In fact, one key
> feature of "enterprise" and "RAID" grade disks is that they hop
> off-line and throw an error fast and early, to prevent this problem
> (some "NAS" grade disks may do this.  Or they may just see your credit
> limit hasn't been reached).

Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?

2016-02-15 Thread Constantine A. Murenin
On 13 February 2016 at 08:50, Tinker  wrote:
> Hi,
>
> 1)
> http://www.openbsd.org/papers/asiabsdcon2010_softraid/softraid.pdf page 3
> "2.2 RAID 1" says that it reads "on a round-robin basis from all active
> chunks", i.e. read operations are spread evenly across disks.

Yes, that's still the case today:

http://bxr.su/o/sys/dev/softraid_raid1.c#sr_raid1_rw

345         rt = 0;
346 ragain:
347         /* interleave reads */
348         chunk = sd->mds.mdd_raid1.sr1_counter++ %
349             sd->sd_meta->ssdi.ssd_chunk_no;
350         scp = sd->sd_vol.sv_chunks[chunk];
351         switch (scp->src_meta.scm_status) {

356         case BIOC_SDOFFLINE:

359                 if (rt++ < sd->sd_meta->ssdi.ssd_chunk_no)
360                         goto ragain;

There are presently no optimisations in-tree, but the softraid
policies are so simple that it's really easy to hack it up to do
something else that you may want.
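
For illustration only -- not an in-tree patch -- here is a rough
userland sketch of the kind of change meant above: replace the
round-robin pick (the sr1_counter arithmetic shown earlier) with a
fixed, user-specified read-priority order, falling back to the next
chunk when the preferred one is offline.  All of the names below
(pick_read_chunk, CHUNK_ONLINE, the priority array) are invented for
the sketch and are not softraid(4) identifiers.

/*
 * Hypothetical sketch of a user-specified RAID1 read priority,
 * standing in for the round-robin selection quoted above.
 * Plain userland C; invented names, not softraid code.
 */
#include <stdio.h>

#define CHUNK_ONLINE    0
#define CHUNK_OFFLINE   1

/*
 * Return the first online chunk in priority order, or -1 if every
 * chunk is offline (volume unreadable).
 */
static int
pick_read_chunk(const int *state, const int *priority, int nchunks)
{
        int i, c;

        for (i = 0; i < nchunks; i++) {
                c = priority[i];
                if (state[c] == CHUNK_ONLINE)
                        return c;       /* fastest usable mirror */
        }
        return -1;
}

int
main(void)
{
        /* example: chunk 0 = SSD (offline), 1 = SSD, 2 = HDD */
        int state[3] = { CHUNK_OFFLINE, CHUNK_ONLINE, CHUNK_ONLINE };
        int priority[3] = { 0, 1, 2 };  /* prefer the SSDs, HDD last */

        printf("read from chunk %d\n",
            pick_read_chunk(state, priority, 3));
        return 0;
}

Writes, of course, would still have to go to every online chunk; only
the read path would change.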

>
> Since then did anyone implement selective reading based on experienced read
> operation time, or a user-specified device read priority order?

That would make the code less readable!  :-)

>
>
> That would allow Softraid RAID1 based on 1 SSD mirror + 1 SSD mirror + 1 HDD
> mirror, which would give the best combination of IO performance and data
> security OpenBSD would offer today.

Not sure what'd be the practical point of such a setup.  Your writes
will still be limited by the slowest component, and IOPS specs are
vastly different between SSDs and HDDs.  (And modern SSDs are no
longer considered nearly as unreliable as they once were.)

>
> 2)
> Also if there's a read/write failure (or excessive time consumption for a
> single operation, say 15 seconds), will Softraid RAID1 learn to take the
> broken disk out of use?

A failure in a softraid1 chunk will result in the chunk being taken
offline.  (What constitutes a failure is most likely outside of
softraid's control.)

C.



Re: Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?

2016-02-14 Thread Nick Holland
On 02/13/16 11:49, Tinker wrote:
> Hi,
> 
> 1)
> http://www.openbsd.org/papers/asiabsdcon2010_softraid/softraid.pdf page 
> 3 "2.2 RAID 1" says that it reads "on a round-robin basis from all 
> active chunks", i.e. read operations are spread evenly across disks.
> 
> Since then did anyone implement selective reading based on experienced 
> read operation time, or a user-specified device read priority order?
> 
> 
> That would allow Softraid RAID1 based on 1 SSD mirror + 1 SSD mirror + 1 
> HDD mirror, which would give the best combination of IO performance and 
> data security OpenBSD would offer today.

I keep flip-flopping on the merits of this.
At one point, I was with you, thinking, "great idea!  Back an expensive,
fast disk with a cheap disk".

Currently, I'm thinking, "REALLY BAD IDEA".  Here's my logic:

There's no such thing as an "expensive disk" anymore.  A quick look
shows me that I can WALK INTO my local computer store and pick up a 2TB
SSD for under $1000US.  Now, that looks like a lot of money, and as a
life-long cheapskate, when I get to four digits, I'm expecting at least
two wheels and an engine.  But in the Big Picture?  No.  That's one heck
of a lot of stunningly fast storage for a reasonable chunk of change.

Thirty-four years ago when I started in this business, I was installing
10MB disks for $2000/ea as fast as we could get the parts (and at that
time, you could get a darned nice car for five of those drives, and a
new Corvette cost less than ten of them).  Now sure, the price has
dropped a whole lot since then, and my first reaction would be "What
does that have to do with anything?  I can buy 2TB disks for under $100,
that's a huge savings over the SSD!"  In raw dollars, sure.  Percentage?
Sure.  In "value to business"?  I don't think so.  In 1982, people felt
the computers of the day were worth adding $2000 to, to get a tiny amount
of "fast" storage to make their very few business apps run better.  No
question in their mind, it was worth it.  Now we do much more with our
computers and it costs much less.  The business value of our investment
should be much greater than it was in 1982.

And ignoring hardware, it is.  Companies drop thousands of dollars on
consulting and assistance and think nothing of it.  And in a major
computer project, a couple $1000 disks barely show as a blip on the
budget.  Hey, I'm all about being a cheap bastard whenever possible, but
this just isn't a reasonable place to be cheap, so not somewhere I'd
suggest spending developer resources.


Also ... it's probably a bad idea for functional reasons.  You can't
just assume that "slower" is better than "nothing" -- very often, it's
indistinguishable from "nothing".  In many cases, computer systems that
perform below a certain speed are basically non-functional, as tasks can
pile up on them faster than they can produce results.  Anyone who has
dealt with an overloaded database server, mail server or firewall will
know what I'm saying here -- at a certain load, they go from "running
ok" to "death spiral", and they do it very quickly.

If you /need/ the speed of an SSD, you can justify the cost of a pair of
'em.  If you can't justify the cost, you are really working with a
really unimportant environment, and you can either wait for two cheap
slow disks or skip the RAID entirely.

How fast do you need to get to your porn, anyway?

(now ... that being said, part of me would love a tmpfs / disk RAID1,
one that would come up degraded, and the disk would populate the RAM
disk, writes would go to both subsystems, reads would come from the RAM
disk once populated.  I could see this for some applications like CVS
repositories or source directories where things are "read mostly", and
typically smaller than a practical RAM size these days, and as there are
still a few orders of magnitude greater performance in a RAM disk than
an SSD and this will likely remain true for a while, there are SOME
applications where this could be nice)
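
(Nothing like this exists in softraid(4) today; purely to make the idea
concrete, here is a tiny userland sketch of the routing policy just
described -- writes go to both sides, reads are served from the RAM
copy only once it has been populated from the disk.  Every name in it
is invented for illustration.)

#include <stdbool.h>
#include <stdio.h>

struct ram_mirror {
        bool populated;         /* RAM copy fully rebuilt from disk? */
};

/* Writes always hit both sides, so the disk stays authoritative. */
static void
mirror_write(long blk)
{
        printf("write blk %ld -> disk\n", blk);
        printf("write blk %ld -> RAM copy\n", blk);
}

/*
 * Reads come from RAM once the rebuild has finished; until then the
 * volume runs "degraded" on the disk alone.
 */
static void
mirror_read(const struct ram_mirror *m, long blk)
{
        if (m->populated)
                printf("read blk %ld <- RAM copy\n", blk);
        else
                printf("read blk %ld <- disk\n", blk);
}

int
main(void)
{
        struct ram_mirror m = { .populated = false };

        mirror_read(&m, 7);     /* still rebuilding: served by the disk */
        m.populated = true;     /* rebuild done */
        mirror_write(7);
        mirror_read(&m, 7);     /* now served from the RAM copy */
        return 0;
}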


> 2)
> Also if there's a read/write failure (or excessive time consumption for 
> a single operation, say 15 seconds), will Softraid RAID1 learn to take 
> the broken disk out of use?

As far as I am aware, Softraid (like most RAID systems, hw or sw) will
deactivate a drive which reports a failure.  Drives which go super slow
(i.e., always manage to get the data BEFORE the X'th retry at which they
would toss an error) never report an error back, so never deactivate the
drive.

Sound implausible?  Nope.  It Happens.  Frustrating as heck when you
have this happen to you until you figure it out.  In fact, one key
feature of "enterprise" and "RAID" grade disks is that when they hop
off-line and throw an error fast and early, to prevent this problem
(some "NAS" grade disks may do this.  Or they may just see your credit
limit hasn't been reached).
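
(Softraid itself has no per-I/O timeout, so a "slow-fail" drive is never
ejected by the RAID layer; the fast, early error from "enterprise"/"RAID"
grade firmware just described is what fills that role.  Purely to
illustrate what a software-side policy would involve -- this is not
softraid code, and every name below is invented -- such a watchdog would
have to time each I/O and take a chunk out of service after too many
overly slow completions.)

#include <stdio.h>
#include <time.h>

#define SLOW_NSEC       (15LL * 1000000000LL)   /* "say 15 seconds" */
#define MAX_SLOW_IOS    3                       /* strikes before ejection */

struct chunk_watch {
        int     slow_ios;       /* consecutive slow completions */
        int     online;
};

/* Record one completed I/O and decide whether the chunk stays online. */
static void
watch_io(struct chunk_watch *cw, const struct timespec *start,
    const struct timespec *end)
{
        long long elapsed;

        elapsed = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL +
            (end->tv_nsec - start->tv_nsec);

        if (elapsed >= SLOW_NSEC) {
                if (++cw->slow_ios >= MAX_SLOW_IOS)
                        cw->online = 0;         /* eject the lagging mirror */
        } else
                cw->slow_ios = 0;               /* fast again: reset count */
}

int
main(void)
{
        struct chunk_watch cw = { 0, 1 };
        struct timespec a = { 0, 0 }, b = { 20, 0 };    /* a fake 20 s I/O */
        int i;

        for (i = 0; i < MAX_SLOW_IOS; i++)
                watch_io(&cw, &a, &b);
        printf("chunk still online: %d\n", cw.online);
        return 0;
}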

However, having done this for a looong time, and seen the problems from
both rapid-failure and "try and try" disks, I'll take the "try and try"
problem any 

Will Softraid RAID1 read from the fastest mirror/-s / supports user-specified device read priority order, nowadays? Takes broken disk out of use?

2016-02-13 Thread Tinker

Hi,

1)
http://www.openbsd.org/papers/asiabsdcon2010_softraid/softraid.pdf page 
3 "2.2 RAID 1" says that it reads "on a round-robin basis from all 
active chunks", i.e. read operations are spread evenly across disks.


Since then did anyone implement selective reading based on experienced 
read operation time, or a user-specified device read priority order?



That would allow Softraid RAID1 based on 1 SSD mirror + 1 SSD mirror + 1 
HDD mirror, which would give the best combination of IO performance and 
data security OpenBSD would offer today.


2)
Also if there's a read/write failure (or excessive time consumption for 
a single operation, say 15 seconds), will Softraid RAID1 learn to take 
the broken disk out of use?


Thanks,
Tinker