Re: Storage server HW advice/feedback req for setup overall & in particular reliability/QoS of SATA, to protect from controller- or BIOS-induced system crashes? Dedicated PCI SATA HBA needed??

2016-02-19 Thread Zé Loff
On Fri, Feb 19, 2016 at 05:01:19PM +0700, Tinker wrote:
> On 2016-02-17 01:01, j...@bitminer.ca wrote:
> ..
> >Why do you think you need to build such a device?  Why don't you buy it?
> >
> >(Dell PowerEdge VRTX, HP hyper converged, etc)
> 
> Colocation requires rack servers, but thanks for thinking about it.
> 
> >Some important things:
> >
> > - what is the purpose of this collection of clients, servers,
> >networks and software?
> > - who will judge, and how will they judge, the effectiveness of it?
> >How fast/correctly it performs not just reliability
> > - what is their budget?
> > - how much time will they give you?
> > - how will you spend your time?
> > - how will you prove to yourself that you have finished?  How can you
> >prove to your users/customer that it works?
> 
> 
> I feel the general take-home from this conversation and Nick Holland's
> suggestions is that really anything might break, and everything needs to
> be set up to handle that.
> 
> So like, all of
> 
>  * data integrity verifications,
>  * checksumming of everything,
>  * automated routines to take a node out of use in case of IO slowdown or
> failure, or any other error, and
>  * live syncing of important stuff to another datacenter for the case of
> power failures
> 
> need to be in place for there to be any real data integrity + QoS
> guarantees.
> 
> 
> Thanks!

You keep forgetting "planning for the manual stuff that will need to be
done when all automated stuff fails".



Re: Storage server HW advice/feedback req for setup overall & in particular reliability/QoS of SATA, to protect from controller- or BIOS-induced system crashes? Dedicated PCI SATA HBA needed??

2016-02-19 Thread Tinker

On 2016-02-17 01:01, j...@bitminer.ca wrote:
..
Why do you think you need to build such a device?  Why don't you buy it?


(Dell PowerEdge VRTX, HP hyper converged, etc)


Colocation requires rack servers, but thanks for thinking about it.


Some important things:

 - what is the purpose of this collection of clients, servers,
networks and software?
 - who will judge, and how will they judge, the effectiveness of it?
How fast/correctly it performs not just reliability
 - what is their budget?
 - how much time will they give you?
 - how will you spend your time?
 - how will you prove to yourself that you have finished?  How can you
prove to your users/customer that it works?



I feel the general take-home from this conversation and Nick Holland's
suggestions is that really anything might break, and everything needs to
be set up to handle that.


So like, all of

 * data integrity verifications,
 * checksumming of everything,
 * automated routines to take a node out of use in case of IO slowdown 
or failure, or any other error, and
 * live syncing of important stuff to another datacenter for the case of
power failures


need to be in place for there to be any real data integrity + QoS 
guarantees.
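
As a rough illustration of the first two bullets (integrity verification
and checksumming of everything), here is a minimal sketch of an offline
scrubber in plain Python. The paths and the digest database are made-up
placeholders, and it assumes files are written once and then never
legitimately modified - it is not tied to any filesystem or softraid
feature:

    import hashlib, json, os, sys

    DIGEST_DB = "/var/db/scrub-digests.json"   # hypothetical digest store
    DATA_ROOT = "/data"                        # hypothetical data directory

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def scrub():
        known = {}
        if os.path.exists(DIGEST_DB):
            with open(DIGEST_DB) as f:
                known = json.load(f)
        corrupted = []
        for dirpath, _, files in os.walk(DATA_ROOT):
            for name in files:
                path = os.path.join(dirpath, name)
                digest = sha256_of(path)
                if path in known and known[path] != digest:
                    corrupted.append(path)   # silent corruption detected
                known[path] = digest
        with open(DIGEST_DB, "w") as f:
            json.dump(known, f)
        return corrupted

    if __name__ == "__main__":
        bad = scrub()
        if bad:
            print("corrupted:", *bad, sep="\n  ")
            sys.exit(1)   # non-zero exit is the hook for monitoring/alerting

A cron job running something like this on each node, plus a node-removal
hook wired to the exit status, touches the first three bullets at a very
crude level.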



Thanks!



Re: Storage server HW advice/feedback req for setup overall & in particular reliability/QoS of SATA, to

2016-02-16 Thread j

Hi,

This is to ask you for your thoughts/advice on the best hardware setup
for an OpenBSD server.


Oh where to start.  You have a lot of enthusiasm clearly but not a lot of
experience.

OK, I'll bite.

"best" is subjective.  The server(s) will be surrounded
by clients (they are servers after all).  What is the best client for
this best server?  What is the purpose of this collection of servers
and clients?  What is your budget?  Who will evaluate this system and on
what basis will they describe it as successful or not?



This email ultimately reduces to the question, "What HW & config do you
suggest for minimizing the possibility of IO freeze or system crash from
BIOS or SATA card, in the event of SSD/HDD malfunction?", however I'll
take the whole reasoning around the HW choice from ground up with you
just to see that you feel that I got it all right.



This post and others seem to show you are very concerned with I/O 
freeze.

Yet that is a rare occurrence, by comparison to hundreds of other
possibilities for system failure.  AC power failure, for instance.


I hope this email will serve as general advice for others re. best
practice for OpenBSD server hardware choices.

GOAL
I am setting up an SSD-based storage facility that needs high data
integrity guarantees and high performance (random reads/writes). The
goal is to be able to safely store and constantly process something
about as important as, say, medical records.


"high" and "guarantee" are mutually incompatible.  You either get a 
guarantee
or you don't.  (Any guarantee is unlikely to be credible.)  Now, if they 
said
"perfect" and "guarantee" then your statement would be correct, however, 
still

unbelievable.  There is a disconnect here in the logic.



Needless to say, at some point such a storage server *will* fail, and
the only way to get to any sense of a pretty-much-100% uptime guarantee,
is to set up the facility in the form of multiple servers in a redundant
cluster.


OK, now you have a choice: do you want to spend lots of money on highly
reliable servers, and cluster them, or spend less money on less reliable
servers and rely on the clustering for overall reliability?

OpenBSD does not support clustered filesystems, so here you must be
assuming some other non-OpenBSD package, such as from ports, to
implement "clusters".


Is this right?



What the individual server can do then is to never ever deliver broken
data. And, locally and collectively there needs to be a well-working
mechanism for detecting when a node needs maintenance and taking it out
of use.


Another error in logic.  "never ever" is incompatible with "*will* fail".


You might want to review how Netflix manages failure.  Look up "chaos
monkey".  The gist of which is, based on a "will fail" assumption, they
constantly test handling failures.



What I want to ask you about now then, is your thoughts on what would be
the most suitable hardware configuration for the individual server, for
them to function for as long as possible without need for physical
administrator intervention.


Why do you think you need to build such a device?  Why don't you buy it?

(Dell PowerEdge VRTX, HP hyper converged, etc)



(And for when physical admin intervention is needed, to reduce the
competence required for that maintenance, if possible, to only involve
hotswapping or adding a physical disk - that is, to minimize the need
for reboots due to SATA controller issues, weird BIOS behavior, or other
reasons.)

GENERAL PROBLEM SURFACE OF SERVER HARDWARE

It seems to me that the accumulated experience with respect to why
servers break, is 1) anything storage-related, 2) PSU, 3) other.


You don't give any source for this claim.  Check out various
publications by Google and other at-scale users about their experience.



So then, stability aspects should be given consideration in that order.

For 2), the PSU can be made redundant easily, and PSU failures are
fairly rare anyhow, so that is pretty much what is reasonable to do for
that.


You omit AC power failures, distribution panel faults, uninterruptible
power systems, power cables, unintended pressure by fingers roaming
on/off buttons, feet kicking power cables, and so on.  Why do you leave
these risks out?



For 3), the "other" category would either be because of bad thermal
conditions (so that needs to be given proper consideration), or happen
anyhow, for which no safeguards exist anyhow, so we just need to take
that.

The rest of this post will discuss 1) the storage aspect, only.

THE STORAGE SOLUTION
Originally I thought RAID 5/6 would provide data integrity guarantees
and performance well. Then I saw the benchmark for a high-end RAID card
showing 25MB/sec write (= 95% overhead) and 80% overhead on reads
(http://www.storagereview.com/lsi_megaraid_sas3_93618i_review) per disk


The reference you cite says no such thing.  The word "overhead" does not
appear in the article.

That reference has some flaky methodology 

Storage server HW advice/feedback req for setup overall & in particular reliability/QoS of SATA, to protect from controller- or BIOS-induced system crashes? Dedicated PCI SATA HBA needed??

2016-02-16 Thread Tinker
Hi,

This is to ask you for your thoughts/advice on the best hardware setup
for an OpenBSD server.

This email ultimately reduces to the question, "What HW & config do you
suggest for minimizing the possibility of IO freeze or system crash from
BIOS or SATA card, in the event of SSD/HDD malfunction?", however I'll
take the whole reasoning around the HW choice from ground up with you
just to see that you feel that I got it all right.

I hope this email will serve as general advice for others re. best
practice for OpenBSD server hardware choices.

GOAL
I am setting up an SSD-based storage facility that needs high data
integrity guarantees and high performance (random reads/writes). The
goal is to be able to safely store and constantly process something
about as important as, say, medical records.

Needless to say, at some point such a storage server *will* fail, and
the only way to get to any sense of a pretty-much-100% uptime guarantee,
is to set up the facility in the form of multiple servers in a redundant
cluster.

What the individual server can do then is to never ever deliver broken
data. And, locally and collectively there needs to be a well-working
mechanism for detecting when a node needs maintenance and taking it out
of use.

What I want to ask you about now then, is your thoughts on what would be
the most suitable hardware configuration for the individual server, for
them to function for as long as possible without need for physical
administrator intervention.

(And for when physical admin intervention is needed, to reduce the
competence required for that maintenance, if possible, to only involve
hotswapping or adding a physical disk - that is, to minimize the need
for reboots due to SATA controller issues, weird BIOS behavior, or other
reasons.)

GENERAL PROBLEM SURFACE OF SERVER HARDWARE

It seems to me that the accumulated experience with respect to why
servers break, is 1) anything storage-related, 2) PSU, 3) other.

So then, stability aspects should be given consideration in that order.

For 2), the PSU can be made redundant easily, and PSU failures are
fairly rare anyhow, so that is pretty much what is reasonable to do for
that.

For 3), the "other" category would either be because of bad thermal
conditions (so that needs to be given proper consideration), or happen
anyhow, for which no safeguards exist anyhow, so we just need to take
that.

The rest of this post will discuss 1) the storage aspect, only.

THE STORAGE SOLUTION
Originally I thought RAID 5/6 would provide data integrity guarantees
and performance well. Then I saw the benchmark for a high-end RAID card
showing 25MB/sec write (= 95% overhead) and 80% overhead on reads
(http://www.storagereview.com/lsi_megaraid_sas3_93618i_review) per disk
set, which is enough to make me understand that the upcoming softraid
RAID1C with 2-4 drives will be far better at delivering those qualities -

Of course I didn't see any benchmarks on RAID1C, but I guess its
overhead for both read and write will be <<10-15% on average, at least
with its default CRC32C.

(Perhaps RAID1C needs to be fortified with a better checksumming
algorithm, and perhaps also double mirror reads on any read (depending
on how the scrubbing works - didn't check this yet), though that is a
separate conversation.)
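
To make the checksummed-mirroring idea concrete, here is a minimal
sketch of the read path such a discipline implies - plain Python rather
than the actual softraid code (which I have not read), with zlib.crc32
standing in for CRC32C and an invented on-disk layout:

    import zlib

    BLOCK = 4096  # hypothetical data block size

    def crc_ok(block, stored_crc):
        # Compare the block's checksum against the one stored at write time.
        return (zlib.crc32(block) & 0xffffffff) == stored_crc

    def mirrored_read(mirrors, offset, stored_crc):
        # Try each mirror in turn; a copy whose checksum does not match
        # (silent corruption) is skipped and the other copy is used.
        for dev in mirrors:              # e.g. two open file objects
            dev.seek(offset)
            block = dev.read(BLOCK)
            if crc_ok(block, stored_crc):
                return block             # first good copy wins
        raise IOError("no mirror returned a block with a valid checksum")

The per-read cost is one CRC over a block plus, rarely, a second read;
since CRC32C has a dedicated CPU instruction on current x86 (SSE4.2),
the <<10-15% overhead guess above does not seem unreasonable.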

Of course to really know how well RAID1C will perform, I would need to
benchmark it, but there seems to be a general consensus in the RAID
community that checksummed mirroring is preferable to RAID 5/6, so I
perceive that my preliminary understanding that RAID1C will be the
winning option is well founded.

The SSDs would be enterprise grade and hence *should* shut down
immediately if they start malfunctioning, so there should be essentially
no QoS dumps in the softraid from any IO operations that take ultra-long
to complete, e.g. >>>10 seconds.
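
And since that fail-fast behaviour cannot be relied on completely, the
"take a node out of use on IO slowdown" routine mentioned earlier could
start out as simple as a latency watchdog along these lines - a sketch
only, where the probe path, threshold, and the way the node is pulled
from rotation are all placeholders for whatever the cluster actually
uses:

    import os, sys, time

    PROBE_FILE = "/data/.iowatch"   # hypothetical probe file on the array
    THRESHOLD_S = 5.0               # declare the node unhealthy above this
    INTERVAL_S = 10

    def probe_latency():
        # Time a small synchronous write; O_SYNC forces it to the media.
        start = time.monotonic()
        fd = os.open(PROBE_FILE, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
        try:
            os.write(fd, b"x" * 4096)
            os.fsync(fd)
        finally:
            os.close(fd)
        return time.monotonic() - start

    def main():
        while True:
            latency = probe_latency()
            if latency > THRESHOLD_S:
                # Placeholder: tell the load balancer / cluster manager to
                # stop routing work here, then page a human.
                print("io latency %.1fs, leaving rotation" % latency,
                      file=sys.stderr)
                sys.exit(1)
            time.sleep(INTERVAL_S)

    if __name__ == "__main__":
        main()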

For the RAID1C to really deliver then (now that PSU, CPU, RAM, and SSD
all work), all that would be needed is that the remaining factors
deliver well, namely that the SATA connectivity works and that the BIOS
operates transparently.

HARDWARE BUDGET
A good Xeon Supermicro server with onboard SATA and ethernet, with decent
PSU, RAM and CPU, is some thousands of USD. 2-3 x 2TB enterprise SSDs are
around 2700-4000 USD. Any specialized SATA controller, if needed, would
be below 2000 USD anyhow.

QUESTION
Someone with 30 years of admin experience warned me that in the case
that an individual storage drive dies, the SATA controller could crash,
or the BIOS could kill the whole system.

He also warned me that if any disk in the boot softraid RAID1 broke,
then the BIOS could get so confused that the system wouldn't even want
to boot - and for that reason I guess the boot disks should be separated
altogether from the "data disks", as the former will have a much, much
lower turnover.

A SATA-controller- or BIOS-induced system crash, freeze, or other need
to reboot the system because of their malfunction, would be really