riak on SSDs - how to manage potential SSD failures

Alex Babkin Mon, 26 Nov 2012 10:25:57 -0800

Hi all

First post here, so please be kind :)


I plan to build an experimental Riak cluster out of cheap ARM
computing parts and consumer-grade SSDs, to measure performance and
assess production viability.
I plan to use LevelDB as the backend.

One thing to be concerned about, in light of various SSD failure stories, is
of course SSD failure, and also the way an SSD fails (some parts
of the SSD just aren't writable anymore but are still readable, i.e. stuck at
some constant value). This could result in a scenario where a
record replicated on two nodes, one with a working SSD and one with a
faulty one, ends up with different data. Will Riak try to account for this
scenario?

I'm trying to think of ways to mitigate this risk of nodes failing due to
these SSD failures or at least get an early indication of a failure
(however insignificant it may be).
I guess my first question should be: "Does Riak provide any form of checksums
or the like on the data it reads/writes, or does it blindly trust that the
backend/filesystem reads/writes data correctly?"
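
To make the question concrete, here is a minimal sketch of the kind of
check I have in mind, done on the client side. It is plain Python and does not
use any Riak or LevelDB API; the names wrap and unwrap are made up for
illustration, and the "stuck at a constant value" damage is simulated by
overwriting the last bytes with zeros.

```python
import hashlib

DIGEST_LEN = 32  # length of a SHA-256 digest in bytes


def wrap(value: bytes) -> bytes:
    """Prefix the value with its SHA-256 digest before handing it to the store."""
    return hashlib.sha256(value).digest() + value


def unwrap(stored: bytes) -> bytes:
    """Split off the digest and verify it; raise if the payload no longer matches."""
    digest, value = stored[:DIGEST_LEN], stored[DIGEST_LEN:]
    if hashlib.sha256(value).digest() != digest:
        raise ValueError("checksum mismatch: stored value is corrupt")
    return value


if __name__ == "__main__":
    record = wrap(b"hello riak")
    assert unwrap(record) == b"hello riak"

    # Simulate a region of the SSD that is stuck at a constant value.
    damaged = record[:-4] + b"\x00" * 4
    try:
        unwrap(damaged)
    except ValueError as err:
        print(err)
```

Something like this would only catch the damage at read time, and only for
values that get read, so it is not a substitute for whatever the database
does internally.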

If not, are there any other tricks people use to trigger some alarm bells
that an SSD is 'going'?

Thanks
Alex

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
