I prefer the second option, since it will show whether the corrupted blocks are related to the race condition. The first option would have to run for a long time before I could be completely sure that it really fixes the issue.
2013/7/26 Matthew Von-Maszewski <[email protected]>

> Vladimir,
>
> I apologize for not recognizing your name and previous contribution. I just tend to think in terms of code and performance bottlenecks, not people.
>
> Your June contribution resulted in changes that were released in 1.4 and 1.3.2. I and the team thank you. However, we have not isolated the source of the corruption. We only know today that it does not happen very often. We have a second, high-transaction site that has seen the same issue.
>
> I can offer you two non-release options:
>
> - I have a branch to 1.4.0 that fixes a potential, but unproven, race condition. Details are here:
>
>   https://github.com/basho/leveldb/wiki/mv-sst-fadvise
>
>   You would have to build eleveldb locally and copy it into your executable tree. The 1.4 leveldb and eleveldb work fine with Riak 1.3.x, should you desire to limit changes to your production environment.
>
> - I have code, soon to be a branch against 1.3.2, that only adds syslog error messages to prove / disprove the race condition. You could take this code and see if it reports problems. This route would help the community, and mostly me, know whether the root cause is within the race condition addressed by the mv-sst-fadvise branch.
>
> The two options above are what I currently have to offer. I am actively working to find the corruption source. The good news is that Riak will naturally recover from a "bad CRC" when detected. The bad news is that the Google defaults let some bad CRCs become good CRCs. Riak 1.4 and 1.3.2 cannot identify those bad CRCs that became good CRCs.
>
> Matthew
>
> On Jul 25, 2013, at 4:32 PM, Vladimir Shabanov <[email protected]> wrote:
>
> Good. Will wait for the doctor.
>
> A month ago I mailed about a segmentation fault:
>
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-June/012245.html
>
> After looking at the core dumps, you found this problem with CRC checks being skipped.
> I enabled paranoid_checks and got my node up and running.
>
> I've also found that lost/BLOCKS.bad sometimes appears in partitions and have sent you these blocks for further analysis.
>
> It's very interesting why corrupted data appears in the first place. Nodes didn't crash, and hardware didn't fail. As I mentioned previously, all my machines have ECC memory and Riak data is kept on a ZFS filesystem (which also checks CRCs for all the data and doesn't report any CRC errors). So it looks like the data is somehow being corrupted by Riak itself.
>
> The lost/BLOCKS.bad files are usually small (2-8 KB) and appear very infrequently (once a week, once a month, or never for many partitions). I found these BLOCKS.bad files in both data/leveldb and data/anti_entropy, so I suspect there is a bug in LevelDB.
>
> Looking at the LOG files, they are created during compactions:
> "Moving corrupted block to lost/BLOCKS.bad (size 2393)"
> but there is no more information: what kind of block it is, or where it was found.
>
> Is it possible to somehow find the source of those BLOCKS.bad files? I'm building Riak from source; maybe it's possible to enable some additional logging to find out what these BLOCKS.bad files are?
>
> 2013/7/25 Matthew Von-Maszewski <[email protected]>
>
>> Vladimir,
>>
>> I can explain what happened, but not how to correct the problem. The gentleman that can walk you through a repair is tied up on another project, but he intends to respond as soon as he is able.
>>
>> We recently discovered / realized that Google's leveldb code does not check the CRC of each block rewritten during a compaction. This means that blocks with bad CRCs get read without being flagged as bad, then rewritten to a new file with a new, valid CRC. The corruption is now hidden.
>>
>> A more thorough discussion of the problem is found here:
>>
>> https://github.com/basho/leveldb/wiki/mv-verify-compactions
>>
>> We added code to the 1.3.2 and 1.4 Riak releases to have the block CRC checked during both read (Get) requests and compaction rewrites. This prevents future corruption from being hidden. Unfortunately, it does NOTHING for blocks already corrupted and rewritten with valid CRCs. You are encountering this latter condition. We have a developer advocate / client services person that has walked others through a fix via the Riak data replicas …
>>
>> … please hold and the doctor will be with you shortly.
>>
>> Matthew
>>
>> On Jul 24, 2013, at 9:39 PM, Vladimir Shabanov <[email protected]> wrote:
>>
>> Hello,
>>
>> Recently I started expanding my Riak cluster and found that handoffs were continuously retried for one partition.
>>
>> Here are logs from two nodes:
>> https://gist.github.com/vshabanov/41282e622479fbe81974
>>
>> The most interesting parts of the logs are
>> "Handoff receiver for partition ... exited abnormally after processing 2860338 objects: {{badarg,[{erlang,binary_to_term,..."
>> and
>> "bad argument in call to erlang:binary_to_term(<<131,104,...."
>>
>> Both nodes are running Riak 1.3.2 (the old one was previously running 1.3.1).
>>
>> When I printed the corrupted binary string, I found that it corresponds to one value.
>>
>> When I tried to "get" it, it was read OK, but the node with the corrupted value showed the same binary_to_term error.
>>
>> When I tried to delete the corrupted value, I got a timeout.
>>
>> I'm running machines with ECC memory and a ZFS filesystem (which doesn't report any checksum failures), so I doubt the data was silently corrupted on disk.
>>
>> The LOG from the corresponding LevelDB partition doesn't show any errors. But there is a lost/BLOCKS.bad file in this partition (7 KB, created more than a month ago, and it doesn't appear to contain the corrupted value).
>>
>> At the moment I've stopped handoffs using "riak-admin transfer-limit 0".
>>
>> Why was the value corrupted? Is there any way to remove it or fix it?
>> _______________________________________________
>> riak-users mailing list
>> [email protected]
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
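To make the "bad CRCs become good CRCs" mechanism in the thread concrete, here is a toy Python model. It is a sketch under stated assumptions, not leveldb's code: a block is modeled as just a payload plus a stored checksum, zlib.crc32 stands in for leveldb's masked CRC32C, and the names make_block, block_ok and compact are hypothetical. Without verification, the compaction rewrites the damaged payload under a fresh, valid checksum; with verification, the damaged block is set aside instead, roughly what the "Moving corrupted block to lost/BLOCKS.bad" log message quoted above reports.

```python
import zlib
from typing import List, Tuple

# Toy model of an SST block: (payload bytes, stored checksum).
# Assumption: zlib.crc32 stands in for leveldb's masked CRC32C.
# All names here are hypothetical, for illustration only.
Block = Tuple[bytes, int]


def make_block(payload: bytes) -> Block:
    """Build a block whose stored checksum matches its payload."""
    return payload, zlib.crc32(payload)


def block_ok(block: Block) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    payload, stored = block
    return zlib.crc32(payload) == stored


def compact(blocks: List[Block], verify: bool) -> Tuple[List[Block], List[bytes]]:
    """Rewrite blocks into a new 'file'; return (rewritten blocks, set-aside payloads)."""
    rewritten: List[Block] = []
    set_aside: List[bytes] = []
    for block in blocks:
        payload, _stored = block
        if verify and not block_ok(block):
            # Verified path: keep the damaged bytes out of the new file.
            set_aside.append(payload)
            continue
        # Unverified path: a fresh checksum is computed over whatever was read,
        # so a corrupted payload leaves the compaction with a valid CRC.
        rewritten.append(make_block(payload))
    return rewritten, set_aside


if __name__ == "__main__":
    good = make_block(b"key1=value1")
    damaged = bytearray(b"key2=value2")
    stored = zlib.crc32(bytes(damaged))
    damaged[5] ^= 0x01  # flip one bit after the checksum was written
    bad: Block = (bytes(damaged), stored)

    print(block_ok(bad))  # False: the corruption is still detectable

    hidden, _ = compact([good, bad], verify=False)
    print(all(block_ok(b) for b in hidden))  # True: the corruption is now hidden

    kept, lost = compact([good, bad], verify=True)
    print(len(kept), len(lost))  # 1 1
```

The verified path drops the damaged data from the new file; per the thread, Riak recovers from a bad CRC once it is detected, whereas the unverified path leaves nothing for any later check to find.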
