On Thursday, 5 February 2015, 16:45:17, Qu Wenruo wrote:
> -------- Original Message --------
> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks,
> AKA dangerous mode.
> From: Martin Steigerwald <mar...@lichtvoll.de>
> To: Qu Wenruo <quwen...@cn.fujitsu.com>
> Date: 2015-02-05 16:31
>
> > On Thursday, 5 February 2015, 09:35:26, Qu Wenruo wrote:
> >> -------- Original Message --------
> >> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree
> >> blocks, AKA dangerous mode.
> >> From: Martin Steigerwald <mar...@lichtvoll.de>
> >> To: Qu Wenruo <quwen...@cn.fujitsu.com>
> >> Date: 2015-02-04 17:16
> >>
> >>> On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
> >>>> Btrfs's metadata csum is a good mechanism, keeping bit error away from
> >>>> sensitive kernel. But such mechanism will also be too sensitive, like
> >>>> bit error in csum bytes or low all zero bits in nodeptr.
> >>>> It's a trade using "error tolerance" for stable, and is reasonable for
> >>>> most cases since there is DUP/RAID1/5/6/10 duplication level.
> >>>>
> >>>> But in some case, whatever for development purpose or despair user who
> >>>> can't tolerant all his/her inline data lost, or even crazy QA team
> >>>> hoping btrfs can survive heavy random bits bombing, there are some guys
> >>>> want to get rid of the csum protection and face the crucial raw data no
> >>>> matter what disaster may happen.
> >>>>
> >>>> So, introduce the new '--dangerous' (or "destruction"/"debug" if you
> >>>> like) option for btrfsck to reset all csum of tree blocks.
> >>>
> >>> I often wondered about this: AFAIK if you get a csum error BTRFS makes
> >>> this an input/output error. For being able to access the data in place,
> >>> how about a "iwantmycorrupteddataback" mount option where BTRFS just
> >>> logs csum errors but allows one to access the files nonetheless.
> >>
> >> The idea is good, but don't forget we have metadata(tree block) and
> >> data. For data, this is completely OK.
> >> But for metadata, this may be a disaster just like the --dangerous
> >> option.
> >
> > Ah yes, so probably only do this for data or have an extra option for
> > skipping csum on metadata for the really desparate, but then I´d really
> > force read only to avoid corrupted causing more damage.
> >
> >>> This could even
> >>> work together with remount. Maybe it would be good not to allow
> >>> writing to broken csum blocks, i.e. fail these with input/output
> >>> error.
> >>
> >> Don't forget btrfs' COW write.
> >> So write into data shouldn't be a problem.(if COW is enabled).
> >
> > Yes, but… it hides the corruption. Unless you have a snapshot if an
> > application reads corrupted data and then writes it back, then you
> > have no indication that the data was corrupted in the first time.
> >
> >>> This way, the csum would not be automatically fixed, *but* one is able
> >>> to access the broken data, *while* knowing it is broken.
> >>>
> >>> If that is possible already, I missed it.
> >>
> >> Much as you considered, data csum can be rebuilt in btrfsck with
> >> --init-csum-tree option.
> >> Although not every user knows this feature and even less users know the
> >> correct timing using it.
> >
> > I wonder about making a wiki page about recovery options with two
> > parts:
> >
> > 1) Diagnosis. First find out what might be wrong.
> >
> > 2) Cure. Then decide which steps to try to recover.
>
> This seems really useful.
>
> But I'm a little afraid of introducing too much info for end user,
> metadata/data, difference between btrfsck
> and scrub and tons of other things may make user confused.
> And more, this things should be done by btrfsck automatically...

Sure. The page should contain a disclaimer anyway, and I think it's good to
make it as easy as possible for the user. But also, for the early adopters,
I think it is really good to have some guidance available, with the caveat
to always ask here on the mailing list if unsure about the next step.

> Beside this, wiki pages about real world btrfs recovery strategy is very
> helpful.
> Feel free to add, although I'm not sure how to add pages to btrfs wiki,
> maybe you need to contact Marc or
> David?

David, I requested a wiki account via the page and even wrote a (not quite
serious) 50-word biography in order to pass that form.

Thanks,
Martin

>
> Thanks,
> Qu
>
> > And of cause an intro on best practice to only work on a copy of the
> > copy for any in-place repair attempts.
> >
> > I´d be willing to make such a page, provided I get enough hints on
> > what to try when. I have some ideas myself, but I am not sure they
> > are accurate :)
> >
> > Thanks,
> > Martin
> >
> >> Thanks,
> >> Qu
> >>
> >>>> The csum reseting have the following features:
> >>>> 1) Top to down level by level
> >>>> The csum resetting is done from tree to level 1, and only when all the
> >>>> csum of nodes in this level is reset and can pass read_tree_block()
> >>>> check, it will continue to next level.
> >>>> And all bytenr in nodeptr will be re-aligned, so bit error in the low
> >>>> 12 bits(4K sector size case) can also be repaired without pain.
> >>>> With this behavior, error in nodeptr has a chance not affecting its
> >>>> child.
> >>>>
> >>>> 2) No Copy-on-write
> >>>> COW means we needs to have a valid extent tree, if extent tree is
> >>>> corrupted COW will only be a BUG_ON blocking us.
> >>>> So all the r/w in this dangerous mode will use no-cow write. That's why
> >>>> we export and slightly modified write_tree_block() to do no-cow tree
> >>>> block write with newly calculated csum.
> >>>> Since the write is not cowed, if it fails, it will also destroy the
> >>>> last hope for manual inspection.
> >>>>
> >>>> Qu Wenruo (7):
> >>>>   btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search result
> >>>>     in the same level of path->lowest_level.
> >>>>   btrfs-progs: Introduce btrfs_next_slot() function to iterate to next
> >>>>     slot in given level.
> >>>>   btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node.
> >>>>   btrfs-progs: Export write_tree_block() and allow it to do nocow write.
> >>>>   btrfs-progs: Introduce new function reset_tree_block_csum() for later
> >>>>     tree block csum reset.
> >>>>   btrfs-progs: Introduce new function reset_(one_root/roots)_csum() to
> >>>>     reset one/all tree's csum in tree root.
> >>>>   btrfs-progs: Introduce "--dangerous" option to reset all tree block
> >>>>     csum.
> >>>>
> >>>>  cmds-check.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>>>  ctree.c      |  18 ++--
> >>>>  ctree.h      |  25 +++++-
> >>>>  disk-io.c    |  55 +++++++++---
> >>>>  disk-io.h    |   3 +
> >>>>  5 files changed, 359 insertions(+), 26 deletions(-)

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
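
Note on the quoted point about re-aligning bytenr in nodeptr: the idea can be
illustrated with a minimal sketch in C. This is only an illustration under
stated assumptions, not code from the patch set. The name realign_bytenr() is
hypothetical and does not exist in btrfs-progs, and the sketch assumes that
tree block addresses are multiples of a power-of-two sector size, so that with
4K sectors the low 12 bits of a valid pointer are always zero.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of btrfs-progs): round a block pointer
 * down to the nearest multiple of sectorsize by clearing its low bits.
 * sectorsize must be a non-zero power of two; 4096 clears the low 12 bits.
 */
uint64_t realign_bytenr(uint64_t bytenr, uint32_t sectorsize)
{
	/* A power of two has exactly one bit set. */
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);

	/* Widen before subtracting so the mask covers all 64 bits. */
	return bytenr & ~((uint64_t)sectorsize - 1);
}
```

Masking like this can only undo a bit flip that landed in the low bits. A flip
in a higher bit still yields an aligned but wrong address, which is presumably
why the cover letter says an error in nodeptr only "has a chance" of not
affecting its child.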