On Thursday, 5 February 2015, 16:45:17, Qu Wenruo wrote:
> -------- Original Message --------
> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree blocks,
> AKA dangerous mode.
> From: Martin Steigerwald <mar...@lichtvoll.de>
> To: Qu Wenruo <quwen...@cn.fujitsu.com>
> Date: 2015-02-05 16:31
> 
> > On Thursday, 5 February 2015, 09:35:26, Qu Wenruo wrote:
> >> -------- Original Message --------
> >> Subject: Re: [PATCH 0/7] Allow btrfsck to reset csum of all tree
> >> blocks, AKA dangerous mode.
> >> From: Martin Steigerwald <mar...@lichtvoll.de>
> >> To: Qu Wenruo <quwen...@cn.fujitsu.com>
> >> Date: 2015-02-04 17:16
> >> 
> >>> On Wednesday, 4 February 2015, 15:16:44, Qu Wenruo wrote:
> >>>> Btrfs's metadata csum is a good mechanism, keeping bit errors away
> >>>> from the sensitive kernel. But such a mechanism can also be too
> >>>> sensitive, e.g. to a bit error in the csum bytes or in the low
> >>>> all-zero bits of a nodeptr.
> >>>> It's a trade of "error tolerance" for stability, and is reasonable
> >>>> for most cases since there are the DUP/RAID1/5/6/10 duplication
> >>>> levels.
> >>>> 
> >>>> But in some cases, whether for development purposes, or a desperate
> >>>> user who can't tolerate losing all his/her inline data, or even a
> >>>> crazy QA team hoping btrfs can survive heavy random bit bombing,
> >>>> there are some people who want to get rid of the csum protection and
> >>>> face the crucial raw data no matter what disaster may happen.
> >>>> 
> >>>> So, introduce the new '--dangerous' (or "destruction"/"debug" if you
> >>>> like) option for btrfsck to reset all csum of tree blocks.
> >>> 
> >>> I often wondered about this: AFAIK if you get a csum error BTRFS
> >>> makes this an input/output error. For being able to access the data
> >>> in place, how about an "iwantmycorrupteddataback" mount option where
> >>> BTRFS just logs csum errors but allows one to access the files
> >>> nonetheless.
> >> 
> >> The idea is good, but don't forget we have metadata (tree blocks) and
> >> data. For data, this is completely OK.
> >> But for metadata, this may be a disaster just like the --dangerous
> >> option.
> > 
> > Ah yes, so probably only do this for data or have an extra option for
> > skipping csum on metadata for the really desperate, but then I'd
> > really force read-only to avoid corruption causing more damage.
> > 
> >>> This could even
> >>> work together with remount. Maybe it would be good not to allow
> >>> writing to broken csum blocks, i.e. fail these with input/output
> >>> error.
> >> 
> >> Don't forget btrfs' COW write.
> >> So writing into data shouldn't be a problem (if COW is enabled).
> > 
> > Yes, but… it hides the corruption. Unless you have a snapshot, if an
> > application reads corrupted data and then writes it back, you have
> > no indication that the data was corrupted in the first place.
> > 
> >>> This way, the csum would not be automatically fixed, *but* one is
> >>> able to access the broken data, *while* knowing it is broken.
> >>> 
> >>> If that is possible already, I missed it.
> >> 
> >> Much as you considered, the data csum can be rebuilt in btrfsck with
> >> the --init-csum-tree option.
> >> Although not every user knows this feature, and even fewer users know
> >> the right time to use it.
> > 
> > I wonder about making a wiki page about recovery options with two
> > parts:
> > 
> > 1) Diagnosis. First find out what might be wrong.
> > 
> > 2) Cure. Then decide which steps to try to recover.
> 
> This seems really useful.
> 
> But I'm a little afraid of introducing too much info for the end user:
> metadata/data, the difference between btrfsck and scrub, and tons of
> other things may confuse users.
> What's more, these things should be done by btrfsck automatically...

Sure. The page should contain a disclaimer anyway, and I think it's good to
have it as easy as possible for the user. But also, for the early
adopters, I think it is really good to have some guidance available, with
the caveat to always ask here on the mailing list if unsure about the next
step.

> Besides this, wiki pages about real-world btrfs recovery strategies are
> very helpful.
> Feel free to add one, although I'm not sure how to add pages to the btrfs
> wiki; maybe you need to contact Marc or David?

David, I requested a wiki account via the page and even made a (not quite
serious) 50-word biography in order to pass that form.

Thanks,
Martin

> 
> Thanks,
> Qu
> 
> > And of course an intro on the best practice of only working on a copy
> > of the copy for any in-place repair attempts.
> > 
> > I'd be willing to make such a page, provided I get enough hints on
> > what to try when. I have some ideas myself, but I am not sure they
> > are accurate :)
> > 
> > Thanks,
> > Martin
> > 
> >> Thanks,
> >> Qu
> >> 
> >>>> The csum resetting has the following features:
> >>>> 1) Top-down, level by level
> >>>> The csum resetting is done from the tree root down to level 1, and
> >>>> only when the csum of all nodes in a level has been reset and they
> >>>> pass the read_tree_block() check will it continue to the next level.
> >>>> All bytenr in nodeptr will also be re-aligned, so a bit error in the
> >>>> low 12 bits (4K sector size case) can also be repaired without pain.
> >>>> With this behavior, an error in a nodeptr has a chance of not
> >>>> affecting its child.
> >>>> 
> >>>> 2) No copy-on-write
> >>>> COW means we need to have a valid extent tree; if the extent tree is
> >>>> corrupted, COW will only be a BUG_ON blocking us.
> >>>> So all r/w in this dangerous mode will use no-cow writes. That's why
> >>>> we export and slightly modify write_tree_block() to do no-cow tree
> >>>> block writes with a newly calculated csum.
> >>>> Since the write is not cowed, if it fails, it will also destroy the
> >>>> last hope for manual inspection.
> >>>> 
> >>>> Qu Wenruo (7):
> >>>>   btrfs-progs: Add btrfs_(prev/next)_tree_block() to keep search
> >>>>     result in the same level of path->lowest_level.
> >>>>   btrfs-progs: Introduce btrfs_next_slot() function to iterate to
> >>>>     next slot in given level.
> >>>>   btrfs-progs: Allow btrfs_read_fs_root() to re-read the tree node.
> >>>>   btrfs-progs: Export write_tree_block() and allow it to do nocow
> >>>>     write.
> >>>>   btrfs-progs: Introduce new function reset_tree_block_csum() for
> >>>>     later tree block csum reset.
> >>>>   btrfs-progs: Introduce new function reset_(one_root/roots)_csum()
> >>>>     to reset one/all tree's csum in tree root.
> >>>>   btrfs-progs: Introduce "--dangerous" option to reset all tree block
> >>>>     csum.
> >>>>
> >>>>  cmds-check.c | 284 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> >>>>  ctree.c      |  18 ++--
> >>>>  ctree.h      |  25 +++++-
> >>>>  disk-io.c    |  55 +++++++++---
> >>>>  disk-io.h    |   3 +
> >>>>  5 files changed, 359 insertions(+), 26 deletions(-)
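
As a side note on item 1) of the quoted cover letter: the re-alignment of
a nodeptr's bytenr might look roughly like the sketch below. This is only
an illustration of the idea, not code from the patch set; the helper name
realign_bytenr() and its signature are made up here, and it assumes the
sector size is a power of two (4096 in the case mentioned above).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from btrfs-progs): force a tree block address
 * back onto a sector boundary by clearing its low bits. With a 4K sector
 * size this zeroes the low 12 bits, so a bit flip there is undone.
 */
static uint64_t realign_bytenr(uint64_t bytenr, uint64_t sectorsize)
{
	/* The mask trick below only works for power-of-two sector sizes. */
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);

	return bytenr & ~(sectorsize - 1);
}
```

For example, realign_bytenr(0x1d5c020, 4096) gives 0x1d5c000 again. A bit
flip in bit 12 or above is not caught this way, since the result is still
sector-aligned.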

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7