Re: [PATCH 1/2] Btrfs-progs: check, fix false error reports for shared prealloc extents
On 2018年03月14日 18:33, Filipe Manana wrote: > On Wed, Mar 14, 2018 at 1:32 AM, Qu Wenruowrote: >> >> >> On 2018年03月14日 02:47, fdman...@kernel.org wrote: >>> From: Filipe Manana >>> >>> Under some cases the filesystem checker reports an error when it finds >>> checksum items for an extent that is referenced by an inode as a prealloc >>> extent. Such cases are not an error when the extent is actually shared >>> (was cloned/reflinked) with other inodes and was written through one of >>> those other inodes. >>> >>> Example: >>> >>> $ mkfs.btrfs -f /dev/sdb >>> $ mount /dev/sdb /mnt >>> >>> $ touch /mnt/foo >>> $ xfs_io -c "falloc 0 256K" /mnt/foo >>> $ sync >>> >>> $ xfs_io -c "pwrite -S 0xab -b 0 256K" /mnt/foo >>> $ touch /mnt/bar >>> $ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar >>> $ xfs_io -c "fsync" /mnt/bar >>> >>> >>> $ mount /dev/sdb /mnt >>> $ umount /mnt >>> >>> $ btrfs check /dev/sdc >>> Checking filesystem on /dev/sdb >>> UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39 >>> checking extents >>> checking free space cache >>> checking fs roots >>> root 5 inode 257 errors 800, odd csum item >>> ERROR: errors found in fs roots >>> found 688128 bytes used, error(s) found >>> total csum bytes: 256 >>> total tree bytes: 163840 >>> total fs tree bytes: 65536 >>> total extent tree bytes: 16384 >>> btree space waste bytes: 138819 >>> file data blocks allocated: 10747904 >>>referenced 10747904 >>> $ echo $? >>> 1 >>> >>> So tech check to not report such cases as errors by checking if the >>> extent is shared with other inodes and if so, consider it an error the >>> existence of checksum items only if all those other inodes are referencing >>> the extent as a prealloc extent. >>> This case can be hit often when running the generic/475 testcase from >>> fstests. >>> >>> A test case will follow in a separate patch. >>> >>> Signed-off-by: Filipe Manana >> >> Looks good overall, but still some small nitpicks. >> >> One is, the fix is only for original mode, while the fix itself has >> nothing mode dependent, so it's better to put the check into >> check/mode-original.[ch] so lowmem mode can also benefit from it. > > Ok, I started doing development on a release branch where the split > doesn't exist, them moved to the development branch and wrongly > assumed that shared code would be in main.c. >> >>> --- >>> check/main.c | 270 >>> ++- >>> 1 file changed, 268 insertions(+), 2 deletions(-) >>> >>> diff --git a/check/main.c b/check/main.c >>> index 392195ca..bb816311 100644 >>> --- a/check/main.c >>> +++ b/check/main.c >>> @@ -1424,6 +1424,264 @@ static int process_inode_extref(struct >>> extent_buffer *eb, >>> >>> } >>> >>> +/* >>> + * Check if the inode referenced by the given data reference uses the >>> extent >>> + * at disk_bytenr as a non-prealloc extent. >>> + * >>> + * Returns 1 if true, 0 if false and < 0 on error. >>> + */ >> >> The check itself (along with check_prealloc_shared_data_ref) is good >> enough to detect any regular file extents inside the prealloc extent. >> >> But when it finds any regular extents, it doesn't check if it's covered >> by csum correctly. >> >> For example: >> >> |<-- prealloc extent range -->| >> |<- the *only* regular extent ->| >> |< range covered by csum >| >> |<- ODD ->| >> >> Or >> |<-- prealloc extent range -->| >> |<- the *only* regular extent ->| >> |<- csum ->| >>|<-- ODD --->| >> >> So at least in theory, the best solution is not to just check if there >> is any regular extents inside the large prealloc extent, but also check >> how many csum bytes the regular extents contribute, and then compare it >> to the result of count_csum_range(). > > Yes, I considered that initially but then moved to this simpler, but > less accurate solution. > So the problem is more complex then you think, as a prealloc extent > can be partially written and then reflinked multiple times and > fsync'ed. > For example: > > create inode A with prealloc extent [0, 256k[ > sync > write into prealloc extent, range [0, 128k[ through inode A > reflink prealloc extent into inode B > reflink prealloc extent into inode C > reflink prealloc extent into inode D > fsync inodes B, C and D > power fail > remount to replay log > > Now using a simple counter to count how many bytes the are covered by > chekcsum items we would have, we would get 384K (128K * 3), more then > the size of the extent. > So we would have to track ranges and then ranges that partially > overlap (due to non-zero offsets in the extent items). Yep, I also find that we need that overlap check. > > And this could go even more complex. It's doable, but I wanted to keep > things simple for now. I'm mostly
Re: [PATCH 1/2] Btrfs-progs: check, fix false error reports for shared prealloc extents
On Wed, Mar 14, 2018 at 9:19 AM, Nikolay Borisovwrote: > > > On 13.03.2018 20:47, fdman...@kernel.org wrote: >> From: Filipe Manana >> >> Under some cases the filesystem checker reports an error when it finds >> checksum items for an extent that is referenced by an inode as a prealloc >> extent. Such cases are not an error when the extent is actually shared >> (was cloned/reflinked) with other inodes and was written through one of >> those other inodes. >> >> Example: >> >> $ mkfs.btrfs -f /dev/sdb >> $ mount /dev/sdb /mnt >> >> $ touch /mnt/foo >> $ xfs_io -c "falloc 0 256K" /mnt/foo >> $ sync >> >> $ xfs_io -c "pwrite -S 0xab -b 0 256K" /mnt/foo > ^^ > nit: You are actually missing the argument for the -b switch, how big > should the blocks be ? Same thing applies to the commit message of your test -b is there because originally I had a value there. It's irrelevant the block size value, I simply forgot to delete -b. > >> $ touch /mnt/bar >> $ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar >> $ xfs_io -c "fsync" /mnt/bar >> >> >> $ mount /dev/sdb /mnt >> $ umount /mnt >> >> $ btrfs check /dev/sdc >> Checking filesystem on /dev/sdb >> UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39 >> checking extents >> checking free space cache >> checking fs roots >> root 5 inode 257 errors 800, odd csum item >> ERROR: errors found in fs roots >> found 688128 bytes used, error(s) found >> total csum bytes: 256 >> total tree bytes: 163840 >> total fs tree bytes: 65536 >> total extent tree bytes: 16384 >> btree space waste bytes: 138819 >> file data blocks allocated: 10747904 >>referenced 10747904 >> $ echo $? >> 1 >> >> So tech check to not report such cases as errors by checking if the >> extent is shared with other inodes and if so, consider it an error the >> existence of checksum items only if all those other inodes are referencing >> the extent as a prealloc extent. >> This case can be hit often when running the generic/475 testcase from >> fstests. >> >> A test case will follow in a separate patch. >> >> Signed-off-by: Filipe Manana >> --- >> check/main.c | 270 >> ++- >> 1 file changed, 268 insertions(+), 2 deletions(-) >> >> diff --git a/check/main.c b/check/main.c >> index 392195ca..bb816311 100644 >> --- a/check/main.c >> +++ b/check/main.c >> @@ -1424,6 +1424,264 @@ static int process_inode_extref(struct extent_buffer >> *eb, >> >> } >> >> +/* >> + * Check if the inode referenced by the given data reference uses the extent >> + * at disk_bytenr as a non-prealloc extent. >> + * >> + * Returns 1 if true, 0 if false and < 0 on error. >> + */ >> +static int check_prealloc_data_ref(struct btrfs_fs_info *fs_info, >> +u64 ino, >> +u64 disk_bytenr, >> +struct btrfs_extent_data_ref *dref, >> +struct extent_buffer *eb) >> +{ >> + u64 rootid = btrfs_extent_data_ref_root(eb, dref); >> + u64 objectid = btrfs_extent_data_ref_objectid(eb, dref); >> + u64 offset = btrfs_extent_data_ref_offset(eb, dref); >> + struct btrfs_root *root; >> + struct btrfs_key key; >> + struct btrfs_path path; >> + int ret; >> + >> + if (objectid == ino) >> + return 0; >> + >> + btrfs_init_path(); >> + key.objectid = rootid; >> + key.type = BTRFS_ROOT_ITEM_KEY; >> + key.offset = (u64)-1; >> + root = btrfs_read_fs_root(fs_info, ); >> + if (IS_ERR(root)) { >> + ret = PTR_ERR(root); >> + goto out; >> + } >> + >> + key.objectid = objectid; >> + key.type = BTRFS_EXTENT_DATA_KEY; >> + key.offset = offset; >> + ret = btrfs_search_slot(NULL, root, , , 0, 0); >> + if (ret > 0) { >> + fprintf(stderr, >> + "Missing file extent item for inode %llu, root %llu, >> offset %llu", >> + objectid, rootid, offset); >> + ret = -ENOENT; >> + } >> + if (ret < 0) >> + goto out; >> + >> + while (true) { >> + struct btrfs_file_extent_item *fi; >> + int extent_type; >> + >> + if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) { >> + ret = btrfs_next_leaf(fs_info->extent_root, ); >> + if (ret < 0) >> + goto out; >> + if (ret > 0) >> + break; >> + } >> + >> + btrfs_item_key_to_cpu(path.nodes[0], , path.slots[0]); >> + if (key.objectid != objectid || >> + key.type != BTRFS_EXTENT_DATA_KEY) >> + break; >> + >> + fi = btrfs_item_ptr(path.nodes[0], path.slots[0], >> +
Re: [PATCH 1/2] Btrfs-progs: check, fix false error reports for shared prealloc extents
On Wed, Mar 14, 2018 at 8:19 AM, Nikolay Borisovwrote: > > > On 13.03.2018 20:47, fdman...@kernel.org wrote: >> From: Filipe Manana >> >> Under some cases the filesystem checker reports an error when it finds >> checksum items for an extent that is referenced by an inode as a prealloc >> extent. Such cases are not an error when the extent is actually shared >> (was cloned/reflinked) with other inodes and was written through one of >> those other inodes. >> >> Example: >> >> $ mkfs.btrfs -f /dev/sdb >> $ mount /dev/sdb /mnt >> >> $ touch /mnt/foo >> $ xfs_io -c "falloc 0 256K" /mnt/foo >> $ sync >> >> $ xfs_io -c "pwrite -S 0xab -b 0 256K" /mnt/foo >> $ touch /mnt/bar >> $ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar >> $ xfs_io -c "fsync" /mnt/bar >> >> >> $ mount /dev/sdb /mnt >> $ umount /mnt >> >> $ btrfs check /dev/sdc >> Checking filesystem on /dev/sdb >> UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39 >> checking extents >> checking free space cache >> checking fs roots >> root 5 inode 257 errors 800, odd csum item >> ERROR: errors found in fs roots >> found 688128 bytes used, error(s) found >> total csum bytes: 256 >> total tree bytes: 163840 >> total fs tree bytes: 65536 >> total extent tree bytes: 16384 >> btree space waste bytes: 138819 >> file data blocks allocated: 10747904 >>referenced 10747904 >> $ echo $? >> 1 >> >> So tech check to not report such cases as errors by checking if the >> extent is shared with other inodes and if so, consider it an error the >> existence of checksum items only if all those other inodes are referencing >> the extent as a prealloc extent. >> This case can be hit often when running the generic/475 testcase from >> fstests. >> >> A test case will follow in a separate patch. >> >> Signed-off-by: Filipe Manana >> --- >> check/main.c | 270 >> ++- >> 1 file changed, 268 insertions(+), 2 deletions(-) >> >> diff --git a/check/main.c b/check/main.c >> index 392195ca..bb816311 100644 >> --- a/check/main.c >> +++ b/check/main.c >> @@ -1424,6 +1424,264 @@ static int process_inode_extref(struct extent_buffer >> *eb, >> >> } >> >> +/* >> + * Check if the inode referenced by the given data reference uses the extent >> + * at disk_bytenr as a non-prealloc extent. >> + * >> + * Returns 1 if true, 0 if false and < 0 on error. >> + */ >> +static int check_prealloc_data_ref(struct btrfs_fs_info *fs_info, >> +u64 ino, >> +u64 disk_bytenr, >> +struct btrfs_extent_data_ref *dref, >> +struct extent_buffer *eb) >> +{ >> + u64 rootid = btrfs_extent_data_ref_root(eb, dref); >> + u64 objectid = btrfs_extent_data_ref_objectid(eb, dref); > > nit: Shouldn't this ino be equal to the ino passed to the function? If > so objectid can be removed and when setting up the key for the second > search you just assigne ino to its objectid No, we want to skip if it's equal, because we're only interested in references for inodes other then the one being currently processed. > >> + u64 offset = btrfs_extent_data_ref_offset(eb, dref); >> + struct btrfs_root *root; >> + struct btrfs_key key; >> + struct btrfs_path path; >> + int ret; >> + >> + if (objectid == ino) >> + return 0; >> + >> + btrfs_init_path(); >> + key.objectid = rootid; >> + key.type = BTRFS_ROOT_ITEM_KEY; >> + key.offset = (u64)-1; >> + root = btrfs_read_fs_root(fs_info, ); >> + if (IS_ERR(root)) { >> + ret = PTR_ERR(root); >> + goto out; >> + } >> + >> + key.objectid = objectid; >> + key.type = BTRFS_EXTENT_DATA_KEY; >> + key.offset = offset; >> + ret = btrfs_search_slot(NULL, root, , , 0, 0); >> + if (ret > 0) { >> + fprintf(stderr, >> + "Missing file extent item for inode %llu, root %llu, >> offset %llu", >> + objectid, rootid, offset); >> + ret = -ENOENT; >> + } >> + if (ret < 0) >> + goto out; >> + >> + while (true) { >> + struct btrfs_file_extent_item *fi; >> + int extent_type; >> + >> + if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) { >> + ret = btrfs_next_leaf(fs_info->extent_root, ); >> + if (ret < 0) >> + goto out; >> + if (ret > 0) >> + break; >> + } >> + >> + btrfs_item_key_to_cpu(path.nodes[0], , path.slots[0]); >> + if (key.objectid != objectid || >> + key.type != BTRFS_EXTENT_DATA_KEY) >> + break; >> + >> + fi = btrfs_item_ptr(path.nodes[0], path.slots[0], >> +
Re: [PATCH 1/2] Btrfs-progs: check, fix false error reports for shared prealloc extents
On Wed, Mar 14, 2018 at 1:32 AM, Qu Wenruowrote: > > > On 2018年03月14日 02:47, fdman...@kernel.org wrote: >> From: Filipe Manana >> >> Under some cases the filesystem checker reports an error when it finds >> checksum items for an extent that is referenced by an inode as a prealloc >> extent. Such cases are not an error when the extent is actually shared >> (was cloned/reflinked) with other inodes and was written through one of >> those other inodes. >> >> Example: >> >> $ mkfs.btrfs -f /dev/sdb >> $ mount /dev/sdb /mnt >> >> $ touch /mnt/foo >> $ xfs_io -c "falloc 0 256K" /mnt/foo >> $ sync >> >> $ xfs_io -c "pwrite -S 0xab -b 0 256K" /mnt/foo >> $ touch /mnt/bar >> $ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar >> $ xfs_io -c "fsync" /mnt/bar >> >> >> $ mount /dev/sdb /mnt >> $ umount /mnt >> >> $ btrfs check /dev/sdc >> Checking filesystem on /dev/sdb >> UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39 >> checking extents >> checking free space cache >> checking fs roots >> root 5 inode 257 errors 800, odd csum item >> ERROR: errors found in fs roots >> found 688128 bytes used, error(s) found >> total csum bytes: 256 >> total tree bytes: 163840 >> total fs tree bytes: 65536 >> total extent tree bytes: 16384 >> btree space waste bytes: 138819 >> file data blocks allocated: 10747904 >>referenced 10747904 >> $ echo $? >> 1 >> >> So tech check to not report such cases as errors by checking if the >> extent is shared with other inodes and if so, consider it an error the >> existence of checksum items only if all those other inodes are referencing >> the extent as a prealloc extent. >> This case can be hit often when running the generic/475 testcase from >> fstests. >> >> A test case will follow in a separate patch. >> >> Signed-off-by: Filipe Manana > > Looks good overall, but still some small nitpicks. > > One is, the fix is only for original mode, while the fix itself has > nothing mode dependent, so it's better to put the check into > check/mode-original.[ch] so lowmem mode can also benefit from it. Ok, I started doing development on a release branch where the split doesn't exist, them moved to the development branch and wrongly assumed that shared code would be in main.c. > >> --- >> check/main.c | 270 >> ++- >> 1 file changed, 268 insertions(+), 2 deletions(-) >> >> diff --git a/check/main.c b/check/main.c >> index 392195ca..bb816311 100644 >> --- a/check/main.c >> +++ b/check/main.c >> @@ -1424,6 +1424,264 @@ static int process_inode_extref(struct extent_buffer >> *eb, >> >> } >> >> +/* >> + * Check if the inode referenced by the given data reference uses the extent >> + * at disk_bytenr as a non-prealloc extent. >> + * >> + * Returns 1 if true, 0 if false and < 0 on error. >> + */ > > The check itself (along with check_prealloc_shared_data_ref) is good > enough to detect any regular file extents inside the prealloc extent. > > But when it finds any regular extents, it doesn't check if it's covered > by csum correctly. > > For example: > > |<-- prealloc extent range -->| > |<- the *only* regular extent ->| > |< range covered by csum >| > |<- ODD ->| > > Or > |<-- prealloc extent range -->| > |<- the *only* regular extent ->| > |<- csum ->| >|<-- ODD --->| > > So at least in theory, the best solution is not to just check if there > is any regular extents inside the large prealloc extent, but also check > how many csum bytes the regular extents contribute, and then compare it > to the result of count_csum_range(). Yes, I considered that initially but then moved to this simpler, but less accurate solution. So the problem is more complex then you think, as a prealloc extent can be partially written and then reflinked multiple times and fsync'ed. For example: create inode A with prealloc extent [0, 256k[ sync write into prealloc extent, range [0, 128k[ through inode A reflink prealloc extent into inode B reflink prealloc extent into inode C reflink prealloc extent into inode D fsync inodes B, C and D power fail remount to replay log Now using a simple counter to count how many bytes the are covered by chekcsum items we would have, we would get 384K (128K * 3), more then the size of the extent. So we would have to track ranges and then ranges that partially overlap (due to non-zero offsets in the extent items). And this could go even more complex. It's doable, but I wanted to keep things simple for now. thanks > > Thanks, > Qu > >> +static int check_prealloc_data_ref(struct btrfs_fs_info *fs_info, >> +u64 ino, >> +u64 disk_bytenr, >> +struct btrfs_extent_data_ref *dref, >> +
Re: [PATCH 1/2] Btrfs-progs: check, fix false error reports for shared prealloc extents
On 13.03.2018 20:47, fdman...@kernel.org wrote: > From: Filipe Manana> > Under some cases the filesystem checker reports an error when it finds > checksum items for an extent that is referenced by an inode as a prealloc > extent. Such cases are not an error when the extent is actually shared > (was cloned/reflinked) with other inodes and was written through one of > those other inodes. > > Example: > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt > > $ touch /mnt/foo > $ xfs_io -c "falloc 0 256K" /mnt/foo > $ sync > > $ xfs_io -c "pwrite -S 0xab -b 0 256K" /mnt/foo ^^ nit: You are actually missing the argument for the -b switch, how big should the blocks be ? Same thing applies to the commit message of your test > $ touch /mnt/bar > $ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar > $ xfs_io -c "fsync" /mnt/bar > > > $ mount /dev/sdb /mnt > $ umount /mnt > > $ btrfs check /dev/sdc > Checking filesystem on /dev/sdb > UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39 > checking extents > checking free space cache > checking fs roots > root 5 inode 257 errors 800, odd csum item > ERROR: errors found in fs roots > found 688128 bytes used, error(s) found > total csum bytes: 256 > total tree bytes: 163840 > total fs tree bytes: 65536 > total extent tree bytes: 16384 > btree space waste bytes: 138819 > file data blocks allocated: 10747904 >referenced 10747904 > $ echo $? > 1 > > So tech check to not report such cases as errors by checking if the > extent is shared with other inodes and if so, consider it an error the > existence of checksum items only if all those other inodes are referencing > the extent as a prealloc extent. > This case can be hit often when running the generic/475 testcase from > fstests. > > A test case will follow in a separate patch. > > Signed-off-by: Filipe Manana > --- > check/main.c | 270 > ++- > 1 file changed, 268 insertions(+), 2 deletions(-) > > diff --git a/check/main.c b/check/main.c > index 392195ca..bb816311 100644 > --- a/check/main.c > +++ b/check/main.c > @@ -1424,6 +1424,264 @@ static int process_inode_extref(struct extent_buffer > *eb, > > } > > +/* > + * Check if the inode referenced by the given data reference uses the extent > + * at disk_bytenr as a non-prealloc extent. > + * > + * Returns 1 if true, 0 if false and < 0 on error. > + */ > +static int check_prealloc_data_ref(struct btrfs_fs_info *fs_info, > +u64 ino, > +u64 disk_bytenr, > +struct btrfs_extent_data_ref *dref, > +struct extent_buffer *eb) > +{ > + u64 rootid = btrfs_extent_data_ref_root(eb, dref); > + u64 objectid = btrfs_extent_data_ref_objectid(eb, dref); > + u64 offset = btrfs_extent_data_ref_offset(eb, dref); > + struct btrfs_root *root; > + struct btrfs_key key; > + struct btrfs_path path; > + int ret; > + > + if (objectid == ino) > + return 0; > + > + btrfs_init_path(); > + key.objectid = rootid; > + key.type = BTRFS_ROOT_ITEM_KEY; > + key.offset = (u64)-1; > + root = btrfs_read_fs_root(fs_info, ); > + if (IS_ERR(root)) { > + ret = PTR_ERR(root); > + goto out; > + } > + > + key.objectid = objectid; > + key.type = BTRFS_EXTENT_DATA_KEY; > + key.offset = offset; > + ret = btrfs_search_slot(NULL, root, , , 0, 0); > + if (ret > 0) { > + fprintf(stderr, > + "Missing file extent item for inode %llu, root %llu, > offset %llu", > + objectid, rootid, offset); > + ret = -ENOENT; > + } > + if (ret < 0) > + goto out; > + > + while (true) { > + struct btrfs_file_extent_item *fi; > + int extent_type; > + > + if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) { > + ret = btrfs_next_leaf(fs_info->extent_root, ); > + if (ret < 0) > + goto out; > + if (ret > 0) > + break; > + } > + > + btrfs_item_key_to_cpu(path.nodes[0], , path.slots[0]); > + if (key.objectid != objectid || > + key.type != BTRFS_EXTENT_DATA_KEY) > + break; > + > + fi = btrfs_item_ptr(path.nodes[0], path.slots[0], > + struct btrfs_file_extent_item); > + extent_type = btrfs_file_extent_type(path.nodes[0], fi); > + if (extent_type != BTRFS_FILE_EXTENT_REG && > + extent_type != BTRFS_FILE_EXTENT_PREALLOC) > + goto next; > + > + if (btrfs_file_extent_disk_bytenr(path.nodes[0],
Re: [PATCH 1/2] Btrfs-progs: check, fix false error reports for shared prealloc extents
On 13.03.2018 20:47, fdman...@kernel.org wrote: > From: Filipe Manana> > Under some cases the filesystem checker reports an error when it finds > checksum items for an extent that is referenced by an inode as a prealloc > extent. Such cases are not an error when the extent is actually shared > (was cloned/reflinked) with other inodes and was written through one of > those other inodes. > > Example: > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt > > $ touch /mnt/foo > $ xfs_io -c "falloc 0 256K" /mnt/foo > $ sync > > $ xfs_io -c "pwrite -S 0xab -b 0 256K" /mnt/foo > $ touch /mnt/bar > $ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar > $ xfs_io -c "fsync" /mnt/bar > > > $ mount /dev/sdb /mnt > $ umount /mnt > > $ btrfs check /dev/sdc > Checking filesystem on /dev/sdb > UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39 > checking extents > checking free space cache > checking fs roots > root 5 inode 257 errors 800, odd csum item > ERROR: errors found in fs roots > found 688128 bytes used, error(s) found > total csum bytes: 256 > total tree bytes: 163840 > total fs tree bytes: 65536 > total extent tree bytes: 16384 > btree space waste bytes: 138819 > file data blocks allocated: 10747904 >referenced 10747904 > $ echo $? > 1 > > So tech check to not report such cases as errors by checking if the > extent is shared with other inodes and if so, consider it an error the > existence of checksum items only if all those other inodes are referencing > the extent as a prealloc extent. > This case can be hit often when running the generic/475 testcase from > fstests. > > A test case will follow in a separate patch. > > Signed-off-by: Filipe Manana > --- > check/main.c | 270 > ++- > 1 file changed, 268 insertions(+), 2 deletions(-) > > diff --git a/check/main.c b/check/main.c > index 392195ca..bb816311 100644 > --- a/check/main.c > +++ b/check/main.c > @@ -1424,6 +1424,264 @@ static int process_inode_extref(struct extent_buffer > *eb, > > } > > +/* > + * Check if the inode referenced by the given data reference uses the extent > + * at disk_bytenr as a non-prealloc extent. > + * > + * Returns 1 if true, 0 if false and < 0 on error. > + */ > +static int check_prealloc_data_ref(struct btrfs_fs_info *fs_info, > +u64 ino, > +u64 disk_bytenr, > +struct btrfs_extent_data_ref *dref, > +struct extent_buffer *eb) > +{ > + u64 rootid = btrfs_extent_data_ref_root(eb, dref); > + u64 objectid = btrfs_extent_data_ref_objectid(eb, dref); nit: Shouldn't this ino be equal to the ino passed to the function? If so objectid can be removed and when setting up the key for the second search you just assigne ino to its objectid > + u64 offset = btrfs_extent_data_ref_offset(eb, dref); > + struct btrfs_root *root; > + struct btrfs_key key; > + struct btrfs_path path; > + int ret; > + > + if (objectid == ino) > + return 0; > + > + btrfs_init_path(); > + key.objectid = rootid; > + key.type = BTRFS_ROOT_ITEM_KEY; > + key.offset = (u64)-1; > + root = btrfs_read_fs_root(fs_info, ); > + if (IS_ERR(root)) { > + ret = PTR_ERR(root); > + goto out; > + } > + > + key.objectid = objectid; > + key.type = BTRFS_EXTENT_DATA_KEY; > + key.offset = offset; > + ret = btrfs_search_slot(NULL, root, , , 0, 0); > + if (ret > 0) { > + fprintf(stderr, > + "Missing file extent item for inode %llu, root %llu, > offset %llu", > + objectid, rootid, offset); > + ret = -ENOENT; > + } > + if (ret < 0) > + goto out; > + > + while (true) { > + struct btrfs_file_extent_item *fi; > + int extent_type; > + > + if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) { > + ret = btrfs_next_leaf(fs_info->extent_root, ); > + if (ret < 0) > + goto out; > + if (ret > 0) > + break; > + } > + > + btrfs_item_key_to_cpu(path.nodes[0], , path.slots[0]); > + if (key.objectid != objectid || > + key.type != BTRFS_EXTENT_DATA_KEY) > + break; > + > + fi = btrfs_item_ptr(path.nodes[0], path.slots[0], > + struct btrfs_file_extent_item); > + extent_type = btrfs_file_extent_type(path.nodes[0], fi); > + if (extent_type != BTRFS_FILE_EXTENT_REG && > + extent_type != BTRFS_FILE_EXTENT_PREALLOC) > + goto next; > + > + if (btrfs_file_extent_disk_bytenr(path.nodes[0],
Re: [PATCH 1/2] Btrfs-progs: check, fix false error reports for shared prealloc extents
On Wed, Mar 14, 2018 at 09:32:50AM +0800, Qu Wenruo wrote: > > >On 2018年03月14日 02:47, fdman...@kernel.org wrote: >> From: Filipe Manana>> >> Under some cases the filesystem checker reports an error when it finds >> checksum items for an extent that is referenced by an inode as a prealloc >> extent. Such cases are not an error when the extent is actually shared >> (was cloned/reflinked) with other inodes and was written through one of >> those other inodes. >> >> Example: >> >> $ mkfs.btrfs -f /dev/sdb >> $ mount /dev/sdb /mnt >> >> $ touch /mnt/foo >> $ xfs_io -c "falloc 0 256K" /mnt/foo >> $ sync >> >> $ xfs_io -c "pwrite -S 0xab -b 0 256K" /mnt/foo >> $ touch /mnt/bar >> $ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar >> $ xfs_io -c "fsync" /mnt/bar >> >> >> $ mount /dev/sdb /mnt >> $ umount /mnt >> >> $ btrfs check /dev/sdc >> Checking filesystem on /dev/sdb >> UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39 >> checking extents >> checking free space cache >> checking fs roots >> root 5 inode 257 errors 800, odd csum item >> ERROR: errors found in fs roots >> found 688128 bytes used, error(s) found >> total csum bytes: 256 >> total tree bytes: 163840 >> total fs tree bytes: 65536 >> total extent tree bytes: 16384 >> btree space waste bytes: 138819 >> file data blocks allocated: 10747904 >>referenced 10747904 >> $ echo $? >> 1 >> >> So tech check to not report such cases as errors by checking if the >> extent is shared with other inodes and if so, consider it an error the >> existence of checksum items only if all those other inodes are referencing >> the extent as a prealloc extent. >> This case can be hit often when running the generic/475 testcase from >> fstests. >> >> A test case will follow in a separate patch. >> >> Signed-off-by: Filipe Manana > >Looks good overall, but still some small nitpicks. > >One is, the fix is only for original mode, while the fix itself has >nothing mode dependent, so it's better to put the check into >check/mode-original.[ch] so lowmem mode can also benefit from it. Fix a tyro: s/mode-original/mode-common > >> --- >> check/main.c | 270 >> ++- >> 1 file changed, 268 insertions(+), 2 deletions(-) >> >> diff --git a/check/main.c b/check/main.c >> index 392195ca..bb816311 100644 >> --- a/check/main.c >> +++ b/check/main.c >> @@ -1424,6 +1424,264 @@ static int process_inode_extref(struct extent_buffer >> *eb, >> >> } >> >> +/* >> + * Check if the inode referenced by the given data reference uses the extent >> + * at disk_bytenr as a non-prealloc extent. >> + * >> + * Returns 1 if true, 0 if false and < 0 on error. >> + */ > >The check itself (along with check_prealloc_shared_data_ref) is good >enough to detect any regular file extents inside the prealloc extent. > >But when it finds any regular extents, it doesn't check if it's covered >by csum correctly. > >For example: > >|<-- prealloc extent range -->| >|<- the *only* regular extent ->| >|< range covered by csum >| >|<- ODD ->| > >Or >|<-- prealloc extent range -->| >|<- the *only* regular extent ->| >|<- csum ->| > |<-- ODD --->| > >So at least in theory, the best solution is not to just check if there >is any regular extents inside the large prealloc extent, but also check >how many csum bytes the regular extents contribute, and then compare it >to the result of count_csum_range(). > >Thanks, >Qu > >> +static int check_prealloc_data_ref(struct btrfs_fs_info *fs_info, >> + u64 ino, >> + u64 disk_bytenr, >> + struct btrfs_extent_data_ref *dref, >> + struct extent_buffer *eb) >> +{ >> +u64 rootid = btrfs_extent_data_ref_root(eb, dref); >> +u64 objectid = btrfs_extent_data_ref_objectid(eb, dref); >> +u64 offset = btrfs_extent_data_ref_offset(eb, dref); >> +struct btrfs_root *root; >> +struct btrfs_key key; >> +struct btrfs_path path; >> +int ret; >> + >> +if (objectid == ino) >> +return 0; >> + >> +btrfs_init_path(); >> +key.objectid = rootid; >> +key.type = BTRFS_ROOT_ITEM_KEY; >> +key.offset = (u64)-1; >> +root = btrfs_read_fs_root(fs_info, ); >> +if (IS_ERR(root)) { >> +ret = PTR_ERR(root); >> +goto out; >> +} >> + >> +key.objectid = objectid; >> +key.type = BTRFS_EXTENT_DATA_KEY; >> +key.offset = offset; >> +ret = btrfs_search_slot(NULL, root, , , 0, 0); >> +if (ret > 0) { >> +fprintf(stderr, >> +"Missing file extent item for inode %llu, root %llu, >> offset %llu", >> +objectid, rootid, offset); >> +ret = -ENOENT;
Re: [PATCH 1/2] Btrfs-progs: check, fix false error reports for shared prealloc extents
On 2018年03月14日 09:32, Qu Wenruo wrote: > > > On 2018年03月14日 02:47, fdman...@kernel.org wrote: >> From: Filipe Manana>> >> Under some cases the filesystem checker reports an error when it finds >> checksum items for an extent that is referenced by an inode as a prealloc >> extent. Such cases are not an error when the extent is actually shared >> (was cloned/reflinked) with other inodes and was written through one of >> those other inodes. >> >> Example: >> >> $ mkfs.btrfs -f /dev/sdb >> $ mount /dev/sdb /mnt >> >> $ touch /mnt/foo >> $ xfs_io -c "falloc 0 256K" /mnt/foo >> $ sync >> >> $ xfs_io -c "pwrite -S 0xab -b 0 256K" /mnt/foo >> $ touch /mnt/bar >> $ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar >> $ xfs_io -c "fsync" /mnt/bar >> >> >> $ mount /dev/sdb /mnt >> $ umount /mnt >> >> $ btrfs check /dev/sdc >> Checking filesystem on /dev/sdb >> UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39 >> checking extents >> checking free space cache >> checking fs roots >> root 5 inode 257 errors 800, odd csum item >> ERROR: errors found in fs roots >> found 688128 bytes used, error(s) found >> total csum bytes: 256 >> total tree bytes: 163840 >> total fs tree bytes: 65536 >> total extent tree bytes: 16384 >> btree space waste bytes: 138819 >> file data blocks allocated: 10747904 >>referenced 10747904 >> $ echo $? >> 1 >> >> So tech check to not report such cases as errors by checking if the >> extent is shared with other inodes and if so, consider it an error the >> existence of checksum items only if all those other inodes are referencing >> the extent as a prealloc extent. >> This case can be hit often when running the generic/475 testcase from >> fstests. >> >> A test case will follow in a separate patch. >> >> Signed-off-by: Filipe Manana > > Looks good overall, but still some small nitpicks. > > One is, the fix is only for original mode, while the fix itself has > nothing mode dependent, so it's better to put the check into > check/mode-original.[ch] so lowmem mode can also benefit from it. ^^^ I mean check/mode-common Sorry for that. Thanks, Qu > >> --- >> check/main.c | 270 >> ++- >> 1 file changed, 268 insertions(+), 2 deletions(-) >> >> diff --git a/check/main.c b/check/main.c >> index 392195ca..bb816311 100644 >> --- a/check/main.c >> +++ b/check/main.c >> @@ -1424,6 +1424,264 @@ static int process_inode_extref(struct extent_buffer >> *eb, >> >> } >> >> +/* >> + * Check if the inode referenced by the given data reference uses the extent >> + * at disk_bytenr as a non-prealloc extent. >> + * >> + * Returns 1 if true, 0 if false and < 0 on error. >> + */ > > The check itself (along with check_prealloc_shared_data_ref) is good > enough to detect any regular file extents inside the prealloc extent. > > But when it finds any regular extents, it doesn't check if it's covered > by csum correctly. > > For example: > > |<-- prealloc extent range -->| > |<- the *only* regular extent ->| > |< range covered by csum >| > |<- ODD ->| > > Or > |<-- prealloc extent range -->| > |<- the *only* regular extent ->| > |<- csum ->| >|<-- ODD --->| > > So at least in theory, the best solution is not to just check if there > is any regular extents inside the large prealloc extent, but also check > how many csum bytes the regular extents contribute, and then compare it > to the result of count_csum_range(). > > Thanks, > Qu > >> +static int check_prealloc_data_ref(struct btrfs_fs_info *fs_info, >> + u64 ino, >> + u64 disk_bytenr, >> + struct btrfs_extent_data_ref *dref, >> + struct extent_buffer *eb) >> +{ >> +u64 rootid = btrfs_extent_data_ref_root(eb, dref); >> +u64 objectid = btrfs_extent_data_ref_objectid(eb, dref); >> +u64 offset = btrfs_extent_data_ref_offset(eb, dref); >> +struct btrfs_root *root; >> +struct btrfs_key key; >> +struct btrfs_path path; >> +int ret; >> + >> +if (objectid == ino) >> +return 0; >> + >> +btrfs_init_path(); >> +key.objectid = rootid; >> +key.type = BTRFS_ROOT_ITEM_KEY; >> +key.offset = (u64)-1; >> +root = btrfs_read_fs_root(fs_info, ); >> +if (IS_ERR(root)) { >> +ret = PTR_ERR(root); >> +goto out; >> +} >> + >> +key.objectid = objectid; >> +key.type = BTRFS_EXTENT_DATA_KEY; >> +key.offset = offset; >> +ret = btrfs_search_slot(NULL, root, , , 0, 0); >> +if (ret > 0) { >> +fprintf(stderr, >> +"Missing file extent item for inode %llu, root %llu, >> offset %llu", >> +objectid,
Re: [PATCH 1/2] Btrfs-progs: check, fix false error reports for shared prealloc extents
On 2018年03月14日 02:47, fdman...@kernel.org wrote: > From: Filipe Manana> > Under some cases the filesystem checker reports an error when it finds > checksum items for an extent that is referenced by an inode as a prealloc > extent. Such cases are not an error when the extent is actually shared > (was cloned/reflinked) with other inodes and was written through one of > those other inodes. > > Example: > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt > > $ touch /mnt/foo > $ xfs_io -c "falloc 0 256K" /mnt/foo > $ sync > > $ xfs_io -c "pwrite -S 0xab -b 0 256K" /mnt/foo > $ touch /mnt/bar > $ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar > $ xfs_io -c "fsync" /mnt/bar > > > $ mount /dev/sdb /mnt > $ umount /mnt > > $ btrfs check /dev/sdc > Checking filesystem on /dev/sdb > UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39 > checking extents > checking free space cache > checking fs roots > root 5 inode 257 errors 800, odd csum item > ERROR: errors found in fs roots > found 688128 bytes used, error(s) found > total csum bytes: 256 > total tree bytes: 163840 > total fs tree bytes: 65536 > total extent tree bytes: 16384 > btree space waste bytes: 138819 > file data blocks allocated: 10747904 >referenced 10747904 > $ echo $? > 1 > > So tech check to not report such cases as errors by checking if the > extent is shared with other inodes and if so, consider it an error the > existence of checksum items only if all those other inodes are referencing > the extent as a prealloc extent. > This case can be hit often when running the generic/475 testcase from > fstests. > > A test case will follow in a separate patch. > > Signed-off-by: Filipe Manana Looks good overall, but still some small nitpicks. One is, the fix is only for original mode, while the fix itself has nothing mode dependent, so it's better to put the check into check/mode-original.[ch] so lowmem mode can also benefit from it. > --- > check/main.c | 270 > ++- > 1 file changed, 268 insertions(+), 2 deletions(-) > > diff --git a/check/main.c b/check/main.c > index 392195ca..bb816311 100644 > --- a/check/main.c > +++ b/check/main.c > @@ -1424,6 +1424,264 @@ static int process_inode_extref(struct extent_buffer > *eb, > > } > > +/* > + * Check if the inode referenced by the given data reference uses the extent > + * at disk_bytenr as a non-prealloc extent. > + * > + * Returns 1 if true, 0 if false and < 0 on error. > + */ The check itself (along with check_prealloc_shared_data_ref) is good enough to detect any regular file extents inside the prealloc extent. But when it finds any regular extents, it doesn't check if it's covered by csum correctly. For example: |<-- prealloc extent range -->| |<- the *only* regular extent ->| |< range covered by csum >| |<- ODD ->| Or |<-- prealloc extent range -->| |<- the *only* regular extent ->| |<- csum ->| |<-- ODD --->| So at least in theory, the best solution is not to just check if there is any regular extents inside the large prealloc extent, but also check how many csum bytes the regular extents contribute, and then compare it to the result of count_csum_range(). Thanks, Qu > +static int check_prealloc_data_ref(struct btrfs_fs_info *fs_info, > +u64 ino, > +u64 disk_bytenr, > +struct btrfs_extent_data_ref *dref, > +struct extent_buffer *eb) > +{ > + u64 rootid = btrfs_extent_data_ref_root(eb, dref); > + u64 objectid = btrfs_extent_data_ref_objectid(eb, dref); > + u64 offset = btrfs_extent_data_ref_offset(eb, dref); > + struct btrfs_root *root; > + struct btrfs_key key; > + struct btrfs_path path; > + int ret; > + > + if (objectid == ino) > + return 0; > + > + btrfs_init_path(); > + key.objectid = rootid; > + key.type = BTRFS_ROOT_ITEM_KEY; > + key.offset = (u64)-1; > + root = btrfs_read_fs_root(fs_info, ); > + if (IS_ERR(root)) { > + ret = PTR_ERR(root); > + goto out; > + } > + > + key.objectid = objectid; > + key.type = BTRFS_EXTENT_DATA_KEY; > + key.offset = offset; > + ret = btrfs_search_slot(NULL, root, , , 0, 0); > + if (ret > 0) { > + fprintf(stderr, > + "Missing file extent item for inode %llu, root %llu, > offset %llu", > + objectid, rootid, offset); > + ret = -ENOENT; > + } > + if (ret < 0) > + goto out; > + > + while (true) { > + struct btrfs_file_extent_item *fi; > + int extent_type; > + > + if (path.slots[0] >=
[PATCH 1/2] Btrfs-progs: check, fix false error reports for shared prealloc extents
From: Filipe MananaUnder some cases the filesystem checker reports an error when it finds checksum items for an extent that is referenced by an inode as a prealloc extent. Such cases are not an error when the extent is actually shared (was cloned/reflinked) with other inodes and was written through one of those other inodes. Example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ touch /mnt/foo $ xfs_io -c "falloc 0 256K" /mnt/foo $ sync $ xfs_io -c "pwrite -S 0xab -b 0 256K" /mnt/foo $ touch /mnt/bar $ xfs_io -c "reflink /mnt/foo 0 0 256K" /mnt/bar $ xfs_io -c "fsync" /mnt/bar $ mount /dev/sdb /mnt $ umount /mnt $ btrfs check /dev/sdc Checking filesystem on /dev/sdb UUID: 52d3006e-ee3b-40eb-aa21-e56253a03d39 checking extents checking free space cache checking fs roots root 5 inode 257 errors 800, odd csum item ERROR: errors found in fs roots found 688128 bytes used, error(s) found total csum bytes: 256 total tree bytes: 163840 total fs tree bytes: 65536 total extent tree bytes: 16384 btree space waste bytes: 138819 file data blocks allocated: 10747904 referenced 10747904 $ echo $? 1 So tech check to not report such cases as errors by checking if the extent is shared with other inodes and if so, consider it an error the existence of checksum items only if all those other inodes are referencing the extent as a prealloc extent. This case can be hit often when running the generic/475 testcase from fstests. A test case will follow in a separate patch. Signed-off-by: Filipe Manana --- check/main.c | 270 ++- 1 file changed, 268 insertions(+), 2 deletions(-) diff --git a/check/main.c b/check/main.c index 392195ca..bb816311 100644 --- a/check/main.c +++ b/check/main.c @@ -1424,6 +1424,264 @@ static int process_inode_extref(struct extent_buffer *eb, } +/* + * Check if the inode referenced by the given data reference uses the extent + * at disk_bytenr as a non-prealloc extent. + * + * Returns 1 if true, 0 if false and < 0 on error. + */ +static int check_prealloc_data_ref(struct btrfs_fs_info *fs_info, + u64 ino, + u64 disk_bytenr, + struct btrfs_extent_data_ref *dref, + struct extent_buffer *eb) +{ + u64 rootid = btrfs_extent_data_ref_root(eb, dref); + u64 objectid = btrfs_extent_data_ref_objectid(eb, dref); + u64 offset = btrfs_extent_data_ref_offset(eb, dref); + struct btrfs_root *root; + struct btrfs_key key; + struct btrfs_path path; + int ret; + + if (objectid == ino) + return 0; + + btrfs_init_path(); + key.objectid = rootid; + key.type = BTRFS_ROOT_ITEM_KEY; + key.offset = (u64)-1; + root = btrfs_read_fs_root(fs_info, ); + if (IS_ERR(root)) { + ret = PTR_ERR(root); + goto out; + } + + key.objectid = objectid; + key.type = BTRFS_EXTENT_DATA_KEY; + key.offset = offset; + ret = btrfs_search_slot(NULL, root, , , 0, 0); + if (ret > 0) { + fprintf(stderr, + "Missing file extent item for inode %llu, root %llu, offset %llu", + objectid, rootid, offset); + ret = -ENOENT; + } + if (ret < 0) + goto out; + + while (true) { + struct btrfs_file_extent_item *fi; + int extent_type; + + if (path.slots[0] >= btrfs_header_nritems(path.nodes[0])) { + ret = btrfs_next_leaf(fs_info->extent_root, ); + if (ret < 0) + goto out; + if (ret > 0) + break; + } + + btrfs_item_key_to_cpu(path.nodes[0], , path.slots[0]); + if (key.objectid != objectid || + key.type != BTRFS_EXTENT_DATA_KEY) + break; + + fi = btrfs_item_ptr(path.nodes[0], path.slots[0], + struct btrfs_file_extent_item); + extent_type = btrfs_file_extent_type(path.nodes[0], fi); + if (extent_type != BTRFS_FILE_EXTENT_REG && + extent_type != BTRFS_FILE_EXTENT_PREALLOC) + goto next; + + if (btrfs_file_extent_disk_bytenr(path.nodes[0], fi) != + disk_bytenr) + break; + + if (extent_type == BTRFS_FILE_EXTENT_REG) { + ret = 1; + goto out; + } +next: + path.slots[0]++; + } + ret = 0; + out: + btrfs_release_path(); + return ret; +} + +/* + * Check if a shared data reference points to a node that