Re: inline extents

2018-09-20 Thread Theodore Y. Ts'o
On Wed, Sep 19, 2018 at 09:18:16PM -0600, Chris Murphy wrote:
> > ext4 has inline data, too, so there's every chance grub will corrupt
> > ext4 filesystems with tit's wonderful new feature. I'm not sure if
> > the ext4 metadata cksums cover the entire inode and inline data, but
> > if they do it's the same problem as btrfs.
> 
> I don't see inline used with a default mkfs, but I do see metadata_csum

The grub environment block is 1024 bytes, so unless you are using a 4k
inode size (the default is 256 bytes), the grub environment block
won't be an inline data file.  So in practice, this won't be a
problem.  So in practice this won't be a problem for ext4 today.

There are ways that grub can query the file system (using FIEMAP or
FS_IOC_GETFLAGS) to see whether or not the file is an inline file.
For xfs grub would also need to make sure the file isn't reflinked
file (which can also be queried using FIEMAP).  That's fine for
inline, since a file won't magically change from stored in a block
versus inline, so grub can check this at grub setup time.  But the
problem with reflink is that a file that was originally using a block
exclusively could later have that block be shared by another file, at
which point it's only safe to write the block using copy-on-write.

> I know some of the file systems have reserve areas for bootloader
> stuff. I'm not sure if that's preferred over bootloaders just getting
> their own partition and controlling it stem to stern however they
> want.

Ext4 had reserved the bootloader inode, and we have defined a way for
a bootloader to access it.  The original intent was that not it just
be fore the environment block, but also the grub stage 1.5 or stage 2
loader.  This would allow grub to just use a fixed list of blocks or
extents, and not need to understand the ext4 extents structures.  We
added way this when, because we anticipated this would be easier for
grub than having it learn how to read ext4 extents structures.
Instead, the bootloader would at setup time use the EXT4_IOC_SWAP_BOOT
ioctl and then use a fixed list of blocks or block extents like LILO
used to do.

Of course, grub exceeded expectations and actually learned how to read
ext4 file systems, and probably we (the ext4 developers) should have
reached out to GRUB folks.

Cheers,

- Ted


Re: inline extents

2018-09-19 Thread Chris Murphy
Adding fsdevel@, linux-ext4, and btrfs@ (which has a separate subject
on this same issue)



On Wed, Sep 19, 2018 at 7:45 PM, Dave Chinner  wrote:
>On Wed, Sep 19, 2018 at 10:23:38AM -0600, Chris Murphy wrote:
>> Fedora 29 has a new feature to test if boot+startup fails, so the
>> bootloader can do a fallback at next boot, to a previously working
>> entry. Part of this means GRUB (the bootloader code, not the user
>> space code) uses "save_env" to overwrite the 1024 data bytes with
>> updated environment information.
>
> That's just broken. Illegal. Completely unsupportable. Doesn't
> matter what the filesystem is, nobody is allowed to write directly
> to the block device a filesystem owns.

Yeah, the word I'm thinking of is abomination.

However in their defense, grubenv and the 'save_env' command are old features:

line 3638 @node Environment block
http://git.savannah.gnu.org/cgit/grub.git/tree/docs/grub.texi

"For safety reasons, this storage is only available when installed on a plain
disk (no LVM or RAID), using a non-checksumming filesystem (no ZFS), and
using BIOS or EFI functions (no ATA, USB or IEEE1275)."

I haven't checked how it tests for this. But by now, it should list
the supported file systems, rather than what's exempt. That's a
shorter list.


> ext4 has inline data, too, so there's every chance grub will corrupt
> ext4 filesystems with tit's wonderful new feature. I'm not sure if
> the ext4 metadata cksums cover the entire inode and inline data, but
> if they do it's the same problem as btrfs.

I don't see inline used with a default mkfs, but I do see metadata_csum

e2fsprogs-1.44.3-1.fc29.x86_64

Filesystem features: has_journal ext_attr resize_inode dir_index
filetype extent 64bit flex_bg sparse_super large_file huge_file
dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl

>
>> For XFS, I'm not sure how the inline extent is saved, and whether
>> metadata checksumming includes or excludes the inline extent.
>
> When XFS implements this, it will be like btrfs as the data will be
> covered by the metadata CRCs for the inode, and so writing directly
> to it would corrupt the inode and render it unreadable by the
> filesystem.

Good to know.


>
>> I'm also kinda ignoring the reflink ramifications of this behavior,
>> for now. Let's just say even if there's no corruption I'm really
>> suspicious of bootloader code writing anything, even what seems to be
>> a simple overwrite of two sectors.
>
> You're not the only one
>
> Like I said, it doesn't matter what the filesystem is, overwriting
> file data by writing directly to the block device is not
> supportable. It's essentially a filesystem corruption vector, and
> grub needs to have that functionality removed immediately.

I'm in agreement with respect to the more complex file systems. We've
already realized the folly of the bootloader being unable to do
journal replay, ergo it doesn't really for sure have a complete
picture of the file system anyway. That's suboptimal when it results
in boot failure. But if it were going to use stale file system
information, get a wrong idea of the file system, and then use that to
do even 1024 bytes of writes? No, no, and no.

Meanwhile, also in Fedoraland, it's one of the distros where grubenv
and grub.cfg stuff is on the EFI System partition, which is FAT. This
overwrite behavior will work there, but even this case is a kind of
betrayal that the file is being modified, without its metadata being
updated. I think it's an old era hack that by today's standards simply
isn't good enough. I'm a little surprised that all UEFI
implementations permit arbitrary writes from the pre-boot environment
to arbitrary block devices, even with Secure Boot enabled. That seems
specious.

I know some of the file systems have reserve areas for bootloader
stuff. I'm not sure if that's preferred over bootloaders just getting
their own partition and controlling it stem to stern however they
want.


-- 
Chris Murphy


Re: [PATCH v2] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-25 Thread Nikolay Borisov



On 20.06.2018 12:02, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 


> So fix this by ensuring the physical offset is always set to 0 when we
> are processing an extent with a special block_start value.
> 
> Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count 
> is zero")
> Signed-off-by: Filipe Manana 

At first I was "wtf" since I had missed the u64 cast. After realising
that the constants are in fact always parsed as positive numbers then it
makes sense.

Reviewed-by: Nikolay Borisov 

> ---
> 
> v2: Set the physical offset to 0 for other extent maps with a special
> block_start value as well.
> 
>  fs/btrfs/extent_io.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 8e4a7cdbc9f5..1aa91d57404a 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4545,8 +4545,11 @@ int extent_fiemap(struct inode *inode, struct 
> fiemap_extent_info *fieinfo,
>   offset_in_extent = em_start - em->start;
>   em_end = extent_map_end(em);
>   em_len = em_end - em_start;
> - disko = em->block_start + offset_in_extent;
>   flags = 0;
> + if (em->block_start < EXTENT_MAP_LAST_BYTE)
> + disko = em->block_start + offset_in_extent;
> + else
> + disko = 0;
>  
>   /*
>* bump off for our next call to get_extent
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-20 Thread fdmanana
From: Filipe Manana 

Commit 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when
fm_extent_count is zero") introduced a regression where we no longer
report 0 as the physical offset for inline extents (and other extents
with a special block_start value). This is because it always sets the
variable used to report the physical offset ("disko") as em->block_start
plus some offset, and em->block_start has the value 18446744073709551614
((u64) -2) for inline extents.

This made the btrfs test 004 (from fstests) often fail, for example, for
a file with an inline extent we have the following items in the subvolume
tree:

item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160
   generation 25 transid 38 size 1525 nbytes 1525
   block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
   sequence 0 flags 0x2(none)
   atime 1529342058.461891730 (2018-06-18 18:14:18)
   ctime 1529342058.461891730 (2018-06-18 18:14:18)
   mtime 1529342058.461891730 (2018-06-18 18:14:18)
   otime 1529342055.869892885 (2018-06-18 18:14:15)
item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13
   index 25 namelen 3 name: fc7
item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546
   generation 38 type 0 (inline)
   inline extent data size 1525 ram_bytes 1525 compression 0 (none)

Then when test 004 invoked fiemap against the file it got a non-zero
physical offset:

 $ filefrag -v /mnt/p0/d4/d7/fc7
 Filesystem type is: 9123683e
 File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes)
  ext: logical_offset:physical_offset: length:   expected: flags:
0:0..4095: 18446744073709551614..  4093:   4096:
 last,not_aligned,inline,eof
 /mnt/p0/d4/d7/fc7: 1 extent found

This resulted in the test failing like this:

btrfs/004 49s ... [failed, exit status 1]- output mismatch (see 
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
--- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad  
2018-06-18 18:15:02.385872155 +0100
@@ -1,3 +1,10 @@
 QA output created by 004
 *** test backref walking
-*** done
+./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer expression 
expected
+ERROR: 7.55578637259143e+22 is not a valid numeric value.
+unexpected output from
+   /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal 
logical-resolve -s 65536 -P 7.55578637259143e+22 
/home/fdmanana/btrfs-tests/scratch_1
...
(Run 'diff -u tests/btrfs/004.out 
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad'  to see the entire 
diff)
Ran: btrfs/004

The large number in scientific notation reported as an invalid numeric
value is the result from the filter passed to perl which multiplies the
physical offset by the block size reported by fiemap.

So fix this by ensuring the physical offset is always set to 0 when we
are processing an extent with a special block_start value.

Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count 
is zero")
Signed-off-by: Filipe Manana 
---

v2: Set the physical offset to 0 for other extent maps with a special
block_start value as well.

 fs/btrfs/extent_io.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8e4a7cdbc9f5..1aa91d57404a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4545,8 +4545,11 @@ int extent_fiemap(struct inode *inode, struct 
fiemap_extent_info *fieinfo,
offset_in_extent = em_start - em->start;
em_end = extent_map_end(em);
em_len = em_end - em_start;
-   disko = em->block_start + offset_in_extent;
flags = 0;
+   if (em->block_start < EXTENT_MAP_LAST_BYTE)
+   disko = em->block_start + offset_in_extent;
+   else
+   disko = 0;
 
/*
 * bump off for our next call to get_extent
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-20 Thread Filipe Manana
On Wed, Jun 20, 2018 at 3:55 AM, robbieko  wrote:
> fdman...@kernel.org 於 2018-06-19 19:31 寫到:
>
>> From: Filipe Manana 
>>
>> Commit 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when
>> fm_extent_count is zero") introduced a regression where we no longer
>> report 0 as the physical offset for inline extents. This is because it
>> always sets the variable used to report the physical offset ("disko")
>> as em->block_start plus some offset, and em->block_start has the value
>> 18446744073709551614 ((u64) -2) for inline extents.
>>
>> This made the btrfs test 004 (from fstests) often fail, for example, for
>> a file with an inline extent we have the following items in the subvolume
>> tree:
>>
>> item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160
>>generation 25 transid 38 size 1525 nbytes 1525
>>block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
>>sequence 0 flags 0x2(none)
>>atime 1529342058.461891730 (2018-06-18 18:14:18)
>>ctime 1529342058.461891730 (2018-06-18 18:14:18)
>>mtime 1529342058.461891730 (2018-06-18 18:14:18)
>>otime 1529342055.869892885 (2018-06-18 18:14:15)
>> item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13
>>index 25 namelen 3 name: fc7
>> item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546
>>generation 38 type 0 (inline)
>>inline extent data size 1525 ram_bytes 1525 compression 0
>> (none)
>>
>> Then when test 004 invoked fiemap against the file it got a non-zero
>> physical offset:
>>
>>  $ filefrag -v /mnt/p0/d4/d7/fc7
>>  Filesystem type is: 9123683e
>>  File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes)
>>   ext: logical_offset:physical_offset: length:   expected:
>> flags:
>> 0:0..4095: 18446744073709551614..  4093:   4096:
>>   last,not_aligned,inline,eof
>>  /mnt/p0/d4/d7/fc7: 1 extent found
>>
>> This resulted in the test failing like this:
>>
>> btrfs/004 49s ... [failed, exit status 1]- output mismatch (see
>> /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
>> --- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100
>> +++
>> /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad  2018-06-18
>> 18:15:02.385872155 +0100
>> @@ -1,3 +1,10 @@
>>  QA output created by 004
>>  *** test backref walking
>> -*** done
>> +./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer
>> expression expected
>> +ERROR: 7.55578637259143e+22 is not a valid numeric value.
>> +unexpected output from
>> +   /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal
>> logical-resolve -s 65536 -P 7.55578637259143e+22
>> /home/fdmanana/btrfs-tests/scratch_1
>> ...
>> (Run 'diff -u tests/btrfs/004.out
>> /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad'  to see
>> the entire diff)
>> Ran: btrfs/004
>>
>> The large number in scientific notation reported as an invalid numeric
>> value is the result from the filter passed to perl which multiplies the
>> physical offset by the block size reported by fiemap.
>>
>> So fix this by ensuring the physical offset is always set to 0 when we
>> are processing an inline extent.
>>
>> Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when
>> fm_extent_count is zero")
>> Signed-off-by: Filipe Manana 
>> ---
>>  fs/btrfs/extent_io.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
>> index 8e4a7cdbc9f5..978327d98fc5 100644
>> --- a/fs/btrfs/extent_io.c
>> +++ b/fs/btrfs/extent_io.c
>> @@ -4559,6 +4559,7 @@ int extent_fiemap(struct inode *inode, struct
>> fiemap_extent_info *fieinfo,
>> end = 1;
>> flags |= FIEMAP_EXTENT_LAST;
>> } else if (em->block_start == EXTENT_MAP_INLINE) {
>> +   disko = 0;
>> flags |= (FIEMAP_EXTENT_DATA_INLINE |
>>   FIEMAP_EXTENT_NOT_ALIGNED);
>> } else if (em->block_start == EXTENT_MAP_DELALLOC) {
>
>
>
> EXTENT_MAP_DELALLOC should have the same problem.
>
> em->block_start has some special values. The following values should not be
> considered disko
> #define EXTENT_MAP_LAST_BYTE((u64)-4)
> #define EX

Re: [PATCH] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-19 Thread robbieko

fdman...@kernel.org 於 2018-06-19 19:31 寫到:

From: Filipe Manana 

Commit 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when
fm_extent_count is zero") introduced a regression where we no longer
report 0 as the physical offset for inline extents. This is because it
always sets the variable used to report the physical offset ("disko")
as em->block_start plus some offset, and em->block_start has the value
18446744073709551614 ((u64) -2) for inline extents.

This made the btrfs test 004 (from fstests) often fail, for example, 
for
a file with an inline extent we have the following items in the 
subvolume

tree:

item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160
   generation 25 transid 38 size 1525 nbytes 1525
   block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
   sequence 0 flags 0x2(none)
   atime 1529342058.461891730 (2018-06-18 18:14:18)
   ctime 1529342058.461891730 (2018-06-18 18:14:18)
   mtime 1529342058.461891730 (2018-06-18 18:14:18)
   otime 1529342055.869892885 (2018-06-18 18:14:15)
item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13
   index 25 namelen 3 name: fc7
item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546
   generation 38 type 0 (inline)
   inline extent data size 1525 ram_bytes 1525 compression 0 
(none)


Then when test 004 invoked fiemap against the file it got a non-zero
physical offset:

 $ filefrag -v /mnt/p0/d4/d7/fc7
 Filesystem type is: 9123683e
 File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes)
  ext: logical_offset:physical_offset: length:   expected: 
flags:

0:0..4095: 18446744073709551614..  4093:   4096:
  last,not_aligned,inline,eof
 /mnt/p0/d4/d7/fc7: 1 extent found

This resulted in the test failing like this:

btrfs/004 49s ... [failed, exit status 1]- output mismatch (see
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
--- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100
+++
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad  2018-06-18
18:15:02.385872155 +0100
@@ -1,3 +1,10 @@
 QA output created by 004
 *** test backref walking
-*** done
+./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer
expression expected
+ERROR: 7.55578637259143e+22 is not a valid numeric value.
+unexpected output from
+   /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal
logical-resolve -s 65536 -P 7.55578637259143e+22
/home/fdmanana/btrfs-tests/scratch_1
...
(Run 'diff -u tests/btrfs/004.out
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad'  to see
the entire diff)
Ran: btrfs/004

The large number in scientific notation reported as an invalid numeric
value is the result from the filter passed to perl which multiplies the
physical offset by the block size reported by fiemap.

So fix this by ensuring the physical offset is always set to 0 when we
are processing an inline extent.

Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when
fm_extent_count is zero")
Signed-off-by: Filipe Manana 
---
 fs/btrfs/extent_io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8e4a7cdbc9f5..978327d98fc5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4559,6 +4559,7 @@ int extent_fiemap(struct inode *inode, struct
fiemap_extent_info *fieinfo,
end = 1;
flags |= FIEMAP_EXTENT_LAST;
} else if (em->block_start == EXTENT_MAP_INLINE) {
+   disko = 0;
flags |= (FIEMAP_EXTENT_DATA_INLINE |
  FIEMAP_EXTENT_NOT_ALIGNED);
} else if (em->block_start == EXTENT_MAP_DELALLOC) {



EXTENT_MAP_DELALLOC should have the same problem.

em->block_start has some special values. The following values ​​should 
not be considered disko

#define EXTENT_MAP_LAST_BYTE((u64)-4)
#define EXTENT_MAP_HOLE((u64)-3)
#define EXTENT_MAP_INLINE((u64)-2)
#define EXTENT_MAP_DELALLOC((u64)-1)

Is the following change more suitable?
if (em->block_start >= EXTENT_MAP_LAST_BYTE)
disko = 0;
else
disko = em->block_start + offset_in_extent;

Thanks.
Robbie Ko


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-19 Thread David Sterba
On Tue, Jun 19, 2018 at 12:31:42PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana 
> So fix this by ensuring the physical offset is always set to 0 when we
> are processing an inline extent.
> 
> Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count 
> is zero")
> Signed-off-by: Filipe Manana 

Added to 4.18 queue, thanks.

This is a fix for a patch that was in for-next for a few weeks but the
bug was discovered only after the patch got merged to master. I wonder
if there's something to be improved in the patch flow.

I think it should take less time to catch bugs while the patches are
unmerged, either in the mailinglist or in the development branches.
Post-merge fixes will happen of course, but in this particular case it
looks like something that slipped too easily. I did the "fallback"
review and checked that tests regarding fiemap pass, the occasional
failure you mention has not happened on any of my testing setups.

There's another patch for fiemap that seems to have bigger impact and
for that reason I have postponed merging it to 4.18 unlike the first
one, but now I think this requires a testcase.

https://patchwork.kernel.org/patch/10383491/

The patch will be in for-next topic branch until then.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-19 Thread fdmanana
From: Filipe Manana 

Commit 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when
fm_extent_count is zero") introduced a regression where we no longer
report 0 as the physical offset for inline extents. This is because it
always sets the variable used to report the physical offset ("disko")
as em->block_start plus some offset, and em->block_start has the value
18446744073709551614 ((u64) -2) for inline extents.

This made the btrfs test 004 (from fstests) often fail, for example, for
a file with an inline extent we have the following items in the subvolume
tree:

item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160
   generation 25 transid 38 size 1525 nbytes 1525
   block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
   sequence 0 flags 0x2(none)
   atime 1529342058.461891730 (2018-06-18 18:14:18)
   ctime 1529342058.461891730 (2018-06-18 18:14:18)
   mtime 1529342058.461891730 (2018-06-18 18:14:18)
   otime 1529342055.869892885 (2018-06-18 18:14:15)
item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13
   index 25 namelen 3 name: fc7
item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546
   generation 38 type 0 (inline)
   inline extent data size 1525 ram_bytes 1525 compression 0 (none)

Then when test 004 invoked fiemap against the file it got a non-zero
physical offset:

 $ filefrag -v /mnt/p0/d4/d7/fc7
 Filesystem type is: 9123683e
 File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes)
  ext: logical_offset:physical_offset: length:   expected: flags:
0:0..4095: 18446744073709551614..  4093:   4096:
 last,not_aligned,inline,eof
 /mnt/p0/d4/d7/fc7: 1 extent found

This resulted in the test failing like this:

btrfs/004 49s ... [failed, exit status 1]- output mismatch (see 
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
--- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad  
2018-06-18 18:15:02.385872155 +0100
@@ -1,3 +1,10 @@
 QA output created by 004
 *** test backref walking
-*** done
+./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer expression 
expected
+ERROR: 7.55578637259143e+22 is not a valid numeric value.
+unexpected output from
+   /home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal 
logical-resolve -s 65536 -P 7.55578637259143e+22 
/home/fdmanana/btrfs-tests/scratch_1
...
(Run 'diff -u tests/btrfs/004.out 
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad'  to see the entire 
diff)
Ran: btrfs/004

The large number in scientific notation reported as an invalid numeric
value is the result from the filter passed to perl which multiplies the
physical offset by the block size reported by fiemap.

So fix this by ensuring the physical offset is always set to 0 when we
are processing an inline extent.

Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count 
is zero")
Signed-off-by: Filipe Manana 
---
 fs/btrfs/extent_io.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8e4a7cdbc9f5..978327d98fc5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4559,6 +4559,7 @@ int extent_fiemap(struct inode *inode, struct 
fiemap_extent_info *fieinfo,
end = 1;
flags |= FIEMAP_EXTENT_LAST;
} else if (em->block_start == EXTENT_MAP_INLINE) {
+   disko = 0;
flags |= (FIEMAP_EXTENT_DATA_INLINE |
  FIEMAP_EXTENT_NOT_ALIGNED);
} else if (em->block_start == EXTENT_MAP_DELALLOC) {
-- 
2.11.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: add missing memset while reading compressed inline extents

2017-04-07 Thread Filipe Manana
On Fri, Apr 7, 2017 at 2:07 AM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
>
>
> At 04/07/2017 12:07 AM, Filipe Manana wrote:
>>
>> On Wed, Mar 22, 2017 at 2:37 AM, Qu Wenruo <quwen...@cn.fujitsu.com>
>> wrote:
>>>
>>>
>>>
>>> At 03/09/2017 10:05 AM, Zygo Blaxell wrote:
>>>>
>>>>
>>>> On Wed, Mar 08, 2017 at 10:27:33AM +, Filipe Manana wrote:
>>>>>
>>>>>
>>>>> On Wed, Mar 8, 2017 at 3:18 AM, Zygo Blaxell
>>>>> <zblax...@waya.furryterror.org> wrote:
>>>>>>
>>>>>>
>>>>>> From: Zygo Blaxell <ce3g8...@umail.furryterror.org>
>>>>>>
>>>>>> This is a story about 4 distinct (and very old) btrfs bugs.
>>>>>>
>>>>>> Commit c8b978188c ("Btrfs: Add zlib compression support") added
>>>>>> three data corruption bugs for inline extents (bugs #1-3).
>>>>>>
>>>>>> Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
>>>>>> fixed bug #1:  uncompressed inline extents followed by a hole and more
>>>>>> extents could get non-zero data in the hole as they were read.  The
>>>>>> fix
>>>>>> was to add a memset in btrfs_get_extent to zero out the hole.
>>>>>>
>>>>>> Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
>>>>>> fixed bug #2:  compressed inline extents which contained non-zero
>>>>>> bytes
>>>>>> might be replaced with zero bytes in some cases.  This patch removed
>>>>>> an
>>>>>> unhelpful memset from uncompress_inline, but the case where memset is
>>>>>> required was missed.
>>>>>>
>>>>>> There is also a memset in the decompression code, but this only covers
>>>>>> decompressed data that is shorter than the ram_bytes from the extent
>>>>>> ref record.  This memset doesn't cover the region between the end of
>>>>>> the
>>>>>> decompressed data and the end of the page.  It has also moved around a
>>>>>> few times over the years, so there's no single patch to refer to.
>>>>>>
>>>>>> This patch fixes bug #3:  compressed inline extents followed by a hole
>>>>>> and more extents could get non-zero data in the hole as they were read
>>>>>> (i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
>>>>>> The fix is the same:  zero out the hole in the compressed case too,
>>>>>> by putting a memset back in uncompress_inline, but this time with
>>>>>> correct parameters.
>>>>>>
>>>>>> The last and oldest bug, bug #0, is the cause of the offending inline
>>>>>> extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless
>>>>>> quirk
>>>>>> of behavior somewhere in the btrfs write code.  In a few special
>>>>>> cases,
>>>>>> an inline extent and hole are allowed to persist where they normally
>>>>>> would be combined with later extents in the file.
>>>>>>
>>>>>> A fast reproducer for bug #0 is presented below.  A few offending
>>>>>> extents
>>>>>> are also created in the wild during large rsync transfers with the -S
>>>>>> flag.  A Linux kernel build (git checkout; make allyesconfig; make
>>>>>> -j8)
>>>>>> will produce a handful of offending files as well.  Once an offending
>>>>>> file is created, it can present different content to userspace each
>>>>>> time it is read.
>>>>>>
>>>>>> Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
>>>>>> kernel back to v3.5 has this behavior.  There are fossil records of
>>>>>> this
>>>>>> bug's effects in commits all the way back to v2.6.32.  I have no
>>>>>> reason
>>>>>> to believe bug #0 wasn't present at the beginning of btrfs compression
>>>>>> support in v2.6.29, but I can't easily test kernels that old to be
>>>>>> sure.
>>>>>>
>>>>>> It is not clear whether bug #0 is worth fixing.  A fix would likely
>>>>>> require injecting extra reads 

Re: [PATCH] btrfs: add missing memset while reading compressed inline extents

2017-04-06 Thread Qu Wenruo



At 04/07/2017 12:07 AM, Filipe Manana wrote:

On Wed, Mar 22, 2017 at 2:37 AM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:



At 03/09/2017 10:05 AM, Zygo Blaxell wrote:


On Wed, Mar 08, 2017 at 10:27:33AM +, Filipe Manana wrote:


On Wed, Mar 8, 2017 at 3:18 AM, Zygo Blaxell
<zblax...@waya.furryterror.org> wrote:


From: Zygo Blaxell <ce3g8...@umail.furryterror.org>

This is a story about 4 distinct (and very old) btrfs bugs.

Commit c8b978188c ("Btrfs: Add zlib compression support") added
three data corruption bugs for inline extents (bugs #1-3).

Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
fixed bug #1:  uncompressed inline extents followed by a hole and more
extents could get non-zero data in the hole as they were read.  The fix
was to add a memset in btrfs_get_extent to zero out the hole.

Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
fixed bug #2:  compressed inline extents which contained non-zero bytes
might be replaced with zero bytes in some cases.  This patch removed an
unhelpful memset from uncompress_inline, but the case where memset is
required was missed.

There is also a memset in the decompression code, but this only covers
decompressed data that is shorter than the ram_bytes from the extent
ref record.  This memset doesn't cover the region between the end of the
decompressed data and the end of the page.  It has also moved around a
few times over the years, so there's no single patch to refer to.

This patch fixes bug #3:  compressed inline extents followed by a hole
and more extents could get non-zero data in the hole as they were read
(i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
The fix is the same:  zero out the hole in the compressed case too,
by putting a memset back in uncompress_inline, but this time with
correct parameters.

The last and oldest bug, bug #0, is the cause of the offending inline
extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless
quirk
of behavior somewhere in the btrfs write code.  In a few special cases,
an inline extent and hole are allowed to persist where they normally
would be combined with later extents in the file.

A fast reproducer for bug #0 is presented below.  A few offending
extents
are also created in the wild during large rsync transfers with the -S
flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
will produce a handful of offending files as well.  Once an offending
file is created, it can present different content to userspace each
time it is read.

Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
kernel back to v3.5 has this behavior.  There are fossil records of this
bug's effects in commits all the way back to v2.6.32.  I have no reason
to believe bug #0 wasn't present at the beginning of btrfs compression
support in v2.6.29, but I can't easily test kernels that old to be sure.

It is not clear whether bug #0 is worth fixing.  A fix would likely
require injecting extra reads into currently write-only paths, and most
of the exceptional cases caused by bug #0 are already handled now.

Whether we like them or not, bug #0's inline extents followed by holes
are part of the btrfs de-facto disk format now, and we need to be able
to read them without data corruption or an infoleak.  So enough about
bug #0, let's get back to bug #3 (this patch).

An example of on-disk structure leading to data corruption:

item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
inode generation 50 transid 50 size 47424 nbytes 49141
block group 0 mode 100644 links 1 uid 0 gid 0
rdev 0 flags 0x0(none)
item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
inode ref index 3 namelen 10 name: DB_File.so
item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
inline extent data size 1341 ram 4085 compress(zlib)
item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
extent data disk byte 5367308288 nr 20480
extent data offset 0 nr 45056 ram 45056
extent compression(zlib)



So this case is actually different from the reproducer below, because
once a file has prealloc extents, future writes will never be
compressed. That is, the extent at offset 4096 can not have compressed
content using steps like the reproducer below. I can only imagine that
after an falloc to create the extent at 4K, we cloned some compressed
extent from some other file into that offset.



The above comes from one of my captures of the bug appearing in the wild,
not the reproducer from Liu Bo.  I'll make that clearer.



In fact, I found that btrfs/137 with compress=lzo seems to trigger such
"inline-then-regular" case in a very high possibility.

If abort the test just after populating the fs, I found the possibility to

Re: [PATCH] btrfs: add missing memset while reading compressed inline extents

2017-04-06 Thread Filipe Manana
On Wed, Mar 22, 2017 at 2:37 AM, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
>
>
> At 03/09/2017 10:05 AM, Zygo Blaxell wrote:
>>
>> On Wed, Mar 08, 2017 at 10:27:33AM +, Filipe Manana wrote:
>>>
>>> On Wed, Mar 8, 2017 at 3:18 AM, Zygo Blaxell
>>> <zblax...@waya.furryterror.org> wrote:
>>>>
>>>> From: Zygo Blaxell <ce3g8...@umail.furryterror.org>
>>>>
>>>> This is a story about 4 distinct (and very old) btrfs bugs.
>>>>
>>>> Commit c8b978188c ("Btrfs: Add zlib compression support") added
>>>> three data corruption bugs for inline extents (bugs #1-3).
>>>>
>>>> Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
>>>> fixed bug #1:  uncompressed inline extents followed by a hole and more
>>>> extents could get non-zero data in the hole as they were read.  The fix
>>>> was to add a memset in btrfs_get_extent to zero out the hole.
>>>>
>>>> Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
>>>> fixed bug #2:  compressed inline extents which contained non-zero bytes
>>>> might be replaced with zero bytes in some cases.  This patch removed an
>>>> unhelpful memset from uncompress_inline, but the case where memset is
>>>> required was missed.
>>>>
>>>> There is also a memset in the decompression code, but this only covers
>>>> decompressed data that is shorter than the ram_bytes from the extent
>>>> ref record.  This memset doesn't cover the region between the end of the
>>>> decompressed data and the end of the page.  It has also moved around a
>>>> few times over the years, so there's no single patch to refer to.
>>>>
>>>> This patch fixes bug #3:  compressed inline extents followed by a hole
>>>> and more extents could get non-zero data in the hole as they were read
>>>> (i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
>>>> The fix is the same:  zero out the hole in the compressed case too,
>>>> by putting a memset back in uncompress_inline, but this time with
>>>> correct parameters.
>>>>
>>>> The last and oldest bug, bug #0, is the cause of the offending inline
>>>> extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless
>>>> quirk
>>>> of behavior somewhere in the btrfs write code.  In a few special cases,
>>>> an inline extent and hole are allowed to persist where they normally
>>>> would be combined with later extents in the file.
>>>>
>>>> A fast reproducer for bug #0 is presented below.  A few offending
>>>> extents
>>>> are also created in the wild during large rsync transfers with the -S
>>>> flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
>>>> will produce a handful of offending files as well.  Once an offending
>>>> file is created, it can present different content to userspace each
>>>> time it is read.
>>>>
>>>> Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
>>>> kernel back to v3.5 has this behavior.  There are fossil records of this
>>>> bug's effects in commits all the way back to v2.6.32.  I have no reason
>>>> to believe bug #0 wasn't present at the beginning of btrfs compression
>>>> support in v2.6.29, but I can't easily test kernels that old to be sure.
>>>>
>>>> It is not clear whether bug #0 is worth fixing.  A fix would likely
>>>> require injecting extra reads into currently write-only paths, and most
>>>> of the exceptional cases caused by bug #0 are already handled now.
>>>>
>>>> Whether we like them or not, bug #0's inline extents followed by holes
>>>> are part of the btrfs de-facto disk format now, and we need to be able
>>>> to read them without data corruption or an infoleak.  So enough about
>>>> bug #0, let's get back to bug #3 (this patch).
>>>>
>>>> An example of on-disk structure leading to data corruption:
>>>>
>>>> item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
>>>> inode generation 50 transid 50 size 47424 nbytes 49141
>>>> block group 0 mode 100644 links 1 uid 0 gid 0
>>>> rdev 0 flags 0x0(none)
>>>> item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20

Re: [v4] btrfs: add missing memset while reading compressed inline extents

2017-03-30 Thread Ismael Luceno
On 10/Mar/2017 16:45, Zygo Blaxell wrote:
> This is a story about 4 distinct (and very old) btrfs bugs.
> 
<...>

Tested-by: Ismael Luceno 

I encountered the issue in the wild; it caused frequent segfaults
at ld.so after some hours of operation, as well as integrity check
failures. A quick inspection revealed corruption in the ELF headers,
with garbage where there should be zeros.

I managed to reproduce the issue right away by enabling page poisoning,
and the problem effectively goes away when applying the patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: add missing memset while reading compressed inline extents

2017-03-21 Thread Qu Wenruo



At 03/09/2017 10:05 AM, Zygo Blaxell wrote:

On Wed, Mar 08, 2017 at 10:27:33AM +, Filipe Manana wrote:

On Wed, Mar 8, 2017 at 3:18 AM, Zygo Blaxell
<zblax...@waya.furryterror.org> wrote:

From: Zygo Blaxell <ce3g8...@umail.furryterror.org>

This is a story about 4 distinct (and very old) btrfs bugs.

Commit c8b978188c ("Btrfs: Add zlib compression support") added
three data corruption bugs for inline extents (bugs #1-3).

Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
fixed bug #1:  uncompressed inline extents followed by a hole and more
extents could get non-zero data in the hole as they were read.  The fix
was to add a memset in btrfs_get_extent to zero out the hole.

Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
fixed bug #2:  compressed inline extents which contained non-zero bytes
might be replaced with zero bytes in some cases.  This patch removed an
unhelpful memset from uncompress_inline, but the case where memset is
required was missed.

There is also a memset in the decompression code, but this only covers
decompressed data that is shorter than the ram_bytes from the extent
ref record.  This memset doesn't cover the region between the end of the
decompressed data and the end of the page.  It has also moved around a
few times over the years, so there's no single patch to refer to.

This patch fixes bug #3:  compressed inline extents followed by a hole
and more extents could get non-zero data in the hole as they were read
(i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
The fix is the same:  zero out the hole in the compressed case too,
by putting a memset back in uncompress_inline, but this time with
correct parameters.

The last and oldest bug, bug #0, is the cause of the offending inline
extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless quirk
of behavior somewhere in the btrfs write code.  In a few special cases,
an inline extent and hole are allowed to persist where they normally
would be combined with later extents in the file.

A fast reproducer for bug #0 is presented below.  A few offending extents
are also created in the wild during large rsync transfers with the -S
flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
will produce a handful of offending files as well.  Once an offending
file is created, it can present different content to userspace each
time it is read.

Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
kernel back to v3.5 has this behavior.  There are fossil records of this
bug's effects in commits all the way back to v2.6.32.  I have no reason
to believe bug #0 wasn't present at the beginning of btrfs compression
support in v2.6.29, but I can't easily test kernels that old to be sure.

It is not clear whether bug #0 is worth fixing.  A fix would likely
require injecting extra reads into currently write-only paths, and most
of the exceptional cases caused by bug #0 are already handled now.

Whether we like them or not, bug #0's inline extents followed by holes
are part of the btrfs de-facto disk format now, and we need to be able
to read them without data corruption or an infoleak.  So enough about
bug #0, let's get back to bug #3 (this patch).

An example of on-disk structure leading to data corruption:

item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
inode generation 50 transid 50 size 47424 nbytes 49141
block group 0 mode 100644 links 1 uid 0 gid 0
rdev 0 flags 0x0(none)
item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
inode ref index 3 namelen 10 name: DB_File.so
item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
inline extent data size 1341 ram 4085 compress(zlib)
item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
extent data disk byte 5367308288 nr 20480
extent data offset 0 nr 45056 ram 45056
extent compression(zlib)


So this case is actually different from the reproducer below, because
once a file has prealloc extents, future writes will never be
compressed. That is, the extent at offset 4096 can not have compressed
content using steps like the reproducer below. I can only imagine that
after an falloc to create the extent at 4K, we cloned some compressed
extent from some other file into that offset.


The above comes from one of my captures of the bug appearing in the wild,
not the reproducer from Liu Bo.  I'll make that clearer.


In fact, I found that btrfs/137 with compress=lzo seems to trigger such 
"inline-then-regular" case in a very high possibility.


If abort the test just after populating the fs, I found the possibility 
to have inline-then-regular extent layout is over 70%.


Although in btrfs/137 case, it's "inline-then-hole", not completely 
"i

[PATCH v4] btrfs: add missing memset while reading compressed inline extents

2017-03-10 Thread Zygo Blaxell
This is a story about 4 distinct (and very old) btrfs bugs.

Commit c8b978188c ("Btrfs: Add zlib compression support") added
three data corruption bugs for inline extents (bugs #1-3).

Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
fixed bug #1:  uncompressed inline extents followed by a hole and more
extents could get non-zero data in the hole as they were read.  The fix
was to add a memset in btrfs_get_extent to zero out the hole.

Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
fixed bug #2:  compressed inline extents which contained non-zero bytes
might be replaced with zero bytes in some cases.  This patch removed an
unhelpful memset from uncompress_inline, but the case where memset is
required was missed.

There is also a memset in the decompression code, but this only covers
decompressed data that is shorter than the ram_bytes from the extent
ref record.  This memset doesn't cover the region between the end of the
decompressed data and the end of the page.  It has also moved around a
few times over the years, so there's no single patch to refer to.

This patch fixes bug #3:  compressed inline extents followed by a hole
and more extents could get non-zero data in the hole as they were read
(i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
The fix is the same:  zero out the hole in the compressed case too,
by putting a memset back in uncompress_inline, but this time with
correct parameters.

The last and oldest bug, bug #0, is the cause of the offending inline
extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless quirk
of behavior somewhere in the btrfs write code.  In a few special cases,
an inline extent and hole are allowed to persist where they normally
would be combined with later extents in the file.

A fast reproducer for bug #0 is presented below.  A few offending extents
are also created in the wild during large rsync transfers with the -S
flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
will produce a handful of offending files as well.  Once an offending
file is created, it can present different content to userspace each
time it is read.

Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
kernel back to v3.5 has this behavior.  There are fossil records of this
bug's effects in commits all the way back to v2.6.32.  I have no reason
to believe bug #0 wasn't present at the beginning of btrfs compression
support in v2.6.29, but I can't easily test kernels that old to be sure.

It is not clear whether bug #0 is worth fixing.  A fix would likely
require injecting extra reads into currently write-only paths, and most
of the exceptional cases caused by bug #0 are already handled now.

Whether we like them or not, bug #0's inline extents followed by holes
are part of the btrfs de-facto disk format now, and we need to be able
to read them without data corruption or an infoleak.  So enough about
bug #0, let's get back to bug #3 (this patch).

An example of on-disk structure leading to data corruption found in
the wild:

item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
inode generation 50 transid 50 size 47424 nbytes 49141
block group 0 mode 100644 links 1 uid 0 gid 0
rdev 0 flags 0x0(none)
item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
inode ref index 3 namelen 10 name: DB_File.so
item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
inline extent data size 1341 ram 4085 compress(zlib)
item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
extent data disk byte 5367308288 nr 20480
extent data offset 0 nr 45056 ram 45056
extent compression(zlib)

Different data appears in userspace during each read of the 11 bytes
between 4085 and 4096.  The extent in item 63 is not long enough to
fill the first page of the file, so a memset is required to fill the
space between item 63 (ending at 4085) and item 64 (beginning at 4096)
with zero.

Here is a reproducer from Liu Bo, which demonstrates another method
of creating the same inline extent and hole pattern:

Using 'page_poison=on' kernel command line (or enable
CONFIG_PAGE_POISONING) run the following:

# touch foo
# chattr +c foo
# xfs_io -f -c "pwrite -W 0 1000" foo
# xfs_io -f -c "falloc 4 8188" foo
# od -x foo
# echo 3 >/proc/sys/vm/drop_caches
# od -x foo

This produce the following on my box:

Correct output:  file contains 1000 data bytes followed
by zeros:

000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0001740 cdcd cdcd cdcd cdcd    
0001760        
*
002

Actual output:  the data after the first 1000 bytes
will be differen

Re: [PATCH v3] btrfs: add missing memset while reading compressed inline extents

2017-03-10 Thread Zygo Blaxell
On Fri, Mar 10, 2017 at 02:12:54PM -0500, Chris Mason wrote:
> 
> 
> On 03/10/2017 01:56 PM, Zygo Blaxell wrote:
> >On Fri, Mar 10, 2017 at 11:19:24AM -0500, Chris Mason wrote:
> >>On 03/09/2017 11:41 PM, Zygo Blaxell wrote:
> >>>On Thu, Mar 09, 2017 at 10:39:49AM -0500, Chris Mason wrote:
> >>>>
> >>>>
> >>>>On 03/08/2017 09:12 PM, Zygo Blaxell wrote:
> >>>>>This is a story about 4 distinct (and very old) btrfs bugs.
> >>>>>
> >>>>
> >>>>Really great write up.
> >>>>
> >>>>[ ... ]
> >>>>
> >>>>>
> >>>>>diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >>>>>index 25ac2cf..4d41a31 100644
> >>>>>--- a/fs/btrfs/inode.c
> >>>>>+++ b/fs/btrfs/inode.c
> >>>>>@@ -6805,6 +6805,12 @@ static noinline int uncompress_inline(struct 
> >>>>>btrfs_path *path,
> >>>>> max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> >>>>> ret = btrfs_decompress(compress_type, tmp, page,
> >>>>>extent_offset, inline_size, max_size);
> >>>>>+WARN_ON(max_size + pg_offset > PAGE_SIZE);
> >>>>
> >>>>Can you please drop this WARN_ON and make the math reflect any possible
> >>>>pg_offset?  I do agree it shouldn't be happening, but its easy to correct
> >>>>for and the WARN is likely to get lost.
> >>>
> >>>I'm not sure how to do that.  It looks like I'd have to pass pg_offset
> >>>through btrfs_decompress to the decompress functions?
> >>>
> >>>   ret = btrfs_decompress(compress_type, tmp, page,
> >>>   extent_offset, inline_size, max_size, pg_offset);
> >>>
> >>>and in the compression functions get pg_offset from the argument list
> >>>instead of hardcoding zero.
> >>
> >>Yeah, it's a good point.  Both zlib and lzo are assuming a zero pg_offset
> >>right now, but just like there are wacky corners allowing inline extents
> >>followed by more data, there are a few wacky corners allowing inline extents
> >>at the end of the file.
> >>
> >>Lets not mix that change in with this one though.  For now, just get the
> >>memset right and we can pass pg_offset down in a later patch.
> >
> >Are you saying "fix the memset in the patch" (and if so, what's wrong
> >with it?), or are you saying "let's take the patch with its memset as is,
> >and fix the pg_offset > 0 issues later"?
> 
> Your WARN_ON() would fire when this math is bad:
> 
> memset(map + pg_offset + max_size, 0, PAGE_SIZE - max_size - pg_offset);
> 
> Instead or warning, just don't memset if pg_offset + max_size >= PAGE_SIZE

OK.

While I was looking at this function I noticed that there doesn't seem to be
a sanity check on the data in the extent ref.  e.g. ram_bytes could be 2GB
and nothing would notice.  I'm pretty sure that's only possible by fuzzing,
but it seemed worthwhile to log it if it ever happened.

I'll take the WARN_ON out, and also put in the comment you asked for in the
other branch of this thread.

> -chris
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] btrfs: add missing memset while reading compressed inline extents

2017-03-10 Thread Chris Mason



On 03/10/2017 01:56 PM, Zygo Blaxell wrote:

On Fri, Mar 10, 2017 at 11:19:24AM -0500, Chris Mason wrote:

On 03/09/2017 11:41 PM, Zygo Blaxell wrote:

On Thu, Mar 09, 2017 at 10:39:49AM -0500, Chris Mason wrote:



On 03/08/2017 09:12 PM, Zygo Blaxell wrote:

This is a story about 4 distinct (and very old) btrfs bugs.



Really great write up.

[ ... ]



diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 25ac2cf..4d41a31 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6805,6 +6805,12 @@ static noinline int uncompress_inline(struct btrfs_path 
*path,
max_size = min_t(unsigned long, PAGE_SIZE, max_size);
ret = btrfs_decompress(compress_type, tmp, page,
   extent_offset, inline_size, max_size);
+   WARN_ON(max_size + pg_offset > PAGE_SIZE);


Can you please drop this WARN_ON and make the math reflect any possible
pg_offset?  I do agree it shouldn't be happening, but its easy to correct
for and the WARN is likely to get lost.


I'm not sure how to do that.  It looks like I'd have to pass pg_offset
through btrfs_decompress to the decompress functions?

ret = btrfs_decompress(compress_type, tmp, page,
extent_offset, inline_size, max_size, pg_offset);

and in the compression functions get pg_offset from the argument list
instead of hardcoding zero.


Yeah, it's a good point.  Both zlib and lzo are assuming a zero pg_offset
right now, but just like there are wacky corners allowing inline extents
followed by more data, there are a few wacky corners allowing inline extents
at the end of the file.

Lets not mix that change in with this one though.  For now, just get the
memset right and we can pass pg_offset down in a later patch.


Are you saying "fix the memset in the patch" (and if so, what's wrong
with it?), or are you saying "let's take the patch with its memset as is,
and fix the pg_offset > 0 issues later"?


Your WARN_ON() would fire when this math is bad:

memset(map + pg_offset + max_size, 0, PAGE_SIZE - max_size - pg_offset);

Instead or warning, just don't memset if pg_offset + max_size >= PAGE_SIZE

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] btrfs: add missing memset while reading compressed inline extents

2017-03-10 Thread Zygo Blaxell
On Fri, Mar 10, 2017 at 11:19:24AM -0500, Chris Mason wrote:
> On 03/09/2017 11:41 PM, Zygo Blaxell wrote:
> >On Thu, Mar 09, 2017 at 10:39:49AM -0500, Chris Mason wrote:
> >>
> >>
> >>On 03/08/2017 09:12 PM, Zygo Blaxell wrote:
> >>>This is a story about 4 distinct (and very old) btrfs bugs.
> >>>
> >>
> >>Really great write up.
> >>
> >>[ ... ]
> >>
> >>>
> >>>diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >>>index 25ac2cf..4d41a31 100644
> >>>--- a/fs/btrfs/inode.c
> >>>+++ b/fs/btrfs/inode.c
> >>>@@ -6805,6 +6805,12 @@ static noinline int uncompress_inline(struct 
> >>>btrfs_path *path,
> >>>   max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> >>>   ret = btrfs_decompress(compress_type, tmp, page,
> >>>  extent_offset, inline_size, max_size);
> >>>+  WARN_ON(max_size + pg_offset > PAGE_SIZE);
> >>
> >>Can you please drop this WARN_ON and make the math reflect any possible
> >>pg_offset?  I do agree it shouldn't be happening, but its easy to correct
> >>for and the WARN is likely to get lost.
> >
> >I'm not sure how to do that.  It looks like I'd have to pass pg_offset
> >through btrfs_decompress to the decompress functions?
> >
> > ret = btrfs_decompress(compress_type, tmp, page,
> >     extent_offset, inline_size, max_size, pg_offset);
> >
> >and in the compression functions get pg_offset from the argument list
> >instead of hardcoding zero.
> 
> Yeah, it's a good point.  Both zlib and lzo are assuming a zero pg_offset
> right now, but just like there are wacky corners allowing inline extents
> followed by more data, there are a few wacky corners allowing inline extents
> at the end of the file.
> 
> Lets not mix that change in with this one though.  For now, just get the
> memset right and we can pass pg_offset down in a later patch.

Are you saying "fix the memset in the patch" (and if so, what's wrong
with it?), or are you saying "let's take the patch with its memset as is,
and fix the pg_offset > 0 issues later"?

> -chris
> 


signature.asc
Description: Digital signature


Re: [PATCH v3] btrfs: add missing memset while reading compressed inline extents

2017-03-10 Thread Chris Mason

On 03/09/2017 11:41 PM, Zygo Blaxell wrote:

On Thu, Mar 09, 2017 at 10:39:49AM -0500, Chris Mason wrote:



On 03/08/2017 09:12 PM, Zygo Blaxell wrote:

This is a story about 4 distinct (and very old) btrfs bugs.



Really great write up.

[ ... ]



diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 25ac2cf..4d41a31 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6805,6 +6805,12 @@ static noinline int uncompress_inline(struct btrfs_path 
*path,
max_size = min_t(unsigned long, PAGE_SIZE, max_size);
ret = btrfs_decompress(compress_type, tmp, page,
   extent_offset, inline_size, max_size);
+   WARN_ON(max_size + pg_offset > PAGE_SIZE);


Can you please drop this WARN_ON and make the math reflect any possible
pg_offset?  I do agree it shouldn't be happening, but its easy to correct
for and the WARN is likely to get lost.


I'm not sure how to do that.  It looks like I'd have to pass pg_offset
through btrfs_decompress to the decompress functions?

ret = btrfs_decompress(compress_type, tmp, page,
extent_offset, inline_size, max_size, pg_offset);

and in the compression functions get pg_offset from the argument list
instead of hardcoding zero.


Yeah, it's a good point.  Both zlib and lzo are assuming a zero 
pg_offset right now, but just like there are wacky corners allowing 
inline extents followed by more data, there are a few wacky corners 
allowing inline extents at the end of the file.


Lets not mix that change in with this one though.  For now, just get the 
memset right and we can pass pg_offset down in a later patch.


-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] btrfs: add missing memset while reading compressed inline extents

2017-03-09 Thread Zygo Blaxell
On Thu, Mar 09, 2017 at 10:39:49AM -0500, Chris Mason wrote:
> 
> 
> On 03/08/2017 09:12 PM, Zygo Blaxell wrote:
> >This is a story about 4 distinct (and very old) btrfs bugs.
> >
> 
> Really great write up.
> 
> [ ... ]
> 
> >
> >diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> >index 25ac2cf..4d41a31 100644
> >--- a/fs/btrfs/inode.c
> >+++ b/fs/btrfs/inode.c
> >@@ -6805,6 +6805,12 @@ static noinline int uncompress_inline(struct 
> >btrfs_path *path,
> > max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> > ret = btrfs_decompress(compress_type, tmp, page,
> >extent_offset, inline_size, max_size);
> >+WARN_ON(max_size + pg_offset > PAGE_SIZE);
> 
> Can you please drop this WARN_ON and make the math reflect any possible
> pg_offset?  I do agree it shouldn't be happening, but its easy to correct
> for and the WARN is likely to get lost.

I'm not sure how to do that.  It looks like I'd have to pass pg_offset
through btrfs_decompress to the decompress functions?

ret = btrfs_decompress(compress_type, tmp, page,
extent_offset, inline_size, max_size, pg_offset);

and in the compression functions get pg_offset from the argument list
instead of hardcoding zero.

But how does pg_offset become non-zero for an inline extent?  A micro-hole
before the first byte?  If the offset was >= 4096, the data wouldn't
be in the first block so there would never be an inline extent in the
first place.

> >+if (max_size + pg_offset < PAGE_SIZE) {
> >+char *map = kmap(page);
> >+memset(map + pg_offset + max_size, 0, PAGE_SIZE - max_size - 
> >pg_offset);
> >+kunmap(page);
> >+}
> 
> Both lzo and zlib have a memset to cover the gap between what they actually
> decompress and the max_size that we pass here.  That's important because
> ram_bytes may not be 100% accurate.
> 
> Can you also please toss in a comment about how the decompression code is
> responsible for the memset up to max_bytes?
> 
> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


signature.asc
Description: Digital signature


Re: [PATCH v3] btrfs: add missing memset while reading compressed inline extents

2017-03-09 Thread Chris Mason



On 03/08/2017 09:12 PM, Zygo Blaxell wrote:

This is a story about 4 distinct (and very old) btrfs bugs.



Really great write up.

[ ... ]



diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 25ac2cf..4d41a31 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6805,6 +6805,12 @@ static noinline int uncompress_inline(struct btrfs_path 
*path,
max_size = min_t(unsigned long, PAGE_SIZE, max_size);
ret = btrfs_decompress(compress_type, tmp, page,
   extent_offset, inline_size, max_size);
+   WARN_ON(max_size + pg_offset > PAGE_SIZE);


Can you please drop this WARN_ON and make the math reflect any possible 
pg_offset?  I do agree it shouldn't be happening, but its easy to 
correct for and the WARN is likely to get lost.



+   if (max_size + pg_offset < PAGE_SIZE) {
+   char *map = kmap(page);
+   memset(map + pg_offset + max_size, 0, PAGE_SIZE - max_size - 
pg_offset);
+   kunmap(page);
+   }


Both lzo and zlib have a memset to cover the gap between what they 
actually decompress and the max_size that we pass here.  That's 
important because ram_bytes may not be 100% accurate.


Can you also please toss in a comment about how the decompression code 
is responsible for the memset up to max_bytes?


-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: add missing memset while reading compressed inline extents

2017-03-08 Thread Zygo Blaxell
On Wed, Mar 08, 2017 at 10:27:33AM +, Filipe Manana wrote:
> On Wed, Mar 8, 2017 at 3:18 AM, Zygo Blaxell
> <zblax...@waya.furryterror.org> wrote:
> > From: Zygo Blaxell <ce3g8...@umail.furryterror.org>
> >
> > This is a story about 4 distinct (and very old) btrfs bugs.
> >
> > Commit c8b978188c ("Btrfs: Add zlib compression support") added
> > three data corruption bugs for inline extents (bugs #1-3).
> >
> > Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
> > fixed bug #1:  uncompressed inline extents followed by a hole and more
> > extents could get non-zero data in the hole as they were read.  The fix
> > was to add a memset in btrfs_get_extent to zero out the hole.
> >
> > Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
> > fixed bug #2:  compressed inline extents which contained non-zero bytes
> > might be replaced with zero bytes in some cases.  This patch removed an
> > unhelpful memset from uncompress_inline, but the case where memset is
> > required was missed.
> >
> > There is also a memset in the decompression code, but this only covers
> > decompressed data that is shorter than the ram_bytes from the extent
> > ref record.  This memset doesn't cover the region between the end of the
> > decompressed data and the end of the page.  It has also moved around a
> > few times over the years, so there's no single patch to refer to.
> >
> > This patch fixes bug #3:  compressed inline extents followed by a hole
> > and more extents could get non-zero data in the hole as they were read
> > (i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
> > The fix is the same:  zero out the hole in the compressed case too,
> > by putting a memset back in uncompress_inline, but this time with
> > correct parameters.
> >
> > The last and oldest bug, bug #0, is the cause of the offending inline
> > extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless quirk
> > of behavior somewhere in the btrfs write code.  In a few special cases,
> > an inline extent and hole are allowed to persist where they normally
> > would be combined with later extents in the file.
> >
> > A fast reproducer for bug #0 is presented below.  A few offending extents
> > are also created in the wild during large rsync transfers with the -S
> > flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
> > will produce a handful of offending files as well.  Once an offending
> > file is created, it can present different content to userspace each
> > time it is read.
> >
> > Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
> > kernel back to v3.5 has this behavior.  There are fossil records of this
> > bug's effects in commits all the way back to v2.6.32.  I have no reason
> > to believe bug #0 wasn't present at the beginning of btrfs compression
> > support in v2.6.29, but I can't easily test kernels that old to be sure.
> >
> > It is not clear whether bug #0 is worth fixing.  A fix would likely
> > require injecting extra reads into currently write-only paths, and most
> > of the exceptional cases caused by bug #0 are already handled now.
> >
> > Whether we like them or not, bug #0's inline extents followed by holes
> > are part of the btrfs de-facto disk format now, and we need to be able
> > to read them without data corruption or an infoleak.  So enough about
> > bug #0, let's get back to bug #3 (this patch).
> >
> > An example of on-disk structure leading to data corruption:
> >
> > item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
> > inode generation 50 transid 50 size 47424 nbytes 49141
> > block group 0 mode 100644 links 1 uid 0 gid 0
> > rdev 0 flags 0x0(none)
> > item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
> > inode ref index 3 namelen 10 name: DB_File.so
> > item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
> > inline extent data size 1341 ram 4085 compress(zlib)
> > item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
> > extent data disk byte 5367308288 nr 20480
> > extent data offset 0 nr 45056 ram 45056
> > extent compression(zlib)
> 
> So this case is actually different from the reproducer below, because
> once a file has prealloc extents, future writes will never be
> compressed. That is, the extent at offset 4096 can not ha

[PATCH v3] btrfs: add missing memset while reading compressed inline extents

2017-03-08 Thread Zygo Blaxell
This is a story about 4 distinct (and very old) btrfs bugs.

Commit c8b978188c ("Btrfs: Add zlib compression support") added
three data corruption bugs for inline extents (bugs #1-3).

Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
fixed bug #1:  uncompressed inline extents followed by a hole and more
extents could get non-zero data in the hole as they were read.  The fix
was to add a memset in btrfs_get_extent to zero out the hole.

Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
fixed bug #2:  compressed inline extents which contained non-zero bytes
might be replaced with zero bytes in some cases.  This patch removed an
unhelpful memset from uncompress_inline, but the case where memset is
required was missed.

There is also a memset in the decompression code, but this only covers
decompressed data that is shorter than the ram_bytes from the extent
ref record.  This memset doesn't cover the region between the end of the
decompressed data and the end of the page.  It has also moved around a
few times over the years, so there's no single patch to refer to.

This patch fixes bug #3:  compressed inline extents followed by a hole
and more extents could get non-zero data in the hole as they were read
(i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
The fix is the same:  zero out the hole in the compressed case too,
by putting a memset back in uncompress_inline, but this time with
correct parameters.

The last and oldest bug, bug #0, is the cause of the offending inline
extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless quirk
of behavior somewhere in the btrfs write code.  In a few special cases,
an inline extent and hole are allowed to persist where they normally
would be combined with later extents in the file.

A fast reproducer for bug #0 is presented below.  A few offending extents
are also created in the wild during large rsync transfers with the -S
flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
will produce a handful of offending files as well.  Once an offending
file is created, it can present different content to userspace each
time it is read.

Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
kernel back to v3.5 has this behavior.  There are fossil records of this
bug's effects in commits all the way back to v2.6.32.  I have no reason
to believe bug #0 wasn't present at the beginning of btrfs compression
support in v2.6.29, but I can't easily test kernels that old to be sure.

It is not clear whether bug #0 is worth fixing.  A fix would likely
require injecting extra reads into currently write-only paths, and most
of the exceptional cases caused by bug #0 are already handled now.

Whether we like them or not, bug #0's inline extents followed by holes
are part of the btrfs de-facto disk format now, and we need to be able
to read them without data corruption or an infoleak.  So enough about
bug #0, let's get back to bug #3 (this patch).

An example of on-disk structure leading to data corruption found in
the wild:

item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
inode generation 50 transid 50 size 47424 nbytes 49141
block group 0 mode 100644 links 1 uid 0 gid 0
rdev 0 flags 0x0(none)
item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
inode ref index 3 namelen 10 name: DB_File.so
item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
inline extent data size 1341 ram 4085 compress(zlib)
item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
extent data disk byte 5367308288 nr 20480
extent data offset 0 nr 45056 ram 45056
extent compression(zlib)

Different data appears in userspace during each read of the 11 bytes
between 4085 and 4096.  The extent in item 63 is not long enough to
fill the first page of the file, so a memset is required to fill the
space between item 63 (ending at 4085) and item 64 (beginning at 4096)
with zero.

Here is a reproducer from Liu Bo, which demonstrates another method
of creating an inline extent and hole pattern leading to the same data
corruption/infoleak on read:

Using 'page_poison=on' kernel command line (or enable
CONFIG_PAGE_POISONING) run the following:

# touch foo
# chattr +c foo
# xfs_io -f -c "pwrite -W 0 1000" foo
# xfs_io -f -c "falloc 4 8188" foo
# od -x foo
# echo 3 >/proc/sys/vm/drop_caches
# od -x foo

This produce the following on my box:

Correct output:  file contains 1000 data bytes followed by zeros
to the end of the file:

000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0001740 cdcd cdcd cdcd cdcd    
0001760        
*
002000

Re: [PATCH] btrfs: add missing memset while reading compressed inline extents

2017-03-08 Thread Filipe Manana
On Wed, Mar 8, 2017 at 3:18 AM, Zygo Blaxell
<zblax...@waya.furryterror.org> wrote:
> From: Zygo Blaxell <ce3g8...@umail.furryterror.org>
>
> This is a story about 4 distinct (and very old) btrfs bugs.
>
> Commit c8b978188c ("Btrfs: Add zlib compression support") added
> three data corruption bugs for inline extents (bugs #1-3).
>
> Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
> fixed bug #1:  uncompressed inline extents followed by a hole and more
> extents could get non-zero data in the hole as they were read.  The fix
> was to add a memset in btrfs_get_extent to zero out the hole.
>
> Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
> fixed bug #2:  compressed inline extents which contained non-zero bytes
> might be replaced with zero bytes in some cases.  This patch removed an
> unhelpful memset from uncompress_inline, but the case where memset is
> required was missed.
>
> There is also a memset in the decompression code, but this only covers
> decompressed data that is shorter than the ram_bytes from the extent
> ref record.  This memset doesn't cover the region between the end of the
> decompressed data and the end of the page.  It has also moved around a
> few times over the years, so there's no single patch to refer to.
>
> This patch fixes bug #3:  compressed inline extents followed by a hole
> and more extents could get non-zero data in the hole as they were read
> (i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
> The fix is the same:  zero out the hole in the compressed case too,
> by putting a memset back in uncompress_inline, but this time with
> correct parameters.
>
> The last and oldest bug, bug #0, is the cause of the offending inline
> extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless quirk
> of behavior somewhere in the btrfs write code.  In a few special cases,
> an inline extent and hole are allowed to persist where they normally
> would be combined with later extents in the file.
>
> A fast reproducer for bug #0 is presented below.  A few offending extents
> are also created in the wild during large rsync transfers with the -S
> flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
> will produce a handful of offending files as well.  Once an offending
> file is created, it can present different content to userspace each
> time it is read.
>
> Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
> kernel back to v3.5 has this behavior.  There are fossil records of this
> bug's effects in commits all the way back to v2.6.32.  I have no reason
> to believe bug #0 wasn't present at the beginning of btrfs compression
> support in v2.6.29, but I can't easily test kernels that old to be sure.
>
> It is not clear whether bug #0 is worth fixing.  A fix would likely
> require injecting extra reads into currently write-only paths, and most
> of the exceptional cases caused by bug #0 are already handled now.
>
> Whether we like them or not, bug #0's inline extents followed by holes
> are part of the btrfs de-facto disk format now, and we need to be able
> to read them without data corruption or an infoleak.  So enough about
> bug #0, let's get back to bug #3 (this patch).
>
> An example of on-disk structure leading to data corruption:
>
> item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
> inode generation 50 transid 50 size 47424 nbytes 49141
> block group 0 mode 100644 links 1 uid 0 gid 0
> rdev 0 flags 0x0(none)
> item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
> inode ref index 3 namelen 10 name: DB_File.so
> item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
> inline extent data size 1341 ram 4085 compress(zlib)
> item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
> extent data disk byte 5367308288 nr 20480
> extent data offset 0 nr 45056 ram 45056
> extent compression(zlib)

So this case is actually different from the reproducer below, because
once a file has prealloc extents, future writes will never be
compressed. That is, the extent at offset 4096 can not have compressed
content using steps like the reproducer below. I can only imagine that
after an falloc to create the extent at 4K, we cloned some compressed
extent from some other file into that offset.

>
> Different data appears in userspace during each read of the 11 bytes
> between 4085 and 4096.  The extent in item 63 is not long enough to
> fill the first page of the file, so a memset is required to fill the
> space between item 63 

[PATCH] btrfs: add missing memset while reading compressed inline extents

2017-03-07 Thread Zygo Blaxell
From: Zygo Blaxell <ce3g8...@umail.furryterror.org>

This is a story about 4 distinct (and very old) btrfs bugs.

Commit c8b978188c ("Btrfs: Add zlib compression support") added
three data corruption bugs for inline extents (bugs #1-3).

Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
fixed bug #1:  uncompressed inline extents followed by a hole and more
extents could get non-zero data in the hole as they were read.  The fix
was to add a memset in btrfs_get_extent to zero out the hole.

Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
fixed bug #2:  compressed inline extents which contained non-zero bytes
might be replaced with zero bytes in some cases.  This patch removed an
unhelpful memset from uncompress_inline, but the case where memset is
required was missed.

There is also a memset in the decompression code, but this only covers
decompressed data that is shorter than the ram_bytes from the extent
ref record.  This memset doesn't cover the region between the end of the
decompressed data and the end of the page.  It has also moved around a
few times over the years, so there's no single patch to refer to.

This patch fixes bug #3:  compressed inline extents followed by a hole
and more extents could get non-zero data in the hole as they were read
(i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
The fix is the same:  zero out the hole in the compressed case too,
by putting a memset back in uncompress_inline, but this time with
correct parameters.

The last and oldest bug, bug #0, is the cause of the offending inline
extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless quirk
of behavior somewhere in the btrfs write code.  In a few special cases,
an inline extent and hole are allowed to persist where they normally
would be combined with later extents in the file.

A fast reproducer for bug #0 is presented below.  A few offending extents
are also created in the wild during large rsync transfers with the -S
flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
will produce a handful of offending files as well.  Once an offending
file is created, it can present different content to userspace each
time it is read.

Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
kernel back to v3.5 has this behavior.  There are fossil records of this
bug's effects in commits all the way back to v2.6.32.  I have no reason
to believe bug #0 wasn't present at the beginning of btrfs compression
support in v2.6.29, but I can't easily test kernels that old to be sure.

It is not clear whether bug #0 is worth fixing.  A fix would likely
require injecting extra reads into currently write-only paths, and most
of the exceptional cases caused by bug #0 are already handled now.

Whether we like them or not, bug #0's inline extents followed by holes
are part of the btrfs de-facto disk format now, and we need to be able
to read them without data corruption or an infoleak.  So enough about
bug #0, let's get back to bug #3 (this patch).

An example of on-disk structure leading to data corruption:

item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
inode generation 50 transid 50 size 47424 nbytes 49141
block group 0 mode 100644 links 1 uid 0 gid 0
rdev 0 flags 0x0(none)
item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
inode ref index 3 namelen 10 name: DB_File.so
item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
inline extent data size 1341 ram 4085 compress(zlib)
item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
extent data disk byte 5367308288 nr 20480
extent data offset 0 nr 45056 ram 45056
extent compression(zlib)

Different data appears in userspace during each read of the 11 bytes
between 4085 and 4096.  The extent in item 63 is not long enough to
fill the first page of the file, so a memset is required to fill the
space between item 63 (ending at 4085) and item 64 (beginning at 4096)
with zero.

Here is a reproducer from Liu Bo:

Using 'page_poison=on' kernel command line (or enable
CONFIG_PAGE_POISONING) run the following:

# touch foo
# chattr +c foo
# xfs_io -f -c "pwrite -W 0 1000" foo
# xfs_io -f -c "falloc 4 8188" foo
# od -x foo
# echo 3 >/proc/sys/vm/drop_caches
# od -x foo

This produce the following on my box:

000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0001740 cdcd cdcd cdcd cdcd    
0001760        
*
002

000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d
0001760 5f74 6f43 7400 435f 0053 5f74 7363 7400
0002000 435f 0056 5f74 6164 7400 645f 0062 5f74
(...)

v2: I'm not able to contrive a 

Re: [PATCH] btrfs: fix hole read corruption for compressed inline extents

2017-03-07 Thread Liu Bo
Hi Zygo,

On Tue, Dec 13, 2016 at 01:40:13AM -0500, Zygo Blaxell wrote:
> On Sun, Dec 11, 2016 at 10:56:59PM +0100, Xin Zhou wrote:
> >Hi Zygo,
> > 
> >Since the corruption happens after I/O and checksum,
> >could it be possible to add some bug catcher code in debug build, to help
> >narrowing down the issue?
> 
> The corruption happens only during reads, after IO and checksum.
> It happens at the point that I have patched, after decompression when
> the decompressed data doesn't fill the extent and there is a hole after
> the extent.  No other code exists in btrfs to fill the hole that forms
> in such cases.  There is nothing left to narrow down.
> 
> Chris Mason discovered the same bug in 2009 and fixed half of it in commit
> 93c82d5750.  His fix only fixed one of two cases where corruption occurs
> (i.e. after uncompressed extents).  The compressed extent case remained
> unfixed until the present.
> 
> To be fair to everyone who missed this bug for seven years:  it was
> quite a hard bug to spot.  Over a period of 28 months I observed backup
> verification failures and eliminated competing theories until only this
> bug remained.
> 
> There were at least two other bugs fixed between 2009 and 2014 which
> would have also produced corrupted data in compressed files under
> slightly different circumstances, so anyone looking for data corruption
> in compressed extents would have thought they'd solved the problem at
> least twice before now.
> 
> There was a memset in *almost* the right spot in the code to fix the
> corruption bug until early 2014.  Ironically, that memset (which was
> introduced in 2009) was itself the cause of another corruption bug.
> The memset was removed in 166ae5a418 ("btrfs: fix inline compressed read
> err corruption") but that commit removed the memset entirely instead of
> fixing it.
>

Thanks for the nice analysis here, so I think btrfs does have this
problem, I could reproduce this with fallocate if using
'page_poison=on' kernel command line (or enable CONFIG_PAGE_POISONING).

Here are the steps,

# touch foo
# chattr +c foo
# xfs_io -f -c "pwrite -W 0 1000" foo
# xfs_io -f -c "falloc 4 8188" foo
# od -x foo
# echo 3 >/proc/sys/vm/drop_caches
# od -x foo

This produce the following on my box:

000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0001740 cdcd cdcd cdcd cdcd    
0001760        
*
002



000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d
0001760 5f74 6f43 7400 435f 0053 5f74 7363 7400
0002000 435f 0056 5f74 6164 7400 645f 0062 5f74
(...)



Regarding to this patch,

- Although pg_offset is assumed to be zero, using (PAGE_SIZE -
  pg_offset) makes the code look saner, could you please add pg_offset
  to the check?

- Could you please wrap my reproducer script into the commit log so
  that a later test could catch the point quickly?

With those added, you can add

Reviewed-by: Liu Bo 

Thanks,

-liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Btrfs: fix file corruption after cloning inline extents

2017-03-06 Thread Filipe Manana
On Fri, Mar 3, 2017 at 6:48 PM, Liu Bo <bo.li@oracle.com> wrote:
> On Fri, Mar 03, 2017 at 03:36:36PM +, Filipe Manana wrote:
>> On Fri, Mar 3, 2017 at 12:36 AM, Liu Bo <bo.li@oracle.com> wrote:
>> > On Thu, Mar 02, 2017 at 02:18:21PM -0800, Liu Bo wrote:
>> >> On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdman...@kernel.org wrote:
>> >> > From: Filipe Manana <fdman...@suse.com>
>> >> >
>> >> > Using the clone ioctl (or extent_same ioctl, which calls the same extent
>> >> > cloning function as well) we end up allowing copy an inline extent from
>> >> > the source file into a non-zero offset of the destination file. This is
>> >> > something not expected and that the btrfs code is not prepared to deal
>> >> > with - all inline extents must be at a file offset equals to 0.
>> >> >
>> >>
>> >> Somehow I failed to reproduce the BUG_ON with this case.
>> >>
>> >> > For example, the following excerpt of a test case for fstests triggers
>> >> > a crash/BUG_ON() on a write operation after an inline extent is cloned
>> >> > into a non-zero offset:
>> >> >
>> >> >   _scratch_mkfs >>$seqres.full 2>&1
>> >> >   _scratch_mount
>> >> >
>> >> >   # Create our test files. File foo has the same 2K of data at offset 4K
>> >> >   # as file bar has at its offset 0.
>> >> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
>> >> >   -c "pwrite -S 0xbb 4k 2K" \
>> >> >   -c "pwrite -S 0xcc 8K 4K" \
>> >> >   $SCRATCH_MNT/foo | _filter_xfs_io
>> >> >
>> >> >   # File bar consists of a single inline extent (2K size).
>> >> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
>> >> >  $SCRATCH_MNT/bar | _filter_xfs_io
>> >> >
>> >> >   # Now call the clone ioctl to clone the extent of file bar into file
>> >> >   # foo at its offset 4K. This made file foo have an inline extent at
>> >> >   # offset 4K, something which the btrfs code can not deal with in 
>> >> > future
>> >> >   # IO operations because all inline extents are supposed to start at an
>> >> >   # offset of 0, resulting in all sorts of chaos.
>> >> >   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
>> >> >   # what it returns for other cases dealing with inlined extents.
>> >> >   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
>> >> >   $SCRATCH_MNT/bar $SCRATCH_MNT/foo
>> >> >
>> >> >   # Because of the inline extent at offset 4K, the following write made
>> >> >   # the kernel crash with a BUG_ON().
>> >> >   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | 
>> >> > _filter_xfs_io
>> >> >
>> >>
>> >> On 4.10, after allowing to clone an inline extent to dst file's offset 
>> >> greater
>> >> than zero, I followed the test case manually and got these
>> >>
>> >> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f 
>> >> /mnt/btrfs/foo
>> >> (257 0): ram 4096 disk 12648448 disk_size 4096
>> >> (257 4096): ram 2048 disk 0 disk_size 2048 -- inline
>> >> (257 8192): ram 4096 disk 12656640 disk_size 4096
>> >> file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 
>> >> 1.20
>> >>
>> >> [root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo
>> >> wrote 2048/2048 bytes at offset 6144
>> >> 2 KiB, 1 ops; 0. sec (12.520 MiB/sec and 6410.2564 ops/sec)
>> >>
>> >> [root@localhost trinity]# sync
>> >> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f 
>> >> /mnt/btrfs/foo
>> >> (257 0): ram 4096 disk 12648448 disk_size 4096
>> >> (257 4096): ram 4096 disk 12582912 disk_size 4096
>> >> (257 8192): ram 4096 disk 12656640 disk_size 4096
>> >> file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 
>> >> 1.00
>> >>
>> >>
>> >> Looks like we now are able to cope with these inline extents?
>> >
>> > I went back to test against v4.1 and v4.5, it turns out that we got the 
>> > below
>> > BUG_ON() in MM and -EIO 

Re: [PATCH] Btrfs: fix file corruption after cloning inline extents

2017-03-03 Thread Liu Bo
On Fri, Mar 03, 2017 at 03:36:36PM +, Filipe Manana wrote:
> On Fri, Mar 3, 2017 at 12:36 AM, Liu Bo <bo.li@oracle.com> wrote:
> > On Thu, Mar 02, 2017 at 02:18:21PM -0800, Liu Bo wrote:
> >> On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdman...@kernel.org wrote:
> >> > From: Filipe Manana <fdman...@suse.com>
> >> >
> >> > Using the clone ioctl (or extent_same ioctl, which calls the same extent
> >> > cloning function as well) we end up allowing copy an inline extent from
> >> > the source file into a non-zero offset of the destination file. This is
> >> > something not expected and that the btrfs code is not prepared to deal
> >> > with - all inline extents must be at a file offset equals to 0.
> >> >
> >>
> >> Somehow I failed to reproduce the BUG_ON with this case.
> >>
> >> > For example, the following excerpt of a test case for fstests triggers
> >> > a crash/BUG_ON() on a write operation after an inline extent is cloned
> >> > into a non-zero offset:
> >> >
> >> >   _scratch_mkfs >>$seqres.full 2>&1
> >> >   _scratch_mount
> >> >
> >> >   # Create our test files. File foo has the same 2K of data at offset 4K
> >> >   # as file bar has at its offset 0.
> >> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
> >> >   -c "pwrite -S 0xbb 4k 2K" \
> >> >   -c "pwrite -S 0xcc 8K 4K" \
> >> >   $SCRATCH_MNT/foo | _filter_xfs_io
> >> >
> >> >   # File bar consists of a single inline extent (2K size).
> >> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
> >> >  $SCRATCH_MNT/bar | _filter_xfs_io
> >> >
> >> >   # Now call the clone ioctl to clone the extent of file bar into file
> >> >   # foo at its offset 4K. This made file foo have an inline extent at
> >> >   # offset 4K, something which the btrfs code can not deal with in future
> >> >   # IO operations because all inline extents are supposed to start at an
> >> >   # offset of 0, resulting in all sorts of chaos.
> >> >   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
> >> >   # what it returns for other cases dealing with inlined extents.
> >> >   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
> >> >   $SCRATCH_MNT/bar $SCRATCH_MNT/foo
> >> >
> >> >   # Because of the inline extent at offset 4K, the following write made
> >> >   # the kernel crash with a BUG_ON().
> >> >   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | 
> >> > _filter_xfs_io
> >> >
> >>
> >> On 4.10, after allowing to clone an inline extent to dst file's offset 
> >> greater
> >> than zero, I followed the test case manually and got these
> >>
> >> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo
> >> (257 0): ram 4096 disk 12648448 disk_size 4096
> >> (257 4096): ram 2048 disk 0 disk_size 2048 -- inline
> >> (257 8192): ram 4096 disk 12656640 disk_size 4096
> >> file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 
> >> 1.20
> >>
> >> [root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo
> >> wrote 2048/2048 bytes at offset 6144
> >> 2 KiB, 1 ops; 0. sec (12.520 MiB/sec and 6410.2564 ops/sec)
> >>
> >> [root@localhost trinity]# sync
> >> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo
> >> (257 0): ram 4096 disk 12648448 disk_size 4096
> >> (257 4096): ram 4096 disk 12582912 disk_size 4096
> >> (257 8192): ram 4096 disk 12656640 disk_size 4096
> >> file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 
> >> 1.00
> >>
> >>
> >> Looks like we now are able to cope with these inline extents?
> >
> > I went back to test against v4.1 and v4.5, it turns out that we got the 
> > below
> > BUG_ON() in MM and -EIO when writing to the inline extent, because of the 
> > fact
> > that, when writing to the page that covers the inline extent, firstly it 
> > reads
> > page to get an uptodate page for writing, in readpage(), for inline extent,
> > btrfs_get_extent() always goes to search fs tree to read inline data out to 
> > page
> > and then tries to insert a em, -EEXIST would be returned if there is an 
> > ex

Re: [PATCH] Btrfs: fix file corruption after cloning inline extents

2017-03-03 Thread Filipe Manana
On Fri, Mar 3, 2017 at 12:36 AM, Liu Bo <bo.li@oracle.com> wrote:
> On Thu, Mar 02, 2017 at 02:18:21PM -0800, Liu Bo wrote:
>> On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdman...@kernel.org wrote:
>> > From: Filipe Manana <fdman...@suse.com>
>> >
>> > Using the clone ioctl (or extent_same ioctl, which calls the same extent
>> > cloning function as well) we end up allowing copy an inline extent from
>> > the source file into a non-zero offset of the destination file. This is
>> > something not expected and that the btrfs code is not prepared to deal
>> > with - all inline extents must be at a file offset equals to 0.
>> >
>>
>> Somehow I failed to reproduce the BUG_ON with this case.
>>
>> > For example, the following excerpt of a test case for fstests triggers
>> > a crash/BUG_ON() on a write operation after an inline extent is cloned
>> > into a non-zero offset:
>> >
>> >   _scratch_mkfs >>$seqres.full 2>&1
>> >   _scratch_mount
>> >
>> >   # Create our test files. File foo has the same 2K of data at offset 4K
>> >   # as file bar has at its offset 0.
>> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
>> >   -c "pwrite -S 0xbb 4k 2K" \
>> >   -c "pwrite -S 0xcc 8K 4K" \
>> >   $SCRATCH_MNT/foo | _filter_xfs_io
>> >
>> >   # File bar consists of a single inline extent (2K size).
>> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
>> >  $SCRATCH_MNT/bar | _filter_xfs_io
>> >
>> >   # Now call the clone ioctl to clone the extent of file bar into file
>> >   # foo at its offset 4K. This made file foo have an inline extent at
>> >   # offset 4K, something which the btrfs code can not deal with in future
>> >   # IO operations because all inline extents are supposed to start at an
>> >   # offset of 0, resulting in all sorts of chaos.
>> >   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
>> >   # what it returns for other cases dealing with inlined extents.
>> >   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
>> >   $SCRATCH_MNT/bar $SCRATCH_MNT/foo
>> >
>> >   # Because of the inline extent at offset 4K, the following write made
>> >   # the kernel crash with a BUG_ON().
>> >   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
>> >
>>
>> On 4.10, after allowing to clone an inline extent to dst file's offset 
>> greater
>> than zero, I followed the test case manually and got these
>>
>> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo
>> (257 0): ram 4096 disk 12648448 disk_size 4096
>> (257 4096): ram 2048 disk 0 disk_size 2048 -- inline
>> (257 8192): ram 4096 disk 12656640 disk_size 4096
>> file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 1.20
>>
>> [root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo
>> wrote 2048/2048 bytes at offset 6144
>> 2 KiB, 1 ops; 0. sec (12.520 MiB/sec and 6410.2564 ops/sec)
>>
>> [root@localhost trinity]# sync
>> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo
>> (257 0): ram 4096 disk 12648448 disk_size 4096
>> (257 4096): ram 4096 disk 12582912 disk_size 4096
>> (257 8192): ram 4096 disk 12656640 disk_size 4096
>> file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 1.00
>>
>>
>> Looks like we now are able to cope with these inline extents?
>
> I went back to test against v4.1 and v4.5, it turns out that we got the below
> BUG_ON() in MM and -EIO when writing to the inline extent, because of the fact
> that, when writing to the page that covers the inline extent, firstly it reads
> page to get an uptodate page for writing, in readpage(), for inline extent,
> btrfs_get_extent() always goes to search fs tree to read inline data out to 
> page
> and then tries to insert a em, -EEXIST would be returned if there is an 
> existing
> one.
>
> However, after commit 8dff9c853410 ("Btrfs: deal with duplciates during
> extent_map insertion in btrfs_get_extent"), we have that fixed, so now we can
> read/write inline extent even they've been mixed with other regular extents.
>
> But...I'm not 100% sure whether such files (with mixing inline with regular)
> would have any other problems rather than read/write.  Let me know if you 
> could
> think of a corruption due to that.

Without thinking too much and after do

Re: [PATCH] Btrfs: fix file corruption after cloning inline extents

2017-03-02 Thread Liu Bo
On Thu, Mar 02, 2017 at 02:18:21PM -0800, Liu Bo wrote:
> On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdman...@kernel.org wrote:
> > From: Filipe Manana <fdman...@suse.com>
> > 
> > Using the clone ioctl (or extent_same ioctl, which calls the same extent
> > cloning function as well) we end up allowing copy an inline extent from
> > the source file into a non-zero offset of the destination file. This is
> > something not expected and that the btrfs code is not prepared to deal
> > with - all inline extents must be at a file offset equals to 0.
> >
> 
> Somehow I failed to reproduce the BUG_ON with this case.
> 
> > For example, the following excerpt of a test case for fstests triggers
> > a crash/BUG_ON() on a write operation after an inline extent is cloned
> > into a non-zero offset:
> > 
> >   _scratch_mkfs >>$seqres.full 2>&1
> >   _scratch_mount
> > 
> >   # Create our test files. File foo has the same 2K of data at offset 4K
> >   # as file bar has at its offset 0.
> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
> >   -c "pwrite -S 0xbb 4k 2K" \
> >   -c "pwrite -S 0xcc 8K 4K" \
> >   $SCRATCH_MNT/foo | _filter_xfs_io
> > 
> >   # File bar consists of a single inline extent (2K size).
> >   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
> >  $SCRATCH_MNT/bar | _filter_xfs_io
> > 
> >   # Now call the clone ioctl to clone the extent of file bar into file
> >   # foo at its offset 4K. This made file foo have an inline extent at
> >   # offset 4K, something which the btrfs code can not deal with in future
> >   # IO operations because all inline extents are supposed to start at an
> >   # offset of 0, resulting in all sorts of chaos.
> >   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
> >   # what it returns for other cases dealing with inlined extents.
> >   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
> >   $SCRATCH_MNT/bar $SCRATCH_MNT/foo
> > 
> >   # Because of the inline extent at offset 4K, the following write made
> >   # the kernel crash with a BUG_ON().
> >   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
> >
> 
> On 4.10, after allowing to clone an inline extent to dst file's offset greater
> than zero, I followed the test case manually and got these
> 
> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
> (257 0): ram 4096 disk 12648448 disk_size 4096
> (257 4096): ram 2048 disk 0 disk_size 2048 -- inline
> (257 8192): ram 4096 disk 12656640 disk_size 4096
> file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 1.20
> 
> [root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo 
> wrote 2048/2048 bytes at offset 6144
> 2 KiB, 1 ops; 0. sec (12.520 MiB/sec and 6410.2564 ops/sec)
> 
> [root@localhost trinity]# sync
> [root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
> (257 0): ram 4096 disk 12648448 disk_size 4096
> (257 4096): ram 4096 disk 12582912 disk_size 4096
> (257 8192): ram 4096 disk 12656640 disk_size 4096
> file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 1.00
> 
> 
> Looks like we now are able to cope with these inline extents?

I went back to test against v4.1 and v4.5, it turns out that we got the below
BUG_ON() in MM and -EIO when writing to the inline extent, because of the fact
that, when writing to the page that covers the inline extent, firstly it reads
page to get an uptodate page for writing, in readpage(), for inline extent,
btrfs_get_extent() always goes to search fs tree to read inline data out to page
and then tries to insert a em, -EEXIST would be returned if there is an existing
one.

However, after commit 8dff9c853410 ("Btrfs: deal with duplciates during
extent_map insertion in btrfs_get_extent"), we have that fixed, so now we can
read/write inline extent even they've been mixed with other regular extents.

But...I'm not 100% sure whether such files (with mixing inline with regular)
would have any other problems rather than read/write.  Let me know if you could
think of a corruption due to that.

Thanks,

-liubo
> 
> 
> Thanks,
> 
> -liubo
> 
> 
> >   status=0
> >   exit
> > 
> > The stack trace of the BUG_ON() triggered by the last write is:
> > 
> >   [152154.035903] [ cut here ]
> >   [152154.036424] kernel BUG at mm/page-writeback.c:2286!
> >   [152154.036424] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
> >   [152154.036424] Modules linked in: btrfs dm_flakey dm_m

Re: [PATCH] Btrfs: fix file corruption after cloning inline extents

2017-03-02 Thread Liu Bo
On Tue, Jul 14, 2015 at 04:34:48PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana <fdman...@suse.com>
> 
> Using the clone ioctl (or extent_same ioctl, which calls the same extent
> cloning function as well) we end up allowing copy an inline extent from
> the source file into a non-zero offset of the destination file. This is
> something not expected and that the btrfs code is not prepared to deal
> with - all inline extents must be at a file offset equals to 0.
>

Somehow I failed to reproduce the BUG_ON with this case.

> For example, the following excerpt of a test case for fstests triggers
> a crash/BUG_ON() on a write operation after an inline extent is cloned
> into a non-zero offset:
> 
>   _scratch_mkfs >>$seqres.full 2>&1
>   _scratch_mount
> 
>   # Create our test files. File foo has the same 2K of data at offset 4K
>   # as file bar has at its offset 0.
>   $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
>   -c "pwrite -S 0xbb 4k 2K" \
>   -c "pwrite -S 0xcc 8K 4K" \
>   $SCRATCH_MNT/foo | _filter_xfs_io
> 
>   # File bar consists of a single inline extent (2K size).
>   $XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
>  $SCRATCH_MNT/bar | _filter_xfs_io
> 
>   # Now call the clone ioctl to clone the extent of file bar into file
>   # foo at its offset 4K. This made file foo have an inline extent at
>   # offset 4K, something which the btrfs code can not deal with in future
>   # IO operations because all inline extents are supposed to start at an
>   # offset of 0, resulting in all sorts of chaos.
>   # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
>   # what it returns for other cases dealing with inlined extents.
>   $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
>   $SCRATCH_MNT/bar $SCRATCH_MNT/foo
> 
>   # Because of the inline extent at offset 4K, the following write made
>   # the kernel crash with a BUG_ON().
>   $XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
>

On 4.10, after allowing to clone an inline extent to dst file's offset greater
than zero, I followed the test case manually and got these

[root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
(257 0): ram 4096 disk 12648448 disk_size 4096
(257 4096): ram 2048 disk 0 disk_size 2048 -- inline
(257 8192): ram 4096 disk 12656640 disk_size 4096
file: /mnt/btrfs/foo extents 3 disk size 10240 logical size 12288 ratio 1.20

[root@localhost trinity]# xfs_io -f -c "pwrite 6k 2k" /mnt/btrfs/foo 
wrote 2048/2048 bytes at offset 6144
2 KiB, 1 ops; 0. sec (12.520 MiB/sec and 6410.2564 ops/sec)

[root@localhost trinity]# sync
[root@localhost trinity]# /home/btrfs-progs/btrfs-debugfs -f /mnt/btrfs/foo 
(257 0): ram 4096 disk 12648448 disk_size 4096
(257 4096): ram 4096 disk 12582912 disk_size 4096
(257 8192): ram 4096 disk 12656640 disk_size 4096
file: /mnt/btrfs/foo extents 3 disk size 12288 logical size 12288 ratio 1.00


Looks like we now are able to cope with these inline extents?


Thanks,

-liubo


>   status=0
>   exit
> 
> The stack trace of the BUG_ON() triggered by the last write is:
> 
>   [152154.035903] [ cut here ]
>   [152154.036424] kernel BUG at mm/page-writeback.c:2286!
>   [152154.036424] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
>   [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic 
> xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache 
> sunrpc loop fuse parport_pc acpi_cpu$
>   [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: GW   
> 4.1.0-rc6-btrfs-next-11+ #2
>   [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
>   [152154.036424] task: 880429f70990 ti: 880429efc000 task.ti: 
> 880429efc000
>   [152154.036424] RIP: 0010:[]  [] 
> clear_page_dirty_for_io+0x1e/0x90
>   [152154.036424] RSP: 0018:880429effc68  EFLAGS: 00010246
>   [152154.036424] RAX: 02000806 RBX: ea0006a6d8f0 RCX: 
> 0001
>   [152154.036424] RDX:  RSI: 81155d1b RDI: 
> ea0006a6d8f0
>   [152154.036424] RBP: 880429effc78 R08: 8801ce389fe0 R09: 
> 0001
>   [152154.036424] R10: 2000 R11:  R12: 
> 8800200dce68
>   [152154.036424] R13:  R14: 8800200dcc88 R15: 
> 8803d5736d80
>   [152154.036424] FS:  7fbf119f6700() GS:88043d28() 
> knlGS:
>   [152154.036424] CS:  0010 DS:  ES:  CR0: 80050033
>   [152154.036424] CR2: 01bdc000 CR3: 0003aa555000 CR4: 

Re: [PATCH] btrfs: fix hole read corruption for compressed inline extents

2017-02-18 Thread Zygo Blaxell
Ping?

This is still reproducible on 4.9.8.

On Mon, Nov 28, 2016 at 12:03:12AM -0500, Zygo Blaxell wrote:
> Commit c8b978188c ("Btrfs: Add zlib compression support") produces
> data corruption when reading a file with a hole positioned after an
> inline extent.  btrfs_get_extent will return uninitialized kernel memory
> instead of zero bytes in the hole.
> 
> Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
> fills the hole by memset to zero after *uncompressed* inline extents.
> 
> This patch provides the missing memset for holes after *compressed*
> inline extents.
> 
> The offending holes appear in the wild and will appear during routine
> data integrity audits (e.g. comparing backups against their originals).
> They can also be created intentionally by fuzzing or crafting a filesystem
> image.
> 
> Holes like these are not intended to occur in btrfs; however, I tested
> tagged kernels between v3.5 and the present, and found that all of them
> can create such holes.  Whether we like them or not, this kind of hole
> is now part of the btrfs de-facto on-disk format, and we need to be able
> to read such holes without an infoleak or wrong data.
> 
> An example of a hole leading to data corruption:
> 
> item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
> inode generation 50 transid 50 size 47424 nbytes 49141
> block group 0 mode 100644 links 1 uid 0 gid 0
> rdev 0 flags 0x0(none)
> item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
> inode ref index 3 namelen 10 name: DB_File.so
> item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
> inline extent data size 1341 ram 4085 compress(zlib)
> item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
> extent data disk byte 5367308288 nr 20480
> extent data offset 0 nr 45056 ram 45056
> extent compression(zlib)
> 
> Different data appears in userspace during each uncached read of the 10
> bytes between offset 4085 and 4095.  The extent in item 63 is not long
> enough to fill the first page of the file, so a memset is required to
> fill the space between item 63 (ending at 4085) and item 64 (beginning
> at 4096) with zero.
> 
> Signed-off-by: Zygo Blaxell <ce3g8...@umail.furryterror.org>
> 
> ---
>  fs/btrfs/inode.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8e3a5a2..b1314d6 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct 
> btrfs_path *path,
>   max_size = min_t(unsigned long, PAGE_SIZE, max_size);
>   ret = btrfs_decompress(compress_type, tmp, page,
>  extent_offset, inline_size, max_size);
> + WARN_ON(max_size > PAGE_SIZE);
> + if (max_size < PAGE_SIZE) {
> + char *map = kmap(page);
> + memset(map + max_size, 0, PAGE_SIZE - max_size);
> + kunmap(page);
> + }
>   kfree(tmp);
>   return ret;
>  }
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix hole read corruption for compressed inline extents

2016-12-12 Thread Zygo Blaxell
05969:  1:   35913575: 
last,eof
./drivers/ata/.pata_sis.o.cmd: 9 extents found

Note that corruption can only occur if the first extent (the inline extent
at offset 0) is compressed (encoded).  Uncompressed inline extents (like
the one above) will not be corrupted due to the fix in commit 93c82d5750.

If commit 93c82d5750 is reverted, you can get corruption on uncompressed
files too.







>Thanks,
>Xin
> 
>Sent: Saturday, December 10, 2016 at 9:16 PM
>From: "Zygo Blaxell" <ce3g8...@umail.furryterror.org>
>To: "Roman Mamedov" <r...@romanrm.net>, "Filipe Manana" 
> <fdman...@gmail.com>
>Cc: linux-btrfs@vger.kernel.org
>Subject: Re: [PATCH] btrfs: fix hole read corruption for compressed inline
>extents
>Ping?
> 
>I know at least two people have read this patch, but it hasn't appeared in
>the usual integration branches yet, and I've seen no actionable suggestion
>to improve it. I've provided two non-overlapping rationales for it.
>Is there something else you are looking for?
> 
>This patch is a fix for a simple data corruption bug. It (or some
>equivalent fix for the same bug) should be on its way to all stable
>kernels starting from 2.6.32.
> 
>Thanks
> 
>On Mon, Nov 28, 2016 at 05:27:10PM +0500, Roman Mamedov wrote:
>> On Mon, 28 Nov 2016 00:03:12 -0500
>> Zygo Blaxell <ce3g8...@umail.furryterror.org> wrote:
>>
>> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> > index 8e3a5a2..b1314d6 100644
>> > --- a/fs/btrfs/inode.c
>> > +++ b/fs/btrfs/inode.c
>> > @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct
>btrfs_path *path,
>> > max_size = min_t(unsigned long, PAGE_SIZE, max_size);
>> > ret = btrfs_decompress(compress_type, tmp, page,
>> > extent_offset, inline_size, max_size);
>> > + WARN_ON(max_size > PAGE_SIZE);
>> > + if (max_size < PAGE_SIZE) {
>> > + char *map = kmap(page);
>> > + memset(map + max_size, 0, PAGE_SIZE - max_size);
>> > + kunmap(page);
>> > + }
>> > kfree(tmp);
>> > return ret;
>> > }
>>
>> Wasn't this already posted as:
>>
>> btrfs: fix silent data corruption while reading compressed inline
>extents
>> [1]https://patchwork.kernel.org/patch/9371971/
>>
>> but you don't indicate that's a V2 or something, and in fact the patch
>seems
>> exactly the same, just the subject and commit message are entirely
>different.
>> Quite confusing.
>>
>> --
>> With respect,
>> Roman
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
>in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at [2]http://vger.kernel.org/majordomo-info.html
> 
> References
> 
>Visible links
>1. https://patchwork.kernel.org/patch/9371971/
>2. http://vger.kernel.org/majordomo-info.html


signature.asc
Description: Digital signature


Re: [PATCH] btrfs: fix hole read corruption for compressed inline extents

2016-12-11 Thread Xin Zhou
Hi Zygo,
Since the corruption happens after I/O and checksum,
could it be possible to add some bug catcher code in code path for debug build,
to help narrowing down the issue?
Thanks,
Xin
 
 

Sent: Saturday, December 10, 2016 at 9:16 PM
From: "Zygo Blaxell" <ce3g8...@umail.furryterror.org>
To: "Roman Mamedov" <r...@romanrm.net>, "Filipe Manana" <fdman...@gmail.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: fix hole read corruption for compressed inline 
extents
Ping?

I know at least two people have read this patch, but it hasn't appeared in
the usual integration branches yet, and I've seen no actionable suggestion
to improve it. I've provided two non-overlapping rationales for it.
Is there something else you are looking for?

This patch is a fix for a simple data corruption bug. It (or some
equivalent fix for the same bug) should be on its way to all stable
kernels starting from 2.6.32.

Thanks

On Mon, Nov 28, 2016 at 05:27:10PM +0500, Roman Mamedov wrote:
> On Mon, 28 Nov 2016 00:03:12 -0500
> Zygo Blaxell <ce3g8...@umail.furryterror.org> wrote:
>
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 8e3a5a2..b1314d6 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct 
> > btrfs_path *path,
> > max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> > ret = btrfs_decompress(compress_type, tmp, page,
> > extent_offset, inline_size, max_size);
> > + WARN_ON(max_size > PAGE_SIZE);
> > + if (max_size < PAGE_SIZE) {
> > + char *map = kmap(page);
> > + memset(map + max_size, 0, PAGE_SIZE - max_size);
> > + kunmap(page);
> > + }
> > kfree(tmp);
> > return ret;
> > }
>
> Wasn't this already posted as:
>
> btrfs: fix silent data corruption while reading compressed inline extents
> https://patchwork.kernel.org/patch/9371971/
>
> but you don't indicate that's a V2 or something, and in fact the patch seems
> exactly the same, just the subject and commit message are entirely different.
> Quite confusing.
>
> --
> With respect,
> Roman
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at 
> http://vger.kernel.org/majordomo-info.html[http://vger.kernel.org/majordomo-info.html]
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix hole read corruption for compressed inline extents

2016-12-10 Thread Zygo Blaxell
Ping?

I know at least two people have read this patch, but it hasn't appeared in
the usual integration branches yet, and I've seen no actionable suggestion
to improve it.  I've provided two non-overlapping rationales for it.
Is there something else you are looking for?

This patch is a fix for a simple data corruption bug.  It (or some
equivalent fix for the same bug) should be on its way to all stable
kernels starting from 2.6.32.

Thanks

On Mon, Nov 28, 2016 at 05:27:10PM +0500, Roman Mamedov wrote:
> On Mon, 28 Nov 2016 00:03:12 -0500
> Zygo Blaxell <ce3g8...@umail.furryterror.org> wrote:
> 
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 8e3a5a2..b1314d6 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct 
> > btrfs_path *path,
> > max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> > ret = btrfs_decompress(compress_type, tmp, page,
> >extent_offset, inline_size, max_size);
> > +   WARN_ON(max_size > PAGE_SIZE);
> > +   if (max_size < PAGE_SIZE) {
> > +   char *map = kmap(page);
> > +   memset(map + max_size, 0, PAGE_SIZE - max_size);
> > +   kunmap(page);
> > +   }
> > kfree(tmp);
> > return ret;
> >  }
> 
> Wasn't this already posted as:
> 
> btrfs: fix silent data corruption while reading compressed inline extents
> https://patchwork.kernel.org/patch/9371971/
> 
> but you don't indicate that's a V2 or something, and in fact the patch seems
> exactly the same, just the subject and commit message are entirely different.
> Quite confusing.
> 
> -- 
> With respect,
> Roman
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


signature.asc
Description: Digital signature


Re: [PATCH] btrfs: fix hole read corruption for compressed inline extents

2016-11-28 Thread Zygo Blaxell
On Mon, Nov 28, 2016 at 05:27:10PM +0500, Roman Mamedov wrote:
> On Mon, 28 Nov 2016 00:03:12 -0500
> Zygo Blaxell <ce3g8...@umail.furryterror.org> wrote:
> 
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 8e3a5a2..b1314d6 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct 
> > btrfs_path *path,
> > max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> > ret = btrfs_decompress(compress_type, tmp, page,
> >extent_offset, inline_size, max_size);
> > +   WARN_ON(max_size > PAGE_SIZE);
> > +   if (max_size < PAGE_SIZE) {
> > +   char *map = kmap(page);
> > +   memset(map + max_size, 0, PAGE_SIZE - max_size);
> > +   kunmap(page);
> > +   }
> > kfree(tmp);
> > return ret;
> >  }
> 
> Wasn't this already posted as:
> 
> btrfs: fix silent data corruption while reading compressed inline extents
> https://patchwork.kernel.org/patch/9371971/
> 
> but you don't indicate that's a V2 or something, and in fact the patch seems
> exactly the same, just the subject and commit message are entirely different.
> Quite confusing.

The previous commit message discussed the related hole-creation bug,
including a reproducer; however, this patch does not fix the hole-creation
bug and was never intended to.  Despite my follow-up clarification,
reviewers got distracted by the hole-creation bug discussion and didn't
recover, so the patch didn't go anywhere.

This patch only fixes _reading_ the holes after they are created, and
the new commit message and subject line state that much more clearly.

The patch didn't change, so I didn't add 'v2'.  There's no 'v1' with
the same title, so I thought a 'v2' tag would be more confusing than
just starting over.

The hole-creation bug is a very old, low-urgency issue.  btrfs filesystems
in the field have the buggy holes already, and have been creating new
ones from 2009(*) to the present.  I had to ask a few people before I found
one who know whether it was even a bug, or intentional behavior from
the beginning.


(*) 2009 is the oldest commit date I can find that introduces a change
which would only be necessary in the presence of the hole-creation bug.
I have not been able to test kernels before 2012 because they crash
while running my reproducer.

> -- 
> With respect,
> Roman
> 


signature.asc
Description: Digital signature


Re: [PATCH] btrfs: fix hole read corruption for compressed inline extents

2016-11-28 Thread Roman Mamedov
On Mon, 28 Nov 2016 00:03:12 -0500
Zygo Blaxell <ce3g8...@umail.furryterror.org> wrote:

> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 8e3a5a2..b1314d6 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct 
> btrfs_path *path,
>   max_size = min_t(unsigned long, PAGE_SIZE, max_size);
>   ret = btrfs_decompress(compress_type, tmp, page,
>  extent_offset, inline_size, max_size);
> + WARN_ON(max_size > PAGE_SIZE);
> + if (max_size < PAGE_SIZE) {
> + char *map = kmap(page);
> + memset(map + max_size, 0, PAGE_SIZE - max_size);
> + kunmap(page);
> + }
>   kfree(tmp);
>   return ret;
>  }

Wasn't this already posted as:

btrfs: fix silent data corruption while reading compressed inline extents
https://patchwork.kernel.org/patch/9371971/

but you don't indicate that's a V2 or something, and in fact the patch seems
exactly the same, just the subject and commit message are entirely different.
Quite confusing.

-- 
With respect,
Roman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] btrfs: fix hole read corruption for compressed inline extents

2016-11-27 Thread Zygo Blaxell
Commit c8b978188c ("Btrfs: Add zlib compression support") produces
data corruption when reading a file with a hole positioned after an
inline extent.  btrfs_get_extent will return uninitialized kernel memory
instead of zero bytes in the hole.

Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
fills the hole by memset to zero after *uncompressed* inline extents.

This patch provides the missing memset for holes after *compressed*
inline extents.

The offending holes appear in the wild and will appear during routine
data integrity audits (e.g. comparing backups against their originals).
They can also be created intentionally by fuzzing or crafting a filesystem
image.

Holes like these are not intended to occur in btrfs; however, I tested
tagged kernels between v3.5 and the present, and found that all of them
can create such holes.  Whether we like them or not, this kind of hole
is now part of the btrfs de-facto on-disk format, and we need to be able
to read such holes without an infoleak or wrong data.

An example of a hole leading to data corruption:

item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
inode generation 50 transid 50 size 47424 nbytes 49141
block group 0 mode 100644 links 1 uid 0 gid 0
rdev 0 flags 0x0(none)
item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
inode ref index 3 namelen 10 name: DB_File.so
item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
inline extent data size 1341 ram 4085 compress(zlib)
item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
extent data disk byte 5367308288 nr 20480
extent data offset 0 nr 45056 ram 45056
extent compression(zlib)

Different data appears in userspace during each uncached read of the 10
bytes between offset 4085 and 4095.  The extent in item 63 is not long
enough to fill the first page of the file, so a memset is required to
fill the space between item 63 (ending at 4085) and item 64 (beginning
at 4096) with zero.

Signed-off-by: Zygo Blaxell <ce3g8...@umail.furryterror.org>

---
 fs/btrfs/inode.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 8e3a5a2..b1314d6 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct btrfs_path 
*path,
max_size = min_t(unsigned long, PAGE_SIZE, max_size);
ret = btrfs_decompress(compress_type, tmp, page,
   extent_offset, inline_size, max_size);
+   WARN_ON(max_size > PAGE_SIZE);
+   if (max_size < PAGE_SIZE) {
+   char *map = kmap(page);
+   memset(map + max_size, 0, PAGE_SIZE - max_size);
+   kunmap(page);
+   }
kfree(tmp);
return ret;
 }
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] btrfs: fix silent data corruption while reading compressed inline extents

2016-10-12 Thread Zygo Blaxell
t; a WARN_ON for that case.
> >
> > To reproduce the bug, run this on a 3072M kvm VM:
> >
> > #!/bin/sh
> > # Use your favorite block device here
> > blk=/dev/vdc
> >
> > # Create test filesystem and mount point
> > mkdir -p /try
> > mkfs.btrfs -dsingle -mdup -O ^extref,^skinny-metadata,^no-holes -f 
> > "$blk" || exit 1
> > mount -ocompress-force,flushoncommit,max_inline=8192,noatime "$blk" 
> > /try || exit 1
> >
> > # Create a few inline extents in larger files.
> > # Multiple processes seem to be necessary.
> > y=/usr; for x in $(seq 10 19); do rsync -axHSWI "$y/." "$x"; 
> > y="$x"; done &
> > y=/usr; for x in $(seq 20 29); do rsync -axHSWI "$y/." "$x"; 
> > y="$x"; done &
> > y=/usr; for x in $(seq 30 39); do rsync -axHSWI "$y/." "$x"; 
> > y="$x"; done &
> > y=/usr; for x in $(seq 40 49); do rsync -axHSWI "$y/." "$x"; 
> > y="$x"; done &
> > wait
> >
> > # Make a list of the files with inline extents
> > touch /try/list
> > find -type f -size +4097c -exec sh -c 'for x; do if filefrag -v 
> > "$x" | sed -n "4p" | grep -q "inline"; then echo "$x" >> list; fi; done' -- 
> > {} +
> >
> > # Check the inline extents to see if they change as they are read 
> > multiple times
> > while read -r x; do
> > sum="$(sha1sum "$x")"
> > for y in $(seq 0 99); do
> > sysctl vm.drop_caches=1
> > sum2="$(sha1sum "$x")"
> > if [ "$sum" != "$sum2" ]; then
> > echo "Inconsistent reads from '$x'"
> > exit 1
> > fi
> > done
> > done < list
> >
> > The reproducer may need to run up to 20 times before it finds a
> > corruption.
> >
> > Signed-off-by: Zygo Blaxell <ce3g8...@umail.furryterror.org>
> > ---
> >  fs/btrfs/inode.c | 6 ++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index e6811c4..34f9c80 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -6791,6 +6791,12 @@ static noinline int uncompress_inline(struct 
> > btrfs_path *path,
> > max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> > ret = btrfs_decompress(compress_type, tmp, page,
> >extent_offset, inline_size, max_size);
> > +   WARN_ON(max_size > PAGE_SIZE);
> > +   if (max_size < PAGE_SIZE) {
> > +   char *map = kmap(page);
> > +   memset(map + max_size, 0, PAGE_SIZE - max_size);
> > +   kunmap(page);
> > +   }
> > kfree(tmp);
> > return ret;
> >  }
> > --
> > 2.1.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 
> -- 
> Filipe David Manana,
> 
> "People will forget what you said,
>  people will forget what you did,
>  but people will never forget how you made them feel."
> 


signature.asc
Description: Digital signature


Re: [PATCH] btrfs: fix silent data corruption while reading compressed inline extents

2016-10-12 Thread Filipe Manana
On Wed, Oct 12, 2016 at 3:28 AM, Zygo Blaxell
<ce3g8...@umail.furryterror.org> wrote:
> rsync -S causes a large number of small writes separated by small seeks
> to form sparse holes in files that contain runs of zero bytes.  Rarely,
> this can lead btrfs to write a file with a compressed inline extent
> followed by other data, like this:
>
> Filesystem type is: 9123683e
> File size of /try/./30/share/locale/nl/LC_MESSAGES/tar.mo is 61906 (16 blocks 
> of 4096 bytes)
>  ext: logical_offset:physical_offset: length:   expected: flags:
>0:0..4095:  0..  4095:   4096: 
> encoded,not_aligned,inline
>1:1..  15: 331372..331386: 15:  1: 
> last,encoded,eof
> /try/./30/share/locale/nl/LC_MESSAGES/tar.mo: 2 extents found

So that's the problem. We should never have non-inlined extents after
an inline extent (this causes problems in many places) - the only
exception is for pre-allocated extents created with fallocate using
its flag FALLOC_FL_KEEP_SIZE (so that the inode size doesn't change).

btrfs_cont_expand() is responsible for inserting holes, and for the
first page, which is where the inline extent belongs to, it pads it
with zeroes (done through btrfs_truncate_block()). So the bug here is
some corner case where we don't do this hole expansion thing and end
up with an inline extent followed by some non-inlined one and with an
isize (file size) greater than the uncompressed size of the inline
extent.

Given how hard the bug is to trigger with your reproducer below, and
that you didn't find a way to trigger it using deterministic steps
like for example:

for ((i = 1; i <= 1000; i++)); do
umount /dev/sdd &> /dev/null
mkfs.btrfs -f /dev/sdd > /dev/null
mount -o compress=lzo /dev/sdd /mnt/sdd
xfs_io -f -c "pwrite -S 0xaa 0 1500" /mnt/sdd/foobar > /dev/null
sync
xfs_io -c "pwrite -S 0xbb 1M 4K" /mnt/sdd/foobar > /dev/null
DIGEST1=$(md5sum /mnt/sdd/foobar | cut -d ' ' -f 1)
echo 3 > /proc/sys/vm/drop_caches
DIGEST2=$(md5sum /mnt/sdd/foobar | cut -d ' ' -f 1)
echo -e "\n\nIteration $i\n"
echo "Digest1 = $DIGEST1"
echo "Digest2 = $DIGEST2"
echo
if [ "$DIGEST1" != "$DIGEST2" ]; then
exit 1
fi
done

It's more than evidence that your fix is only hidding away the real problem.

Once it's figured out why this happens, we should ideally have a
deterministic or easy to trigger reproducer for xfstests.

thanks

>
> The inline extent size is less than the page size, so the ram_bytes field
> in the extent is smaller than 4096.  The difference between ram_bytes and
> the end of the first page of the file forms a small hole.  Like any other
> hole, the correct value of each byte within the hole is zero.
>
> When the inline extent is not compressed, btrfs_get_extent copies the
> inline extent data and then memsets the remainder of the page to zero.
> There is no corruption in this case.
>
> When the inline extent is compressed, uncompress_inline uses the
> ram_bytes field from the extent ref as the size of the uncompressed data.
> ram_bytes is smaller than the page size, so the remainder of the page
> (i.e. the bytes in the small hole) is uninitialized memory.  Each time the
> extent is read into the page cache, userspace may see different contents.
>
> Fix this by zeroing out the difference between the size of the
> uncompressed inline extent and PAGE_CACHE_SIZE in uncompress_inline.
>
> Only bytes within the hole are affected, so affected files can be read
> correctly with a fixed kernel.  The corruption happens after IO and
> checksum validation, so the corruption is never reported in dmesg or
> counted in dev stats.
>
> The bug is at least as old as 3.5.7 (the oldest kernel I can conveniently
> test), and possibly much older.
>
> The code may not be correct if the extent is larger than a page, so add
> a WARN_ON for that case.
>
> To reproduce the bug, run this on a 3072M kvm VM:
>
> #!/bin/sh
> # Use your favorite block device here
> blk=/dev/vdc
>
> # Create test filesystem and mount point
>     mkdir -p /try
> mkfs.btrfs -dsingle -mdup -O ^extref,^skinny-metadata,^no-holes -f 
> "$blk" || exit 1
> mount -ocompress-force,flushoncommit,max_inline=8192,noatime "$blk" 
> /try || exit 1
>
> # Create a few inline extents in larger files.
> # Multiple processes seem to be necessary.
> y=/usr; for x in $(seq 10 19); do rsync -axHSWI "$y/." "$x"; y="$x"; 
> done &
> y=/usr; for x in $(seq 20 29); do rsync -axHSWI "$y/." "$x"; y="$x"; 
> done &
> y=/usr; for x in $(seq 30 39); do rsyn

[PATCH] btrfs: fix silent data corruption while reading compressed inline extents

2016-10-11 Thread Zygo Blaxell
rsync -S causes a large number of small writes separated by small seeks
to form sparse holes in files that contain runs of zero bytes.  Rarely,
this can lead btrfs to write a file with a compressed inline extent
followed by other data, like this:

Filesystem type is: 9123683e
File size of /try/./30/share/locale/nl/LC_MESSAGES/tar.mo is 61906 (16 blocks 
of 4096 bytes)
 ext: logical_offset:physical_offset: length:   expected: flags:
   0:0..4095:  0..  4095:   4096: 
encoded,not_aligned,inline
   1:1..  15: 331372..331386: 15:  1: 
last,encoded,eof
/try/./30/share/locale/nl/LC_MESSAGES/tar.mo: 2 extents found

The inline extent size is less than the page size, so the ram_bytes field
in the extent is smaller than 4096.  The difference between ram_bytes and
the end of the first page of the file forms a small hole.  Like any other
hole, the correct value of each byte within the hole is zero.

When the inline extent is not compressed, btrfs_get_extent copies the
inline extent data and then memsets the remainder of the page to zero.
There is no corruption in this case.

When the inline extent is compressed, uncompress_inline uses the
ram_bytes field from the extent ref as the size of the uncompressed data.
ram_bytes is smaller than the page size, so the remainder of the page
(i.e. the bytes in the small hole) is uninitialized memory.  Each time the
extent is read into the page cache, userspace may see different contents.

Fix this by zeroing out the difference between the size of the
uncompressed inline extent and PAGE_CACHE_SIZE in uncompress_inline.

Only bytes within the hole are affected, so affected files can be read
correctly with a fixed kernel.  The corruption happens after IO and
checksum validation, so the corruption is never reported in dmesg or
counted in dev stats.

The bug is at least as old as 3.5.7 (the oldest kernel I can conveniently
test), and possibly much older.

The code may not be correct if the extent is larger than a page, so add
a WARN_ON for that case.

To reproduce the bug, run this on a 3072M kvm VM:

#!/bin/sh
# Use your favorite block device here
blk=/dev/vdc

# Create test filesystem and mount point
mkdir -p /try
mkfs.btrfs -dsingle -mdup -O ^extref,^skinny-metadata,^no-holes -f 
"$blk" || exit 1
mount -ocompress-force,flushoncommit,max_inline=8192,noatime "$blk" 
/try || exit 1

    # Create a few inline extents in larger files.
# Multiple processes seem to be necessary.
y=/usr; for x in $(seq 10 19); do rsync -axHSWI "$y/." "$x"; y="$x"; 
done &
y=/usr; for x in $(seq 20 29); do rsync -axHSWI "$y/." "$x"; y="$x"; 
done &
y=/usr; for x in $(seq 30 39); do rsync -axHSWI "$y/." "$x"; y="$x"; 
done &
y=/usr; for x in $(seq 40 49); do rsync -axHSWI "$y/." "$x"; y="$x"; 
done &
wait

# Make a list of the files with inline extents
touch /try/list
find -type f -size +4097c -exec sh -c 'for x; do if filefrag -v "$x" | 
sed -n "4p" | grep -q "inline"; then echo "$x" >> list; fi; done' -- {} +

# Check the inline extents to see if they change as they are read 
multiple times
while read -r x; do
sum="$(sha1sum "$x")"
for y in $(seq 0 99); do
sysctl vm.drop_caches=1
sum2="$(sha1sum "$x")"
if [ "$sum" != "$sum2" ]; then
echo "Inconsistent reads from '$x'"
exit 1
fi
done
done < list

The reproducer may need to run up to 20 times before it finds a
corruption.

Signed-off-by: Zygo Blaxell <ce3g8...@umail.furryterror.org>
---
 fs/btrfs/inode.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e6811c4..34f9c80 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6791,6 +6791,12 @@ static noinline int uncompress_inline(struct btrfs_path 
*path,
max_size = min_t(unsigned long, PAGE_SIZE, max_size);
ret = btrfs_decompress(compress_type, tmp, page,
   extent_offset, inline_size, max_size);
+   WARN_ON(max_size > PAGE_SIZE);
+   if (max_size < PAGE_SIZE) {
+   char *map = kmap(page);
+   memset(map + max_size, 0, PAGE_SIZE - max_size);
+   kunmap(page);
+   }
kfree(tmp);
return ret;
 }
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: mixed inline, non-inline extents leading to EIO when reading small files

2016-05-26 Thread Chris Mason
On Thu, May 26, 2016 at 12:19:52PM -0400, Zygo Blaxell wrote:
> I frequently see these in /etc/lvm/backup/*.  Something that LVM does
> when it writes these files triggers the problem.  This problem occurs
> in kernels 3.18..4.4.11 (i.e. all the kernels I've tested).
> 
> btrfs-debug-tree finds this:
> 
> item 26 key (2702988 INODE_ITEM 0) itemoff 12632 itemsize 160
> inode generation 49642 transid 49799 size 7856 nbytes 8192
> block group 0 mode 100644 links 1 uid 0 gid 0
> rdev 0 flags 0x0
> item 27 key (2702988 INODE_REF 2799) itemoff 12617 itemsize 15
> inode ref index 4 namelen 5 name: volgr
> item 28 key (2702988 EXTENT_DATA 0) itemoff 11247 itemsize 1370
> inline extent data size 1349 ram 4096 compress(zlib)
> item 29 key (2702988 EXTENT_DATA 4096) itemoff 11194 itemsize 53
> extent data disk byte 1161560064 nr 4096
> extent data offset 0 nr 4096 ram 4096
> extent compression(none)
> 
> When the problem occurs it usually affects all files in /etc/lvm/backup.
> I have seen it randomly in other parts of the filesystem but it's much
> rarer elsewhere.
> 
> Attempts to read this file return EIO.  There are no errors reported in
> scrub or kmesg.
> 
> Filesystem is mounted with options:
> 
>   
> noatime,compress-force=zlib,flushoncommit,space_cache,skip_balance,commit=300
> 
> Am I missing anything?

I've got this queued up to send out.  We hit this with holes instead of
extents, and without compression, but its a similar problem:

Btrfs: deal with duplciates during extent_map insertion in btrfs_get_extent

When dealing with inline extents, btrfs_get_extent will incorrectly try
to insert a duplicate extent_map.  The dup hits -EEXIST from
add_extent_map, but then we try to merge with the existing one and end
up trying to insert a zero length extent_map.

This actually works most of the time, except when there are extent maps
past the end of the inline extent.  rocksdb will trigger this sometimes
because it preallocates an extent and then truncates down.

Josef made a script to trigger with xfs_io:

#!/bin/bash

xfs_io -f -c "pwrite 0 1000" inline
xfs_io -c "falloc -k 4k 1M" inline
xfs_io -c "pread 0 1000" -c "fadvise -d 0 1000" -c "pread 0 1000" inline
xfs_io -c "fadvise -d 0 1000" inline
cat inline

You'll get EIOs trying to read inline after this because add_extent_map
is returning EEXIST

Signed-off-by: Chris Mason <c...@fb.com>

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 98a3ba2..4352589 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6914,7 +6914,18 @@ insert:
 * existing will always be non-NULL, since there must be
 * extent causing the -EEXIST.
 */
-   if (start >= extent_map_end(existing) ||
+   if (existing->start == em->start &&
+   extent_map_end(existing) == extent_map_end(em) &&
+   em->block_start == existing->block_start) {
+   /*
+* these two extents are the same, it happens
+* with inlines especially
+*/
+   free_extent_map(em);
+   em = existing;
+   err = 0;
+
+   } else if (start >= extent_map_end(existing) ||
start <= existing->start) {
/*
 * The existing extent map is the one nearest to
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


mixed inline, non-inline extents leading to EIO when reading small files

2016-05-26 Thread Zygo Blaxell
I frequently see these in /etc/lvm/backup/*.  Something that LVM does
when it writes these files triggers the problem.  This problem occurs
in kernels 3.18..4.4.11 (i.e. all the kernels I've tested).

btrfs-debug-tree finds this:

item 26 key (2702988 INODE_ITEM 0) itemoff 12632 itemsize 160
inode generation 49642 transid 49799 size 7856 nbytes 8192
block group 0 mode 100644 links 1 uid 0 gid 0
rdev 0 flags 0x0
item 27 key (2702988 INODE_REF 2799) itemoff 12617 itemsize 15
inode ref index 4 namelen 5 name: volgr
item 28 key (2702988 EXTENT_DATA 0) itemoff 11247 itemsize 1370
inline extent data size 1349 ram 4096 compress(zlib)
item 29 key (2702988 EXTENT_DATA 4096) itemoff 11194 itemsize 53
extent data disk byte 1161560064 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression(none)

When the problem occurs it usually affects all files in /etc/lvm/backup.
I have seen it randomly in other parts of the filesystem but it's much
rarer elsewhere.

Attempts to read this file return EIO.  There are no errors reported in
scrub or kmesg.

Filesystem is mounted with options:


noatime,compress-force=zlib,flushoncommit,space_cache,skip_balance,commit=300

Am I missing anything?


signature.asc
Description: Digital signature


[PATCH V9 09/12] Btrfs: Limit inline extents to root->sectorsize

2015-11-15 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root->sectorsize.

Reviewed-by: Josef Bacik 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d466ff4..5fe6c8d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root 
*root,
data_len = compressed_size;
 
if (start > 0 ||
-   actual_end > PAGE_CACHE_SIZE ||
+   actual_end > root->sectorsize ||
data_len > BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size &&
(actual_end & (root->sectorsize - 1)) == 0) ||
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V8 09/13] Btrfs: Limit inline extents to root->sectorsize

2015-10-28 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root->sectorsize.

Reviewed-by: Josef Bacik 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cbb05d0..e2f7699 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root 
*root,
data_len = compressed_size;
 
if (start > 0 ||
-   actual_end > PAGE_CACHE_SIZE ||
+   actual_end > root->sectorsize ||
data_len > BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size &&
(actual_end & (root->sectorsize - 1)) == 0) ||
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V7 09/13] Btrfs: Limit inline extents to root->sectorsize

2015-10-27 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root->sectorsize.

Reviewed-by: Josef Bacik 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6d66558..3a527af 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root 
*root,
data_len = compressed_size;
 
if (start > 0 ||
-   actual_end > PAGE_CACHE_SIZE ||
+   actual_end > root->sectorsize ||
data_len > BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size &&
(actual_end & (root->sectorsize - 1)) == 0) ||
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V6 09/13] Btrfs: Limit inline extents to root->sectorsize

2015-10-18 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root->sectorsize.

Reviewed-by: Josef Bacik 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1826603..c0c83f1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root 
*root,
data_len = compressed_size;
 
if (start > 0 ||
-   actual_end > PAGE_CACHE_SIZE ||
+   actual_end > root->sectorsize ||
data_len > BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size &&
(actual_end & (root->sectorsize - 1)) == 0) ||
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix file corruption and data loss after cloning inline extents

2015-10-13 Thread fdmanana
 bytes, file has 0 bytes i_size (the
  # clone operation failed and did not modify our file).
  od -t x1 $SCRATCH_MNT/foo3
  $XFS_IO_PROG -c "pwrite -S 0xff 0 90" $SCRATCH_MNT/foo3 | _filter_xfs_io

  # Test cloning the inline extent against a file which consists of a
  # single inline extent that has a size not greater than the size of
  # bar's inline extent (40 < 50).
  # It should be possible to do the extent cloning from bar to this file.
  $XFS_IO_PROG -f -c "pwrite -S 0x01 0 40" $SCRATCH_MNT/foo4 \
  | _filter_xfs_io
  $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo4

  # Doing IO against any range in the first 4K of the file should work.
  echo "File foo4 data after clone operation:"
  # Must match file bar's content.
  od -t x1 $SCRATCH_MNT/foo4
  $XFS_IO_PROG -c "pwrite -S 0x02 0 90" $SCRATCH_MNT/foo4 | _filter_xfs_io

  # Test cloning the inline extent against a file which consists of a
  # single inline extent that has a size greater than the size of bar's
  # inline extent (60 > 50).
  # It should not be possible to clone the inline extent from file bar
  # into this file.
  $XFS_IO_PROG -f -c "pwrite -S 0x03 0 60" $SCRATCH_MNT/foo5 \
  | _filter_xfs_io
  $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo5

  # Reading the file should not fail.
  echo "File foo5 data after clone operation:"
  # Must have a size of 60 bytes, with all bytes having a value of 0x03
  # (the clone operation failed and did not modify our file).
  od -t x1 $SCRATCH_MNT/foo5

  # Test cloning the inline extent against a file which has no extents but
  # has a size greater than bar's inline extent (16K > 50).
  # It should not be possible to clone the inline extent from file bar
  # into this file.
  $XFS_IO_PROG -f -c "truncate 16K" $SCRATCH_MNT/foo6 | _filter_xfs_io
  $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo6

  # Reading the file should not fail.
  echo "File foo6 data after clone operation:"
  # Must have a size of 16K, with all bytes having a value of 0x00 (the
  # clone operation failed and did not modify our file).
  od -t x1 $SCRATCH_MNT/foo6

  # Test cloning the inline extent against a file which has no extents but
  # has a size not greater than bar's inline extent (30 < 50).
  # It should be possible to clone the inline extent from file bar into
  # this file.
  $XFS_IO_PROG -f -c "truncate 30" $SCRATCH_MNT/foo7 | _filter_xfs_io
  $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo7

  # Reading the file should not fail.
  echo "File foo7 data after clone operation:"
  # Must have a size of 50 bytes, with all bytes having a value of 0xbb.
  od -t x1 $SCRATCH_MNT/foo7

  # Test cloning the inline extent against a file which has a size not
  # greater than the size of bar's inline extent (20 < 50) but has
  # a prealloc extent that goes beyond the file's size. It should not be
  # possible to clone the inline extent from bar into this file.
  $XFS_IO_PROG -f -c "falloc -k 0 1M" \
  -c "pwrite -S 0x88 0 20" \
  $SCRATCH_MNT/foo8 | _filter_xfs_io
  $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo8

  echo "File foo8 data after clone operation:"
  # Must have a size of 20 bytes, with all bytes having a value of 0x88
  # (the clone operation did not modify our file).
  od -t x1 $SCRATCH_MNT/foo8

  _scratch_unmount
  }

  echo -e "\nTesting without compression and without the no-holes feature...\n"
  test_cloning_inline_extents

  echo -e "\nTesting with compression and without the no-holes feature...\n"
  test_cloning_inline_extents "" "-o compress"

  echo -e "\nTesting without compression and with the no-holes feature...\n"
  test_cloning_inline_extents "-O no-holes" ""

  echo -e "\nTesting with compression and with the no-holes feature...\n"
  test_cloning_inline_extents "-O no-holes" "-o compress"

  status=0
  exit

Signed-off-by: Filipe Manana <fdman...@suse.com>
---
 fs/btrfs/ioctl.c | 192 ++-
 1 file changed, 149 insertions(+), 43 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 80342d3..c71cd36 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3328,6 +3328,147 @@ static void clone_update_extent_map(struct inode *inode,
_I(inode)->runtime_flags);
 }
 
+/*
+ * Make sure we do not end up inserting an inline extent into a file that has
+ * already other (non-inline) extents. If a file has an inline extent i

[PATCH 1/2] fstests: btrfs test for cloning of inline extents

2015-10-13 Thread fdmanana
From: Filipe Manana <fdman...@suse.com>

Test several cases of cloning inline extents that used to lead to file
corruption or data loss.

These file corruption and data loss cases are fixed by the linux kernel
patch titled:

  "Btrfs: fix file corruption and data loss after cloning inline extents"

Signed-off-by: Filipe Manana <fdman...@suse.com>
---
 tests/btrfs/110 | 199 
 tests/btrfs/110.out | 257 
 tests/btrfs/group   |   1 +
 3 files changed, 457 insertions(+)
 create mode 100755 tests/btrfs/110
 create mode 100644 tests/btrfs/110.out

diff --git a/tests/btrfs/110 b/tests/btrfs/110
new file mode 100755
index 000..327c8c0
--- /dev/null
+++ b/tests/btrfs/110
@@ -0,0 +1,199 @@
+#! /bin/bash
+# FSQA Test No. 110
+#
+# Test several cases of cloning inline extents that used to lead to file
+# corruption or data loss.
+#
+#---
+#
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana <fdman...@suse.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_cloner
+_require_btrfs_fs_feature "no_holes"
+_require_btrfs_mkfs_feature "no-holes"
+
+rm -f $seqres.full
+
+test_cloning_inline_extents()
+{
+   local mkfs_opts=$1
+   local mount_opts=$2
+
+   _scratch_mkfs $mkfs_opts >>$seqres.full 2>&1
+   _scratch_mount $mount_opts
+
+   # File bar, the source for all the following clone operations, consists
+# of a # single inline extent (50 bytes).
+   $XFS_IO_PROG -f -c "pwrite -S 0xbb 0 50" $SCRATCH_MNT/bar \
+   | _filter_xfs_io
+
+   # Test cloning into a file with an extent (non-inlined) where the
+   # destination offset overlaps that extent. It should not be possible to
+   # clone the inline extent from file bar into this file.
+   $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 16K" $SCRATCH_MNT/foo \
+   | _filter_xfs_io
+   $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo
+
+   # Doing IO against any range in the first 4K of the file should work.
+   # Due to a past clone ioctl bug which allowed cloning the inline extent,
+   # these operations resulted in EIO errors.
+   echo "File foo data after clone operation:"
+   # All bytes should have the value 0xaa (clone operation failed and did
+   # not modify our file).
+   od -t x1 $SCRATCH_MNT/foo
+   $XFS_IO_PROG -c "pwrite -S 0xcc 0 100" $SCRATCH_MNT/foo | _filter_xfs_io
+
+   # Test cloning the inline extent against a file which has a hole in its
+   # first 4K followed by a non-inlined extent. It should not be possible
+   # as well to clone the inline extent from file bar into this file.
+   $XFS_IO_PROG -f -c "pwrite -S 0xdd 4K 12K" $SCRATCH_MNT/foo2 \
+   | _filter_xfs_io
+   $CLONER_PROG -s 0 -d 0 -l 0 $SCRATCH_MNT/bar $SCRATCH_MNT/foo2
+
+   # Doing IO against any range in the first 4K of the file should work.
+   # Due to a past clone ioctl bug which allowed cloning the inline extent,
+   # these operations resulted in EIO errors.
+   echo "File foo2 data after clone operation:"
+   # All bytes should have the value 0x00 (clone operation failed and did
+   # not modify our file).
+   od -t x1 $SCRATCH_MNT/foo2
+   $XFS_IO_PROG -c "pwrite -S 0xee 0 90" $SCRATCH_MNT/foo2 | _filter_xfs_io
+
+   # Test cloning the inline extent against a file which has a size of zero
+   # but has a prealloc extent. It should not be possible as well to clone
+   # the inline extent from file bar into this file.
+   $XFS_IO_PROG -f -c "fall

[PATCH V5 09/13] Btrfs: Limit inline extents to root->sectorsize

2015-09-30 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root->sectorsize.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b1ceba4..b2eedb9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root 
*root,
data_len = compressed_size;
 
if (start > 0 ||
-   actual_end > PAGE_CACHE_SIZE ||
+   actual_end > root->sectorsize ||
data_len > BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size &&
(actual_end & (root->sectorsize - 1)) == 0) ||
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH V4 09/13] Btrfs: Limit inline extents to root->sectorsize

2015-09-30 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root->sectorsize.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b1ceba4..b2eedb9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root 
*root,
data_len = compressed_size;
 
if (start > 0 ||
-   actual_end > PAGE_CACHE_SIZE ||
+   actual_end > root->sectorsize ||
data_len > BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size &&
(actual_end & (root->sectorsize - 1)) == 0) ||
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V3 09/11] Btrfs: Limit inline extents to root-sectorsize

2015-08-09 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root-sectorsize.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f7afdff..99288ff 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root 
*root,
data_len = compressed_size;
 
if (start  0 ||
-   actual_end  PAGE_CACHE_SIZE ||
+   actual_end  root-sectorsize ||
data_len  BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size 
(actual_end  (root-sectorsize - 1)) == 0) ||
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V2 09/11] Btrfs: Limit inline extents to root-sectorsize

2015-08-07 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root-sectorsize.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1acee74..daf2462 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root 
*root,
data_len = compressed_size;
 
if (start  0 ||
-   actual_end  PAGE_CACHE_SIZE ||
+   actual_end  root-sectorsize ||
data_len  BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size 
(actual_end  (root-sectorsize - 1)) == 0) ||
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 09/11] Btrfs: Limit inline extents to root-sectorsize

2015-08-06 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root-sectorsize.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1acee74..daf2462 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root 
*root,
data_len = compressed_size;
 
if (start  0 ||
-   actual_end  PAGE_CACHE_SIZE ||
+   actual_end  root-sectorsize ||
data_len  BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size 
(actual_end  (root-sectorsize - 1)) == 0) ||
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix file corruption after cloning inline extents

2015-07-14 Thread fdmanana
From: Filipe Manana fdman...@suse.com

Using the clone ioctl (or extent_same ioctl, which calls the same extent
cloning function as well) we end up allowing copy an inline extent from
the source file into a non-zero offset of the destination file. This is
something not expected and that the btrfs code is not prepared to deal
with - all inline extents must be at a file offset equals to 0.

For example, the following excerpt of a test case for fstests triggers
a crash/BUG_ON() on a write operation after an inline extent is cloned
into a non-zero offset:

  _scratch_mkfs $seqres.full 21
  _scratch_mount

  # Create our test files. File foo has the same 2K of data at offset 4K
  # as file bar has at its offset 0.
  $XFS_IO_PROG -f -s -c pwrite -S 0xaa 0 4K \
  -c pwrite -S 0xbb 4k 2K \
  -c pwrite -S 0xcc 8K 4K \
  $SCRATCH_MNT/foo | _filter_xfs_io

  # File bar consists of a single inline extent (2K size).
  $XFS_IO_PROG -f -s -c pwrite -S 0xbb 0 2K \
 $SCRATCH_MNT/bar | _filter_xfs_io

  # Now call the clone ioctl to clone the extent of file bar into file
  # foo at its offset 4K. This made file foo have an inline extent at
  # offset 4K, something which the btrfs code can not deal with in future
  # IO operations because all inline extents are supposed to start at an
  # offset of 0, resulting in all sorts of chaos.
  # So here we validate that clone ioctl returns an EOPNOTSUPP, which is
  # what it returns for other cases dealing with inlined extents.
  $CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
  $SCRATCH_MNT/bar $SCRATCH_MNT/foo

  # Because of the inline extent at offset 4K, the following write made
  # the kernel crash with a BUG_ON().
  $XFS_IO_PROG -c pwrite -S 0xdd 6K 2K $SCRATCH_MNT/foo | _filter_xfs_io

  status=0
  exit

The stack trace of the BUG_ON() triggered by the last write is:

  [152154.035903] [ cut here ]
  [152154.036424] kernel BUG at mm/page-writeback.c:2286!
  [152154.036424] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
  [152154.036424] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor 
raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc 
loop fuse parport_pc acpi_cpu$
  [152154.036424] CPU: 2 PID: 17873 Comm: xfs_io Tainted: GW   
4.1.0-rc6-btrfs-next-11+ #2
  [152154.036424] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
  [152154.036424] task: 880429f70990 ti: 880429efc000 task.ti: 
880429efc000
  [152154.036424] RIP: 0010:[8111a9d5]  [8111a9d5] 
clear_page_dirty_for_io+0x1e/0x90
  [152154.036424] RSP: 0018:880429effc68  EFLAGS: 00010246
  [152154.036424] RAX: 02000806 RBX: ea0006a6d8f0 RCX: 
0001
  [152154.036424] RDX:  RSI: 81155d1b RDI: 
ea0006a6d8f0
  [152154.036424] RBP: 880429effc78 R08: 8801ce389fe0 R09: 
0001
  [152154.036424] R10: 2000 R11:  R12: 
8800200dce68
  [152154.036424] R13:  R14: 8800200dcc88 R15: 
8803d5736d80
  [152154.036424] FS:  7fbf119f6700() GS:88043d28() 
knlGS:
  [152154.036424] CS:  0010 DS:  ES:  CR0: 80050033
  [152154.036424] CR2: 01bdc000 CR3: 0003aa555000 CR4: 
06e0
  [152154.036424] Stack:
  [152154.036424]  8803d5736d80 0001 880429effcd8 
a04e97c1
  [152154.036424]  880429effd68 880429effd60 0001 
8800200dc9c8
  [152154.036424]  0001 8800200dcc88  
1000
  [152154.036424] Call Trace:
  [152154.036424]  [a04e97c1] 
lock_and_cleanup_extent_if_need+0x147/0x18d [btrfs]
  [152154.036424]  [a04ea82c] __btrfs_buffered_write+0x245/0x4c8 
[btrfs]
  [152154.036424]  [a04ed14b] ? btrfs_file_write_iter+0x150/0x3e0 
[btrfs]
  [152154.036424]  [a04ed15a] ? btrfs_file_write_iter+0x15f/0x3e0 
[btrfs]
  [152154.036424]  [a04ed2c7] btrfs_file_write_iter+0x2cc/0x3e0 
[btrfs]
  [152154.036424]  [81165a4a] __vfs_write+0x7c/0xa5
  [152154.036424]  [81165f89] vfs_write+0xa0/0xe4
  [152154.036424]  [81166855] SyS_pwrite64+0x64/0x82
  [152154.036424]  [81465197] system_call_fastpath+0x12/0x6f
  [152154.036424] Code: 48 89 c7 e8 0f ff ff ff 5b 41 5c 5d c3 0f 1f 44 00 00 
55 48 89 e5 41 54 53 48 89 fb e8 ae ef 00 00 49 89 c4 48 8b 03 a8 01 75 02 0f 
0b 4d 85 e4 74 59 49 8b 3c 2$
  [152154.036424] RIP  [8111a9d5] clear_page_dirty_for_io+0x1e/0x90
  [152154.036424]  RSP 880429effc68
  [152154.242621] ---[ end trace e3d3376b23a57041 ]---

Fix this by returning the error EOPNOTSUPP if an attempt to copy an
inline extent into a non-zero offset happens, just like what is done for
other scenarios that would require copying/splitting inline extents,
which were introduced

[RFC PATCH V11 20/21] Btrfs: subpagesize-blockssize: Limit inline extents to root-sectorsize.

2015-06-01 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root-sectorsize.

Signed-off-by: Chandan Rajendra chan...@linux.vnet.ibm.com
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1684339..4d42123 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -258,7 +258,7 @@ static noinline int cow_file_range_inline(struct btrfs_root 
*root,
data_len = compressed_size;
 
if (start  0 ||
-   actual_end  PAGE_CACHE_SIZE ||
+   actual_end  root-sectorsize ||
data_len  BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size 
(actual_end  (root-sectorsize - 1)) == 0) ||
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: delete inline extents when we find them during logging

2013-03-01 Thread Josef Bacik
Apparently when we do inline extents we allow the data to overlap the last chunk
of the btrfs_file_extent_item, which means that we can possibly have a
btrfs_file_extent_item that isn't actually as large as a btrfs_file_extent_item.
This messes with us when we try to overwrite the extent when logging new extents
since we expect for it to be the right size.  To fix this just delete the item
and try to do the insert again which will give us the proper sized
btrfs_file_extent_item.  This fixes a panic where map_private_extent_buffer
would blow up because we're trying to write past the end of the leaf.  Thanks,

Cc: sta...@vger.kernel.org
Signed-off-by: Josef Bacik jba...@fusionio.com
---
 fs/btrfs/tree-log.c |   18 ++
 1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 8e85e0e..c7ef569 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3300,6 +3300,7 @@ static int log_one_extent(struct btrfs_trans_handle 
*trans,
int index = log-log_transid % 2;
bool skip_csum = BTRFS_I(inode)-flags  BTRFS_INODE_NODATASUM;
 
+insert:
INIT_LIST_HEAD(ordered_sums);
btrfs_init_map_token(token);
key.objectid = btrfs_ino(inode);
@@ -3315,6 +3316,23 @@ static int log_one_extent(struct btrfs_trans_handle 
*trans,
leaf = path-nodes[0];
fi = btrfs_item_ptr(leaf, path-slots[0],
struct btrfs_file_extent_item);
+
+   /*
+* If we are overwriting an inline extent with a real one then we need
+* to just delete the inline extent as it may not be large enough to
+* have the entire file_extent_item.
+*/
+   if (ret  btrfs_token_file_extent_type(leaf, fi, token) ==
+   BTRFS_FILE_EXTENT_INLINE) {
+   ret = btrfs_del_item(trans, log, path);
+   btrfs_release_path(path);
+   if (ret) {
+   path-really_keep_locks = 0;
+   return ret;
+   }
+   goto insert;
+   }
+
btrfs_set_token_file_extent_generation(leaf, fi, em-generation,
   token);
if (test_bit(EXTENT_FLAG_PREALLOC, em-flags)) {
-- 
1.7.7.6

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: fix decompressing of snappy-compressed inline extents

2012-01-17 Thread Li Zefan
The first four bytes is the length of all data chunks, and the first
four bytes of each chunk is the length of compressed chunk data,
even when there's only one chunk, which is the case for inline
extents.

Signed-off-by: Li Zefan l...@cn.fujitsu.com
---
 fs/btrfs/snappy.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/btrfs/snappy.c b/fs/btrfs/snappy.c
index d6bd607..35b877e 100644
--- a/fs/btrfs/snappy.c
+++ b/fs/btrfs/snappy.c
@@ -392,12 +392,16 @@ static int btrfs_snappy_decompress(struct list_head *ws, 
unsigned char *data_in,
struct workspace *workspace = list_entry(ws, struct workspace, list);
size_t in_len;
size_t out_len;
+   size_t tot_len;
int ret = 0;
char *kaddr;
unsigned long bytes;
 
BUG_ON(srclen  SNAPPY_LEN);
 
+   tot_len = read_compress_length(data_in);
+   data_in += SNAPPY_LEN;
+
in_len = read_compress_length(data_in);
data_in += SNAPPY_LEN;
 
-- 
1.7.3.1
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


clone ioctl bug with inline extents

2011-08-09 Thread Sage Weil
Hi all,

I'm hitting a problem cloning inline extents that I haven't had much 
success tracking down.  It's simple enough to reproduce:

 echo   src
 echo 2  dst
 clone_range src 0 29 dst 0
 cmp src dst   # fails! dst is size 29 but contains 2\n\0\0\0\0...

where clone_range comes from 

 
http://ceph.newdream.net/git/?p=ceph.git;a=blob;f=qa/btrfs/clone_range.c;h=0a88e16013104c27aa87e7cd0d75e4d292419a19;hb=HEAD

The file size is adjusted for the target, and debug-tree shows an inline 
data extent of length 29, but it has the old data in it.  I'm not sure why

ret = btrfs_insert_empty_item(trans, root, path,
  new_key, size);
BUG_ON(ret);

[...]

leaf = path-nodes[0];
slot = path-slots[0];
write_extent_buffer(leaf, buf,
btrfs_item_ptr_offset(leaf, slot),
size);
inode_add_bytes(inode, datal);

is working when cloning to a new file but not over an existing one.  
Hopefully this is something silly I'm missing...

sage
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: clone ioctl bug with inline extents

2011-08-09 Thread Sage Weil
On Tue, 9 Aug 2011, Sage Weil wrote:
 Hi all,
 
 I'm hitting a problem cloning inline extents that I haven't had much 
 success tracking down.  It's simple enough to reproduce:
 
  echo   src
  echo 2  dst
  clone_range src 0 29 dst 0
  cmp src dst   # fails! dst is size 29 but contains 2\n\0\0\0\0...

facepalm, ok, this was just a matter of the ioctl code not dropping the 
old page cache pages.  I'm not sure how we didn't notice that for so long!

sage
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Btrfs: release reservations properly when doing inline extents

2011-08-02 Thread Josef Bacik
We were only releasing our metadata reservations when doing inline extents,
which isn't correct since we don't need our data reservation either.  So call
btrfs_delalloc_release_space() instead of btrfs_delalloc_release_metadata().
This would have been caught earlier but we don't check for bytes_may_use still
being set on unmount, so add that to the WARN_ON() condition so we can catch
this sort of thing in the future.  This would cause us to leak space and run
into early ENOSPC problems.  Thanks,

Signed-off-by: Josef Bacik jo...@redhat.com
---
 fs/btrfs/extent-tree.c |3 ++-
 fs/btrfs/inode.c   |2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a3e61eb..d003508 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6915,7 +6915,8 @@ int btrfs_free_block_groups(struct btrfs_fs_info *info)
struct btrfs_space_info,
list);
if (space_info-bytes_pinned  0 ||
-   space_info-bytes_reserved  0) {
+   space_info-bytes_reserved  0 ||
+   space_info-bytes_may_use  0) {
WARN_ON(1);
dump_space_info(space_info, 0, 0);
}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 68884ba..e449a1f 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -255,7 +255,7 @@ static noinline int cow_file_range_inline(struct 
btrfs_trans_handle *trans,
   inline_len, compressed_size,
   compress_type, compressed_pages);
BUG_ON(ret);
-   btrfs_delalloc_release_metadata(inode, end + 1 - start);
+   btrfs_delalloc_release_space(inode, end + 1 - start);
btrfs_drop_extent_cache(inode, start, aligned_end - 1, 0);
return 0;
 }
-- 
1.7.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html