On Mon, Oct 27, 2014 at 7:57 PM, Chris Mason wrote:
> On Tue, Oct 21, 2014 at 6:12 AM, Filipe Manana wrote:
>>
>> If right after starting the snapshot creation ioctl we perform a write
>> against a
>> file followed by a truncate, with both operations increasing the file's
>> size, we
>> can get a snapshot tree that reflects a state of the source subvolume's
>> tree where
>> the file truncation happened but the write operation didn't. This leaves a
>> gap
>> between 2 file extent items of the inode, which makes btrfs' fsck complain
>> about it.
>>
>> For example, if we perform the following file operations:
>>
>> $ mkfs.btrfs -f /dev/vdd
>> $ mount /dev/vdd /mnt
>> $ xfs_io -f \
>> -c "pwrite -S 0xaa -b 32K 0 32K" \
>> -c "fsync" \
>> -c "pwrite -S 0xbb -b 32770 16K 32770" \
>> -c "truncate 90123" \
>> /mnt/foobar
>>
>> and the snapshot creation ioctl was just called before the second write,
>> we often
>> can get the following inode items in the snapshot's btree:
>>
>> item 120 key (257 INODE_ITEM 0) itemoff 7987 itemsize 160
>> inode generation 146 transid 7 size 90123 block group 0
>> mode 100600 links 1 uid 0 gid 0 rdev 0 flags 0x0
>> item 121 key (257 INODE_REF 256) itemoff 7967 itemsize 20
>> inode ref index 282 namelen 10 name: foobar
>> item 122 key (257 EXTENT_DATA 0) itemoff 7914 itemsize 53
>> extent data disk byte 1104855040 nr 32768
>> extent data offset 0 nr 32768 ram 32768
>> extent compression 0
>> item 123 key (257 EXTENT_DATA 53248) itemoff 7861 itemsize 53
>> extent data disk byte 0 nr 0
>> extent data offset 0 nr 40960 ram 40960
>> extent compression 0
>>
>> There's a file range, corresponding to the interval [32K; ALIGN(16K +
>> 32770, 4096)[
>> for which there's no file extent item covering it. This is because the
>> file write
>> and file truncate operations happened both right after the snapshot
>> creation ioctl
>> called btrfs_start_delalloc_inodes(), which means we didn't start and wait
>> for the
>> ordered extent that matches the write and, in btrfs_setsize(), we were
>> able to call
>> btrfs_cont_expand() before being able to commit the current transaction in
>> the
>> snapshot creation ioctl. So this made it possibe to insert the hole file
>> extent
>> item in the source subvolume (which represents the region added by the
>> truncate)
>> right before the transaction commit from the snapshot creation ioctl.
>>
>> Btrfs' fsck tool complains about such cases with a message like the
>> following:
>>
>> "root 331 inode 257 errors 100, file extent discount"
>>
>> From a user perspective, the expectation when a snapshot is created while
>> those
>> file operations are being performed is that the snapshot will have a file
>> that
>> either:
>>
>> 1) is empty
>> 2) only the first write was captured
>> 3) only the 2 writes were captured
>> 4) both writes and the truncation were captured
>>
>> But never capture a state where only the first write and the truncation
>> were
>> captured (since the second write was performed before the truncation).
>>
>> A test case for xfstests follows.
>>
>> Signed-off-by: Filipe Manana
>> ---
>> fs/btrfs/inode.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 0d41741..c28b78f 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -4622,6 +4622,9 @@ static int btrfs_setsize(struct inode *inode, struct
>> iattr *attr)
>> }
>>
>> if (newsize > oldsize) {
>> + ret = btrfs_wait_ordered_range(inode, 0, (u64)-1);
>> + if (ret)
>> + return ret;
>
>
> Expanding truncates aren't my favorite operation, but we don't want them to
> imply fsync. I'm holding off on this one while I work out the rest of the
> vacation backlog ;)
Yes, it's very close to an fsync.
Any suggestion for a more lightweight approach?
thanks
>
> -chris
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html