[PATCH v2] Btrfs: use generic_remap_file_range_prep() for cloning and deduplication

2018-12-07 Thread fdmanana
From: Filipe Manana Since cloning and deduplication are no longer Btrfs specific operations, we now have generic code to handle parameter validation, compare file ranges used for deduplication, clear capabilities when cloning, etc. This change makes Btrfs use it, eliminating a lot of code in

[PATCH] Btrfs: use generic_remap_file_range_prep() for cloning and deduplication

2018-12-07 Thread fdmanana
From: Filipe Manana Since cloning and deduplication are no longer Btrfs specific operations, we now have generic code to handle parameter validation, compare file ranges used for deduplication, clear capabilities when cloning, etc. This change makes Btrfs use it, eliminating a lot of code in

[PATCH] Btrfs: scrub, move setup of nofs contexts higher in the stack

2018-12-07 Thread fdmanana
From: Filipe Manana Since scrub workers only do memory allocation with GFP_KERNEL when they need to perform repair, we can move the recent setup of the nofs context up to scrub_handle_errored_block() instead of setting it up down the call chain at insert_full_stripe_lock() and

[PATCH v2] Btrfs: fix fsync of files with multiple hard links in new directories

2018-11-28 Thread fdmanana
From: Filipe Manana The log tree has a long standing problem that when a file is fsync'ed we only check for new ancestors, created in the current transaction, by following only the hard link for which the fsync was issued. We follow the ancestors using the VFS' dget_parent() API. This means that

[PATCH] Btrfs: fix fsync of files with multiple hard links in new directories

2018-11-28 Thread fdmanana
From: Filipe Manana The log tree has a long standing problem that when a file is fsync'ed we only check for new ancestors, created in the current transaction, by following only the hard link for which the fsync was issued. We follow the ancestors using the VFS' dget_parent() API. This means that

[PATCH v5] Btrfs: fix deadlock with memory reclaim during scrub

2018-11-26 Thread fdmanana
From: Filipe Manana When a transaction commit starts, it attempts to pause scrub and it blocks until the scrub is paused. So while the transaction is blocked waiting for scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub, otherwise we risk getting into a deadlock with

[PATCH v4] Btrfs: fix deadlock with memory reclaim during scrub

2018-11-23 Thread fdmanana
From: Filipe Manana When a transaction commit starts, it attempts to pause scrub and it blocks until the scrub is paused. So while the transaction is blocked waiting for scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub, otherwise we risk getting into a deadlock with

[PATCH v3] Btrfs: fix deadlock with memory reclaim during scrub

2018-11-23 Thread fdmanana
From: Filipe Manana When a transaction commit starts, it attempts to pause scrub and it blocks until the scrub is paused. So while the transaction is blocked waiting for scrub to pause, we can not do memory allocation with GFP_KERNEL while scrub is running, we must use GFP_NOFS to avoid deadlock

[PATCH v2] Btrfs: fix deadlock with memory reclaim during scrub

2018-11-23 Thread fdmanana
From: Filipe Manana When a transaction commit starts, it attempts to pause scrub and it blocks until the scrub is paused. So while the transaction is blocked waiting for scrub to pause, we can not do memory allocation with GFP_KERNEL while scrub is running, we must use GFP_NOFS to avoid deadlock

[PATCH] Btrfs: fix deadlock with memory reclaim during scrub

2018-11-23 Thread fdmanana
From: Filipe Manana When a transaction commit starts, it attempts to pause scrub and it blocks until the scrub is paused. So while the transaction is blocked waiting for scrub to pause, we can not do memory allocation with GFP_KERNEL while scrub is running, we must use GFP_NOFS to avoid deadlock

[PATCH] Btrfs: fix race between enabling quotas and subvolume creation

2018-11-19 Thread fdmanana
From: Filipe Manana We have a race between enabling quotas and subvolume creation that causes subvolume creation to fail with -EINVAL, and the following diagram shows how it happens: CPU 0 CPU 1 btrfs_ioctl() btrfs_ioctl_quota_ctl()

[PATCH v2] Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation

2018-11-19 Thread fdmanana
From: Filipe Manana If the quota enable and snapshot creation ioctls are called concurrently we can get into a deadlock where the task enabling quotas will deadlock on the fs_info->qgroup_ioctl_lock mutex because it attempts to lock it twice, or the task creating a snapshot tries to commit the

[PATCH] Btrfs: fix deadlock when enabling quotas due to concurrent snapshot creation

2018-11-19 Thread fdmanana
From: Filipe Manana If the quota enable and snapshot creation ioctls are called concurrently we can get into a deadlock where the task enabling quotas will deadlock on the fs_info->qgroup_ioctl_lock mutex because it attempts to lock it twice. The following time diagram shows how this happens.

[PATCH] Btrfs: fix access to available allocation bits when starting balance

2018-11-19 Thread fdmanana
From: Filipe Manana The available allocation bits members from struct btrfs_fs_info are protected by a sequence lock, and when starting balance we access them incorrectly in two different ways: 1) In the read sequence lock loop at btrfs_balance() we use the values we read from

[PATCH] Btrfs: allow clear_extent_dirty() to receive a cached extent state record

2018-11-16 Thread fdmanana
From: Filipe Manana We can have a lot of freed extents during the life span of a transaction, so the red black tree that keeps track of the ranges of each freed extent (fs_info->freed_extents[]) can get quite big. When finishing a transaction commit we find each range, process it (discard the

[PATCH] Btrfs: bring back key search optimization to btrfs_search_old_slot()

2018-11-16 Thread fdmanana
From: Filipe Manana Commit d7396f07358a ("Btrfs: optimize key searches in btrfs_search_slot"), dated from August 2013, introduced an optimization to search for keys in a node/leaf to both btrfs_search_slot() and btrfs_search_old_slot(). For the latter, it ended up being reverted in commit

[PATCH] btrfs: test send after radical changes in a complex directory hierarchy

2018-11-14 Thread fdmanana
From: Filipe Manana Test an incremental send operation in a scenario where the relationship of ancestor-descendant between multiple directories is inverted, and where multiple directories that were previously ancestors of another directory now become descendants of multiple directories that used

[PATCH] Btrfs: send, fix infinite loop due to directory rename dependencies

2018-11-14 Thread fdmanana
From: Robbie Ko When doing an incremental send, due to the need of delaying directory move (rename) operations we can end up in infinite loop at apply_children_dir_moves(). An example scenario that triggers this problem is described below, where directory names correspond to the numbers of

[PATCH] Btrfs: ensure path name is null terminated at btrfs_control_ioctl

2018-11-14 Thread fdmanana
From: Filipe Manana We were using the path name received from user space without checking that it is null terminated. While btrfs-progs is well behaved and does proper validation and null termination, someone could call the ioctl and pass a non-null terminated path, leading to buffer overrun
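The concern described above can be illustrated outside the kernel. The following is a minimal Python sketch, not the patch itself: it models a fixed-size buffer received from an untrusted caller and refuses to treat it as a string unless a NUL terminator is present. The names `PATH_MAX_LEN` and `path_from_buffer` are hypothetical, and whether the actual fix rejects such input or forces a terminator is not stated in the excerpt; rejecting is an assumption made here.

```python
PATH_MAX_LEN = 16  # hypothetical buffer size, chosen only for this sketch


def path_from_buffer(buf: bytes) -> str:
    """Treat buf as a fixed-size C-style buffer and return the path it holds.

    Raises ValueError when the buffer has no NUL terminator, which is the
    case where C string functions would read past the end of the buffer.
    """
    if len(buf) != PATH_MAX_LEN:
        raise ValueError("unexpected buffer size")
    end = buf.find(b"\0")
    if end == -1:
        raise ValueError("path is not null terminated")
    return buf[:end].decode("utf-8", errors="replace")


# A well-behaved caller pads the path with NUL bytes.
good = b"/dev/sda".ljust(PATH_MAX_LEN, b"\0")
print(path_from_buffer(good))

# A misbehaving caller fills the whole buffer with non-NUL bytes.
bad = b"A" * PATH_MAX_LEN
try:
    path_from_buffer(bad)
except ValueError as exc:
    print("rejected:", exc)
```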

[PATCH] Btrfs: remove no longer used io_err from btrfs_log_ctx

2018-11-12 Thread fdmanana
From: Filipe Manana The io_err field of struct btrfs_log_ctx is no longer used after the recent simplification of the fast fsync path, where we now wait for ordered extents to complete before logging the inode. We did this in commit b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync

[PATCH] Btrfs: fix rare chances for data loss when doing a fast fsync

2018-11-12 Thread fdmanana
From: Filipe Manana After the simplification of the fast fsync path done recently by commit b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync time") and commit e7175a692765 ("btrfs: remove the wait ordered logic in the log_one_extent path"), we got a very short time window where we

[PATCH] Btrfs: simpler and more efficient cleanup of a log tree's extent io tree

2018-11-09 Thread fdmanana
From: Filipe Manana We currently are in a loop finding each range (corresponding to a btree node/leaf) in a log root's extent io tree and then clean it up. This is a waste of time since we are traversing the extent io tree's rb_tree more times than needed (one for a range lookup and another for

[PATCH] Btrfs: do not set log for full commit when creating non-data block groups

2018-11-08 Thread fdmanana
From: Filipe Manana When creating a block group we don't need to set the log for full commit if the new block group is not used for data. Logged items can only point to logical addresses of data block groups (through file extent items) so there is no need for the next fsync to fall back to a

[PATCH] btrfs: fix computation of max fs size for multiple device fs tests

2018-11-06 Thread fdmanana
From: Filipe Manana We were sorting numerical values with the 'sort' tool without telling it that we are sorting numbers, giving us unexpected ordering. So just pass the '-n' option to the 'sort' tool. Example: $ echo -e "11\n9\n20" | sort 11 20 9 $ echo -e "11\n9\n20" | sort -n 9 11 20
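The ordering difference described above can also be reproduced outside the shell. A minimal Python sketch, using the same three sample values from the example; the variable names are illustrative and not taken from the test suite:

```python
# Sample values from the example above, held as strings, which is how a
# text-processing tool such as 'sort' sees its input lines.
values = ["11", "9", "20"]

# The default string comparison is lexicographic (character by character),
# which mirrors plain 'sort': "11" < "20" < "9".
lexicographic = sorted(values)

# Comparing by integer value mirrors 'sort -n'.
numeric = sorted(values, key=int)

print(lexicographic)  # ['11', '20', '9']
print(numeric)        # ['9', '11', '20']
```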

[PATCH 3/3] btrfs: add new filter for file cloning error translation

2018-11-05 Thread fdmanana
From: Filipe Manana A bug in file cloning/reflinking was recently found that affected both Btrfs and XFS, which was caused by allowing the cloning of an eof block into the middle of a file when the eof is not aligned to the filesystem's block size. The fix consists of returning the errno

[PATCH 2/3] generic: test attempt to reflink eof block into the middle of a file

2018-11-05 Thread fdmanana
From: Filipe Manana Test that we can not clone a range from a file A into the middle of a file B when the range includes the last block of file A and file A's size is not aligned with the filesystem's block size. Allowing such case would lead to data corruption since the data between EOF and the
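The condition this test exercises can be written down as a small predicate. The sketch below is one reading of the description above, not the kernel's validation code: the function name, parameter names, and the 4096-byte block size are all assumptions made for illustration.

```python
def clones_partial_eof_block_into_middle(src_size, src_offset, length,
                                         dst_size, dst_offset, block_size):
    """Return True when a clone request matches the unsafe case.

    The case is: the source range ends at the source file's EOF, the source
    size is not block aligned (so the last block is partial), and the range
    ends before the destination file's EOF (that is, in its middle).
    """
    ends_at_src_eof = src_offset + length == src_size
    src_unaligned = src_size % block_size != 0
    ends_inside_dst = dst_offset + length < dst_size
    return ends_at_src_eof and src_unaligned and ends_inside_dst


BLOCK = 4096  # assumed block size

# File A is one block plus 100 bytes; cloning all of it into the middle of a
# four-block file B matches the unsafe case.
print(clones_partial_eof_block_into_middle(BLOCK + 100, 0, BLOCK + 100,
                                           4 * BLOCK, BLOCK, BLOCK))

# With a block-aligned file A the same request does not match.
print(clones_partial_eof_block_into_middle(2 * BLOCK, 0, 2 * BLOCK,
                                           4 * BLOCK, BLOCK, BLOCK))
```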

[PATCH 1/3] generic: test attempt to dedup eof block into the middle of a file

2018-11-05 Thread fdmanana
From: Filipe Manana Test that deduplication of an entire file that has a size that is not aligned to the filesystem's block size into the middle of a different file does not corrupt the destination's file data by reflinking the last (eof) block. This test is motivated by a bug recently found

[PATCH] Btrfs: fix data corruption due to cloning of eof block

2018-11-05 Thread fdmanana
From: Filipe Manana We currently allow cloning a range from a file which includes the last block of the file even if the file's size is not aligned to the block size. This is fine and useful when the destination file has the same size, but when it does not and the range ends somewhere in the

[PATCH] Btrfs: fix infinite loop on inode eviction after deduplication of eof block

2018-11-05 Thread fdmanana
From: Filipe Manana If we attempt to deduplicate the last block of a file A into the middle of a file B, and file A's size is not a multiple of the block size, we end up rounding the deduplication length to 0 bytes, to avoid the data corruption issue fixed by commit de02b9f6bb65 ("Btrfs: fix data
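The rounding step mentioned above is plain arithmetic and can be shown in isolation. This is a minimal Python sketch assuming a 4096-byte block size and a hypothetical file layout; it illustrates only how the length becomes 0, not the kernel code or the eviction loop itself.

```python
def round_down(value: int, alignment: int) -> int:
    """Round value down to the nearest multiple of alignment."""
    return value - (value % alignment)


BLOCK_SIZE = 4096  # assumed block size, for illustration only

# Hypothetical file A: two full blocks plus 100 bytes in the last block.
file_a_size = 2 * BLOCK_SIZE + 100

# Deduplicating only the last (partial) block of file A.
src_offset = 2 * BLOCK_SIZE
length = file_a_size - src_offset  # 100 bytes

# Rounding the length down to a block boundary leaves nothing to dedupe.
aligned_length = round_down(length, BLOCK_SIZE)
print(aligned_length)  # 0
```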

[PATCH] fstests: fix fssum to actually ignore file holes when supposed to

2018-10-29 Thread fdmanana
From: Filipe Manana Unless the '-s' option is passed to fssum, it should not detect file holes and have their existence influence the computed checksum for a file. This tool was added to test btrfs' send/receive feature, so that it checks for any metadata and data differences between the

[PATCH] Btrfs: fix missing data checksums after a ranged fsync (msync)

2018-10-29 Thread fdmanana
From: Filipe Manana Recently we got a massive simplification for fsync, where for the fast path we no longer log new extents while their respective ordered extents are still running. However that simplification introduced a subtle regression for the case where we use a ranged fsync (msync).

[PATCH] Btrfs: remove no longer used logged range variables when logging extents

2018-10-26 Thread fdmanana
From: Filipe Manana The logged_start and logged_end variables, at btrfs_log_changed_extents(), were added in commit 8c6c592831a0 ("btrfs: log csums for all modified extents"). However since the recent simplification for fsync, which makes us wait for all ordered extents to complete before

[PATCH] Btrfs: remove no longer used stuff for tracking pending ordered extents

2018-10-26 Thread fdmanana
From: Filipe Manana Tracking pending ordered extents per transaction was introduced in commit 50d9aa99bd35 ("Btrfs: make sure logged extents complete in the current transaction V3") and later updated in commit 161c3549b45a ("Btrfs: change how we wait for pending ordered extents"). However now

[PATCH v4] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-24 Thread fdmanana
From: Filipe Manana When we are writing out a free space cache, during the transaction commit phase, we can end up in a deadlock which results in a stack trace like the following: schedule+0x28/0x80 btrfs_tree_read_lock+0x8e/0x120 [btrfs] ? finish_wait+0x80/0x80

[PATCH v3] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread fdmanana
From: Filipe Manana When we are writing out a free space cache, during the transaction commit phase, we can end up in a deadlock which results in a stack trace like the following: schedule+0x28/0x80 btrfs_tree_read_lock+0x8e/0x120 [btrfs] ? finish_wait+0x80/0x80

[PATCH v2] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread fdmanana
From: Filipe Manana When we are writing out a free space cache, during the transaction commit phase, we can end up in a deadlock which results in a stack trace like the following: schedule+0x28/0x80 btrfs_tree_read_lock+0x8e/0x120 [btrfs] ? finish_wait+0x80/0x80

[PATCH] Btrfs: fix use-after-free when dumping free space

2018-10-22 Thread fdmanana
From: Filipe Manana We were iterating a block group's free space cache rbtree without first taking the lock that protects it (the free_space_ctl->free_space_offset rbtree is protected by the free_space_ctl->tree_lock spinlock). KASAN reported a use-after-free problem when iterating such a

[PATCH] Btrfs: fix deadlock on tree root leaf when finding free extent

2018-10-22 Thread fdmanana
From: Filipe Manana When we are writing out a free space cache, during the transaction commit phase, we can end up in a deadlock which results in a stack trace like the following: schedule+0x28/0x80 btrfs_tree_read_lock+0x8e/0x120 [btrfs] ? finish_wait+0x80/0x80

[PATCH] btrfs: fix test btrfs/007 to not leave temporary files in /tmp

2018-10-15 Thread fdmanana
From: Filipe Manana This test was using the "mktemp -d" command to create a temporary directory for storing send streams and computations from fssum, without ever deleting them when it finishes. Therefore after running it many times it filled up all the space in /tmp. Fix this by using a

[PATCH] generic: test fsync after fallocate on a very small file

2018-10-15 Thread fdmanana
From: Filipe Manana Test that if we have a very small file, with a size smaller than the block size, then fallocate a very small range within the block size but past the file's current size, fsync the file and then power fail, after mounting the filesystem all the file data is there and the file

[PATCH] Btrfs: fix assertion on fsync of regular file when using no-holes feature

2018-10-15 Thread fdmanana
From: Filipe Manana When using the NO_HOLES feature and logging a regular file, we were expecting that if we find an inline extent, that either its size in ram (uncompressed and unencoded) matches the size of the file or if it does not, that it matches the sector size and it represents

[PATCH v2] Btrfs: fix null pointer dereference on compressed write path error

2018-10-12 Thread fdmanana
From: Filipe Manana At inode.c:compress_file_range(), under the "free_pages_out" label, we can end up dereferencing the "pages" pointer when it has a NULL value. This case happens when "start" has a value of 0 and we fail to allocate memory for the "pages" pointer. When that happens we jump to

[PATCH] Btrfs: fix null pointer dereference on compressed write path error

2018-10-12 Thread fdmanana
From: Filipe Manana At inode.c:compress_file_range(), under the "free_pages_out" label, we can end up dereferencing the "pages" pointer when it has a NULL value. This case happens when "start" has a value of 0 and we fail to allocate memory for the "pages" pointer. When that happens we jump to

[PATCH] Btrfs: fix use-after-free during inode eviction

2018-10-12 Thread fdmanana
From: Filipe Manana At inode.c:evict_inode_truncate_pages(), when we iterate over the inode's extent states, we access an extent state record's "state" field after we unlocked the inode's io tree lock. This can lead to a use-after-free issue because after we unlock the io tree that extent state

[PATCH] Btrfs: fix deadlock when writing out free space caches

2018-10-12 Thread fdmanana
From: Filipe Manana When writing out a block group free space cache we can end up deadlocking with ourselves on an extent buffer lock resulting in a warning like the following: [245043.379979] WARNING: CPU: 4 PID: 2608 at fs/btrfs/locking.c:251 btrfs_tree_lock+0x1be/0x1d0 [btrfs]

[PATCH] generic: test for file fsync after moving it to a new parent directory

2018-10-09 Thread fdmanana
From: Filipe Manana Test that if we move a file from a directory B to a directory A, replace directory B with directory A, fsync the file and then power fail, after mounting the filesystem the file has a single parent, named B and there is no longer any directory with the name A. This test is

[PATCH] Btrfs: fix wrong dentries after fsync of file that got its parent replaced

2018-10-09 Thread fdmanana
From: Filipe Manana In a scenario like the following: mkdir /mnt/A # inode 258 mkdir /mnt/B # inode 259 touch /mnt/B/bar # inode 260 sync mv /mnt/B/bar /mnt/A/bar mv -T /mnt/A /mnt/B fsync /mnt/B/bar After replaying the log we end up

[PATCH] Btrfs: fix warning when replaying log after fsync of a tmpfile

2018-10-08 Thread fdmanana
From: Filipe Manana When replaying a log which contains a tmpfile (which necessarily has a link count of 0) we end up calling inc_nlink(), at fs/btrfs/tree-log.c:replay_one_buffer(), which produces a warning like the following: [195191.943673] WARNING: CPU: 0 PID: 6924 at fs/inode.c:342

[PATCH] generic: test mounting filesystem after fsync of a tmpfile

2018-10-08 Thread fdmanana
From: Filipe Manana Test that if we fsync a tmpfile, without adding a hard link to it, and then power fail, we will be able to mount the filesystem without triggering any crashes, warnings or corruptions. This test is motivated by an issue in btrfs where this scenario triggered a warning

[PATCH] generic: test for deduplication between different files

2018-08-17 Thread fdmanana
From: Filipe Manana Test that deduplication of an entire file that has a size that is not aligned to the filesystem's block size into a different file does not corrupt the destination's file data. This test is motivated by a bug found in Btrfs which is fixed by the following patch for the linux

[PATCH] Btrfs: fix data corruption when deduplicating between different files

2018-08-17 Thread fdmanana
From: Filipe Manana If we deduplicate extents between two different files we can end up corrupting data if the source range ends at the size of the source file, the source file's size is not aligned to the filesystem's block size and the destination range does not go past the size of the

[PATCH] btrfs: test writing into unwritten extent right before snapshotting

2018-08-06 Thread fdmanana
From: Filipe Manana Test that if we write into an unwritten extent of a file when there is no more space left to allocate in the filesystem and then snapshot the file's subvolume, after a clean shutdown the data was not lost. This test is motivated by a bug found by Robbie Ko for which there is

[PATCH] btrfs: test send with prealloc extent beyond EOF and hole punching

2018-07-30 Thread fdmanana
From: Filipe Manana Test that an incremental send operation produces correct results if a file that has a prealloc (unwritten) extent beyond its EOF gets a hole punched in a section of that prealloc extent. This test is motivated by a bug found in btrfs which is fixed by a patch for the linux

[PATCH] Btrfs: send, fix incorrect file layout after hole punching beyond eof

2018-07-30 Thread fdmanana
From: Filipe Manana When doing an incremental send, if we have a file in the parent snapshot that has prealloc extents beyond EOF and in the send snapshot it got a hole punch that partially covers the prealloc extents, the send stream, when replayed by a receiver, can result in a file that has a

[PATCH v2] Btrfs: fix send failure when root has deleted files still open

2018-07-24 Thread fdmanana
From: Filipe Manana The more common use case of send involves creating a RO snapshot and then using it for a send operation. In this case it's not possible to have inodes in the snapshot that have a link count of zero (inode with an orphan item) since during snapshot creation we do the orphan

[PATCH] btrfs: test send with snapshots that have files deleted while open

2018-07-23 Thread fdmanana
From: Filipe Manana Test that we are able to do send operations when one of the source snapshots (or subvolume) has a file that is deleted while there is still an open file descriptor for that file. This test is motivated by a bug found in btrfs which is fixed by a patch for the linux kernel

[PATCH] Btrfs: remove unused key assignment when doing a full send

2018-07-23 Thread fdmanana
From: Filipe Manana At send.c:full_send_tree() we were setting the 'key' variable in the loop while never using it later. We were also using two btrfs_key variables to store the initial key for search and the key found in every iteration of the loop. So remove this useless key assignment and use

[PATCH] Btrfs: fix send failure when root has deleted files still open

2018-07-23 Thread fdmanana
From: Filipe Manana The more common use case of send involves creating a RO snapshot and then using it for a send operation. In this case it's not possible to have inodes in the snapshot that have a link count of zero (inode with an orphan item) since during snapshot creation we do the orphan

[PATCH v2] Btrfs: fix mount failure after fsync due to hard link recreation

2018-07-20 Thread fdmanana
From: Filipe Manana If we end up with logging an inode reference item which has the same name but different index from the one we have persisted, we end up failing when replaying the log with an errno value of -EEXIST. The error comes from btrfs_add_link(), which is called from add_inode_ref(),

[PATCH] fstests: add test for fsync after renaming hard links of same file

2018-07-19 Thread fdmanana
From: Filipe Manana Test that if we have a file with 2 (or more) hard links in the same parent directory, rename one of the hard links, rename one of the other hard links to the old name of the hard link we renamed before, create a new file in the same parent directory with the old name of second

[PATCH] Btrfs: fix mount failure after fsync due to hard link recreation

2018-07-19 Thread fdmanana
From: Filipe Manana If we end up with logging an inode reference item which has the same name but different index from the one we have persisted, we end up failing when replaying the log with an errno value of -EEXIST. The error comes from btrfs_add_link(), which is called from add_inode_ref(),

[PATCH v2] generic: add test for fsync after cloning file range

2018-07-12 Thread fdmanana
From: Filipe Manana Test that if we do a buffered write to a file, fsync it, clone a range from another file into our file that overlaps the previously written range, fsync the file again and then power fail, after we mount again the filesystem, no file data was lost or corrupted. This test is

[PATCH v2] Btrfs: fix file data corruption after cloning a range and fsync

2018-07-12 Thread fdmanana
From: Filipe Manana When we clone a range into a file we can end up dropping existing extent maps (or trimming them) and replacing them with new ones if the range to be cloned overlaps with a range in the destination inode. When that happens we add the new extent maps to the list of modified

[PATCH] generic: add test for fsync after cloning file range

2018-07-12 Thread fdmanana
From: Filipe Manana Test that if we do a buffered write to a file, fsync it, clone a range from another file into our file that overlaps the previously written range, fsync the file again and then power fail, after we mount again the filesystem, no file data was lost or corrupted. This test is

[PATCH] Btrfs: file data corruption after cloning a range and fsync

2018-07-12 Thread fdmanana
From: Filipe Manana When we clone a range into a file we can end up dropping existing extent maps (or trimming them) and replacing them with new ones if the range to be cloned overlaps with a range in the destination inode. When that happens we add the new extent maps to the list of modified

[PATCH] fstests: test power failure on btrfs while qgroups rescan is in progress

2018-06-27 Thread fdmanana
From: Filipe Manana Test that if a power failure happens on a filesystem with quotas (qgroups) enabled while the quota rescan kernel thread is running, we will be able to mount the filesystem after the power failure. This test is motivated by a recent regression introduced in the linux kernel's

[PATCH] Btrfs: fix mount failure when qgroup rescan is in progress

2018-06-27 Thread fdmanana
From: Filipe Manana If a power failure happens while the qgroup rescan kthread is running, the next mount operation will always fail. This is because of a recent regression that makes qgroup_rescan_init() incorrectly return -EINVAL when we are mounting the filesystem (through

[PATCH v2] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-20 Thread fdmanana
,eof /mnt/p0/d4/d7/fc7: 1 extent found This resulted in the test failing like this: btrfs/004 49s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad) --- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100 +++ /home/fdmanan

[PATCH] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-19 Thread fdmanana
d in the test failing like this: btrfs/004 49s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad) --- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100 +++ /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.

[PATCH] generic: add test for fsync of directory after creating hard link

2018-06-11 Thread fdmanana
From: Filipe Manana Test that if we create a new hard link for a file which was previously fsync'ed, fsync a parent directory of the new hard link and power fail, the parent directory exists after mounting the filesystem again. The parent directory must be a new directory, not yet persisted.

[PATCH 1/2] Btrfs: fix return value on rename exchange failure

2018-06-11 Thread fdmanana
From: Filipe Manana If we failed during a rename exchange operation after starting/joining a transaction, we would end up replacing the return value, stored in the local 'ret' variable, with the return value from btrfs_end_transaction(). So this could end up returning 0 (success) to user space
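The return-value problem described above follows a common pattern that can be shown in a few lines. The Python sketch below is illustrative only: the function names and the `end_transaction` callback are hypothetical stand-ins, and the "fixed" variant shows one usual way to preserve the first error, which is an assumption rather than the patch's actual code.

```python
EIO = 5  # errno value used only as a sample failure code


def finish_buggy(operation_result: int, end_transaction) -> int:
    """Overwrite the operation's result with the end-transaction result."""
    ret = operation_result
    ret = end_transaction()  # an earlier failure in 'ret' is lost here
    return ret


def finish_fixed(operation_result: int, end_transaction) -> int:
    """Keep the first failure; report the end-transaction result otherwise."""
    ret = operation_result
    ret2 = end_transaction()
    return ret if ret else ret2


# The operation failed with -EIO, but ending the transaction succeeded.
print(finish_buggy(-EIO, lambda: 0))  # 0, the failure is reported as success
print(finish_fixed(-EIO, lambda: 0))  # -5, the original failure is kept
```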

[PATCH 2/2] Btrfs: sync log after logging new name

2018-06-11 Thread fdmanana
From: Filipe Manana When we add a new name for an inode which was logged in the current transaction, we update the inode in the log so that its new name and ancestors are added to the log. However when we do this we do not persist the log, so the changes remain in memory only, and as a

[PATCH] fstests: generic test for fsync of file with xattrs

2018-05-11 Thread fdmanana
From: Filipe Manana Test that xattrs are not lost after calling fsync multiple times with a filesystem commit in between the fsync calls. This test is motivated by a bug found in btrfs which is fixed by a patch for the linux kernel titled: Btrfs: fix xattr loss after power

[PATCH] Btrfs: fix xattr loss after power failure

2018-05-11 Thread fdmanana
From: Filipe Manana If a file has xattrs, we fsync it, to ensure we clear the flags BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its inode, the current transaction commits and then we fsync it (without either of those bits being set in its inode), we end up

[PATCH] Btrfs: fix duplicate extents after fsync of file with prealloc extents

2018-05-09 Thread fdmanana
From: Filipe Manana In commit 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size after fsync log replay"), on fsync, we started to always log all prealloc extents beyond an inode's i_size in order to avoid losing them after a power failure. However under some

[PATCH] Btrfs: send, fix missing truncate for inode with prealloc extent past eof

2018-04-30 Thread fdmanana
From: Filipe Manana An incremental send operation can miss a truncate operation when an inode has an increased size in the send snapshot and a prealloc extent beyond its size. Consider the following scenario where a necessary truncate operation is missing in the incremental

[PATCH v2] fstests: generic test for fsync after fallocate

2018-04-09 Thread fdmanana
From: Filipe Manana Test that fsync operations preserve extents allocated with fallocate(2) that are placed beyond a file's size. This test is motivated by a bug found in btrfs where unwritten extents beyond the inode's i_size were not preserved after a fsync and power

[PATCH] fstests: generic test for fsync after fallocate

2018-04-06 Thread fdmanana
From: Filipe Manana Test that fsync operations preserve extents allocated with fallocate(2) that are placed beyond a file's size. This test is motivated by a bug found in btrfs where unwritten extents beyond the inode's i_size were not preserved after a fsync and power

[PATCH] Btrfs: fix loss of prealloc extents past i_size after fsync log replay

2018-04-06 Thread fdmanana
From: Filipe Manana Currently if we allocate extents beyond an inode's i_size (through the fallocate system call) and then fsync the file, we log the extents but after a power failure we replay them and then immediately drop them. This behaviour happens since about 2009,

[PATCH v2] fstests: test btrfs fsync after hole punching with no-holes mode

2018-04-02 Thread fdmanana
From: Filipe Manana Test that when we have the no-holes mode enabled and a specific metadata layout, if we punch a hole and fsync the file, at replay time the whole hole was preserved. This issue is fixed by the following btrfs patch for the linux kernel: "Btrfs: fix fsync

[PATCH] fstests: test btrfs fsync after hole punching with no-holes mode

2018-03-27 Thread fdmanana
From: Filipe Manana Test that when we have the no-holes mode enabled and a specific metadata layout, if we punch a hole and fsync the file, at replay time the whole hole was preserved. This issue is fixed by the following btrfs patch for the linux kernel: "Btrfs: fix fsync

[PATCH 2/2] Btrfs: fix copy_items() return value when logging an inode

2018-03-27 Thread fdmanana
From: Filipe Manana When logging an inode, at tree-log.c:copy_items(), if we call btrfs_next_leaf() at the loop which checks for the need to log holes, we need to make sure copy_items() returns the value 1 to its caller and not 0 (on success). This is because the path the

[PATCH 1/2] Btrfs: fix fsync after hole punching when using no-holes feature

2018-03-27 Thread fdmanana
From: Filipe Manana When we have the no-holes mode enabled and fsync a file after punching a hole in it, we can end up not logging the whole hole range in the log tree. This happens if the file has extent items that span more than one leaf and we punch a hole that covers a

[PATCH 2/2 v3] Btrfs-progs: add fsck test for filesystem with shared prealloc extents

2018-03-15 Thread fdmanana
From: Filipe Manana Verify that a filesystem check operation (fsck) does not report the following scenario as an error: An extent is shared between two inodes, as a result of clone/reflink operation, and for one of the inodes, let's call it inode A, the extent is referenced

[PATCH 1/2 v3] Btrfs-progs: check, fix false error reports for shared prealloc extents

2018-03-15 Thread fdmanana
From: Filipe Manana In some cases the filesystem checker reports an error when it finds checksum items for an extent that is referenced by an inode as a prealloc extent. Such cases are not an error when the extent is actually shared (was cloned/reflinked) with other inodes

[PATCH 2/2 v2] Btrfs-progs: add fsck test for filesystem with shared prealloc extents

2018-03-14 Thread fdmanana
From: Filipe Manana Verify that a filesystem check operation (fsck) does not report the following scenario as an error: An extent is shared between two inodes, as a result of clone/reflink operation, and for one of the inodes, let's call it inode A, the extent is referenced

[PATCH 1/2 v2] Btrfs-progs: check, fix false error reports for shared prealloc extents

2018-03-14 Thread fdmanana
From: Filipe Manana In some cases the filesystem checker reports an error when it finds checksum items for an extent that is referenced by an inode as a prealloc extent. Such cases are not an error when the extent is actually shared (was cloned/reflinked) with other inodes

[PATCH 1/2] Btrfs-progs: check, fix false error reports for shared prealloc extents

2018-03-13 Thread fdmanana
From: Filipe Manana In some cases the filesystem checker reports an error when it finds checksum items for an extent that is referenced by an inode as a prealloc extent. Such cases are not an error when the extent is actually shared (was cloned/reflinked) with other inodes

[PATCH 2/2] Btrfs-progs: add fsck test for filesystem with shared prealloc extents

2018-03-13 Thread fdmanana
From: Filipe Manana Verify that a filesystem check operation (fsck) does not report the following scenario as an error: An extent is shared between two inodes, as a result of clone/reflink operation, and for one of the inodes, let's call it inode A, the extent is referenced

[PATCH] generic: add test for fsync after renaming and linking special file

2018-02-28 Thread fdmanana
From: Filipe Manana Test that when a fsync journal/log exists, if we rename a special file (fifo, symbolic link or device), create a hard link for it with its old name and then commit the journal/log, if a power loss happens the filesystem will not fail to replay the

[PATCH] generic: test fsync new file after removing hard link

2018-02-28 Thread fdmanana
From: Filipe Manana Test that if we have a file with two hard links in the same parent directory, then remove one of the links, create a new file in the same parent directory and with the name of the link removed, fsync the new file and have a power loss, mounting the filesystem

[PATCH 1/2] Btrfs: fix log replay failure after linking special file and fsync

2018-02-28 Thread fdmanana
From: Filipe Manana If in the same transaction we rename a special file (fifo, character/block device or symbolic link), create a hard link for it having its old name then sync the log, we will end up with a log that can not be replayed and when attempting to replay it, an

[PATCH] Btrfs: fix log replay failure after unlink and link combination

2018-02-28 Thread fdmanana
From: Filipe Manana If we have a file with 2 (or more) hard links in the same directory, remove one of the hard links, create a new file (or link an existing file) in the same directory with the name of the removed hard link, and then finally fsync the new file, we end up with

[PATCH] Btrfs: send, do not issue unnecessary truncate operations

2018-02-07 Thread fdmanana
From: Filipe Manana When send finishes processing an inode representing a regular file, it always issues a truncate operation for that file, even if its size did not change or the last write sets the file size correctly. In the most common cases, the issued write operations

[PATCH] Btrfs: skip writeback of last page when truncating file to same size

2018-02-07 Thread fdmanana
From: Filipe Manana When we truncate a file to the same size and that size is not aligned with the sector size, we end up triggering writeback (and wait for it to complete) of the last page. This is unnecessary as we can not have delayed allocation beyond the inode's i_size and

[PATCH] Btrfs: send, fix issuing write op when processing hole in no data mode

2018-02-07 Thread fdmanana
From: Filipe Manana When doing an incremental send of a filesystem with the no-holes feature enabled, we end up issuing a write operation when using the no data mode send flag, instead of issuing an update extent operation. Fix this by issuing the update extent operation

[PATCH] Btrfs: fix null pointer dereference when replacing missing device

2018-01-30 Thread fdmanana
From: Filipe Manana When we are replacing a missing device we mount the filesystem with the degraded mode option in which case we are allowed to have a btrfs device structure without a backing device member (its bdev member is NULL) and therefore we can't dereference that

[PATCH 1/2] Btrfs: fix missing inode i_size update after zero range operation

2018-01-18 Thread fdmanana
From: Filipe Manana For a fallocate's zero range operation that targets a range with an end that is not aligned to the sector size, we can end up not updating the inode's i_size. This happens when the last page of the range maps to an unwritten (prealloc) extent and before

[PATCH 2/2] Btrfs: fix space leak after fallocate and zero range operations

2018-01-18 Thread fdmanana
From: Filipe Manana If we do a buffered write after a zero range operation that has an unaligned (with the filesystem's sector size) end which also falls within an unwritten (prealloc) extent that is currently beyond the inode's i_size, and the zero range operation has the

[PATCH] btrfs: test send for files with multiple hard links renamed

2017-11-24 Thread fdmanana
From: Filipe Manana Test that an incremental send operation works if a file that has multiple hard links has some of its hard links renamed in the send snapshot, with one of them getting the same path that some other inode had in the send snapshot. At the moment this test
