From: Filipe Manana
Since cloning and deduplication are no longer Btrfs specific operations, we
now have generic code to handle parameter validation, compare file ranges
used for deduplication, clear capabilities when cloning, etc. This change
makes Btrfs use it, eliminating a lot of code in
From: Filipe Manana
Since cloning and deduplication are no longer Btrfs specific operations, we
now have generic code to handle parameter validation, compare file ranges
used for deduplication, clear capabilities when cloning, etc. This change
makes Btrfs use it, eliminating a lot of code in
From: Filipe Manana
Since scrub workers only do memory allocation with GFP_KERNEL when they
need to perform repair, we can move the recent setup of the nofs context
up to scrub_handle_errored_block() instead of setting it up down the call
chain at insert_full_stripe_lock() and
From: Filipe Manana
The log tree has a long standing problem that when a file is fsync'ed we
only check for new ancestors, created in the current transaction, by
following only the hard link for which the fsync was issued. We follow the
ancestors using the VFS' dget_parent() API. This means that
From: Filipe Manana
The log tree has a long standing problem that when a file is fsync'ed we
only check for new ancestors, created in the current transaction, by
following only the hard link for which the fsync was issued. We follow the
ancestors using the VFS' dget_parent() API. This means that
From: Filipe Manana
When a transaction commit starts, it attempts to pause scrub and it blocks
until the scrub is paused. So while the transaction is blocked waiting for
scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub,
otherwise we risk getting into a deadlock with
From: Filipe Manana
When a transaction commit starts, it attempts to pause scrub and it blocks
until the scrub is paused. So while the transaction is blocked waiting for
scrub to pause, we can not do memory allocation with GFP_KERNEL from scrub,
otherwise we risk getting into a deadlock with
From: Filipe Manana
When a transaction commit starts, it attempts to pause scrub and it blocks
until the scrub is paused. So while the transaction is blocked waiting for
scrub to pause, we can not do memory allocation with GFP_KERNEL while scrub
is running, we must use GFP_NOS to avoid deadlock
From: Filipe Manana
When a transaction commit starts, it attempts to pause scrub and it blocks
until the scrub is paused. So while the transaction is blocked waiting for
scrub to pause, we can not do memory allocation with GFP_KERNEL while scrub
is running, we must use GFP_NOS to avoid deadlock
From: Filipe Manana
When a transaction commit starts, it attempts to pause scrub and it blocks
until the scrub is paused. So while the transaction is blocked waiting for
scrub to pause, we can not do memory allocation with GFP_KERNEL while scrub
is running, we must use GFP_NOS to avoid deadlock
From: Filipe Manana
We have a race between enabling quotas end subvolume creation that cause
subvolume creation to fail with -EINVAL, and the following diagram shows
how it happens:
CPU 0 CPU 1
btrfs_ioctl()
btrfs_ioctl_quota_ctl()
From: Filipe Manana
If the quota enable and snapshot creation ioctls are called concurrently
we can get into a deadlock where the task enabling quotas will deadlock
on the fs_info->qgroup_ioctl_lock mutex because it attempts to lock it
twice, or the task creating a snapshot tries to commit the
From: Filipe Manana
If the quota enable and snapshot creation ioctls are called concurrently
we can get into a deadlock where the task enabling quotas will deadlock
on the fs_info->qgroup_ioctl_lock mutex because it attempts to lock it
twice. The following time diagram shows how this happens.
From: Filipe Manana
The available allocation bits members from struct btrfs_fs_info are
protected by a sequence lock, and when starting balance we access them
incorrectly in two different ways:
1) In the read sequence lock loop at btrfs_balance() we use the values we
read from
From: Filipe Manana
We can have a lot freed extents during the life span of transaction, so
the red black tree that keeps track of the ranges of each freed extent
(fs_info->freed_extents[]) can get quite big. When finishing a transaction
commit we find each range, process it (discard the
From: Filipe Manana
Commit d7396f07358a ("Btrfs: optimize key searches in btrfs_search_slot"),
dated from August 2013, introduced an optimization to search for keys in a
node/leaf to both btrfs_search_slot() and btrfs_search_old_slot(). For the
later, it ended up being reverted in commit
From: Filipe Manana
Test an incremental send operation in a scenario where the relationship
of ancestor-descendant between multiple directories is inversed, and
where multiple directories that were previously ancestors of another
directory now become descendents of multiple directories that used
From: Robbie Ko
When doing an incremental send, due to the need of delaying directory move
(rename) operations we can end up in infinite loop at
apply_children_dir_moves().
An example scenario that triggers this problem is described below, where
directory names correspond to the numbers of
From: Filipe Manana
We were using the path name received from user space without checking that
it is null terminated. While btrfs-progs is well behaved and does proper
validation and null termination, someone could call the ioctl and pass
a non-null terminated patch, leading to buffer overrun
From: Filipe Manana
The io_err field of struct btrfs_log_ctx is no longer used after the
recent simplification of the fast fsync path, where we now wait for
ordered extents to complete before logging the inode. We did this in
commit b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync
From: Filipe Manana
After the simplification of the fast fsync patch done recently by commit
b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync time") and
commit e7175a692765 ("btrfs: remove the wait ordered logic in the
log_one_extent path"), we got a very short time window where we
From: Filipe Manana
We currently are in a loop finding each range (corresponding to a btree
node/leaf) in a log root's extent io tree and then clean it up. This is a
waste of time since we are traversing the extent io tree's rb_tree more
times then needed (one for a range lookup and another for
From: Filipe Manana
When creating a block group we don't need to set the log for full commit
if the new block group is not used for data. Logged items can only point
to logical addresses of data block groups (through file extent items) so
there is no need to for the next fsync to fallback to a
From: Filipe Manana
We were sorting numerical values with the 'sort' tool without telling it
that we are sorting numbers, giving us unexpected ordering. So just pass
the '-n' option to the 'sort' tool.
Example:
$ echo -e "11\n9\n20" | sort
11
20
9
$ echo -e "11\n9\n20" | sort -n
9
11
20
From: Filipe Manana
A bug in file cloning/reflinking was recently found that afftected both
Btrfs and XFS, which was caused by allowing the cloning of an eof block
into the middle of a file when the eof is not aligned to the filesystem's
block size.
The fix consists of returning the errno
From: Filipe Manana
Test that we can not clone a range from a file A into the middle of a file B
when the range includes the last block of file A and file A's size is not
aligned with the filesystem's block size. Allowing such case would lead to
data corruption since the data between EOF and the
From: Filipe Manana
Test that deduplication of an entire file that has a size that is not
aligned to the filesystem's block size into the middle of a different
file does not corrupt the destination's file data by reflinking the last
(eof) block.
This test is motivated by a bug recently found
From: Filipe Manana
We currently allow cloning a range from a file which includes the last
block of the file even if the file's size is not aligned to the block
size. This is fine and useful when the destination file has the same size,
but when it does not and the range ends somewhere in the
From: Filipe Manana
If we attempt to deduplicate the last block of a file A into the middle of
a file B, and file A's size is not a multiple of the block size, we end
rounding the deduplication length to 0 bytes, to avoid the data corruption
issue fixed by commit de02b9f6bb65 ("Btrfs: fix data
From: Filipe Manana
Unless the '-s' option is passed to fssum, it should not detect file holes
and have their existence influence the computed checksum for a file. This
tool was added to test btrfs' send/receive feature, so that it checks for
any metadata and data differences between the
From: Filipe Manana
Recently we got a massive simplification for fsync, where for the fast
path we no longer log new extents while their respective ordered extents
are still running.
However that simplification introduced a subtle regression for the case
where we use a ranged fsync (msync).
From: Filipe Manana
The logged_start and logged_end variables, at btrfs_log_changed_extents(),
were added in commit 8c6c592831a0 ("btrfs: log csums for all modified
extents"). However since the recent simplification for fsync, which makes
us wait for all ordered extents to complete before
From: Filipe Manana
Tracking pending ordered extents per transaction was introduced in commit
50d9aa99bd35 ("Btrfs: make sure logged extents complete in the current
transaction V3") and later updated in commit 161c3549b45a ("Btrfs: change
how we wait for pending ordered extents").
However now
From: Filipe Manana
When we are writing out a free space cache, during the transaction commit
phase, we can end up in a deadlock which results in a stack trace like the
following:
schedule+0x28/0x80
btrfs_tree_read_lock+0x8e/0x120 [btrfs]
? finish_wait+0x80/0x80
From: Filipe Manana
When we are writing out a free space cache, during the transaction commit
phase, we can end up in a deadlock which results in a stack trace like the
following:
schedule+0x28/0x80
btrfs_tree_read_lock+0x8e/0x120 [btrfs]
? finish_wait+0x80/0x80
From: Filipe Manana
When we are writing out a free space cache, during the transaction commit
phase, we can end up in a deadlock which results in a stack trace like the
following:
schedule+0x28/0x80
btrfs_tree_read_lock+0x8e/0x120 [btrfs]
? finish_wait+0x80/0x80
From: Filipe Manana
We were iterating a block group's free space cache rbtree without locking
first the lock that protects it (the free_space_ctl->free_space_offset
rbtree is protected by the free_space_ctl->tree_lock spinlock).
KASAN reported an use-after-free problem when iterating such a
From: Filipe Manana
When we are writing out a free space cache, during the transaction commit
phase, we can end up in a deadlock which results in a stack trace like the
following:
schedule+0x28/0x80
btrfs_tree_read_lock+0x8e/0x120 [btrfs]
? finish_wait+0x80/0x80
From: Filipe Manana
This test was using the "mktemp -d" command to create a temporary
directory for storing send streams and computations from fssum, without
ever deleting them when it finishes. Therefore after running it for many
times it filled up all space from /tmp.
Fix this by using a
From: Filipe Manana
Test that if we have a very small file, with a size smaller than the
block size, then fallocate a very small range within the block size but
past the file's current size, fsync the file and then power fail, after
mounting the filesystem all the file data is there and the file
From: Filipe Manana
When using the NO_HOLES feature and logging a regular file, we were
expecting that if we find an inline extent, that either its size in ram
(uncompressed and unenconded) matches the size of the file or if it does
not, that it matches the sector size and it represents
From: Filipe Manana
At inode.c:compress_file_range(), under the "free_pages_out" label, we can
end up dereferencing the "pages" pointer when it has a NULL value. This
case happens when "start" has a value of 0 and we fail to allocate memory
for the "pages" pointer. When that happens we jump to
From: Filipe Manana
At inode.c:compress_file_range(), under the "free_pages_out" label, we can
end up dereferencing the "pages" pointer when it has a NULL value. This
case happens when "start" has a value of 0 and we fail to allocate memory
for the "pages" pointer. When that happens we jump to
From: Filipe Manana
At inode.c:evict_inode_truncate_pages(), when we iterate over the inode's
extent states, we access an extent state record's "state" field after we
unlocked the inode's io tree lock. This can lead to a use-after-free issue
because after we unlock the io tree that extent state
From: Filipe Manana
When writing out a block group free space cache we can end deadlocking
with ourseves on an extent buffer lock resulting in a warning like the
following:
[245043.379979] WARNING: CPU: 4 PID: 2608 at fs/btrfs/locking.c:251
btrfs_tree_lock+0x1be/0x1d0 [btrfs]
From: Filipe Manana
Test that if we move a file from a directory B to a directory A, replace
directory B with directory A, fsync the file and then power fail, after
mounting the filesystem the file has a single parent, named B and there
is no longer any directory with the name A.
This test is
From: Filipe Manana
In a scenario like the following:
mkdir /mnt/A # inode 258
mkdir /mnt/B # inode 259
touch /mnt/B/bar # inode 260
sync
mv /mnt/B/bar /mnt/A/bar
mv -T /mnt/A /mnt/B
fsync /mnt/B/bar
After replaying the log we end up
From: Filipe Manana
When replaying a log which contains a tmpfile (which necessarily has a
link count of 0) we end up calling inc_nlink(), at
fs/btrfs/tree-log.c:replay_one_buffer(), which produces a warning like
the following:
[195191.943673] WARNING: CPU: 0 PID: 6924 at fs/inode.c:342
From: Filipe Manana
Test that if we fsync a tmpfile, without adding a hard link to it, and
then power fail, we will be able to mount the filesystem without
triggering any crashes, warnings or corruptions.
This test is motivated by an issue in btrfs where this scenario triggered
a warning
From: Filipe Manana
Test that deduplication of an entire file that has a size that is not
aligned to the filesystem's block size into a different file does not
corrupt the destination's file data.
This test is motivated by a bug found in Btrfs which is fixed by the
following patch for the linux
From: Filipe Manana
If we deduplicate extents between two different files we can end up
corrupting data if the source range ends at the size of the source file,
the source file's size is not aligned to the filesystem's block size
and the destination range does not go past the size of the
From: Filipe Manana
Test that if we write into an unwritten extent of a file when there is no
more space left to allocate in the filesystem and then snapshot the file's
subvolume, after a clean shutdown the data was not lost.
This test is motivated by a bug found by Robbie Ko for which there is
From: Filipe Manana
Test that an incremental send operation produces correct results if a file
that has a prealloc (unwritten) extent beyond its EOF gets a hole punched
in a section of that prealloc extent.
This test is motivated by a bug found in btrfs which is fixed by a patch
for the linux
From: Filipe Manana
When doing an incremental send, if we have a file in the parent snapshot
that has prealloc extents beyond EOF and in the send snapshot it got a
hole punch that partially covers the prealloc extents, the send stream,
when replayed by a receiver, can result in a file that has a
From: Filipe Manana
The more common use case of send involves creating a RO snapshot and then
use it for a send operation. In this case it's not possible to have inodes
in the snapshot that have a link count of zero (inode with an orphan item)
since during snapshot creation we do the orphan
From: Filipe Manana
Test that we are able to do send operations when one of the source
snapshots (or subvolume) has a file that is deleted while there is still
a open file descriptor for that file.
This test is motivated by a bug found in btrfs which is fixed by a patch
for the linux kernel
From: Filipe Manana
At send.c:full_send_tree() we were setting the 'key' variable in the loop
while never using it later. We were also using two btrfs_key variables
to store the initial key for search and the key found in every iteration
of the loop. So remove this useless key assignment and use
From: Filipe Manana
The more common use case of send involves creating a RO snapshot and then
use it for a send operation. In this case it's not possible to have inodes
in the snapshot that have a link count of zero (inode with an orphan item)
since during snapshot creation we do the orphan
From: Filipe Manana
If we end up with logging an inode reference item which has the same name
but different index from the one we have persisted, we end up failing when
replaying the log with an errno value of -EEXIST. The error comes from
btrfs_add_link(), which is called from add_inode_ref(),
From: Filipe Manana
Test that if we have a file with 2 (or more) hard links in the same parent
directory, rename of the hard links, rename one of the other hard links to
the old name of the hard link we renamed before, create a new file in the
same parent directory with the old name of second
From: Filipe Manana
If we end up with logging an inode reference item which has the same name
but different index from the one we have persisted, we end up failing when
replaying the log with an errno value of -EEXIST. The error comes from
btrfs_add_link(), which is called from add_inode_ref(),
From: Filipe Manana
Test that if we do a buffered write to a file, fsync it, clone a range
from another file into our file that overlaps the previously written
range, fsync the file again and then power fail, after we mount again the
filesystem, no file data was lost or corrupted.
This test is
From: Filipe Manana
When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the list of modified
From: Filipe Manana
Test that if we do a buffered write to a file, fsync it, clone a range
from another file into our file that overlaps the previously written
range, fsync the file again and then power fail, after we mount again the
filesystem, no file data was lost or corrupted.
This test is
From: Filipe Manana
When we clone a range into a file we can end up dropping existing
extent maps (or trimming them) and replacing them with new ones if the
range to be cloned overlaps with a range in the destination inode.
When that happens we add the new extent maps to the list of modified
From: Filipe Manana
Test that if a power failure happens on a filesystem with quotas (qgroups)
enabled while the quota rescan kernel thread is running, we will be able
to mount the filesystem after the power failure.
This test is motivated by a recent regression introduced in the linux
kernel's
From: Filipe Manana
If a power failure happens while the qgroup rescan kthread is running,
the next mount operation will always fail. This is because of a recent
regression that makes qgroup_rescan_init() incorrectly return -EINVAL
when we are mounting the filesystem (through
,eof
/mnt/p0/d4/d7/fc7: 1 extent found
This resulted in the test failing like this:
btrfs/004 49s ... [failed, exit status 1]- output mismatch (see
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
--- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100
+++ /home/fdmanan
d in the test failing like this:
btrfs/004 49s ... [failed, exit status 1]- output mismatch (see
/home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
--- tests/btrfs/004.out 2016-08-23 10:17:35.027012095 +0100
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.
From: Filipe Manana
Test that if we create a new hard link for a file which was previously
fsync'ed, fsync a parent directory of the new hard link and power fail,
the parent directory exists after mounting the filesystem again. The
parent directory must be a new directory, not yet persisted.
From: Filipe Manana
If we failed during a rename exchange operation after starting/joining a
transaction, we would end up replacing the return value, stored in the
local 'ret' variable, with the return value from btrfs_end_transaction().
So this could end up returning 0 (success) to user space
From: Filipe Manana
When we add a new name for an inode which was logged in the current
transaction, we update the inode in the log so that its new name and
ancestors are added to the log. However when we do this we do not persist
the log, so the changes remain in memory only, and as a
From: Filipe Manana
Test that xattrs are not lost after calling fsync multiple times with a
filesystem commit in between the fsync calls.
This test is motivated by a bug found in btrfs which is fixed by a patch
for the linux kernel titled:
Btrfs: fix xattr loss after power
From: Filipe Manana
If a file has xattrs, we fsync it, to ensure we clear the flags
BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its
inode, the current transaction commits and then we fsync it (without
either of those bits being set in its inode), we end up
From: Filipe Manana
In commit 471d557afed1 ("Btrfs: fix loss of prealloc extents past i_size
after fsync log replay"), on fsync, we started to always log all prealloc
extents beyond an inode's i_size in order to avoid losing them after a
power failure. However under some
From: Filipe Manana
An incremental send operation can miss a truncate operation when an inode
has an increased size in the send snapshot and a prealloc extent beyond
its size.
Consider the following scenario where a necessary truncate operation is
missing in the incremental
From: Filipe Manana
Test that fsync operations preserve extents allocated with fallocate(2)
that are placed beyond a file's size.
This test is motivated by a bug found in btrfs where unwritten extents
beyond the inode's i_size were not preserved after a fsync and power
From: Filipe Manana
Test that fsync operations preserve extents allocated with fallocate(2)
that are placed beyond a file's size.
This test is motivated by a bug found in btrfs where unwritten extents
beyond the inode's i_size were not preserved after a fsync and power
From: Filipe Manana
Currently if we allocate extents beyond an inode's i_size (through the
fallocate system call) and then fsync the file, we log the extents but
after a power failure we replay them and then immediately drop them.
This behaviour happens since about 2009,
From: Filipe Manana
Test that when we have the no-holes mode enabled and a specific metadata
layout, if we punch a hole and fsync the file, at replay time the whole
hole was preserved.
This issue is fixed by the following btrfs patch for the linux kernel:
"Btrfs: fix fsync
From: Filipe Manana
Test that when we have the no-holes mode enabled and a specific metadata
layout, if we punch a hole and fsync the file, at replay time the whole
hole was preserved.
This issue is fixed by the following btrfs patch for the linux kernel:
"Btrfs: fix fsync
From: Filipe Manana
When logging an inode, at tree-log.c:copy_items(), if we call
btrfs_next_leaf() at the loop which checks for the need to log holes, we
need to make sure copy_items() returns the value 1 to its caller and
not 0 (on success). This is because the path the
From: Filipe Manana
When we have the no-holes mode enabled and fsync a file after punching a
hole in it, we can end up not logging the whole hole range in the log tree.
This happens if the file has extent items that span more than one leaf and
we punch a hole that covers a
From: Filipe Manana
Verify that a filesystem check operation (fsck) does not report the
following scenario as an error:
An extent is shared between two inodes, as a result of clone/reflink
operation, and for one of the inodes, lets call it inode A, the extent is
referenced
From: Filipe Manana
Under some cases the filesystem checker reports an error when it finds
checksum items for an extent that is referenced by an inode as a prealloc
extent. Such cases are not an error when the extent is actually shared
(was cloned/reflinked) with other inodes
From: Filipe Manana
Verify that a filesystem check operation (fsck) does not report the
following scenario as an error:
An extent is shared between two inodes, as a result of clone/reflink
operation, and for one of the inodes, lets call it inode A, the extent is
referenced
From: Filipe Manana
Under some cases the filesystem checker reports an error when it finds
checksum items for an extent that is referenced by an inode as a prealloc
extent. Such cases are not an error when the extent is actually shared
(was cloned/reflinked) with other inodes
From: Filipe Manana
Under some cases the filesystem checker reports an error when it finds
checksum items for an extent that is referenced by an inode as a prealloc
extent. Such cases are not an error when the extent is actually shared
(was cloned/reflinked) with other inodes
From: Filipe Manana
Verify that a filesystem check operation (fsck) does not report the
following scenario as an error:
An extent is shared between two inodes, as a result of clone/reflink
operation, and for one of the inodes, lets call it inode A, the extent is
referenced
From: Filipe Manana
Test that when a fsync journal/log exists, if we rename a special file
(fifo, symbolic link or device), create a hard link for it with its old
name and then commit the journal/log, if a power loss happens the
filesystem will not fail to replay the
From: Filipe Manana
Test that if we have a file with two hard links in the same parent
directory, then remove of the links, create a new file in the same
parent directory and with the name of the link removed, fsync the new
file and have a power loss, mounting the filesystem
From: Filipe Manana
If in the same transaction we rename a special file (fifo, character/block
device or symbolic link), create a hard link for it having its old name
then sync the log, we will end up with a log that can not be replayed and
at when attempting to replay it, an
From: Filipe Manana
If we have a file with 2 (or more) hard links in the same directory,
remove one of the hard links, create a new file (or link an existing file)
in the same directory with the name of the removed hard link, and then
finally fsync the new file, we end up with
From: Filipe Manana
When send finishes processing an inode representing a regular file, it
always issues a truncate operation for that file, even if its size did
not change or the last write sets the file size correctly. In the most
common cases, the issued write operations
From: Filipe Manana
When we truncate a file to the same size and that size is not aligned
with the sector size, we end up triggering writeback (and wait for it to
complete) of the last page. This is unncessary as we can not have delayed
allocation beyond the inode's i_size and
From: Filipe Manana
When doing an incremental send of a filesystem with the no-holes feature
enabled, we end up issuing a write operation when using the no data mode
send flag, instead of issuing an update extent operation. Fix this by
issuing the update extent operation
From: Filipe Manana
When we are replacing a missing device we mount the filesystem with the
degraded mode option in which case we are allowed to have a btrfs device
structure without a backing device member (its bdev member is NULL) and
therefore we can't dereference that
From: Filipe Manana
For a fallocate's zero range operation that targets a range with an end
that is not aligned to the sector size, we can end up not updating the
inode's i_size. This happens when the last page of the range maps to an
unwritten (prealloc) extent and before
From: Filipe Manana
If we do a buffered write after a zero range operation that has an
unaligned (with the filesystem's sector size) end which also falls within
an unwritten (prealloc) extent that is currently beyond the inode's
i_size, and the zero range operation has the
From: Filipe Manana
Test that an incremental send operation works if a file that has multiple
hard links has some of its hard links renamed in the send snapshot, with
one of them getting the same path that some other inode had in the send
snapshot.
At the moment this test
1 - 100 of 414 matches
Mail list logo