Hi,
On 10/10/14 04:39, Bob Peterson wrote:
- Original Message -
- Original Message -
This patch introduces a new block reservation doubling scheme. If we
Maybe I sent this patch out prematurely. Instead of doubling the
reservation, maybe I should experiment with making it
When heavily exercising xattr code the assertion that
jbd2_journal_dirty_metadata() shouldn't return error was triggered:
WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
jbd2_journal_dirty_metadata+0x1ba/0x260()
CPU: 0 PID: 8877 Comm: ceph-osd Tainted: GW
From: Shaohua Li shaohua...@intel.com
This patch reverts commit 35ae66e0a09ab70ed (block: Make rq_affinity = 1
work as expected). The purpose is to avoid an unnecessary IPI.
Let's take an example. My test box has cpu 0-7, one socket. Say request is
added from CPU 1, blk_complete_request() occurs
Error recovery in ext4_alloc_branch() calls ext4_forget() even for
buffer corresponding to indirect block it did not allocate. This leads
to brelse() being called twice for that buffer (once from ext4_forget()
and once from cleanup in ext4_ind_map_blocks()) leading to buffer use
count
From: Dave Jones da...@redhat.com
Commit 7982e90c3a57 (block: fix q->flush_rq NULL pointer crash on
dm-mpath flush) moved an allocation to blk_init_allocated_queue(), but
neglected to free that allocation on the error paths that follow.
Signed-off-by: Dave Jones da...@fedoraproject.org
Acked-by:
Currently we allocate anon_inode_inode in anon_inodefs_mount. This is
somewhat fragile as if that function ever gets called again, it will
overwrite anon_inode_inode pointer. So move the initialization of
anon_inode_inode to anon_inode_init().
Signed-off-by: Jan Kara j...@suse.cz
---
For !CONFIG_MMU systems we defined pagecache_isize_extended() in both
include/linux/mm.h and mm/truncate.c which causes compilation error.
Although pagecache_isize_extended() doesn't do anything useful for
!CONFIG_MMU systems, it could do something in future and its overhead
isn't huge. So don't
From: Dan Williams dan.j.willi...@intel.com
Some systems benefit from completions always being steered to the strict
requester cpu rather than the looser per-socket steering that
blk_cpu_to_group() attempts by default. This is because the first
CPU in the group mask ends up being completely
Signed-off-by: Jan Kara j...@suse.cz
---
include/trace/events/printk.h | 42
kernel/printk/printk.c        | 112 --
2 files changed, 151 insertions(+), 3 deletions(-)
diff --git a/include/trace/events/printk.h
From: Dave Chiluk chi...@canonical.com
1d2ef5901483004d74947bbf78d5146c24038fe7 caused a regression in ncpfs such that
directories could no longer be removed. This was because ncp_rmdir checked
to see if a dentry could be unhashed before allowing it to be removed. Since
We did not implement any bound on number of indirect ICBs we follow when
loading inode. Thus corrupted medium could cause kernel to go into an
infinite loop, possibly causing a stack overflow.
Fix the possible stack overflow by removing recursion from
__udf_read_inode() and limit number of
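A minimal sketch of the idea described above: follow a chain of indirect entries with a loop instead of recursion, and give up after a fixed number of hops. This is not the UDF patch itself. `struct icb`, `resolve_icb()` and `MAX_INDIRECT_HOPS` are hypothetical names made up for illustration, and the bound used by the real patch is not shown in the excerpt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bound; the limit chosen by the actual patch is not shown. */
#define MAX_INDIRECT_HOPS 1024

/* Hypothetical stand-in for an on-disk ICB: next < 0 means "not indirect". */
struct icb {
    int next;   /* index of the entry this one redirects to, or -1 */
};

/*
 * Follow indirect entries starting at 'start' iteratively, so stack usage
 * stays constant no matter how long the chain is. Returns the index of the
 * final entry, or -1 if the chain exceeds MAX_INDIRECT_HOPS (for example a
 * cycle on a corrupted medium) or points outside the table.
 */
static int resolve_icb(const struct icb *table, size_t n, int start)
{
    int cur = start;
    int hops = 0;

    assert(table != NULL);

    for (;;) {
        if (cur < 0 || (size_t)cur >= n)
            return -1;              /* pointer outside the table */
        if (table[cur].next < 0)
            return cur;             /* reached a direct entry */
        if (++hops > MAX_INDIRECT_HOPS)
            return -1;              /* too many hops: treat as corruption */
        cur = table[cur].next;
    }
}
```

With a two-entry table whose entries point at each other, the loop stops after the hop limit and reports an error instead of spinning forever.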
clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:
==
[ INFO: possible circular locking dependency
Yuanhan has reported that when he is running fsync(2) heavy workload
creating new files over ramdisk, significant amount of time is spent in
__jbd2_journal_clean_checkpoint_list() trying to clean old transactions
(but they cannot be cleaned up because flusher hasn't yet checkpointed
those
Currently last dqput() can race with dquot_scan_active() causing it to
call callback for an already deactivated dquot. The race is as follows:
CPU1                                    CPU2
dqput()
spin_lock(&dq_list_lock);
if (atomic_read(&dquot->dq_count) > 1) {
- not taken
if
__jbd2_journal_clean_checkpoint_list() returns number of buffers it
freed but no one was using the value so just stop doing that. This
also allows for simplifying the calling convention for
journal_clean_once_cp_list().
Signed-off-by: Jan Kara j...@suse.cz
---
fs/jbd2/checkpoint.c | 56
->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to return a page fault if there is not enough space
available, user exceeds quota or similar problem happens, rather than
silently
Global quota files are accessed from different nodes. Thus we cannot
cache offset of quota structure in the quota file after we drop our
node reference count to it because after that moment quota structure may
be freed and reallocated elsewhere by a different node resulting in
corruption of quota
When ext3 is used in data=journal mode, syncing filesystem makes sure
all the data is committed in the journal but the data doesn't have to be
checkpointed. ext3_freeze() then takes care of checkpointing all the
data so all buffer heads are clean but pages can still have dangling
dirty bits. So
commit 5838d4442bd5971687b72221736222637e03140d upstream.
Commit 85816794240b (fanotify: Fix use after free for permission
events) introduced a double free issue for permission events which are
pending in group's notification queue while group is being destroyed.
These events are freed from
Use truncate_isize_extended() when hole is being created in a file so that
->page_mkwrite() will get called for the partial tail page if it is
mmapped (see the first patch in the series for details).
Signed-off-by: Jan Kara j...@suse.cz
---
fs/ext4/inode.c | 6 +-
1 file changed, 5
From: Derek Basehore dbaseh...@chromium.org
bdi_wakeup_thread_delayed() used the mod_delayed_work() function to
schedule work to writeback dirty inodes. The problem with this is that
it can delay work that is scheduled for immediate execution, such as the
work from sync_inodes_sb(). This can
We need interrupts disabled when calling console_trylock_for_printk()
only so that cpu id we pass to can_use_console() remains valid (for
other things console_sem provides all the exclusion we need and
deadlocks on console_sem due to interrupts are impossible because we use
down_trylock()).
Signed-off-by: Jan Kara j...@suse.cz
---
kernel/printk/printk.c | 14 --
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a39f4129f848..00a9ad5c2708 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@
anon_inodefs filesystem is a kernel internal filesystem userspace
shouldn't mess with. Remove registration of it so userspace cannot
even try to mount it (which would fail anyway because the filesystem is
MS_NOUSER).
This fixes an oops triggered by trinity when it tried mounting
anon_inodefs
Hello,
this is a second version of the patches to fix data corruption in mmapped
data when blocksize < pagesize as tested by xfstests generic/030 test.
The patchset fixes XFS and ext4. I've checked and btrfs doesn't need fixing
because it doesn't support blocksize < pagesize. If that's ever
scsi_request_fn() can be called from softirq context during IO
completion. If it enables interrupts there, HW interrupts can interrupt
softirq processing and queue more IO completion work which can
eventually lead to softlockup reports because IO completion softirq runs
for too long. Keep
From: Dave Chinner dchin...@redhat.com
When sync does its WB_SYNC_ALL writeback, it issues data IO and
then immediately waits for IO completion. This is done in the
context of the flusher thread, and hence completely ties up the
flusher thread for the backing device until all the dirty inodes
When DAX is enabled, it uses i_mmap_mutex as a protection against
truncate during page fault. This inevitably forces i_mmap_mutex to rank
outside of a transaction start and thus we have to avoid calling
pagecache purging operations when transaction is started.
Signed-off-by: Jan Kara j...@suse.cz
Hole punching code for files with indirect blocks wrongly computed
number of blocks which need to be cleared when traversing the indirect
block tree. That could result in punching more blocks than actually
requested and thus effectively cause a data loss. For example:
fallocate -n -p 1024
After 839a8e8660b67 ("writeback: replace custom worker pool
implementation with unbound workqueue"), when device is removed while we
are writing to it we crash in bdi_writeback_workfn() ->
set_worker_desc() because bdi->dev is NULL. This can happen because
even though bdi_unregister() cancels all
Tail of a page straddling inode size must be zeroed when being written
out due to POSIX requirement that modifications of mmaped page beyond
inode size must not be written to the file. ext4_bio_write_page() did
this only for blocks fully beyond inode size but didn't properly zero
blocks partially
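The zeroing rule described above can be sketched in user space as follows. This is only an illustration, not the ext4 code: `zero_tail_beyond_isize()` and `SKETCH_PAGE_SIZE` are hypothetical names, the page size is an assumption, and the sketch works at byte granularity rather than per block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the illustration */

/*
 * Zero the part of 'page' that lies beyond 'isize' before it is written
 * out. 'page_index' is the page's position in the file. Returns the
 * number of bytes that were zeroed.
 */
static size_t zero_tail_beyond_isize(unsigned char *page,
                                     unsigned long long page_index,
                                     unsigned long long isize)
{
    unsigned long long page_start = page_index * SKETCH_PAGE_SIZE;
    size_t keep;

    assert(page != NULL);

    if (isize >= page_start + SKETCH_PAGE_SIZE)
        return 0;   /* page lies entirely within the file size */

    /* Bytes of this page that are still inside the file. */
    keep = isize > page_start ? (size_t)(isize - page_start) : 0;
    memset(page + keep, 0, SKETCH_PAGE_SIZE - keep);
    return SKETCH_PAGE_SIZE - keep;
}
```

Because the cut is computed from the byte offset, a block that straddles the file size is zeroed only from the file size onward, which is the case the snippet says was mishandled.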
free_holes_block() passed local variable as a block pointer
to ext4_clear_blocks(). Thus ext4_clear_blocks() zeroed out this local
variable instead of proper place in inode / indirect block. We later
zero out proper place in inode / indirect block but don't dirty the
inode / buffer again which can
Generic implementation of SEEK_HOLE and SEEK_DATA in
generic_file_llseek_size() and default_llseek() behaved as if everything
within i_size is data and everything beyond i_size is a hole. That makes
sense at first sight (and definitely is a valid implementation of
the spec) but at the second
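The generic behaviour described above ("everything within i_size is data, everything beyond is a hole") can be written down as a small user-space sketch. `generic_seek()` and the `SKETCH_*` constants are hypothetical names for illustration, not kernel symbols, and -1 stands in for the error a real implementation would return.

```c
#include <assert.h>

enum sketch_whence { SKETCH_SEEK_DATA, SKETCH_SEEK_HOLE };

/*
 * Model of the generic answer: the file is one data extent [0, isize)
 * followed by a single implicit hole at end of file. Returns the
 * resulting offset, or -1 where lseek(2) documents ENXIO (offset at or
 * beyond end of file).
 */
static long long generic_seek(long long offset, long long isize,
                              enum sketch_whence whence)
{
    assert(isize >= 0);

    if (offset < 0 || offset >= isize)
        return -1;
    if (whence == SKETCH_SEEK_DATA)
        return offset;  /* every byte below isize counts as data */
    return isize;       /* the only hole is the one at end of file */
}
```

For a 100-byte file, seeking for data from offset 10 returns 10 and seeking for a hole from offset 10 returns 100, regardless of how the file is actually laid out on disk.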
Signed-off-by: Jan Kara j...@suse.cz
---
kernel/printk/printk.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index ea2d5f6962ed..a39f4129f848 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1718,7
Signed-off-by: Jan Kara j...@suse.cz
---
kernel/locking/lockdep.c | 707 +++
1 file changed, 402 insertions(+), 305 deletions(-)
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index d24e4339b46d..b15e7dec55f6 100644
---
Signed-off-by: Jan Kara j...@suse.cz
---
fs/udf/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/udf/super.c b/fs/udf/super.c
index 5401fc33f5cc..479875155d77 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -962,7 +962,7 @@ struct inode
The check whether quota format is set even though there are no
quota files with journalled quota is pointless and it actually
makes it impossible to turn off journalled quotas (as there's
no way to unset journalled quota format). Just remove the check.
CC: sta...@vger.kernel.org
Signed-off-by:
From: Dave Chinner dchin...@redhat.com
tl;dr: 3 lines of code, 86% better fsmark throughput consuming 13%
less CPU and 43% lower runtime.
Doing writeback on lots of little files causes terrible IOPS storms
because of the per-mapping writeback plugging we do. This
essentially causes immediate
From: Tao Ma boyu...@taobao.com
Commit 5757a6d76c introduced a new rq_affinity = 2 so as to make
the request completed in the __make_request cpu. But it makes the
old rq_affinity = 1 not work any more. The root cause is that
if the 'cpu' and 'req->cpu' are in the same group and cpu != req->cpu,
ccpu
When we discover written out buffer in transaction checkpoint list we
don't have to recheck validity of a transaction. Either this is the last
buffer in a transaction - and then we are done - or this isn't and then
we can just take another buffer from the checkpoint list without
dropping
Hello,
this patch set moves i_dquot array from struct inode into filesystem private
part of the inode. Thus filesystems which don't need it save 2 pointers in
their inodes (would be 3 after we add project quota support into generic
quota).
I have patches to move inode->i_data.private_list
We support user, group, and project quotas. Tell VFS about it.
CC: x...@oss.sgi.com
CC: Dave Chinner da...@fromorbit.com
Signed-off-by: Jan Kara j...@suse.cz
---
fs/xfs/xfs_super.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index
i_dquot is a first candidate for using optional inode fields since it is
used by relatively few filesystems (ext?, ocfs2, jfs, reiserfs). We
cannot just pass quota pointers from filesystems to quota functions
because during quotaon and quotaoff we have to traverse list of all
inodes and manipulate
CC: Dave Kleikamp sha...@kernel.org
CC: jfs-discuss...@lists.sourceforge.net
Signed-off-by: Jan Kara j...@suse.cz
---
fs/jfs/jfs_incore.h | 3 +++
fs/jfs/super.c | 13 +
2 files changed, 16 insertions(+)
diff --git a/fs/jfs/jfs_incore.h b/fs/jfs/jfs_incore.h
index
All filesystems using VFS quotas are now converted to use their private
i_dquot fields. Remove the i_dquot field from generic inode structure.
Signed-off-by: Jan Kara j...@suse.cz
---
fs/inode.c | 3 ---
fs/super.c | 10 --
include/linux/fs.h | 3 ---
3 files changed,
CC: linux-e...@vger.kernel.org
CC: Theodore Ts'o ty...@mit.edu
Signed-off-by: Jan Kara j...@suse.cz
---
fs/ext4/ext4.h | 4
fs/ext4/super.c | 10 ++
2 files changed, 14 insertions(+)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b0c225cdb52c..571a9f409e94 100644
---
CC: reiserfs-de...@vger.kernel.org
CC: Jeff Mahoney je...@suse.de
Signed-off-by: Jan Kara j...@suse.cz
---
fs/reiserfs/reiserfs.h | 4
fs/reiserfs/super.c    | 13 +
2 files changed, 17 insertions(+)
diff --git a/fs/reiserfs/reiserfs.h b/fs/reiserfs/reiserfs.h
index
We support user and group quotas. Tell VFS about it.
Acked-by: Steven Whitehouse swhit...@redhat.com
CC: cluster-devel@redhat.com
Signed-off-by: Jan Kara j...@suse.cz
---
fs/gfs2/ops_fstype.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index
CC: linux-e...@vger.kernel.org
Signed-off-by: Jan Kara j...@suse.cz
---
fs/ext3/ext3.h | 4
fs/ext3/super.c | 13 +
2 files changed, 17 insertions(+)
diff --git a/fs/ext3/ext3.h b/fs/ext3/ext3.h
index e85ff15a060e..04f30a1f96cb 100644
--- a/fs/ext3/ext3.h
+++ b/fs/ext3/ext3.h
Currently all filesystems supporting VFS quota support user and group
quotas. With introduction of project quotas this is going to change so
make sure filesystem isn't called for quota type it doesn't support by
introduction of a bitmask determining which quota types each filesystem
supports.
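A minimal sketch of the bitmask idea described above, assuming one bit per quota type. All names here (`sketch_qtype`, `SK_QTYPE_MASK`, `struct sketch_fs`, `quota_type_supported()`) are hypothetical and chosen for illustration; the field and macro names used by the actual patch are not shown in the excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical quota type numbers, one bit position each. */
enum sketch_qtype { SK_USRQUOTA = 0, SK_GRPQUOTA = 1, SK_PRJQUOTA = 2 };

#define SK_QTYPE_MASK(t) (1u << (t))

/* Hypothetical per-filesystem record of the quota types it supports. */
struct sketch_fs {
    unsigned int allowed_types;
};

/* True if the filesystem declared support for the given quota type. */
static bool quota_type_supported(const struct sketch_fs *fs,
                                 enum sketch_qtype type)
{
    assert(fs != NULL);
    return (fs->allowed_types & SK_QTYPE_MASK(type)) != 0;
}
```

A filesystem that sets only the user and group bits would then be skipped for project quota calls, so generic code can check the mask once before calling into the filesystem.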
CC: Mark Fasheh mfas...@suse.com
CC: Joel Becker jl...@evilplan.org
CC: ocfs2-de...@oss.oracle.com
Signed-off-by: Jan Kara j...@suse.cz
---
fs/ocfs2/inode.h | 4
fs/ocfs2/super.c | 12
2 files changed, 16 insertions(+)
diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
index
On 10/10/2014 09:54 AM, Jan Kara wrote:
Currently all filesystems supporting VFS quota support user and group
quotas. With introduction of project quotas this is going to change so
make sure filesystem isn't called for quota type it doesn't support by
introduction of a bitmask determining
On Fri 10-10-14 11:19:06, Dave Jones wrote:
On Fri, Oct 10, 2014 at 04:23:07PM +0200, Jan Kara wrote:
From: Dave Jones da...@redhat.com
Commit 7982e90c3a57 (block: fix q->flush_rq NULL pointer crash on
dm-mpath flush) moved an allocation to blk_init_allocated_queue(), but
neglected
On 10/10/2014 09:55 AM, Jan Kara wrote:
CC: Dave Kleikamp sha...@kernel.org
CC: jfs-discuss...@lists.sourceforge.net
Signed-off-by: Jan Kara j...@suse.cz
---
fs/jfs/jfs_incore.h | 3 +++
fs/jfs/super.c | 13 +
2 files changed, 16 insertions(+)
diff --git
On Fri 10-10-14 10:33:02, Dave Kleikamp wrote:
On 10/10/2014 09:55 AM, Jan Kara wrote:
CC: Dave Kleikamp sha...@kernel.org
CC: jfs-discuss...@lists.sourceforge.net
Signed-off-by: Jan Kara j...@suse.cz
---
fs/jfs/jfs_incore.h | 3 +++
fs/jfs/super.c | 13 +
2 files
You can add my
Acked-by: Dave Kleikamp dave.kleik...@oracle.com
On 10/10/2014 10:40 AM, Jan Kara wrote:
On Fri 10-10-14 10:33:02, Dave Kleikamp wrote:
On 10/10/2014 09:55 AM, Jan Kara wrote:
CC: Dave Kleikamp sha...@kernel.org
CC: jfs-discuss...@lists.sourceforge.net
Signed-off-by: Jan Kara
On Fri, Oct 10, 2014 at 04:23:07PM +0200, Jan Kara wrote:
From: Dave Jones da...@redhat.com
Commit 7982e90c3a57 (block: fix q->flush_rq NULL pointer crash on
dm-mpath flush) moved an allocation to blk_init_allocated_queue(), but
neglected to free that allocation on the error paths that