Re: [Cluster-devel] [GFS2 PATCH] GFS2: Block reservation doubling scheme

2014-10-10 Thread Steven Whitehouse
Hi, On 10/10/14 04:39, Bob Peterson wrote: - Original Message - - Original Message - This patch introduces a new block reservation doubling scheme. If we Maybe I sent this patch out prematurely. Instead of doubling the reservation, maybe I should experiment with making it

[Cluster-devel] [PATCH] ext4: Fix jbd2 warning under heavy xattr load

2014-10-10 Thread Jan Kara
When heavily exercising xattr code the assertion that jbd2_journal_dirty_metadata() shouldn't return error was triggered: WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237 jbd2_journal_dirty_metadata+0x1ba/0x260() CPU: 0 PID: 8877 Comm: ceph-osd Tainted: GW

[Cluster-devel] [PATCH] block: improve rq_affinity placement

2014-10-10 Thread Jan Kara
From: Shaohua Li shaohua...@intel.com This patch reverts commit 35ae66e0a09ab70ed(block: Make rq_affinity = 1 work as expected). The purpose is to avoid an unnecessary IPI. Let's take an example. My test box has cpu 0-7, one socket. Say request is added from CPU 1, blk_complete_request() occurs

[Cluster-devel] [PATCH] ext4: Fix buffer double free in ext4_alloc_branch()

2014-10-10 Thread Jan Kara
Error recovery in ext4_alloc_branch() calls ext4_forget() even for buffer corresponding to indirect block it did not allocate. This leads to brelse() being called twice for that buffer (once from ext4_forget() and once from cleanup in ext4_ind_map_blocks()) leading to buffer use count

[Cluster-devel] [PATCH] block: free q->flush_rq in blk_init_allocated_queue error paths

2014-10-10 Thread Jan Kara
From: Dave Jones da...@redhat.com Commit 7982e90c3a57 (block: fix q->flush_rq NULL pointer crash on dm-mpath flush) moved an allocation to blk_init_allocated_queue(), but neglected to free that allocation on the error paths that follow. Signed-off-by: Dave Jones da...@fedoraproject.org Acked-by:

[Cluster-devel] [PATCH] vfs: Allocate anon_inode_inode in anon_inode_init()

2014-10-10 Thread Jan Kara
Currently we allocated anon_inode_inode in anon_inodefs_mount. This is somewhat fragile as if that function ever gets called again, it will overwrite anon_inode_inode pointer. So move the initialization of anon_inode_inode to anon_inode_init(). Signed-off-by: Jan Kara j...@suse.cz ---

[Cluster-devel] [PATCH] mm: Fixup pagecache_isize_extended() definitions for !CONFIG_MMU

2014-10-10 Thread Jan Kara
For !CONFIG_MMU systems we defined pagecache_isize_extended() in both include/linux/mm.h and mm/truncate.c which causes compilation error. Although pagecache_isize_extended() doesn't do anything useful for !CONFIG_MMU systems, it could do something in future and its overhead isn't huge. So don't

[Cluster-devel] [PATCH] block: strict rq_affinity

2014-10-10 Thread Jan Kara
From: Dan Williams dan.j.willi...@intel.com Some systems benefit from completions always being steered to the strict requester cpu rather than the looser per-socket steering that blk_cpu_to_group() attempts by default. This is because the first CPU in the group mask ends up being completely

[Cluster-devel] [PATCH] printk: debug: Slow down printing to 9600 bauds

2014-10-10 Thread Jan Kara
Signed-off-by: Jan Kara j...@suse.cz --- include/trace/events/printk.h | 42 kernel/printk/printk.c| 112 -- 2 files changed, 151 insertions(+), 3 deletions(-) diff --git a/include/trace/events/printk.h

[Cluster-devel] [PATCH] ncpfs: fix rmdir returns Device or resource busy

2014-10-10 Thread Jan Kara
From: Dave Chiluk chi...@canonical.com 1d2ef5901483004d74947bbf78d5146c24038fe7 caused a regression in ncpfs such that directories could no longer be removed. This was because ncp_rmdir checked to see if a dentry could be unhashed before allowing it to be removed. Since

[Cluster-devel] [PATCH] udf: Avoid infinite loop when processing indirect ICBs

2014-10-10 Thread Jan Kara
We did not implement any bound on number of indirect ICBs we follow when loading inode. Thus corrupted medium could cause kernel to go into an infinite loop, possibly causing a stack overflow. Fix the possible stack overflow by removing recursion from __udf_read_inode() and limit number of
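The idea described above (replace recursion with a loop and cap the number of indirections followed) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code from the patch: the table-of-indices model, the name follow_chain(), and the MAX_INDIRECTIONS value are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical cap on how many indirections we are willing to follow. */
#define MAX_INDIRECTIONS 16

/*
 * Toy model of a chain of indirect entries: next[i] is the index of the
 * entry that entry i points to, or -1 if entry i is a terminal entry.
 *
 * Returns the index of the terminal entry, or -1 if the chain leaves the
 * table or needs more than MAX_INDIRECTIONS hops (for example, a cycle
 * on a corrupted medium).
 */
int follow_chain(const int *next, size_t count, int start)
{
    int cur = start;
    int hops = 0;

    /* Iterate instead of recursing, so stack usage stays constant. */
    for (;;) {
        if (cur < 0 || (size_t)cur >= count)
            return -1;          /* pointer outside the table */
        if (next[cur] < 0)
            return cur;         /* reached a terminal entry */
        if (++hops > MAX_INDIRECTIONS)
            return -1;          /* too many indirections: give up */
        cur = next[cur];
    }
}
```

With a table such as {1, 2, -1} the walk starting at 0 ends at entry 2, while a two-entry cycle {1, 0} is rejected once the hop limit is exceeded rather than looping forever.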

[Cluster-devel] [PATCH] timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks

2014-10-10 Thread Jan Kara
clockevents_increase_min_delta() calls printk() from under hrtimer_bases.lock. That causes lock inversion on scheduler locks because printk() can call into the scheduler. Lockdep puts it as: == [ INFO: possible circular locking dependency

[Cluster-devel] [PATCH 1/2] jbd2: Avoid pointless scanning of checkpoint lists

2014-10-10 Thread Jan Kara
Yuanhan has reported that when he is running fsync(2) heavy workload creating new files over ramdisk, significant amount of time is spent in __jbd2_journal_clean_checkpoint_list() trying to clean old transactions (but they cannot be cleaned up because flusher hasn't yet checkpointed those

[Cluster-devel] [PATCH] quota: Fix race between dqput() and dquot_scan_active()

2014-10-10 Thread Jan Kara
Currently last dqput() can race with dquot_scan_active() causing it to call callback for an already deactivated dquot. The race is as follows: CPU1 CPU2 dqput() spin_lock(&dq_list_lock); if (atomic_read(&dquot->dq_count) > 1) { - not taken if

[Cluster-devel] [PATCH 2/2] jbd2: Simplify calling convention around __jbd2_journal_clean_checkpoint_list

2014-10-10 Thread Jan Kara
__jbd2_journal_clean_checkpoint_list() returns number of buffers it freed but no one was using the value so just stop doing that. This also allows for simplifying the calling convention for journal_clean_once_cp_list(). Signed-off-by: Jan Kara j...@suse.cz --- fs/jbd2/checkpoint.c | 56

[Cluster-devel] [PATCH 1/2] vfs: Fix data corruption when blocksize < pagesize for mmaped data

2014-10-10 Thread Jan Kara
->page_mkwrite() is used by filesystems to allocate blocks under a page which is becoming writeably mmapped in some process' address space. This allows a filesystem to return a page fault if there is not enough space available, user exceeds quota or similar problem happens, rather than silently

[Cluster-devel] [PATCH] ocfs2: Fix quota file corruption

2014-10-10 Thread Jan Kara
Global quota files are accessed from different nodes. Thus we cannot cache offset of quota structure in the quota file after we drop our node reference count to it because after that moment quota structure may be freed and reallocated elsewhere by a different node resulting in corruption of quota

[Cluster-devel] [PATCH] ext3: Fix deadlock in data=journal mode when fs is frozen

2014-10-10 Thread Jan Kara
When ext3 is used in data=journal mode, syncing filesystem makes sure all the data is committed in the journal but the data doesn't have to be checkpointed. ext3_freeze() then takes care of checkpointing all the data so all buffer heads are clean but pages can still have dangling dirty bits. So

[Cluster-devel] [PATCH for 3.14-stable] fanotify: fix double free of pending permission events

2014-10-10 Thread Jan Kara
commit 5838d4442bd5971687b72221736222637e03140d upstream. Commit 85816794240b (fanotify: Fix use after free for permission events) introduced a double free issue for permission events which are pending in group's notification queue while group is being destroyed. These events are freed from

[Cluster-devel] [PATCH 2/2] ext4: Fix mmap data corruption when blocksize < pagesize

2014-10-10 Thread Jan Kara
Use truncate_isize_extended() when hole is being created in a file so that ->page_mkwrite() will get called for the partial tail page if it is mmaped (see the first patch in the series for details). Signed-off-by: Jan Kara j...@suse.cz --- fs/ext4/inode.c | 6 +- 1 file changed, 5

[Cluster-devel] [PATCH 1/2 RESEND] bdi: Fix hung task on sync

2014-10-10 Thread Jan Kara
From: Derek Basehore dbaseh...@chromium.org bdi_wakeup_thread_delayed() used the mod_delayed_work() function to schedule work to writeback dirty inodes. The problem with this is that it can delay work that is scheduled for immediate execution, such as the work from sync_inodes_sb(). This can

[Cluster-devel] [PATCH] printk: enable interrupts before calling console_trylock_for_printk()

2014-10-10 Thread Jan Kara
We need interrupts disabled when calling console_trylock_for_printk() only so that cpu id we pass to can_use_console() remains valid (for other things console_sem provides all the exclusion we need and deadlocks on console_sem due to interrupts are impossible because we use down_trylock()).

[Cluster-devel] [PATCH 2/2] printk: Debug patch 2

2014-10-10 Thread Jan Kara
Signed-off-by: Jan Kara j...@suse.cz --- kernel/printk/printk.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index a39f4129f848..00a9ad5c2708 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@

[Cluster-devel] [PATCH] fs: Avoid userspace mounting anon_inodefs filesystem

2014-10-10 Thread Jan Kara
anon_inodefs filesystem is a kernel internal filesystem userspace shouldn't mess with. Remove registration of it so userspace cannot even try to mount it (which would fail anyway because the filesystem is MS_NOUSER). This fixes an oops triggered by trinity when it tried mounting anon_inodefs

[Cluster-devel] [PATCH 0/2 v2] Fix data corruption when blocksize < pagesize for mmapped data

2014-10-10 Thread Jan Kara
Hello, this is a second version of the patches to fix data corruption in mmapped data when blocksize < pagesize as tested by xfstests generic/030 test. The patchset fixes XFS and ext4. I've checked and btrfs doesn't need fixing because it doesn't support blocksize < pagesize. If that's ever

[Cluster-devel] [PATCH] scsi: Keep interrupts disabled while submitting requests

2014-10-10 Thread Jan Kara
scsi_request_fn() can be called from softirq context during IO completion. If it enables interrupts there, HW interrupts can interrupt softirq processing and queue more IO completion work which can eventually lead to softlockup reports because IO completion softirq runs for too long. Keep

[Cluster-devel] [PATCH] sync: don't block the flusher thread waiting on IO

2014-10-10 Thread Jan Kara
From: Dave Chinner dchin...@redhat.com When sync does its WB_SYNC_ALL writeback, it issues data IO and then immediately waits for IO completion. This is done in the context of the flusher thread, and hence completely ties up the flusher thread for the backing device until all the dirty inodes

[Cluster-devel] [PATCH] ext4: Avoid lock inversion between i_mmap_mutex and transaction start

2014-10-10 Thread Jan Kara
When DAX is enabled, it uses i_mmap_mutex as a protection against truncate during page fault. This inevitably forces i_mmap_mutex to rank outside of a transaction start and thus we have to avoid calling pagecache purging operations when transaction is started. Signed-off-by: Jan Kara j...@suse.cz

[Cluster-devel] [PATCH 2/2] ext4: Fix hole punching for files with indirect blocks

2014-10-10 Thread Jan Kara
Hole punching code for files with indirect blocks wrongly computed number of blocks which need to be cleared when traversing the indirect block tree. That could result in punching more blocks than actually requested and thus effectively cause a data loss. For example: fallocate -n -p 1024

[Cluster-devel] [PATCH 2/2 RESEND] bdi: Avoid oops on device removal

2014-10-10 Thread Jan Kara
After 839a8e8660b67 "writeback: replace custom worker pool implementation with unbound workqueue" when device is removed while we are writing to it we crash in bdi_writeback_workfn() -> set_worker_desc() because bdi->dev is NULL. This can happen because even though bdi_unregister() cancels all

[Cluster-devel] [PATCH] ext4: Fix zeroing of page during writeback

2014-10-10 Thread Jan Kara
Tail of a page straddling inode size must be zeroed when being written out due to POSIX requirement that modifications of mmaped page beyond inode size must not be written to the file. ext4_bio_write_page() did this only for blocks fully beyond inode size but didn't properly zero blocks partially

[Cluster-devel] [PATCH 1/2] ext4: Fix block zeroing when punching holes in indirect block files

2014-10-10 Thread Jan Kara
free_holes_block() passed local variable as a block pointer to ext4_clear_blocks(). Thus ext4_clear_blocks() zeroed out this local variable instead of proper place in inode / indirect block. We later zero out proper place in inode / indirect block but don't dirty the inode / buffer again which can

[Cluster-devel] [PATCH RESEND] vfs: Return EINVAL for default SEEK_HOLE, SEEK_DATA implementation

2014-10-10 Thread Jan Kara
Generic implementation of SEEK_HOLE and SEEK_DATA in generic_file_llseek_size() and default_llseek() behaved as if everything within i_size is data and everything beyond i_size is a hole. That makes sense at first sight (and definitely is a valid implementation of the spec) but at the second
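The "everything within i_size is data, everything beyond is a hole" behaviour described above can be modelled with a small userspace function. This is a minimal sketch under stated assumptions, not kernel code: default_seek() and enum seek_kind are hypothetical names. Returning ENXIO for an offset at or beyond the end of the file follows the documented lseek(2) error for these two whence values.

```c
#include <errno.h>

/* Hypothetical stand-ins for the SEEK_DATA / SEEK_HOLE whence values. */
enum seek_kind { SEEK_KIND_DATA, SEEK_KIND_HOLE };

/*
 * Model of the trivial default: the whole range [0, size) is treated as
 * data and a single hole starts at size.
 *
 * Returns the resulting offset, or a negative errno value.
 */
long long default_seek(long long offset, long long size, enum seek_kind kind)
{
    if (offset < 0)
        return -EINVAL;     /* negative offsets are never valid */
    if (offset >= size)
        return -ENXIO;      /* nothing to find at or past end of file */

    /* Data starts right here; the only hole is at end of file. */
    return kind == SEEK_KIND_DATA ? offset : size;
}
```

For a 100-byte file, seeking for data from offset 10 yields 10, and seeking for a hole from offset 10 yields 100.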

[Cluster-devel] [PATCH 1/2] printk: Debug patch1

2014-10-10 Thread Jan Kara
Signed-off-by: Jan Kara j...@suse.cz --- kernel/printk/printk.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index ea2d5f6962ed..a39f4129f848 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -1718,7

[Cluster-devel] [PATCH] lockdep: Dump info via tracing

2014-10-10 Thread Jan Kara
Signed-off-by: Jan Kara j...@suse.cz --- kernel/locking/lockdep.c | 707 +++ 1 file changed, 402 insertions(+), 305 deletions(-) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index d24e4339b46d..b15e7dec55f6 100644 ---

[Cluster-devel] [PATCH] udf: Print error when inode is loaded

2014-10-10 Thread Jan Kara
Signed-off-by: Jan Kara j...@suse.cz --- fs/udf/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/udf/super.c b/fs/udf/super.c index 5401fc33f5cc..479875155d77 100644 --- a/fs/udf/super.c +++ b/fs/udf/super.c @@ -962,7 +962,7 @@ struct inode

[Cluster-devel] [PATCH 2/2] ext3: Don't check quota format when there are no quota files

2014-10-10 Thread Jan Kara
The check whether quota format is set even though there are no quota files with journalled quota is pointless and it actually makes it impossible to turn off journalled quotas (as there's no way to unset journalled quota format). Just remove the check. CC: sta...@vger.kernel.org Signed-off-by:

[Cluster-devel] [PATCH] writeback: plug writeback at a high level

2014-10-10 Thread Jan Kara
From: Dave Chinner dchin...@redhat.com tl;dr: 3 lines of code, 86% better fsmark throughput consuming 13% less CPU and 43% lower runtime. Doing writeback on lots of little files causes terrible IOPS storms because of the per-mapping writeback plugging we do. This essentially causes immediate

[Cluster-devel] [PATCH] block: Make rq_affinity = 1 work as expected

2014-10-10 Thread Jan Kara
From: Tao Ma boyu...@taobao.com Commit 5757a6d76c introduced a new rq_affinity = 2 so as to make the request completed in the __make_request cpu. But it makes the old rq_affinity = 1 not work any more. The root cause is that if the 'cpu' and 'req->cpu' is in the same group and cpu != req->cpu, ccpu

[Cluster-devel] [PATCH] jbd2: Optimize jbd2_log_do_checkpoint() a bit

2014-10-10 Thread Jan Kara
When we discover written out buffer in transaction checkpoint list we don't have to recheck validity of a transaction. Either this is the last buffer in a transaction - and then we are done - or this isn't and then we can just take another buffer from the checkpoint list without dropping

[Cluster-devel] [PATCH 0/12 v2] Moving i_dquot out of struct inode

2014-10-10 Thread Jan Kara
Hello, this patch set moves i_dquot array from struct inode into filesystem private part of the inode. Thus filesystems which don't need it save 2 pointers in their inodes (would be 3 after we add project quota support into generic quota). I have patches to move inode->i_data.private_list

[Cluster-devel] [PATCH 03/12] xfs: Set allowed quota types

2014-10-10 Thread Jan Kara
We support user, group, and project quotas. Tell VFS about it. CC: x...@oss.sgi.com CC: Dave Chinner da...@fromorbit.com Signed-off-by: Jan Kara j...@suse.cz --- fs/xfs/xfs_super.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index

[Cluster-devel] [PATCH 05/12] quota: Use optional inode field for i_dquot pointers

2014-10-10 Thread Jan Kara
i_dquot is a first candidate for using optional inode fields since it is used by relatively few filesystems (ext?, ocfs2, jfs, reiserfs). We cannot just pass quota pointers from filesystems to quota functions because during quotaon and quotaoff we have to traverse list of all inodes and manipulate

[Cluster-devel] [PATCH 11/12] jfs: Convert to private i_dquot field

2014-10-10 Thread Jan Kara
CC: Dave Kleikamp sha...@kernel.org CC: jfs-discuss...@lists.sourceforge.net Signed-off-by: Jan Kara j...@suse.cz --- fs/jfs/jfs_incore.h | 3 +++ fs/jfs/super.c | 13 + 2 files changed, 16 insertions(+) diff --git a/fs/jfs/jfs_incore.h b/fs/jfs/jfs_incore.h index

[Cluster-devel] [PATCH 12/12] vfs: Remove i_dquot field from inode

2014-10-10 Thread Jan Kara
All filesystems using VFS quotas are now converted to use their private i_dquot fields. Remove the i_dquot field from generic inode structure. Signed-off-by: Jan Kara j...@suse.cz --- fs/inode.c | 3 --- fs/super.c | 10 -- include/linux/fs.h | 3 --- 3 files changed,

[Cluster-devel] [PATCH 08/12] ext4: Convert to private i_dquot field

2014-10-10 Thread Jan Kara
CC: linux-e...@vger.kernel.org CC: Theodore Ts'o ty...@mit.edu Signed-off-by: Jan Kara j...@suse.cz --- fs/ext4/ext4.h | 4 fs/ext4/super.c | 10 ++ 2 files changed, 14 insertions(+) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index b0c225cdb52c..571a9f409e94 100644 ---

[Cluster-devel] [PATCH 10/12] reiserfs: Convert to private i_dquot field

2014-10-10 Thread Jan Kara
CC: reiserfs-de...@vger.kernel.org CC: Jeff Mahoney je...@suse.de Signed-off-by: Jan Kara j...@suse.cz --- fs/reiserfs/reiserfs.h | 4 fs/reiserfs/super.c| 13 + 2 files changed, 17 insertions(+) diff --git a/fs/reiserfs/reiserfs.h b/fs/reiserfs/reiserfs.h index

[Cluster-devel] [PATCH 02/12] gfs2: Set allowed quota types

2014-10-10 Thread Jan Kara
We support user and group quotas. Tell vfs about it. Acked-by: Steven Whitehouse swhit...@redhat.com CC: cluster-devel@redhat.com Signed-off-by: Jan Kara j...@suse.cz --- fs/gfs2/ops_fstype.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index

[Cluster-devel] [PATCH 07/12] ext3: Convert to private i_dquot field

2014-10-10 Thread Jan Kara
CC: linux-e...@vger.kernel.org Signed-off-by: Jan Kara j...@suse.cz --- fs/ext3/ext3.h | 4 fs/ext3/super.c | 13 + 2 files changed, 17 insertions(+) diff --git a/fs/ext3/ext3.h b/fs/ext3/ext3.h index e85ff15a060e..04f30a1f96cb 100644 --- a/fs/ext3/ext3.h +++ b/fs/ext3/ext3.h

[Cluster-devel] [PATCH 01/12] quota: Allow each filesystem to specify which quota types it supports

2014-10-10 Thread Jan Kara
Currently all filesystems supporting VFS quota support user and group quotas. With introduction of project quotas this is going to change so make sure filesystem isn't called for quota type it doesn't support by introduction of a bitmask determining which quota types each filesystem supports.
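A bitmask of supported quota types, as described above, might look like the following userspace sketch. The names here (enum quota_type, QT_MASK(), quota_type_supported()) are hypothetical and chosen for illustration only; they are not taken from the patch.

```c
/* Hypothetical quota type numbering, one bit position per type. */
enum quota_type { QT_USER = 0, QT_GROUP = 1, QT_PROJECT = 2 };

/* Turn a quota type into its bit in the "supported types" mask. */
#define QT_MASK(t) (1u << (t))

/*
 * Returns nonzero if the filesystem, described by allowed_mask, supports
 * the given quota type. Generic code would check this before calling
 * into the filesystem for that type.
 */
int quota_type_supported(unsigned int allowed_mask, enum quota_type type)
{
    return (allowed_mask & QT_MASK(type)) != 0;
}
```

A filesystem supporting only user and group quotas would advertise QT_MASK(QT_USER) | QT_MASK(QT_GROUP), so a request for QT_PROJECT is rejected by the generic check before the filesystem is ever called.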

[Cluster-devel] [PATCH 09/12] ocfs2: Convert to private i_dquot field

2014-10-10 Thread Jan Kara
CC: Mark Fasheh mfas...@suse.com CC: Joel Becker jl...@evilplan.org CC: ocfs2-de...@oss.oracle.com Signed-off-by: Jan Kara j...@suse.cz --- fs/ocfs2/inode.h | 4 fs/ocfs2/super.c | 12 2 files changed, 16 insertions(+) diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h index

Re: [Cluster-devel] [PATCH 01/12] quota: Allow each filesystem to specify which quota types it supports

2014-10-10 Thread Dave Kleikamp
On 10/10/2014 09:54 AM, Jan Kara wrote: Currently all filesystems supporting VFS quota support user and group quotas. With introduction of project quotas this is going to change so make sure filesystem isn't called for quota type it doesn't support by introduction of a bitmask determining

Re: [Cluster-devel] [PATCH] block: free q->flush_rq in blk_init_allocated_queue error paths

2014-10-10 Thread Jan Kara
On Fri 10-10-14 11:19:06, Dave Jones wrote: On Fri, Oct 10, 2014 at 04:23:07PM +0200, Jan Kara wrote: From: Dave Jones da...@redhat.com Commit 7982e90c3a57 (block: fix q->flush_rq NULL pointer crash on dm-mpath flush) moved an allocation to blk_init_allocated_queue(), but neglected

Re: [Cluster-devel] [PATCH 11/12] jfs: Convert to private i_dquot field

2014-10-10 Thread Dave Kleikamp
On 10/10/2014 09:55 AM, Jan Kara wrote: CC: Dave Kleikamp sha...@kernel.org CC: jfs-discuss...@lists.sourceforge.net Signed-off-by: Jan Kara j...@suse.cz --- fs/jfs/jfs_incore.h | 3 +++ fs/jfs/super.c | 13 + 2 files changed, 16 insertions(+) diff --git

Re: [Cluster-devel] [PATCH 11/12] jfs: Convert to private i_dquot field

2014-10-10 Thread Jan Kara
On Fri 10-10-14 10:33:02, Dave Kleikamp wrote: On 10/10/2014 09:55 AM, Jan Kara wrote: CC: Dave Kleikamp sha...@kernel.org CC: jfs-discuss...@lists.sourceforge.net Signed-off-by: Jan Kara j...@suse.cz --- fs/jfs/jfs_incore.h | 3 +++ fs/jfs/super.c | 13 + 2 files

Re: [Cluster-devel] [PATCH 11/12] jfs: Convert to private i_dquot field

2014-10-10 Thread Dave Kleikamp
You can add my Acked-by: Dave Kleikamp dave.kleik...@oracle.com On 10/10/2014 10:40 AM, Jan Kara wrote: On Fri 10-10-14 10:33:02, Dave Kleikamp wrote: On 10/10/2014 09:55 AM, Jan Kara wrote: CC: Dave Kleikamp sha...@kernel.org CC: jfs-discuss...@lists.sourceforge.net Signed-off-by: Jan Kara

Re: [Cluster-devel] [PATCH] block: free q->flush_rq in blk_init_allocated_queue error paths

2014-10-10 Thread Dave Jones
On Fri, Oct 10, 2014 at 04:23:07PM +0200, Jan Kara wrote: From: Dave Jones da...@redhat.com Commit 7982e90c3a57 (block: fix q->flush_rq NULL pointer crash on dm-mpath flush) moved an allocation to blk_init_allocated_queue(), but neglected to free that allocation on the error paths that