Re: [Cluster-devel] [PATCH 08/12] ext4: hold i_rwsem until AIO completes

2020-01-14 Thread Christoph Hellwig
On Tue, Jan 14, 2020 at 04:50:23PM -0500, Theodore Y. Ts'o wrote: > I note that you've dropped the inode_dio_wait() in ext4's ZERO_RANGE, > COLLAPSE_RANGE, INSERT_RANGE, etc. We had added these to avoid > problems when various fallocate operations which modify the inode's > logical->physical

Re: [Cluster-devel] RFC: hold i_rwsem until aio completes

2020-01-14 Thread Christoph Hellwig
On Tue, Jan 14, 2020 at 10:47:07AM -0800, Matthew Wilcox wrote: > It would be helpful if we could also use the same lockdep logic > for PageLocked. Again, it's a case where returning to userspace with > PageLock held is fine, because we're expecting an interrupt to come in > and drop the lock for

Re: [Cluster-devel] RFC: hold i_rwsem until aio completes

2020-01-14 Thread Christoph Hellwig
On Tue, Jan 14, 2020 at 03:27:00PM -0400, Jason Gunthorpe wrote: > I've seen similar locking patterns quite a lot, enough I've thought > about having a dedicated locking primitive to do it. It really wants > to be a rwsem, but as here the rwsem rules don't allow it. > > The common pattern I'm

Re: [Cluster-devel] [PATCH 02/12] locking/rwsem: Exit early when held by an anonymous owner

2020-01-14 Thread Waiman Long
On 1/14/20 11:12 AM, Christoph Hellwig wrote: > The rwsem code overloads the owner field with either a task struct or > negative magic numbers. Add a quick hack to catch these negative > values early on. Without this spinning on a writer that replaced the > owner with RWSEM_OWNER_UNKNOWN,

Re: [Cluster-devel] [PATCH 02/12] locking/rwsem: Exit early when held by an anonymous owner

2020-01-14 Thread Christoph Hellwig
On Tue, Jan 14, 2020 at 01:17:45PM -0500, Waiman Long wrote: > The owner field is just a pointer to the task structure with the lower 3 > bits served as flag bits. Setting owner to RWSEM_OWNER_UNKNOWN (-2) will > stop optimistic spinning. So under what condition did the crash happen? When running

Re: [Cluster-devel] [PATCH 02/12] locking/rwsem: Exit early when held by an anonymous owner

2020-01-14 Thread Waiman Long
On 1/14/20 1:25 PM, Christoph Hellwig wrote: > On Tue, Jan 14, 2020 at 01:17:45PM -0500, Waiman Long wrote: >> The owner field is just a pointer to the task structure with the lower 3 >> bits served as flag bits. Setting owner to RWSEM_OWNER_UNKNOWN (-2) will >> stop optimistic spinning. So under

Re: [Cluster-devel] RFC: hold i_rwsem until aio completes

2020-01-14 Thread Matthew Wilcox
On Tue, Jan 14, 2020 at 05:12:13PM +0100, Christoph Hellwig wrote: > Second I/O > completions often come from interrupt context, which means the re-acquire > is recorded as from irq context, leading to warnings about incorrect > contexts. I wonder if we could just have a bit in lockdep that says

Re: [Cluster-devel] [PATCH 02/12] locking/rwsem: Exit early when held by an anonymous owner

2020-01-14 Thread Waiman Long
On 1/14/20 1:25 PM, Christoph Hellwig wrote: > On Tue, Jan 14, 2020 at 01:17:45PM -0500, Waiman Long wrote: >> The owner field is just a pointer to the task structure with the lower 3 >> bits served as flag bits. Setting owner to RWSEM_OWNER_UNKNOWN (-2) will >> stop optimistic spinning. So under

[Cluster-devel] [PATCH 09/12] gfs2: hold i_rwsem until AIO completes

2020-01-14 Thread Christoph Hellwig
Switch gfs from the magic i_dio_count scheme to just hold i_rwsem until the actual I/O has completed to reduce the locking complexity and avoid nasty bugs due to missing inode_dio_wait calls. Note that gfs only uses i_rwsem for direct I/O writes, not for reads so no change to the read behavior.

[Cluster-devel] [PATCH 01/12] mm: fix a comment in sys_swapon

2020-01-14 Thread Christoph Hellwig
claim_swapfile now always takes i_rwsem. Signed-off-by: Christoph Hellwig --- mm/swapfile.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index bb3261d45b6a..fe6e4c1add0b 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -3157,7 +3157,7 @@

[Cluster-devel] [PATCH 08/12] ext4: hold i_rwsem until AIO completes

2020-01-14 Thread Christoph Hellwig
Switch ext4 from the magic i_dio_count scheme to just hold i_rwsem until the actual I/O has completed to reduce the locking complexity and avoid nasty bugs due to missing inode_dio_wait calls. Signed-off-by: Christoph Hellwig --- fs/ext4/extents.c | 12 fs/ext4/file.c|

[Cluster-devel] [PATCH 10/12] xfs: hold i_rwsem until AIO completes

2020-01-14 Thread Christoph Hellwig
Switch ext4 from the magic i_dio_count scheme to just hold i_rwsem until the actual I/O has completed to reduce the locking complexity and avoid nasty bugs due to missing inode_dio_wait calls. Signed-off-by: Christoph Hellwig --- fs/xfs/scrub/bmap.c| 1 - fs/xfs/xfs_bmap_util.c | 3 ---

[Cluster-devel] [PATCH 03/12] xfs: fix IOCB_NOWAIT handling in xfs_file_dio_aio_read

2020-01-14 Thread Christoph Hellwig
Direct I/O reads can also be used with RWF_NOWAIT & co. Fix the inode locking in xfs_file_dio_aio_read to take IOCB_NOWAIT into account. Signed-off-by: Christoph Hellwig --- fs/xfs/xfs_file.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_file.c

[Cluster-devel] [PATCH 04/12] gfs2: move setting current->backing_dev_info

2020-01-14 Thread Christoph Hellwig
Only set current->backing_dev_info just around the buffered write calls to prepare for the next fix. Signed-off-by: Christoph Hellwig --- fs/gfs2/file.c | 21 ++--- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index

[Cluster-devel] [PATCH 06/12] iomap: pass a flags value to iomap_dio_rw

2020-01-14 Thread Christoph Hellwig
Replace the wait_for_completion flag in struct iomap_dio with a new IOMAP_DIO_SYNCHRONOUS flag for dio->flags, and allow passing the initial flags to iomap_dio_rw. Also take the check for synchronous iocbs into iomap_dio_rw instead of duplicating it in all the callers. Signed-off-by: Christoph

[Cluster-devel] [PATCH 05/12] gfs2: fix O_SYNC write handling

2020-01-14 Thread Christoph Hellwig
Don't ignore the return value from generic_write_sync for the direct to buffered I/O callback case when written is non-zero. Also don't bother to call generic_write_sync for the pure direct I/O case, as iomap_dio_rw already takes care of that. Signed-off-by: Christoph Hellwig ---

[Cluster-devel] [PATCH 11/12] xfs: don't set IOMAP_DIO_SYNCHRONOUS for unaligned I/O

2020-01-14 Thread Christoph Hellwig
Now that i_rwsem is held until asynchronous writes complete, there is no need to force them to execute synchronously, as the i_rwsem protection is exactly the same as for synchronous writes. Signed-off-by: Christoph Hellwig --- fs/xfs/xfs_file.c | 12 +++- 1 file changed, 3

[Cluster-devel] [PATCH 12/12] iomap: remove the inode_dio_begin/end calls

2020-01-14 Thread Christoph Hellwig
Now that all iomap users hold i_rwsem over asynchronous I/O operations these calls can be removed. Signed-off-by: Christoph Hellwig --- fs/iomap/direct-io.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 0113ac33b0a0..c90ec82e8e08 100644

[Cluster-devel] [PATCH 02/12] locking/rwsem: Exit early when held by an anonymous owner

2020-01-14 Thread Christoph Hellwig
The rwsem code overloads the owner field with either a task struct or negative magic numbers. Add a quick hack to catch these negative values early on. Without this spinning on a writer that replaced the owner with RWSEM_OWNER_UNKNOWN, rwsem_spin_on_owner can crash while deferencing the

[Cluster-devel] RFC: hold i_rwsem until aio completes

2020-01-14 Thread Christoph Hellwig
Hi all, Asynchronous read/write operations currently use a rather magic locking scheme, were access to file data is normally protected using a rw_semaphore, but if we are doing aio where the syscall returns to userspace before the I/O has completed we also use an atomic_t to track the outstanding

Re: [Cluster-devel] [PATCH 08/12] ext4: hold i_rwsem until AIO completes

2020-01-14 Thread Theodore Y. Ts'o
On Tue, Jan 14, 2020 at 05:12:21PM +0100, Christoph Hellwig wrote: > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c > index 0e8708b77da6..b6aa2d249b30 100644 > --- a/fs/ext4/extents.c > +++ b/fs/ext4/extents.c > @@ -4777,9 +4777,6 @@ static long ext4_zero_range(struct file *file, loff_t >