from:"Andreas Gruenbacher"

Re: [PATCH 00/18] Rid W=1 warnings from GFS2

2021-04-09 Thread Andreas Gruenbacher

Hi Lee,

On Tue, Apr 6, 2021 at 1:54 PM Lee Jones  wrote:
> > > These have been on the list for a couple of weeks now.

thanks for your fixes, I've gone through them now. I've fixed up some
comments instead of "demoting" them to make the patch somewhat less
destructive, and I found a few more minor issues along the way. Those
changes are now all in the following commit:

  
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git/commit/?id=c551f66c5dfef

Thanks,
Andreas

[GIT PULL] gfs2 fixes for v5.12-rc6

2021-04-03 Thread Andreas Gruenbacher

Hi Linus,

please consider pulling the following two gfs2 fixes for v5.12-rc6.

Thanks,
Andreas

The following changes since commit b77b5fdd052e7ee61b35164abb10e8433d3160e8:

  Merge tag 'gfs2-v5.12-rc2-fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 (2021-03-12 
11:46:09 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-v5.12-rc2-fixes2

for you to fetch changes up to ff132c5f93c06bd4432bbab5c369e468653bdec4:

  gfs2: report "already frozen/thawed" errors (2021-03-25 18:53:38 +0100)


Two more gfs2 fixes


Andrew Price (1):
  gfs2: Flag a withdraw if init_threads() fails

Bob Peterson (1):
  gfs2: report "already frozen/thawed" errors

 fs/gfs2/super.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

Re: fs/gfs2/rgrp.c:1772:53: warning: Possible null pointer dereference: minext [nullPointer]

2021-04-02 Thread Andreas Gruenbacher

gfs2: Silence possible null pointer dereference warning

In gfs2_rbm_find, rs is always NULL when minext is NULL, so
gfs2_reservation_check_and_update will never be called on a NULL minext.
This isn't immediately obvious though, so also check for a NULL minext
for better code readability.

Reported-by: kernel test robot 
Signed-off-by: Andreas Gruenbacher 
---
 fs/gfs2/rgrp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 89c37a845e64..2dab313442a7 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1783,7 +1783,7 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, 
u32 *minext,
goto next_bitmap;
}
rbm->offset = offset;
-   if (!rs)
+   if (!rs || !minext)
return 0;
 
ret = gfs2_reservation_check_and_update(rbm, rs, *minext,
-- 
2.27.0

[GIT PULL] gfs2 fixes for 5.12-rc3

2021-03-12 Thread Andreas Gruenbacher

Hi Linus,

please consider pulling the following gfs2 fixes.

Thanks,
Andreas

The following changes since commit a38fd8748464831584a19438cbb3082b5a2dab15:

  Linux 5.12-rc2 (2021-03-05 17:33:41 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-v5.12-rc2-fixes

for you to fetch changes up to 0efc4976e3da40b09c592b21f722022d8f12a16b:

  gfs2: bypass log flush if the journal is not live (2021-03-12 15:52:48 +0100)


Various gfs2 fixes


Bob Peterson (3):
  gfs2: fix use-after-free in trans_drain
  gfs2: bypass signal_our_withdraw if no journal
  gfs2: bypass log flush if the journal is not live

Yang Li (1):
  gfs2: make function gfs2_make_fs_ro() to void type

 fs/gfs2/log.c|  6 +-
 fs/gfs2/ops_fstype.c |  4 +---
 fs/gfs2/super.c  | 10 ++
 fs/gfs2/super.h  |  2 +-
 fs/gfs2/trans.c  |  2 ++
 fs/gfs2/util.c   | 17 +++--
 6 files changed, 22 insertions(+), 19 deletions(-)

[GIT PULL] GFS2 changes for 5.12

2021-02-23 Thread Andreas Gruenbacher

Hi Linus,

please consider pulling the following gfs2 changes for 5.12.  My apologies for
the late request; we ended up getting stuck with two broken patches that have
now both been removed.

Thanks a lot,
Andreas

The following changes since commit 19c329f6808995b142b3966301f217c831e7cf31:

  Linux 5.11-rc4 (2021-01-17 16:37:05 -0800)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-for-5.12

for you to fetch changes up to 17d77684088510df84ff8285982d0eed52cd5890:

  gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flush (2021-02-23 
19:01:42 +0100)


Changes in gfs2:
* Log space and revoke accounting rework to fix some failed asserts.
* Local resource group glock sharing for better local performance.
* Add support for version 1802 filesystems: trusted xattr support and
  '-o rgrplvb' mounts by default.
* Actually synchronize on the inode glock's FREEING bit during withdraw
  ("gfs2: fix glock confusion in function signal_our_withdraw").
* Fix parallel recovery of multiple journals ("gfs2: keep bios separate
  for each journal").
* Various other bug fixes.

--------
Andreas Gruenbacher (37):
  gfs2: Turn gfs2_rbm_incr into gfs2_rbm_add
  gfs2: Only use struct gfs2_rbm for bitmap manipulations
  gfs2: Get rid of unnecessary variable in gfs2_alloc_blocks
  gfs2: Minor gfs2_inplace_reserve cleanup
  Revert "gfs2: Don't reject a supposedly full bitmap if we have blocks 
reserved"
  gfs2: Don't clear GBF_FULL flags in rs_deltree
  gfs2: Set GBF_FULL flags when reading resource group
  gfs2: Un-obfuscate function jdesc_find_i
  gfs2: Simplify the buf_limit and databuf_limit definitions
  gfs2: Minor gfs2_write_revokes cleanups
  gfs2: Some documentation updates
  gfs2: Minor debugging improvement
  gfs2: Rename gfs2_{write => flush}_revokes
  gfs2: Clean up ail2_empty
  gfs2: Use sb_start_intwrite in gfs2_ail_empty_gl
  gfs2: Clean up on-stack transactions
  gfs2: Get rid of sd_reserving_log
  gfs2: Move lock flush locking to gfs2_trans_{begin,end}
  gfs2: Don't wait for journal flush in clean_journal
  gfs2: Clean up gfs2_log_reserve
  gfs2: Use a tighter bound in gfs2_trans_begin
  gfs2: Get rid of current_tail()
  gfs2: Move function gfs2_ail_empty_tr
  gfs2: Lock imbalance on error path in gfs2_recover_one
  gfs2: Add trusted xattr support
  gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end
  gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt
  gfs2: Only pass reservation down to gfs2_rbm_find
  gfs2: Don't search for unreserved space twice
  gfs2: Check for active reservation in gfs2_release
  gfs2: Rename rs_{free -> requested} and rd_{reserved -> requested}
  gfs2: Add per-reservation reserved block accounting
  gfs2: Add local resource group locking
  gfs2: Minor calc_reserved cleanup
  gfs2: Rework the log space allocation logic
  gfs2: Per-revoke accounting in transactions
  Merge branches 'rgrp-glock-sharing' and 'gfs2-revoke' from 
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git

Andrew Price (2):
  gfs2: Fix invalid block size message
  gfs2: Enable rgrplvb for sb_fs_format 1802

Bob Peterson (10):
  gfs2: Add common helper for holding and releasing the freeze glock
  gfs2: move freeze glock outside the make_fs_rw and _ro functions
  gfs2: make gfs2_log_write_page static
  Revert "GFS2: Re-add a call to log_flush_wait when flushing the journal"
  gfs2: fix glock confusion in function signal_our_withdraw
  gfs2: keep bios separate for each journal
  gfs2: Don't skip dlm unlock if glock has an lvb
  gfs2: Allow node-wide exclusive glock sharing
  gfs2: Use resource group glock sharing
  gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flush

Zhaoyang Huang (1):
  gfs2: amend SLAB_RECLAIM_ACCOUNT on gfs2 related slab cache

 fs/gfs2/bmap.c   |  10 +-
 fs/gfs2/file.c   |   8 +-
 fs/gfs2/glock.c  |  22 +-
 fs/gfs2/glock.h  |   6 +
 fs/gfs2/glops.c  |  38 +--
 fs/gfs2/incore.h |  54 ++--
 fs/gfs2/inode.c  |   6 +-
 fs/gfs2/lock_dlm.c   |   8 +-
 fs/gfs2/log.c| 525 ++-
 fs/gfs2/log.h|  20 +-
 fs/gfs2/lops.c   |  26 +-
 fs/gfs2/lops.h   |  23 +-
 fs/gfs2/main.c   |   4 +-
 fs/gfs2/ops_fstype.c |  71 --
 fs/gfs2/recovery.c   |  14 +-
 fs/gfs2/rgrp.c   | 442 
 fs/gfs2/rgrp.h   |   6 +-
 fs

[GIT PULL] GFS2 changes for 5.11

2020-12-19 Thread Andreas Gruenbacher

Hi Linus,

could you please pull the following gfs2 changes for 5.11?

Thanks a lot,
Andreas

The following changes since commit dd0ecf544125639e54056d851e4887dbb94b6d2f:

  gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and 
delete_work_func (2020-12-01 00:21:10 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-for-5.11

for you to fetch changes up to 6e5c4ea37a99e5b97aba227fc43f3682d4bc0496:

  gfs2: in signal_our_withdraw wait for unfreeze of _this_ fs only (2020-12-03 
17:04:41 +0100)


Changes in gfs2:
* Don't wait for unfreeze of the wrong filesystems.
* Remove an obsolete delete_work_func hack and an incorrect sb_start_write.
* Minor documentation updates and cosmetic care.


Andreas Gruenbacher (2):
  gfs2: Make inode operations static
  Revert "GFS2: Prevent delete work from occurring on glocks used for 
create"

Andrew Price (2):
  Documentation: Update filesystems/gfs2.rst
  MAINTAINERS: Add gfs2 bug tracker link

Bob Peterson (2):
  gfs2: Remove sb_start_write from gfs2_statfs_sync
  gfs2: in signal_our_withdraw wait for unfreeze of _this_ fs only

Tom Rix (1):
  gfs2: remove trailing semicolons from macro definitions

 Documentation/filesystems/gfs2.rst | 37 ++---
 MAINTAINERS|  2 +-
 fs/gfs2/glock.c|  8 
 fs/gfs2/incore.h   |  1 -
 fs/gfs2/inode.c| 16 
 fs/gfs2/inode.h|  3 ---
 fs/gfs2/super.c|  2 --
 fs/gfs2/util.c |  2 +-
 fs/gfs2/util.h |  6 +++---
 9 files changed, 31 insertions(+), 46 deletions(-)

[GIT PULL] gfs2 fixes for 5.10-rc5

2020-12-02 Thread Andreas Gruenbacher

Hi Linus,

please consider pulling the following additional gfs2 fixes.

Thanks,
Andreas

The following changes since commit 418baf2c28f3473039f2f7377760bd8f6897ae18:

  Linux 5.10-rc5 (2020-11-22 15:36:08 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-v5.10-rc5-fixes

for you to fetch changes up to dd0ecf544125639e54056d851e4887dbb94b6d2f:

  gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and 
delete_work_func (2020-12-01 00:21:10 +0100)


Various gfs2 fixes


Alexander Aring (2):
  gfs2: Fix deadlock dumping resource group glocks
  gfs2: set lockdep subclass for iopen glocks

Andreas Gruenbacher (2):
  gfs2: Upgrade shared glocks for atime updates
  gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and 
delete_work_func

Bob Peterson (2):
  gfs2: check for empty rgrp tree in gfs2_ri_update
  gfs2: Don't freeze the file system during unmount

 fs/gfs2/glock.c  |  1 +
 fs/gfs2/glops.c  |  6 --
 fs/gfs2/incore.h |  1 +
 fs/gfs2/inode.c  | 42 --
 fs/gfs2/rgrp.c   |  4 
 5 files changed, 42 insertions(+), 12 deletions(-)

Re: [PATCH] posix_acl.h: define missing ACL functions on non-posix-acl build

2020-11-30 Thread Andreas Gruenbacher

On Mon, Nov 30, 2020 at 2:09 PM Andreas Gruenbacher  wrote:
> Note that ext2 / ext4 could be built without POSIX ACL support in the
> past. That's at least broken since the following two commits though:
>
>   commit 59fed3bf8a461 ("ext2: cache NULL when both default_acl and
> acl are NULL")
>   commit 6fd941784b8ac ("ext4: cache NULL when both default_acl and
> acl are NULL")

Scratch that, this is in fs/ext[24]/acl.c which is only included when
CONFIG_FS_POSIX_ACL is defined.

Thanks,
Andreas

Re: [PATCH] posix_acl.h: define missing ACL functions on non-posix-acl build

2020-11-30 Thread Andreas Gruenbacher

On Mon, Nov 30, 2020 at 5:29 AM Randy Dunlap  wrote:
> On 11/29/20 7:37 PM, Sergey Senozhatsky wrote:
> > A quick question, shouldn't there be dummy definitions for
> > the EXPORT_SYMBOL-s? So that external modules can be modprobed
> > and used.
> >
> > Some of posix_acl exported symbols have dummy definitions,
> > others don't.
> >
> > E.g. posix_acl_create() is exported symbol and it's defined for
> > both FS_POSIX_ACL and !FS_POSIX_ACL. While exported set_posix_acl()
> > is defined only for FS_POSIX_ACL config.

This is to keep the amount of ifdefs in the code reasonably low: by
defining posix_acl_create as a dummy inline function like that, inode
creation in filesystems can be implemented without any ifdefs as in
jffs2_init_acl_pre whether or not CONFIG_FS_POSIX_ACL is enabled, for
example. Have a look at different filesystems to see how they avoid
using POSIX ACL code when that feature is disabled.

Note that ext2 / ext4 could be built without POSIX ACL support in the
past. That's at least broken since the following two commits though:

  commit 59fed3bf8a461 ("ext2: cache NULL when both default_acl and
acl are NULL")
  commit 6fd941784b8ac ("ext4: cache NULL when both default_acl and
acl are NULL")

> Hi,
>
> Currently CONFIG_FS_POSIX_ACL differences seem to be handled in
> each source file as needed:
>
> fs/inode.c:#ifdef CONFIG_FS_POSIX_ACL
> fs/inode.c:#ifdef CONFIG_FS_POSIX_ACL
> fs/namei.c:#ifdef CONFIG_FS_POSIX_ACL
> fs/overlayfs/dir.c: if (!IS_ENABLED(CONFIG_FS_POSIX_ACL) || !acl)
> fs/overlayfs/inode.c:   if (!IS_ENABLED(CONFIG_FS_POSIX_ACL) || 
> !IS_POSIXACL(realinode))
> fs/overlayfs/inode.c:#ifdef CONFIG_FS_POSIX_ACL
> fs/overlayfs/super.c:#ifdef CONFIG_FS_POSIX_ACL
> fs/xattr.c:#ifdef CONFIG_FS_POSIX_ACL
> include/linux/evm.h:#ifdef CONFIG_FS_POSIX_ACL
> include/linux/fs.h:#ifdef CONFIG_FS_POSIX_ACL
> include/linux/posix_acl.h:#ifdef CONFIG_FS_POSIX_ACL
> include/linux/posix_acl.h:#endif /* CONFIG_FS_POSIX_ACL */
> include/linux/posix_acl_xattr.h:#ifdef CONFIG_FS_POSIX_ACL
>
> However, I have no objection to your patch.
>
> I am adding Andreas & Al for their viewpoints.

Sergey, what actual problem is your patch trying to solve? It sounds
like this is either theoretical and pointless, or you're trying to
build an external module that uses POSIX ACL functions that shouldn't
be needed when CONFIG_FS_POSIX_ACL is disabled. In the latter case,
the external module will just end up including dead code, so the
module should be fixed instead.

Thanks,
Andreas

Re: [PATCH] gfs2: remove trailing semicolon in macro definition

2020-11-27 Thread Andreas Gruenbacher

On Fri, Nov 27, 2020 at 8:12 PM  wrote:
> From: Tom Rix 
>
> The macro use will already have a semicolon.
>
> Signed-off-by: Tom Rix 
> ---
>  fs/gfs2/util.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/util.h b/fs/gfs2/util.h
> index d7562981b3a0..493020393ceb 100644
> --- a/fs/gfs2/util.h
> +++ b/fs/gfs2/util.h
> @@ -162,7 +162,7 @@ void gfs2_io_error_bh_i(struct gfs2_sbd *sdp, struct 
> buffer_head *bh,
>  gfs2_io_error_bh_i((sdp), (bh), __func__, __FILE__, __LINE__, true);
>
>  #define gfs2_io_error_bh(sdp, bh) \
> -gfs2_io_error_bh_i((sdp), (bh), __func__, __FILE__, __LINE__, false);
> +gfs2_io_error_bh_i((sdp), (bh), __func__, __FILE__, __LINE__, false)
>
>
>  extern struct kmem_cache *gfs2_glock_cachep;
> --
> 2.18.4

Yeah okay, there are two more instances of exactly the same pattern
further below in the same header file. I'm adding fixes for those as
well.

Thanks,
Andreas

[GIT PULL] Another gfs2 fix for 5.10-rc4

2020-11-18 Thread Andreas Gruenbacher

Hi Linus,

please consider pulling the following additional gfs2 fix.

Thanks,
Andreas

The following changes since commit 09162bc32c880a791c6c0668ce0745cf7958f576:

  Linux 5.10-rc4 (2020-11-15 16:44:31 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-v5.10-rc4-fixes

for you to fetch changes up to 20b329129009caf1c646152abe09b697227e1c37:

  gfs2: Fix regression in freeze_go_sync (2020-11-18 16:28:11 +0100)


Fix gfs2 freeze/thaw


Bob Peterson (1):
  gfs2: Fix regression in freeze_go_sync

 fs/gfs2/glops.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

[GIT PULL] GFS2 fixes for 5.10-rc3

2020-11-12 Thread Andreas Gruenbacher

Hi Linus,

please consider pulling the following additional gfs2 fixes on top of 5.10-rc3.

Thank you very much,
Andreas

The following changes since commit f8394f232b1eab649ce2df5c5f15b0e528c92091:

  Linux 5.10-rc3 (2020-11-08 16:10:16 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-v5.10-rc3-fixes

for you to fetch changes up to 4e79e3f08e576acd51dffb4520037188703238b3:

  gfs2: Fix case in which ail writes are done to jdata holes (2020-11-12 
18:55:20 +0100)


Fix jdata data corruption and glock reference leak


Bob Peterson (2):
  Revert "gfs2: Ignore journal log writes for jdata holes"
  gfs2: Fix case in which ail writes are done to jdata holes

Zhang Qilong (1):
  gfs2: fix possible reference leak in gfs2_check_blk_type

 fs/gfs2/aops.c |  2 +-
 fs/gfs2/bmap.c |  8 ++--
 fs/gfs2/log.c  |  2 ++
 fs/gfs2/rgrp.c | 10 +-
 4 files changed, 10 insertions(+), 12 deletions(-)

[GIT PULL] GFS2 fixes for 5.10-rc1

2020-11-05 Thread Andreas Gruenbacher

Hi Linus,

please consider pulling the following gfs2 fixes on top of 5.10-rc1 (ish).

Thank you very much,
Andreas

The following changes since commit 4525c8781ec0701ce824e8bd379ae1b129e26568:

  scsi: qla2xxx: remove incorrect sparse #ifdef (2020-10-26 15:45:22 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-v5.10-rc1-fixes

for you to fetch changes up to da7d554f7c62d0c17c1ac3cc2586473c2d99f0bd:

  gfs2: Wake up when sd_glock_disposal becomes zero (2020-11-03 14:39:11 +0100)


Various gfs2 fixes


Alexander Aring (1):
  gfs2: Wake up when sd_glock_disposal becomes zero

Andreas Gruenbacher (1):
  gfs2: Don't call cancel_delayed_work_sync from within delete work function

Bob Peterson (6):
  gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free
  gfs2: Add missing truncate_inode_pages_final for sd_aspace
  gfs2: init_journal's undo directive should also undo the statfs inodes
  gfs2: Split up gfs2_meta_sync into inode and rgrp versions
  gfs2: don't initialize statfs_change inodes in spectator mode
  gfs2: check for live vs. read-only file system in gfs2_fitrim

 fs/gfs2/glock.c  |  3 ++-
 fs/gfs2/glops.c  | 56 +---
 fs/gfs2/glops.h  |  1 +
 fs/gfs2/inode.c  |  3 ++-
 fs/gfs2/lops.c   | 31 +
 fs/gfs2/lops.h   |  2 --
 fs/gfs2/ops_fstype.c | 14 -
 fs/gfs2/recovery.c   |  2 +-
 fs/gfs2/rgrp.c   |  5 -
 fs/gfs2/super.c  |  1 +
 10 files changed, 70 insertions(+), 48 deletions(-)

[GIT PULL] GFS2 changes for 5.10

2020-10-23 Thread Andreas Gruenbacher

Hi Linus,

could you please pull the following gfs2 changes for 5.10?

Thanks a lot,
Andreas

The following changes since commit bbf5c979011a099af5dc76498918ed7df445635b:

  Linux 5.9 (2020-10-11 14:15:50 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-for-5.10

for you to fetch changes up to bedb0f056faa94e953e7b3da5a77d25e0008364b:

  gfs2: Recover statfs info in journal head (2020-10-23 15:47:38 +0200)


Changes in gfs2:
* Use iomap for non-journaled buffered I/O.  This largely eliminates buffer
  heads on filesystems where the block size matches the page size.  Many thanks
  to Christoph Hellwig for this patch!
* Fixes for some more journaled data filesystem bugs, found by running xfstests
  with data journaling on for all files (chattr +j $MNT) (Bob Peterson).
* gfs2_evict_inode refactoring (Bob Peterson).
* Use the statfs data in the journal during recovery instead of reading it in
  from the local statfs inodes (Abhi Das).
* Several other minor fixes by various people.


Abhi Das (3):
  gfs2: Add fields for statfs info in struct gfs2_log_header_host
  gfs2: lookup local statfs inodes prior to journal recovery
  gfs2: Recover statfs info in journal head

Anant Thazhemadam (1):
  gfs2: add validation checks for size of superblock

Andreas Gruenbacher (2):
  gfs2: Make sure we don't miss any delayed withdraws
  gfs2: Ignore subsequent errors after withdraw in rgrp_go_sync

Andrew Price (1):
  gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump

Bob Peterson (20):
  gfs2: Fix bad comment for trans_drain
  gfs2: rename variable error to ret in gfs2_evict_inode
  gfs2: factor evict_unlinked_inode out of gfs2_evict_inode
  gfs2: further simplify gfs2_evict_inode with new func evict_should_delete
  gfs2: factor evict_linked_inode out of gfs2_evict_inode
  gfs2: simplify the logic in gfs2_evict_inode
  gfs2: call truncate_inode_pages_final for address space glocks
  gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm
  gfs2: add missing log_blocks trace points in gfs2_write_revokes
  gfs2: enhance log_blocks trace point to show log blocks free
  gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe
  gfs2: make gfs2_ail1_empty_one return the count of active items
  gfs2: don't lock sd_ail_lock in gfs2_releasepage
  gfs2: Only set PageChecked if we have a transaction
  gfs2: simplify gfs2_block_map
  gfs2: Ignore journal log writes for jdata holes
  gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)
  gfs2: Fix comments to glock_hash_walk
  gfs2: Only access gl_delete for iopen glocks
  gfs2: Eliminate gl_vm

Christoph Hellwig (1):
  gfs2: use iomap for buffered I/O in ordered and writeback mode

Jamie Iles (1):
  gfs2: use-after-free in sysfs deregistration

Liu Shixin (1):
  gfs2: convert to use DEFINE_SEQ_ATTRIBUTE macro

 fs/gfs2/aops.c   |  68 
 fs/gfs2/bmap.c   |  62 ++-
 fs/gfs2/bmap.h   |   1 +
 fs/gfs2/glock.c  |  52 +---
 fs/gfs2/glops.c  |  36 +
 fs/gfs2/incore.h |  29 ---
 fs/gfs2/log.c|  89 -
 fs/gfs2/log.h|   2 +-
 fs/gfs2/lops.c   |   2 +-
 fs/gfs2/lops.h   |   1 +
 fs/gfs2/meta_io.c|  81 +--
 fs/gfs2/meta_io.h|   2 +-
 fs/gfs2/ops_fstype.c | 173 ++--
 fs/gfs2/recovery.c   | 108 +
 fs/gfs2/rgrp.c   |  19 ++---
 fs/gfs2/rgrp.h   |   2 +-
 fs/gfs2/super.c  | 220 +++
 fs/gfs2/super.h  |   5 ++
 fs/gfs2/sys.c|   5 +-
 fs/gfs2/trace_gfs2.h |   7 +-
 fs/gfs2/util.c   |   2 +-
 fs/gfs2/util.h   |  10 +++
 22 files changed, 675 insertions(+), 301 deletions(-)

Re: [PATCH] gfs2: use helper macro abs()

2020-10-19 Thread Andreas Gruenbacher

Hi,

On Mon, Oct 19, 2020 at 10:03 AM Xianting Tian  wrote:
> Use helper macro abs() to simplify the "x >= y || x <= -y" cmp.
>
> Signed-off-by: Xianting Tian 
> ---
>  fs/gfs2/super.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 9f4d9e7be..05eb709de 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -304,7 +304,7 @@ void gfs2_statfs_change(struct gfs2_sbd *sdp, s64 total, 
> s64 free,
> if (sdp->sd_args.ar_statfs_percent) {
> x = 100 * l_sc->sc_free;
> y = m_sc->sc_free * sdp->sd_args.ar_statfs_percent;
> -   if (x >= y || x <= -y)
> +   if (abs(x) >= y)

x and y are signed values, so this really doesn't seem right.

> need_sync = 1;
> }
> spin_unlock(>sd_statfs_spin);
> --
> 2.17.1

Thanks,
Andreas

Re: [PATCH v2] fs: gfs2: add validation checks for size of superblock

2020-10-14 Thread Andreas Gruenbacher

Anant,

On Wed, Oct 14, 2020 at 6:31 PM Anant Thazhemadam
 wrote:
> In gfs2_check_sb(), no validation checks are performed with regards to
> the size of the superblock.
> syzkaller detected a slab-out-of-bounds bug that was primarily caused
> because the block size for a superblock was set to zero.
> A valid size for a superblock is a power of 2 between 512 and PAGE_SIZE.
> Performing validation checks and ensuring that the size of the superblock
> is valid fixes this bug.
>
> Reported-by: syzbot+af90d47a37376844e...@syzkaller.appspotmail.com
> Tested-by: syzbot+af90d47a37376844e...@syzkaller.appspotmail.com
> Suggested-by: Andrew Price 
> Signed-off-by: Anant Thazhemadam 
> ---
>
> Changes in v2:
>
> * Completely dropped the changes proposed in v1. Instead,
>   validity checks for superblock size have been introduced.
>   (Suggested by Andrew Price)
>
> * Addded a "Suggested-by" tag accrediting the patch idea to
>   Andrew. If there's any issue with that, please let me know.
>
> * Changed the commit header and commit message appropriately.
>
> * Updated "Reported-by" and "Tested-by" tags to the same instance
>   of the bug that was detected earlier (non consequential change).
>
>
>  fs/gfs2/ops_fstype.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index 6d18d2c91add..f0605fae2c4c 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -169,6 +169,13 @@ static int gfs2_check_sb(struct gfs2_sbd *sdp, int 
> silent)
> return -EINVAL;
> }
>
> +   /* Check if the size of the block is valid - a power of 2 between 512 
> and  PAGE_SIZE */
> +   if (sb->sb_bsize < 512 || sb->sb_bsize > PAGE_SIZE || (sb->sb_bsize & 
> (sb->sb_bsize - 1))) {
> +   if (!silent)
> +   pr_warn("Invalid superblock size\n");
> +   return -EINVAL;
> +   }
> +

I'll add that to for-next.

Thanks,
Andreas

Re: [PATCH] gfs2: gfs2_read_sb: put gfs2_assert inside the loop

2020-10-05 Thread Andreas Gruenbacher

Hi Fox Chen,

On Sat, Oct 3, 2020 at 8:33 AM Fox Chen  wrote:
> Before this patch, gfs2_assert is put outside of the loop of
> sdp->sd_heightsize[x] calculation. When something goes wrong,
> x exceeds the size of GFS2_MAX_META_HEIGHT, it may already crash inside
> the loop when
>
> sdp->sd_heightsize[x] = space
>
> tries to reach the out-of-bound
> location, gfs2_assert won't help here.

that's true, but the smallest possible block size is 1024 bytes, and
with that, the height cannot grow bigger than 10. So the assert is
basically there only for documentation purposes.

Thanks,
Andreas

Re: [PATCH -next] gfs2: convert to use DEFINE_SEQ_ATTRIBUTE macro

2020-09-16 Thread Andreas Gruenbacher

Hello Liu,

On Wed, Sep 16, 2020 at 4:44 AM Liu Shixin  wrote:
> Use DEFINE_SEQ_ATTRIBUTE macro to simplify the code.

I've pushed this to for-next.

Thanks,
Andreas

Re: More filesystem need this fix (xfs: use MMAPLOCK around filemap_map_pages())

2020-09-14 Thread Andreas Gruenbacher

Could the xfs mmap lock documentation please be cleaned up? For
example, the xfs_ilock description says:

> * In addition to i_rwsem in the VFS inode, the xfs inode contains 2
> * multi-reader locks: i_mmap_lock and the i_lock.  This routine allows
> * various combinations of the locks to be obtained.

The field in struct xfs_inode is called i_mmaplock though, not
i_mmap_lock. In addition, struct inode has an i_mmap_rwsem field which
is also referred to as i_mmap_lock. If that isn't irritating enough.

Thanks,
Andreas

Re: [PATCH 4.19 142/206] gfs2: fix use-after-free on transaction ail lists

2020-09-11 Thread Andreas Gruenbacher

On Fri, Sep 11, 2020 at 2:09 PM Salvatore Bonaccorso  wrote:
> On Fri, Sep 11, 2020 at 01:58:16PM +0200, Greg Kroah-Hartman wrote:
> > Can you report this to the gfs2 developers?
>
> Sure! Bob Peterson and Andreas Gruenbacher were already on the
> recipient list but I forgot cluster-de...@redhat.com .
>
> I can send there a separate report as followup if still needed.

No need right now, we're looking, thanks.

Andreas

Re: [PATCH] iomap: Fix direct I/O write consistency check

2020-09-03 Thread Andreas Gruenbacher

On Thu, Sep 3, 2020 at 11:12 PM Eric Sandeen  wrote:
> On 9/3/20 11:56 AM, Andreas Gruenbacher wrote:
> > When a direct I/O write falls back to buffered I/O entirely, dio->size
> > will be 0 in iomap_dio_complete.  Function invalidate_inode_pages2_range
> > will try to invalidate the rest of the address space.
>
> (Because if ->size == 0 and offset == 0, then invalidating up to (0+0-1) will
> invalidate the entire range.)
>
>
> err = invalidate_inode_pages2_range(inode->i_mapping,
> offset >> PAGE_SHIFT,
> (offset + dio->size - 1) >> PAGE_SHIFT);
>
> so I guess this behavior is unique to writing to a hole at offset 0?
>
> FWIW, this same test yields the same results on ext3 when it falls back to
> buffered.

That's interesting. An ext3 formatted filesystem will invoke
dio_warn_stale_pagecache and thus log the error message, but the error
isn't immediately reported by the "pwrite 0 4k". It takes adding '-c
"fsync"' to the xfs_io command or similar to make it fail.

An ext4 formatted filesystem doesn't show any of these problems.

Thanks,
Andreas

> -Eric
>
> > If there are any
> > dirty pages in that range, the write will fail and a "Page cache
> > invalidation failure on direct I/O" error will be logged.
> >
> > On gfs2, this can be reproduced as follows:
> >
> >   xfs_io \
> > -c "open -ft foo" -c "pwrite 4k 4k" -c "close" \
> > -c "open -d foo" -c "pwrite 0 4k"
> >
> > Fix this by recognizing 0-length writes.
> >
> > Signed-off-by: Andreas Gruenbacher 
> > ---
> >  fs/iomap/direct-io.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> > index c1aafb2ab990..c9d6b4eecdb7 100644
> > --- a/fs/iomap/direct-io.c
> > +++ b/fs/iomap/direct-io.c
> > @@ -108,7 +108,7 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio)
> >* ->end_io() when necessary, otherwise a racing buffer read would 
> > cache
> >* zeros from unwritten extents.
> >*/
> > - if (!dio->error &&
> > + if (!dio->error && dio->size &&
> >   (dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages) {
> >   int err;
> >   err = invalidate_inode_pages2_range(inode->i_mapping,
> >
>

[PATCH] iomap: Fix direct I/O write consistency check

2020-09-03 Thread Andreas Gruenbacher

When a direct I/O write falls back to buffered I/O entirely, dio->size
will be 0 in iomap_dio_complete.  Function invalidate_inode_pages2_range
will try to invalidate the rest of the address space.  If there are any
dirty pages in that range, the write will fail and a "Page cache
invalidation failure on direct I/O" error will be logged.

On gfs2, this can be reproduced as follows:

  xfs_io \
-c "open -ft foo" -c "pwrite 4k 4k" -c "close" \
-c "open -d foo" -c "pwrite 0 4k"

Fix this by recognizing 0-length writes.

Signed-off-by: Andreas Gruenbacher 
---
 fs/iomap/direct-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index c1aafb2ab990..c9d6b4eecdb7 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -108,7 +108,7 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio)
 * ->end_io() when necessary, otherwise a racing buffer read would cache
 * zeros from unwritten extents.
 */
-   if (!dio->error &&
+   if (!dio->error && dio->size &&
(dio->flags & IOMAP_DIO_WRITE) && inode->i_mapping->nrpages) {
int err;
err = invalidate_inode_pages2_range(inode->i_mapping,
-- 
2.26.2

[GIT PULL] gfs2: Fix memory leak on filesystem withdraw

2020-08-28 Thread Andreas Gruenbacher

Hi Linus,

could you please pull the following gfs2 fix?

We didn't detect this bug because we have slab merging on by default
(CONFIG_SLAB_MERGE_DEFAULT).  Adding "slub_nomerge" to the kernel
command line exposed the problem.

Thanks,
Andreas

The following changes since commit d012a7190fc1fd72ed48911e77ca97ba4521bccd:

  Linux 5.9-rc2 (2020-08-23 14:08:43 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-v5.9-rc2-fixes

for you to fetch changes up to 462582b99b6079a6fbcdfc65bac49f5c2a27cfff:

  gfs2: add some much needed cleanup for log flushes that fail (2020-08-24 
13:54:07 +0200)


Fix memory leak on filesystem withdraw


Bob Peterson (1):
  gfs2: add some much needed cleanup for log flushes that fail

 fs/gfs2/log.c   | 31 +++
 fs/gfs2/trans.c |  1 +
 2 files changed, 32 insertions(+)

Re: [PATCH 5.4 129/270] iomap: Make sure iomap_end is called after iomap_begin

2020-08-17 Thread Andreas Gruenbacher

Greg,

On Mon, Aug 17, 2020 at 6:06 PM Greg Kroah-Hartman
 wrote:
> From: Andreas Gruenbacher 
>
> [ Upstream commit 856473cd5d17dbbf3055710857c67a4af6d9fcc0 ]
>
> Make sure iomap_end is always called when iomap_begin succeeds.
>
> Without this fix, iomap_end won't be called when a filesystem's
> iomap_begin operation returns an invalid mapping, bypassing any
> unlocking done in iomap_end.  With this fix, the unlocking will still
> happen.
>
> This bug was found by Bob Peterson during code review.  It's unlikely
> that such iomap_begin bugs will survive to affect users, so backporting
> this fix seems unnecessary.

as said for 5.7 / 5.8 already, this patch doesn't need to be backported.

Thanks,
Andreas

> Fixes: ae259a9c8593 ("fs: introduce iomap infrastructure")
> Signed-off-by: Andreas Gruenbacher 
> Reviewed-by: Christoph Hellwig 
> Reviewed-by: Darrick J. Wong 
> Signed-off-by: Darrick J. Wong 
> Signed-off-by: Sasha Levin 
> ---
>  fs/iomap/apply.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/fs/iomap/apply.c b/fs/iomap/apply.c
> index 54c02aecf3cd8..c2281a6a7f320 100644
> --- a/fs/iomap/apply.c
> +++ b/fs/iomap/apply.c
> @@ -41,10 +41,14 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t 
> length, unsigned flags,
> ret = ops->iomap_begin(inode, pos, length, flags, );
> if (ret)
> return ret;
> -   if (WARN_ON(iomap.offset > pos))
> -   return -EIO;
> -   if (WARN_ON(iomap.length == 0))
> -   return -EIO;
> +   if (WARN_ON(iomap.offset > pos)) {
> +   written = -EIO;
> +   goto out;
> +   }
> +   if (WARN_ON(iomap.length == 0)) {
> +   written = -EIO;
> +   goto out;
> +   }
>
> /*
>  * Cut down the length to the one actually provided by the filesystem,
> @@ -60,6 +64,7 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, 
> unsigned flags,
>  */
> written = actor(inode, pos, length, data, );
>
> +out:
> /*
>  * Now the data has been copied, commit the range we've copied.  This
>  * should not fail unless the filesystem has had a fatal error.
> --
> 2.25.1
>
>
>

[GIT PULL] GFS2 changes for 5.9

2020-08-10 Thread Andreas Gruenbacher

Hi Linus,

please consider pulling the following gfs2 changes for 5.9.

Thanks a lot,
Andreas

The following changes since commit 11ba468877bb23f28956a35e896356252d63c983:

  Linux 5.8-rc5 (2020-07-12 16:34:50 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-for-5.9

for you to fetch changes up to e28c02b94f9e039beeb5c75198caf6e17b66c520:

  gfs2: When gfs2_dirty_inode gets a glock error, dump the glock (2020-08-07 
17:26:24 +0200)


Changes in gfs2:

- Make sure transactions won't be started recursively in gfs2_block_zero_range.
  (Bug introduced in 5.4 when switching to iomap_zero_range.)
- Fix a glock holder refcount leak introduced in the iopen glock locking
  scheme rework merged in 5.8.
- A few other small improvements (debugging, stack usage, comment fixes).


Andreas Gruenbacher (3):
  gfs2: Pass glock holder to gfs2_file_direct_{read,write}
  gfs2: Fix refcount leak in gfs2_glock_poke
  fs: Fix typo in comment

Bob Peterson (5):
  gfs2: Add some flags missing from glock output
  gfs2: Fix inaccurate comment
  gfs2: print details on transactions that aren't properly ended
  gfs2: Never call gfs2_block_zero_range with an open transaction
  gfs2: When gfs2_dirty_inode gets a glock error, dump the glock

 fs/gfs2/bmap.c | 69 ++
 fs/gfs2/file.c | 31 
 fs/gfs2/glock.c| 10 +++-
 fs/gfs2/log.c  |  2 +-
 fs/gfs2/super.c|  1 +
 fs/gfs2/trans.c| 29 +--
 include/linux/fs.h |  2 +-
 7 files changed, 82 insertions(+), 62 deletions(-)

Re: [GIT PULL] iomap: new code for 5.9-rc1

2020-08-05 Thread Andreas Gruenbacher

Hi Darrick,

On Wed, Aug 5, 2020 at 5:40 PM Darrick J. Wong  wrote:
> 
> Andreas Gruenbacher (1):
>   iomap: Make sure iomap_end is called after iomap_begin

that commit (d1b4f507d71de) contains the following garbage in the
commit message:

The message from this sender included one or more files
which could not be scanned for virus detection; do not
open these files unless you are certain of the sender's intent.

--

How did it come to that?

Thanks,
Andreas

Re: [PATCH] gfs2: Use kvmalloc instead of opencoded kmalloc/vmalloc

2020-08-03 Thread Andreas Gruenbacher

Hello,

On Sat, Aug 1, 2020 at 2:05 PM Denis Efremov  wrote:
> Please, skip this patch. I missed that kvmalloc checks (flags & GFP_KERNEL) 
> == GFP_KERNEL
> before calling vmalloc.

okay. Assuming that you'll follow up with a fixed version, please also
mention that this change supplements commit 3cdcf63ed2d1 ("GFS2: use
kvfree() instead of open-coding it").

Thanks,
Andreas

[GIT PULL] Fix gfs2 readahead deadlocks

2020-07-10 Thread Andreas Gruenbacher

Hi Linus,

could you please pull the following two commits to fix the gfs2
deadlocks resulting from the conversion to ->readahead in commit
d4388340ae0b ("fs: convert mpage_readpages to mpage_readahead")?

The first commit adds a new IOCB_NOIO flag to generic_file_read_iter.

In the previous version [1] which you've acked [2] and Matthew Willcox
has reviewed [3], ->readpage could still be called even when IOCB_NOIO
was set, so I've added an additional check above that call and I've
dropped the ack and reviewed-by tags.  In addition, bit 8 is now left
unused for the new IOCB_WAITQ flag in the block tree per Jens Axboe's
request.

[1] 
https://lore.kernel.org/linux-fsdevel/cahc6fu6lmr7m_8uhmb_77jupyno-kgcz-1ytlqya-ppqvvb...@mail.gmail.com/
[2] 
https://lore.kernel.org/linux-fsdevel/CAHk-=whbk-jym6_hbxbu6+gs7gtw3hwg4islncq0qtwshm6...@mail.gmail.com/
[3] 
https://lore.kernel.org/linux-fsdevel/20200703114108.ge25...@casper.infradead.org/

Thanks a lot,
Andreas


The following changes since commit dcb7fd82c75ee2d6e6f9d8cc71c52519ed52e258:

  Linux 5.8-rc4 (2020-07-05 16:20:22 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-v5.8-rc4.fixes

for you to fetch changes up to 20f82c38b18e3d17f9e40dea3a28f721fac4:

  gfs2: Rework read and page fault locking (2020-07-07 23:40:12 +0200)


Fix gfs2 readahead deadlocks

--------
Andreas Gruenbacher (2):
  fs: Add IOCB_NOIO flag for generic_file_read_iter
  gfs2: Rework read and page fault locking

 fs/gfs2/aops.c | 45 +
 fs/gfs2/file.c | 52 ++--
 include/linux/fs.h |  1 +
 mm/filemap.c   | 23 +--
 4 files changed, 73 insertions(+), 48 deletions(-)

[RFC v3 0/2] Fix gfs2 readahead deadlocks

2020-07-07 Thread Andreas Gruenbacher

In the previous version, we could end up calling ->readpage without
checking IOCB_NOIO.

Andreas Gruenbacher (2):
  fs: Add IOCB_NOIO flag for generic_file_read_iter
  gfs2: Rework read and page fault locking

 fs/gfs2/aops.c | 45 +--
 fs/gfs2/file.c | 52 --
 include/linux/fs.h |  1 +
 mm/filemap.c   | 23 ++--
 4 files changed, 73 insertions(+), 48 deletions(-)

-- 
2.26.2

[RFC v3 2/2] gfs2: Rework read and page fault locking

2020-07-07 Thread Andreas Gruenbacher

So far, gfs2 has taken the inode glocks inside the ->readpage and
->readahead address space operations.  Since commit d4388340ae0b ("fs:
convert mpage_readpages to mpage_readahead"), gfs2_readahead is passed
the pages to read ahead locked.  With that, the current holder of the
inode glock may be trying to lock one of those pages while
gfs2_readahead is trying to take the inode glock, resulting in a
deadlock.

Fix that by moving the lock taking to the higher-level ->read_iter file
and ->fault vm operations.  This also gets rid of an ugly lock inversion
workaround in gfs2_readpage.

The cache consistency model of filesystems like gfs2 is such that if
data is found in the page cache, the data is up to date and can be used
without taking any filesystem locks.  If a page is not cached,
filesystem locks must be taken before populating the page cache.

To avoid taking the inode glock when the data is already cached,
gfs2_file_read_iter first tries to read the data with the IOCB_NOIO flag
set.  If that fails, the inode glock is taken and the operation is
retried with the IOCB_NOIO flag cleared.

Signed-off-by: Andreas Gruenbacher 
---
 fs/gfs2/aops.c | 45 +--
 fs/gfs2/file.c | 52 --
 2 files changed, 51 insertions(+), 46 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 72c9560f4467..68cd700a2719 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -468,21 +468,10 @@ static int stuffed_readpage(struct gfs2_inode *ip, struct 
page *page)
 }
 
 
-/**
- * __gfs2_readpage - readpage
- * @file: The file to read a page for
- * @page: The page to read
- *
- * This is the core of gfs2's readpage. It's used by the internal file
- * reading code as in that case we already hold the glock. Also it's
- * called by gfs2_readpage() once the required lock has been granted.
- */
-
 static int __gfs2_readpage(void *file, struct page *page)
 {
struct gfs2_inode *ip = GFS2_I(page->mapping->host);
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
-
int error;
 
if (i_blocksize(page->mapping->host) == PAGE_SIZE &&
@@ -505,36 +494,11 @@ static int __gfs2_readpage(void *file, struct page *page)
  * gfs2_readpage - read a page of a file
  * @file: The file to read
  * @page: The page of the file
- *
- * This deals with the locking required. We have to unlock and
- * relock the page in order to get the locking in the right
- * order.
  */
 
 static int gfs2_readpage(struct file *file, struct page *page)
 {
-   struct address_space *mapping = page->mapping;
-   struct gfs2_inode *ip = GFS2_I(mapping->host);
-   struct gfs2_holder gh;
-   int error;
-
-   unlock_page(page);
-   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
-   error = gfs2_glock_nq();
-   if (unlikely(error))
-   goto out;
-   error = AOP_TRUNCATED_PAGE;
-   lock_page(page);
-   if (page->mapping == mapping && !PageUptodate(page))
-   error = __gfs2_readpage(file, page);
-   else
-   unlock_page(page);
-   gfs2_glock_dq();
-out:
-   gfs2_holder_uninit();
-   if (error && error != AOP_TRUNCATED_PAGE)
-   lock_page(page);
-   return error;
+   return __gfs2_readpage(file, page);
 }
 
 /**
@@ -598,16 +562,9 @@ static void gfs2_readahead(struct readahead_control *rac)
 {
struct inode *inode = rac->mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
-   struct gfs2_holder gh;
 
-   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
-   if (gfs2_glock_nq())
-   goto out_uninit;
if (!gfs2_is_stuffed(ip))
mpage_readahead(rac, gfs2_block_map);
-   gfs2_glock_dq();
-out_uninit:
-   gfs2_holder_uninit();
 }
 
 /**
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index fe305e4bfd37..bebde537ac8c 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -558,8 +558,29 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
return block_page_mkwrite_return(ret);
 }
 
+static vm_fault_t gfs2_fault(struct vm_fault *vmf)
+{
+   struct inode *inode = file_inode(vmf->vma->vm_file);
+   struct gfs2_inode *ip = GFS2_I(inode);
+   struct gfs2_holder gh;
+   vm_fault_t ret;
+   int err;
+
+   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
+   err = gfs2_glock_nq();
+   if (err) {
+   ret = block_page_mkwrite_return(err);
+   goto out_uninit;
+   }
+   ret = filemap_fault(vmf);
+   gfs2_glock_dq();
+out_uninit:
+   gfs2_holder_uninit();
+   return ret;
+}
+
 static const struct vm_operations_struct gfs2_vm_ops = {
-   .fault = filemap_fault,
+   .fault = gfs2_fault,
.map_pages = filemap_map_pages,
.page_mkwrite = gfs2_page_mkwrite,
 };
@@ -824,6 +845,9 @@ static ssize_t gfs2_file_

[RFC v3 1/2] fs: Add IOCB_NOIO flag for generic_file_read_iter

2020-07-07 Thread Andreas Gruenbacher

Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
shouldn't trigger any filesystem I/O for the actual request or for
readahead.  This allows to do tentative reads out of the page cache as
some filesystems allow, and to take the appropriate locks and retry the
reads only if the requested pages are not cached.

Signed-off-by: Andreas Gruenbacher 
---
 include/linux/fs.h |  1 +
 mm/filemap.c   | 23 +--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3f881a892ea7..1ab2ea19e883 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -315,6 +315,7 @@ enum rw_hint {
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
 #define IOCB_NOWAIT(1 << 7)
+#define IOCB_NOIO  (1 << 8)
 
 struct kiocb {
struct file *ki_filp;
diff --git a/mm/filemap.c b/mm/filemap.c
index f0ae9a6308cb..385759c4ce4b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2028,7 +2028,7 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
 
page = find_get_page(mapping, index);
if (!page) {
-   if (iocb->ki_flags & IOCB_NOWAIT)
+   if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO))
goto would_block;
page_cache_sync_readahead(mapping,
ra, filp,
@@ -2038,6 +2038,10 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
goto no_cached_page;
}
if (PageReadahead(page)) {
+   if (iocb->ki_flags & IOCB_NOIO) {
+   put_page(page);
+   goto out;
+   }
page_cache_async_readahead(mapping,
ra, filp, page,
index, last_index - index);
@@ -2160,6 +2164,11 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
}
 
 readpage:
+   if (iocb->ki_flags & IOCB_NOIO) {
+   unlock_page(page);
+   put_page(page);
+   goto would_block;
+   }
/*
 * A previous I/O error may have been due to temporary
 * failures, eg. multipath errors.
@@ -2249,9 +2258,19 @@ EXPORT_SYMBOL_GPL(generic_file_buffered_read);
  *
  * This is the "read_iter()" routine for all filesystems
  * that can use the page cache directly.
+ *
+ * The IOCB_NOWAIT flag in iocb->ki_flags indicates that -EAGAIN shall
+ * be returned when no data can be read without waiting for I/O requests
+ * to complete; it doesn't prevent readahead.
+ *
+ * The IOCB_NOIO flag in iocb->ki_flags indicates that no new I/O
+ * requests shall be made for the read or for readahead.  When no data
+ * can be read, -EAGAIN shall be returned.  When readahead would be
+ * triggered, a partial, possibly empty read shall be returned.
+ *
  * Return:
  * * number of bytes copied, even for partial reads
- * * negative error code if nothing was read
+ * * negative error code (or 0 if IOCB_NOIO) if nothing was read
  */
 ssize_t
 generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
-- 
2.26.2

Re: [RFC 2/4] fs: Add IOCB_NOIO flag for generic_file_read_iter

2020-07-07 Thread Andreas Gruenbacher

On Thu, Jul 2, 2020 at 8:06 PM Linus Torvalds
 wrote:
> On Thu, Jul 2, 2020 at 9:51 AM Andreas Gruenbacher  
> wrote:
> > Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
> > shouldn't trigger any filesystem I/O for the actual request or for
> > readahead.  This allows to do tentative reads out of the page cache as
> > some filesystems allow, and to take the appropriate locks and retry the
> > reads only if the requested pages are not cached.
>
> This looks sane to me, except for this part:
> > if (!PageUptodate(page)) {
> > -   if (iocb->ki_flags & IOCB_NOWAIT) {
> > +   if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) {
> > put_page(page);
> > goto would_block;
> > }
>
> This path doesn't actually initiate reads at all - it waits for
> existing reads to finish.
>
> So I think it should only check for IOCB_NOWAIT.

It turns out that label readpage is reachable from here via goto
page_not_up_to_date / goto page_not_up_to_date_locked. So IOCB_NOIO
needs to be checked somewhere. I'll send an update.

Andreas

Re: [RFC v2 1/2] fs: Add IOCB_NOIO flag for generic_file_read_iter

2020-07-05 Thread Andreas Gruenbacher

On Fri, Jul 3, 2020 at 1:41 PM Matthew Wilcox  wrote:
>
> On Fri, Jul 03, 2020 at 11:53:24AM +0200, Andreas Gruenbacher wrote:
> > Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
> > shouldn't trigger any filesystem I/O for the actual request or for
> > readahead.  This allows to do tentative reads out of the page cache as
> > some filesystems allow, and to take the appropriate locks and retry the
> > reads only if the requested pages are not cached.
> >
> > Signed-off-by: Andreas Gruenbacher 
>
> Reviewed-by: Matthew Wilcox (Oracle) 
>
> > @@ -2249,9 +2253,18 @@ EXPORT_SYMBOL_GPL(generic_file_buffered_read);
> >   *
> >   * This is the "read_iter()" routine for all filesystems
> >   * that can use the page cache directly.
> > + *
> > + * The IOCB_NOWAIT flag in iocb->ki_flags indicates that -EAGAIN shall
> > + * be returned when no data can be read without waiting for I/O requests
> > + * to complete; it doesn't prevent readahead.
> > + *
> > + * The IOCB_NOIO flag in iocb->ki_flags indicates that -EAGAIN shall be
> > + * returned when no data can be read without issuing new I/O requests,
> > + * and 0 shall be returned when readhead would have been triggered.
>
> s/shall/may/ -- if we read a previous page then hit a readahead page,
> we'll return a positive value.  If the first page we hit is a readahead
> page, then yes, we'll return zero.

How about this?

 * The IOCB_NOIO flag in iocb->ki_flags indicates that no new I/O
 * requests shall be made for the read or for readahead.  When no data
 * can be read, -EAGAIN shall be returned.  When readahead would be
 * triggered, a short read (possibly of length 0) shall be returned.

> Again, I'm happy for the patch to go in as-is without this nitpick.

Thanks,
Andreas

[GIT PULL] Additional gfs2 fixes for 5.8

2020-07-03 Thread Andreas Gruenbacher

Hi Linus,

could you please pull the following additional gfs2 fixes?

These don't conflict with the IOCB_NOIO / readahead deadlock fixes.

Thanks a lot,
Andreas

The following changes since commit 9ebcfadb0610322ac537dd7aa5d9cbc2b2894c68:

  Linux 5.8-rc3 (2020-06-28 15:00:24 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-v5.8-rc3.fixes

for you to fetch changes up to c860f8ffbea8924de05a281b937128773d30a77c:

  gfs2: The freeze glock should never be frozen (2020-07-03 12:05:35 +0200)


Various gfs2 fixes


Andreas Gruenbacher (2):
  gfs2: Don't return NULL from gfs2_inode_lookup
  gfs2: Don't sleep during glock hash walk

Bob Peterson (6):
  gfs2: fix trans slab error when withdraw occurs inside log_flush
  gfs2: eliminate GIF_ORDERED in favor of list_empty
  gfs2: freeze should work on read-only mounts
  gfs2: read-only mounts should grab the sd_freeze_gl glock
  gfs2: When freezing gfs2, use GL_EXACT and not GL_NOCACHE
  gfs2: The freeze glock should never be frozen

 fs/gfs2/glock.c  |  5 -
 fs/gfs2/glops.c  | 10 ++
 fs/gfs2/incore.h |  1 -
 fs/gfs2/inode.c  |  3 ++-
 fs/gfs2/log.c| 25 +++--
 fs/gfs2/log.h|  4 ++--
 fs/gfs2/main.c   |  1 +
 fs/gfs2/ops_fstype.c | 13 -
 fs/gfs2/recovery.c   |  4 ++--
 fs/gfs2/super.c  | 20 ++--
 10 files changed, 58 insertions(+), 28 deletions(-)

[RFC v2 0/2] Fix gfs2 readahead deadlocks

2020-07-03 Thread Andreas Gruenbacher

Here's an improved version.  If the IOCB_NOIO flag can be added right
away, we can just fix the locking in gfs2.

Thanks,
Andreas

Andreas Gruenbacher (2):
  fs: Add IOCB_NOIO flag for generic_file_read_iter
  gfs2: Rework read and page fault locking

 fs/gfs2/aops.c | 45 +--
 fs/gfs2/file.c | 52 --
 include/linux/fs.h |  1 +
 mm/filemap.c   | 17 +--
 4 files changed, 67 insertions(+), 48 deletions(-)

-- 
2.26.2

[RFC v2 1/2] fs: Add IOCB_NOIO flag for generic_file_read_iter

2020-07-03 Thread Andreas Gruenbacher

Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
shouldn't trigger any filesystem I/O for the actual request or for
readahead.  This allows to do tentative reads out of the page cache as
some filesystems allow, and to take the appropriate locks and retry the
reads only if the requested pages are not cached.

Signed-off-by: Andreas Gruenbacher 
---
 include/linux/fs.h |  1 +
 mm/filemap.c   | 17 +++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3f881a892ea7..1ab2ea19e883 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -315,6 +315,7 @@ enum rw_hint {
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
 #define IOCB_NOWAIT(1 << 7)
+#define IOCB_NOIO  (1 << 8)
 
 struct kiocb {
struct file *ki_filp;
diff --git a/mm/filemap.c b/mm/filemap.c
index f0ae9a6308cb..22f7ff2d369e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2028,7 +2028,7 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
 
page = find_get_page(mapping, index);
if (!page) {
-   if (iocb->ki_flags & IOCB_NOWAIT)
+   if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO))
goto would_block;
page_cache_sync_readahead(mapping,
ra, filp,
@@ -2038,6 +2038,10 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
goto no_cached_page;
}
if (PageReadahead(page)) {
+   if (iocb->ki_flags & IOCB_NOIO) {
+   put_page(page);
+   goto out;
+   }
page_cache_async_readahead(mapping,
ra, filp, page,
index, last_index - index);
@@ -2249,9 +2253,18 @@ EXPORT_SYMBOL_GPL(generic_file_buffered_read);
  *
  * This is the "read_iter()" routine for all filesystems
  * that can use the page cache directly.
+ *
+ * The IOCB_NOWAIT flag in iocb->ki_flags indicates that -EAGAIN shall
+ * be returned when no data can be read without waiting for I/O requests
+ * to complete; it doesn't prevent readahead.
+ *
+ * The IOCB_NOIO flag in iocb->ki_flags indicates that -EAGAIN shall be
+ * returned when no data can be read without issuing new I/O requests,
+ * and 0 shall be returned when readhead would have been triggered.
+ *
  * Return:
  * * number of bytes copied, even for partial reads
- * * negative error code if nothing was read
+ * * negative error code (or 0 if IOCB_NOIO) if nothing was read
  */
 ssize_t
 generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
-- 
2.26.2

[RFC v2 2/2] gfs2: Rework read and page fault locking

2020-07-03 Thread Andreas Gruenbacher

So far, gfs2 has taken the inode glocks inside the ->readpage and
->readahead address space operations.  Since commit d4388340ae0b ("fs:
convert mpage_readpages to mpage_readahead"), gfs2_readahead is passed
the pages to read ahead locked.  With that, the current holder of the
inode glock may be trying to lock one of those pages while
gfs2_readahead is trying to take the inode glock, resulting in a
deadlock.

Fix that by moving the lock taking to the higher-level ->read_iter file
and ->fault vm operations.  This also gets rid of an ugly lock inversion
workaround in gfs2_readpage.

The cache consistency model of filesystems like gfs2 is such that if
data is found in the page cache, the data is up to date and can be used
without taking any filesystem locks.  If a page is not cached,
filesystem locks must be taken before populating the page cache.

To avoid taking the inode glock when the data is already cached,
gfs2_file_read_iter first tries to read the data with the IOCB_NOIO flag
set.  If that fails, the inode glock is taken and the operation is
retried with the IOCB_NOIO flag cleared.

Signed-off-by: Andreas Gruenbacher 
---
 fs/gfs2/aops.c | 45 +--
 fs/gfs2/file.c | 52 --
 2 files changed, 51 insertions(+), 46 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 72c9560f4467..68cd700a2719 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -468,21 +468,10 @@ static int stuffed_readpage(struct gfs2_inode *ip, struct 
page *page)
 }
 
 
-/**
- * __gfs2_readpage - readpage
- * @file: The file to read a page for
- * @page: The page to read
- *
- * This is the core of gfs2's readpage. It's used by the internal file
- * reading code as in that case we already hold the glock. Also it's
- * called by gfs2_readpage() once the required lock has been granted.
- */
-
 static int __gfs2_readpage(void *file, struct page *page)
 {
struct gfs2_inode *ip = GFS2_I(page->mapping->host);
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
-
int error;
 
if (i_blocksize(page->mapping->host) == PAGE_SIZE &&
@@ -505,36 +494,11 @@ static int __gfs2_readpage(void *file, struct page *page)
  * gfs2_readpage - read a page of a file
  * @file: The file to read
  * @page: The page of the file
- *
- * This deals with the locking required. We have to unlock and
- * relock the page in order to get the locking in the right
- * order.
  */
 
 static int gfs2_readpage(struct file *file, struct page *page)
 {
-   struct address_space *mapping = page->mapping;
-   struct gfs2_inode *ip = GFS2_I(mapping->host);
-   struct gfs2_holder gh;
-   int error;
-
-   unlock_page(page);
-   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
-   error = gfs2_glock_nq();
-   if (unlikely(error))
-   goto out;
-   error = AOP_TRUNCATED_PAGE;
-   lock_page(page);
-   if (page->mapping == mapping && !PageUptodate(page))
-   error = __gfs2_readpage(file, page);
-   else
-   unlock_page(page);
-   gfs2_glock_dq();
-out:
-   gfs2_holder_uninit();
-   if (error && error != AOP_TRUNCATED_PAGE)
-   lock_page(page);
-   return error;
+   return __gfs2_readpage(file, page);
 }
 
 /**
@@ -598,16 +562,9 @@ static void gfs2_readahead(struct readahead_control *rac)
 {
struct inode *inode = rac->mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
-   struct gfs2_holder gh;
 
-   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
-   if (gfs2_glock_nq())
-   goto out_uninit;
if (!gfs2_is_stuffed(ip))
mpage_readahead(rac, gfs2_block_map);
-   gfs2_glock_dq();
-out_uninit:
-   gfs2_holder_uninit();
 }
 
 /**
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index fe305e4bfd37..bebde537ac8c 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -558,8 +558,29 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
return block_page_mkwrite_return(ret);
 }
 
+static vm_fault_t gfs2_fault(struct vm_fault *vmf)
+{
+   struct inode *inode = file_inode(vmf->vma->vm_file);
+   struct gfs2_inode *ip = GFS2_I(inode);
+   struct gfs2_holder gh;
+   vm_fault_t ret;
+   int err;
+
+   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
+   err = gfs2_glock_nq();
+   if (err) {
+   ret = block_page_mkwrite_return(err);
+   goto out_uninit;
+   }
+   ret = filemap_fault(vmf);
+   gfs2_glock_dq();
+out_uninit:
+   gfs2_holder_uninit();
+   return ret;
+}
+
 static const struct vm_operations_struct gfs2_vm_ops = {
-   .fault = filemap_fault,
+   .fault = gfs2_fault,
.map_pages = filemap_map_pages,
.page_mkwrite = gfs2_page_mkwrite,
 };
@@ -824,6 +845,9 @@ static ssize_t gfs2_file_

Re: [RFC 2/4] fs: Add IOCB_NOIO flag for generic_file_read_iter

2020-07-03 Thread Andreas Gruenbacher

On Thu, Jul 2, 2020 at 10:18 PM Linus Torvalds
 wrote:
> On Thu, Jul 2, 2020 at 12:58 PM Andreas Gruenbacher  
> wrote:
> > > Of course, if you want to avoid both new reads to be submitted _and_
> > > avoid waiting for existing pending reads, you should just set both
> > > flags, and you get the semantics you want. So for your case, this may
> > > not make any difference.
> >
> > Indeed, in the gfs2 case, waiting for existing pending reads should be
> > fine. I'll send an update after some testing.
>
> Do note that "wait for pending reads" very much does imply "wait for
> those reads to _complete_".
>
> And maybe the IO completion handler itself ends up having to finalize
> something and take the lock to do that?
>
> So in that case, even just "waiting" will cause a deadlock. Not
> because the waiter itself needs the lock, but because the thing it
> waits for might possibly need it.
>
> But in many simple cases, IO completion shouldn't need any filesystem
> locks. I just don't know the gfs2 code at all, so I'm not even going
> to guess. I just wanted to mention it.

Yes, that makes sense. Luckily gfs2 doesn't do any such locking on IO
completion.

Thanks,
Andreas

Re: [RFC 2/4] fs: Add IOCB_NOIO flag for generic_file_read_iter

2020-07-02 Thread Andreas Gruenbacher

On Thu, Jul 2, 2020 at 8:06 PM Linus Torvalds
 wrote:
> On Thu, Jul 2, 2020 at 9:51 AM Andreas Gruenbacher  
> wrote:
> > Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
> > shouldn't trigger any filesystem I/O for the actual request or for
> > readahead.  This allows to do tentative reads out of the page cache as
> > some filesystems allow, and to take the appropriate locks and retry the
> > reads only if the requested pages are not cached.
>
> This looks sane to me, except for this part:
> > if (!PageUptodate(page)) {
> > -   if (iocb->ki_flags & IOCB_NOWAIT) {
> > +   if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) {
> > put_page(page);
> > goto would_block;
> > }
>
> This path doesn't actually initiate reads at all - it waits for
> existing reads to finish.
>
> So I think it should only check for IOCB_NOWAIT.
>
> Of course, if you want to avoid both new reads to be submitted _and_
> avoid waiting for existing pending reads, you should just set both
> flags, and you get the semantics you want. So for your case, this may
> not make any difference.

Indeed, in the gfs2 case, waiting for existing pending reads should be
fine. I'll send an update after some testing.

Thanks,
Andreas

Re: [RFC 0/4] Fix gfs2 readahead deadlocks

2020-07-02 Thread Andreas Gruenbacher

On Thu, Jul 2, 2020 at 8:11 PM Linus Torvalds
 wrote:
> On Thu, Jul 2, 2020 at 9:51 AM Andreas Gruenbacher  
> wrote:
> >
> > Of this patch queue, either only the first patch or all four patches can
> > be applied to fix gfs2's current issues in 5.8.  Please let me know what
> > you think.
>
> I think the IOCB_NOIO flag looks fine (apart from the nit I pointed
> out), and we could do that.

Ok, that's a step forward.

> However, is the "revert and reinstate" looks odd. Is the reinstate so
> different from the original that it makes sense to do that way?
>
> Or was it done that way only to give the choice of just doing the revert?
>
> Because if so, I think I'd rather just see a "fix" rather than
> "revert+reinstate".

I only did the "revert and reinstate" so that the revert alone will
give us a working gfs2 in 5.8. If there's agreement to add the
IOCB_NOIO flag, then we can just fix gfs2 (basically
https://lore.kernel.org/linux-fsdevel/20200619093916.1081129-3-agrue...@redhat.com/
with IOCB_CACHED renamed to IOCB_NOIO).

Thanks,
Andreas

[RFC 4/4] gfs2: Reinstate readahead conversion

2020-07-02 Thread Andreas Gruenbacher

Now that the locking order in gfs2 is fixed, switch back to using the
->readahead address space operation.  With that, mpage_readpages is
unused and can be removed.

Signed-off-by: Andreas Gruenbacher 
---
 fs/gfs2/aops.c| 19 +--
 fs/mpage.c| 75 ---
 include/linux/mpage.h |  2 --
 3 files changed, 10 insertions(+), 86 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 28f097636e78..68cd700a2719 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -541,7 +541,7 @@ int gfs2_internal_read(struct gfs2_inode *ip, char *buf, 
loff_t *pos,
 }
 
 /**
- * gfs2_readpages - Read a bunch of pages at once
+ * gfs2_readahead - Read a bunch of pages at once
  * @file: The file to read from
  * @mapping: Address space info
  * @pages: List of pages to read
@@ -554,16 +554,17 @@ int gfs2_internal_read(struct gfs2_inode *ip, char *buf, 
loff_t *pos,
  *obviously not something we'd want to do on too regular a basis.
  *Any I/O we ignore at this time will be done via readpage later.
  * 2. We don't handle stuffed files here we let readpage do the honours.
- * 3. mpage_readpages() does most of the heavy lifting in the common case.
+ * 3. mpage_readahead() does most of the heavy lifting in the common case.
  * 4. gfs2_block_map() is relied upon to set BH_Boundary in the right places.
  */
 
-static int gfs2_readpages(struct file *file, struct address_space *mapping,
- struct list_head *pages, unsigned nr_pages)
+static void gfs2_readahead(struct readahead_control *rac)
 {
-   if (!gfs2_is_stuffed(GFS2_I(mapping->host)))
-   return mpage_readpages(mapping, pages, nr_pages, 
gfs2_block_map);
-   return 0;
+   struct inode *inode = rac->mapping->host;
+   struct gfs2_inode *ip = GFS2_I(inode);
+
+   if (!gfs2_is_stuffed(ip))
+   mpage_readahead(rac, gfs2_block_map);
 }
 
 /**
@@ -782,7 +783,7 @@ static const struct address_space_operations gfs2_aops = {
.writepage = gfs2_writepage,
.writepages = gfs2_writepages,
.readpage = gfs2_readpage,
-   .readpages = gfs2_readpages,
+   .readahead = gfs2_readahead,
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
.releasepage = gfs2_releasepage,
@@ -796,7 +797,7 @@ static const struct address_space_operations 
gfs2_jdata_aops = {
.writepage = gfs2_jdata_writepage,
.writepages = gfs2_jdata_writepages,
.readpage = gfs2_readpage,
-   .readpages = gfs2_readpages,
+   .readahead = gfs2_readahead,
.set_page_dirty = jdata_set_page_dirty,
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
diff --git a/fs/mpage.c b/fs/mpage.c
index 5243a065a062..830e6cc2a9e7 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -396,81 +396,6 @@ void mpage_readahead(struct readahead_control *rac, 
get_block_t get_block)
 }
 EXPORT_SYMBOL(mpage_readahead);
 
-/**
- * mpage_readpages - populate an address space with some pages & start reads 
against them
- * @mapping: the address_space
- * @pages: The address of a list_head which contains the target pages.  These
- *   pages have their ->index populated and are otherwise uninitialised.
- *   The page at @pages->prev has the lowest file offset, and reads should be
- *   issued in @pages->prev to @pages->next order.
- * @nr_pages: The number of pages at *@pages
- * @get_block: The filesystem's block mapper function.
- *
- * This function walks the pages and the blocks within each page, building and
- * emitting large BIOs.
- *
- * If anything unusual happens, such as:
- *
- * - encountering a page which has buffers
- * - encountering a page which has a non-hole after a hole
- * - encountering a page with non-contiguous blocks
- *
- * then this code just gives up and calls the buffer_head-based read function.
- * It does handle a page which has holes at the end - that is a common case:
- * the end-of-file on blocksize < PAGE_SIZE setups.
- *
- * BH_Boundary explanation:
- *
- * There is a problem.  The mpage read code assembles several pages, gets all
- * their disk mappings, and then submits them all.  That's fine, but obtaining
- * the disk mappings may require I/O.  Reads of indirect blocks, for example.
- *
- * So an mpage read of the first 16 blocks of an ext2 file will cause I/O to be
- * submitted in the following order:
- *
- * 12 0 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16
- *
- * because the indirect block has to be read to get the mappings of blocks
- * 13,14,15,16.  Obviously, this impacts performance.
- *
- * So what we do it to allow the filesystem's get_block() function to set
- * BH_Boundary when it maps block 11.  BH_Boundary says: mapping of the block
- * after this one will require I/O against a block which is probably close to
- * this one.  So you should push what I/O you have currently accumulated.
- *
- * This all causes the disk requests to

[RFC 1/4] gfs2: Revert readahead conversion

2020-07-02 Thread Andreas Gruenbacher

Commit d4388340ae0b ("fs: convert mpage_readpages to mpage_readahead")
converted gfs2 and other filesystems from the ->readpages to the
->readahead address space operation.  Other than ->readpages,
->readahead is passed the pages to readahead locked.  Due to problems in
the current page locking strategy, this is now causing deadlocks in
gfs2.

Fix this by reinstating mpage_readpages from before commit d4388340ae0b
and by converting gfs2 back to ->readpages.

Signed-off-by: Andreas Gruenbacher 
---
 fs/gfs2/aops.c| 23 -
 fs/mpage.c| 75 +++
 include/linux/mpage.h |  2 ++
 3 files changed, 92 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 72c9560f4467..786c1ce8f030 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -577,7 +577,7 @@ int gfs2_internal_read(struct gfs2_inode *ip, char *buf, 
loff_t *pos,
 }
 
 /**
- * gfs2_readahead - Read a bunch of pages at once
+ * gfs2_readpages - Read a bunch of pages at once
  * @file: The file to read from
  * @mapping: Address space info
  * @pages: List of pages to read
@@ -590,24 +590,31 @@ int gfs2_internal_read(struct gfs2_inode *ip, char *buf, 
loff_t *pos,
  *obviously not something we'd want to do on too regular a basis.
  *Any I/O we ignore at this time will be done via readpage later.
  * 2. We don't handle stuffed files here we let readpage do the honours.
- * 3. mpage_readahead() does most of the heavy lifting in the common case.
+ * 3. mpage_readpages() does most of the heavy lifting in the common case.
  * 4. gfs2_block_map() is relied upon to set BH_Boundary in the right places.
  */
 
-static void gfs2_readahead(struct readahead_control *rac)
+static int gfs2_readpages(struct file *file, struct address_space *mapping,
+ struct list_head *pages, unsigned nr_pages)
 {
-   struct inode *inode = rac->mapping->host;
+   struct inode *inode = mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
+   struct gfs2_sbd *sdp = GFS2_SB(inode);
struct gfs2_holder gh;
+   int ret;
 
gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
-   if (gfs2_glock_nq())
+   ret = gfs2_glock_nq();
+   if (unlikely(ret))
goto out_uninit;
if (!gfs2_is_stuffed(ip))
-   mpage_readahead(rac, gfs2_block_map);
+   ret = mpage_readpages(mapping, pages, nr_pages, gfs2_block_map);
gfs2_glock_dq();
 out_uninit:
gfs2_holder_uninit();
+   if (unlikely(gfs2_withdrawn(sdp)))
+   ret = -EIO;
+   return ret;
 }
 
 /**
@@ -826,7 +833,7 @@ static const struct address_space_operations gfs2_aops = {
.writepage = gfs2_writepage,
.writepages = gfs2_writepages,
.readpage = gfs2_readpage,
-   .readahead = gfs2_readahead,
+   .readpages = gfs2_readpages,
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
.releasepage = gfs2_releasepage,
@@ -840,7 +847,7 @@ static const struct address_space_operations 
gfs2_jdata_aops = {
.writepage = gfs2_jdata_writepage,
.writepages = gfs2_jdata_writepages,
.readpage = gfs2_readpage,
-   .readahead = gfs2_readahead,
+   .readpages = gfs2_readpages,
.set_page_dirty = jdata_set_page_dirty,
.bmap = gfs2_bmap,
.invalidatepage = gfs2_invalidatepage,
diff --git a/fs/mpage.c b/fs/mpage.c
index 830e6cc2a9e7..5243a065a062 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -396,6 +396,81 @@ void mpage_readahead(struct readahead_control *rac, 
get_block_t get_block)
 }
 EXPORT_SYMBOL(mpage_readahead);
 
+/**
+ * mpage_readpages - populate an address space with some pages & start reads 
against them
+ * @mapping: the address_space
+ * @pages: The address of a list_head which contains the target pages.  These
+ *   pages have their ->index populated and are otherwise uninitialised.
+ *   The page at @pages->prev has the lowest file offset, and reads should be
+ *   issued in @pages->prev to @pages->next order.
+ * @nr_pages: The number of pages at *@pages
+ * @get_block: The filesystem's block mapper function.
+ *
+ * This function walks the pages and the blocks within each page, building and
+ * emitting large BIOs.
+ *
+ * If anything unusual happens, such as:
+ *
+ * - encountering a page which has buffers
+ * - encountering a page which has a non-hole after a hole
+ * - encountering a page with non-contiguous blocks
+ *
+ * then this code just gives up and calls the buffer_head-based read function.
+ * It does handle a page which has holes at the end - that is a common case:
+ * the end-of-file on blocksize < PAGE_SIZE setups.
+ *
+ * BH_Boundary explanation:
+ *
+ * There is a problem.  The mpage read code assembles several pages, gets all
+ * their disk mappings, and then submits them all.  That's fine, but obtaining
+ * the disk mappings

[RFC 0/4] Fix gfs2 readahead deadlocks

2020-07-02 Thread Andreas Gruenbacher

Hi all,

commit d4388340ae0b ("fs: convert mpage_readpages to mpage_readahead")
converted gfs2 and other filesystems from the ->readpages to the
->readahead address space operation.  Due to gfs2 doing its locking in
the ->readpage and ->readahead address space operations rather than at a
higher level, this is leading to deadlocks.  Switching to a trylock
operation in ->readahead improves things but doesn't eliminate all
deadlocks; the only reasonable fix seems to be to lift gfs2's locking to
the ->fault vm operation and ->read_iter file operation.

However, gfs2 includes an optimization that allows reads to be served
out of the page cache without any filesystem locking.  This optimization
is important in concurrent read scenarios.  The best way we could find
to preserve this optimization is by introducing a new IOCB_NOIO flag for
generic_file_read_iter.

Introducing this new flag may be too big a change for 5.8.  So this
patch queue takes a different approach:  it first walks back gfs2's
conversion to ->readahead.  Then it introduces IOCB_NOIO, fixes the
locking, and re-applies the readahead conversion.

Of this patch queue, either only the first patch or all four patches can
be applied to fix gfs2's current issues in 5.8.  Please let me know what
you think.

Thanks,
Andreas

Andreas Gruenbacher (4):
  gfs2: Revert readahead conversion
  fs: Add IOCB_NOIO flag for generic_file_read_iter
  gfs2: Rework read and page fault locking
  gfs2: Reinstate readahead conversion

 fs/gfs2/aops.c | 45 +
 fs/gfs2/file.c | 55 --
 include/linux/fs.h |  1 +
 mm/filemap.c   | 16 --
 4 files changed, 69 insertions(+), 48 deletions(-)

-- 
2.26.2

[RFC 3/4] gfs2: Rework read and page fault locking

2020-07-02 Thread Andreas Gruenbacher

So far, gfs2 has taken the filesystem locks inside the ->readpage and
->readahead address space operations.  These operations are too
low-level, causing lock ordering problems and workarounds.  To get rid
of those, move the locking to the ->read_iter file and ->fault vm
operations.

The cache consistency model of filesystems like gfs2 is such that if
data is found in the page cache, the data is up to date and can be used
without taking any filesystem locks.  If a page is not cached,
filesystem locks must be taken before populating the page cache.

To avoid taking the inode glock when the data is already cached, the
->read_iter file operation first tries to read the data with the
IOCB_NOIO flag set.  If that fails, the inode glock is taken and the
operation is repeated with the IOCB_NOIO flag cleared.

Signed-off-by: Andreas Gruenbacher 
---
 fs/gfs2/aops.c | 59 --
 fs/gfs2/file.c | 55 --
 2 files changed, 57 insertions(+), 57 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 786c1ce8f030..28f097636e78 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -468,21 +468,10 @@ static int stuffed_readpage(struct gfs2_inode *ip, struct 
page *page)
 }
 
 
-/**
- * __gfs2_readpage - readpage
- * @file: The file to read a page for
- * @page: The page to read
- *
- * This is the core of gfs2's readpage. It's used by the internal file
- * reading code as in that case we already hold the glock. Also it's
- * called by gfs2_readpage() once the required lock has been granted.
- */
-
 static int __gfs2_readpage(void *file, struct page *page)
 {
struct gfs2_inode *ip = GFS2_I(page->mapping->host);
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
-
int error;
 
if (i_blocksize(page->mapping->host) == PAGE_SIZE &&
@@ -505,36 +494,11 @@ static int __gfs2_readpage(void *file, struct page *page)
  * gfs2_readpage - read a page of a file
  * @file: The file to read
  * @page: The page of the file
- *
- * This deals with the locking required. We have to unlock and
- * relock the page in order to get the locking in the right
- * order.
  */
 
 static int gfs2_readpage(struct file *file, struct page *page)
 {
-   struct address_space *mapping = page->mapping;
-   struct gfs2_inode *ip = GFS2_I(mapping->host);
-   struct gfs2_holder gh;
-   int error;
-
-   unlock_page(page);
-   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
-   error = gfs2_glock_nq();
-   if (unlikely(error))
-   goto out;
-   error = AOP_TRUNCATED_PAGE;
-   lock_page(page);
-   if (page->mapping == mapping && !PageUptodate(page))
-   error = __gfs2_readpage(file, page);
-   else
-   unlock_page(page);
-   gfs2_glock_dq();
-out:
-   gfs2_holder_uninit();
-   if (error && error != AOP_TRUNCATED_PAGE)
-   lock_page(page);
-   return error;
+   return __gfs2_readpage(file, page);
 }
 
 /**
@@ -597,24 +561,9 @@ int gfs2_internal_read(struct gfs2_inode *ip, char *buf, 
loff_t *pos,
 static int gfs2_readpages(struct file *file, struct address_space *mapping,
  struct list_head *pages, unsigned nr_pages)
 {
-   struct inode *inode = mapping->host;
-   struct gfs2_inode *ip = GFS2_I(inode);
-   struct gfs2_sbd *sdp = GFS2_SB(inode);
-   struct gfs2_holder gh;
-   int ret;
-
-   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
-   ret = gfs2_glock_nq();
-   if (unlikely(ret))
-   goto out_uninit;
-   if (!gfs2_is_stuffed(ip))
-   ret = mpage_readpages(mapping, pages, nr_pages, gfs2_block_map);
-   gfs2_glock_dq();
-out_uninit:
-   gfs2_holder_uninit();
-   if (unlikely(gfs2_withdrawn(sdp)))
-   ret = -EIO;
-   return ret;
+   if (!gfs2_is_stuffed(GFS2_I(mapping->host)))
+   return mpage_readpages(mapping, pages, nr_pages, 
gfs2_block_map);
+   return 0;
 }
 
 /**
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index fe305e4bfd37..607bbf4dfadb 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -558,8 +558,29 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
return block_page_mkwrite_return(ret);
 }
 
+static vm_fault_t gfs2_fault(struct vm_fault *vmf)
+{
+   struct inode *inode = file_inode(vmf->vma->vm_file);
+   struct gfs2_inode *ip = GFS2_I(inode);
+   struct gfs2_holder gh;
+   vm_fault_t ret;
+   int err;
+
+   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
+   err = gfs2_glock_nq();
+   if (err) {
+   ret = block_page_mkwrite_return(err);
+   goto out_uninit;
+   }
+   ret = filemap_fault(vmf);
+   gfs2_glock_dq();
+out_uninit:
+   gfs2_holder_uninit();
+   return ret;
+}
+
 static const struct vm_operat

[RFC 2/4] fs: Add IOCB_NOIO flag for generic_file_read_iter

2020-07-02 Thread Andreas Gruenbacher

Add an IOCB_NOIO flag that indicates to generic_file_read_iter that it
shouldn't trigger any filesystem I/O for the actual request or for
readahead.  This allows to do tentative reads out of the page cache as
some filesystems allow, and to take the appropriate locks and retry the
reads only if the requested pages are not cached.

Signed-off-by: Andreas Gruenbacher 
---
 include/linux/fs.h |  1 +
 mm/filemap.c   | 16 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3f881a892ea7..1ab2ea19e883 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -315,6 +315,7 @@ enum rw_hint {
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
 #define IOCB_NOWAIT(1 << 7)
+#define IOCB_NOIO  (1 << 8)
 
 struct kiocb {
struct file *ki_filp;
diff --git a/mm/filemap.c b/mm/filemap.c
index f0ae9a6308cb..e8318f99f468 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2028,7 +2028,7 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
 
page = find_get_page(mapping, index);
if (!page) {
-   if (iocb->ki_flags & IOCB_NOWAIT)
+   if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO))
goto would_block;
page_cache_sync_readahead(mapping,
ra, filp,
@@ -2038,12 +2038,16 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
goto no_cached_page;
}
if (PageReadahead(page)) {
+   if (iocb->ki_flags & IOCB_NOIO) {
+   put_page(page);
+   goto out;
+   }
page_cache_async_readahead(mapping,
ra, filp, page,
index, last_index - index);
}
if (!PageUptodate(page)) {
-   if (iocb->ki_flags & IOCB_NOWAIT) {
+   if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_NOIO)) {
put_page(page);
goto would_block;
}
@@ -2249,6 +2253,14 @@ EXPORT_SYMBOL_GPL(generic_file_buffered_read);
  *
  * This is the "read_iter()" routine for all filesystems
  * that can use the page cache directly.
+ *
+ * The IOCB_NOWAIT flag in iocb->ki_flags indicates that -EAGAIN shall
+ * be returned when no data can be read without issuing I/O requests;
+ * this doesn't prevent readahead.  The IOCB_NOIO flag indicates that
+ * -EAGAIN shall be returned when no data can be read without issuing
+ * I/O requests, and 0 shall be returned when readhead would be
+ * triggered.
+ *
  * Return:
  * * number of bytes copied, even for partial reads
  * * negative error code if nothing was read
-- 
2.26.2

Re: [RFC] Bypass filesystems for reading cached pages

2020-07-02 Thread Andreas Gruenbacher

On Wed, Jun 24, 2020 at 2:35 PM Andreas Gruenbacher  wrote:
> On Mon, Jun 22, 2020 at 8:13 PM Matthew Wilcox  wrote:
> > On Mon, Jun 22, 2020 at 04:35:05PM +0200, Andreas Gruenbacher wrote:
> > > I'm fine with not moving that functionality into the VFS. The problem
> > > I have in gfs2 is that taking glocks is really expensive. Part of that
> > > overhead is accidental, but we definitely won't be able to fix it in
> > > the short term. So something like the IOCB_CACHED flag that prevents
> > > generic_file_read_iter from issuing readahead I/O would save the day
> > > for us. Does that idea stand a chance?
> >
> > For the short-term fix, is switching to a trylock in gfs2_readahead()
> > acceptable?
>
> Well, it's the only thing we can do for now, right?

It turns out that gfs2 can still deadlock with a trylock in
gfs2_readahead, just differently: in this instance, gfs2_glock_nq will
call inode_dio_wait. When there is pending direct I/O, we'll end up
waiting for iomap_dio_complete, which will call
invalidate_inode_pages2_range, which will try to lock the pages
already locked for gfs2_readahead.

This late in the 5.8 release cycle, I'd like to propose converting
gfs2 back to use mpage_readpages. This requires reinstating
mpage_readpages, but it's otherwise relatively trivial.
We can then introduce an IOCB_CACHED or equivalent flag, fix the
locking order in gfs2, convert gfs2 to mpage_readahead, and finally
remove mage_readpages in 5.9.

I'll post a patch queue that does this for comment.

Thanks,
Andreas

Re: [RFC] Bypass filesystems for reading cached pages

2020-06-24 Thread Andreas Gruenbacher

On Mon, Jun 22, 2020 at 8:13 PM Matthew Wilcox  wrote:
> On Mon, Jun 22, 2020 at 04:35:05PM +0200, Andreas Gruenbacher wrote:
> > I'm fine with not moving that functionality into the VFS. The problem
> > I have in gfs2 is that taking glocks is really expensive. Part of that
> > overhead is accidental, but we definitely won't be able to fix it in
> > the short term. So something like the IOCB_CACHED flag that prevents
> > generic_file_read_iter from issuing readahead I/O would save the day
> > for us. Does that idea stand a chance?
>
> For the short-term fix, is switching to a trylock in gfs2_readahead()
> acceptable?

Well, it's the only thing we can do for now, right?

> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index 72c9560f4467..6ccd478c81ff 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -600,7 +600,7 @@ static void gfs2_readahead(struct readahead_control *rac)
> struct gfs2_inode *ip = GFS2_I(inode);
> struct gfs2_holder gh;
>
> -   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
> +   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_TRY, );
> if (gfs2_glock_nq())
> goto out_uninit;
> if (!gfs2_is_stuffed(ip))

Thanks,
Andreas

Re: [RFC] Bypass filesystems for reading cached pages

2020-06-23 Thread Andreas Gruenbacher

On Tue, Jun 23, 2020 at 2:52 AM Dave Chinner  wrote:
> On Mon, Jun 22, 2020 at 04:35:05PM +0200, Andreas Gruenbacher wrote:
> > On Mon, Jun 22, 2020 at 2:32 AM Dave Chinner  wrote:
> > > On Fri, Jun 19, 2020 at 08:50:36AM -0700, Matthew Wilcox wrote:
> > > >
> > > > This patch lifts the IOCB_CACHED idea expressed by Andreas to the VFS.
> > > > The advantage of this patch is that we can avoid taking any filesystem
> > > > lock, as long as the pages being accessed are in the cache (and we don't
> > > > need to readahead any pages into the cache).  We also avoid an indirect
> > > > function call in these cases.
> > >
> > > What does this micro-optimisation actually gain us except for more
> > > complexity in the IO path?
> > >
> > > i.e. if a filesystem lock has such massive overhead that it slows
> > > down the cached readahead path in production workloads, then that's
> > > something the filesystem needs to address, not unconditionally
> > > bypass the filesystem before the IO gets anywhere near it.
> >
> > I'm fine with not moving that functionality into the VFS. The problem
> > I have in gfs2 is that taking glocks is really expensive. Part of that
> > overhead is accidental, but we definitely won't be able to fix it in
> > the short term. So something like the IOCB_CACHED flag that prevents
> > generic_file_read_iter from issuing readahead I/O would save the day
> > for us. Does that idea stand a chance?
>
> I have no problem with a "NOREADAHEAD" flag being passed to
> generic_file_read_iter(). It's not a "already cached" flag though,
> it's a "don't start any IO" directive, just like the NOWAIT flag is
> a "don't block on locks or IO in progress" directive and not an
> "already cached" flag. Readahead is something we should be doing,
> unless a filesystem has a very good reason not to, such as the gfs2
> locking case here...

The requests coming in can have the IOCB_NOWAIT flag set or cleared.
The idea was to have an additional flag that implies IOCB_NOWAIT so
that you can do:

iocb->ki_flags |= IOCB_NOIO;
generic_file_read_iter()
if ("failed because of IOCB_NOIO") {
if ("failed because of IOCB_NOWAIT")
return -EAGAIN;
iocb->ki_flags &= ~IOCB_NOIO;
"locking"
 generic_file_read_iter()
"unlocking"
}

without having to save iocb->ki_flags. The alternative would be:

int flags = iocb->ki_flags;
iocb->ki_flags |= IOCB_NOIO | IOCB_NOWAIT;
ret = generic_file_read_iter()
if ("failed because of IOCB_NOIO or IOCB_NOWAIT") {
if ("failed because of IOCB_NOWAIT" && (flags & IOCB_NOWAIT))
return -EAGAIN;
iocb->ki_flags &= ~IOCB_NOIO;
"locking"
 generic_file_read_iter()
"unlocking"
}

Andreas

Re: [RFC] Bypass filesystems for reading cached pages

2020-06-22 Thread Andreas Gruenbacher

On Mon, Jun 22, 2020 at 2:32 AM Dave Chinner  wrote:
> On Fri, Jun 19, 2020 at 08:50:36AM -0700, Matthew Wilcox wrote:
> >
> > This patch lifts the IOCB_CACHED idea expressed by Andreas to the VFS.
> > The advantage of this patch is that we can avoid taking any filesystem
> > lock, as long as the pages being accessed are in the cache (and we don't
> > need to readahead any pages into the cache).  We also avoid an indirect
> > function call in these cases.
>
> What does this micro-optimisation actually gain us except for more
> complexity in the IO path?
>
> i.e. if a filesystem lock has such massive overhead that it slows
> down the cached readahead path in production workloads, then that's
> something the filesystem needs to address, not unconditionally
> bypass the filesystem before the IO gets anywhere near it.

I'm fine with not moving that functionality into the VFS. The problem
I have in gfs2 is that taking glocks is really expensive. Part of that
overhead is accidental, but we definitely won't be able to fix it in
the short term. So something like the IOCB_CACHED flag that prevents
generic_file_read_iter from issuing readahead I/O would save the day
for us. Does that idea stand a chance?

We could theoretically also create a copy of
generic_file_buffered_read in gfs2 and strip out the I/O parts, but
that would be very messy.

Thanks,
Andreas

[RFC PATCH 0/2] gfs2 readahead regression in v5.8-rc1

2020-06-19 Thread Andreas Gruenbacher

Hello,

can the two patches in this set still be considered for v5.8?

Commit d4388340ae0b ("fs: convert mpage_readpages to mpage_readahead")
which converts gfs2 and other filesystems to use the new ->readahead
address space operation is leading to deadlocks between the inode glocks
and page locks: ->readahead is called with the pages to readahead
already locked.  When gfs2_readahead then tries to lock the associated
inode glock, another process already holding the inode glock may be
trying to lock the same pages.

We could work around this in gfs by using a LM_FLAG_TRY lock in
->readahead for now.  The real reason for this deadlock is that gfs2
shouldn't be taking the inode glock in ->readahead in the first place
though, so I'd prefer to fix this "properly" instead.  Unfortunately,
this depends on a new IOCB_CACHED flag for generic_file_read_iter.

A previous version was posted in November:

https://lore.kernel.org/linux-fsdevel/20191122235324.17245-1-agrue...@redhat.com/

Thanks,
Andreas

Andreas Gruenbacher (2):
  fs: Add IOCB_CACHED flag for generic_file_read_iter
  gfs2: Rework read and page fault locking

 fs/gfs2/aops.c | 27 ++--
 fs/gfs2/file.c | 61 --
 include/linux/fs.h |  1 +
 mm/filemap.c   | 16 ++--
 4 files changed, 76 insertions(+), 29 deletions(-)


base-commit: af42d3466bdc8f39806b26f593604fdc54140bcb
-- 
2.26.2

[PATCH 1/2] fs: Add IOCB_CACHED flag for generic_file_read_iter

2020-06-19 Thread Andreas Gruenbacher

Add an IOCB_CACHED flag which indicates to generic_file_read_iter that
it should only regard the page cache, without triggering any filesystem
I/O for the actual request or for readahead.  With this flag, -EAGAIN is
returned when regular I/O would be triggered similar to the IOCB_NOWAIT
flag, and -ECANCELED is returned when readahead would be triggered.

This allows the caller to perform a tentative read out of the page
cache, and to retry the read if the requested pages are not cached.

Please see the next commit for what this is used for.

Signed-off-by: Andreas Gruenbacher 
---
 include/linux/fs.h |  1 +
 mm/filemap.c   | 16 ++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6c4ab4dc1cd7..74eade571b1c 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -315,6 +315,7 @@ enum rw_hint {
 #define IOCB_SYNC  (1 << 5)
 #define IOCB_WRITE (1 << 6)
 #define IOCB_NOWAIT(1 << 7)
+#define IOCB_CACHED(1 << 8)
 
 struct kiocb {
struct file *ki_filp;
diff --git a/mm/filemap.c b/mm/filemap.c
index f0ae9a6308cb..bd11f27bf6ae 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2028,7 +2028,7 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
 
page = find_get_page(mapping, index);
if (!page) {
-   if (iocb->ki_flags & IOCB_NOWAIT)
+   if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_CACHED))
goto would_block;
page_cache_sync_readahead(mapping,
ra, filp,
@@ -2038,12 +2038,17 @@ ssize_t generic_file_buffered_read(struct kiocb *iocb,
goto no_cached_page;
}
if (PageReadahead(page)) {
+   if (iocb->ki_flags & IOCB_CACHED) {
+   put_page(page);
+   error = -ECANCELED;
+   goto out;
+   }
page_cache_async_readahead(mapping,
ra, filp, page,
index, last_index - index);
}
if (!PageUptodate(page)) {
-   if (iocb->ki_flags & IOCB_NOWAIT) {
+   if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_CACHED)) {
put_page(page);
goto would_block;
}
@@ -2249,6 +2254,13 @@ EXPORT_SYMBOL_GPL(generic_file_buffered_read);
  *
  * This is the "read_iter()" routine for all filesystems
  * that can use the page cache directly.
+ *
+ * In the IOCB_NOWAIT flag in iocb->ki_flags indicates that -EAGAIN should be
+ * returned if completing the request would require I/O; this does not prevent
+ * readahead.  The IOCB_CACHED flag indicates that -EAGAIN should be returned
+ * as under the IOCB_NOWAIT flag, and that -ECANCELED should be returned when
+ * readhead would be triggered.
+ *
  * Return:
  * * number of bytes copied, even for partial reads
  * * negative error code if nothing was read
-- 
2.26.2

[PATCH 2/2] gfs2: Rework read and page fault locking

2020-06-19 Thread Andreas Gruenbacher

The cache consistency model of filesystems like gfs2 is such that if
data is found in the page cache, the data is up to date and can be used
without taking any filesystem locks.  If a page is not cached,
filesystem locks must be taken before populating the page cache.

Thus far,  gfs2 has taken the filesystem locks inside the ->readpage and
->readpages address space operations.  This was already causing lock
ordering problems, but commit d4388340ae0b ("fs: convert mpage_readpages
to mpage_readahead") made things worse: the ->readahead operation is
called with the pages to readahead locked, so grabbing the inode's glock
can now deadlock with processes which are holding the inode glock while
trying to lock the same pages.

Fix this by taking the inode glock in the ->read_iter file and ->fault
vm operations.  To avoid taking the inode glock when the data is already
cached, the ->read_iter file operation first tries to read the data with
the IOCB_CACHED flag set.  If that fails, the inode glock is locked and
the operation is repeated without the IOCB_CACHED flag.

Signed-off-by: Andreas Gruenbacher 
---
 fs/gfs2/aops.c | 27 ++
 fs/gfs2/file.c | 61 --
 2 files changed, 61 insertions(+), 27 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 72c9560f4467..73c2fe768a3f 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -513,26 +513,10 @@ static int __gfs2_readpage(void *file, struct page *page)
 
 static int gfs2_readpage(struct file *file, struct page *page)
 {
-   struct address_space *mapping = page->mapping;
-   struct gfs2_inode *ip = GFS2_I(mapping->host);
-   struct gfs2_holder gh;
int error;
 
-   unlock_page(page);
-   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
-   error = gfs2_glock_nq();
-   if (unlikely(error))
-   goto out;
-   error = AOP_TRUNCATED_PAGE;
-   lock_page(page);
-   if (page->mapping == mapping && !PageUptodate(page))
-   error = __gfs2_readpage(file, page);
-   else
-   unlock_page(page);
-   gfs2_glock_dq();
-out:
-   gfs2_holder_uninit();
-   if (error && error != AOP_TRUNCATED_PAGE)
+   error = __gfs2_readpage(file, page);
+   if (error)
lock_page(page);
return error;
 }
@@ -598,16 +582,9 @@ static void gfs2_readahead(struct readahead_control *rac)
 {
struct inode *inode = rac->mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
-   struct gfs2_holder gh;
 
-   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
-   if (gfs2_glock_nq())
-   goto out_uninit;
if (!gfs2_is_stuffed(ip))
mpage_readahead(rac, gfs2_block_map);
-   gfs2_glock_dq();
-out_uninit:
-   gfs2_holder_uninit();
 }
 
 /**
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index fe305e4bfd37..f729b0ff2a3c 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -558,8 +558,29 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
return block_page_mkwrite_return(ret);
 }
 
+static vm_fault_t gfs2_fault(struct vm_fault *vmf)
+{
+   struct inode *inode = file_inode(vmf->vma->vm_file);
+   struct gfs2_inode *ip = GFS2_I(inode);
+   struct gfs2_holder gh;
+   vm_fault_t ret;
+   int err;
+
+   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
+   err = gfs2_glock_nq();
+   if (err) {
+   ret = block_page_mkwrite_return(err);
+   goto out_uninit;
+   }
+   ret = filemap_fault(vmf);
+   gfs2_glock_dq();
+out_uninit:
+   gfs2_holder_uninit();
+   return ret;
+}
+
 static const struct vm_operations_struct gfs2_vm_ops = {
-   .fault = filemap_fault,
+   .fault = gfs2_fault,
.map_pages = filemap_map_pages,
.page_mkwrite = gfs2_page_mkwrite,
 };
@@ -824,15 +845,51 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, 
struct iov_iter *from)
 
 static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
+   struct gfs2_inode *ip;
+   struct gfs2_holder gh;
+   size_t written = 0;
ssize_t ret;
 
+   gfs2_holder_mark_uninitialized();
if (iocb->ki_flags & IOCB_DIRECT) {
ret = gfs2_file_direct_read(iocb, to);
if (likely(ret != -ENOTBLK))
return ret;
iocb->ki_flags &= ~IOCB_DIRECT;
}
-   return generic_file_read_iter(iocb, to);
+   iocb->ki_flags |= IOCB_CACHED;
+   ret = generic_file_read_iter(iocb, to);
+   iocb->ki_flags &= ~IOCB_CACHED;
+   if (ret >= 0) {
+   if (!iov_iter_count(to))
+   return ret;
+   written = ret;
+   } else {
+   switch(ret) {
+   case -EAGAIN:
+   if (iocb->ki_flags &

Re: [Cluster-devel] [PATCH v11 16/25] fs: Convert mpage_readpages to mpage_readahead

2020-06-18 Thread Andreas Gruenbacher

On Thu, Jun 18, 2020 at 5:03 PM Matthew Wilcox  wrote:
>
> On Thu, Jun 18, 2020 at 02:46:03PM +0200, Andreas Gruenbacher wrote:
> > On Wed, Jun 17, 2020 at 4:22 AM Matthew Wilcox  wrote:
> > > On Wed, Jun 17, 2020 at 02:57:14AM +0200, Andreas GrÃ¼nbacher wrote:
> > > > Right, the approach from the following thread might fix this:
> > > >
> > > > https://lore.kernel.org/linux-fsdevel/20191122235324.17245-1-agrue...@redhat.com/T/#t
> > >
> > > In general, I think this is a sound approach.
> > >
> > > Specifically, I think FAULT_FLAG_CACHED can go away.  map_pages()
> > > will bring in the pages which are in the page cache, so when we get to
> > > gfs2_fault(), we know there's a reason to acquire the glock.
> >
> > We'd still be grabbing a glock while holding a dependent page lock.
> > Another process could be holding the glock and could try to grab the
> > same page lock (i.e., a concurrent writer), leading to the same kind
> > of deadlock.
>
> What I'm saying is that gfs2_fault should just be:
>
> +static vm_fault_t gfs2_fault(struct vm_fault *vmf)
> +{
> +   struct inode *inode = file_inode(vmf->vma->vm_file);
> +   struct gfs2_inode *ip = GFS2_I(inode);
> +   struct gfs2_holder gh;
> +   vm_fault_t ret;
> +   int err;
> +
> +   gfs2_holder_init(ip->i_gl, LM_ST_SHARED, 0, );
> +   err = gfs2_glock_nq();
> +   if (err) {
> +   ret = block_page_mkwrite_return(err);
> +   goto out_uninit;
> +   }
> +   ret = filemap_fault(vmf);
> +   gfs2_glock_dq();
> +out_uninit:
> +   gfs2_holder_uninit();
> +   return ret;
> +}
>
> because by the time gfs2_fault() is called, map_pages() has already been
> called and has failed to insert the necessary page, so we should just
> acquire the glock now instead of trying again to look for the page in
> the page cache.

Okay, that's great.

Thanks,
Andreas

Re: [Cluster-devel] [PATCH v11 16/25] fs: Convert mpage_readpages to mpage_readahead

2020-06-18 Thread Andreas Gruenbacher

On Wed, Jun 17, 2020 at 4:22 AM Matthew Wilcox  wrote:
> On Wed, Jun 17, 2020 at 02:57:14AM +0200, Andreas Grünbacher wrote:
> > Am Mi., 17. Juni 2020 um 02:33 Uhr schrieb Matthew Wilcox 
> > :
> > >
> > > On Wed, Jun 17, 2020 at 12:36:13AM +0200, Andreas Gruenbacher wrote:
> > > > Am Mi., 15. Apr. 2020 um 23:39 Uhr schrieb Matthew Wilcox 
> > > > :
> > > > > From: "Matthew Wilcox (Oracle)" 
> > > > >
> > > > > Implement the new readahead aop and convert all callers (block_dev,
> > > > > exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6,
> > > > > reiserfs & udf).  The callers are all trivial except for GFS2 & OCFS2.
> > > >
> > > > This patch leads to an ABBA deadlock in xfstest generic/095 on gfs2.
> > > >
> > > > Our lock hierarchy is such that the inode cluster lock ("inode glock")
> > > > for an inode needs to be taken before any page locks in that inode's
> > > > address space.
> > >
> > > How does that work for ...
> > >
> > > writepage:  yes, unlocks (see below)
> > > readpage:   yes, unlocks
> > > invalidatepage: yes
> > > releasepage:yes
> > > freepage:   yes
> > > isolate_page:   yes
> > > migratepage:yes (both)
> > > putback_page:   yes
> > > launder_page:   yes
> > > is_partially_uptodate:  yes
> > > error_remove_page:  yes
> > >
> > > Is there a reason that you don't take the glock in the higher level
> > > ops which are called before readhead gets called?  I'm looking at XFS,
> > > and it takes the xfs_ilock SHARED in xfs_file_buffered_aio_read()
> > > (called from xfs_file_read_iter).
> >
> > Right, the approach from the following thread might fix this:
> >
> > https://lore.kernel.org/linux-fsdevel/20191122235324.17245-1-agrue...@redhat.com/T/#t
>
> In general, I think this is a sound approach.
>
> Specifically, I think FAULT_FLAG_CACHED can go away.  map_pages()
> will bring in the pages which are in the page cache, so when we get to
> gfs2_fault(), we know there's a reason to acquire the glock.

We'd still be grabbing a glock while holding a dependent page lock.
Another process could be holding the glock and could try to grab the
same page lock (i.e., a concurrent writer), leading to the same kind
of deadlock.

Andreas

Re: [Cluster-devel] [PATCH v11 16/25] fs: Convert mpage_readpages to mpage_readahead

2020-06-16 Thread Andreas Gruenbacher

Am Mi., 15. Apr. 2020 um 23:39 Uhr schrieb Matthew Wilcox :
> From: "Matthew Wilcox (Oracle)" 
>
> Implement the new readahead aop and convert all callers (block_dev,
> exfat, ext2, fat, gfs2, hpfs, isofs, jfs, nilfs2, ocfs2, omfs, qnx6,
> reiserfs & udf).  The callers are all trivial except for GFS2 & OCFS2.

This patch leads to an ABBA deadlock in xfstest generic/095 on gfs2.

Our lock hierarchy is such that the inode cluster lock ("inode glock")
for an inode needs to be taken before any page locks in that inode's
address space. However, the readahead address space operation is
called with the pages already locked. When we try to grab the inode
glock inside gfs2_readahead, we'll deadlock with processes that are
holding that inode glock and trying to lock one of those same pages.

One possible solution is to use a trylock on the glock in
gfs2_readahead, and to give up the readahead in case of a locking
conflict. I have no idea how this is going to affect performance.

Any other ideas?

Thanks,
Andreas

[GIT PULL] GFS2 changes for 5.8

2020-06-08 Thread Andreas Gruenbacher

Hi Linus,

please consider pulling the following gfs2 changes for 5.8.

Thanks a lot,
Andreas

The following changes since commit 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162:

  Linux 5.7 (2020-05-31 16:49:15 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-for-5.8

for you to fetch changes up to 300e549b6e53025ea69550f009451f7a13bfc3eb:

  Merge branch 'gfs2-iopen' into for-next (2020-06-05 21:25:36 +0200)


Changes in gfs2:

- An iopen glock locking scheme rework that speeds up deletes of
  inodes accessed from multiple nodes.
- Various bug fixes and debugging improvements.
- Convert gfs2-glocks.txt to ReST.


Andreas Gruenbacher (10):
  gfs2: Keep track of deleted inode generations in LVBs
  gfs2: Turn gl_delete into a delayed work
  gfs2: Give up the iopen glock on contention
  gfs2: Try harder to delete inodes locally
  gfs2: Minor gfs2_lookup_by_inum cleanup
  gfs2: Move inode generation number check into gfs2_inode_lookup
  gfs2: Check inode generation number in delete_work_func
  gfs2: Wake up when setting GLF_DEMOTE
  gfs2: Smarter iopen glock waiting
  Merge branch 'gfs2-iopen' into for-next

Bob Peterson (10):
  gfs2: Don't ignore inode write errors during inode_go_sync
  gfs2: Allow lock_nolock mount to specify jid=X
  gfs2: Only do glock put in gfs2_create_inode for free inodes
  gfs2: print mapping->nrpages in glock dump for address space glocks
  gfs2: introduce new gfs2_glock_assert_withdraw
  gfs2: instrumentation wrt log_flush stuck
  gfs2: Allow ASPACE glocks to also have an lvb
  gfs2: initialize transaction tr_ailX_lists earlier
  gfs2: new slab for transactions
  gfs2: fix use-after-free on transaction ail lists

Mauro Carvalho Chehab (1):
  docs: filesystems: convert gfs2-glocks.txt to ReST

 .../{gfs2-glocks.txt => gfs2-glocks.rst}   | 149 ---
 Documentation/filesystems/index.rst|   1 +
 MAINTAINERS|   2 +-
 fs/gfs2/export.c   |   4 +-
 fs/gfs2/glock.c| 208 ++---
 fs/gfs2/glock.h|  16 ++
 fs/gfs2/glops.c|  21 ++-
 fs/gfs2/incore.h   |   9 +-
 fs/gfs2/inode.c|  47 +++--
 fs/gfs2/inode.h|   2 +-
 fs/gfs2/log.c  |  56 --
 fs/gfs2/main.c |   9 +
 fs/gfs2/ops_fstype.c   |   2 +-
 fs/gfs2/rgrp.c |   2 +-
 fs/gfs2/super.c|  72 ++-
 fs/gfs2/trans.c|  21 ++-
 fs/gfs2/trans.h|   1 +
 fs/gfs2/util.c |   1 +
 fs/gfs2/util.h |   1 +
 include/uapi/linux/gfs2_ondisk.h   |   6 +
 20 files changed, 489 insertions(+), 141 deletions(-)
 rename Documentation/filesystems/{gfs2-glocks.txt => gfs2-glocks.rst} (63%)

[GIT PULL] GFS2 fix for v5.7-rc7

2020-05-29 Thread Andreas Gruenbacher

Hi Linus,

I've screwed up commit aa83da7f47b ("gfs2: More gfs2_find_jhead
fixes") which went into v5.7-rc6. Could you please consider pulling
the following fix?

Thanks a lot,
Andreas

The following changes since commit 9cb1fd0efd195590b828b9b865421ad345a4a145:

  Linux 5.7-rc7 (2020-05-24 15:32:54 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git
tags/gfs2-v5.7-rc7.fixes

for you to fetch changes up to 20be493b787cd581c9fffad7fcd6bfbe6af1050c:

  gfs2: Even more gfs2_find_jhead fixes (2020-05-29 17:00:24 +0200)


Fix the previous, flawed gfs2_find_jhead commit

--------
Andreas Gruenbacher (1):
  gfs2: Even more gfs2_find_jhead fixes

 fs/gfs2/lops.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

[GIT PULL] GFS2 fixes for 5.7

2020-05-12 Thread Andreas Gruenbacher

Hi Linus,

could you please pull the following fixes for gfs2?

Thanks a lot,
Andreas

The following changes since commit 8f3d9f354286745c751374f5f1fcafee6b3f3136:

  Linux 5.7-rc1 (2020-04-12 12:35:55 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-v5.7-rc1.fixes

for you to fetch changes up to b14c94908b1b884276a6608dea3d0b1b510338b7:

  Revert "gfs2: Don't demote a glock until its revokes are written" (2020-05-08 
15:01:25 -0500)


Various gfs2 fixes

Fixes for bugs prior to v5.7-rc1:
- Fix random block reads when reading fragmented journals (v5.2).
- Fix a possible random memory access in gfs2_walk_metadata (v5.3).

Fixes for v5.7-rc1:
- Fix several overlooked gfs2_qa_get / gfs2_qa_put imbalances.
- Fix several bugs in the new filesystem withdraw logic.

--------
Andreas Gruenbacher (3):
  gfs2: Another gfs2_walk_metadata fix
  gfs2: More gfs2_find_jhead fixes
  gfs2: Grab glock reference sooner in gfs2_add_revoke

Bob Peterson (11):
  gfs2: fix withdraw sequence deadlock
  gfs2: Fix error exit in do_xmote
  gfs2: Fix BUG during unmount after file system withdraw
  gfs2: Fix use-after-free in gfs2_logd after withdraw
  gfs2: Fix problems regarding gfs2_qa_get and _put
  gfs2: Change BUG_ON to an assert_withdraw in gfs2_quota_change
  gfs2: remove check for quotas on in gfs2_quota_check
  gfs2: move privileged user check to gfs2_quota_lock_check
  gfs2: don't call quota_unhold if quotas are not locked
  gfs2: If go_sync returns error, withdraw but skip invalidate
  Revert "gfs2: Don't demote a glock until its revokes are written"

 fs/gfs2/bmap.c| 16 +---
 fs/gfs2/glock.c   |  6 ++
 fs/gfs2/inode.c   |  7 ---
 fs/gfs2/log.c | 11 ---
 fs/gfs2/lops.c| 19 ---
 fs/gfs2/meta_io.c |  2 +-
 fs/gfs2/quota.c   | 13 +
 fs/gfs2/quota.h   |  3 ++-
 fs/gfs2/super.c   |  1 -
 fs/gfs2/util.c| 10 ++
 10 files changed, 49 insertions(+), 39 deletions(-)

Re: [PATCH] fs/gfs2:lock a spinlock always before returning from do_xmote()

2020-04-28 Thread Andreas Gruenbacher

Hi,

On Tue, Apr 28, 2020 at 5:30 AM Wu Bo  wrote:
> The call stack is as follows:
> finish_xmote()
> ...
> spin_lock(>gl_lockref.lock);
> ...
> --> do_xmote()
> spin_unlock(>gl_lockref.lock);
> ...
> return;
> ...
> spin_unlock(>gl_lockref.lock);
>
> do_xmote function needs to be locked before returning,
> Otherwise, there will be a double release lock in finish_xmote() function.
>
> Signed-off-by: Wu Bo 
> ---
>  fs/gfs2/glock.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 29f9b66..7129d10 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -613,6 +613,7 @@ static void do_xmote(struct gfs2_glock *gl, struct 
> gfs2_holder *gh, unsigned int
> fs_err(sdp, "Error %d syncing glock \n", ret);
> gfs2_dump_glock(NULL, gl, true);
> }
> +   spin_lock(>gl_lockref.lock);
> return;
> }
> }
> --
> 1.8.3.1

this patch looks correct. We've independently discovered this bug as
well in the meantime, and we'll send the fix upstream.

Thanks,
Andreas

Re: INFO: task hung in pipe_write (2)

2019-10-14 Thread Andreas Gruenbacher

Hi Darrick,

On Thu, Sep 19, 2019 at 11:10 PM Darrick J. Wong
 wrote:
> On Thu, Sep 19, 2019 at 10:55:44PM +0200, Rasmus Villemoes wrote:
> > On 19/09/2019 19.19, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:288b9117 Add linux-next specific files for 20190918
> > > git tree:   linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=17e8664560
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=f6126e51304ef1c3
> > > dashboard link:
> > > https://syzkaller.appspot.com/bug?extid=3c01db6025f26530cf8d
> > > compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
> > > syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1185576960
> > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=143580a160
> > >
> > > The bug was bisected to:
> > >
> > > commit cfb864757d8690631aadf1c4b80022c18ae865b3
> > > Author: Darrick J. Wong 
> > > Date:   Tue Sep 17 16:05:22 2019 +
> > >
> > > splice: only read in as much information as there is pipe buffer space
> >
> > The middle hunk (the one before splice_pipe_to_pipe()) accesses
> > opipe->{buffers, nrbufs}, but opipe is not locked at that point. So
> > maybe we end up passing len==0, which seems (once there's room in opipe)
> > it would put a zero-length pipe_buffer in opipe - and that probably
> > violates an invariant somewhere.
> >
> > But does the splice_pipe_to_pipe() case even need that extra logic?
> > Doesn't it handle short writes correctly already?
>
> Yep.  I missed the part where splice_pipe_to_pipe is already perfectly
> capable of detecting insufficient space in opipe and kicking opipe's
> readers to clear out the buffer.  So that hunk isn't needed, and now I'm
> wondering how in the other clause we return 0 from wait_for_space yet
> still don't have buffer space...
>
> Oh well, back to the drawing board.  Good catch, though now it's become
> painfully clear that xfstests lacks rigorous testing of splice()...

have you had any luck figuring out how to fix this? We're still
suffering from the regression I've reported a while ago (*).

If not, I wonder if reverting commit 8f67b5adc030 would make sense for now.

* 
https://lore.kernel.org/linux-fsdevel/CAHpGcM+WQYFHOOC8SzKq+=duhvz4fw4rhltmudn-o6gx3yt...@mail.gmail.com/T/#u

Thanks,
Andreas

Re: [PATCH v5 03/18] gfs2: add compat_ioctl support

2019-08-16 Thread Andreas Gruenbacher

Arnd,

On Wed, Aug 14, 2019 at 10:45 PM Arnd Bergmann  wrote:
>
> Out of the four ioctl commands supported on gfs2, only FITRIM
> works in compat mode.
>
> Add a proper handler based on the ext4 implementation.
>
> Fixes: 6ddc5c3ddf25 ("gfs2: getlabel support")
> Signed-off-by: Arnd Bergmann 
> ---
>  fs/gfs2/file.c | 24 
>  1 file changed, 24 insertions(+)
>
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index 52fa1ef8400b..49287f0b96d0 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -6,6 +6,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -354,6 +355,25 @@ static long gfs2_ioctl(struct file *filp, unsigned int 
> cmd, unsigned long arg)
> return -ENOTTY;
>  }
>
> +#ifdef CONFIG_COMPAT
> +static long gfs2_compat_ioctl(struct file *filp, unsigned int cmd, unsigned 
> long arg)
> +{
> +   /* These are just misnamed, they actually get/put from/to user an int 
> */
> +   switch(cmd) {
> +   case FS_IOC32_GETFLAGS:
> +   cmd = FS_IOC_GETFLAGS;
> +   break;
> +   case FS_IOC32_SETFLAGS:
> +   cmd = FS_IOC_SETFLAGS;
> +   break;

I'd like the code to be more explicit here:

case FITRIM:
case FS_IOC_GETFSLABEL:
  break;
default:
  return -ENOIOCTLCMD;

> +   }
> +
> +   return gfs2_ioctl(filp, cmd, (unsigned long)compat_ptr(arg));
> +}
> +#else
> +#define gfs2_compat_ioctl NULL
> +#endif
> +
>  /**
>   * gfs2_size_hint - Give a hint to the size of a write request
>   * @filep: The struct file
> @@ -1294,6 +1314,7 @@ const struct file_operations gfs2_file_fops = {
> .write_iter = gfs2_file_write_iter,
> .iopoll = iomap_dio_iopoll,
> .unlocked_ioctl = gfs2_ioctl,
> +   .compat_ioctl   = gfs2_compat_ioctl,
> .mmap   = gfs2_mmap,
> .open   = gfs2_open,
> .release= gfs2_release,
> @@ -1309,6 +1330,7 @@ const struct file_operations gfs2_file_fops = {
>  const struct file_operations gfs2_dir_fops = {
> .iterate_shared = gfs2_readdir,
> .unlocked_ioctl = gfs2_ioctl,
> +   .compat_ioctl   = gfs2_compat_ioctl,
> .open   = gfs2_open,
> .release= gfs2_release,
> .fsync  = gfs2_fsync,
> @@ -1325,6 +1347,7 @@ const struct file_operations gfs2_file_fops_nolock = {
> .write_iter = gfs2_file_write_iter,
> .iopoll = iomap_dio_iopoll,
> .unlocked_ioctl = gfs2_ioctl,
> +   .compat_ioctl   = gfs2_compat_ioctl,
> .mmap   = gfs2_mmap,
> .open   = gfs2_open,
> .release= gfs2_release,
> @@ -1338,6 +1361,7 @@ const struct file_operations gfs2_file_fops_nolock = {
>  const struct file_operations gfs2_dir_fops_nolock = {
> .iterate_shared = gfs2_readdir,
> .unlocked_ioctl = gfs2_ioctl,
> +   .compat_ioctl   = gfs2_compat_ioctl,
> .open   = gfs2_open,
> .release= gfs2_release,
> .fsync  = gfs2_fsync,
> --
> 2.20.0
>

Should we feed this through the gfs2 tree?

Thanks,
Andreas

[GIT PULL] Fix incorrect lseek / fiemap results

2019-08-10 Thread Andreas Gruenbacher

Hi Linus,

could you please pull the following gfs2 fix?

Thanks,
Andreas

The following changes since commit e21a712a9685488f5ce80495b37b9fdbe96c230d:

  Linux 5.3-rc3 (2019-08-04 18:40:12 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git 
tags/gfs2-v5.3-rc3.fixes

for you to fetch changes up to a27a0c9b6a208722016c8ec5ad31ec96082b91ec:

  gfs2: gfs2_walk_metadata fix (2019-08-09 16:56:12 +0100)


Fix incorrect lseek / fiemap results


Andreas Gruenbacher (1):
  gfs2: gfs2_walk_metadata fix

 fs/gfs2/bmap.c | 164 +++--
 1 file changed, 101 insertions(+), 63 deletions(-)

Re: lift the xfs writepage code into iomap v3

2019-07-27 Thread Andreas Gruenbacher

On Sat, 27 Jul 2019 at 01:38, Darrick J. Wong  wrote:
> On Mon, Jul 23, 2019 at 11:50:12AM +0200, Christoph Hellwig wrote:
> > Hi all,
> >
> > this series cleans up the xfs writepage code and then lifts it to
> > fs/iomap.c so that it could be use by other file system.  I've been
> > wanting to this for a while so that I could eventually convert gfs2
> > over to it, but I never got to it.  Now Damien has a new zonefs
> > file system for semi-raw access to zoned block devices that would
> > like to use the iomap code instead of reinventing it, so I finally
> > had to do the work.
>
> Hmm... I don't like how there are xfs changes mixed in with the iomap
> changes, because were I to take this series as-is then I'd have to
> commit both to adding iomap writeback code /and/ converting xfs at the
> same time.
>
> I think I'd be more comfortable creating a branch to merge the changes
> to list.h and fs/iomap/, and then gfs2/zonefs/xfs can sprout their own
> branches from there to do whatever conversions are necessary.
>
> To me what that means is splitting patch 7 into 7a which does the iomap
> changes and 7b which does the xfs changes.  To get there, I'd create a
> iomap-writeback branch containing:
>
> 1 7a 8 9 10 11 12
>
> and then a new xfs-iomap-writeback branch containing:
>
> 2 4 7b
>
> This eliminates the need for patches 3, 5, and 6, though the cost is
> that it's less clear from the history that we did some reorganizing of
> the xfs writeback code and then moved it over to iomap.  OTOH, I also
> see this as a way to lower risk because if some patch in the
> xfs-iomap-writeback branch shakes loose a bug that doesn't affect gfs2
> or zonedfs we don't have to hold them up.
>
> I'll try to restructure this series along those lines and report back
> how it went.

Keeping the infrastructure changes in separate commits would certainly
make the patches easier to work with for me. Keeping the commits
interleaved should be fine though: patch "iomap: zero newly allocated
mapped blocks" depends on "xfs: set IOMAP_F_NEW more carefully", so a
pure infrastructure branch without "xfs: set IOMAP_F_NEW more
carefully" probably wouldn't be correct.

Thanks,
Andreas

Re: lift the xfs writepage code into iomap v3

2019-07-22 Thread Andreas Gruenbacher

Hi Christoph,

On Mon, 22 Jul 2019 at 11:50, Christoph Hellwig  wrote:
> this series cleans up the xfs writepage code and then lifts it to
> fs/iomap.c so that it could be use by other file system.  I've been
> wanting to this for a while so that I could eventually convert gfs2
> over to it, but I never got to it.  Now Damien has a new zonefs
> file system for semi-raw access to zoned block devices that would
> like to use the iomap code instead of reinventing it, so I finally
> had to do the work.
>
>
> Changes since v2:
>  - rebased to v5.3-rc1
>  - folded in a few changes from the gfs2 enablement series

thanks for the much appreciated update.

I've added that and the remaining gfs2 iomap patches to this branch:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git iomap

Your final patch on this branch, "gfs2: use iomap for buffered I/O in
ordered and writeback mode", still causes the following four xfstest
failures:

  generic/311 generic/322 generic/393 generic/418

The bug in generic/393 is that in the nobh case, when unstuffing an
inode, we fail to dirty the unstuffed page and so the page is never
written. This should be easy to fix. The other failures have other
causes (still investigating).

> Changes since v1:
>  - rebased to the latest xfs for-next tree
>  - keep the preallocated transactions for size updates
>  - rename list_pop to list_pop_entry and related cleanups
>  - better document the nofs context handling
>  - document that the iomap tracepoints are not a stable API

Thanks,
Andreas

Re: [PATCH 0/3] ceph: don't NULL terminate virtual xattr values

2019-06-14 Thread Andreas Gruenbacher

On Fri, 14 Jun 2019 at 15:46, Jeff Layton  wrote:
> kcephfs has several "virtual" xattrs that return strings that are
> currently populated using snprintf(), which always NULL terminates the
> string.
>
> This leads to the string being truncated when we use a buffer length
> acquired by calling getxattr with a 0 size first. The last character
> of the string ends up being clobbered by the termination.
>
> The convention with xattrs is to not store the termination with string
> data, given that we have the length. This is how setfattr/getfattr
> operate.
>
> This patch makes ceph's virtual xattrs not include NULL termination
> when formatting their values. In order to handle this, a new
> snprintf_noterm function is added, and ceph is changed over to use
> this to populate the xattr value buffer. Finally, we fix ceph to
> return -ERANGE properly when the string didn't fit in the buffer.

This looks reasonable from an xattr point of view.

Thanks,
Andreas

> Jeff Layton (3):
>   lib/vsprintf: add snprintf_noterm
>   ceph: don't NULL terminate virtual xattr strings
>   ceph: return -ERANGE if virtual xattr value didn't fit in buffer
>
>  fs/ceph/xattr.c|  49 +++---
>  include/linux/kernel.h |   2 +
>  lib/vsprintf.c | 145 -
>  3 files changed, 130 insertions(+), 66 deletions(-)
>
> --
> 2.21.0
>

[GIT PULL] gfs2: Fix rounding error in gfs2_iomap_page_prepare

2019-06-14 Thread Andreas Gruenbacher

Hi Linus,

could you please pull the following gfs2 fix?

Thank you very much,
Andreas

The following changes since commit dc8ca9cc6e23054eb85599c9417ef2416848e7e8:

  Merge tag 'gfs2-v5.2.fixes' of
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
(2019-06-06 12:33:52 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git
tags/gfs2-v5.2.fixes2

for you to fetch changes up to 2741b6723bf6f7d92d07c44bd6a09c6e37f3f949:

  gfs2: Fix rounding error in gfs2_iomap_page_prepare (2019-06-14
18:49:07 +0200)


Fix rounding error in gfs2_iomap_page_prepare


Andreas Gruenbacher (1):
  gfs2: Fix rounding error in gfs2_iomap_page_prepare

 fs/gfs2/bmap.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

[PATCH] gfs2: Fix error path kobject memory leak

2019-05-13 Thread Andreas Gruenbacher

From: "Tobin C. Harding" 

If a call to kobject_init_and_add() fails we must call kobject_put()
otherwise we leak memory.

Function gfs2_sys_fs_add always calls kobject_init_and_add() which
always calls kobject_init().

It is safe to leave object destruction up to the kobject release
function and never free it manually.

Remove call to kfree() and always call kobject_put() in the error path.

Signed-off-by: Tobin C. Harding 
Reviewed-by: Greg Kroah-Hartman 
Signed-off-by: Andreas Gruenbacher 
---
 fs/gfs2/sys.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 1787d295834e..08e4996adc23 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -650,7 +650,6 @@ int gfs2_sys_fs_add(struct gfs2_sbd *sdp)
char ro[20];
char spectator[20];
char *envp[] = { ro, spectator, NULL };
-   int sysfs_frees_sdp = 0;
 
sprintf(ro, "RDONLY=%d", sb_rdonly(sb));
sprintf(spectator, "SPECTATOR=%d", sdp->sd_args.ar_spectator ? 1 : 0);
@@ -661,8 +660,6 @@ int gfs2_sys_fs_add(struct gfs2_sbd *sdp)
if (error)
goto fail_reg;
 
-   sysfs_frees_sdp = 1; /* Freeing sdp is now done by sysfs calling
-   function gfs2_sbd_release. */
error = sysfs_create_group(>sd_kobj, _group);
if (error)
goto fail_reg;
@@ -687,10 +684,7 @@ int gfs2_sys_fs_add(struct gfs2_sbd *sdp)
 fail_reg:
free_percpu(sdp->sd_lkstats);
fs_err(sdp, "error %d adding sysfs files\n", error);
-   if (sysfs_frees_sdp)
-   kobject_put(>sd_kobj);
-   else
-   kfree(sdp);
+   kobject_put(>sd_kobj);
sb->s_fs_info = NULL;
return error;
 }
-- 
2.20.1

Re: [PATCH] gfs2: Fix error path kobject memory leak

2019-05-13 Thread Andreas Gruenbacher

On Mon, 13 May 2019 at 12:40, Tobin C. Harding  wrote:
>
> On Mon, May 13, 2019 at 09:14:05AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, May 13, 2019 at 01:32:13PM +1000, Tobin C. Harding wrote:
> > > If a call to kobject_init_and_add() fails we must call kobject_put()
> > > otherwise we leak memory.
> > >
> > > Function always calls kobject_init_and_add() which always calls
> > > kobject_init().
> > >
> > > It is safe to leave object destruction up to the kobject release
> > > function and never free it manually.
> > >
> > > Remove call to kfree() and always call kobject_put() in the error path.
> > >
> > > Signed-off-by: Tobin C. Harding 
> > > ---
> > >
> > > Is it ok to send patches during the merge window?
> > >
> > > Applies on top of Linus' mainline tag: v5.1
> > >
> > > Happy to rebase if there are conflicts.
> > >
> > > thanks,
> > > Tobin.
> > >
> > >  fs/gfs2/sys.c | 7 +--
> > >  1 file changed, 1 insertion(+), 6 deletions(-)
> > >
> > > diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
> > > index 1787d295834e..98586b139386 100644
> > > --- a/fs/gfs2/sys.c
> > > +++ b/fs/gfs2/sys.c
> > > @@ -661,8 +661,6 @@ int gfs2_sys_fs_add(struct gfs2_sbd *sdp)
> > > if (error)
> > > goto fail_reg;
> > >
> > > -   sysfs_frees_sdp = 1; /* Freeing sdp is now done by sysfs calling
> > > -   function gfs2_sbd_release. */
> >
> > You should also delete this variable at the top of the function, as it
> > is now only set once there and never used.
>
> Thanks, I should have gotten a compiler warning for that.  I was feeling
> so confident with my builds this morning ... pays not to get too cocky
> I suppose.
>
> > With that:
> >
> > Reviewed-by: Greg Kroah-Hartman 
>
> Thanks, will re-spin.

Don't bother, I'll fix that up.

Thanks,
Andreas

Re: GFS2: Pull Request

2019-05-08 Thread Andreas Gruenbacher

Linus,

On Wed, 8 May 2019 at 20:06, Linus Torvalds
 wrote:
> On Wed, May 8, 2019 at 10:55 AM Linus Torvalds
>  wrote:
> >
> > See what I'm saying? You would ask me to pull the un-merged state, but
> > then say "I noticed a few merge conflicts when I did my test merge,
> > and this is what I did" kind of aside.
>
> Side note: this is more important if you know of a merge issue that
> doesn't cause a conflict, and that I won't see in my simple build
> tests.
>
> For example, in this case, the merge issue doesn't cause a conflict
> (because it's a totally new user of bio_for_each_segment_all() and the
> syntax changed in another branch), but I see it trivially when I do a
> test build, since the compiler spews out lots of warnings, and so I
> can trivially fix it up (and you _mentioning_ the issue gives me the
> heads up that you knew about it and what it's all about).
>
> But if it's other architectures, or only happens under special config
> options etc, I might not have seen the merge issue at all. And then
> it's really good if the maintainer talks about it and shows that yes,
> the maintainer knows what he's doing.
>
> Now I'm in the situation where I have actually done the merge the way
> I *like* doing them, and without your superfluous merge commit. But if
> I use my merge, I'll lose the signature from your tag, because you
> signed *your* merge that I didn't actually want to use at all.
>
> See? Your "helpful" merge actually caused me extra work, and made me
> have to pick one of two *worse* situations than if you had just tagged
> your own development tree. Either my tree has a extra pointless merge
> commit, or my tree lacks your signature on your work.

Ok, got it.

Would it make sense to describe how to deal with merge conflicts in
Documentation/maintainer/pull-requests.rst to stop people from getting
this wrong over and over again?

Thanks,
Andreas

Re: [PATCH] overlayfs: ignore empty NFSv4 ACLs in ext4 upperdir

2019-05-02 Thread Andreas Gruenbacher

On Thu, 2 May 2019 at 19:16, J. Bruce Fields  wrote:
> On Thu, May 02, 2019 at 05:08:14PM +0200, Andreas Grünbacher wrote:
> > You'll still see permissions that differ from what the filesystem
> > enforces, and copy-up would change that behavior.
>
> That's always true, and this issue isn't really specific to NFSv4 ACLs
> (or ACLs at all), it already exists with just mode bits.  The client
> doesn't know how principals may be mapped on the server, doesn't know
> group membership, etc.
>
> That's the usual model, anyway.  Permissions are almost entirely the
> server's responsibility, and we just provide a few attributes to set/get
> those server-side permissions.

Sure, if the client and server don't share the same user and group
databases, ACLs can get a very different meaning.

Andreas

> The overlayfs/NFS case is different, I think: the nfs filesystem may be
> just a static read-only template for a filesystem that's only ever used
> by clients, and for all I know maybe permissions should only be
> interpreted on the client side in that case.
>
> --b.

Re: [PATCH] overlayfs: ignore empty NFSv4 ACLs in ext4 upperdir

2019-05-02 Thread Andreas Gruenbacher

On Thu, 2 May 2019 at 19:27, Goetz, Patrick G  wrote:
> On 5/1/19 10:57 PM, NeilBrown wrote:
> > Support some day support for nfs4 acls were added to ext4 (not a totally
> > ridiculous suggestion).  We would then want NFS to allow it's ACLs to be
> > copied up.
>
> Is there some reason why there hasn't been a greater effort to add NFSv4
> ACL support to the mainstream linux filesystems?  I have to support a
> hybrid linux/windows environment and not having these ACLs on ext4 is a
> daily headache for me.

The patches for implementing that have been rejected over and over
again, and nobody is working on them anymore.

Andreas

Re: [PATCH] overlayfs: ignore empty NFSv4 ACLs in ext4 upperdir

2019-05-02 Thread Andreas Gruenbacher

On Thu, 2 May 2019 at 05:57, NeilBrown  wrote:
> On Wed, May 01 2019, Amir Goldstein wrote:
> > On Wed, May 1, 2019 at 10:03 PM NeilBrown  wrote:
> >> On Tue, Dec 06 2016, J. Bruce Fields wrote:
> >> > On Tue, Dec 06, 2016 at 02:18:31PM +0100, Andreas Gruenbacher wrote:
> >> >> On Tue, Dec 6, 2016 at 11:08 AM, Miklos Szeredi  
> >> >> wrote:
> >> >> > On Tue, Dec 6, 2016 at 12:24 AM, Andreas Grünbacher
> >> >> >  wrote:
> >> >> >> 2016-12-06 0:19 GMT+01:00 Andreas Grünbacher 
> >> >> >> :
> >> >> >
> >> >> >>> It's not hard to come up with a heuristic that determines if a
> >> >> >>> system.nfs4_acl value is equivalent to a file mode, and to ignore 
> >> >> >>> the
> >> >> >>> attribute in that case. (The file mode is transmitted in its own
> >> >> >>> attribute already, so actually converting .) That way, overlayfs 
> >> >> >>> could
> >> >> >>> still fail copying up files that have an actual ACL. It's still an
> >> >> >>> ugly hack ...
> >> >> >>
> >> >> >> Actually, that kind of heuristic would make sense in the NFS client
> >> >> >> which could then hide the "system.nfs4_acl" attribute.

I still think the nfs client could make this problem mostly go away by
not exposing "system.nfs4_acl" xattrs when the acl is equivalent to
the file mode. The richacl patches contain a workable abgorithm for
that. The problem would remain for files that have an actual NFS4 ACL,
which just cannot be mapped to a file mode or to POSIX ACLs in the
general case, as well as for files that have a POSIX ACL. Mapping NFS4
ACL that used to be a POSIX ACL back to POSIX ACLs could be achieved
in many cases as well, but the code would be quite messy. A better way
seems to be to using a filesystem that doesn't support POSIX ACLs in
the first place. Unfortunately, xfs doesn't allow turning off POSIX
ACLs, for example.

Andreas

> >> >> > Even simpler would be if knfsd didn't send the attribute if not
> >> >> > necessary.  Looks like there's code actively creating the nfs4_acl on
> >> >> > the wire even if the filesystem had none:
> >> >> >
> >> >> > pacl = get_acl(inode, ACL_TYPE_ACCESS);
> >> >> > if (!pacl)
> >> >> > pacl = posix_acl_from_mode(inode->i_mode, GFP_KERNEL);
> >> >> >
> >> >> > What's the point?
> >> >>
> >> >> That's how the protocol is specified.
> >> >
> >> > Yep, even if we could make that change to nfsd it wouldn't help the
> >> > client with the large number of other servers that are out there
> >> > (including older knfsd's).
> >> >
> >> > --b.
> >> >
> >> >> (I'm not saying that that's very helpful.)
> >> >>
> >> >> Andreas
> >>
> >> Hi everyone.
> >>  I have a customer facing this problem, and so stumbled onto the email
> >>  thread.
> >>  Unfortunately it didn't resolve anything.  Maybe I can help kick things
> >>  along???
> >>
> >>  The core problem here is that NFSv4 and ext4 use different and largely
> >>  incompatible ACL implementations.  There is no way to accurately
> >>  translate from one to the other in general (common specific examples
> >>  can be converted).
> >>
> >>  This means that either:
> >>1/ overlayfs cannot use ext4 for upper and NFS for lower (or vice
> >>   versa) or
> >>2/ overlayfs need to accept that sometimes it cannot copy ACLs, and
> >>   that is OK.
> >>
> >>  Silently not copying the ACLs is probably not a good idea as it might
> >>  result in inappropriate permissions being given away.
> >
> > For example? permissions given away to do what?
> > Note that ovl_permission() only check permissions of *mounter*
> > to read the lower NFS file and ovl_open()/ovl_read_iter() access
> > the lower file with *mounter* credentials.
> >
> > I might be wrong, but seems to me that once admin mounted
> > overlayfs with lower NFS, NFS ACLs are not being enforced at all
> > even before copy up.
>
> I guess it is just as well that copy-up fails then - if the lower-level
> permission check is being ignored.
>
> >
> >> So if the
> >>  sysadmin wants

Re: [PATCH] gfs2: Revert "Fix loop in gfs2_rbm_find"

2019-01-31 Thread Andreas Gruenbacher

On Thu, 31 Jan 2019 at 19:41, Linus Torvalds
 wrote:
> On Wed, Jan 30, 2019 at 12:30 PM Andreas Gruenbacher  
> wrote:
> >
> > This reverts commit 2d29f6b96d8f80322ed2dd895bca590491c38d34.
> >
> > It turns out that the fix can lead to a ~20 percent performance regression
> > in initial writes to the page cache according to iozone.  Let's revert this
> > for now to have more time for a proper fix.
>
> Ugh. I think part of the problem is that the
>
> n += (rbm->bii - initial_bii);
>
> doesn't make any sense when we just set rbm->bii to 0 a couple of
> lines earlier. So it's basically a really odd way to write
>
> n -= initial_bii;
>
> which in turn really doesn't make any sense _either_.

Right, that code clearly doesn't make sense. The only positive about
it is that it hasn't caused any obvious issues so far.

> So I'l all for reverting the commit because it caused a performance
> regression, but the end result really is all kinds of odd.
>
> Is the bug as simple as "we incremented the iterator counter twice"?
> Because in the -E2BIG case, we first increment it by the difference in
> bii, but then we *also* increment it in res_covered_end_of_rgrp()
> (which we do *not* do for the "ret > 0" case, which goes to
> next_iter).

In the "ret > 0" case, rbm->bii should have already been incremented
in gfs2_reservation_check_and_update.

> So if somebody can run the performance test again, how does it look if
> *instead* of the revert, we do what looks at least _slightly_ more
> sensible, and move the "increment iteration count" up to the
> next-bitmap case instead?
>
> At that point, it will actually match the bii updates (because
> next_bitmap also updates rbm->bii). Hmm?
>
> Something like the whitespace-damaged thinig below. Completely
> untested. But *if* this works, it would make more sense than the
> revert..
>
> Hmm?

My idea was to fix that whole loop properly as in this patch instead:

  https://www.redhat.com/archives/cluster-devel/2019-January/msg00111.html

That patch just hasn't seen enough testing to make me comfortable
sending it yet. We're testing it right now, and my plan was to get it
into the next merge window. I don't think an intermediate workaround
would make much sense. Does that sound acceptable? Would you rather
have that fix sent to you sometime next week instead?

Thanks a lot,
Andreas

[PATCH] gfs2: Revert "Fix loop in gfs2_rbm_find"

2019-01-30 Thread Andreas Gruenbacher

This reverts commit 2d29f6b96d8f80322ed2dd895bca590491c38d34.

It turns out that the fix can lead to a ~20 percent performance regression
in initial writes to the page cache according to iozone.  Let's revert this
for now to have more time for a proper fix.

Cc: sta...@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher 
Signed-off-by: Bob Peterson 

---
 fs/gfs2/rgrp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 831d7cb5a49c4..17a8d3b439905 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1780,9 +1780,9 @@ static int gfs2_rbm_find(struct gfs2_rbm *rbm, u8 state, 
u32 *minext,
goto next_iter;
}
if (ret == -E2BIG) {
-   n += rbm->bii - initial_bii;
rbm->bii = 0;
rbm->offset = 0;
+   n += (rbm->bii - initial_bii);
goto res_covered_end_of_rgrp;
}
return ret;
-- 
2.20.1

Re: [PATCH] gfs: no need to check return value of debugfs_create functions

2019-01-23 Thread Andreas Gruenbacher

Greg,

On Tue, 22 Jan 2019 at 16:24, Greg Kroah-Hartman
 wrote:
> When calling debugfs functions, there is no need to ever check the
> return value.  The function can work or not, but the code logic should
> never do something different based on this.
>
> There is no need to save the dentries for the debugfs files, so drop
> those variables to save a bit of space and make the code simpler.

looking good, pushed to for-next.

Thanks,
Andreas

Re: [PATCH] jffs2: Fix retry handling jffs2_listxattr

2018-12-17 Thread Andreas Gruenbacher

Hi Richard,

On Sat, 15 Dec 2018 at 11:22, Richard Weinberger  wrote:
> When jffs2 has to retry reading xattrs we need to reset
> the buffer pointer. Otherwise we return old xattrs from the
> previous iteration which leads to a inconsistency between
> the number of bytes we return and the real list size.
>
> Cc: 
> Cc: Andreas Gruenbacher 
> Fixes: 764a5c6b1fa4 ("xattr handlers: Simplify list operation")
> Signed-off-by: Richard Weinberger 

Reviewed-by: Andreas Gruenbacher 

> ---
> Andreas,
>
> since you maintain the attr package too, I report it right here. :-)
> This jffs2 bug lead to a crash in attr_list().
>
> for() will loop to crash when there is no trailing \0 in the
> list of xattrs.
>
> for (l = lbuf; l != lbuf + length; l = strchr(l, '\0') + 1) {
> if (api_unconvert(name, l, flags))
> continue;
>
> ...
> }
>
> I suggest changing the loop condition to something like l < lbuf + length.

With that change alone, strchr would still go beyond the buffer. Let's
just append a null character at the end instead.

Thanks,
Andreas

> Thanks,
> //richard
> ---
>  fs/jffs2/xattr.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/fs/jffs2/xattr.c b/fs/jffs2/xattr.c
> index da3e18503c65..0cb322eb9516 100644
> --- a/fs/jffs2/xattr.c
> +++ b/fs/jffs2/xattr.c
> @@ -967,6 +967,7 @@ ssize_t jffs2_listxattr(struct dentry *dentry, char 
> *buffer, size_t size)
> struct jffs2_xattr_ref *ref, **pref;
> struct jffs2_xattr_datum *xd;
> const struct xattr_handler *xhandle;
> +   char *orig_buffer = buffer;
> const char *prefix;
> ssize_t prefix_len, len, rc;
> int retry = 0;
> @@ -977,6 +978,7 @@ ssize_t jffs2_listxattr(struct dentry *dentry, char 
> *buffer, size_t size)
>
> down_read(>xattr_sem);
>   retry:
> +   buffer = orig_buffer;
> len = 0;
> for (ref=ic->xref, pref=>xref; ref; pref=>next, 
> ref=ref->next) {
> BUG_ON(ref->ic != ic);
> --
> 2.20.0
>

[PATCH] sysfs: Do not return POSIX ACL xattrs via listxattr

2018-09-17 Thread Andreas Gruenbacher

Commit 786534b92f3c introduced a regression that caused listxattr to
return the POSIX ACL attribute names even though sysfs doesn't support
POSIX ACLs.  This happens because simple_xattr_list checks for NULL
i_acl / i_default_acl, but inode_init_always initializes those fields to
ACL_NOT_CACHED ((void *)-1).  For example:

$ getfattr -m- -d /sys
/sys: system.posix_acl_access: Operation not supported
/sys: system.posix_acl_default: Operation not supported

Fix this in simple_xattr_list by checking if the filesystem supports
POSIX ACLs.

Fixes: 786534b92f3c ("tmpfs: listxattr should include POSIX ACL xattrs")
Reported-by: Marc Aurele La France 
Tested-by: Marc Aurèle La France 
Signed-off-by: Andreas Gruenbacher 
CC: Stable  # v4.5+
---
 fs/xattr.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index daa732550088..0d6a6a4af861 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -948,17 +948,19 @@ ssize_t simple_xattr_list(struct inode *inode, struct 
simple_xattrs *xattrs,
int err = 0;
 
 #ifdef CONFIG_FS_POSIX_ACL
-   if (inode->i_acl) {
-   err = xattr_list_one(, _size,
-XATTR_NAME_POSIX_ACL_ACCESS);
-   if (err)
-   return err;
-   }
-   if (inode->i_default_acl) {
-   err = xattr_list_one(, _size,
-XATTR_NAME_POSIX_ACL_DEFAULT);
-   if (err)
-   return err;
+   if (IS_POSIXACL(inode)) {
+   if (inode->i_acl) {
+   err = xattr_list_one(, _size,
+XATTR_NAME_POSIX_ACL_ACCESS);
+   if (err)
+   return err;
+   }
+   if (inode->i_default_acl) {
+   err = xattr_list_one(, _size,
+XATTR_NAME_POSIX_ACL_DEFAULT);
+   if (err)
+   return err;
+   }
}
 #endif
 
-- 
2.17.1

[PATCH] sysfs: Do not return POSIX ACL xattrs via listxattr

2018-09-17 Thread Andreas Gruenbacher

Commit 786534b92f3c introduced a regression that caused listxattr to
return the POSIX ACL attribute names even though sysfs doesn't support
POSIX ACLs.  This happens because simple_xattr_list checks for NULL
i_acl / i_default_acl, but inode_init_always initializes those fields to
ACL_NOT_CACHED ((void *)-1).  For example:

$ getfattr -m- -d /sys
/sys: system.posix_acl_access: Operation not supported
/sys: system.posix_acl_default: Operation not supported

Fix this in simple_xattr_list by checking if the filesystem supports
POSIX ACLs.

Fixes: 786534b92f3c ("tmpfs: listxattr should include POSIX ACL xattrs")
Reported-by: Marc Aurele La France 
Tested-by: Marc Aurèle La France 
Signed-off-by: Andreas Gruenbacher 
CC: Stable  # v4.5+
---
 fs/xattr.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index daa732550088..0d6a6a4af861 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -948,17 +948,19 @@ ssize_t simple_xattr_list(struct inode *inode, struct 
simple_xattrs *xattrs,
int err = 0;
 
 #ifdef CONFIG_FS_POSIX_ACL
-   if (inode->i_acl) {
-   err = xattr_list_one(, _size,
-XATTR_NAME_POSIX_ACL_ACCESS);
-   if (err)
-   return err;
-   }
-   if (inode->i_default_acl) {
-   err = xattr_list_one(, _size,
-XATTR_NAME_POSIX_ACL_DEFAULT);
-   if (err)
-   return err;
+   if (IS_POSIXACL(inode)) {
+   if (inode->i_acl) {
+   err = xattr_list_one(, _size,
+XATTR_NAME_POSIX_ACL_ACCESS);
+   if (err)
+   return err;
+   }
+   if (inode->i_default_acl) {
+   err = xattr_list_one(, _size,
+XATTR_NAME_POSIX_ACL_DEFAULT);
+   if (err)
+   return err;
+   }
}
 #endif
 
-- 
2.17.1

Re: [PATCH] fs/posix_acl.c: fix kernel-doc warnings, formatting, and typo

2018-09-03 Thread Andreas Gruenbacher

On 3 September 2018 at 22:52, Randy Dunlap  wrote:
> From: Randy Dunlap 
>
> Fix kernel-doc warnings in fs/posic_acl.c.
> Also fix one typo (setgit -> setgid).
>
> ../fs/posix_acl.c:646: warning: Function parameter or member 'inode' not 
> described in 'posix_acl_update_mode'
> ../fs/posix_acl.c:646: warning: Function parameter or member 'mode_p' not 
> described in 'posix_acl_update_mode'
> ../fs/posix_acl.c:646: warning: Function parameter or member 'acl' not 
> described in 'posix_acl_update_mode'
>
> Fixes: 073931017b49d ("posix_acl: Clear SGID bit when setting file 
> permissions")
>
> Signed-off-by: Randy Dunlap 
> Cc: Jan Kara 
> Cc: Andreas Gruenbacher 
> Cc: Alexander Viro 
> Cc: linux-fsde...@vger.kernel.org
> ---
>  fs/posix_acl.c |7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> --- lnx-419-rc2.orig/fs/posix_acl.c
> +++ lnx-419-rc2/fs/posix_acl.c
> @@ -630,12 +630,15 @@ EXPORT_SYMBOL_GPL(posix_acl_create);
>
>  /**
>   * posix_acl_update_mode  -  update mode in set_acl
> + * @inode: target inode
> + * @mode_p: mode (pointer) for update
> + * @acl: acl pointer
>   *
>   * Update the file mode when setting an ACL: compute the new file permission
>   * bits based on the ACL.  In addition, if the ACL is equivalent to the new
> - * file mode, set *acl to NULL to indicate that no ACL should be set.
> + * file mode, set ``*acl`` to NULL to indicate that no ACL should be set.

Should be turned into *@acl instead.

>   *
> - * As with chmod, clear the setgit bit if the caller is not in the owning 
> group
> + * As with chmod, clear the setgid bit if the caller is not in the owning 
> group
>   * or capable of CAP_FSETID (see inode_change_ok).
>   *
>   * Called from set_acl inode operations.

Otherwise,
Acked-by: Andreas Gruenbacher 

Thanks,
Andreas

Re: [PATCH] fs/posix_acl.c: fix kernel-doc warnings, formatting, and typo

2018-09-03 Thread Andreas Gruenbacher

On 3 September 2018 at 22:52, Randy Dunlap  wrote:
> From: Randy Dunlap 
>
> Fix kernel-doc warnings in fs/posic_acl.c.
> Also fix one typo (setgit -> setgid).
>
> ../fs/posix_acl.c:646: warning: Function parameter or member 'inode' not 
> described in 'posix_acl_update_mode'
> ../fs/posix_acl.c:646: warning: Function parameter or member 'mode_p' not 
> described in 'posix_acl_update_mode'
> ../fs/posix_acl.c:646: warning: Function parameter or member 'acl' not 
> described in 'posix_acl_update_mode'
>
> Fixes: 073931017b49d ("posix_acl: Clear SGID bit when setting file 
> permissions")
>
> Signed-off-by: Randy Dunlap 
> Cc: Jan Kara 
> Cc: Andreas Gruenbacher 
> Cc: Alexander Viro 
> Cc: linux-fsde...@vger.kernel.org
> ---
>  fs/posix_acl.c |7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> --- lnx-419-rc2.orig/fs/posix_acl.c
> +++ lnx-419-rc2/fs/posix_acl.c
> @@ -630,12 +630,15 @@ EXPORT_SYMBOL_GPL(posix_acl_create);
>
>  /**
>   * posix_acl_update_mode  -  update mode in set_acl
> + * @inode: target inode
> + * @mode_p: mode (pointer) for update
> + * @acl: acl pointer
>   *
>   * Update the file mode when setting an ACL: compute the new file permission
>   * bits based on the ACL.  In addition, if the ACL is equivalent to the new
> - * file mode, set *acl to NULL to indicate that no ACL should be set.
> + * file mode, set ``*acl`` to NULL to indicate that no ACL should be set.

Should be turned into *@acl instead.

>   *
> - * As with chmod, clear the setgit bit if the caller is not in the owning 
> group
> + * As with chmod, clear the setgid bit if the caller is not in the owning 
> group
>   * or capable of CAP_FSETID (see inode_change_ok).
>   *
>   * Called from set_acl inode operations.

Otherwise,
Acked-by: Andreas Gruenbacher 

Thanks,
Andreas

[PATCH] iomap: Switch to offset_in_page for clarity

2018-08-10 Thread Andreas Gruenbacher

Instead of open-coding pos & (PAGE_SIZE - 1) and pos & ~PAGE_MASK, use
the offset_in_page macro.

Signed-off-by: Andreas Gruenbacher 
---
 fs/iomap.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 13cdcf33e6c0..ae317ac7e925 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -150,7 +150,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, 
loff_t length, void *data,
 {
struct iomap_readpage_ctx *ctx = data;
struct page *page = ctx->cur_page;
-   unsigned poff = pos & (PAGE_SIZE - 1);
+   unsigned poff = offset_in_page(pos);
unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
bool is_contig = false;
sector_t sector;
@@ -280,7 +280,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, 
loff_t length,
loff_t done, ret;
 
for (done = 0; done < length; done += ret) {
-   if (ctx->cur_page && ((pos + done) & (PAGE_SIZE - 1)) == 0) {
+   if (ctx->cur_page && offset_in_page(pos + done) == 0) {
if (!ctx->cur_page_in_bio)
unlock_page(ctx->cur_page);
put_page(ctx->cur_page);
@@ -382,9 +382,9 @@ __iomap_write_begin(struct inode *inode, loff_t pos, 
unsigned len,
loff_t block_size = i_blocksize(inode);
loff_t block_start = pos & ~(block_size - 1);
loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
-   unsigned poff = block_start & (PAGE_SIZE - 1);
+   unsigned poff = offset_in_page(block_start);
unsigned plen = min_t(loff_t, PAGE_SIZE - poff, block_end - 
block_start);
-   unsigned from = pos & (PAGE_SIZE - 1), to = from + len;
+   unsigned from = offset_in_page(pos), to = from + len;
 
WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
 
@@ -538,7 +538,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t 
length, void *data,
unsigned long bytes;/* Bytes to write to page */
size_t copied;  /* Bytes copied from user */
 
-   offset = (pos & (PAGE_SIZE - 1));
+   offset = offset_in_page(pos);
bytes = min_t(unsigned long, PAGE_SIZE - offset,
iov_iter_count(i));
 again:
@@ -652,7 +652,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t 
length, void *data,
unsigned long offset;   /* Offset into pagecache page */
unsigned long bytes;/* Bytes to write to page */
 
-   offset = (pos & (PAGE_SIZE - 1));
+   offset = offset_in_page(pos);
bytes = min_t(loff_t, PAGE_SIZE - offset, length);
 
rpage = __iomap_read_page(inode, pos);
@@ -744,7 +744,7 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, 
loff_t count,
do {
unsigned offset, bytes;
 
-   offset = pos & (PAGE_SIZE - 1); /* Within page */
+   offset = offset_in_page(pos);
bytes = min_t(loff_t, PAGE_SIZE - offset, count);
 
if (IS_DAX(inode))
@@ -837,7 +837,7 @@ int iomap_page_mkwrite(struct vm_fault *vmf, const struct 
iomap_ops *ops)
 
/* page is wholly or partially inside EOF */
if (((page->index + 1) << PAGE_SHIFT) > size)
-   length = size & ~PAGE_MASK;
+   length = offset_in_page(size);
else
length = PAGE_SIZE;
 
@@ -1000,7 +1000,7 @@ page_seek_hole_data(struct inode *inode, struct page 
*page, loff_t *lastoff,
goto out_unlock_not_found;
 
for (off = 0; off < PAGE_SIZE; off += bsize) {
-   if ((*lastoff & ~PAGE_MASK) >= off + bsize)
+   if (offset_in_page(*lastoff) >= off + bsize)
continue;
if (ops->is_partially_uptodate(page, off, bsize) == seek_data) {
unlock_page(page);
-- 
2.17.1

[PATCH] iomap: Switch to offset_in_page for clarity

2018-08-10 Thread Andreas Gruenbacher

Instead of open-coding pos & (PAGE_SIZE - 1) and pos & ~PAGE_MASK, use
the offset_in_page macro.

Signed-off-by: Andreas Gruenbacher 
---
 fs/iomap.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/iomap.c b/fs/iomap.c
index 13cdcf33e6c0..ae317ac7e925 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -150,7 +150,7 @@ iomap_readpage_actor(struct inode *inode, loff_t pos, 
loff_t length, void *data,
 {
struct iomap_readpage_ctx *ctx = data;
struct page *page = ctx->cur_page;
-   unsigned poff = pos & (PAGE_SIZE - 1);
+   unsigned poff = offset_in_page(pos);
unsigned plen = min_t(loff_t, PAGE_SIZE - poff, length);
bool is_contig = false;
sector_t sector;
@@ -280,7 +280,7 @@ iomap_readpages_actor(struct inode *inode, loff_t pos, 
loff_t length,
loff_t done, ret;
 
for (done = 0; done < length; done += ret) {
-   if (ctx->cur_page && ((pos + done) & (PAGE_SIZE - 1)) == 0) {
+   if (ctx->cur_page && offset_in_page(pos + done) == 0) {
if (!ctx->cur_page_in_bio)
unlock_page(ctx->cur_page);
put_page(ctx->cur_page);
@@ -382,9 +382,9 @@ __iomap_write_begin(struct inode *inode, loff_t pos, 
unsigned len,
loff_t block_size = i_blocksize(inode);
loff_t block_start = pos & ~(block_size - 1);
loff_t block_end = (pos + len + block_size - 1) & ~(block_size - 1);
-   unsigned poff = block_start & (PAGE_SIZE - 1);
+   unsigned poff = offset_in_page(block_start);
unsigned plen = min_t(loff_t, PAGE_SIZE - poff, block_end - 
block_start);
-   unsigned from = pos & (PAGE_SIZE - 1), to = from + len;
+   unsigned from = offset_in_page(pos), to = from + len;
 
WARN_ON_ONCE(i_blocksize(inode) < PAGE_SIZE);
 
@@ -538,7 +538,7 @@ iomap_write_actor(struct inode *inode, loff_t pos, loff_t 
length, void *data,
unsigned long bytes;/* Bytes to write to page */
size_t copied;  /* Bytes copied from user */
 
-   offset = (pos & (PAGE_SIZE - 1));
+   offset = offset_in_page(pos);
bytes = min_t(unsigned long, PAGE_SIZE - offset,
iov_iter_count(i));
 again:
@@ -652,7 +652,7 @@ iomap_dirty_actor(struct inode *inode, loff_t pos, loff_t 
length, void *data,
unsigned long offset;   /* Offset into pagecache page */
unsigned long bytes;/* Bytes to write to page */
 
-   offset = (pos & (PAGE_SIZE - 1));
+   offset = offset_in_page(pos);
bytes = min_t(loff_t, PAGE_SIZE - offset, length);
 
rpage = __iomap_read_page(inode, pos);
@@ -744,7 +744,7 @@ iomap_zero_range_actor(struct inode *inode, loff_t pos, 
loff_t count,
do {
unsigned offset, bytes;
 
-   offset = pos & (PAGE_SIZE - 1); /* Within page */
+   offset = offset_in_page(pos);
bytes = min_t(loff_t, PAGE_SIZE - offset, count);
 
if (IS_DAX(inode))
@@ -837,7 +837,7 @@ int iomap_page_mkwrite(struct vm_fault *vmf, const struct 
iomap_ops *ops)
 
/* page is wholly or partially inside EOF */
if (((page->index + 1) << PAGE_SHIFT) > size)
-   length = size & ~PAGE_MASK;
+   length = offset_in_page(size);
else
length = PAGE_SIZE;
 
@@ -1000,7 +1000,7 @@ page_seek_hole_data(struct inode *inode, struct page 
*page, loff_t *lastoff,
goto out_unlock_not_found;
 
for (off = 0; off < PAGE_SIZE; off += bsize) {
-   if ((*lastoff & ~PAGE_MASK) >= off + bsize)
+   if (offset_in_page(*lastoff) >= off + bsize)
continue;
if (ops->is_partially_uptodate(page, off, bsize) == seek_data) {
unlock_page(page);
-- 
2.17.1

Re: linux-next: duplicate patches in the gfs2 and xfs trees

2018-07-11 Thread Andreas Gruenbacher

Hi Stephen,

On 12 July 2018 at 06:41, Stephen Rothwell  wrote:
> Hi Andreas,
>
> On Thu, 12 Jul 2018 04:30:07 +0200 Andreas Gruenbacher  
> wrote:
>>
>> this is what I'm seeing (git log --pretty=oneline --abbrev-commit
>> --graph ^origin/master gfs2/for-next):
>>
>> * f79caf101801 (gfs2/for-next) gfs2: use iomap_readpage for blocksize
>> == PAGE_SIZE
>> * af58827ee500 gfs2: Use iomap for stuffed direct I/O reads
>> *   c38e838abe42 Merge branch 'iomap-4.19-merge' into linux-gfs2/for-next
>> |\
>> | * 806a1477b10a (xfs/iomap-4.19-merge) iomap: add inline data support
>> to iomap_readpage_actor
>> | * ec181f6782d8 iomap: support direct I/O to inline data
>> | * 09230435dffd iomap: refactor iomap_dio_actor
>> | * c03cea42149d iomap: add initial support for writes without buffer heads
>> | * 72b4daa24129 iomap: add an iomap-based readpage and readpages 
>> implementation
>> * | 9ab5aa4f4e10 gfs2: fallocate_chunk: Always initialize struct iomap
>> * | 2e2834ef1797 GFS2: Fix recovery issues for spectators
>> * |   5db0147b887e Merge branch 'iomap-write' into linux-gfs2/for-next
>> |\ \
>> | * | 025d0e7f73c6 (gfs2/iomap-write) gfs2: Remove gfs2_write_{begin,end}
>> | * | 967bcc91b044 gfs2: iomap direct I/O support
>> | * | bcfe94139a45 gfs2: gfs2_extent_length cleanup
>> | * | 64bc06bb32ee gfs2: iomap buffered write support
>> | * | d505a96a3b16 gfs2: Further iomap cleanups
>> | |/
>> | * e184fde6f3f5 iomap: add private pointer to struct iomap
>> | * 63899c6f8851 iomap: add a page_done callback
>> | * 19e0c58f6552 iomap: generic inline data handling
>> | * ebf00be37de3 iomap: complete partial direct I/O writes synchronously
>> | * 3d7b6b21f6c5 iomap: mark newly allocated buffer heads as new
>> | * a6d639da63ae fs: factor out a __generic_write_end helper
>> * 3beacef8093b fs: gfs2: Adding new return type vm_fault_t
>> * d80ff78468e4 gfs2: using posix_acl_xattr_size instead of posix_acl_to_xattr
>> * e904f3d486f9 gfs2: Don't reject a supposedly full bitmap if we have
>> blocks reserved
>> * d1475c07f7ce GFS2: rgrp free blocks used incorrectly
>> * b7eba890a228 gfs2: Eliminate redundant ip->i_rgd
>> * 03f8c41c73da gfs2: Stop messing with ip->i_rgd in the rlist code
>> * ee9c7f9ae3d4 gfs2: call ktime_get_coarse_real_ts64() directly
>> * 00251a16d7f9 gfs2: Minor clarification to __gfs2_punch_hole
>> * 9e1a9ecd13b9 gfs2: Don't withdraw under a spin lock
>> * f85c10e24ab9 gfs2: eliminate rs_inum and reduce the size of gfs2 inodes
>>
>> Commit e184fde6f3f5 "iomap: add private pointer to struct iomap" is on
>> xfs/iomap-4.19-merge. That was my initial merge from
>> xfs/iomap-4.19-merge, but it was a fast-forward so there is no merge
>> commit. I've then merged our iomap-write branch into for-next, with
>> two additional commits on top. Then comes the rest of
>> xfs/iomap-4.19-merge (that branch has moved ahead in the meantime),
>> again with two more commits on top.
>>
>> There are no rebased commits, you're looking at the exact same commits.
>
> The problem is that commits
>
>   a6d639da63ae fs: factor out a __generic_write_end helper
>
> to
>
>   806a1477b10a (xfs/iomap-4.19-merge) iomap: add inline data support
>
> have been rebased in the xfs tree from a base of v4.18-rc1 to
> v4.18-rc4, so that those patches now appear twice in linux-next where I
> have merged the gfs2 tree and the xfs tree.

Ah, I see now. It's xfs/for-next that contains those rebased commits
from xfs/iomap-4.19-merge.

> This has caused a few
> conflicts today as there are more changes to the same files affected by
> those commits in the xfs tree. to iomap_readpage_actor
>
> What should have happened is that those commits should not have been
> rebased, so either the xfs tree needs rebuilding to use the old
> commits, or your tree needs to be rebuilt using the new commits from
> the xfs tree.  This is why we do not like the rebasing of published
> trees (especially when a subset of the tree is shared with other
> developers).
>
> Also, if you are going to merge (part of) another tree you need to make
> sure that the other maintainer will not do a rebase of it (I assume
> that this was probably talked about).

Indeed, the idea of setting up xfs/iomap-4.19-merge was to have a
common base that xfs/for-next and gfs2/for-next could both merge from.
Darrick, could you please fix xfs/for-next?

Thanks a lot,
Andreas

Re: linux-next: duplicate patches in the gfs2 and xfs trees

2018-07-11 Thread Andreas Gruenbacher

Hi Stephen,

On 12 July 2018 at 06:41, Stephen Rothwell  wrote:
> Hi Andreas,
>
> On Thu, 12 Jul 2018 04:30:07 +0200 Andreas Gruenbacher  
> wrote:
>>
>> this is what I'm seeing (git log --pretty=oneline --abbrev-commit
>> --graph ^origin/master gfs2/for-next):
>>
>> * f79caf101801 (gfs2/for-next) gfs2: use iomap_readpage for blocksize
>> == PAGE_SIZE
>> * af58827ee500 gfs2: Use iomap for stuffed direct I/O reads
>> *   c38e838abe42 Merge branch 'iomap-4.19-merge' into linux-gfs2/for-next
>> |\
>> | * 806a1477b10a (xfs/iomap-4.19-merge) iomap: add inline data support
>> to iomap_readpage_actor
>> | * ec181f6782d8 iomap: support direct I/O to inline data
>> | * 09230435dffd iomap: refactor iomap_dio_actor
>> | * c03cea42149d iomap: add initial support for writes without buffer heads
>> | * 72b4daa24129 iomap: add an iomap-based readpage and readpages 
>> implementation
>> * | 9ab5aa4f4e10 gfs2: fallocate_chunk: Always initialize struct iomap
>> * | 2e2834ef1797 GFS2: Fix recovery issues for spectators
>> * |   5db0147b887e Merge branch 'iomap-write' into linux-gfs2/for-next
>> |\ \
>> | * | 025d0e7f73c6 (gfs2/iomap-write) gfs2: Remove gfs2_write_{begin,end}
>> | * | 967bcc91b044 gfs2: iomap direct I/O support
>> | * | bcfe94139a45 gfs2: gfs2_extent_length cleanup
>> | * | 64bc06bb32ee gfs2: iomap buffered write support
>> | * | d505a96a3b16 gfs2: Further iomap cleanups
>> | |/
>> | * e184fde6f3f5 iomap: add private pointer to struct iomap
>> | * 63899c6f8851 iomap: add a page_done callback
>> | * 19e0c58f6552 iomap: generic inline data handling
>> | * ebf00be37de3 iomap: complete partial direct I/O writes synchronously
>> | * 3d7b6b21f6c5 iomap: mark newly allocated buffer heads as new
>> | * a6d639da63ae fs: factor out a __generic_write_end helper
>> * 3beacef8093b fs: gfs2: Adding new return type vm_fault_t
>> * d80ff78468e4 gfs2: using posix_acl_xattr_size instead of posix_acl_to_xattr
>> * e904f3d486f9 gfs2: Don't reject a supposedly full bitmap if we have
>> blocks reserved
>> * d1475c07f7ce GFS2: rgrp free blocks used incorrectly
>> * b7eba890a228 gfs2: Eliminate redundant ip->i_rgd
>> * 03f8c41c73da gfs2: Stop messing with ip->i_rgd in the rlist code
>> * ee9c7f9ae3d4 gfs2: call ktime_get_coarse_real_ts64() directly
>> * 00251a16d7f9 gfs2: Minor clarification to __gfs2_punch_hole
>> * 9e1a9ecd13b9 gfs2: Don't withdraw under a spin lock
>> * f85c10e24ab9 gfs2: eliminate rs_inum and reduce the size of gfs2 inodes
>>
>> Commit e184fde6f3f5 "iomap: add private pointer to struct iomap" is on
>> xfs/iomap-4.19-merge. That was my initial merge from
>> xfs/iomap-4.19-merge, but it was a fast-forward so there is no merge
>> commit. I've then merged our iomap-write branch into for-next, with
>> two additional commits on top. Then comes the rest of
>> xfs/iomap-4.19-merge (that branch has moved ahead in the meantime),
>> again with two more commits on top.
>>
>> There are no rebased commits, you're looking at the exact same commits.
>
> The problem is that commits
>
>   a6d639da63ae fs: factor out a __generic_write_end helper
>
> to
>
>   806a1477b10a (xfs/iomap-4.19-merge) iomap: add inline data support
>
> have been rebased in the xfs tree from a base of v4.18-rc1 to
> v4.18-rc4, so that those patches now appear twice in linux-next where I
> have merged the gfs2 tree and the xfs tree.

Ah, I see now. It's xfs/for-next that contains those rebased commits
from xfs/iomap-4.19-merge.

> This has caused a few
> conflicts today as there are more changes to the same files affected by
> those commits in the xfs tree. to iomap_readpage_actor
>
> What should have happened is that those commits should not have been
> rebased, so either the xfs tree needs rebuilding to use the old
> commits, or your tree needs to be rebuilt using the new commits from
> the xfs tree.  This is why we do not like the rebasing of published
> trees (especially when a subset of the tree is shared with other
> developers).
>
> Also, if you are going to merge (part of) another tree you need to make
> sure that the other maintainer will not do a rebase of it (I assume
> that this was probably talked about).

Indeed, the idea of setting up xfs/iomap-4.19-merge was to have a
common base that xfs/for-next and gfs2/for-next could both merge from.
Darrick, could you please fix xfs/for-next?

Thanks a lot,
Andreas

Re: linux-next: duplicate patches in the gfs2 and xfs trees

2018-07-11 Thread Andreas Gruenbacher

Hi,

On 12 July 2018 at 03:37, Stephen Rothwell  wrote:
> Hi all,
>
> It looks like part of the xfs tree (the iomap-4.19-merge branch) that
> was merge into the gfs2 tree has been rebased and the gfs2 tree has not
> been updated to cope.  The rebase commits are exactly the same patches
> (though I didn't check to see if any of the commit messages had changed).

this is what I'm seeing (git log --pretty=oneline --abbrev-commit
--graph ^origin/master gfs2/for-next):

* f79caf101801 (gfs2/for-next) gfs2: use iomap_readpage for blocksize
== PAGE_SIZE
* af58827ee500 gfs2: Use iomap for stuffed direct I/O reads
*   c38e838abe42 Merge branch 'iomap-4.19-merge' into linux-gfs2/for-next
|\
| * 806a1477b10a (xfs/iomap-4.19-merge) iomap: add inline data support
to iomap_readpage_actor
| * ec181f6782d8 iomap: support direct I/O to inline data
| * 09230435dffd iomap: refactor iomap_dio_actor
| * c03cea42149d iomap: add initial support for writes without buffer heads
| * 72b4daa24129 iomap: add an iomap-based readpage and readpages implementation
* | 9ab5aa4f4e10 gfs2: fallocate_chunk: Always initialize struct iomap
* | 2e2834ef1797 GFS2: Fix recovery issues for spectators
* |   5db0147b887e Merge branch 'iomap-write' into linux-gfs2/for-next
|\ \
| * | 025d0e7f73c6 (gfs2/iomap-write) gfs2: Remove gfs2_write_{begin,end}
| * | 967bcc91b044 gfs2: iomap direct I/O support
| * | bcfe94139a45 gfs2: gfs2_extent_length cleanup
| * | 64bc06bb32ee gfs2: iomap buffered write support
| * | d505a96a3b16 gfs2: Further iomap cleanups
| |/
| * e184fde6f3f5 iomap: add private pointer to struct iomap
| * 63899c6f8851 iomap: add a page_done callback
| * 19e0c58f6552 iomap: generic inline data handling
| * ebf00be37de3 iomap: complete partial direct I/O writes synchronously
| * 3d7b6b21f6c5 iomap: mark newly allocated buffer heads as new
| * a6d639da63ae fs: factor out a __generic_write_end helper
* 3beacef8093b fs: gfs2: Adding new return type vm_fault_t
* d80ff78468e4 gfs2: using posix_acl_xattr_size instead of posix_acl_to_xattr
* e904f3d486f9 gfs2: Don't reject a supposedly full bitmap if we have
blocks reserved
* d1475c07f7ce GFS2: rgrp free blocks used incorrectly
* b7eba890a228 gfs2: Eliminate redundant ip->i_rgd
* 03f8c41c73da gfs2: Stop messing with ip->i_rgd in the rlist code
* ee9c7f9ae3d4 gfs2: call ktime_get_coarse_real_ts64() directly
* 00251a16d7f9 gfs2: Minor clarification to __gfs2_punch_hole
* 9e1a9ecd13b9 gfs2: Don't withdraw under a spin lock
* f85c10e24ab9 gfs2: eliminate rs_inum and reduce the size of gfs2 inodes

Commit e184fde6f3f5 "iomap: add private pointer to struct iomap" is on
xfs/iomap-4.19-merge. That was my initial merge from
xfs/iomap-4.19-merge, but it was a fast-forward so there is no merge
commit. I've then merged our iomap-write branch into for-next, with
two additional commits on top. Then comes the rest of
xfs/iomap-4.19-merge (that branch has moved ahead in the meantime),
again with two more commits on top.

There are no rebased commits, you're looking at the exact same commits.

I could redo for-next and add an explicit merge commit with one parent
instead of the fast-forward merge; would that be preferable?

Thanks,
Andreas

Re: linux-next: duplicate patches in the gfs2 and xfs trees

2018-07-11 Thread Andreas Gruenbacher

Hi,

On 12 July 2018 at 03:37, Stephen Rothwell  wrote:
> Hi all,
>
> It looks like part of the xfs tree (the iomap-4.19-merge branch) that
> was merge into the gfs2 tree has been rebased and the gfs2 tree has not
> been updated to cope.  The rebase commits are exactly the same patches
> (though I didn't check to see if any of the commit messages had changed).

this is what I'm seeing (git log --pretty=oneline --abbrev-commit
--graph ^origin/master gfs2/for-next):

* f79caf101801 (gfs2/for-next) gfs2: use iomap_readpage for blocksize
== PAGE_SIZE
* af58827ee500 gfs2: Use iomap for stuffed direct I/O reads
*   c38e838abe42 Merge branch 'iomap-4.19-merge' into linux-gfs2/for-next
|\
| * 806a1477b10a (xfs/iomap-4.19-merge) iomap: add inline data support
to iomap_readpage_actor
| * ec181f6782d8 iomap: support direct I/O to inline data
| * 09230435dffd iomap: refactor iomap_dio_actor
| * c03cea42149d iomap: add initial support for writes without buffer heads
| * 72b4daa24129 iomap: add an iomap-based readpage and readpages implementation
* | 9ab5aa4f4e10 gfs2: fallocate_chunk: Always initialize struct iomap
* | 2e2834ef1797 GFS2: Fix recovery issues for spectators
* |   5db0147b887e Merge branch 'iomap-write' into linux-gfs2/for-next
|\ \
| * | 025d0e7f73c6 (gfs2/iomap-write) gfs2: Remove gfs2_write_{begin,end}
| * | 967bcc91b044 gfs2: iomap direct I/O support
| * | bcfe94139a45 gfs2: gfs2_extent_length cleanup
| * | 64bc06bb32ee gfs2: iomap buffered write support
| * | d505a96a3b16 gfs2: Further iomap cleanups
| |/
| * e184fde6f3f5 iomap: add private pointer to struct iomap
| * 63899c6f8851 iomap: add a page_done callback
| * 19e0c58f6552 iomap: generic inline data handling
| * ebf00be37de3 iomap: complete partial direct I/O writes synchronously
| * 3d7b6b21f6c5 iomap: mark newly allocated buffer heads as new
| * a6d639da63ae fs: factor out a __generic_write_end helper
* 3beacef8093b fs: gfs2: Adding new return type vm_fault_t
* d80ff78468e4 gfs2: using posix_acl_xattr_size instead of posix_acl_to_xattr
* e904f3d486f9 gfs2: Don't reject a supposedly full bitmap if we have
blocks reserved
* d1475c07f7ce GFS2: rgrp free blocks used incorrectly
* b7eba890a228 gfs2: Eliminate redundant ip->i_rgd
* 03f8c41c73da gfs2: Stop messing with ip->i_rgd in the rlist code
* ee9c7f9ae3d4 gfs2: call ktime_get_coarse_real_ts64() directly
* 00251a16d7f9 gfs2: Minor clarification to __gfs2_punch_hole
* 9e1a9ecd13b9 gfs2: Don't withdraw under a spin lock
* f85c10e24ab9 gfs2: eliminate rs_inum and reduce the size of gfs2 inodes

Commit e184fde6f3f5 "iomap: add private pointer to struct iomap" is on
xfs/iomap-4.19-merge. That was my initial merge from
xfs/iomap-4.19-merge, but it was a fast-forward so there is no merge
commit. I've then merged our iomap-write branch into for-next, with
two additional commits on top. Then comes the rest of
xfs/iomap-4.19-merge (that branch has moved ahead in the meantime),
again with two more commits on top.

There are no rebased commits, you're looking at the exact same commits.

I could redo for-next and add an explicit merge commit with one parent
instead of the fast-forward merge; would that be preferable?

Thanks,
Andreas

Re: [PATCH] fs: iomap: Change return type to vm_fault_t

2018-07-03 Thread Andreas Gruenbacher

On 3 July 2018 at 23:39, Darrick J. Wong  wrote:
> On Mon, Jul 02, 2018 at 07:52:41PM +0200, Andreas Gruenbacher wrote:
>> On 2 July 2018 at 17:43, Souptick Joarder  wrote:
>> > Return type has been changed to vm_fault_t type for
>> > iomap_page_mkwrite().
>> >
>> > see commit 1c8f422059ae ("mm: change return type to
>> > vm_fault_t") for reference.
>> >
>> > Signed-off-by: Souptick Joarder 
>> > Reviewed-by: Matthew Wilcox 
>
> I don't recall Christoph [now cc'd] rescinding his NAK of the previous
> version of this patch[1].  Has he changed his mind since May?

Oops, a reply gone wrong -- I was meaning to reply to Souptick
Joarder's gfs2 change which I've added to the gfs2 for-next branch.
Not the iomap change. Sorry for the confusion.

> [1] https://spinics.net/lists/linux-fsdevel/msg126032.html
>
> Now granted I didn't have a problem with the code (and applied the xfs
> version to 4.18 after monitoring to satisfy myself that nothing
> particularly weird happened during 4.17) but seeing as most of the iomap
> changes have gone through hch's review and landed via the xfs tree...
>
>> > ---
>> >  fs/iomap.c| 2 +-
>> >  include/linux/iomap.h | 4 +++-
>> >  2 files changed, 4 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/fs/iomap.c b/fs/iomap.c
>> > index afd1635..58477ee 100644
>> > --- a/fs/iomap.c
>> > +++ b/fs/iomap.c
>> > @@ -443,7 +443,7 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, 
>> > unsigned bytes,
>> > return length;
>> >  }
>> >
>> > -int iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
>> > +vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct 
>> > iomap_ops *ops)
>> >  {
>> > struct page *page = vmf->page;
>> > struct inode *inode = file_inode(vmf->vma->vm_file);
>> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
>> > index 19a07de..666b717 100644
>> > --- a/include/linux/iomap.h
>> > +++ b/include/linux/iomap.h
>> > @@ -3,6 +3,7 @@
>> >  #define LINUX_IOMAP_H 1
>> >
>> >  #include 
>> > +#include 
>> >
>> >  struct fiemap_extent_info;
>> >  struct inode;
>> > @@ -88,7 +89,8 @@ int iomap_zero_range(struct inode *inode, loff_t pos, 
>> > loff_t len,
>> > bool *did_zero, const struct iomap_ops *ops);
>> >  int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
>> > const struct iomap_ops *ops);
>> > -int iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops);
>> > +vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
>> > +   const struct iomap_ops *ops);
>> >  int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>> > loff_t start, loff_t len, const struct iomap_ops *ops);
>> >  loff_t iomap_seek_hole(struct inode *inode, loff_t offset,
>> > --
>> > 1.9.1
>> >
>>
>> Added to for-next, thanks.
>
> ...this part caused me to sit up and say "Hey what?" :)
>
> Which tree is this? gfs2?
>
> --D
>
>>
>> Andreas

Re: [PATCH] fs: iomap: Change return type to vm_fault_t

2018-07-03 Thread Andreas Gruenbacher

On 3 July 2018 at 23:39, Darrick J. Wong  wrote:
> On Mon, Jul 02, 2018 at 07:52:41PM +0200, Andreas Gruenbacher wrote:
>> On 2 July 2018 at 17:43, Souptick Joarder  wrote:
>> > Return type has been changed to vm_fault_t type for
>> > iomap_page_mkwrite().
>> >
>> > see commit 1c8f422059ae ("mm: change return type to
>> > vm_fault_t") for reference.
>> >
>> > Signed-off-by: Souptick Joarder 
>> > Reviewed-by: Matthew Wilcox 
>
> I don't recall Christoph [now cc'd] rescinding his NAK of the previous
> version of this patch[1].  Has he changed his mind since May?

Oops, a reply gone wrong -- I was meaning to reply to Souptick
Joarder's gfs2 change which I've added to the gfs2 for-next branch.
Not the iomap change. Sorry for the confusion.

> [1] https://spinics.net/lists/linux-fsdevel/msg126032.html
>
> Now granted I didn't have a problem with the code (and applied the xfs
> version to 4.18 after monitoring to satisfy myself that nothing
> particularly weird happened during 4.17) but seeing as most of the iomap
> changes have gone through hch's review and landed via the xfs tree...
>
>> > ---
>> >  fs/iomap.c| 2 +-
>> >  include/linux/iomap.h | 4 +++-
>> >  2 files changed, 4 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/fs/iomap.c b/fs/iomap.c
>> > index afd1635..58477ee 100644
>> > --- a/fs/iomap.c
>> > +++ b/fs/iomap.c
>> > @@ -443,7 +443,7 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, 
>> > unsigned bytes,
>> > return length;
>> >  }
>> >
>> > -int iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
>> > +vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct 
>> > iomap_ops *ops)
>> >  {
>> > struct page *page = vmf->page;
>> > struct inode *inode = file_inode(vmf->vma->vm_file);
>> > diff --git a/include/linux/iomap.h b/include/linux/iomap.h
>> > index 19a07de..666b717 100644
>> > --- a/include/linux/iomap.h
>> > +++ b/include/linux/iomap.h
>> > @@ -3,6 +3,7 @@
>> >  #define LINUX_IOMAP_H 1
>> >
>> >  #include 
>> > +#include 
>> >
>> >  struct fiemap_extent_info;
>> >  struct inode;
>> > @@ -88,7 +89,8 @@ int iomap_zero_range(struct inode *inode, loff_t pos, 
>> > loff_t len,
>> > bool *did_zero, const struct iomap_ops *ops);
>> >  int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
>> > const struct iomap_ops *ops);
>> > -int iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops);
>> > +vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
>> > +   const struct iomap_ops *ops);
>> >  int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>> > loff_t start, loff_t len, const struct iomap_ops *ops);
>> >  loff_t iomap_seek_hole(struct inode *inode, loff_t offset,
>> > --
>> > 1.9.1
>> >
>>
>> Added to for-next, thanks.
>
> ...this part caused me to sit up and say "Hey what?" :)
>
> Which tree is this? gfs2?
>
> --D
>
>>
>> Andreas

Re: [PATCH] fs: iomap: Change return type to vm_fault_t

2018-07-02 Thread Andreas Gruenbacher

On 2 July 2018 at 17:43, Souptick Joarder  wrote:
> Return type has been changed to vm_fault_t type for
> iomap_page_mkwrite().
>
> see commit 1c8f422059ae ("mm: change return type to
> vm_fault_t") for reference.
>
> Signed-off-by: Souptick Joarder 
> Reviewed-by: Matthew Wilcox 
> ---
>  fs/iomap.c| 2 +-
>  include/linux/iomap.h | 4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index afd1635..58477ee 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -443,7 +443,7 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, 
> unsigned bytes,
> return length;
>  }
>
> -int iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
> +vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops 
> *ops)
>  {
> struct page *page = vmf->page;
> struct inode *inode = file_inode(vmf->vma->vm_file);
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 19a07de..666b717 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -3,6 +3,7 @@
>  #define LINUX_IOMAP_H 1
>
>  #include 
> +#include 
>
>  struct fiemap_extent_info;
>  struct inode;
> @@ -88,7 +89,8 @@ int iomap_zero_range(struct inode *inode, loff_t pos, 
> loff_t len,
> bool *did_zero, const struct iomap_ops *ops);
>  int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
> const struct iomap_ops *ops);
> -int iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops);
> +vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
> +   const struct iomap_ops *ops);
>  int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> loff_t start, loff_t len, const struct iomap_ops *ops);
>  loff_t iomap_seek_hole(struct inode *inode, loff_t offset,
> --
> 1.9.1
>

Added to for-next, thanks.

Andreas

Re: [PATCH] fs: iomap: Change return type to vm_fault_t

2018-07-02 Thread Andreas Gruenbacher

On 2 July 2018 at 17:43, Souptick Joarder  wrote:
> Return type has been changed to vm_fault_t type for
> iomap_page_mkwrite().
>
> see commit 1c8f422059ae ("mm: change return type to
> vm_fault_t") for reference.
>
> Signed-off-by: Souptick Joarder 
> Reviewed-by: Matthew Wilcox 
> ---
>  fs/iomap.c| 2 +-
>  include/linux/iomap.h | 4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/iomap.c b/fs/iomap.c
> index afd1635..58477ee 100644
> --- a/fs/iomap.c
> +++ b/fs/iomap.c
> @@ -443,7 +443,7 @@ static int iomap_dax_zero(loff_t pos, unsigned offset, 
> unsigned bytes,
> return length;
>  }
>
> -int iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops)
> +vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops 
> *ops)
>  {
> struct page *page = vmf->page;
> struct inode *inode = file_inode(vmf->vma->vm_file);
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 19a07de..666b717 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -3,6 +3,7 @@
>  #define LINUX_IOMAP_H 1
>
>  #include 
> +#include 
>
>  struct fiemap_extent_info;
>  struct inode;
> @@ -88,7 +89,8 @@ int iomap_zero_range(struct inode *inode, loff_t pos, 
> loff_t len,
> bool *did_zero, const struct iomap_ops *ops);
>  int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
> const struct iomap_ops *ops);
> -int iomap_page_mkwrite(struct vm_fault *vmf, const struct iomap_ops *ops);
> +vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
> +   const struct iomap_ops *ops);
>  int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> loff_t start, loff_t len, const struct iomap_ops *ops);
>  loff_t iomap_seek_hole(struct inode *inode, loff_t offset,
> --
> 1.9.1
>

Added to for-next, thanks.

Andreas

Re: How to get lockref patch merged?

2018-04-03 Thread Andreas Gruenbacher

On 3 April 2018 at 19:20, Greg Kroah-Hartman <gre...@linuxfoundation.org> wrote:
> On Tue, Apr 03, 2018 at 01:13:00PM +0200, Andreas Gruenbacher wrote:
>> Hello,
>>
>> what's the best way to get the following lockref patch merged? The
>> maintainers file doesn't list a maintainer. Should we go straight to
>> Linus? Does one of you want to take it?
>
> Andrew normally takes lib/ patches, but:
>
>> We have a gfs2 patch that depends on it.
>
> Then just include it in your patch series for gfs2 through the normal
> path you use for that.

Ok, will do. Thanks for your help.

Andreas

Re: How to get lockref patch merged?

2018-04-03 Thread Andreas Gruenbacher

On 3 April 2018 at 19:20, Greg Kroah-Hartman  wrote:
> On Tue, Apr 03, 2018 at 01:13:00PM +0200, Andreas Gruenbacher wrote:
>> Hello,
>>
>> what's the best way to get the following lockref patch merged? The
>> maintainers file doesn't list a maintainer. Should we go straight to
>> Linus? Does one of you want to take it?
>
> Andrew normally takes lib/ patches, but:
>
>> We have a gfs2 patch that depends on it.
>
> Then just include it in your patch series for gfs2 through the normal
> path you use for that.

Ok, will do. Thanks for your help.

Andreas

How to get lockref patch merged?

2018-04-03 Thread Andreas Gruenbacher

Hello,

what's the best way to get the following lockref patch merged? The
maintainers file doesn't list a maintainer. Should we go straight to
Linus? Does one of you want to take it?

We have a gfs2 patch that depends on it.

Thanks,
Andreas

-- Forwarded message --
From: Andreas Gruenbacher <agrue...@redhat.com>
Date: 29 March 2018 at 14:06
Subject: [PATCH v2 1/2] lockref: Add lockref_put_not_zero
To: cluster-de...@redhat.com
Cc: net...@vger.kernel.org, linux-kernel@vger.kernel.org, NeilBrown
<ne...@suse.com>, Thomas Graf <tg...@suug.ch>, Herbert Xu
<herb...@gondor.apana.org.au>, Tom Herbert <t...@quantonium.net>,
Andreas Gruenbacher <agrue...@redhat.com>


Put a lockref unless the lockref is dead or its count would become zero.
This is the same as lockref_put_or_lock except that the lock is never
left held.

Signed-off-by: Andreas Gruenbacher <agrue...@redhat.com>
---
 include/linux/lockref.h |  1 +
 lib/lockref.c   | 28 
 2 files changed, 29 insertions(+)

diff --git a/include/linux/lockref.h b/include/linux/lockref.h
index 2eac32095113..99f17cc8e163 100644
--- a/include/linux/lockref.h
+++ b/include/linux/lockref.h
@@ -37,6 +37,7 @@ struct lockref {
 extern void lockref_get(struct lockref *);
 extern int lockref_put_return(struct lockref *);
 extern int lockref_get_not_zero(struct lockref *);
+extern int lockref_put_not_zero(struct lockref *);
 extern int lockref_get_or_lock(struct lockref *);
 extern int lockref_put_or_lock(struct lockref *);

diff --git a/lib/lockref.c b/lib/lockref.c
index 47169ed7e964..3d468b53d4c9 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -80,6 +80,34 @@ int lockref_get_not_zero(struct lockref *lockref)
 }
 EXPORT_SYMBOL(lockref_get_not_zero);

+/**
+ * lockref_put_not_zero - Decrements count unless count <= 1 before decrement
+ * @lockref: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count would become zero
+ */
+int lockref_put_not_zero(struct lockref *lockref)
+{
+   int retval;
+
+   CMPXCHG_LOOP(
+   new.count--;
+   if (old.count <= 1)
+   return 0;
+   ,
+   return 1;
+   );
+
+   spin_lock(>lock);
+   retval = 0;
+   if (lockref->count > 1) {
+   lockref->count--;
+   retval = 1;
+   }
+   spin_unlock(>lock);
+   return retval;
+}
+EXPORT_SYMBOL(lockref_put_not_zero);
+
 /**
  * lockref_get_or_lock - Increments count unless the count is 0 or dead
  * @lockref: pointer to lockref structure
--
2.14.3

How to get lockref patch merged?

2018-04-03 Thread Andreas Gruenbacher

Hello,

what's the best way to get the following lockref patch merged? The
maintainers file doesn't list a maintainer. Should we go straight to
Linus? Does one of you want to take it?

We have a gfs2 patch that depends on it.

Thanks,
Andreas

-- Forwarded message --
From: Andreas Gruenbacher 
Date: 29 March 2018 at 14:06
Subject: [PATCH v2 1/2] lockref: Add lockref_put_not_zero
To: cluster-de...@redhat.com
Cc: net...@vger.kernel.org, linux-kernel@vger.kernel.org, NeilBrown
, Thomas Graf , Herbert Xu
, Tom Herbert ,
Andreas Gruenbacher 


Put a lockref unless the lockref is dead or its count would become zero.
This is the same as lockref_put_or_lock except that the lock is never
left held.

Signed-off-by: Andreas Gruenbacher 
---
 include/linux/lockref.h |  1 +
 lib/lockref.c   | 28 
 2 files changed, 29 insertions(+)

diff --git a/include/linux/lockref.h b/include/linux/lockref.h
index 2eac32095113..99f17cc8e163 100644
--- a/include/linux/lockref.h
+++ b/include/linux/lockref.h
@@ -37,6 +37,7 @@ struct lockref {
 extern void lockref_get(struct lockref *);
 extern int lockref_put_return(struct lockref *);
 extern int lockref_get_not_zero(struct lockref *);
+extern int lockref_put_not_zero(struct lockref *);
 extern int lockref_get_or_lock(struct lockref *);
 extern int lockref_put_or_lock(struct lockref *);

diff --git a/lib/lockref.c b/lib/lockref.c
index 47169ed7e964..3d468b53d4c9 100644
--- a/lib/lockref.c
+++ b/lib/lockref.c
@@ -80,6 +80,34 @@ int lockref_get_not_zero(struct lockref *lockref)
 }
 EXPORT_SYMBOL(lockref_get_not_zero);

+/**
+ * lockref_put_not_zero - Decrements count unless count <= 1 before decrement
+ * @lockref: pointer to lockref structure
+ * Return: 1 if count updated successfully or 0 if count would become zero
+ */
+int lockref_put_not_zero(struct lockref *lockref)
+{
+   int retval;
+
+   CMPXCHG_LOOP(
+   new.count--;
+   if (old.count <= 1)
+   return 0;
+   ,
+   return 1;
+   );
+
+   spin_lock(>lock);
+   retval = 0;
+   if (lockref->count > 1) {
+   lockref->count--;
+   retval = 1;
+   }
+   spin_unlock(>lock);
+   return retval;
+}
+EXPORT_SYMBOL(lockref_put_not_zero);
+
 /**
  * lockref_get_or_lock - Increments count unless the count is 0 or dead
  * @lockref: pointer to lockref structure
--
2.14.3

Re: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Andreas Gruenbacher

On 29 March 2018 at 17:41, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Thu, Mar 29, 2018 at 03:15:54PM +0200, Andreas Gruenbacher wrote:
>>
>> For all I know, Neil's latest plan is to get rhashtable_walk_peek
>> replaced and removed because it is unfixable. This patch removes the
>> one and only user.
>
> His latest patch makes rhashtable_walk_peek stable in the face of
> removals.
>
> https://patchwork.ozlabs.org/patch/892534/

Ok, I can slightly update my patch description. The problem still
remains that glocks can be deleted from the rhashtable between
stop/start, and that needs to be fixed in gfs2. Once that's done,
keeping track of the current glock comes for free and we won't need
rhashtable_walk_peek anymore.

Should rhashtable_walk_peek be kept around even if there are no more
users? I have my doubts.

Thanks,
Andreas

Re: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Andreas Gruenbacher

On 29 March 2018 at 17:41, Herbert Xu  wrote:
> On Thu, Mar 29, 2018 at 03:15:54PM +0200, Andreas Gruenbacher wrote:
>>
>> For all I know, Neil's latest plan is to get rhashtable_walk_peek
>> replaced and removed because it is unfixable. This patch removes the
>> one and only user.
>
> His latest patch makes rhashtable_walk_peek stable in the face of
> removals.
>
> https://patchwork.ozlabs.org/patch/892534/

Ok, I can slightly update my patch description. The problem still
remains that glocks can be deleted from the rhashtable between
stop/start, and that needs to be fixed in gfs2. Once that's done,
keeping track of the current glock comes for free and we won't need
rhashtable_walk_peek anymore.

Should rhashtable_walk_peek be kept around even if there are no more
users? I have my doubts.

Thanks,
Andreas

Re: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Andreas Gruenbacher

On 29 March 2018 at 14:35, Herbert Xu <herb...@gondor.apana.org.au> wrote:
> On Thu, Mar 29, 2018 at 02:06:10PM +0200, Andreas Gruenbacher wrote:
>> Here's a second version of the patch (now a patch set) to eliminate
>> rhashtable_walk_peek in gfs2.
>>
>> The first patch introduces lockref_put_not_zero, the inverse of
>> lockref_get_not_zero.
>>
>> The second patch eliminates rhashtable_walk_peek in gfs2.  In
>> gfs2_glock_iter_next, the new lockref function from patch one is used to
>> drop a lockref count as long as the count doesn't drop to zero.  This is
>> almost always the case; if there is a risk of dropping the last
>> reference, we must defer that to a work queue because dropping the last
>> reference may sleep.
>
> In light of Neil's latest patch, do we still need this?

For all I know, Neil's latest plan is to get rhashtable_walk_peek
replaced and removed because it is unfixable. This patch removes the
one and only user.

Thanks,
Andreas

Re: [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Andreas Gruenbacher

On 29 March 2018 at 14:35, Herbert Xu  wrote:
> On Thu, Mar 29, 2018 at 02:06:10PM +0200, Andreas Gruenbacher wrote:
>> Here's a second version of the patch (now a patch set) to eliminate
>> rhashtable_walk_peek in gfs2.
>>
>> The first patch introduces lockref_put_not_zero, the inverse of
>> lockref_get_not_zero.
>>
>> The second patch eliminates rhashtable_walk_peek in gfs2.  In
>> gfs2_glock_iter_next, the new lockref function from patch one is used to
>> drop a lockref count as long as the count doesn't drop to zero.  This is
>> almost always the case; if there is a risk of dropping the last
>> reference, we must defer that to a work queue because dropping the last
>> reference may sleep.
>
> In light of Neil's latest patch, do we still need this?

For all I know, Neil's latest plan is to get rhashtable_walk_peek
replaced and removed because it is unfixable. This patch removes the
one and only user.

Thanks,
Andreas

Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Andreas Gruenbacher

On 29 March 2018 at 14:24, Steven Whitehouse  wrote:
> Hi,
>
> Can we solve the problem another way, by not taking refs on the glocks when
> we are iterating over them for the debugfs files? I assume that is the main
> issue here.
>
> We didn't used to take refs since the rcu locking was enough during the walk
> itself. We used to only keep track of the hash bucket and offset within the
> bucket when we dropped the rcu lock between calls to the iterator. I may
> have lost track of why that approach did not work?

That doesn't work because when a glock doesn't fit into one read, we
need to make sure that the next read will continue with the same glock
or else we'll end up with a corrupted dump. And rhashtable_walk_peek
cannot guarantee that.

I've done some minimal performance testing and the additional ref
taking only impacted the performance in the 10% range or less, so it
doesn't really matter.

Andreas

Re: [Cluster-devel] [PATCH v2 0/2] gfs2: Stop using rhashtable_walk_peek

2018-03-29 Thread Andreas Gruenbacher

On 29 March 2018 at 14:24, Steven Whitehouse  wrote:
> Hi,
>
> Can we solve the problem another way, by not taking refs on the glocks when
> we are iterating over them for the debugfs files? I assume that is the main
> issue here.
>
> We didn't used to take refs since the rcu locking was enough during the walk
> itself. We used to only keep track of the hash bucket and offset within the
> bucket when we dropped the rcu lock between calls to the iterator. I may
> have lost track of why that approach did not work?

That doesn't work because when a glock doesn't fit into one read, we
need to make sure that the next read will continue with the same glock
or else we'll end up with a corrupted dump. And rhashtable_walk_peek
cannot guarantee that.

I've done some minimal performance testing and the additional ref
taking only impacted the performance in the 10% range or less, so it
doesn't really matter.

Andreas

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 2694 matches

Mail list logo