Re: [PATCH 4/4] locks: map correct ino/dev pairs when exporting to userspace

2018-08-01 Thread Jeff Layton
On Tue, 2018-07-31 at 14:10 -0700, Mark Fasheh wrote:
> /proc/locks does not always print the correct inode number/device pair.
> Update lock_get_status() to use vfs_map_unique_ino_dev() to get the real,
> unique values for userspace.
> 
> Signed-off-by: Mark Fasheh 
> ---
>  fs/locks.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/locks.c b/fs/locks.c
> index db7b6917d9c5..3a012df87fd8 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -2621,6 +2621,7 @@ static void lock_get_status(struct seq_file *f, struct 
> file_lock *fl,
>   loff_t id, char *pfx)
>  {
>   struct inode *inode = NULL;
> + struct dentry *dentry;
>   unsigned int fl_pid;
>   struct pid_namespace *proc_pidns = file_inode(f->file)->i_sb->s_fs_info;
>  
> @@ -2633,8 +2634,10 @@ static void lock_get_status(struct seq_file *f, struct 
> file_lock *fl,
>   if (fl_pid == 0)
>   return;
>  
> - if (fl->fl_file != NULL)
> + if (fl->fl_file != NULL) {
>   inode = locks_inode(fl->fl_file);
> + dentry = file_dentry(fl->fl_file);
> + }
>  
>   seq_printf(f, "%lld:%s ", id, pfx);
>   if (IS_POSIX(fl)) {
> @@ -2681,10 +2684,13 @@ static void lock_get_status(struct seq_file *f, 
> struct file_lock *fl,
>  : (fl->fl_type == F_WRLCK) ? "WRITE" : "READ ");
>   }
>   if (inode) {
> + __u64 ino;
> + dev_t dev;
> +
> + vfs_map_unique_ino_dev(dentry, , );

This code is under a spinlock (blocked_locks_lock or ctx->flc_lock). I
don't think it'll be ok to call ->getattr while holding a spinlock.

>   /* userspace relies on this representation of dev_t */
>   seq_printf(f, "%d %02x:%02x:%ld ", fl_pid,
> - MAJOR(inode->i_sb->s_dev),
> -         MINOR(inode->i_sb->s_dev), inode->i_ino);
> + MAJOR(dev), MINOR(dev), inode->i_ino);
>   } else {
>   seq_printf(f, "%d :0 ", fl_pid);
>   }

-- 
Jeff Layton 

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] iversion: make inode_cmp_iversion{+raw} return bool instead of s64

2018-01-31 Thread Jeff Layton
On Wed, 2018-01-31 at 08:46 -0800, Linus Torvalds wrote:
> On Wed, Jan 31, 2018 at 4:29 AM, Jeff Layton <jlay...@kernel.org> wrote:
> > 
> > Do you mind just taking it directly? I don't have anything else queued
> > up for this cycle.
> 
> Done.
> 

Thanks...and also many thanks for spotting the original issue. I agree
that this makes it much harder for the callers to get things wrong (and
is probably much more efficient on some arches, as Ted pointed out).

> I wonder if "false for same, true for different" calling convention
> makes much sense, but it matches the old "0 for same" so obviously
> makes for a smaller diff.
> 
> If it ever ends up confusing people, maybe the sense of that function
> should be reversed, and the name changed to something like
> "same_inode_version()" or something.
> 
> But at least for now the situation seems ok to me,
> 

G. Baroncelli suggested changing the name too, so maybe we should just
go ahead and do it. Let me think on what the best approach is and I may
try to send another patch or PR before the end of the merge window.

Cheers,
-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] iversion: make inode_cmp_iversion{+raw} return bool instead of s64

2018-01-31 Thread Jeff Layton
On Tue, 2018-01-30 at 12:53 -0800, Linus Torvalds wrote:
> Ack. Should I expect this in a future pull request, or take it directly?
> 
> There's no hurry about this, since none of the existing users of that
> function actually do anything but test the return value against zero,
> and nobody saves it into anything but a "bool" (which has magical
> casting properties and does not lose upper bits).
> 
>   Linus

Do you mind just taking it directly? I don't have anything else queued
up for this cycle.

Thanks,
-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] iversion: make inode_cmp_iversion{+raw} return bool instead of s64

2018-01-30 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

As Linus points out:

The inode_cmp_iversion{+raw}() functions are pure and utter crap.

Why?

You say that they return 0/negative/positive, but they do so in a
completely broken manner. They return that ternary value as the
sequence number difference in a 's64', which means that if you
actually care about that ternary value, and do the *sane* thing that
the kernel-doc of the function implies is the right thing, you would
do

int cmp = inode_cmp_iversion(inode, old);
if (cmp < 0 ...

and as a result you get code that looks sane, but that doesn't
actually *WORK* right.

Since none of the callers actually care about the ternary value here,
convert the inode_cmp_iversion{+raw} functions to just return a boolean
value (false for matching, true for non-matching).

This matches the existing use of these functions just fine, and makes it
simple to convert them to return a ternary value in the future if we
grow callers that need it.

With this change we can also reimplement inode_cmp_iversion in a simpler
way using inode_peek_iversion.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 include/linux/iversion.h | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index 858463fca249..3d2fd06495ec 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -309,13 +309,13 @@ inode_query_iversion(struct inode *inode)
  * @inode: inode to check
  * @old: old value to check against its i_version
  *
- * Compare the current raw i_version counter with a previous one. Returns 0 if
- * they are the same or non-zero if they are different.
+ * Compare the current raw i_version counter with a previous one. Returns false
+ * if they are the same or true if they are different.
  */
-static inline s64
+static inline bool
 inode_cmp_iversion_raw(const struct inode *inode, u64 old)
 {
-   return (s64)inode_peek_iversion_raw(inode) - (s64)old;
+   return inode_peek_iversion_raw(inode) != old;
 }
 
 /**
@@ -323,19 +323,15 @@ inode_cmp_iversion_raw(const struct inode *inode, u64 old)
  * @inode: inode to check
  * @old: old value to check against its i_version
  *
- * Compare an i_version counter with a previous one. Returns 0 if they are
- * the same, a positive value if the one in the inode appears newer than @old,
- * and a negative value if @old appears to be newer than the one in the
- * inode.
+ * Compare an i_version counter with a previous one. Returns false if they are
+ * the same, and true if they are different.
  *
  * Note that we don't need to set the QUERIED flag in this case, as the value
  * in the inode is not being recorded for later use.
  */
-
-static inline s64
+static inline bool
 inode_cmp_iversion(const struct inode *inode, u64 old)
 {
-   return (s64)(inode_peek_iversion_raw(inode) & ~I_VERSION_QUERIED) -
-  (s64)(old << I_VERSION_QUERIED_SHIFT);
+   return inode_peek_iversion(inode) != old;
 }
 #endif
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] iversion: make inode_cmp_iversion{+raw} return bool instead of s64

2018-01-30 Thread Jeff Layton
On Tue, 2018-01-30 at 17:50 +, Trond Myklebust wrote:
> On Tue, 2018-01-30 at 12:31 -0500, Jeff Layton wrote:
> > From: Jeff Layton <jlay...@redhat.com>
> > 
> > As Linus points out:
> > 
> > The inode_cmp_iversion{+raw}() functions are pure and utter crap.
> > 
> > Why?
> > 
> > You say that they return 0/negative/positive, but they do so in a
> > completely broken manner. They return that ternary value as the
> > sequence number difference in a 's64', which means that if you
> > actually care about that ternary value, and do the *sane* thing
> > that
> > the kernel-doc of the function implies is the right thing, you
> > would
> > do
> > 
> > int cmp = inode_cmp_iversion(inode, old);
> > if (cmp < 0 ...
> > 
> > and as a result you get code that looks sane, but that doesn't
> > actually *WORK* right.
> > 
> > Since none of the callers actually care about the ternary value here,
> > convert the inode_cmp_iversion{+raw} functions to just return a
> > boolean
> > value (false for matching, true for non-matching).
> > 
> > This matches the existing use of these functions just fine, and makes
> > it
> > simple to convert them to return a ternary value in the future if we
> > grow callers that need it.
> > 
> > Signed-off-by: Jeff Layton <jlay...@redhat.com>
> > ---
> >  include/linux/iversion.h | 20 +---
> >  1 file changed, 9 insertions(+), 11 deletions(-)
> > 
> > diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> > index 858463fca249..ace32775c5f0 100644
> > --- a/include/linux/iversion.h
> > +++ b/include/linux/iversion.h
> > @@ -309,13 +309,13 @@ inode_query_iversion(struct inode *inode)
> >   * @inode: inode to check
> >   * @old: old value to check against its i_version
> >   *
> > - * Compare the current raw i_version counter with a previous one.
> > Returns 0 if
> > - * they are the same or non-zero if they are different.
> > + * Compare the current raw i_version counter with a previous one.
> > Returns false
> > + * if they are the same or true if they are different.
> >   */
> > -static inline s64
> > +static inline bool
> >  inode_cmp_iversion_raw(const struct inode *inode, u64 old)
> >  {
> > -   return (s64)inode_peek_iversion_raw(inode) - (s64)old;
> > +   return inode_peek_iversion_raw(inode) != old;
> >  }
> >  
> >  /**
> > @@ -323,19 +323,17 @@ inode_cmp_iversion_raw(const struct inode
> > *inode, u64 old)
> >   * @inode: inode to check
> >   * @old: old value to check against its i_version
> >   *
> > - * Compare an i_version counter with a previous one. Returns 0 if
> > they are
> > - * the same, a positive value if the one in the inode appears newer
> > than @old,
> > - * and a negative value if @old appears to be newer than the one in
> > the
> > - * inode.
> > + * Compare an i_version counter with a previous one. Returns false
> > if they are
> > + * the same, and true if they are different.
> >   *
> >   * Note that we don't need to set the QUERIED flag in this case, as
> > the value
> >   * in the inode is not being recorded for later use.
> >   */
> >  
> > -static inline s64
> > +static inline bool
> >  inode_cmp_iversion(const struct inode *inode, u64 old)
> >  {
> > -   return (s64)(inode_peek_iversion_raw(inode) &
> > ~I_VERSION_QUERIED) -
> > -  (s64)(old << I_VERSION_QUERIED_SHIFT);
> > +   return (inode_peek_iversion_raw(inode) & ~I_VERSION_QUERIED)
> > !=
> > +   (old << I_VERSION_QUERIED_SHIFT);
> >  }
> 
> Is there any reason why this couldn't just use inode_peek_iversion()
> instead of having to both mask the output from
> inode_peek_iversion_raw() and shift 'old'?

None at all. I'll send a v2.
-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] inode->i_version rework for v4.16

2018-01-30 Thread Jeff Layton
On Mon, 2018-01-29 at 13:50 -0800, Linus Torvalds wrote:
> On Mon, Jan 29, 2018 at 4:26 AM, Jeff Layton <jlay...@redhat.com> wrote:
> > 
> > This pile of patches is a rework of the inode->i_version field. We have
> > traditionally incremented that field on every inode data or metadata
> > change. Typically this increment needs to be logged on disk even when
> > nothing else has changed, which is rather expensive.
> 
> Hmm. I have pulled this, but it is really really broken in one place,
> to the degree that I always went "no, I won't pull this garbage".
> 
> But the breakage is potential, not actual, and can be fixed trivially,
> so I'll let it slide - but I do require it to be fixed. And I require
> people to *think* about it.
> 
> So what's to horribly horribly wrong?
> 
> The inode_cmp_iversion{+raw}() functions are pure and utter crap.
> 
> Why?
> 
> You say that they return 0/negative/positive, but they do so in a
> completely broken manner. They return that ternary value as the
> sequence number difference in a 's64', which means that if you
> actually care about that ternary value, and do the *sane* thing that
> the kernel-doc of the function implies is the right thing, you would
> do
> 
> int cmp = inode_cmp_iversion(inode, old);
> 
> if (cmp < 0 ...
> 
> and as a result you get code that looks sane, but that doesn't
> actually *WORK* right.
> 

My intent here was to have this handle wraparound using the same sort of
method that the time_before/time_after macros do. Obviously, I didn't
document that well.

I want to make sure I understand what's actually broken here thoug. Is
it only broken when the two values are more than 2**63 apart, or is
there something else more fundamentally wrong here?

> To make it even worse, it will actually work in practice by accident
> in 99.9% of all cases, so now you have
> 
>  (a) subtly buggy code
>  (b) that looks fine
>  (c) and that works in testing
> 
> which is just about the worst possible case for any code. The
> interface is simply garbage that encourages bugs.
> 
> And the bug wouldn't be in the user, the bug would be in this code you
> just sent me. The interface is simply wrong.
> 
> So this absolutely needs to be fixed. I see two fixes:
> 
>  - just return a boolean. That's all that any current user actually
> wants, so the ternary value seems pointless.
> 
>  - make it return an 'int', and not just any int, but -1/0/1. That way
> there is no worry about uses, and if somebody *really* cares about the
> ternary value, they can now use a "switch" statement to get it
> (alternatively, make it return an enum, but whatever).
> 
> That "ternary" function that has 18446744069414584320 incorrect return
> values really is unacceptable.
> 

I think I'll just make it return a boolean value like you suggested
first. I'll send a patch to fix it once I've done some basic testing
with it.

Many thanks,
--
Jeff Layton <jlay...@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] inode->i_version rework for v4.16

2018-01-29 Thread Jeff Layton
The following changes since commit 50c4c4e268a2d7a3e58ebb698ac74da0de40ae36:

  Linux 4.15-rc3 (2017-12-10 17:56:26 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git 
tags/iversion-v4.16-1

for you to fetch changes up to f02a9ad1f15daf4378afeda025a53455f72645dd:

  fs: handle inode->i_version more efficiently (2018-01-29 06:42:21 -0500)


Hi Linus,

This pile of patches is a rework of the inode->i_version field. We have
traditionally incremented that field on every inode data or metadata
change. Typically this increment needs to be logged on disk even when
nothing else has changed, which is rather expensive.

It turns out though that none of the consumers of that field actually
require this behavior. The only real requirement for all of them is that
it be different iff the inode has changed since the last time the field
was checked.

Given that, we can optimize away most of the i_version increments and
avoid dirtying inode metadata when the only change is to the i_version 
and no one is querying it. Queries of the i_version field are rather
rare, so we can help write performance under many common workloads.

This patch series converts existing accesses of the i_version field to a
new API, and then converts all of the in-kernel filesystems to use it.
The last patch in the series then converts the backend implementation to
a scheme that optimizes away a large portion of the metadata updates
when no one is looking at it.

In my own testing this series significantly helps performance with small
I/O sizes. I also got this email for Christmas this year from the kernel
test robot (a 244% r/w bandwidth improvement with XFS over DAX, with 4k
writes):

https://lkml.org/lkml/2017/12/25/8

A few of the earlier patches in this pile are also flowing to you via
other trees (mm, integrity, and nfsd trees in particular), so there may
be some minor merge conflicts here. Hopefully they're trivial to
resolve, but let me know if there are problems.

Thanks!
----
Jeff Layton (21):
  lustre: don't set f_version in ll_readdir
  ntfs: remove i_version handling
  fs: new API for handling inode->i_version
  fs: don't take the i_lock in inode_inc_iversion
  fat: convert to new i_version API
  affs: convert to new i_version API
  afs: convert to new i_version API
  btrfs: convert to new i_version API
  exofs: switch to new i_version API
  ext2: convert to new i_version API
  ext4: convert to new i_version API
  nfs: convert to new i_version API
  nfsd: convert to new i_version API
  ocfs2: convert to new i_version API
  ufs: use new i_version API
  xfs: convert to new i_version API
  IMA: switch IMA over to new i_version API
  fs: only set S_VERSION when updating times if necessary
  xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing
  btrfs: only dirty the inode in btrfs_update_time if something was changed
  fs: handle inode->i_version more efficiently

Sascha Hauer (1):
  ima: Use i_version only when filesystem supports it

 drivers/staging/lustre/lustre/llite/dir.c |   3 -
 fs/affs/amigaffs.c|   5 +-
 fs/affs/dir.c |   5 +-
 fs/affs/super.c   |   3 +-
 fs/afs/fsclient.c |   3 +-
 fs/afs/inode.c|   5 +-
 fs/btrfs/delayed-inode.c  |   7 +-
 fs/btrfs/file.c   |   1 +
 fs/btrfs/inode.c  |  12 +-
 fs/btrfs/ioctl.c  |   1 +
 fs/btrfs/tree-log.c   |   4 +-
 fs/btrfs/xattr.c  |   1 +
 fs/exofs/dir.c|   9 +-
 fs/exofs/super.c  |   3 +-
 fs/ext2/dir.c |   9 +-
 fs/ext2/super.c   |   5 +-
 fs/ext4/dir.c |   9 +-
 fs/ext4/inline.c  |   7 +-
 fs/ext4/inode.c   |  13 +-
 fs/ext4/ioctl.c   |   3 +-
 fs/ext4/namei.c   |   5 +-
 fs/ext4/super.c   |   3 +-
 fs/ext4/xattr.c   |   5 +-
 fs/fat/dir.c  |   3 +-
 fs/fat/inode.c|   9 +-
 fs/fat/namei_msdos.c  |   7 +-
 fs/fat/namei_vfat.c   |  22 +-
 fs/inode.c|  11 +-
 fs/nfs/delegation.c   |   3 +-
 fs/nfs/fscache-index.c|   5 +-
 fs/nfs/inode.c|  18 +-
 fs/nfs/nfs4proc.c |  10 +-
 fs/nfs/nfstrace.h |   5 +-
 fs

Re: [PATCH v5 02/19] fs: don't take the i_lock in inode_inc_iversion

2018-01-19 Thread Jeff Layton
On Thu, 2018-01-18 at 16:45 -0500, J. Bruce Fields wrote:
> On Tue, Jan 09, 2018 at 09:10:42AM -0500, Jeff Layton wrote:
> > From: Jeff Layton <jlay...@redhat.com>
> > 
> > The rationale for taking the i_lock when incrementing this value is
> > lost in antiquity. The readers of the field don't take it (at least
> > not universally), so my assumption is that it was only done here to
> > serialize incrementors.
> > 
> > If that is indeed the case, then we can drop the i_lock from this
> > codepath and treat it as a atomic64_t for the purposes of
> > incrementing it. This allows us to use inode_inc_iversion without
> > any danger of lock inversion.
> > 
> > Note that the read side is not fetched atomically with this change.
> > The assumption here is that that is not a critical issue since the
> > i_version is not fully synchronized with anything else anyway.
> 
> So I guess it's theoretically possible that e.g. if you read while it's
> incrementing from 2^32-1 to 2^32 you could read 0, 1, or 2^32+1?
>
> If so then you could see an i_version value reused and incorrectly
> decide that a file hadn't changed.
> 
> But it's such a tiny case, and I think you convert this to atomic64_t
> later anyway, so, whatever.
> 
> --b.
> 

Shrug...we have that problem with the spinlock in place too. The bottom
line is that reads of this value are not serialized with the increment
at all.

I'm not 100% thrilled with this patch, but I think it's probably better
not to add the i_lock all over the place, even as an interim step in
cleaning this stuff up.

The good news here (as you mention) is that this nastiness gets cleaned
up in the last patch when we convert the thing to an atomic64_t.


> > 
> > Signed-off-by: Jeff Layton <jlay...@redhat.com>
> > ---
> >  include/linux/iversion.h | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> > index d09cc3a08740..5ad9eaa3a9b0 100644
> > --- a/include/linux/iversion.h
> > +++ b/include/linux/iversion.h
> > @@ -104,12 +104,13 @@ inode_set_iversion_queried(struct inode *inode, u64 
> > new)
> >  static inline bool
> >  inode_maybe_inc_iversion(struct inode *inode, bool force)
> >  {
> > -   spin_lock(>i_lock);
> > -   inode->i_version++;
> > -   spin_unlock(>i_lock);
> > +   atomic64_t *ivp = (atomic64_t *)>i_version;
> > +
> > +   atomic64_inc(ivp);
> > return true;
> >  }
> >  
> > +
> >  /**
> >   * inode_inc_iversion - forcibly increment i_version
> >   * @inode: inode that needs to be updated
> > -- 
> > 2.14.3

-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 01/19] fs: new API for handling inode->i_version

2018-01-18 Thread Jeff Layton
On Thu, 2018-01-18 at 16:38 -0500, J. Bruce Fields wrote:
> On Tue, Jan 09, 2018 at 09:10:41AM -0500, Jeff Layton wrote:
> > --- /dev/null
> > +++ b/include/linux/iversion.h
> > @@ -0,0 +1,236 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _LINUX_IVERSION_H
> > +#define _LINUX_IVERSION_H
> > +
> > +#include 
> > +
> > +/*
> > + * The change attribute (i_version) is mandated by NFSv4 and is mostly for
> > + * knfsd, but is also used for other purposes (e.g. IMA). The i_version 
> > must
> > + * appear different to observers if there was a change to the inode's data 
> > or
> > + * metadata since it was last queried.
> > + *
> > + * Observers see the i_version as a 64-bit number that never changes.
> 
> I don't understand that sentence.
> 

That's because it's utter nonsense. I noticed that the other day and
fixed it in my tree. It now reads:

* Observers see the i_version as a 64-bit number that never decreases.

> > If it
> > + * remains the same since it was last checked, then nothing has changed in 
> > the
> > + * inode. If it's different then something has changed. Observers cannot 
> > infer
> > + * anything about the nature or magnitude of the changes from the value, 
> > only
> > + * that the inode has changed in some fashion.
> 
> As we've discussed before, there may be brief windows where the first
> two statements aren't quite correct.  I think that would be worth a
> mention if we can keep it concise.  Maybe add something like this?:
> 
>   It may be impractical for filesystems to keep i_version updates
>   atomic with respect to the changes that cause them.  They
>   should, however, guarantee that i_version updates are never
>   visible before the changes that caused them.  Also, i_version
>   updates should never be delayed longer than it takes the
>   original change to reach disk.

That makes sense. I added it in pretty much verbatim. I think we mostly
follow the latter should already.

> Or maybe those details are best left to documentation on the relevant
> parts of the api below (maybe inode_maybe_inc_iversion?).
> 
> I dunno if it's also worth mentioning that nfsd doesn't actually use the
> raw i_version--it mixes it with ctime to prevent i_version reuse after
> reboot.  Presumably that doesn't matter to IMA since it doesn't compare
> i_version across reboots.
> 

I think I won't document that here. nfsd is a consumer of i_version.
What it does with it is sort of its own business. Might be good to have
a comment blurb in the nfsd code about it though.

> The documentation here is all very helpful, thanks.

Thanks for all of the suggestions so far!
-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 00/19] fs: rework and optimize i_version handling in filesystems

2018-01-12 Thread Jeff Layton
On Thu, 2018-01-11 at 12:23 -0800, Liu Bo wrote:
> On Tue, Jan 09, 2018 at 09:10:40AM -0500, Jeff Layton wrote:
> > From: Jeff Layton <jlay...@redhat.com>
> > 
> > v5:
> > - don't corrupt refcounts stashed in i_version of ext4 xattr inodes
> > - add raw variants of inc and cmp functions, and have nfs use them
> > 
> > v4:
> > - fix SB_LAZYTIME handling in generic_update_time
> > - add memory barriers to patch to convert i_version field to atomic64_t
> > 
> > v3:
> > - move i_version handling functions to new header file
> > - document that the kernel-managed i_version implementation will appear to
> >   increase over time
> > - fix inode_cmp_iversion to handle wraparound correctly
> > 
> > v2:
> > - xfs should use inode_peek_iversion instead of inode_peek_iversion_raw
> > - rework file_update_time patch
> > - don't dirty inode when only S_ATIME is set and SB_LAZYTIME is enabled
> > - better comments and documentation
> > 
> > I think this is now approaching merge readiness.
> > 
> > Special thanks to Jan Kara and Dave Chinner who helped me tighten up the
> > memory barriers in the final patch, and Krzysztof Kozlowski for help in
> > tracking down a set of bugs in the NFS client patch.
> > 
> > tl;dr: I think we can greatly reduce the cost of the inode->i_version
> > counter, by exploiting the fact that we don't need to increment it if no
> > one is looking at it. We can also clean up the code to prepare to
> > eventually expose this value via statx().
> > 
> > Note that this set relies on a few patches that are in other trees. The
> > full stack that I've been testing with is here:
> > 
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git/log/?h=iversion
> > 
> > The inode->i_version field is supposed to be a value that changes
> > whenever there is any data or metadata change to the inode. Some
> > filesystems use it internally to detect directory changes during
> > readdir. knfsd will use it if the filesystem has MS_I_VERSION set. IMA
> > will also use it to optimize away some remeasurement if it's available.
> > NFS and AFS just use it to store an opaque change attribute from the
> > server.
> > 
> > Only btrfs, ext4, and xfs increment it for data changes. Because of
> > this, these filesystems must log the inode to disk whenever the
> > i_version counter changes. That has a non-zero performance impact,
> > especially on write-heavy workloads, because we end up dirtying the
> > inode metadata on every write, not just when the times change.
> > 
> > It turns out though that none of these users of i_version require that
> > it change on every change to the file. The only real requirement is that
> > it be different if something changed since the last time we queried for
> > it.
> > 
> > If we keep track of when something queries the value, we can avoid
> > bumping the counter and an on-disk update when nothing else has changed
> > if no one has queried it since it was last incremented.
> > 
> > This patchset changes the code to only bump the i_version counter when
> > it's strictly necessary, or when we're updating the inode metadata
> > anyway (e.g. when times change).
> > 
> > It takes the approach of converting the existing accessors of i_version
> > to use a new API, while leaving the underlying implementation mostly the
> > same.  The last patch then converts the existing implementation to keep
> > track of whether the value has been queried since it was last
> > incremented. It then uses that to avoid incrementing the counter when
> > it can.
> > 
> > With this, we reduce inode metadata updates across all 3 filesystems
> > down to roughly the frequency of the timestamp granularity, particularly
> > when it's not being queried (the vastly common case).
> > 
> > I can see measurable performance gains on xfs and ext4 with iversion
> > enabled, when streaming small (4k) I/Os.
> > 
> > btrfs shows some slight gain in testing, but not quite the magnitude
> > that xfs and ext4 show. I'm not sure why yet and would appreciate some
> > input from btrfs folks.
> > 
> 
> Thanks for the patchset.
> 
> Not sure about how you tested the performance, but in terms of
> write+fsync or synchronous write, btrfs's fsync doesn't check if no
> timestamp/iversion has been changed, instead only checks if inode has
> been logged by some btrfs internal flags and counters, probably
> because by default every write is cow and every write incurs a new
> block a

[PATCH v5 00/19] fs: rework and optimize i_version handling in filesystems

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

v5:
- don't corrupt refcounts stashed in i_version of ext4 xattr inodes
- add raw variants of inc and cmp functions, and have nfs use them

v4:
- fix SB_LAZYTIME handling in generic_update_time
- add memory barriers to patch to convert i_version field to atomic64_t

v3:
- move i_version handling functions to new header file
- document that the kernel-managed i_version implementation will appear to
  increase over time
- fix inode_cmp_iversion to handle wraparound correctly

v2:
- xfs should use inode_peek_iversion instead of inode_peek_iversion_raw
- rework file_update_time patch
- don't dirty inode when only S_ATIME is set and SB_LAZYTIME is enabled
- better comments and documentation

I think this is now approaching merge readiness.

Special thanks to Jan Kara and Dave Chinner who helped me tighten up the
memory barriers in the final patch, and Krzysztof Kozlowski for help in
tracking down a set of bugs in the NFS client patch.

tl;dr: I think we can greatly reduce the cost of the inode->i_version
counter, by exploiting the fact that we don't need to increment it if no
one is looking at it. We can also clean up the code to prepare to
eventually expose this value via statx().

Note that this set relies on a few patches that are in other trees. The
full stack that I've been testing with is here:


https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git/log/?h=iversion

The inode->i_version field is supposed to be a value that changes
whenever there is any data or metadata change to the inode. Some
filesystems use it internally to detect directory changes during
readdir. knfsd will use it if the filesystem has MS_I_VERSION set. IMA
will also use it to optimize away some remeasurement if it's available.
NFS and AFS just use it to store an opaque change attribute from the
server.

Only btrfs, ext4, and xfs increment it for data changes. Because of
this, these filesystems must log the inode to disk whenever the
i_version counter changes. That has a non-zero performance impact,
especially on write-heavy workloads, because we end up dirtying the
inode metadata on every write, not just when the times change.

It turns out though that none of these users of i_version require that
it change on every change to the file. The only real requirement is that
it be different if something changed since the last time we queried for
it.

If we keep track of when something queries the value, we can avoid
bumping the counter and an on-disk update when nothing else has changed
if no one has queried it since it was last incremented.

This patchset changes the code to only bump the i_version counter when
it's strictly necessary, or when we're updating the inode metadata
anyway (e.g. when times change).

It takes the approach of converting the existing accessors of i_version
to use a new API, while leaving the underlying implementation mostly the
same.  The last patch then converts the existing implementation to keep
track of whether the value has been queried since it was last
incremented. It then uses that to avoid incrementing the counter when
it can.

With this, we reduce inode metadata updates across all 3 filesystems
down to roughly the frequency of the timestamp granularity, particularly
when it's not being queried (the vastly common case).

I can see measurable performance gains on xfs and ext4 with iversion
enabled, when streaming small (4k) I/Os.

btrfs shows some slight gain in testing, but not quite the magnitude
that xfs and ext4 show. I'm not sure why yet and would appreciate some
input from btrfs folks.

My goal is to get this into linux-next fairly soon. If it shows no
problems then we can look at merging it for 4.16, or 4.17 if all of the
prequisite patches are not yet merged.

Jeff Layton (19):
  fs: new API for handling inode->i_version
  fs: don't take the i_lock in inode_inc_iversion
  fat: convert to new i_version API
  affs: convert to new i_version API
  afs: convert to new i_version API
  btrfs: convert to new i_version API
  exofs: switch to new i_version API
  ext2: convert to new i_version API
  ext4: convert to new i_version API
  nfs: convert to new i_version API
  nfsd: convert to new i_version API
  ocfs2: convert to new i_version API
  ufs: use new i_version API
  xfs: convert to new i_version API
  IMA: switch IMA over to new i_version API
  fs: only set S_VERSION when updating times if necessary
  xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need
incrementing
  btrfs: only dirty the inode in btrfs_update_time if something was
changed
  fs: handle inode->i_version more efficiently

 fs/affs/amigaffs.c|   5 +-
 fs/affs/dir.c |   5 +-
 fs/affs/super.c   |   3 +-
 fs/afs/fsclient.c |   3 +-
 fs/afs/inode.c|   5 +-
 fs/btrfs/delayed-inode.c  |   7 +-
 fs/btrfs/file.c   |   1 +
 fs/btrfs/inode.c 

[PATCH v5 01/19] fs: new API for handling inode->i_version

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Add a documentation blob that explains what the i_version field is, how
it is expected to work, and how it is currently implemented by various
filesystems.

We already have inode_inc_iversion. Add several other functions for
manipulating and accessing the i_version counter. For now, the
implementation is trivial and basically works the way that all of the
open-coded i_version accesses work today.

Future patches will convert existing users of i_version to use the new
API, and then convert the backend implementation to do things more
efficiently.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/btrfs/file.c  |   1 +
 fs/btrfs/inode.c |   1 +
 fs/btrfs/ioctl.c |   1 +
 fs/btrfs/xattr.c |   1 +
 fs/ext4/inode.c  |   1 +
 fs/ext4/namei.c  |   1 +
 fs/inode.c   |   1 +
 include/linux/fs.h   |  15 ---
 include/linux/iversion.h | 236 +++
 9 files changed, 243 insertions(+), 15 deletions(-)
 create mode 100644 include/linux/iversion.h

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index eb1bac7c8553..c95d7b2efefb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e1a7f3cb5be9..27f008b33fc1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2ef8acaac688..aa452c9e2eff 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index 2c7e53f9ff1b..5258c1714830 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "btrfs_inode.h"
 #include "transaction.h"
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7df2c5644e59..fa5d8bc52d2d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ext4_jbd2.h"
 #include "xattr.h"
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 798b3ac680db..bcf0dff517be 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ext4.h"
 #include "ext4_jbd2.h"
 
diff --git a/fs/inode.c b/fs/inode.c
index 03102d6ef044..19e72f500f71 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -18,6 +18,7 @@
 #include  /* for inode_has_buffers */
 #include 
 #include 
+#include 
 #include 
 #include "internal.h"
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 511fbaabf624..76382c24e9d0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2036,21 +2036,6 @@ static inline void inode_dec_link_count(struct inode 
*inode)
mark_inode_dirty(inode);
 }
 
-/**
- * inode_inc_iversion - increments i_version
- * @inode: inode that need to be updated
- *
- * Every time the inode is modified, the i_version field will be incremented.
- * The filesystem has to be mounted with i_version flag
- */
-
-static inline void inode_inc_iversion(struct inode *inode)
-{
-   spin_lock(>i_lock);
-   inode->i_version++;
-   spin_unlock(>i_lock);
-}
-
 enum file_time_flags {
S_ATIME = 1,
S_MTIME = 2,
diff --git a/include/linux/iversion.h b/include/linux/iversion.h
new file mode 100644
index ..d09cc3a08740
--- /dev/null
+++ b/include/linux/iversion.h
@@ -0,0 +1,236 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_IVERSION_H
+#define _LINUX_IVERSION_H
+
+#include 
+
+/*
+ * The change attribute (i_version) is mandated by NFSv4 and is mostly for
+ * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
+ * appear different to observers if there was a change to the inode's data or
+ * metadata since it was last queried.
+ *
+ * Observers see the i_version as a 64-bit number that never changes. If it
+ * remains the same since it was last checked, then nothing has changed in the
+ * inode. If it's different then something has changed. Observers cannot infer
+ * anything about the nature or magnitude of the changes from the value, only
+ * that the inode has changed in some fashion.
+ *
+ * Not all filesystems properly implement the i_version counter. Subsystems 
that
+ * want to use i_version field on an inode should first check whether the
+ * filesystem sets the SB_I_VERSION flag (usually via the

[PATCH v5 02/19] fs: don't take the i_lock in inode_inc_iversion

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

The rationale for taking the i_lock when incrementing this value is
lost in antiquity. The readers of the field don't take it (at least
not universally), so my assumption is that it was only done here to
serialize incrementors.

If that is indeed the case, then we can drop the i_lock from this
codepath and treat it as a atomic64_t for the purposes of
incrementing it. This allows us to use inode_inc_iversion without
any danger of lock inversion.

Note that the read side is not fetched atomically with this change.
The assumption here is that that is not a critical issue since the
i_version is not fully synchronized with anything else anyway.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 include/linux/iversion.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index d09cc3a08740..5ad9eaa3a9b0 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -104,12 +104,13 @@ inode_set_iversion_queried(struct inode *inode, u64 new)
 static inline bool
 inode_maybe_inc_iversion(struct inode *inode, bool force)
 {
-   spin_lock(>i_lock);
-   inode->i_version++;
-   spin_unlock(>i_lock);
+   atomic64_t *ivp = (atomic64_t *)>i_version;
+
+   atomic64_inc(ivp);
return true;
 }
 
+
 /**
  * inode_inc_iversion - forcibly increment i_version
  * @inode: inode that needs to be updated
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 03/19] fat: convert to new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/fat/dir.c |  3 ++-
 fs/fat/inode.c   |  9 +
 fs/fat/namei_msdos.c |  7 ---
 fs/fat/namei_vfat.c  | 22 +++---
 4 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/fs/fat/dir.c b/fs/fat/dir.c
index b833ffeee1e1..8e100c3bf72c 100644
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "fat.h"
 
 /*
@@ -1055,7 +1056,7 @@ int fat_remove_entries(struct inode *dir, struct 
fat_slot_info *sinfo)
brelse(bh);
if (err)
return err;
-   dir->i_version++;
+   inode_inc_iversion(dir);
 
if (nr_slots) {
/*
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 20a0a89eaca5..ffbbf0520d9e 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "fat.h"
 
 #ifndef CONFIG_FAT_DEFAULT_IOCHARSET
@@ -507,7 +508,7 @@ int fat_fill_inode(struct inode *inode, struct 
msdos_dir_entry *de)
MSDOS_I(inode)->i_pos = 0;
inode->i_uid = sbi->options.fs_uid;
inode->i_gid = sbi->options.fs_gid;
-   inode->i_version++;
+   inode_inc_iversion(inode);
inode->i_generation = get_seconds();
 
if ((de->attr & ATTR_DIR) && !IS_FREE(de->name)) {
@@ -590,7 +591,7 @@ struct inode *fat_build_inode(struct super_block *sb,
goto out;
}
inode->i_ino = iunique(sb, MSDOS_ROOT_INO);
-   inode->i_version = 1;
+   inode_set_iversion(inode, 1);
err = fat_fill_inode(inode, de);
if (err) {
iput(inode);
@@ -1377,7 +1378,7 @@ static int fat_read_root(struct inode *inode)
MSDOS_I(inode)->i_pos = MSDOS_ROOT_INO;
inode->i_uid = sbi->options.fs_uid;
inode->i_gid = sbi->options.fs_gid;
-   inode->i_version++;
+   inode_inc_iversion(inode);
inode->i_generation = 0;
inode->i_mode = fat_make_mode(sbi, ATTR_DIR, S_IRWXUGO);
inode->i_op = sbi->dir_ops;
@@ -1828,7 +1829,7 @@ int fat_fill_super(struct super_block *sb, void *data, 
int silent, int isvfat,
if (!root_inode)
goto out_fail;
root_inode->i_ino = MSDOS_ROOT_INO;
-   root_inode->i_version = 1;
+   inode_set_iversion(root_inode, 1);
error = fat_read_root(root_inode);
if (error < 0) {
iput(root_inode);
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index d24d2758a363..582ca731a6c9 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -7,6 +7,7 @@
  */
 
 #include 
+#include 
 #include "fat.h"
 
 /* Characters that are undesirable in an MS-DOS file name */
@@ -480,7 +481,7 @@ static int do_msdos_rename(struct inode *old_dir, unsigned 
char *old_name,
} else
mark_inode_dirty(old_inode);
 
-   old_dir->i_version++;
+   inode_inc_iversion(old_dir);
old_dir->i_ctime = old_dir->i_mtime = 
current_time(old_dir);
if (IS_DIRSYNC(old_dir))
(void)fat_sync_inode(old_dir);
@@ -508,7 +509,7 @@ static int do_msdos_rename(struct inode *old_dir, unsigned 
char *old_name,
goto out;
new_i_pos = sinfo.i_pos;
}
-   new_dir->i_version++;
+   inode_inc_iversion(new_dir);
 
fat_detach(old_inode);
fat_attach(old_inode, new_i_pos);
@@ -540,7 +541,7 @@ static int do_msdos_rename(struct inode *old_dir, unsigned 
char *old_name,
old_sinfo.bh = NULL;
if (err)
goto error_dotdot;
-   old_dir->i_version++;
+   inode_inc_iversion(old_dir);
old_dir->i_ctime = old_dir->i_mtime = ts;
if (IS_DIRSYNC(old_dir))
(void)fat_sync_inode(old_dir);
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 02c03a3a..cefea792cde8 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -20,7 +20,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include "fat.h"
 
 static inline unsigned long vfat_d_version(struct dentry *dentry)
@@ -46,7 +46,7 @@ static int vfat_revalidate_shortname(struct dentry *dentry)
 {
int ret = 1;
spin_lock(>d_lock);
-   if (vfat_d_version(dentry) != d_inode(dentry->d_parent)->i_version)
+   if (inode_cmp_iversion(d_inode(dentry->d_parent), 
vfat_d_version(dentry)))
ret = 0;
spin_unlock(>d_lock);
return ret;
@@ -759,7 +759,7 @@ static struct dentry *vfat_lookup(struct inode *dir, struct 
dentry *dentry,
 out:
mutex_unlock(_SB(sb)->s_lock);
if (!inode)
-   vfat_d_version_set

[PATCH v5 04/19] affs: convert to new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/affs/amigaffs.c | 5 +++--
 fs/affs/dir.c  | 5 +++--
 fs/affs/super.c| 3 ++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/affs/amigaffs.c b/fs/affs/amigaffs.c
index 0f0e6925e97d..14a6c1b90c9f 100644
--- a/fs/affs/amigaffs.c
+++ b/fs/affs/amigaffs.c
@@ -10,6 +10,7 @@
  */
 
 #include 
+#include 
 #include "affs.h"
 
 /*
@@ -60,7 +61,7 @@ affs_insert_hash(struct inode *dir, struct buffer_head *bh)
affs_brelse(dir_bh);
 
dir->i_mtime = dir->i_ctime = current_time(dir);
-   dir->i_version++;
+   inode_inc_iversion(dir);
mark_inode_dirty(dir);
 
return 0;
@@ -114,7 +115,7 @@ affs_remove_hash(struct inode *dir, struct buffer_head 
*rem_bh)
affs_brelse(bh);
 
dir->i_mtime = dir->i_ctime = current_time(dir);
-   dir->i_version++;
+   inode_inc_iversion(dir);
mark_inode_dirty(dir);
 
return retval;
diff --git a/fs/affs/dir.c b/fs/affs/dir.c
index a105e77df2c1..d180b46453cf 100644
--- a/fs/affs/dir.c
+++ b/fs/affs/dir.c
@@ -14,6 +14,7 @@
  *
  */
 
+#include 
 #include "affs.h"
 
 static int affs_readdir(struct file *, struct dir_context *);
@@ -80,7 +81,7 @@ affs_readdir(struct file *file, struct dir_context *ctx)
 * we can jump directly to where we left off.
 */
ino = (u32)(long)file->private_data;
-   if (ino && file->f_version == inode->i_version) {
+   if (ino && inode_cmp_iversion(inode, file->f_version) == 0) {
pr_debug("readdir() left off=%d\n", ino);
goto inside;
}
@@ -130,7 +131,7 @@ affs_readdir(struct file *file, struct dir_context *ctx)
} while (ino);
}
 done:
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
file->private_data = (void *)(long)ino;
affs_brelse(fh_bh);
 
diff --git a/fs/affs/super.c b/fs/affs/super.c
index 1117e36134cc..e602619aed9d 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "affs.h"
 
 static int affs_statfs(struct dentry *dentry, struct kstatfs *buf);
@@ -102,7 +103,7 @@ static struct inode *affs_alloc_inode(struct super_block 
*sb)
if (!i)
return NULL;
 
-   i->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
i->i_lc = NULL;
i->i_ext_bh = NULL;
i->i_pa_cnt = 0;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 05/19] afs: convert to new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

For AFS, it's generally treated as an opaque value, so we use the
*_raw variants of the API here.

Note that AFS has quite a different definition for this counter. AFS
only increments it on changes to the data to the data in regular files
and contents of the directories. Inode metadata changes do not result
in a version increment.

We'll need to reconcile that somehow if we ever want to present this to
userspace via statx.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/afs/fsclient.c | 3 ++-
 fs/afs/inode.c| 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index b90ef39ae914..88ec38c2d83c 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 #include "afs_fs.h"
 
@@ -124,7 +125,7 @@ static void xdr_decode_AFSFetchStatus(const __be32 **_bp,
vnode->vfs_inode.i_ctime.tv_sec = status->mtime_client;
vnode->vfs_inode.i_mtime= vnode->vfs_inode.i_ctime;
vnode->vfs_inode.i_atime= vnode->vfs_inode.i_ctime;
-   vnode->vfs_inode.i_version  = data_version;
+   inode_set_iversion_raw(>vfs_inode, data_version);
}
 
expected_version = status->data_version;
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 3415eb7484f6..dcd2e08d6cdb 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 static const struct inode_operations afs_symlink_inode_operations = {
@@ -89,7 +90,7 @@ static int afs_inode_map_status(struct afs_vnode *vnode, 
struct key *key)
inode->i_atime  = inode->i_mtime = inode->i_ctime;
inode->i_blocks = 0;
inode->i_generation = vnode->fid.unique;
-   inode->i_version= vnode->status.data_version;
+   inode_set_iversion_raw(inode, vnode->status.data_version);
inode->i_mapping->a_ops = _fs_aops;
 
read_sequnlock_excl(>cb_lock);
@@ -218,7 +219,7 @@ struct inode *afs_iget_autocell(struct inode *dir, const 
char *dev_name,
inode->i_ctime.tv_nsec  = 0;
inode->i_atime  = inode->i_mtime = inode->i_ctime;
inode->i_blocks = 0;
-   inode->i_version= 0;
+   inode_set_iversion_raw(inode, 0);
inode->i_generation = 0;
 
set_bit(AFS_VNODE_PSEUDODIR, >flags);
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 06/19] btrfs: convert to new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Acked-by: David Sterba <dste...@suse.com>
---
 fs/btrfs/delayed-inode.c | 7 +--
 fs/btrfs/inode.c | 6 --
 fs/btrfs/tree-log.c  | 4 +++-
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 5d73f79ded8b..6a246ae2bcb2 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -18,6 +18,7 @@
  */
 
 #include 
+#include 
 #include "delayed-inode.h"
 #include "disk-io.h"
 #include "transaction.h"
@@ -1700,7 +1701,8 @@ static void fill_stack_inode_item(struct 
btrfs_trans_handle *trans,
btrfs_set_stack_inode_nbytes(inode_item, inode_get_bytes(inode));
btrfs_set_stack_inode_generation(inode_item,
 BTRFS_I(inode)->generation);
-   btrfs_set_stack_inode_sequence(inode_item, inode->i_version);
+   btrfs_set_stack_inode_sequence(inode_item,
+  inode_peek_iversion(inode));
btrfs_set_stack_inode_transid(inode_item, trans->transid);
btrfs_set_stack_inode_rdev(inode_item, inode->i_rdev);
btrfs_set_stack_inode_flags(inode_item, BTRFS_I(inode)->flags);
@@ -1754,7 +1756,8 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
BTRFS_I(inode)->generation = btrfs_stack_inode_generation(inode_item);
 BTRFS_I(inode)->last_trans = btrfs_stack_inode_transid(inode_item);
 
-   inode->i_version = btrfs_stack_inode_sequence(inode_item);
+   inode_set_iversion_queried(inode,
+  btrfs_stack_inode_sequence(inode_item));
inode->i_rdev = 0;
*rdev = btrfs_stack_inode_rdev(inode_item);
BTRFS_I(inode)->flags = btrfs_stack_inode_flags(inode_item);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 27f008b33fc1..ac8692849a81 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3778,7 +3778,8 @@ static int btrfs_read_locked_inode(struct inode *inode)
BTRFS_I(inode)->generation = btrfs_inode_generation(leaf, inode_item);
BTRFS_I(inode)->last_trans = btrfs_inode_transid(leaf, inode_item);
 
-   inode->i_version = btrfs_inode_sequence(leaf, inode_item);
+   inode_set_iversion_queried(inode,
+  btrfs_inode_sequence(leaf, inode_item));
inode->i_generation = BTRFS_I(inode)->generation;
inode->i_rdev = 0;
rdev = btrfs_inode_rdev(leaf, inode_item);
@@ -3946,7 +3947,8 @@ static void fill_inode_item(struct btrfs_trans_handle 
*trans,
 );
btrfs_set_token_inode_generation(leaf, item, BTRFS_I(inode)->generation,
 );
-   btrfs_set_token_inode_sequence(leaf, item, inode->i_version, );
+   btrfs_set_token_inode_sequence(leaf, item, inode_peek_iversion(inode),
+  );
btrfs_set_token_inode_transid(leaf, item, trans->transid, );
btrfs_set_token_inode_rdev(leaf, item, inode->i_rdev, );
btrfs_set_token_inode_flags(leaf, item, BTRFS_I(inode)->flags, );
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 7bf9b31561db..1b7d92075c1f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "tree-log.h"
 #include "disk-io.h"
 #include "locking.h"
@@ -3609,7 +3610,8 @@ static void fill_inode_item(struct btrfs_trans_handle 
*trans,
btrfs_set_token_inode_nbytes(leaf, item, inode_get_bytes(inode),
 );
 
-   btrfs_set_token_inode_sequence(leaf, item, inode->i_version, );
+   btrfs_set_token_inode_sequence(leaf, item,
+  inode_peek_iversion(inode), );
btrfs_set_token_inode_transid(leaf, item, trans->transid, );
btrfs_set_token_inode_rdev(leaf, item, inode->i_rdev, );
btrfs_set_token_inode_flags(leaf, item, BTRFS_I(inode)->flags, );
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 08/19] ext2: convert to new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/ext2/dir.c   | 9 +
 fs/ext2/super.c | 5 +++--
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 987647986f47..4111085a129f 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 typedef struct ext2_dir_entry_2 ext2_dirent;
 
@@ -92,7 +93,7 @@ static int ext2_commit_chunk(struct page *page, loff_t pos, 
unsigned len)
struct inode *dir = mapping->host;
int err = 0;
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
block_write_end(NULL, mapping, pos, len, len, page, NULL);
 
if (pos+len > dir->i_size) {
@@ -293,7 +294,7 @@ ext2_readdir(struct file *file, struct dir_context *ctx)
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(ext2_chunk_size(inode)-1);
unsigned char *types = NULL;
-   int need_revalidate = file->f_version != inode->i_version;
+   bool need_revalidate = inode_cmp_iversion(inode, file->f_version);
 
if (pos > inode->i_size - EXT2_DIR_REC_LEN(1))
return 0;
@@ -319,8 +320,8 @@ ext2_readdir(struct file *file, struct dir_context *ctx)
offset = ext2_validate_entry(kaddr, offset, 
chunk_mask);
ctx->pos = (n<<PAGE_SHIFT) + offset;
}
-   file->f_version = inode->i_version;
-   need_revalidate = 0;
+   file->f_version = inode_query_iversion(inode);
+   need_revalidate = false;
}
de = (ext2_dirent *)(kaddr+offset);
limit = kaddr + ext2_last_byte(inode, n) - EXT2_DIR_REC_LEN(1);
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 7646818ab266..554c98b8a93a 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ext2.h"
 #include "xattr.h"
 #include "acl.h"
@@ -184,7 +185,7 @@ static struct inode *ext2_alloc_inode(struct super_block 
*sb)
if (!ei)
return NULL;
ei->i_block_alloc_info = NULL;
-   ei->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
 #ifdef CONFIG_QUOTA
memset(>i_dquot, 0, sizeof(ei->i_dquot));
 #endif
@@ -1569,7 +1570,7 @@ static ssize_t ext2_quota_write(struct super_block *sb, 
int type,
return err;
if (inode->i_size < off+len-towrite)
i_size_write(inode, off+len-towrite);
-   inode->i_version++;
+   inode_inc_iversion(inode);
inode->i_mtime = inode->i_ctime = current_time(inode);
mark_inode_dirty(inode);
return len - towrite;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 07/19] exofs: switch to new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/exofs/dir.c   | 9 +
 fs/exofs/super.c | 3 ++-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/exofs/dir.c b/fs/exofs/dir.c
index 98233a97b7b8..c5a53fcc43ea 100644
--- a/fs/exofs/dir.c
+++ b/fs/exofs/dir.c
@@ -31,6 +31,7 @@
  * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
  */
 
+#include 
 #include "exofs.h"
 
 static inline unsigned exofs_chunk_size(struct inode *inode)
@@ -60,7 +61,7 @@ static int exofs_commit_chunk(struct page *page, loff_t pos, 
unsigned len)
struct inode *dir = mapping->host;
int err = 0;
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
 
if (!PageUptodate(page))
SetPageUptodate(page);
@@ -241,7 +242,7 @@ exofs_readdir(struct file *file, struct dir_context *ctx)
unsigned long n = pos >> PAGE_SHIFT;
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(exofs_chunk_size(inode)-1);
-   int need_revalidate = (file->f_version != inode->i_version);
+   bool need_revalidate = inode_cmp_iversion(inode, file->f_version);
 
if (pos > inode->i_size - EXOFS_DIR_REC_LEN(1))
return 0;
@@ -264,8 +265,8 @@ exofs_readdir(struct file *file, struct dir_context *ctx)
chunk_mask);
ctx->pos = (n<<PAGE_SHIFT) + offset;
}
-   file->f_version = inode->i_version;
-   need_revalidate = 0;
+   file->f_version = inode_query_iversion(inode);
+   need_revalidate = false;
}
de = (struct exofs_dir_entry *)(kaddr + offset);
limit = kaddr + exofs_last_byte(inode, n) -
diff --git a/fs/exofs/super.c b/fs/exofs/super.c
index 819624cfc8da..7e244093c0e5 100644
--- a/fs/exofs/super.c
+++ b/fs/exofs/super.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "exofs.h"
 
@@ -159,7 +160,7 @@ static struct inode *exofs_alloc_inode(struct super_block 
*sb)
if (!oi)
return NULL;
 
-   oi->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
return >vfs_inode;
 }
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 09/19] ext4: convert to new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Acked-by: Theodore Ts'o <ty...@mit.edu>
---
 fs/ext4/dir.c|  9 +
 fs/ext4/inline.c |  7 ---
 fs/ext4/inode.c  | 12 
 fs/ext4/ioctl.c  |  3 ++-
 fs/ext4/namei.c  |  4 ++--
 fs/ext4/super.c  |  3 ++-
 fs/ext4/xattr.c  |  5 +++--
 7 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index d5babc9f222b..afda0a0499ce 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ext4.h"
 #include "xattr.h"
 
@@ -208,7 +209,7 @@ static int ext4_readdir(struct file *file, struct 
dir_context *ctx)
 * readdir(2), then we might be pointing to an invalid
 * dirent right now.  Scan from the start of the block
 * to make sure. */
-   if (file->f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, file->f_version)) {
for (i = 0; i < sb->s_blocksize && i < offset; ) {
de = (struct ext4_dir_entry_2 *)
(bh->b_data + i);
@@ -227,7 +228,7 @@ static int ext4_readdir(struct file *file, struct 
dir_context *ctx)
offset = i;
ctx->pos = (ctx->pos & ~(sb->s_blocksize - 1))
| offset;
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
}
 
while (ctx->pos < inode->i_size
@@ -568,10 +569,10 @@ static int ext4_dx_readdir(struct file *file, struct 
dir_context *ctx)
 * cached entries.
 */
if ((!info->curr_node) ||
-   (file->f_version != inode->i_version)) {
+   inode_cmp_iversion(inode, file->f_version)) {
info->curr_node = NULL;
free_rb_tree_fname(>root);
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
ret = ext4_htree_fill_tree(file, info->curr_hash,
   info->curr_minor_hash,
   >next_hash);
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 1367553c43bb..a8b987b71173 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 
 #include "ext4_jbd2.h"
 #include "ext4.h"
@@ -1042,7 +1043,7 @@ static int ext4_add_dirent_to_inline(handle_t *handle,
 */
dir->i_mtime = dir->i_ctime = current_time(dir);
ext4_update_dx_flag(dir);
-   dir->i_version++;
+   inode_inc_iversion(dir);
return 1;
 }
 
@@ -1494,7 +1495,7 @@ int ext4_read_inline_dir(struct file *file,
 * dirent right now.  Scan from the start of the inline
 * dir to make sure.
 */
-   if (file->f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, file->f_version)) {
for (i = 0; i < extra_size && i < offset;) {
/*
 * "." is with offset 0 and
@@ -1526,7 +1527,7 @@ int ext4_read_inline_dir(struct file *file,
}
offset = i;
ctx->pos = offset;
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
}
 
while (ctx->pos < extra_size) {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fa5d8bc52d2d..1b0d54b372f2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4874,12 +4874,14 @@ struct inode *ext4_iget(struct super_block *sb, 
unsigned long ino)
EXT4_EINODE_GET_XTIME(i_crtime, ei, raw_inode);
 
if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
-   inode->i_version = le32_to_cpu(raw_inode->i_disk_version);
+   u64 ivers = le32_to_cpu(raw_inode->i_disk_version);
+
if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
-   inode->i_version |=
+   ivers |=
(__u64)(le32_to_cpu(raw_inode->i_version_hi)) << 32;
}
+   inode_set_iversion_queried(inode, ivers);
}
 
ret = 0;
@@ -5165,11 +5167,13 @@ static int ext4_do_update_inode(handle_t *handle,
}
 
if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
-   raw_inode->i_disk_version = cpu_

[PATCH v5 10/19] nfs: convert to new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

For NFS, we just use the "raw" API since the i_version is mostly
managed by the server. The exception there is when the client
holds a write delegation, but we only need to bump it once
there anyway to handle CB_GETATTR.

Tested-by: Krzysztof Kozlowski <k...@kernel.org>
Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/nfs/delegation.c|  3 ++-
 fs/nfs/fscache-index.c |  5 +++--
 fs/nfs/inode.c | 18 +-
 fs/nfs/nfs4proc.c  | 10 ++
 fs/nfs/nfstrace.h  |  5 +++--
 fs/nfs/write.c |  8 +++-
 6 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index ade44ca0c66c..d8b47624fee2 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -347,7 +348,7 @@ int nfs_inode_set_delegation(struct inode *inode, struct 
rpc_cred *cred, struct
nfs4_stateid_copy(>stateid, >delegation);
delegation->type = res->delegation_type;
delegation->pagemod_limit = res->pagemod_limit;
-   delegation->change_attr = inode->i_version;
+   delegation->change_attr = inode_peek_iversion_raw(inode);
delegation->cred = get_rpccred(cred);
delegation->inode = inode;
delegation->flags = 1<<NFS_DELEGATION_REFERENCED;
diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index 3025fe8584a0..0ee4b93d36ea 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 #include "fscache.h"
@@ -211,7 +212,7 @@ static uint16_t nfs_fscache_inode_get_aux(const void 
*cookie_netfs_data,
auxdata.ctime = nfsi->vfs_inode.i_ctime;
 
if (NFS_SERVER(>vfs_inode)->nfs_client->rpc_ops->version == 4)
-   auxdata.change_attr = nfsi->vfs_inode.i_version;
+   auxdata.change_attr = inode_peek_iversion_raw(>vfs_inode);
 
if (bufmax > sizeof(auxdata))
bufmax = sizeof(auxdata);
@@ -243,7 +244,7 @@ enum fscache_checkaux nfs_fscache_inode_check_aux(void 
*cookie_netfs_data,
auxdata.ctime = nfsi->vfs_inode.i_ctime;
 
if (NFS_SERVER(>vfs_inode)->nfs_client->rpc_ops->version == 4)
-   auxdata.change_attr = nfsi->vfs_inode.i_version;
+   auxdata.change_attr = inode_peek_iversion_raw(>vfs_inode);
 
if (memcmp(data, , datalen) != 0)
return FSCACHE_CHECKAUX_OBSOLETE;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index b992d2382ffa..93552c482992 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -38,8 +38,8 @@
 #include 
 #include 
 #include 
-
 #include 
+#include 
 
 #include "nfs4_fs.h"
 #include "callback.h"
@@ -483,7 +483,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct 
nfs_fattr *fattr, st
memset(>i_atime, 0, sizeof(inode->i_atime));
memset(>i_mtime, 0, sizeof(inode->i_mtime));
memset(>i_ctime, 0, sizeof(inode->i_ctime));
-   inode->i_version = 0;
+   inode_set_iversion_raw(inode, 0);
inode->i_size = 0;
clear_nlink(inode);
inode->i_uid = make_kuid(_user_ns, -2);
@@ -508,7 +508,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct 
nfs_fattr *fattr, st
else if (nfs_server_capable(inode, NFS_CAP_CTIME))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
if (fattr->valid & NFS_ATTR_FATTR_CHANGE)
-   inode->i_version = fattr->change_attr;
+   inode_set_iversion_raw(inode, fattr->change_attr);
else
nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
| NFS_INO_REVAL_PAGECACHE);
@@ -1289,8 +1289,8 @@ static unsigned long nfs_wcc_update_inode(struct inode 
*inode, struct nfs_fattr
 
if ((fattr->valid & NFS_ATTR_FATTR_PRECHANGE)
&& (fattr->valid & NFS_ATTR_FATTR_CHANGE)
-   && inode->i_version == fattr->pre_change_attr) {
-   inode->i_version = fattr->change_attr;
+   && !inode_cmp_iversion_raw(inode, 
fattr->pre_change_attr)) {
+   inode_set_iversion_raw(inode, fattr->change_attr);
if (S_ISDIR(inode->i_mode))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_DATA);
ret |= NFS_INO_INVALID_ATTR;
@@ -1348,7 +1348,7 @@ static int nfs_check_inode_attributes(struct inode 
*inode, struct nfs_fattr *fat
 
if (!nfs_file_has_buffered_writers(nfsi)) {
   

[PATCH v5 13/19] ufs: use new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/ufs/dir.c   | 9 +
 fs/ufs/inode.c | 3 ++-
 fs/ufs/super.c | 3 ++-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/ufs/dir.c b/fs/ufs/dir.c
index 2edc1755b7c5..50dfce000864 100644
--- a/fs/ufs/dir.c
+++ b/fs/ufs/dir.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ufs_fs.h"
 #include "ufs.h"
@@ -47,7 +48,7 @@ static int ufs_commit_chunk(struct page *page, loff_t pos, 
unsigned len)
struct inode *dir = mapping->host;
int err = 0;
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
block_write_end(NULL, mapping, pos, len, len, page, NULL);
if (pos+len > dir->i_size) {
i_size_write(dir, pos+len);
@@ -428,7 +429,7 @@ ufs_readdir(struct file *file, struct dir_context *ctx)
unsigned long n = pos >> PAGE_SHIFT;
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(UFS_SB(sb)->s_uspi->s_dirblksize - 1);
-   int need_revalidate = file->f_version != inode->i_version;
+   bool need_revalidate = inode_cmp_iversion(inode, file->f_version);
unsigned flags = UFS_SB(sb)->s_flags;
 
UFSD("BEGIN\n");
@@ -455,8 +456,8 @@ ufs_readdir(struct file *file, struct dir_context *ctx)
offset = ufs_validate_entry(sb, kaddr, offset, 
chunk_mask);
ctx->pos = (n<<PAGE_SHIFT) + offset;
}
-   file->f_version = inode->i_version;
-   need_revalidate = 0;
+   file->f_version = inode_query_iversion(inode);
+   need_revalidate = false;
}
de = (struct ufs_dir_entry *)(kaddr+offset);
limit = kaddr + ufs_last_byte(inode, n) - UFS_DIR_REC_LEN(1);
diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index afb601c0dda0..c843ec858cf7 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ufs_fs.h"
 #include "ufs.h"
@@ -693,7 +694,7 @@ struct inode *ufs_iget(struct super_block *sb, unsigned 
long ino)
if (err)
goto bad_inode;
 
-   inode->i_version++;
+   inode_inc_iversion(inode);
ufsi->i_lastfrag =
(inode->i_size + uspi->s_fsize - 1) >> uspi->s_fshift;
ufsi->i_dir_start_lookup = 0;
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 4d497e9c6883..b6ba80e05bff 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -88,6 +88,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ufs_fs.h"
 #include "ufs.h"
@@ -1440,7 +1441,7 @@ static struct inode *ufs_alloc_inode(struct super_block 
*sb)
if (!ei)
return NULL;
 
-   ei->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
seqlock_init(>meta_lock);
mutex_init(>truncate_mutex);
return >vfs_inode;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 11/19] nfsd: convert to new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Mostly just making sure we use the "get" wrappers so we know when
it is being fetched for later use.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/nfsd/nfsfh.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 43f31cf49bae..b8444189223b 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static inline __u32 ino_t_to_u32(ino_t ino)
 {
@@ -259,7 +260,7 @@ static inline u64 nfsd4_change_attribute(struct inode 
*inode)
chattr =  inode->i_ctime.tv_sec;
chattr <<= 30;
chattr += inode->i_ctime.tv_nsec;
-   chattr += inode->i_version;
+   chattr += inode_query_iversion(inode);
return chattr;
 }
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 12/19] ocfs2: convert to new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/ocfs2/dir.c  | 15 ---
 fs/ocfs2/inode.c|  3 ++-
 fs/ocfs2/namei.c|  3 ++-
 fs/ocfs2/quota_global.c |  3 ++-
 4 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index febe6312ceff..32f9c72dff17 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -1174,7 +1175,7 @@ static int __ocfs2_delete_entry(handle_t *handle, struct 
inode *dir,
le16_add_cpu(>rec_len,
le16_to_cpu(de->rec_len));
de->inode = 0;
-   dir->i_version++;
+   inode_inc_iversion(dir);
ocfs2_journal_dirty(handle, bh);
goto bail;
}
@@ -1729,7 +1730,7 @@ int __ocfs2_add_entry(handle_t *handle,
if (ocfs2_dir_indexed(dir))
ocfs2_recalc_free_list(dir, handle, lookup);
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
ocfs2_journal_dirty(handle, insert_bh);
retval = 0;
goto bail;
@@ -1775,7 +1776,7 @@ static int ocfs2_dir_foreach_blk_id(struct inode *inode,
 * readdir(2), then we might be pointing to an invalid
 * dirent right now.  Scan from the start of the block
 * to make sure. */
-   if (*f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, *f_version)) {
for (i = 0; i < i_size_read(inode) && i < offset; ) {
de = (struct ocfs2_dir_entry *)
(data->id_data + i);
@@ -1791,7 +1792,7 @@ static int ocfs2_dir_foreach_blk_id(struct inode *inode,
i += le16_to_cpu(de->rec_len);
}
ctx->pos = offset = i;
-   *f_version = inode->i_version;
+   *f_version = inode_query_iversion(inode);
}
 
de = (struct ocfs2_dir_entry *) (data->id_data + ctx->pos);
@@ -1869,7 +1870,7 @@ static int ocfs2_dir_foreach_blk_el(struct inode *inode,
 * readdir(2), then we might be pointing to an invalid
 * dirent right now.  Scan from the start of the block
 * to make sure. */
-   if (*f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, *f_version)) {
for (i = 0; i < sb->s_blocksize && i < offset; ) {
de = (struct ocfs2_dir_entry *) (bh->b_data + 
i);
/* It's too expensive to do a full
@@ -1886,7 +1887,7 @@ static int ocfs2_dir_foreach_blk_el(struct inode *inode,
offset = i;
ctx->pos = (ctx->pos & ~(sb->s_blocksize - 1))
| offset;
-   *f_version = inode->i_version;
+   *f_version = inode_query_iversion(inode);
}
 
while (ctx->pos < i_size_read(inode)
@@ -1940,7 +1941,7 @@ static int ocfs2_dir_foreach_blk(struct inode *inode, u64 
*f_version,
  */
 int ocfs2_dir_foreach(struct inode *inode, struct dir_context *ctx)
 {
-   u64 version = inode->i_version;
+   u64 version = inode_query_iversion(inode);
ocfs2_dir_foreach_blk(inode, , ctx, true);
return 0;
 }
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 1a1e0078ab38..d51b80edd972 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -302,7 +303,7 @@ void ocfs2_populate_inode(struct inode *inode, struct 
ocfs2_dinode *fe,
OCFS2_I(inode)->ip_attr = le32_to_cpu(fe->i_attr);
OCFS2_I(inode)->ip_dyn_features = le16_to_cpu(fe->i_dyn_features);
 
-   inode->i_version = 1;
+   inode_set_iversion(inode, 1);
inode->i_generation = le32_to_cpu(fe->i_generation);
inode->i_rdev = huge_decode_dev(le64_to_cpu(fe->id1.dev1.i_rdev));
inode->i_mode = le16_to_cpu(fe->i_mode);
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 3b0a10d9b36f..c801eddc4bf3 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -1520,7 +1521,7 @@ static int ocfs2_rename(struct inode *old_dir,
mlog_errno(status);
goto bail;

[PATCH v5 15/19] IMA: switch IMA over to new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 security/integrity/ima/ima_api.c  | 3 ++-
 security/integrity/ima/ima_main.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index c7e8db0ea4c0..c6ae42266270 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ima.h"
 
@@ -215,7 +216,7 @@ int ima_collect_measurement(struct integrity_iint_cache 
*iint,
 * which do not support i_version, support is limited to an initial
 * measurement/appraisal/audit.
 */
-   i_version = file_inode(file)->i_version;
+   i_version = inode_query_iversion(inode);
hash.hdr.algo = algo;
 
/* Initialize hash digest to 0's in case of failure */
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 50b8254d..06a70c5a2329 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ima.h"
 
@@ -128,7 +129,7 @@ static void ima_check_last_writer(struct 
integrity_iint_cache *iint,
inode_lock(inode);
if (atomic_read(>i_writecount) == 1) {
if (!IS_I_VERSION(inode) ||
-   (iint->version != inode->i_version) ||
+   inode_cmp_iversion(inode, iint->version) ||
(iint->flags & IMA_NEW_FILE)) {
iint->flags &= ~(IMA_DONE_MASK | IMA_NEW_FILE);
iint->measured_pcrs = 0;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 14/19] xfs: convert to new i_version API

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Acked-by: Darrick J. Wong <darrick.w...@oracle.com>
---
 fs/xfs/libxfs/xfs_inode_buf.c | 7 +--
 fs/xfs/xfs_icache.c   | 5 +++--
 fs/xfs/xfs_inode.c| 3 ++-
 fs/xfs/xfs_inode_item.c   | 3 ++-
 fs/xfs/xfs_trans_inode.c  | 4 +++-
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 6b7989038d75..b9c0bf80669c 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -32,6 +32,8 @@
 #include "xfs_ialloc.h"
 #include "xfs_dir2.h"
 
+#include 
+
 /*
  * Check that none of the inode's in the buffer have a next
  * unlinked field of 0.
@@ -264,7 +266,8 @@ xfs_inode_from_disk(
to->di_flags= be16_to_cpu(from->di_flags);
 
if (to->di_version == 3) {
-   inode->i_version = be64_to_cpu(from->di_changecount);
+   inode_set_iversion_queried(inode,
+  be64_to_cpu(from->di_changecount));
to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec);
to->di_crtime.t_nsec = be32_to_cpu(from->di_crtime.t_nsec);
to->di_flags2 = be64_to_cpu(from->di_flags2);
@@ -314,7 +317,7 @@ xfs_inode_to_disk(
to->di_flags = cpu_to_be16(from->di_flags);
 
if (from->di_version == 3) {
-   to->di_changecount = cpu_to_be64(inode->i_version);
+   to->di_changecount = cpu_to_be64(inode_peek_iversion(inode));
to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
to->di_flags2 = cpu_to_be64(from->di_flags2);
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 43005fbe8b1e..4c315adb05e6 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -37,6 +37,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * Allocate and initialise an xfs_inode.
@@ -293,14 +294,14 @@ xfs_reinit_inode(
int error;
uint32_tnlink = inode->i_nlink;
uint32_tgeneration = inode->i_generation;
-   uint64_tversion = inode->i_version;
+   uint64_tversion = inode_peek_iversion(inode);
umode_t mode = inode->i_mode;
 
error = inode_init_always(mp->m_super, inode);
 
set_nlink(inode, nlink);
inode->i_generation = generation;
-   inode->i_version = version;
+   inode_set_iversion_queried(inode, version);
inode->i_mode = mode;
return error;
 }
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 801274126648..dfc5e60d8af3 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -16,6 +16,7 @@
  * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
  */
 #include 
+#include 
 
 #include "xfs.h"
 #include "xfs_fs.h"
@@ -833,7 +834,7 @@ xfs_ialloc(
ip->i_d.di_flags = 0;
 
if (ip->i_d.di_version == 3) {
-   inode->i_version = 1;
+   inode_set_iversion(inode, 1);
ip->i_d.di_flags2 = 0;
ip->i_d.di_cowextsize = 0;
ip->i_d.di_crtime.t_sec = (int32_t)tv.tv_sec;
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 6ee5c3bf19ad..7571abf5dfb3 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -30,6 +30,7 @@
 #include "xfs_buf_item.h"
 #include "xfs_log.h"
 
+#include 
 
 kmem_zone_t*xfs_ili_zone;  /* inode log item zone */
 
@@ -354,7 +355,7 @@ xfs_inode_to_log_dinode(
to->di_next_unlinked = NULLAGINO;
 
if (from->di_version == 3) {
-   to->di_changecount = inode->i_version;
+   to->di_changecount = inode_peek_iversion(inode);
to->di_crtime.t_sec = from->di_crtime.t_sec;
to->di_crtime.t_nsec = from->di_crtime.t_nsec;
to->di_flags2 = from->di_flags2;
diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index daa7615497f9..225544327c4f 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -28,6 +28,8 @@
 #include "xfs_inode_item.h"
 #include "xfs_trace.h"
 
+#include 
+
 /*
  * Add a locked inode to the transaction.
  *
@@ -117,7 +119,7 @@ xfs_trans_log_inode(
 */
if (!(ip->i_itemp->ili_item.li_desc->lid_flags & XFS_LID_DIRTY) &&
IS_I_VERSION(VFS_I(ip))) {
-   VFS_I(ip)->i_version++;
+   inode_inc_iversion(VFS_I(ip));
flags |= XFS_ILOG_CORE;
}
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 17/19] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

If XFS_ILOG_CORE is already set then go ahead and increment it.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Acked-by: Darrick J. Wong <darrick.w...@oracle.com>
---
 fs/xfs/xfs_trans_inode.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index 225544327c4f..4a89da4b6fe7 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -112,15 +112,17 @@ xfs_trans_log_inode(
 
/*
 * First time we log the inode in a transaction, bump the inode change
-* counter if it is configured for this to occur. We don't use
-* inode_inc_version() because there is no need for extra locking around
-* i_version as we already hold the inode locked exclusively for
-* metadata modification.
+* counter if it is configured for this to occur. While we have the
+* inode locked exclusively for metadata modification, we can usually
+* avoid setting XFS_ILOG_CORE if no one has queried the value since
+* the last time it was incremented. If we have XFS_ILOG_CORE already
+* set however, then go ahead and bump the i_version counter
+* unconditionally.
 */
if (!(ip->i_itemp->ili_item.li_desc->lid_flags & XFS_LID_DIRTY) &&
IS_I_VERSION(VFS_I(ip))) {
-   inode_inc_iversion(VFS_I(ip));
-   flags |= XFS_ILOG_CORE;
+   if (inode_maybe_inc_iversion(VFS_I(ip), flags & XFS_ILOG_CORE))
+   flags |= XFS_ILOG_CORE;
}
 
tp->t_flags |= XFS_TRANS_DIRTY;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 18/19] btrfs: only dirty the inode in btrfs_update_time if something was changed

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

At this point, we know that "now" and the file times may differ, and we
suspect that the i_version has been flagged to be bumped. Attempt to
bump the i_version, and only mark the inode dirty if that actually
occurred or if one of the times was updated.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Acked-by: David Sterba <dste...@suse.com>
---
 fs/btrfs/inode.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ac8692849a81..76245323a7c8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6107,19 +6107,20 @@ static int btrfs_update_time(struct inode *inode, 
struct timespec *now,
 int flags)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
+   bool dirty = flags & ~S_VERSION;
 
if (btrfs_root_readonly(root))
return -EROFS;
 
if (flags & S_VERSION)
-   inode_inc_iversion(inode);
+   dirty |= inode_maybe_inc_iversion(inode, dirty);
if (flags & S_CTIME)
inode->i_ctime = *now;
if (flags & S_MTIME)
inode->i_mtime = *now;
if (flags & S_ATIME)
inode->i_atime = *now;
-   return btrfs_dirty_inode(inode);
+   return dirty ? btrfs_dirty_inode(inode) : 0;
 }
 
 /*
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 16/19] fs: only set S_VERSION when updating times if necessary

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

We only really need to update i_version if someone has queried for it
since we last incremented it. By doing that, we can avoid having to
update the inode if the times haven't changed.

If the times have changed, then we go ahead and forcibly increment the
counter, under the assumption that we'll be going to the storage
anyway, and the increment itself is relatively cheap.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/inode.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 19e72f500f71..e2ca0f4b5151 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1635,17 +1635,21 @@ static int relatime_need_update(const struct path 
*path, struct inode *inode,
 int generic_update_time(struct inode *inode, struct timespec *time, int flags)
 {
int iflags = I_DIRTY_TIME;
+   bool dirty = false;
 
if (flags & S_ATIME)
inode->i_atime = *time;
if (flags & S_VERSION)
-   inode_inc_iversion(inode);
+   dirty = inode_maybe_inc_iversion(inode, false);
if (flags & S_CTIME)
inode->i_ctime = *time;
if (flags & S_MTIME)
inode->i_mtime = *time;
+   if ((flags & (S_ATIME | S_CTIME | S_MTIME)) &&
+   !(inode->i_sb->s_flags & SB_LAZYTIME))
+   dirty = true;
 
-   if (!(inode->i_sb->s_flags & SB_LAZYTIME) || (flags & S_VERSION))
+   if (dirty)
iflags |= I_DIRTY_SYNC;
__mark_inode_dirty(inode, iflags);
return 0;
@@ -1864,7 +1868,7 @@ int file_update_time(struct file *file)
if (!timespec_equal(>i_ctime, ))
sync_it |= S_CTIME;
 
-   if (IS_I_VERSION(inode))
+   if (IS_I_VERSION(inode) && inode_iversion_need_inc(inode))
sync_it |= S_VERSION;
 
if (!sync_it)
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 19/19] fs: handle inode->i_version more efficiently

2018-01-09 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Since i_version is mostly treated as an opaque value, we can exploit that
fact to avoid incrementing it when no one is watching. With that change,
we can avoid incrementing the counter on writes, unless someone has
queried for it since it was last incremented. If the a/c/mtime don't
change, and the i_version hasn't changed, then there's no need to dirty
the inode metadata on a write.

Convert the i_version counter to an atomic64_t, and use the lowest order
bit to hold a flag that will tell whether anyone has queried the value
since it was last incremented.

When we go to maybe increment it, we fetch the value and check the flag
bit.  If it's clear then we don't need to do anything if the update
isn't being forced.

If we do need to update, then we increment the counter by 2, and clear
the flag bit, and then use a CAS op to swap it into place. If that
works, we return true. If it doesn't then do it again with the value
that we fetch from the CAS operation.

On the query side, if the flag is already set, then we just shift the
value down by 1 bit and return it. Otherwise, we set the flag in our
on-stack value and again use cmpxchg to swap it into place if it hasn't
changed. If it has, then we use the value from the cmpxchg as the new
"old" value and try again.

This method allows us to avoid incrementing the counter on writes (and
dirtying the metadata) under typical workloads. We only need to increment
if it has been queried since it was last changed.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 include/linux/fs.h   |   2 +-
 include/linux/iversion.h | 210 ++-
 2 files changed, 155 insertions(+), 57 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 76382c24e9d0..6804d075933e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -639,7 +639,7 @@ struct inode {
struct hlist_head   i_dentry;
struct rcu_head i_rcu;
};
-   u64 i_version;
+   atomic64_t  i_version;
atomic_ti_count;
atomic_ti_dio_count;
atomic_ti_writecount;
diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index 5ad9eaa3a9b0..9764c102b129 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -5,6 +5,8 @@
 #include 
 
 /*
+ * The inode->i_version field:
+ * ---
  * The change attribute (i_version) is mandated by NFSv4 and is mostly for
  * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
  * appear different to observers if there was a change to the inode's data or
@@ -27,86 +29,171 @@
  * i_version on namespace changes in directories (mkdir, rmdir, unlink, etc.).
  * We consider these sorts of filesystems to have a kernel-managed i_version.
  *
+ * This implementation uses the low bit in the i_version field as a flag to
+ * track when the value has been queried. If it has not been queried since it
+ * was last incremented, we can skip the increment in most cases.
+ *
+ * In the event that we're updating the ctime, we will usually go ahead and
+ * bump the i_version anyway. Since that has to go to stable storage in some
+ * fashion, we might as well increment it as well.
+ *
+ * With this implementation, the value should always appear to observers to
+ * increase over time if the file has changed. It's recommended to use
+ * inode_cmp_iversion() helper to compare values.
+ *
  * Note that some filesystems (e.g. NFS and AFS) just use the field to store
- * a server-provided value (for the most part). For that reason, those
+ * a server-provided value for the most part. For that reason, those
  * filesystems do not set SB_I_VERSION. These filesystems are considered to
  * have a self-managed i_version.
+ *
+ * Persistently storing the i_version
+ * --
+ * Queries of the i_version field are not gated on them hitting the backing
+ * store. It's always possible that the host could crash after allowing
+ * a query of the value but before it has made it to disk.
+ *
+ * To mitigate this problem, filesystems should always use
+ * inode_set_iversion_queried when loading an existing inode from disk. This
+ * ensures that the next attempted inode increment will result in the value
+ * changing.
+ *
+ * Storing the value to disk therefore does not count as a query, so those
+ * filesystems should use inode_peek_iversion to grab the value to be stored.
+ * There is no need to flag the value as having been queried in that case.
  */
 
+/*
+ * We borrow the lowest bit in the i_version to use as a flag to tell whether
+ * it has been queried since we last incremented it. If it has, then we must
+ * increment it on the next change. After that, we can clear the flag and
+ * avoid incrementing i

Re: [PATCH v4 19/19] fs: handle inode->i_version more efficiently

2018-01-08 Thread Jeff Layton
On Mon, 2018-01-08 at 21:17 +0100, Krzysztof Kozlowski wrote:
> On Mon, Jan 08, 2018 at 02:15:29PM -0500, Jeff Layton wrote:
> > On Mon, 2018-01-08 at 19:33 +0100, Krzysztof Kozlowski wrote:
> > > On Mon, Jan 08, 2018 at 01:00:19PM -0500, Jeff Layton wrote:
> > > > On Mon, 2018-01-08 at 18:29 +0100, Krzysztof Kozlowski wrote:
> > > 
> > > (...)
> > > 
> > > > > > Ok, thanks. If you're seeing hangs then that might imply that we 
> > > > > > have
> > > > > > some sort of excessive looping going on in the cmpxchg loops.
> > > > > > 
> > > > > > Could you apply the patch below and let me know if it causes either 
> > > > > > of
> > > > > > the warnings to pop? That might at least point us in the right
> > > > > > direction:
> > > > > 
> > > > > No new warnings with attached patch (except existing already lockdep:
> > > > > "INFO: trying to register non-static key.").
> > > > > 
> > > > 
> > > > Yeah, I saw that in the original logs and it looks unrelated (and
> > > > harmless).
> > > > 
> > > > > Systemd timeouts on mounting /home but after entering rescue shell 
> > > > > there
> > > > > is no problem running mount /home:
> > > > >   Give root password for maintenance
> > > > >   (or press Control-D to continue): 
> > > > >   root@odroidxu3:~# mount /home
> > > > >   [  220.659331] EXT4-fs (mmcblk1p2): mounted filesystem with 
> > > > > ordered data mode. Opts: (null)
> > > > > 
> > > > 
> > > > Ok, thanks for testing it. So I guess we can probably rule out excessive
> > > > looping in those functions as the issue.
> > > > 
> > > > To make sure I understand the problem: When systemd tries to do the
> > > > initial mount of /home (which is an ext4 filesystem), it hangs. But once
> > > > it drops to the shell, it works, if you do the mount by hand.
> > > > 
> > > > Is that correct?
> > > 
> > > Yes, although it also timeouts on setting up /dev/ttySAC2 (serial
> > > console).
> > > 
> > > > If so, then is it possible to trigger sysrq commands during the hanging
> > > > mount attempt? Maybe you could use e.g. sysrq-l, sysrq-w, etc. to
> > > > determine what it's blocking on?
> > 
> > (trimming the output)
> > 
> > Thanks. I don't really see anything obvious in that info,
> > unfortunately. What we really need to do is find the systemd task
> > performing the mount, and see what it's doing.
> 
> It's systemd 236.0-2 coming from Arch Linux for ARM. All packages are
> updated.
> 
> > We do have one questionable bug in the NFS changes though. Does this
> > patch help at all?
> 
> Patches do not change anything (I tried "SQUASH: nfs: fix i_version
> increment when adding a request" and "SQUASH: ext4: use raw API for
> xattr inode refcounts").
> 
> I tried again regular SDcard-root boot and it succeeded. Only nfsroot
> fails.
> 
> Best regards,
> Krzysztof
> 

Got it, that's helpful. Does this patch help (on top of the others) ?

8<--

SQUASH: nfs: compare raw iversion counter since that's what's
 being stored

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/nfs/inode.c   |  6 +++---
 include/linux/iversion.h | 13 +
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 0b85cca1184b..93552c482992 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1289,7 +1289,7 @@ static unsigned long nfs_wcc_update_inode(struct inode 
*inode, struct nfs_fattr
 
if ((fattr->valid & NFS_ATTR_FATTR_PRECHANGE)
&& (fattr->valid & NFS_ATTR_FATTR_CHANGE)
-   && !inode_cmp_iversion(inode, fattr->pre_change_attr)) {
+   && !inode_cmp_iversion_raw(inode, 
fattr->pre_change_attr)) {
inode_set_iversion_raw(inode, fattr->change_attr);
if (S_ISDIR(inode->i_mode))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_DATA);
@@ -1348,7 +1348,7 @@ static int nfs_check_inode_attributes(struct inode 
*inode, struct nfs_fattr *fat
 
if (!nfs_file_has_buffered_writers(nfsi)) {
/* Verify a few of the more important attributes */
-   if ((fattr->va

Re: [PATCH v4 19/19] fs: handle inode->i_version more efficiently

2018-01-08 Thread Jeff Layton
On Mon, 2018-01-08 at 14:15 -0500, Jeff Layton wrote:
> On Mon, 2018-01-08 at 19:33 +0100, Krzysztof Kozlowski wrote:
> > On Mon, Jan 08, 2018 at 01:00:19PM -0500, Jeff Layton wrote:
> > > On Mon, 2018-01-08 at 18:29 +0100, Krzysztof Kozlowski wrote:
> > 
> > (...)
> > 
> > > > > Ok, thanks. If you're seeing hangs then that might imply that we have
> > > > > some sort of excessive looping going on in the cmpxchg loops.
> > > > > 
> > > > > Could you apply the patch below and let me know if it causes either of
> > > > > the warnings to pop? That might at least point us in the right
> > > > > direction:
> > > > 
> > > > No new warnings with attached patch (except existing already lockdep:
> > > > "INFO: trying to register non-static key.").
> > > > 
> > > 
> > > Yeah, I saw that in the original logs and it looks unrelated (and
> > > harmless).
> > > 
> > > > Systemd timeouts on mounting /home but after entering rescue shell there
> > > > is no problem running mount /home:
> > > > Give root password for maintenance
> > > > (or press Control-D to continue): 
> > > > root@odroidxu3:~# mount /home
> > > > [  220.659331] EXT4-fs (mmcblk1p2): mounted filesystem with 
> > > > ordered data mode. Opts: (null)
> > > > 
> > > 
> > > Ok, thanks for testing it. So I guess we can probably rule out excessive
> > > looping in those functions as the issue.
> > > 
> > > To make sure I understand the problem: When systemd tries to do the
> > > initial mount of /home (which is an ext4 filesystem), it hangs. But once
> > > it drops to the shell, it works, if you do the mount by hand.
> > > 
> > > Is that correct?
> > 
> > Yes, although it also timeouts on setting up /dev/ttySAC2 (serial
> > console).
> > 
> > > If so, then is it possible to trigger sysrq commands during the hanging
> > > mount attempt? Maybe you could use e.g. sysrq-l, sysrq-w, etc. to
> > > determine what it's blocking on?
> 
> (trimming the output)
> 
> Thanks. I don't really see anything obvious in that info,
> unfortunately. What we really need to do is find the systemd task
> performing the mount, and see what it's doing.
> 
> We do have one questionable bug in the NFS changes though. Does this
> patch help at all?
> 
> ---8<-
> 
> SQUASH: nfs: fix i_version increment when adding a request
> 
> NFS treats this value as an opaque value with no flag, so we must
> increment it as such instead of using inode_inc_iversion.
> 
> Signed-off-by: Jeff Layton <jlay...@redhat.com>
> ---
>  fs/nfs/write.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index a03fbac1f88c..48837b6250e9 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -755,7 +755,7 @@ static void nfs_inode_add_request(struct inode *inode, 
> struct nfs_page *req)
>   spin_lock(>private_lock);
>   if (!nfs_have_writebacks(inode) &&
>   NFS_PROTO(inode)->have_delegation(inode, FMODE_WRITE))
> -     inode_inc_iversion(inode);
> + atomic64_inc(>i_version);
>   if (likely(!PageSwapCache(req->wb_page))) {
>   set_bit(PG_MAPPED, >wb_flags);
>   SetPagePrivate(req->wb_page);

Sorry for the slow dribble of patches. I found a bug in the ext4 code
too. Might want to try this ext4 patch as well:

--8<-

SQUASH: ext4: use raw API for xattr inode refcounts

This could distort the values otherwise.

(Side Q: Why do we do this across the ctime and iversion like this?)

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/ext4/xattr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index ba6fd5439aa4..63656dbafdc4 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -295,13 +295,13 @@ ext4_xattr_inode_hash(struct ext4_sb_info *sbi, const 
void *buffer, size_t size)
 static u64 ext4_xattr_inode_get_ref(struct inode *ea_inode)
 {
return ((u64)ea_inode->i_ctime.tv_sec << 32) |
-   (u32) inode_peek_iversion(ea_inode);
+   (u32) inode_peek_iversion_raw(ea_inode);
 }
 
 static void ext4_xattr_inode_set_ref(struct inode *ea_inode, u64 ref_count)
 {
ea_inode->i_ctime.tv_sec = (u32)(ref_count >> 32);
-   inode_set_iversion(ea_inode, ref_count & 0x);
+   inode_set_iversion_raw(ea_inode, ref_count & 0x);
 }
 
 static u32 ext4_xattr_inode_get_hash(struct inode *ea_inode)
-- 
2.14.3



--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 19/19] fs: handle inode->i_version more efficiently

2018-01-08 Thread Jeff Layton
On Mon, 2018-01-08 at 19:33 +0100, Krzysztof Kozlowski wrote:
> On Mon, Jan 08, 2018 at 01:00:19PM -0500, Jeff Layton wrote:
> > On Mon, 2018-01-08 at 18:29 +0100, Krzysztof Kozlowski wrote:
> 
> (...)
> 
> > > > Ok, thanks. If you're seeing hangs then that might imply that we have
> > > > some sort of excessive looping going on in the cmpxchg loops.
> > > > 
> > > > Could you apply the patch below and let me know if it causes either of
> > > > the warnings to pop? That might at least point us in the right
> > > > direction:
> > > 
> > > No new warnings with attached patch (except existing already lockdep:
> > > "INFO: trying to register non-static key.").
> > > 
> > 
> > Yeah, I saw that in the original logs and it looks unrelated (and
> > harmless).
> > 
> > > Systemd timeouts on mounting /home but after entering rescue shell there
> > > is no problem running mount /home:
> > >   Give root password for maintenance
> > >   (or press Control-D to continue): 
> > >   root@odroidxu3:~# mount /home
> > >   [  220.659331] EXT4-fs (mmcblk1p2): mounted filesystem with ordered 
> > > data mode. Opts: (null)
> > > 
> > 
> > Ok, thanks for testing it. So I guess we can probably rule out excessive
> > looping in those functions as the issue.
> > 
> > To make sure I understand the problem: When systemd tries to do the
> > initial mount of /home (which is an ext4 filesystem), it hangs. But once
> > it drops to the shell, it works, if you do the mount by hand.
> > 
> > Is that correct?
> 
> Yes, although it also timeouts on setting up /dev/ttySAC2 (serial
> console).
> 
> > If so, then is it possible to trigger sysrq commands during the hanging
> > mount attempt? Maybe you could use e.g. sysrq-l, sysrq-w, etc. to
> > determine what it's blocking on?

(trimming the output)

Thanks. I don't really see anything obvious in that info,
unfortunately. What we really need to do is find the systemd task
performing the mount, and see what it's doing.

We do have one questionable bug in the NFS changes though. Does this
patch help at all?

---8<-

SQUASH: nfs: fix i_version increment when adding a request

NFS treats this value as an opaque value with no flag, so we must
increment it as such instead of using inode_inc_iversion.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/nfs/write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index a03fbac1f88c..48837b6250e9 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -755,7 +755,7 @@ static void nfs_inode_add_request(struct inode *inode, 
struct nfs_page *req)
spin_lock(>private_lock);
if (!nfs_have_writebacks(inode) &&
NFS_PROTO(inode)->have_delegation(inode, FMODE_WRITE))
-   inode_inc_iversion(inode);
+   atomic64_inc(>i_version);
if (likely(!PageSwapCache(req->wb_page))) {
set_bit(PG_MAPPED, >wb_flags);
SetPagePrivate(req->wb_page);
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 19/19] fs: handle inode->i_version more efficiently

2018-01-08 Thread Jeff Layton
On Mon, 2018-01-08 at 18:29 +0100, Krzysztof Kozlowski wrote:
> On Mon, Jan 08, 2018 at 08:29:24AM -0500, Jeff Layton wrote:
> > On Mon, 2018-01-08 at 14:21 +0100, Krzysztof Kozlowski wrote:
> > > On Mon, Jan 8, 2018 at 1:56 PM, Jeff Layton <jlay...@kernel.org> wrote:
> > > > On Sun, 2018-01-07 at 13:44 +0100, Krzysztof Kozlowski wrote:
> > > > > On Fri, Dec 22, 2017 at 1:05 PM, Jeff Layton <jlay...@kernel.org> 
> > > > > wrote:
> > > > > > From: Jeff Layton <jlay...@redhat.com>
> > > > > > 
> > > > > > Since i_version is mostly treated as an opaque value, we can 
> > > > > > exploit that
> > > > > > fact to avoid incrementing it when no one is watching. With that 
> > > > > > change,
> > > > > > we can avoid incrementing the counter on writes, unless someone has
> > > > > > queried for it since it was last incremented. If the a/c/mtime don't
> > > > > > change, and the i_version hasn't changed, then there's no need to 
> > > > > > dirty
> > > > > > the inode metadata on a write.
> > > > > > 
> > > > > > Convert the i_version counter to an atomic64_t, and use the lowest 
> > > > > > order
> > > > > > bit to hold a flag that will tell whether anyone has queried the 
> > > > > > value
> > > > > > since it was last incremented.
> > > > > > 
> > > > > > When we go to maybe increment it, we fetch the value and check the 
> > > > > > flag
> > > > > > bit.  If it's clear then we don't need to do anything if the update
> > > > > > isn't being forced.
> > > > > > 
> > > > > > If we do need to update, then we increment the counter by 2, and 
> > > > > > clear
> > > > > > the flag bit, and then use a CAS op to swap it into place. If that
> > > > > > works, we return true. If it doesn't then do it again with the value
> > > > > > that we fetch from the CAS operation.
> > > > > > 
> > > > > > On the query side, if the flag is already set, then we just shift 
> > > > > > the
> > > > > > value down by 1 bit and return it. Otherwise, we set the flag in our
> > > > > > on-stack value and again use cmpxchg to swap it into place if it 
> > > > > > hasn't
> > > > > > changed. If it has, then we use the value from the cmpxchg as the 
> > > > > > new
> > > > > > "old" value and try again.
> > > > > > 
> > > > > > This method allows us to avoid incrementing the counter on writes 
> > > > > > (and
> > > > > > dirtying the metadata) under typical workloads. We only need to 
> > > > > > increment
> > > > > > if it has been queried since it was last changed.
> > > > > > 
> > > > > > Signed-off-by: Jeff Layton <jlay...@redhat.com>
> > > > > > ---
> > > > > >  include/linux/fs.h   |   2 +-
> > > > > >  include/linux/iversion.h | 208 
> > > > > > ++-
> > > > > >  2 files changed, 154 insertions(+), 56 deletions(-)
> > > > > > 
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > On recent linux-next my ARM/Exynos boards fail to boot over nfsroot.
> > > > > This commit popped up through bisect (log at the end). Systemd
> > > > > timeouts on some device-specific services, including mounting ext4
> > > > > /home:
> > > > > 
> > > > > [ *** ] (1 of 4) A start job is running for…ress polling (1min 41s / 
> > > > > no limit)
> > > > > [ TIME ] Timed out waiting for device dev-ttySAC2.device.
> > > > > Jan 07 13:29:38 [DEPEND] Dependency failed for Serial Getty on 
> > > > > ttySAC2.
> > > > > Jan 07 13:29:38 [ TIME ] Timed out waiting for device
> > > > > dev-disk-by\x2dlabel-home.device.
> > > > > Jan 07 13:29:38 [DEPEND] Dependency failed for /home.
> > > > > Jan 07 13:29:38 [DEPEND] Dependency failed for Local File Systems.
> > > > > Jan 07 13:29:38 [DEPEND] Dependency failed for File System Check on
> > > > > /dev/

Re: [PATCH v4 19/19] fs: handle inode->i_version more efficiently

2018-01-08 Thread Jeff Layton
On Mon, 2018-01-08 at 05:30 -0800, Matthew Wilcox wrote:
> On Fri, Dec 22, 2017 at 07:05:56AM -0500, Jeff Layton wrote:
> > +   cur = inode_peek_iversion_raw(inode);
> > +   for (;;) {
> > +   /* If flag is clear then we needn't do anything */
> > +   if (!force && !(cur & I_VERSION_QUERIED))
> > +   return false;
> > +   /* Since lowest bit is flag, add 2 to avoid it */
> > +   new = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT;
> 
> Isn't this an extraordinarily complicated way of spelling:
> 
>   new = cur + 1;
> 
> We know 'cur' has I_VERSION_QUERIED set, so clearing that bit and adding
> two is going to be the same as adding 1 ... right?
> 

It would be, but if "force" is true, then I_VERSION_QUERIED may not be
set.
-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 19/19] fs: handle inode->i_version more efficiently

2018-01-08 Thread Jeff Layton
On Mon, 2018-01-08 at 14:21 +0100, Krzysztof Kozlowski wrote:
> On Mon, Jan 8, 2018 at 1:56 PM, Jeff Layton <jlay...@kernel.org> wrote:
> > On Sun, 2018-01-07 at 13:44 +0100, Krzysztof Kozlowski wrote:
> > > On Fri, Dec 22, 2017 at 1:05 PM, Jeff Layton <jlay...@kernel.org> wrote:
> > > > From: Jeff Layton <jlay...@redhat.com>
> > > > 
> > > > Since i_version is mostly treated as an opaque value, we can exploit 
> > > > that
> > > > fact to avoid incrementing it when no one is watching. With that change,
> > > > we can avoid incrementing the counter on writes, unless someone has
> > > > queried for it since it was last incremented. If the a/c/mtime don't
> > > > change, and the i_version hasn't changed, then there's no need to dirty
> > > > the inode metadata on a write.
> > > > 
> > > > Convert the i_version counter to an atomic64_t, and use the lowest order
> > > > bit to hold a flag that will tell whether anyone has queried the value
> > > > since it was last incremented.
> > > > 
> > > > When we go to maybe increment it, we fetch the value and check the flag
> > > > bit.  If it's clear then we don't need to do anything if the update
> > > > isn't being forced.
> > > > 
> > > > If we do need to update, then we increment the counter by 2, and clear
> > > > the flag bit, and then use a CAS op to swap it into place. If that
> > > > works, we return true. If it doesn't then do it again with the value
> > > > that we fetch from the CAS operation.
> > > > 
> > > > On the query side, if the flag is already set, then we just shift the
> > > > value down by 1 bit and return it. Otherwise, we set the flag in our
> > > > on-stack value and again use cmpxchg to swap it into place if it hasn't
> > > > changed. If it has, then we use the value from the cmpxchg as the new
> > > > "old" value and try again.
> > > > 
> > > > This method allows us to avoid incrementing the counter on writes (and
> > > > dirtying the metadata) under typical workloads. We only need to 
> > > > increment
> > > > if it has been queried since it was last changed.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlay...@redhat.com>
> > > > ---
> > > >  include/linux/fs.h   |   2 +-
> > > >  include/linux/iversion.h | 208 
> > > > ++-
> > > >  2 files changed, 154 insertions(+), 56 deletions(-)
> > > > 
> > > 
> > > Hi,
> > > 
> > > On recent linux-next my ARM/Exynos boards fail to boot over nfsroot.
> > > This commit popped up through bisect (log at the end). Systemd
> > > timeouts on some device-specific services, including mounting ext4
> > > /home:
> > > 
> > > [ *** ] (1 of 4) A start job is running for…ress polling (1min 41s / no 
> > > limit)
> > > [ TIME ] Timed out waiting for device dev-ttySAC2.device.
> > > Jan 07 13:29:38 [DEPEND] Dependency failed for Serial Getty on ttySAC2.
> > > Jan 07 13:29:38 [ TIME ] Timed out waiting for device
> > > dev-disk-by\x2dlabel-home.device.
> > > Jan 07 13:29:38 [DEPEND] Dependency failed for /home.
> > > Jan 07 13:29:38 [DEPEND] Dependency failed for Local File Systems.
> > > Jan 07 13:29:38 [DEPEND] Dependency failed for File System Check on
> > > /dev/disk/by-label/home.
> > > Jan 07 13:30:02 [ *** ] (1 of 2) A start job is running for…ress
> > > polling (1min 53s / no limit)
> > > 
> > > Kernel command line:
> > > console=tty1 console=ttySAC2,115200n8
> > > ip=192.168.1.11:192.168.1.10:192.168.1.1:255.255.255.0::eth0:none
> > > nfsrootdebug root=/dev/nfs
> > > nfsroot=192.168.1.10:/srv/nfs/odroidxu3,vers=4,nolock rootwait rw
> > > smsc95xx.macaddr=00:1e:06:61:7a:93 no_console_suspend
> > > 
> > > /home is /dev/mmcblk1p2:
> > > kozik@odroidxu3:~$ tune2fs -l /dev/mmcblk1p2
> > > tune2fs 1.43.7 (16-Oct-2017)
> > > Filesystem volume name:   home
> > > Last mounted on:  /home
> > > Filesystem UUID:  3f9dbeba-2738-45d3-807e-c1b2e21128ed
> > > Filesystem magic number:  0xEF53
> > > Filesystem revision #:1 (dynamic)
> > > Filesystem features:  has_journal ext_attr resize_inode dir_index
> > > filetype needs_recovery extent flex_bg sparse_super large_file
> > &

Re: [PATCH v4 19/19] fs: handle inode->i_version more efficiently

2018-01-08 Thread Jeff Layton
On Sun, 2018-01-07 at 13:44 +0100, Krzysztof Kozlowski wrote:
> On Fri, Dec 22, 2017 at 1:05 PM, Jeff Layton <jlay...@kernel.org> wrote:
> > From: Jeff Layton <jlay...@redhat.com>
> > 
> > Since i_version is mostly treated as an opaque value, we can exploit that
> > fact to avoid incrementing it when no one is watching. With that change,
> > we can avoid incrementing the counter on writes, unless someone has
> > queried for it since it was last incremented. If the a/c/mtime don't
> > change, and the i_version hasn't changed, then there's no need to dirty
> > the inode metadata on a write.
> > 
> > Convert the i_version counter to an atomic64_t, and use the lowest order
> > bit to hold a flag that will tell whether anyone has queried the value
> > since it was last incremented.
> > 
> > When we go to maybe increment it, we fetch the value and check the flag
> > bit.  If it's clear then we don't need to do anything if the update
> > isn't being forced.
> > 
> > If we do need to update, then we increment the counter by 2, and clear
> > the flag bit, and then use a CAS op to swap it into place. If that
> > works, we return true. If it doesn't then do it again with the value
> > that we fetch from the CAS operation.
> > 
> > On the query side, if the flag is already set, then we just shift the
> > value down by 1 bit and return it. Otherwise, we set the flag in our
> > on-stack value and again use cmpxchg to swap it into place if it hasn't
> > changed. If it has, then we use the value from the cmpxchg as the new
> > "old" value and try again.
> > 
> > This method allows us to avoid incrementing the counter on writes (and
> > dirtying the metadata) under typical workloads. We only need to increment
> > if it has been queried since it was last changed.
> > 
> > Signed-off-by: Jeff Layton <jlay...@redhat.com>
> > ---
> >  include/linux/fs.h   |   2 +-
> >  include/linux/iversion.h | 208 
> > ++-
> >  2 files changed, 154 insertions(+), 56 deletions(-)
> > 
> 
> Hi,
> 
> On recent linux-next my ARM/Exynos boards fail to boot over nfsroot.
> This commit popped up through bisect (log at the end). Systemd
> timeouts on some device-specific services, including mounting ext4
> /home:
> 
> [ *** ] (1 of 4) A start job is running for…ress polling (1min 41s / no limit)
> [ TIME ] Timed out waiting for device dev-ttySAC2.device.
> Jan 07 13:29:38 [DEPEND] Dependency failed for Serial Getty on ttySAC2.
> Jan 07 13:29:38 [ TIME ] Timed out waiting for device
> dev-disk-by\x2dlabel-home.device.
> Jan 07 13:29:38 [DEPEND] Dependency failed for /home.
> Jan 07 13:29:38 [DEPEND] Dependency failed for Local File Systems.
> Jan 07 13:29:38 [DEPEND] Dependency failed for File System Check on
> /dev/disk/by-label/home.
> Jan 07 13:30:02 [ *** ] (1 of 2) A start job is running for…ress
> polling (1min 53s / no limit)
> 
> Kernel command line:
> console=tty1 console=ttySAC2,115200n8
> ip=192.168.1.11:192.168.1.10:192.168.1.1:255.255.255.0::eth0:none
> nfsrootdebug root=/dev/nfs
> nfsroot=192.168.1.10:/srv/nfs/odroidxu3,vers=4,nolock rootwait rw
> smsc95xx.macaddr=00:1e:06:61:7a:93 no_console_suspend
> 
> /home is /dev/mmcblk1p2:
> kozik@odroidxu3:~$ tune2fs -l /dev/mmcblk1p2
> tune2fs 1.43.7 (16-Oct-2017)
> Filesystem volume name:   home
> Last mounted on:  /home
> Filesystem UUID:  3f9dbeba-2738-45d3-807e-c1b2e21128ed
> Filesystem magic number:  0xEF53
> Filesystem revision #:1 (dynamic)
> Filesystem features:  has_journal ext_attr resize_inode dir_index
> filetype needs_recovery extent flex_bg sparse_super large_file
> uninit_bg dir_nlink extra_isize
> Filesystem flags: signed_directory_hash
> Default mount options:user_xattr acl
> Filesystem state: clean
> Errors behavior:  Continue
> Filesystem OS type:   Linux
> Inode count:  1430800
> Block count:  5717760
> Reserved block count: 285888
> Free blocks:  5467576
> Free inodes:  1428301
> First block:  0
> Block size:   4096
> Fragment size:4096
> Reserved GDT blocks:  1022
> Blocks per group: 32768
> Fragments per group:  32768
> Inodes per group: 8176
> Inode blocks per group:   511
> Flex block group size:16
> Filesystem created:   Thu May 21 12:17:05 2015
> Last mount time:  Thu Dec 21 13:31:26 2017
> Last write time:  Thu Dec 21 13:31:26 2017
> Mount count:  1
> Maximum mount count:   

Re: [PATCH v4 12/19] ocfs2: convert to new i_version API

2018-01-04 Thread Jeff Layton
On Fri, 2017-12-22 at 07:05 -0500, Jeff Layton wrote:
> From: Jeff Layton <jlay...@redhat.com>
> 
> Signed-off-by: Jeff Layton <jlay...@redhat.com>
> Reviewed-by: Jan Kara <j...@suse.cz>
> ---
>  fs/ocfs2/dir.c  | 15 ---
>  fs/ocfs2/inode.c|  3 ++-
>  fs/ocfs2/namei.c|  3 ++-
>  fs/ocfs2/quota_global.c |  3 ++-
>  4 files changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
> index febe6312ceff..32f9c72dff17 100644
> --- a/fs/ocfs2/dir.c
> +++ b/fs/ocfs2/dir.c
> @@ -42,6 +42,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -1174,7 +1175,7 @@ static int __ocfs2_delete_entry(handle_t *handle, 
> struct inode *dir,
>   le16_add_cpu(>rec_len,
>   le16_to_cpu(de->rec_len));
>   de->inode = 0;
> - dir->i_version++;
> + inode_inc_iversion(dir);
>   ocfs2_journal_dirty(handle, bh);
>   goto bail;
>   }
> @@ -1729,7 +1730,7 @@ int __ocfs2_add_entry(handle_t *handle,
>   if (ocfs2_dir_indexed(dir))
>   ocfs2_recalc_free_list(dir, handle, lookup);
>  
> - dir->i_version++;
> + inode_inc_iversion(dir);
>   ocfs2_journal_dirty(handle, insert_bh);
>   retval = 0;
>   goto bail;
> @@ -1775,7 +1776,7 @@ static int ocfs2_dir_foreach_blk_id(struct inode *inode,
>* readdir(2), then we might be pointing to an invalid
>* dirent right now.  Scan from the start of the block
>* to make sure. */
> - if (*f_version != inode->i_version) {
> + if (inode_cmp_iversion(inode, *f_version)) {
>   for (i = 0; i < i_size_read(inode) && i < offset; ) {
>   de = (struct ocfs2_dir_entry *)
>   (data->id_data + i);
> @@ -1791,7 +1792,7 @@ static int ocfs2_dir_foreach_blk_id(struct inode *inode,
>   i += le16_to_cpu(de->rec_len);
>   }
>   ctx->pos = offset = i;
> - *f_version = inode->i_version;
> + *f_version = inode_query_iversion(inode);
>   }
>  
>   de = (struct ocfs2_dir_entry *) (data->id_data + ctx->pos);
> @@ -1869,7 +1870,7 @@ static int ocfs2_dir_foreach_blk_el(struct inode *inode,
>* readdir(2), then we might be pointing to an invalid
>* dirent right now.  Scan from the start of the block
>* to make sure. */
> - if (*f_version != inode->i_version) {
> + if (inode_cmp_iversion(inode, *f_version)) {
>   for (i = 0; i < sb->s_blocksize && i < offset; ) {
>   de = (struct ocfs2_dir_entry *) (bh->b_data + 
> i);
>   /* It's too expensive to do a full
> @@ -1886,7 +1887,7 @@ static int ocfs2_dir_foreach_blk_el(struct inode *inode,
>   offset = i;
>   ctx->pos = (ctx->pos & ~(sb->s_blocksize - 1))
>   | offset;
> - *f_version = inode->i_version;
> + *f_version = inode_query_iversion(inode);
>   }
>  
>   while (ctx->pos < i_size_read(inode)
> @@ -1940,7 +1941,7 @@ static int ocfs2_dir_foreach_blk(struct inode *inode, 
> u64 *f_version,
>   */
>  int ocfs2_dir_foreach(struct inode *inode, struct dir_context *ctx)
>  {
> - u64 version = inode->i_version;
> + u64 version = inode_query_iversion(inode);
>   ocfs2_dir_foreach_blk(inode, , ctx, true);
>   return 0;
>  }
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 1a1e0078ab38..d51b80edd972 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -302,7 +303,7 @@ void ocfs2_populate_inode(struct inode *inode, struct 
> ocfs2_dinode *fe,
>   OCFS2_I(inode)->ip_attr = le32_to_cpu(fe->i_attr);
>   OCFS2_I(inode)->ip_dyn_features = le16_to_cpu(fe->i_dyn_features);
>  
> - inode->i_version = 1;
> + inode_set_iversion(inode, 1);
>   inode->i_generation = le32_to_cpu(fe->i_generation);
>

Re: [PATCH v4 18/19] btrfs: only dirty the inode in btrfs_update_time if something was changed

2018-01-03 Thread Jeff Layton
On Fri, 2017-12-22 at 07:05 -0500, Jeff Layton wrote:
> From: Jeff Layton <jlay...@redhat.com>
> 
> At this point, we know that "now" and the file times may differ, and we
> suspect that the i_version has been flagged to be bumped. Attempt to
> bump the i_version, and only mark the inode dirty if that actually
> occurred or if one of the times was updated.
> 
> Signed-off-by: Jeff Layton <jlay...@redhat.com>
> ---
>  fs/btrfs/inode.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index ac8692849a81..76245323a7c8 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6107,19 +6107,20 @@ static int btrfs_update_time(struct inode *inode, 
> struct timespec *now,
>int flags)
>  {
>   struct btrfs_root *root = BTRFS_I(inode)->root;
> + bool dirty = flags & ~S_VERSION;
>  
>   if (btrfs_root_readonly(root))
>   return -EROFS;
>  
>   if (flags & S_VERSION)
> - inode_inc_iversion(inode);
> + dirty |= inode_maybe_inc_iversion(inode, dirty);
>   if (flags & S_CTIME)
>   inode->i_ctime = *now;
>   if (flags & S_MTIME)
>   inode->i_mtime = *now;
>   if (flags & S_ATIME)
>   inode->i_atime = *now;
> - return btrfs_dirty_inode(inode);
> + return dirty ? btrfs_dirty_inode(inode) : 0;
>  }
>  
>  /*

Hi btrfs folks!

I posted this set just before xmas, and am hoping to send Linus a PR for
it when the next merge window opens. Both the ext4 and xfs folks have
acked their respective parts.

I'll note that this set helps small I/O workloads on xfs and ext4
significantly, but somewhat less so on btrfs. I'd also appreciate any
insight as to why it doesn't help on btrfs as much, and whether we can
alter this approach to help things there.

Many thanks!
-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 16/19] fs: only set S_VERSION when updating times if necessary

2018-01-02 Thread Jeff Layton
On Tue, 2018-01-02 at 17:50 +0100, Jan Kara wrote:
> On Fri 22-12-17 07:05:53, Jeff Layton wrote:
> > From: Jeff Layton <jlay...@redhat.com>
> > 
> > We only really need to update i_version if someone has queried for it
> > since we last incremented it. By doing that, we can avoid having to
> > update the inode if the times haven't changed.
> > 
> > If the times have changed, then we go ahead and forcibly increment the
> > counter, under the assumption that we'll be going to the storage
> > anyway, and the increment itself is relatively cheap.
> > 
> > Signed-off-by: Jeff Layton <jlay...@redhat.com>
> > ---
> >  fs/inode.c | 10 +++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 19e72f500f71..2fa920188759 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -1635,17 +1635,21 @@ static int relatime_need_update(const struct path 
> > *path, struct inode *inode,
> >  int generic_update_time(struct inode *inode, struct timespec *time, int 
> > flags)
> >  {
> > int iflags = I_DIRTY_TIME;
> > +   bool dirty = false;
> >  
> > if (flags & S_ATIME)
> > inode->i_atime = *time;
> > if (flags & S_VERSION)
> > -   inode_inc_iversion(inode);
> > +   dirty |= inode_maybe_inc_iversion(inode, dirty);
> > if (flags & S_CTIME)
> > inode->i_ctime = *time;
> > if (flags & S_MTIME)
> > inode->i_mtime = *time;
> > +   if ((flags & (S_ATIME | S_CTIME | S_MTIME)) &&
> > +   !(inode->i_sb->s_flags & SB_LAZYTIME))
> > +   dirty = true;
> 
> When you pass 'dirty' to inode_maybe_inc_iversion(), it is always false.
> Maybe this condition should be at the beginning of the function? Once you
> fix that the patch looks good so you can add:
> 
> Reviewed-by: Jan Kara <j...@suse.cz>
> 

Thanks for the review! I've fixed it in my tree. I'll not re-post the
set unless I have to make another significant change or someone requests
it.

I did make one other change, and that was to drop the "const" qualifiers
on the integer arguments in the new API. David Howells pointed out that
they don't really help anything, and the prototypes look cleaner without
them.

This set is now in linux-next as well, so I'm going to try to get this
merged into v4.16, assuming no problems between now and the merge
window.
-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 05/19] afs: convert to new i_version API

2018-01-02 Thread Jeff Layton
On Tue, 2018-01-02 at 17:20 +, David Howells wrote:
> Jeff Layton <jlay...@kernel.org> wrote:
> 
> > Note that AFS has quite a different definition for this counter. AFS
> > only increments it on changes to the data, not for the metadata.
> 
> This also applies to AFS directories: create, mkdir, unlink, rmdir, link,
> symlink, rename, and mountpoint creation/removal all bump the data version
> number on a directory by exactly one if they change it.
> 

Thanks! I updated that part of the the commit log to read:

Note that AFS has quite a different definition for this counter. AFS
only increments it on changes to the data to the data in regular files
and contents of the directories. Inode metadata changes do not result
in a version increment.

-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 01/19] fs: new API for handling inode->i_version

2017-12-22 Thread Jeff Layton
On Sat, 2017-12-23 at 10:14 +1100, NeilBrown wrote:
> On Fri, Dec 22 2017, Jeff Layton wrote:
> 
> > From: Jeff Layton <jlay...@redhat.com>
> > 
> > Add a documentation blob that explains what the i_version field is, how
> > it is expected to work, and how it is currently implemented by various
> > filesystems.
> > 
> > We already have inode_inc_iversion. Add several other functions for
> > manipulating and accessing the i_version counter. For now, the
> > implementation is trivial and basically works the way that all of the
> > open-coded i_version accesses work today.
> > 
> > Future patches will convert existing users of i_version to use the new
> > API, and then convert the backend implementation to do things more
> > efficiently.
> > 
> > Signed-off-by: Jeff Layton <jlay...@redhat.com>
> > ---
> >  fs/btrfs/file.c  |   1 +
> >  fs/btrfs/inode.c |   1 +
> >  fs/btrfs/ioctl.c |   1 +
> >  fs/btrfs/xattr.c |   1 +
> >  fs/ext4/inode.c  |   1 +
> >  fs/ext4/namei.c  |   1 +
> >  fs/inode.c   |   1 +
> >  include/linux/fs.h   |  15 
> >  include/linux/iversion.h | 205 
> > +++
> >  9 files changed, 212 insertions(+), 15 deletions(-)
> >  create mode 100644 include/linux/iversion.h
> > 
> > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> > index eb1bac7c8553..c95d7b2efefb 100644
> > --- a/fs/btrfs/file.c
> > +++ b/fs/btrfs/file.c
> > @@ -31,6 +31,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include "ctree.h"
> >  #include "disk-io.h"
> >  #include "transaction.h"
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index e1a7f3cb5be9..27f008b33fc1 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -43,6 +43,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include "ctree.h"
> >  #include "disk-io.h"
> >  #include "transaction.h"
> > diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> > index 2ef8acaac688..aa452c9e2eff 100644
> > --- a/fs/btrfs/ioctl.c
> > +++ b/fs/btrfs/ioctl.c
> > @@ -43,6 +43,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include "ctree.h"
> >  #include "disk-io.h"
> >  #include "transaction.h"
> > diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
> > index 2c7e53f9ff1b..5258c1714830 100644
> > --- a/fs/btrfs/xattr.c
> > +++ b/fs/btrfs/xattr.c
> > @@ -23,6 +23,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include "ctree.h"
> >  #include "btrfs_inode.h"
> >  #include "transaction.h"
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 7df2c5644e59..fa5d8bc52d2d 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -39,6 +39,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include "ext4_jbd2.h"
> >  #include "xattr.h"
> > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> > index 798b3ac680db..bcf0dff517be 100644
> > --- a/fs/ext4/namei.c
> > +++ b/fs/ext4/namei.c
> > @@ -34,6 +34,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include "ext4.h"
> >  #include "ext4_jbd2.h"
> >  
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 03102d6ef044..19e72f500f71 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -18,6 +18,7 @@
> >  #include  /* for inode_has_buffers */
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include "internal.h"
> >  
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 511fbaabf624..76382c24e9d0 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2036,21 +2036,6 @@ static inline void inode_dec_link_count(struct inode 
> > *inode)
> > mark_inode_dirty(inode);
> >  }
> >  
> > -/**
> > - * inode_inc_iversion - increments i_version
> > - * @inode: inode that need to be updated
> > - *
> > - * Every time the inode is modified, the i_version field will be 
> > incremented.
> > - * The filesystem has to be mounted with i_version flag
> > - */
> > -
> > -stat

[PATCH v4 00/19] fs: rework and optimize i_version handling in filesystems

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

v4:
- fix SB_LAZYTIME handling in generic_update_time
- add memory barriers to patch to convert i_version field to atomic64_t

v3:
- move i_version handling functions to new header file
- document that the kernel-managed i_version implementation will appear to
  increase over time
- fix inode_cmp_iversion to handle wraparound correctly

v2:
- xfs should use inode_peek_iversion instead of inode_peek_iversion_raw
- rework file_update_time patch
- don't dirty inode when only S_ATIME is set and SB_LAZYTIME is enabled
- better comments and documentation

I think this is now approaching merge readiness.

Special thanks to Jan Kara and Dave Chinner who helped me tighten up the
memory barriers in the final patch.

tl;dr: I think we can greatly reduce the cost of the inode->i_version
counter, by exploiting the fact that we don't need to increment it if no
one is looking at it. We can also clean up the code to prepare to
eventually expose this value via statx().

Note that this set relies on a few patches that are in other trees. The
full stack that I've been testing with is here:


https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git/log/?h=iversion

The inode->i_version field is supposed to be a value that changes
whenever there is any data or metadata change to the inode. Some
filesystems use it internally to detect directory changes during
readdir. knfsd will use it if the filesystem has MS_I_VERSION set. IMA
will also use it to optimize away some remeasurement if it's available.
NFS and AFS just use it to store an opaque change attribute from the
server.

Only btrfs, ext4, and xfs increment it for data changes. Because of
this, these filesystems must log the inode to disk whenever the
i_version counter changes. That has a non-zero performance impact,
especially on write-heavy workloads, because we end up dirtying the
inode metadata on every write, not just when the times change.

It turns out though that none of these users of i_version require that
it change on every change to the file. The only real requirement is that
it be different if something changed since the last time we queried for
it.

If we keep track of when something queries the value, we can avoid
bumping the counter and an on-disk update when nothing else has changed
if no one has queried it since it was last incremented.

This patchset changes the code to only bump the i_version counter when
it's strictly necessary, or when we're updating the inode metadata
anyway (e.g. when times change).

It takes the approach of converting the existing accessors of i_version
to use a new API, while leaving the underlying implementation mostly the
same.  The last patch then converts the existing implementation to keep
track of whether the value has been queried since it was last
incremented. It then uses that to avoid incrementing the counter when
it can.

With this, we reduce inode metadata updates across all 3 filesystems
down to roughly the frequency of the timestamp granularity, particularly
when it's not being queried (the vastly common case).

I can see measurable performance gains on xfs and ext4 with iversion
enabled, when streaming small (4k) I/Os.

btrfs shows some slight gain in testing, but not quite the magnitude
that xfs and ext4 show. I'm not sure why yet and would appreciate some
input from btrfs folks.

My goal is to get this into linux-next fairly soon. If it shows no
problems then we can look at merging it for 4.16, or 4.17 if all of the
prequisite patches are not yet merged.

Jeff Layton (19):
  fs: new API for handling inode->i_version
  fs: don't take the i_lock in inode_inc_iversion
  fat: convert to new i_version API
  affs: convert to new i_version API
  afs: convert to new i_version API
  btrfs: convert to new i_version API
  exofs: switch to new i_version API
  ext2: convert to new i_version API
  ext4: convert to new i_version API
  nfs: convert to new i_version API
  nfsd: convert to new i_version API
  ocfs2: convert to new i_version API
  ufs: use new i_version API
  xfs: convert to new i_version API
  IMA: switch IMA over to new i_version API
  fs: only set S_VERSION when updating times if necessary
  xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need
incrementing
  btrfs: only dirty the inode in btrfs_update_time if something was
changed
  fs: handle inode->i_version more efficiently

 fs/affs/amigaffs.c|   5 +-
 fs/affs/dir.c |   5 +-
 fs/affs/super.c   |   3 +-
 fs/afs/fsclient.c |   3 +-
 fs/afs/inode.c|   5 +-
 fs/btrfs/delayed-inode.c  |   7 +-
 fs/btrfs/file.c   |   1 +
 fs/btrfs/inode.c  |  12 +-
 fs/btrfs/ioctl.c  |   1 +
 fs/btrfs/tree-log.c   |   4 +-
 fs/btrfs/xattr.c  |   1 +
 fs/exofs/dir.c|   9 +-
 fs/exofs/super.c 

[PATCH v4 01/19] fs: new API for handling inode->i_version

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Add a documentation blob that explains what the i_version field is, how
it is expected to work, and how it is currently implemented by various
filesystems.

We already have inode_inc_iversion. Add several other functions for
manipulating and accessing the i_version counter. For now, the
implementation is trivial and basically works the way that all of the
open-coded i_version accesses work today.

Future patches will convert existing users of i_version to use the new
API, and then convert the backend implementation to do things more
efficiently.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/btrfs/file.c  |   1 +
 fs/btrfs/inode.c |   1 +
 fs/btrfs/ioctl.c |   1 +
 fs/btrfs/xattr.c |   1 +
 fs/ext4/inode.c  |   1 +
 fs/ext4/namei.c  |   1 +
 fs/inode.c   |   1 +
 include/linux/fs.h   |  15 
 include/linux/iversion.h | 205 +++
 9 files changed, 212 insertions(+), 15 deletions(-)
 create mode 100644 include/linux/iversion.h

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index eb1bac7c8553..c95d7b2efefb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e1a7f3cb5be9..27f008b33fc1 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2ef8acaac688..aa452c9e2eff 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
index 2c7e53f9ff1b..5258c1714830 100644
--- a/fs/btrfs/xattr.c
+++ b/fs/btrfs/xattr.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "btrfs_inode.h"
 #include "transaction.h"
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7df2c5644e59..fa5d8bc52d2d 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ext4_jbd2.h"
 #include "xattr.h"
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 798b3ac680db..bcf0dff517be 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ext4.h"
 #include "ext4_jbd2.h"
 
diff --git a/fs/inode.c b/fs/inode.c
index 03102d6ef044..19e72f500f71 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -18,6 +18,7 @@
 #include  /* for inode_has_buffers */
 #include 
 #include 
+#include 
 #include 
 #include "internal.h"
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 511fbaabf624..76382c24e9d0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2036,21 +2036,6 @@ static inline void inode_dec_link_count(struct inode 
*inode)
mark_inode_dirty(inode);
 }
 
-/**
- * inode_inc_iversion - increments i_version
- * @inode: inode that need to be updated
- *
- * Every time the inode is modified, the i_version field will be incremented.
- * The filesystem has to be mounted with i_version flag
- */
-
-static inline void inode_inc_iversion(struct inode *inode)
-{
-   spin_lock(>i_lock);
-   inode->i_version++;
-   spin_unlock(>i_lock);
-}
-
 enum file_time_flags {
S_ATIME = 1,
S_MTIME = 2,
diff --git a/include/linux/iversion.h b/include/linux/iversion.h
new file mode 100644
index ..bb50d27c71f9
--- /dev/null
+++ b/include/linux/iversion.h
@@ -0,0 +1,205 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_IVERSION_H
+#define _LINUX_IVERSION_H
+
+#include 
+
+/*
+ * The change attribute (i_version) is mandated by NFSv4 and is mostly for
+ * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
+ * appear different to observers if there was a change to the inode's data or
+ * metadata since it was last queried.
+ *
+ * It should be considered an opaque value by observers. If it remains the same
+ * since it was last checked, then nothing has changed in the inode. If it's
+ * different then something has changed. Observers cannot infer anything about
+ * the nature or magnitude of the changes from the value, only that the inode
+ * has changed in some fashion.
+ *
+ * Not all filesystems properly implement the i_version counter. Subsystems 
that
+ * want to use i_version field on an inode should first check whether the
+ * filesystem sets the SB_I_VERSION flag (usually via the IS_I_VERSION macro).
+ *
+ * Those that set SB_I_VERSION will

[PATCH v4 03/19] fat: convert to new i_version API

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/fat/dir.c |  3 ++-
 fs/fat/inode.c   |  9 +
 fs/fat/namei_msdos.c |  7 ---
 fs/fat/namei_vfat.c  | 22 +++---
 4 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/fs/fat/dir.c b/fs/fat/dir.c
index b833ffeee1e1..8e100c3bf72c 100644
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "fat.h"
 
 /*
@@ -1055,7 +1056,7 @@ int fat_remove_entries(struct inode *dir, struct 
fat_slot_info *sinfo)
brelse(bh);
if (err)
return err;
-   dir->i_version++;
+   inode_inc_iversion(dir);
 
if (nr_slots) {
/*
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 20a0a89eaca5..ffbbf0520d9e 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "fat.h"
 
 #ifndef CONFIG_FAT_DEFAULT_IOCHARSET
@@ -507,7 +508,7 @@ int fat_fill_inode(struct inode *inode, struct 
msdos_dir_entry *de)
MSDOS_I(inode)->i_pos = 0;
inode->i_uid = sbi->options.fs_uid;
inode->i_gid = sbi->options.fs_gid;
-   inode->i_version++;
+   inode_inc_iversion(inode);
inode->i_generation = get_seconds();
 
if ((de->attr & ATTR_DIR) && !IS_FREE(de->name)) {
@@ -590,7 +591,7 @@ struct inode *fat_build_inode(struct super_block *sb,
goto out;
}
inode->i_ino = iunique(sb, MSDOS_ROOT_INO);
-   inode->i_version = 1;
+   inode_set_iversion(inode, 1);
err = fat_fill_inode(inode, de);
if (err) {
iput(inode);
@@ -1377,7 +1378,7 @@ static int fat_read_root(struct inode *inode)
MSDOS_I(inode)->i_pos = MSDOS_ROOT_INO;
inode->i_uid = sbi->options.fs_uid;
inode->i_gid = sbi->options.fs_gid;
-   inode->i_version++;
+   inode_inc_iversion(inode);
inode->i_generation = 0;
inode->i_mode = fat_make_mode(sbi, ATTR_DIR, S_IRWXUGO);
inode->i_op = sbi->dir_ops;
@@ -1828,7 +1829,7 @@ int fat_fill_super(struct super_block *sb, void *data, 
int silent, int isvfat,
if (!root_inode)
goto out_fail;
root_inode->i_ino = MSDOS_ROOT_INO;
-   root_inode->i_version = 1;
+   inode_set_iversion(root_inode, 1);
error = fat_read_root(root_inode);
if (error < 0) {
iput(root_inode);
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index d24d2758a363..582ca731a6c9 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -7,6 +7,7 @@
  */
 
 #include 
+#include 
 #include "fat.h"
 
 /* Characters that are undesirable in an MS-DOS file name */
@@ -480,7 +481,7 @@ static int do_msdos_rename(struct inode *old_dir, unsigned 
char *old_name,
} else
mark_inode_dirty(old_inode);
 
-   old_dir->i_version++;
+   inode_inc_iversion(old_dir);
old_dir->i_ctime = old_dir->i_mtime = 
current_time(old_dir);
if (IS_DIRSYNC(old_dir))
(void)fat_sync_inode(old_dir);
@@ -508,7 +509,7 @@ static int do_msdos_rename(struct inode *old_dir, unsigned 
char *old_name,
goto out;
new_i_pos = sinfo.i_pos;
}
-   new_dir->i_version++;
+   inode_inc_iversion(new_dir);
 
fat_detach(old_inode);
fat_attach(old_inode, new_i_pos);
@@ -540,7 +541,7 @@ static int do_msdos_rename(struct inode *old_dir, unsigned 
char *old_name,
old_sinfo.bh = NULL;
if (err)
goto error_dotdot;
-   old_dir->i_version++;
+   inode_inc_iversion(old_dir);
old_dir->i_ctime = old_dir->i_mtime = ts;
if (IS_DIRSYNC(old_dir))
(void)fat_sync_inode(old_dir);
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 02c03a3a..cefea792cde8 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -20,7 +20,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include "fat.h"
 
 static inline unsigned long vfat_d_version(struct dentry *dentry)
@@ -46,7 +46,7 @@ static int vfat_revalidate_shortname(struct dentry *dentry)
 {
int ret = 1;
spin_lock(>d_lock);
-   if (vfat_d_version(dentry) != d_inode(dentry->d_parent)->i_version)
+   if (inode_cmp_iversion(d_inode(dentry->d_parent), 
vfat_d_version(dentry)))
ret = 0;
spin_unlock(>d_lock);
return ret;
@@ -759,7 +759,7 @@ static struct dentry *vfat_lookup(struct inode *dir, struct 
dentry *dentry,
 out:
mutex_unlock(_SB(sb)->s_lock);
if (!inode)
-   vfat_d_version_set

[PATCH v4 06/19] btrfs: convert to new i_version API

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/btrfs/delayed-inode.c | 7 +--
 fs/btrfs/inode.c | 6 --
 fs/btrfs/tree-log.c  | 4 +++-
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 5d73f79ded8b..6a246ae2bcb2 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -18,6 +18,7 @@
  */
 
 #include 
+#include 
 #include "delayed-inode.h"
 #include "disk-io.h"
 #include "transaction.h"
@@ -1700,7 +1701,8 @@ static void fill_stack_inode_item(struct 
btrfs_trans_handle *trans,
btrfs_set_stack_inode_nbytes(inode_item, inode_get_bytes(inode));
btrfs_set_stack_inode_generation(inode_item,
 BTRFS_I(inode)->generation);
-   btrfs_set_stack_inode_sequence(inode_item, inode->i_version);
+   btrfs_set_stack_inode_sequence(inode_item,
+  inode_peek_iversion(inode));
btrfs_set_stack_inode_transid(inode_item, trans->transid);
btrfs_set_stack_inode_rdev(inode_item, inode->i_rdev);
btrfs_set_stack_inode_flags(inode_item, BTRFS_I(inode)->flags);
@@ -1754,7 +1756,8 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
BTRFS_I(inode)->generation = btrfs_stack_inode_generation(inode_item);
 BTRFS_I(inode)->last_trans = btrfs_stack_inode_transid(inode_item);
 
-   inode->i_version = btrfs_stack_inode_sequence(inode_item);
+   inode_set_iversion_queried(inode,
+  btrfs_stack_inode_sequence(inode_item));
inode->i_rdev = 0;
*rdev = btrfs_stack_inode_rdev(inode_item);
BTRFS_I(inode)->flags = btrfs_stack_inode_flags(inode_item);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 27f008b33fc1..ac8692849a81 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3778,7 +3778,8 @@ static int btrfs_read_locked_inode(struct inode *inode)
BTRFS_I(inode)->generation = btrfs_inode_generation(leaf, inode_item);
BTRFS_I(inode)->last_trans = btrfs_inode_transid(leaf, inode_item);
 
-   inode->i_version = btrfs_inode_sequence(leaf, inode_item);
+   inode_set_iversion_queried(inode,
+  btrfs_inode_sequence(leaf, inode_item));
inode->i_generation = BTRFS_I(inode)->generation;
inode->i_rdev = 0;
rdev = btrfs_inode_rdev(leaf, inode_item);
@@ -3946,7 +3947,8 @@ static void fill_inode_item(struct btrfs_trans_handle 
*trans,
 );
btrfs_set_token_inode_generation(leaf, item, BTRFS_I(inode)->generation,
 );
-   btrfs_set_token_inode_sequence(leaf, item, inode->i_version, );
+   btrfs_set_token_inode_sequence(leaf, item, inode_peek_iversion(inode),
+  );
btrfs_set_token_inode_transid(leaf, item, trans->transid, );
btrfs_set_token_inode_rdev(leaf, item, inode->i_rdev, );
btrfs_set_token_inode_flags(leaf, item, BTRFS_I(inode)->flags, );
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 7bf9b31561db..1b7d92075c1f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "tree-log.h"
 #include "disk-io.h"
 #include "locking.h"
@@ -3609,7 +3610,8 @@ static void fill_inode_item(struct btrfs_trans_handle 
*trans,
btrfs_set_token_inode_nbytes(leaf, item, inode_get_bytes(inode),
 );
 
-   btrfs_set_token_inode_sequence(leaf, item, inode->i_version, );
+   btrfs_set_token_inode_sequence(leaf, item,
+  inode_peek_iversion(inode), );
btrfs_set_token_inode_transid(leaf, item, trans->transid, );
btrfs_set_token_inode_rdev(leaf, item, inode->i_rdev, );
btrfs_set_token_inode_flags(leaf, item, BTRFS_I(inode)->flags, );
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 07/19] exofs: switch to new i_version API

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/exofs/dir.c   | 9 +
 fs/exofs/super.c | 3 ++-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/exofs/dir.c b/fs/exofs/dir.c
index 98233a97b7b8..c5a53fcc43ea 100644
--- a/fs/exofs/dir.c
+++ b/fs/exofs/dir.c
@@ -31,6 +31,7 @@
  * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
  */
 
+#include 
 #include "exofs.h"
 
 static inline unsigned exofs_chunk_size(struct inode *inode)
@@ -60,7 +61,7 @@ static int exofs_commit_chunk(struct page *page, loff_t pos, 
unsigned len)
struct inode *dir = mapping->host;
int err = 0;
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
 
if (!PageUptodate(page))
SetPageUptodate(page);
@@ -241,7 +242,7 @@ exofs_readdir(struct file *file, struct dir_context *ctx)
unsigned long n = pos >> PAGE_SHIFT;
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(exofs_chunk_size(inode)-1);
-   int need_revalidate = (file->f_version != inode->i_version);
+   bool need_revalidate = inode_cmp_iversion(inode, file->f_version);
 
if (pos > inode->i_size - EXOFS_DIR_REC_LEN(1))
return 0;
@@ -264,8 +265,8 @@ exofs_readdir(struct file *file, struct dir_context *ctx)
chunk_mask);
ctx->pos = (n<<PAGE_SHIFT) + offset;
}
-   file->f_version = inode->i_version;
-   need_revalidate = 0;
+   file->f_version = inode_query_iversion(inode);
+   need_revalidate = false;
}
de = (struct exofs_dir_entry *)(kaddr + offset);
limit = kaddr + exofs_last_byte(inode, n) -
diff --git a/fs/exofs/super.c b/fs/exofs/super.c
index 819624cfc8da..7e244093c0e5 100644
--- a/fs/exofs/super.c
+++ b/fs/exofs/super.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "exofs.h"
 
@@ -159,7 +160,7 @@ static struct inode *exofs_alloc_inode(struct super_block 
*sb)
if (!oi)
return NULL;
 
-   oi->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
return >vfs_inode;
 }
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 08/19] ext2: convert to new i_version API

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/ext2/dir.c   | 9 +
 fs/ext2/super.c | 5 +++--
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 987647986f47..4111085a129f 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 typedef struct ext2_dir_entry_2 ext2_dirent;
 
@@ -92,7 +93,7 @@ static int ext2_commit_chunk(struct page *page, loff_t pos, 
unsigned len)
struct inode *dir = mapping->host;
int err = 0;
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
block_write_end(NULL, mapping, pos, len, len, page, NULL);
 
if (pos+len > dir->i_size) {
@@ -293,7 +294,7 @@ ext2_readdir(struct file *file, struct dir_context *ctx)
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(ext2_chunk_size(inode)-1);
unsigned char *types = NULL;
-   int need_revalidate = file->f_version != inode->i_version;
+   bool need_revalidate = inode_cmp_iversion(inode, file->f_version);
 
if (pos > inode->i_size - EXT2_DIR_REC_LEN(1))
return 0;
@@ -319,8 +320,8 @@ ext2_readdir(struct file *file, struct dir_context *ctx)
offset = ext2_validate_entry(kaddr, offset, 
chunk_mask);
ctx->pos = (n<<PAGE_SHIFT) + offset;
}
-   file->f_version = inode->i_version;
-   need_revalidate = 0;
+   file->f_version = inode_query_iversion(inode);
+   need_revalidate = false;
}
de = (ext2_dirent *)(kaddr+offset);
limit = kaddr + ext2_last_byte(inode, n) - EXT2_DIR_REC_LEN(1);
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 7646818ab266..554c98b8a93a 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ext2.h"
 #include "xattr.h"
 #include "acl.h"
@@ -184,7 +185,7 @@ static struct inode *ext2_alloc_inode(struct super_block 
*sb)
if (!ei)
return NULL;
ei->i_block_alloc_info = NULL;
-   ei->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
 #ifdef CONFIG_QUOTA
memset(>i_dquot, 0, sizeof(ei->i_dquot));
 #endif
@@ -1569,7 +1570,7 @@ static ssize_t ext2_quota_write(struct super_block *sb, 
int type,
return err;
if (inode->i_size < off+len-towrite)
i_size_write(inode, off+len-towrite);
-   inode->i_version++;
+   inode_inc_iversion(inode);
inode->i_mtime = inode->i_ctime = current_time(inode);
mark_inode_dirty(inode);
return len - towrite;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 09/19] ext4: convert to new i_version API

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Acked-by: Theodore Ts'o <ty...@mit.edu>
---
 fs/ext4/dir.c|  9 +
 fs/ext4/inline.c |  7 ---
 fs/ext4/inode.c  | 12 
 fs/ext4/ioctl.c  |  3 ++-
 fs/ext4/namei.c  |  4 ++--
 fs/ext4/super.c  |  3 ++-
 fs/ext4/xattr.c  |  5 +++--
 7 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index d5babc9f222b..afda0a0499ce 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ext4.h"
 #include "xattr.h"
 
@@ -208,7 +209,7 @@ static int ext4_readdir(struct file *file, struct 
dir_context *ctx)
 * readdir(2), then we might be pointing to an invalid
 * dirent right now.  Scan from the start of the block
 * to make sure. */
-   if (file->f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, file->f_version)) {
for (i = 0; i < sb->s_blocksize && i < offset; ) {
de = (struct ext4_dir_entry_2 *)
(bh->b_data + i);
@@ -227,7 +228,7 @@ static int ext4_readdir(struct file *file, struct 
dir_context *ctx)
offset = i;
ctx->pos = (ctx->pos & ~(sb->s_blocksize - 1))
| offset;
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
}
 
while (ctx->pos < inode->i_size
@@ -568,10 +569,10 @@ static int ext4_dx_readdir(struct file *file, struct 
dir_context *ctx)
 * cached entries.
 */
if ((!info->curr_node) ||
-   (file->f_version != inode->i_version)) {
+   inode_cmp_iversion(inode, file->f_version)) {
info->curr_node = NULL;
free_rb_tree_fname(>root);
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
ret = ext4_htree_fill_tree(file, info->curr_hash,
   info->curr_minor_hash,
   >next_hash);
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 1367553c43bb..a8b987b71173 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 
 #include "ext4_jbd2.h"
 #include "ext4.h"
@@ -1042,7 +1043,7 @@ static int ext4_add_dirent_to_inline(handle_t *handle,
 */
dir->i_mtime = dir->i_ctime = current_time(dir);
ext4_update_dx_flag(dir);
-   dir->i_version++;
+   inode_inc_iversion(dir);
return 1;
 }
 
@@ -1494,7 +1495,7 @@ int ext4_read_inline_dir(struct file *file,
 * dirent right now.  Scan from the start of the inline
 * dir to make sure.
 */
-   if (file->f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, file->f_version)) {
for (i = 0; i < extra_size && i < offset;) {
/*
 * "." is with offset 0 and
@@ -1526,7 +1527,7 @@ int ext4_read_inline_dir(struct file *file,
}
offset = i;
ctx->pos = offset;
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
}
 
while (ctx->pos < extra_size) {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fa5d8bc52d2d..1b0d54b372f2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4874,12 +4874,14 @@ struct inode *ext4_iget(struct super_block *sb, 
unsigned long ino)
EXT4_EINODE_GET_XTIME(i_crtime, ei, raw_inode);
 
if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
-   inode->i_version = le32_to_cpu(raw_inode->i_disk_version);
+   u64 ivers = le32_to_cpu(raw_inode->i_disk_version);
+
if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
-   inode->i_version |=
+   ivers |=
(__u64)(le32_to_cpu(raw_inode->i_version_hi)) << 32;
}
+   inode_set_iversion_queried(inode, ivers);
}
 
ret = 0;
@@ -5165,11 +5167,13 @@ static int ext4_do_update_inode(handle_t *handle,
}
 
if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
-   raw_inode->i_disk_version = cpu_

[PATCH v4 04/19] affs: convert to new i_version API

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/affs/amigaffs.c | 5 +++--
 fs/affs/dir.c  | 5 +++--
 fs/affs/super.c| 3 ++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/affs/amigaffs.c b/fs/affs/amigaffs.c
index 0f0e6925e97d..14a6c1b90c9f 100644
--- a/fs/affs/amigaffs.c
+++ b/fs/affs/amigaffs.c
@@ -10,6 +10,7 @@
  */
 
 #include 
+#include 
 #include "affs.h"
 
 /*
@@ -60,7 +61,7 @@ affs_insert_hash(struct inode *dir, struct buffer_head *bh)
affs_brelse(dir_bh);
 
dir->i_mtime = dir->i_ctime = current_time(dir);
-   dir->i_version++;
+   inode_inc_iversion(dir);
mark_inode_dirty(dir);
 
return 0;
@@ -114,7 +115,7 @@ affs_remove_hash(struct inode *dir, struct buffer_head 
*rem_bh)
affs_brelse(bh);
 
dir->i_mtime = dir->i_ctime = current_time(dir);
-   dir->i_version++;
+   inode_inc_iversion(dir);
mark_inode_dirty(dir);
 
return retval;
diff --git a/fs/affs/dir.c b/fs/affs/dir.c
index a105e77df2c1..d180b46453cf 100644
--- a/fs/affs/dir.c
+++ b/fs/affs/dir.c
@@ -14,6 +14,7 @@
  *
  */
 
+#include 
 #include "affs.h"
 
 static int affs_readdir(struct file *, struct dir_context *);
@@ -80,7 +81,7 @@ affs_readdir(struct file *file, struct dir_context *ctx)
 * we can jump directly to where we left off.
 */
ino = (u32)(long)file->private_data;
-   if (ino && file->f_version == inode->i_version) {
+   if (ino && inode_cmp_iversion(inode, file->f_version) == 0) {
pr_debug("readdir() left off=%d\n", ino);
goto inside;
}
@@ -130,7 +131,7 @@ affs_readdir(struct file *file, struct dir_context *ctx)
} while (ino);
}
 done:
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
file->private_data = (void *)(long)ino;
affs_brelse(fh_bh);
 
diff --git a/fs/affs/super.c b/fs/affs/super.c
index 1117e36134cc..e602619aed9d 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "affs.h"
 
 static int affs_statfs(struct dentry *dentry, struct kstatfs *buf);
@@ -102,7 +103,7 @@ static struct inode *affs_alloc_inode(struct super_block 
*sb)
if (!i)
return NULL;
 
-   i->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
i->i_lc = NULL;
i->i_ext_bh = NULL;
i->i_pa_cnt = 0;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 10/19] nfs: convert to new i_version API

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

For NFS, we just use the "raw" API since the i_version is mostly
managed by the server. The exception there is when the client
holds a write delegation, but we only need to bump it once
there anyway to handle CB_GETATTR.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/nfs/delegation.c|  3 ++-
 fs/nfs/fscache-index.c |  5 +++--
 fs/nfs/inode.c | 18 +-
 fs/nfs/nfs4proc.c  | 10 ++
 fs/nfs/nfstrace.h  |  5 +++--
 fs/nfs/write.c |  8 +++-
 6 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index ade44ca0c66c..d8b47624fee2 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -347,7 +348,7 @@ int nfs_inode_set_delegation(struct inode *inode, struct 
rpc_cred *cred, struct
nfs4_stateid_copy(>stateid, >delegation);
delegation->type = res->delegation_type;
delegation->pagemod_limit = res->pagemod_limit;
-   delegation->change_attr = inode->i_version;
+   delegation->change_attr = inode_peek_iversion_raw(inode);
delegation->cred = get_rpccred(cred);
delegation->inode = inode;
delegation->flags = 1<<NFS_DELEGATION_REFERENCED;
diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index 3025fe8584a0..0ee4b93d36ea 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 #include "fscache.h"
@@ -211,7 +212,7 @@ static uint16_t nfs_fscache_inode_get_aux(const void 
*cookie_netfs_data,
auxdata.ctime = nfsi->vfs_inode.i_ctime;
 
if (NFS_SERVER(>vfs_inode)->nfs_client->rpc_ops->version == 4)
-   auxdata.change_attr = nfsi->vfs_inode.i_version;
+   auxdata.change_attr = inode_peek_iversion_raw(>vfs_inode);
 
if (bufmax > sizeof(auxdata))
bufmax = sizeof(auxdata);
@@ -243,7 +244,7 @@ enum fscache_checkaux nfs_fscache_inode_check_aux(void 
*cookie_netfs_data,
auxdata.ctime = nfsi->vfs_inode.i_ctime;
 
if (NFS_SERVER(>vfs_inode)->nfs_client->rpc_ops->version == 4)
-   auxdata.change_attr = nfsi->vfs_inode.i_version;
+   auxdata.change_attr = inode_peek_iversion_raw(>vfs_inode);
 
if (memcmp(data, , datalen) != 0)
return FSCACHE_CHECKAUX_OBSOLETE;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index b992d2382ffa..0b85cca1184b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -38,8 +38,8 @@
 #include 
 #include 
 #include 
-
 #include 
+#include 
 
 #include "nfs4_fs.h"
 #include "callback.h"
@@ -483,7 +483,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct 
nfs_fattr *fattr, st
memset(>i_atime, 0, sizeof(inode->i_atime));
memset(>i_mtime, 0, sizeof(inode->i_mtime));
memset(>i_ctime, 0, sizeof(inode->i_ctime));
-   inode->i_version = 0;
+   inode_set_iversion_raw(inode, 0);
inode->i_size = 0;
clear_nlink(inode);
inode->i_uid = make_kuid(_user_ns, -2);
@@ -508,7 +508,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct 
nfs_fattr *fattr, st
else if (nfs_server_capable(inode, NFS_CAP_CTIME))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
if (fattr->valid & NFS_ATTR_FATTR_CHANGE)
-   inode->i_version = fattr->change_attr;
+   inode_set_iversion_raw(inode, fattr->change_attr);
else
nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
| NFS_INO_REVAL_PAGECACHE);
@@ -1289,8 +1289,8 @@ static unsigned long nfs_wcc_update_inode(struct inode 
*inode, struct nfs_fattr
 
if ((fattr->valid & NFS_ATTR_FATTR_PRECHANGE)
&& (fattr->valid & NFS_ATTR_FATTR_CHANGE)
-   && inode->i_version == fattr->pre_change_attr) {
-   inode->i_version = fattr->change_attr;
+   && !inode_cmp_iversion(inode, fattr->pre_change_attr)) {
+   inode_set_iversion_raw(inode, fattr->change_attr);
if (S_ISDIR(inode->i_mode))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_DATA);
ret |= NFS_INO_INVALID_ATTR;
@@ -1348,7 +1348,7 @@ static int nfs_check_inode_attributes(struct inode 
*inode, struct nfs_fattr *fat
 
if (!nfs_file_has_buffered_writers(nfsi)) {
/* Verify a few of the more important attributes */
-  

[PATCH v4 13/19] ufs: use new i_version API

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/ufs/dir.c   | 9 +
 fs/ufs/inode.c | 3 ++-
 fs/ufs/super.c | 3 ++-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/ufs/dir.c b/fs/ufs/dir.c
index 2edc1755b7c5..50dfce000864 100644
--- a/fs/ufs/dir.c
+++ b/fs/ufs/dir.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ufs_fs.h"
 #include "ufs.h"
@@ -47,7 +48,7 @@ static int ufs_commit_chunk(struct page *page, loff_t pos, 
unsigned len)
struct inode *dir = mapping->host;
int err = 0;
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
block_write_end(NULL, mapping, pos, len, len, page, NULL);
if (pos+len > dir->i_size) {
i_size_write(dir, pos+len);
@@ -428,7 +429,7 @@ ufs_readdir(struct file *file, struct dir_context *ctx)
unsigned long n = pos >> PAGE_SHIFT;
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(UFS_SB(sb)->s_uspi->s_dirblksize - 1);
-   int need_revalidate = file->f_version != inode->i_version;
+   bool need_revalidate = inode_cmp_iversion(inode, file->f_version);
unsigned flags = UFS_SB(sb)->s_flags;
 
UFSD("BEGIN\n");
@@ -455,8 +456,8 @@ ufs_readdir(struct file *file, struct dir_context *ctx)
offset = ufs_validate_entry(sb, kaddr, offset, 
chunk_mask);
ctx->pos = (n<<PAGE_SHIFT) + offset;
}
-   file->f_version = inode->i_version;
-   need_revalidate = 0;
+   file->f_version = inode_query_iversion(inode);
+   need_revalidate = false;
}
de = (struct ufs_dir_entry *)(kaddr+offset);
limit = kaddr + ufs_last_byte(inode, n) - UFS_DIR_REC_LEN(1);
diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index afb601c0dda0..c843ec858cf7 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ufs_fs.h"
 #include "ufs.h"
@@ -693,7 +694,7 @@ struct inode *ufs_iget(struct super_block *sb, unsigned 
long ino)
if (err)
goto bad_inode;
 
-   inode->i_version++;
+   inode_inc_iversion(inode);
ufsi->i_lastfrag =
(inode->i_size + uspi->s_fsize - 1) >> uspi->s_fshift;
ufsi->i_dir_start_lookup = 0;
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 4d497e9c6883..b6ba80e05bff 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -88,6 +88,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ufs_fs.h"
 #include "ufs.h"
@@ -1440,7 +1441,7 @@ static struct inode *ufs_alloc_inode(struct super_block 
*sb)
if (!ei)
return NULL;
 
-   ei->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
seqlock_init(>meta_lock);
mutex_init(>truncate_mutex);
return >vfs_inode;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 14/19] xfs: convert to new i_version API

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_buf.c | 7 +--
 fs/xfs/xfs_icache.c   | 5 +++--
 fs/xfs/xfs_inode.c| 3 ++-
 fs/xfs/xfs_inode_item.c   | 3 ++-
 fs/xfs/xfs_trans_inode.c  | 4 +++-
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 6b7989038d75..b9c0bf80669c 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -32,6 +32,8 @@
 #include "xfs_ialloc.h"
 #include "xfs_dir2.h"
 
+#include 
+
 /*
  * Check that none of the inode's in the buffer have a next
  * unlinked field of 0.
@@ -264,7 +266,8 @@ xfs_inode_from_disk(
to->di_flags= be16_to_cpu(from->di_flags);
 
if (to->di_version == 3) {
-   inode->i_version = be64_to_cpu(from->di_changecount);
+   inode_set_iversion_queried(inode,
+  be64_to_cpu(from->di_changecount));
to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec);
to->di_crtime.t_nsec = be32_to_cpu(from->di_crtime.t_nsec);
to->di_flags2 = be64_to_cpu(from->di_flags2);
@@ -314,7 +317,7 @@ xfs_inode_to_disk(
to->di_flags = cpu_to_be16(from->di_flags);
 
if (from->di_version == 3) {
-   to->di_changecount = cpu_to_be64(inode->i_version);
+   to->di_changecount = cpu_to_be64(inode_peek_iversion(inode));
to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
to->di_flags2 = cpu_to_be64(from->di_flags2);
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 43005fbe8b1e..4c315adb05e6 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -37,6 +37,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * Allocate and initialise an xfs_inode.
@@ -293,14 +294,14 @@ xfs_reinit_inode(
int error;
uint32_tnlink = inode->i_nlink;
uint32_tgeneration = inode->i_generation;
-   uint64_tversion = inode->i_version;
+   uint64_tversion = inode_peek_iversion(inode);
umode_t mode = inode->i_mode;
 
error = inode_init_always(mp->m_super, inode);
 
set_nlink(inode, nlink);
inode->i_generation = generation;
-   inode->i_version = version;
+   inode_set_iversion_queried(inode, version);
inode->i_mode = mode;
return error;
 }
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 801274126648..dfc5e60d8af3 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -16,6 +16,7 @@
  * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
  */
 #include 
+#include 
 
 #include "xfs.h"
 #include "xfs_fs.h"
@@ -833,7 +834,7 @@ xfs_ialloc(
ip->i_d.di_flags = 0;
 
if (ip->i_d.di_version == 3) {
-   inode->i_version = 1;
+   inode_set_iversion(inode, 1);
ip->i_d.di_flags2 = 0;
ip->i_d.di_cowextsize = 0;
ip->i_d.di_crtime.t_sec = (int32_t)tv.tv_sec;
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 6ee5c3bf19ad..7571abf5dfb3 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -30,6 +30,7 @@
 #include "xfs_buf_item.h"
 #include "xfs_log.h"
 
+#include 
 
 kmem_zone_t*xfs_ili_zone;  /* inode log item zone */
 
@@ -354,7 +355,7 @@ xfs_inode_to_log_dinode(
to->di_next_unlinked = NULLAGINO;
 
if (from->di_version == 3) {
-   to->di_changecount = inode->i_version;
+   to->di_changecount = inode_peek_iversion(inode);
to->di_crtime.t_sec = from->di_crtime.t_sec;
to->di_crtime.t_nsec = from->di_crtime.t_nsec;
to->di_flags2 = from->di_flags2;
diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index daa7615497f9..225544327c4f 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -28,6 +28,8 @@
 #include "xfs_inode_item.h"
 #include "xfs_trace.h"
 
+#include 
+
 /*
  * Add a locked inode to the transaction.
  *
@@ -117,7 +119,7 @@ xfs_trans_log_inode(
 */
if (!(ip->i_itemp->ili_item.li_desc->lid_flags & XFS_LID_DIRTY) &&
IS_I_VERSION(VFS_I(ip))) {
-   VFS_I(ip)->i_version++;
+   inode_inc_iversion(VFS_I(ip));
flags |= XFS_ILOG_CORE;
}
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 11/19] nfsd: convert to new i_version API

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Mostly just making sure we use the "get" wrappers so we know when
it is being fetched for later use.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/nfsd/nfsfh.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 43f31cf49bae..b8444189223b 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static inline __u32 ino_t_to_u32(ino_t ino)
 {
@@ -259,7 +260,7 @@ static inline u64 nfsd4_change_attribute(struct inode 
*inode)
chattr =  inode->i_ctime.tv_sec;
chattr <<= 30;
chattr += inode->i_ctime.tv_nsec;
-   chattr += inode->i_version;
+   chattr += inode_query_iversion(inode);
return chattr;
 }
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 15/19] IMA: switch IMA over to new i_version API

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 security/integrity/ima/ima_api.c  | 3 ++-
 security/integrity/ima/ima_main.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index c7e8db0ea4c0..c6ae42266270 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ima.h"
 
@@ -215,7 +216,7 @@ int ima_collect_measurement(struct integrity_iint_cache 
*iint,
 * which do not support i_version, support is limited to an initial
 * measurement/appraisal/audit.
 */
-   i_version = file_inode(file)->i_version;
+   i_version = inode_query_iversion(inode);
hash.hdr.algo = algo;
 
/* Initialize hash digest to 0's in case of failure */
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 50b8254d..06a70c5a2329 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ima.h"
 
@@ -128,7 +129,7 @@ static void ima_check_last_writer(struct 
integrity_iint_cache *iint,
inode_lock(inode);
if (atomic_read(>i_writecount) == 1) {
if (!IS_I_VERSION(inode) ||
-   (iint->version != inode->i_version) ||
+   inode_cmp_iversion(inode, iint->version) ||
(iint->flags & IMA_NEW_FILE)) {
iint->flags &= ~(IMA_DONE_MASK | IMA_NEW_FILE);
iint->measured_pcrs = 0;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 16/19] fs: only set S_VERSION when updating times if necessary

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

We only really need to update i_version if someone has queried for it
since we last incremented it. By doing that, we can avoid having to
update the inode if the times haven't changed.

If the times have changed, then we go ahead and forcibly increment the
counter, under the assumption that we'll be going to the storage
anyway, and the increment itself is relatively cheap.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/inode.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 19e72f500f71..2fa920188759 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1635,17 +1635,21 @@ static int relatime_need_update(const struct path 
*path, struct inode *inode,
 int generic_update_time(struct inode *inode, struct timespec *time, int flags)
 {
int iflags = I_DIRTY_TIME;
+   bool dirty = false;
 
if (flags & S_ATIME)
inode->i_atime = *time;
if (flags & S_VERSION)
-   inode_inc_iversion(inode);
+   dirty |= inode_maybe_inc_iversion(inode, dirty);
if (flags & S_CTIME)
inode->i_ctime = *time;
if (flags & S_MTIME)
inode->i_mtime = *time;
+   if ((flags & (S_ATIME | S_CTIME | S_MTIME)) &&
+   !(inode->i_sb->s_flags & SB_LAZYTIME))
+   dirty = true;
 
-   if (!(inode->i_sb->s_flags & SB_LAZYTIME) || (flags & S_VERSION))
+   if (dirty)
iflags |= I_DIRTY_SYNC;
__mark_inode_dirty(inode, iflags);
return 0;
@@ -1864,7 +1868,7 @@ int file_update_time(struct file *file)
if (!timespec_equal(>i_ctime, ))
sync_it |= S_CTIME;
 
-   if (IS_I_VERSION(inode))
+   if (IS_I_VERSION(inode) && inode_iversion_need_inc(inode))
sync_it |= S_VERSION;
 
if (!sync_it)
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 12/19] ocfs2: convert to new i_version API

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/ocfs2/dir.c  | 15 ---
 fs/ocfs2/inode.c|  3 ++-
 fs/ocfs2/namei.c|  3 ++-
 fs/ocfs2/quota_global.c |  3 ++-
 4 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index febe6312ceff..32f9c72dff17 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -1174,7 +1175,7 @@ static int __ocfs2_delete_entry(handle_t *handle, struct 
inode *dir,
le16_add_cpu(>rec_len,
le16_to_cpu(de->rec_len));
de->inode = 0;
-   dir->i_version++;
+   inode_inc_iversion(dir);
ocfs2_journal_dirty(handle, bh);
goto bail;
}
@@ -1729,7 +1730,7 @@ int __ocfs2_add_entry(handle_t *handle,
if (ocfs2_dir_indexed(dir))
ocfs2_recalc_free_list(dir, handle, lookup);
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
ocfs2_journal_dirty(handle, insert_bh);
retval = 0;
goto bail;
@@ -1775,7 +1776,7 @@ static int ocfs2_dir_foreach_blk_id(struct inode *inode,
 * readdir(2), then we might be pointing to an invalid
 * dirent right now.  Scan from the start of the block
 * to make sure. */
-   if (*f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, *f_version)) {
for (i = 0; i < i_size_read(inode) && i < offset; ) {
de = (struct ocfs2_dir_entry *)
(data->id_data + i);
@@ -1791,7 +1792,7 @@ static int ocfs2_dir_foreach_blk_id(struct inode *inode,
i += le16_to_cpu(de->rec_len);
}
ctx->pos = offset = i;
-   *f_version = inode->i_version;
+   *f_version = inode_query_iversion(inode);
}
 
de = (struct ocfs2_dir_entry *) (data->id_data + ctx->pos);
@@ -1869,7 +1870,7 @@ static int ocfs2_dir_foreach_blk_el(struct inode *inode,
 * readdir(2), then we might be pointing to an invalid
 * dirent right now.  Scan from the start of the block
 * to make sure. */
-   if (*f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, *f_version)) {
for (i = 0; i < sb->s_blocksize && i < offset; ) {
de = (struct ocfs2_dir_entry *) (bh->b_data + 
i);
/* It's too expensive to do a full
@@ -1886,7 +1887,7 @@ static int ocfs2_dir_foreach_blk_el(struct inode *inode,
offset = i;
ctx->pos = (ctx->pos & ~(sb->s_blocksize - 1))
| offset;
-   *f_version = inode->i_version;
+   *f_version = inode_query_iversion(inode);
}
 
while (ctx->pos < i_size_read(inode)
@@ -1940,7 +1941,7 @@ static int ocfs2_dir_foreach_blk(struct inode *inode, u64 
*f_version,
  */
 int ocfs2_dir_foreach(struct inode *inode, struct dir_context *ctx)
 {
-   u64 version = inode->i_version;
+   u64 version = inode_query_iversion(inode);
ocfs2_dir_foreach_blk(inode, , ctx, true);
return 0;
 }
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 1a1e0078ab38..d51b80edd972 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -302,7 +303,7 @@ void ocfs2_populate_inode(struct inode *inode, struct 
ocfs2_dinode *fe,
OCFS2_I(inode)->ip_attr = le32_to_cpu(fe->i_attr);
OCFS2_I(inode)->ip_dyn_features = le16_to_cpu(fe->i_dyn_features);
 
-   inode->i_version = 1;
+   inode_set_iversion(inode, 1);
inode->i_generation = le32_to_cpu(fe->i_generation);
inode->i_rdev = huge_decode_dev(le64_to_cpu(fe->id1.dev1.i_rdev));
inode->i_mode = le16_to_cpu(fe->i_mode);
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 3b0a10d9b36f..c801eddc4bf3 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -1520,7 +1521,7 @@ static int ocfs2_rename(struct inode *old_dir,
mlog_errno(status);
goto bail;

[PATCH v4 18/19] btrfs: only dirty the inode in btrfs_update_time if something was changed

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

At this point, we know that "now" and the file times may differ, and we
suspect that the i_version has been flagged to be bumped. Attempt to
bump the i_version, and only mark the inode dirty if that actually
occurred or if one of the times was updated.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/btrfs/inode.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ac8692849a81..76245323a7c8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6107,19 +6107,20 @@ static int btrfs_update_time(struct inode *inode, 
struct timespec *now,
 int flags)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
+   bool dirty = flags & ~S_VERSION;
 
if (btrfs_root_readonly(root))
return -EROFS;
 
if (flags & S_VERSION)
-   inode_inc_iversion(inode);
+   dirty |= inode_maybe_inc_iversion(inode, dirty);
if (flags & S_CTIME)
inode->i_ctime = *now;
if (flags & S_MTIME)
inode->i_mtime = *now;
if (flags & S_ATIME)
inode->i_atime = *now;
-   return btrfs_dirty_inode(inode);
+   return dirty ? btrfs_dirty_inode(inode) : 0;
 }
 
 /*
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 17/19] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

If XFS_ILOG_CORE is already set then go ahead and increment it.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/xfs/xfs_trans_inode.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index 225544327c4f..4a89da4b6fe7 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -112,15 +112,17 @@ xfs_trans_log_inode(
 
/*
 * First time we log the inode in a transaction, bump the inode change
-* counter if it is configured for this to occur. We don't use
-* inode_inc_version() because there is no need for extra locking around
-* i_version as we already hold the inode locked exclusively for
-* metadata modification.
+* counter if it is configured for this to occur. While we have the
+* inode locked exclusively for metadata modification, we can usually
+* avoid setting XFS_ILOG_CORE if no one has queried the value since
+* the last time it was incremented. If we have XFS_ILOG_CORE already
+* set however, then go ahead and bump the i_version counter
+* unconditionally.
 */
if (!(ip->i_itemp->ili_item.li_desc->lid_flags & XFS_LID_DIRTY) &&
IS_I_VERSION(VFS_I(ip))) {
-   inode_inc_iversion(VFS_I(ip));
-   flags |= XFS_ILOG_CORE;
+   if (inode_maybe_inc_iversion(VFS_I(ip), flags & XFS_ILOG_CORE))
+   flags |= XFS_ILOG_CORE;
}
 
tp->t_flags |= XFS_TRANS_DIRTY;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 02/19] fs: don't take the i_lock in inode_inc_iversion

2017-12-22 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

The rationale for taking the i_lock when incrementing this value is
lost in antiquity. The readers of the field don't take it (at least
not universally), so my assumption is that it was only done here to
serialize incrementors.

If that is indeed the case, then we can drop the i_lock from this
codepath and treat it as a atomic64_t for the purposes of
incrementing it. This allows us to use inode_inc_iversion without
any danger of lock inversion.

Note that the read side is not fetched atomically with this change.
The assumption here is that that is not a critical issue since the
i_version is not fully synchronized with anything else anyway.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 include/linux/iversion.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index bb50d27c71f9..e08c634779df 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -104,12 +104,13 @@ inode_set_iversion_queried(struct inode *inode, const u64 
new)
 static inline bool
 inode_maybe_inc_iversion(struct inode *inode, bool force)
 {
-   spin_lock(>i_lock);
-   inode->i_version++;
-   spin_unlock(>i_lock);
+   atomic64_t *ivp = (atomic64_t *)>i_version;
+
+   atomic64_inc(ivp);
return true;
 }
 
+
 /**
  * inode_inc_iversion - forcibly increment i_version
  * @inode: inode that needs to be updated
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 19/19] fs: handle inode->i_version more efficiently

2017-12-21 Thread Jeff Layton
On Wed, 2017-12-20 at 17:41 +0100, Jan Kara wrote:
> On Wed 20-12-17 09:03:06, Jeff Layton wrote:
> > On Tue, 2017-12-19 at 09:07 +1100, Dave Chinner wrote:
> > > On Mon, Dec 18, 2017 at 02:35:20PM -0500, Jeff Layton wrote:
> > > > [PATCH] SQUASH: add memory barriers around i_version accesses
> > > 
> > > Why explicit memory barriers rather than annotating the operations
> > > with the required semantics and getting the barriers the arch
> > > requires automatically?  I suspect this should be using
> > > atomic_read_acquire() and atomic_cmpxchg_release(), because AFAICT
> > > the atomic_cmpxchg needs to have release semantics to match the
> > > acquire semantics needed for the load of the current value.
> > > 
> > > From include/linux/atomics.h:
> > > 
> > >  * For compound atomics performing both a load and a store, ACQUIRE
> > >  * semantics apply only to the load and RELEASE semantics only to the
> > >  * store portion of the operation. Note that a failed cmpxchg_acquire
> > >  * does -not- imply any memory ordering constraints.
> > > 
> > > Memory barriers hurt my brain. :/
> > > 
> > > At minimum, shouldn't the atomic op specific barriers be used rather
> > > than full memory barriers? i.e:
> > > 
> > 
> > They hurt my brain too. Yes, definitely atomic-specific barriers should
> > be used here instead, since this is an atomic64_t now.
> > 
> > After going over the docs again...my understanding has always been that
> > you primarily need memory barriers to order accesses to different areas
> > of memory.
> 
> That is correct.
> 
> > As Jan and I were discussing in the other thread, i_version is not
> > synchronized with anything else. In this code, we're only dealing with a
> > single 64-bit word. I don't think there are any races there wrt the API
> > itself.
> 
> There are not but it is like saying that lock implementation is correct
> because the lock state does not get corrupted ;). Who cares about protected
> data...
> 
> > The "legacy" inode_inc_iversion() however _does_ have implied memory
> > barriers from the i_lock. There could be some subtle de-facto ordering
> > there, so I think we probably do want some barriers in here if only to
> > preserve that. It's not likely to cost much, and may save us tracking
> > down some fiddly bugs.
> > 
> > What about this patch? Note that I've only added barriers to
> > inode_maybe_inc_iversion. I don't see that we need it for the other
> > functions, but please do tell me if I'm wrong there:
> > 
> > 8<---
> > 
> > [PATCH] SQUASH: add memory barriers around i_version accesses
> > 
> > Signed-off-by: Jeff Layton <jlay...@redhat.com>
> > ---
> >  include/linux/iversion.h | 53 
> > +---
> >  1 file changed, 32 insertions(+), 21 deletions(-)
> > 
> > diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> > index a9fbf99709df..02187a3bec3b 100644
> > --- a/include/linux/iversion.h
> > +++ b/include/linux/iversion.h
> > @@ -89,6 +89,23 @@ inode_set_iversion_raw(struct inode *inode, const u64 
> > val)
> > atomic64_set(>i_version, val);
> >  }
> >  
> > +/**
> > + * inode_peek_iversion_raw - grab a "raw" iversion value
> > + * @inode: inode from which i_version should be read
> > + *
> > + * Grab a "raw" inode->i_version value and return it. The i_version is not
> > + * flagged or converted in any way. This is mostly used to access a 
> > self-managed
> > + * i_version.
> > + *
> > + * With those filesystems, we want to treat the i_version as an entirely
> > + * opaque value.
> > + */
> > +static inline u64
> > +inode_peek_iversion_raw(const struct inode *inode)
> > +{
> > +   return atomic64_read(>i_version);
> > +}
> > +
> >  /**
> >   * inode_set_iversion - set i_version to a particular value
> >   * @inode: inode to set
> > @@ -152,7 +169,16 @@ inode_maybe_inc_iversion(struct inode *inode, bool 
> > force)
> >  {
> > u64 cur, old, new;
> >  
> > -   cur = (u64)atomic64_read(>i_version);
> > +   /*
> > +* The i_version field is not strictly ordered with any other inode
> > +* information, but the legacy inode_inc_iversion code used a spinlock
> > +* to serialize increments.
> > +*
> > +* This code adds full mem

Re: [PATCH v3 19/19] fs: handle inode->i_version more efficiently

2017-12-20 Thread Jeff Layton
On Tue, 2017-12-19 at 09:07 +1100, Dave Chinner wrote:
> On Mon, Dec 18, 2017 at 02:35:20PM -0500, Jeff Layton wrote:
> > [PATCH] SQUASH: add memory barriers around i_version accesses
> 
> Why explicit memory barriers rather than annotating the operations
> with the required semantics and getting the barriers the arch
> requires automatically?  I suspect this should be using
> atomic_read_acquire() and atomic_cmpxchg_release(), because AFAICT
> the atomic_cmpxchg needs to have release semantics to match the
> acquire semantics needed for the load of the current value.
> 
> From include/linux/atomics.h:
> 
>  * For compound atomics performing both a load and a store, ACQUIRE
>  * semantics apply only to the load and RELEASE semantics only to the
>  * store portion of the operation. Note that a failed cmpxchg_acquire
>  * does -not- imply any memory ordering constraints.
> 
> Memory barriers hurt my brain. :/
> 
> At minimum, shouldn't the atomic op specific barriers be used rather
> than full memory barriers? i.e:
> 

They hurt my brain too. Yes, definitely atomic-specific barriers should
be used here instead, since this is an atomic64_t now.

After going over the docs again...my understanding has always been that
you primarily need memory barriers to order accesses to different areas
of memory.

As Jan and I were discussing in the other thread, i_version is not
synchronized with anything else. In this code, we're only dealing with a
single 64-bit word. I don't think there are any races there wrt the API
itself.

The "legacy" inode_inc_iversion() however _does_ have implied memory
barriers from the i_lock. There could be some subtle de-facto ordering
there, so I think we probably do want some barriers in here if only to
preserve that. It's not likely to cost much, and may save us tracking
down some fiddly bugs.

What about this patch? Note that I've only added barriers to
inode_maybe_inc_iversion. I don't see that we need it for the other
functions, but please do tell me if I'm wrong there:

8<---

[PATCH] SQUASH: add memory barriers around i_version accesses

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 include/linux/iversion.h | 53 +---
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index a9fbf99709df..02187a3bec3b 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -89,6 +89,23 @@ inode_set_iversion_raw(struct inode *inode, const u64 val)
atomic64_set(>i_version, val);
 }
 
+/**
+ * inode_peek_iversion_raw - grab a "raw" iversion value
+ * @inode: inode from which i_version should be read
+ *
+ * Grab a "raw" inode->i_version value and return it. The i_version is not
+ * flagged or converted in any way. This is mostly used to access a 
self-managed
+ * i_version.
+ *
+ * With those filesystems, we want to treat the i_version as an entirely
+ * opaque value.
+ */
+static inline u64
+inode_peek_iversion_raw(const struct inode *inode)
+{
+   return atomic64_read(>i_version);
+}
+
 /**
  * inode_set_iversion - set i_version to a particular value
  * @inode: inode to set
@@ -152,7 +169,16 @@ inode_maybe_inc_iversion(struct inode *inode, bool force)
 {
u64 cur, old, new;
 
-   cur = (u64)atomic64_read(>i_version);
+   /*
+* The i_version field is not strictly ordered with any other inode
+* information, but the legacy inode_inc_iversion code used a spinlock
+* to serialize increments.
+*
+* This code adds full memory barriers to ensure that any de-facto
+* ordering with other info is preserved.
+*/
+   smp_mb__before_atomic();
+   cur = inode_peek_iversion_raw(inode);
for (;;) {
/* If flag is clear then we needn't do anything */
if (!force && !(cur & I_VERSION_QUERIED))
@@ -162,8 +188,10 @@ inode_maybe_inc_iversion(struct inode *inode, bool force)
new = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT;
 
old = atomic64_cmpxchg(>i_version, cur, new);
-   if (likely(old == cur))
+   if (likely(old == cur)) {
+   smp_mb__after_atomic();
break;
+   }
cur = old;
}
return true;
@@ -183,23 +211,6 @@ inode_inc_iversion(struct inode *inode)
inode_maybe_inc_iversion(inode, true);
 }
 
-/**
- * inode_peek_iversion_raw - grab a "raw" iversion value
- * @inode: inode from which i_version should be read
- *
- * Grab a "raw" inode->i_version value and return it. The i_version is not
- * flagged or converted in any way. This is mostly used to access a 
self-managed
- * i_version.
- *
- * 

Re: [PATCH v3 19/19] fs: handle inode->i_version more efficiently

2017-12-19 Thread Jeff Layton
On Tue, 2017-12-19 at 10:29 +0100, Jan Kara wrote:
> On Mon 18-12-17 12:22:20, Jeff Layton wrote:
> > On Mon, 2017-12-18 at 17:34 +0100, Jan Kara wrote:
> > > On Mon 18-12-17 10:11:56, Jeff Layton wrote:
> > > >  static inline bool
> > > >  inode_maybe_inc_iversion(struct inode *inode, bool force)
> > > >  {
> > > > -   atomic64_t *ivp = (atomic64_t *)>i_version;
> > > > +   u64 cur, old, new;
> > > >  
> > > > -   atomic64_inc(ivp);
> > > > +   cur = (u64)atomic64_read(>i_version);
> > > > +   for (;;) {
> > > > +   /* If flag is clear then we needn't do anything */
> > > > +   if (!force && !(cur & I_VERSION_QUERIED))
> > > > +   return false;
> > > 
> > > The fast path here misses any memory barrier. Thus it seems this query
> > > could be in theory reordered before any store that happened to modify the
> > > inode? Or maybe we could race and miss the fact that in fact this 
> > > i_version
> > > has already been queried? But maybe there's some higher level locking that
> > > makes sure this is all a non-issue... But in that case it would deserve
> > > some comment I guess.
> > > 
> > 
> > There's no higher-level locking. Getting locking out of this codepath is
> > a good thing IMO. The larger question here is whether we really care
> > about ordering this with anything else.
> > 
> > The i_version, as implemented today, is not ordered with actual changes
> > to the inode. We only take the i_lock today when modifying it, not when
> > querying it. It's possible today that you could see the results of a
> > change and then do a fetch of the i_version that doesn't show an
> > increment vs. a previous change.
> 
> Yeah, so I don't suggest that you should fix unrelated issues but original
> i_lock protection did actually provide memory barriers (although
> semi-permeable, but in practice they are very often enough) and your patch
> removing those could have changed a theoretical issue to a practical
> problem. So at least preserving that original acquire-release semantics
> of i_version handling would be IMHO good.
> 

Agreed. I've no objection to memory barriers here and I'm looking at
that, I just need to go over Dave's comments and memory-barriers.txt
(again!) to make sure I get them right.

> > It'd be nice if this were atomic with the actual changes that it
> > represents, but I think that would be prohibitively expensive. That may
> > be something we need to address. I'm not sure we really want to do it as
> > part of this patchset though.
> > 
> > > > +
> > > > +   /* Since lowest bit is flag, add 2 to avoid it */
> > > > +   new = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT;
> > > > +
> > > > +   old = atomic64_cmpxchg(>i_version, cur, new);
> > > > +   if (likely(old == cur))
> > > > +   break;
> > > > +   cur = old;
> > > > +   }
> > > > return true;
> > > >  }
> > > >  
> > > 
> > > ...
> > > 
> > > >  static inline u64
> > > >  inode_query_iversion(struct inode *inode)
> > > >  {
> > > > -   return inode_peek_iversion(inode);
> > > > +   u64 cur, old, new;
> > > > +
> > > > +   cur = atomic64_read(>i_version);
> > > > +   for (;;) {
> > > > +   /* If flag is already set, then no need to swap */
> > > > +   if (cur & I_VERSION_QUERIED)
> > > > +   break;
> > > > +
> > > > +   new = cur | I_VERSION_QUERIED;
> > > > +   old = atomic64_cmpxchg(>i_version, cur, new);
> > > > +   if (old == cur)
> > > > +   break;
> > > > +   cur = old;
> > > > +   }
> > > 
> > > Why not just use atomic64_or() here?
> > > 
> > 
> > If the cmpxchg fails, then either:
> > 
> > 1) it was incremented
> > 2) someone flagged it QUERIED
> > 
> > If an increment happened then we don't need to flag it as QUERIED if
> > we're returning an older value. If we use atomic64_or, then we can't
> > tell if an increment happened so we'd end up potentially flagging it
> > more than necessary.
> &

Re: [PATCH v3 19/19] fs: handle inode->i_version more efficiently

2017-12-18 Thread Jeff Layton
On Mon, 2017-12-18 at 12:36 -0500, J. Bruce Fields wrote:
> On Mon, Dec 18, 2017 at 12:22:20PM -0500, Jeff Layton wrote:
> > On Mon, 2017-12-18 at 17:34 +0100, Jan Kara wrote:
> > > On Mon 18-12-17 10:11:56, Jeff Layton wrote:
> > > >  static inline bool
> > > >  inode_maybe_inc_iversion(struct inode *inode, bool force)
> > > >  {
> > > > -   atomic64_t *ivp = (atomic64_t *)>i_version;
> > > > +   u64 cur, old, new;
> > > >  
> > > > -   atomic64_inc(ivp);
> > > > +   cur = (u64)atomic64_read(>i_version);
> > > > +   for (;;) {
> > > > +   /* If flag is clear then we needn't do anything */
> > > > +   if (!force && !(cur & I_VERSION_QUERIED))
> > > > +   return false;
> > > 
> > > The fast path here misses any memory barrier. Thus it seems this query
> > > could be in theory reordered before any store that happened to modify the
> > > inode? Or maybe we could race and miss the fact that in fact this 
> > > i_version
> > > has already been queried? But maybe there's some higher level locking that
> > > makes sure this is all a non-issue... But in that case it would deserve
> > > some comment I guess.
> > > 
> > 
> > There's no higher-level locking. Getting locking out of this codepath is
> > a good thing IMO. The larger question here is whether we really care
> > about ordering this with anything else.
> > 
> > The i_version, as implemented today, is not ordered with actual changes
> > to the inode. We only take the i_lock today when modifying it, not when
> > querying it. It's possible today that you could see the results of a
> > change and then do a fetch of the i_version that doesn't show an
> > increment vs. a previous change.
> > 
> > It'd be nice if this were atomic with the actual changes that it
> > represents, but I think that would be prohibitively expensive. That may
> > be something we need to address. I'm not sure we really want to do it as
> > part of this patchset though.
> 
> I don't think we need full atomicity, but we *do* need the i_version
> increment to be seen as ordered after the inode change.  That should be
> relatively inexpensive, yes?
> 
> Otherwise an inode change could be overlooked indefinitely, because a
> client could cache the old contents with the new i_version value and
> never see the i_version change again as long as the inode isn't touched
> again.
> 
> (If the client instead caches new data with the old inode version, there
> may be an unnecessary cache invalidation next time the client queries
> the change attribute.  That's a performance bug, but not really a
> correctness problem, at least for clients depending only on
> close-to-open semantics: they'll only depend on seeing the new change
> attribute after the change has been flushed to disk.  The performance
> problem should be mitigated by the race being very rare.)

Perhaps something like this patch squashed into #19?

This should tighten up the initial loads and stores, like Jan was
suggesting (I think). The increments are done via cmpxchg so they
already have a full implied barrier (IIUC), and we don't exit the loop
until it succeeds or it looks like the flag is set.

To be clear though:

This series does not try to order the i_version update with any other
inode metadata or data (though we generally do update it after a write
is done, rather than before or during).

Inode field updates are not at all synchronized either and this series
does not change that. A stat() is also generally done locklessly so you
can easily get a half-baked attributes there if you hit the timing
wrong.

-8<--

[PATCH] SQUASH: add memory barriers around i_version accesses

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 include/linux/iversion.h | 40 +---
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index a9fbf99709df..39ec9aa9e08e 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -87,6 +87,25 @@ static inline void
 inode_set_iversion_raw(struct inode *inode, const u64 val)
 {
atomic64_set(>i_version, val);
+   smp_wmb();
+}
+
+/**
+ * inode_peek_iversion_raw - grab a "raw" iversion value
+ * @inode: inode from which i_version should be read
+ *
+ * Grab a "raw" inode->i_version value and return it. The i_version is not
+ * flagged or converted in any way. This is mostly used to access a 
self-managed
+ * i_version.
+ *
+ * 

Re: [PATCH v3 01/19] fs: new API for handling inode->i_version

2017-12-18 Thread Jeff Layton
On Mon, 2017-12-18 at 10:11 -0500, Jeff Layton wrote:
> From: Jeff Layton <jlay...@redhat.com>
> 
> Add a documentation blob that explains what the i_version field is, how
> it is expected to work, and how it is currently implemented by various
> filesystems.
> 
> We already have inode_inc_iversion. Add several other functions for
> manipulating and accessing the i_version counter. For now, the
> implementation is trivial and basically works the way that all of the
> open-coded i_version accesses work today.
> 
> Future patches will convert existing users of i_version to use the new
> API, and then convert the backend implementation to do things more
> efficiently.
> 
> Signed-off-by: Jeff Layton <jlay...@redhat.com>
> ---
>  include/linux/fs.h   |  15 
>  include/linux/iversion.h | 205 
> +++
>  2 files changed, 205 insertions(+), 15 deletions(-)
>  create mode 100644 include/linux/iversion.h
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 511fbaabf624..76382c24e9d0 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2036,21 +2036,6 @@ static inline void inode_dec_link_count(struct inode 
> *inode)
>   mark_inode_dirty(inode);
>  }
>  
> -/**
> - * inode_inc_iversion - increments i_version
> - * @inode: inode that need to be updated
> - *
> - * Every time the inode is modified, the i_version field will be incremented.
> - * The filesystem has to be mounted with i_version flag
> - */
> -
> -static inline void inode_inc_iversion(struct inode *inode)
> -{
> -   spin_lock(>i_lock);
> -   inode->i_version++;
> -   spin_unlock(>i_lock);
> -}
> -

Note that this patch is broken, as I neglected to include
linux/iversion.h in the files with the existing callers of
inode_inc_iversion. Fixed in my in-tree version.


>  enum file_time_flags {
>   S_ATIME = 1,
>   S_MTIME = 2,
> diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> new file mode 100644
> index ..bb50d27c71f9
> --- /dev/null
> +++ b/include/linux/iversion.h
> @@ -0,0 +1,205 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_IVERSION_H
> +#define _LINUX_IVERSION_H
> +
> +#include 
> +
> +/*
> + * The change attribute (i_version) is mandated by NFSv4 and is mostly for
> + * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
> + * appear different to observers if there was a change to the inode's data or
> + * metadata since it was last queried.
> + *
> + * It should be considered an opaque value by observers. If it remains the 
> same
> + * since it was last checked, then nothing has changed in the inode. If it's
> + * different then something has changed. Observers cannot infer anything 
> about
> + * the nature or magnitude of the changes from the value, only that the inode
> + * has changed in some fashion.
> + *
> + * Not all filesystems properly implement the i_version counter. Subsystems 
> that
> + * want to use i_version field on an inode should first check whether the
> + * filesystem sets the SB_I_VERSION flag (usually via the IS_I_VERSION 
> macro).
> + *
> + * Those that set SB_I_VERSION will automatically have their i_version 
> counter
> + * incremented on writes to normal files. If the SB_I_VERSION is not set, 
> then
> + * the VFS will not touch it on writes, and the filesystem can use it how it
> + * wishes. Note that the filesystem is always responsible for updating the
> + * i_version on namespace changes in directories (mkdir, rmdir, unlink, 
> etc.).
> + * We consider these sorts of filesystems to have a kernel-managed i_version.
> + *
> + * Note that some filesystems (e.g. NFS and AFS) just use the field to store
> + * a server-provided value (for the most part). For that reason, those
> + * filesystems do not set SB_I_VERSION. These filesystems are considered to
> + * have a self-managed i_version.
> + */
> +
> +/**
> + * inode_set_iversion_raw - set i_version to the specified raw value
> + * @inode: inode to set
> + * @new: new i_version value to set
> + *
> + * Set @inode's i_version field to @new. This function is for use by
> + * filesystems that self-manage the i_version.
> + *
> + * For example, the NFS client stores its NFSv4 change attribute in this way,
> + * and the AFS client stores the data_version from the server here.
> + */
> +static inline void
> +inode_set_iversion_raw(struct inode *inode, const u64 new)
> +{
> + inode->i_version = new;
> +}
> +
> +/**
> + * inode_set_iversion - set i_version to a particular value
> + * @inode: inod

Re: [PATCH v3 19/19] fs: handle inode->i_version more efficiently

2017-12-18 Thread Jeff Layton
On Mon, 2017-12-18 at 17:34 +0100, Jan Kara wrote:
> On Mon 18-12-17 10:11:56, Jeff Layton wrote:
> >  static inline bool
> >  inode_maybe_inc_iversion(struct inode *inode, bool force)
> >  {
> > -   atomic64_t *ivp = (atomic64_t *)>i_version;
> > +   u64 cur, old, new;
> >  
> > -   atomic64_inc(ivp);
> > +   cur = (u64)atomic64_read(>i_version);
> > +   for (;;) {
> > +   /* If flag is clear then we needn't do anything */
> > +   if (!force && !(cur & I_VERSION_QUERIED))
> > +   return false;
> 
> The fast path here misses any memory barrier. Thus it seems this query
> could be in theory reordered before any store that happened to modify the
> inode? Or maybe we could race and miss the fact that in fact this i_version
> has already been queried? But maybe there's some higher level locking that
> makes sure this is all a non-issue... But in that case it would deserve
> some comment I guess.
> 

There's no higher-level locking. Getting locking out of this codepath is
a good thing IMO. The larger question here is whether we really care
about ordering this with anything else.

The i_version, as implemented today, is not ordered with actual changes
to the inode. We only take the i_lock today when modifying it, not when
querying it. It's possible today that you could see the results of a
change and then do a fetch of the i_version that doesn't show an
increment vs. a previous change.

It'd be nice if this were atomic with the actual changes that it
represents, but I think that would be prohibitively expensive. That may
be something we need to address. I'm not sure we really want to do it as
part of this patchset though.

> > +
> > +   /* Since lowest bit is flag, add 2 to avoid it */
> > +   new = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT;
> > +
> > +   old = atomic64_cmpxchg(>i_version, cur, new);
> > +   if (likely(old == cur))
> > +   break;
> > +   cur = old;
> > +   }
> > return true;
> >  }
> >  
> 
> ...
> 
> >  static inline u64
> >  inode_query_iversion(struct inode *inode)
> >  {
> > -   return inode_peek_iversion(inode);
> > +   u64 cur, old, new;
> > +
> > +   cur = atomic64_read(>i_version);
> > +   for (;;) {
> > +   /* If flag is already set, then no need to swap */
> > +   if (cur & I_VERSION_QUERIED)
> > +   break;
> > +
> > +   new = cur | I_VERSION_QUERIED;
> > +   old = atomic64_cmpxchg(>i_version, cur, new);
> > +   if (old == cur)
> > +   break;
> > +   cur = old;
> > +   }
> 
> Why not just use atomic64_or() here?
> 

If the cmpxchg fails, then either:

1) it was incremented
2) someone flagged it QUERIED

If an increment happened then we don't need to flag it as QUERIED if
we're returning an older value. If we use atomic64_or, then we can't
tell if an increment happened so we'd end up potentially flagging it
more than necessary.

In principle, either outcome is technically OK and we don't have to loop
if the cmpxchg doesn't work. That said, if we think there might be a
later i_version available, then I think we probably want to try to query
it again so we can return as late a one as possible.
-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 01/19] fs: new API for handling inode->i_version

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Add a documentation blob that explains what the i_version field is, how
it is expected to work, and how it is currently implemented by various
filesystems.

We already have inode_inc_iversion. Add several other functions for
manipulating and accessing the i_version counter. For now, the
implementation is trivial and basically works the way that all of the
open-coded i_version accesses work today.

Future patches will convert existing users of i_version to use the new
API, and then convert the backend implementation to do things more
efficiently.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 include/linux/fs.h   |  15 
 include/linux/iversion.h | 205 +++
 2 files changed, 205 insertions(+), 15 deletions(-)
 create mode 100644 include/linux/iversion.h

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 511fbaabf624..76382c24e9d0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2036,21 +2036,6 @@ static inline void inode_dec_link_count(struct inode 
*inode)
mark_inode_dirty(inode);
 }
 
-/**
- * inode_inc_iversion - increments i_version
- * @inode: inode that need to be updated
- *
- * Every time the inode is modified, the i_version field will be incremented.
- * The filesystem has to be mounted with i_version flag
- */
-
-static inline void inode_inc_iversion(struct inode *inode)
-{
-   spin_lock(>i_lock);
-   inode->i_version++;
-   spin_unlock(>i_lock);
-}
-
 enum file_time_flags {
S_ATIME = 1,
S_MTIME = 2,
diff --git a/include/linux/iversion.h b/include/linux/iversion.h
new file mode 100644
index ..bb50d27c71f9
--- /dev/null
+++ b/include/linux/iversion.h
@@ -0,0 +1,205 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_IVERSION_H
+#define _LINUX_IVERSION_H
+
+#include 
+
+/*
+ * The change attribute (i_version) is mandated by NFSv4 and is mostly for
+ * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
+ * appear different to observers if there was a change to the inode's data or
+ * metadata since it was last queried.
+ *
+ * It should be considered an opaque value by observers. If it remains the same
+ * since it was last checked, then nothing has changed in the inode. If it's
+ * different then something has changed. Observers cannot infer anything about
+ * the nature or magnitude of the changes from the value, only that the inode
+ * has changed in some fashion.
+ *
+ * Not all filesystems properly implement the i_version counter. Subsystems 
that
+ * want to use i_version field on an inode should first check whether the
+ * filesystem sets the SB_I_VERSION flag (usually via the IS_I_VERSION macro).
+ *
+ * Those that set SB_I_VERSION will automatically have their i_version counter
+ * incremented on writes to normal files. If the SB_I_VERSION is not set, then
+ * the VFS will not touch it on writes, and the filesystem can use it how it
+ * wishes. Note that the filesystem is always responsible for updating the
+ * i_version on namespace changes in directories (mkdir, rmdir, unlink, etc.).
+ * We consider these sorts of filesystems to have a kernel-managed i_version.
+ *
+ * Note that some filesystems (e.g. NFS and AFS) just use the field to store
+ * a server-provided value (for the most part). For that reason, those
+ * filesystems do not set SB_I_VERSION. These filesystems are considered to
+ * have a self-managed i_version.
+ */
+
+/**
+ * inode_set_iversion_raw - set i_version to the specified raw value
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * Set @inode's i_version field to @new. This function is for use by
+ * filesystems that self-manage the i_version.
+ *
+ * For example, the NFS client stores its NFSv4 change attribute in this way,
+ * and the AFS client stores the data_version from the server here.
+ */
+static inline void
+inode_set_iversion_raw(struct inode *inode, const u64 new)
+{
+   inode->i_version = new;
+}
+
+/**
+ * inode_set_iversion - set i_version to a particular value
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * Set @inode's i_version field to @new. This function is for filesystems with
+ * a kernel-managed i_version.
+ *
+ * For now, this just does the same thing as the _raw variant.
+ */
+static inline void
+inode_set_iversion(struct inode *inode, const u64 new)
+{
+   inode_set_iversion_raw(inode, new);
+}
+
+/**
+ * inode_set_iversion_queried - set i_version to a particular value and set
+ *  flag to indicate that it has been viewed
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * When loading in an i_version value from a backing store, we typically don't
+ * know whether it was previously viewed before being stored or not. Thus, we
+ * must assume that it was, to ensure that any changes will result in the
+ * value changing.
+ *
+ 

[PATCH v3 00/19] fs: rework and optimize i_version handling in filesystems

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

v3:
- move i_version handling functions to new header file
- document that the kernel-managed i_version implementation will appear to
  increase over time
- fix inode_cmp_iversion to handle wraparound correctly

v2:
- xfs should use inode_peek_iversion instead of inode_peek_iversion_raw
- rework file_update_time patch
- don't dirty inode when only S_ATIME is set and SB_LAZYTIME is enabled
- better comments and documentation

tl;dr: I think we can greatly reduce the cost of the inode->i_version
counter, by exploiting the fact that we don't need to increment it if no
one is looking at it. We can also clean up the code to prepare to
eventually expose this value via statx().

Note that this set relies on a few patches that are in other trees. The
full stack that I've been testing with is here:


https://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git/log/?h=iversion

The inode->i_version field is supposed to be a value that changes
whenever there is any data or metadata change to the inode. Some
filesystems use it internally to detect directory changes during
readdir. knfsd will use it if the filesystem has MS_I_VERSION set. IMA
will also use it to optimize away some remeasurement if it's available.
NFS and AFS just use it to store an opaque change attribute from the
server.

Only btrfs, ext4, and xfs increment it for data changes. Because of
this, these filesystems must log the inode to disk whenever the
i_version counter changes. That has a non-zero performance impact,
especially on write-heavy workloads, because we end up dirtying the
inode metadata on every write, not just when the times change.

It turns out though that none of these users of i_version require that
it change on every change to the file. The only real requirement is that
it be different if something changed since the last time we queried for
it.

If we keep track of when something queries the value, we can avoid
bumping the counter and an on-disk update when nothing else has changed
if no one has queried it since it was last incremented.

This patchset changes the code to only bump the i_version counter when
it's strictly necessary, or when we're updating the inode metadata
anyway (e.g. when times change).

It takes the approach of converting the existing accessors of i_version
to use a new API, while leaving the underlying implementation mostly the
same.  The last patch then converts the existing implementation to keep
track of whether the value has been queried since it was last
incremented. It then uses that to avoid incrementing the counter when
it can.

With this, we reduce inode metadata updates across all 3 filesystems
down to roughly the frequency of the timestamp granularity, particularly
when it's not being queried (the vastly common case).

I can see measurable performance gains on xfs and ext4 with iversion
enabled, when streaming small (4k) I/Os. 

btrfs shows some slight gain in testing, but not quite the magnitude
that xfs and ext4 show. I'm not sure why yet and would appreciate some
input from btrfs folks.

My goal is to get this into linux-next fairly soon. If it shows no
problems then we can look at merging it for 4.16, or 4.17 if all of the
prequisite patches are not yet merged.

Jeff Layton (19):
  fs: new API for handling inode->i_version
  fs: don't take the i_lock in inode_inc_iversion
  fat: convert to new i_version API
  affs: convert to new i_version API
  afs: convert to new i_version API
  btrfs: convert to new i_version API
  exofs: switch to new i_version API
  ext2: convert to new i_version API
  ext4: convert to new i_version API
  nfs: convert to new i_version API
  nfsd: convert to new i_version API
  ocfs2: convert to new i_version API
  ufs: use new i_version API
  xfs: convert to new i_version API
  IMA: switch IMA over to new i_version API
  fs: only set S_VERSION when updating times if necessary
  xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need
incrementing
  btrfs: only dirty the inode in btrfs_update_time if something was
changed
  fs: handle inode->i_version more efficiently

 fs/affs/amigaffs.c|   5 +-
 fs/affs/dir.c |   5 +-
 fs/affs/super.c   |   3 +-
 fs/afs/fsclient.c |   3 +-
 fs/afs/inode.c|   5 +-
 fs/btrfs/delayed-inode.c  |   7 +-
 fs/btrfs/file.c   |   1 +
 fs/btrfs/inode.c  |  12 +-
 fs/btrfs/ioctl.c  |   1 +
 fs/btrfs/tree-log.c   |   4 +-
 fs/btrfs/xattr.c  |   1 +
 fs/exofs/dir.c|   9 +-
 fs/exofs/super.c  |   3 +-
 fs/ext2/dir.c |   9 +-
 fs/ext2/super.c   |   5 +-
 fs/ext4/dir.c |   9 +-
 fs/ext4/inline.c  |   7 +-
 fs/ext4/inode.c   |  13 +-
 fs/ext4/ioctl.c   |   3 +

[PATCH v3 04/19] affs: convert to new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/affs/amigaffs.c | 5 +++--
 fs/affs/dir.c  | 5 +++--
 fs/affs/super.c| 3 ++-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/affs/amigaffs.c b/fs/affs/amigaffs.c
index 0f0e6925e97d..14a6c1b90c9f 100644
--- a/fs/affs/amigaffs.c
+++ b/fs/affs/amigaffs.c
@@ -10,6 +10,7 @@
  */
 
 #include 
+#include 
 #include "affs.h"
 
 /*
@@ -60,7 +61,7 @@ affs_insert_hash(struct inode *dir, struct buffer_head *bh)
affs_brelse(dir_bh);
 
dir->i_mtime = dir->i_ctime = current_time(dir);
-   dir->i_version++;
+   inode_inc_iversion(dir);
mark_inode_dirty(dir);
 
return 0;
@@ -114,7 +115,7 @@ affs_remove_hash(struct inode *dir, struct buffer_head 
*rem_bh)
affs_brelse(bh);
 
dir->i_mtime = dir->i_ctime = current_time(dir);
-   dir->i_version++;
+   inode_inc_iversion(dir);
mark_inode_dirty(dir);
 
return retval;
diff --git a/fs/affs/dir.c b/fs/affs/dir.c
index a105e77df2c1..d180b46453cf 100644
--- a/fs/affs/dir.c
+++ b/fs/affs/dir.c
@@ -14,6 +14,7 @@
  *
  */
 
+#include 
 #include "affs.h"
 
 static int affs_readdir(struct file *, struct dir_context *);
@@ -80,7 +81,7 @@ affs_readdir(struct file *file, struct dir_context *ctx)
 * we can jump directly to where we left off.
 */
ino = (u32)(long)file->private_data;
-   if (ino && file->f_version == inode->i_version) {
+   if (ino && inode_cmp_iversion(inode, file->f_version) == 0) {
pr_debug("readdir() left off=%d\n", ino);
goto inside;
}
@@ -130,7 +131,7 @@ affs_readdir(struct file *file, struct dir_context *ctx)
} while (ino);
}
 done:
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
file->private_data = (void *)(long)ino;
affs_brelse(fh_bh);
 
diff --git a/fs/affs/super.c b/fs/affs/super.c
index 1117e36134cc..e602619aed9d 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "affs.h"
 
 static int affs_statfs(struct dentry *dentry, struct kstatfs *buf);
@@ -102,7 +103,7 @@ static struct inode *affs_alloc_inode(struct super_block 
*sb)
if (!i)
return NULL;
 
-   i->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
i->i_lc = NULL;
i->i_ext_bh = NULL;
i->i_pa_cnt = 0;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 03/19] fat: convert to new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/fat/dir.c |  3 ++-
 fs/fat/inode.c   |  9 +
 fs/fat/namei_msdos.c |  7 ---
 fs/fat/namei_vfat.c  | 22 +++---
 4 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/fs/fat/dir.c b/fs/fat/dir.c
index b833ffeee1e1..8e100c3bf72c 100644
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "fat.h"
 
 /*
@@ -1055,7 +1056,7 @@ int fat_remove_entries(struct inode *dir, struct 
fat_slot_info *sinfo)
brelse(bh);
if (err)
return err;
-   dir->i_version++;
+   inode_inc_iversion(dir);
 
if (nr_slots) {
/*
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 20a0a89eaca5..ffbbf0520d9e 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "fat.h"
 
 #ifndef CONFIG_FAT_DEFAULT_IOCHARSET
@@ -507,7 +508,7 @@ int fat_fill_inode(struct inode *inode, struct 
msdos_dir_entry *de)
MSDOS_I(inode)->i_pos = 0;
inode->i_uid = sbi->options.fs_uid;
inode->i_gid = sbi->options.fs_gid;
-   inode->i_version++;
+   inode_inc_iversion(inode);
inode->i_generation = get_seconds();
 
if ((de->attr & ATTR_DIR) && !IS_FREE(de->name)) {
@@ -590,7 +591,7 @@ struct inode *fat_build_inode(struct super_block *sb,
goto out;
}
inode->i_ino = iunique(sb, MSDOS_ROOT_INO);
-   inode->i_version = 1;
+   inode_set_iversion(inode, 1);
err = fat_fill_inode(inode, de);
if (err) {
iput(inode);
@@ -1377,7 +1378,7 @@ static int fat_read_root(struct inode *inode)
MSDOS_I(inode)->i_pos = MSDOS_ROOT_INO;
inode->i_uid = sbi->options.fs_uid;
inode->i_gid = sbi->options.fs_gid;
-   inode->i_version++;
+   inode_inc_iversion(inode);
inode->i_generation = 0;
inode->i_mode = fat_make_mode(sbi, ATTR_DIR, S_IRWXUGO);
inode->i_op = sbi->dir_ops;
@@ -1828,7 +1829,7 @@ int fat_fill_super(struct super_block *sb, void *data, 
int silent, int isvfat,
if (!root_inode)
goto out_fail;
root_inode->i_ino = MSDOS_ROOT_INO;
-   root_inode->i_version = 1;
+   inode_set_iversion(root_inode, 1);
error = fat_read_root(root_inode);
if (error < 0) {
iput(root_inode);
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index d24d2758a363..582ca731a6c9 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -7,6 +7,7 @@
  */
 
 #include 
+#include 
 #include "fat.h"
 
 /* Characters that are undesirable in an MS-DOS file name */
@@ -480,7 +481,7 @@ static int do_msdos_rename(struct inode *old_dir, unsigned 
char *old_name,
} else
mark_inode_dirty(old_inode);
 
-   old_dir->i_version++;
+   inode_inc_iversion(old_dir);
old_dir->i_ctime = old_dir->i_mtime = 
current_time(old_dir);
if (IS_DIRSYNC(old_dir))
(void)fat_sync_inode(old_dir);
@@ -508,7 +509,7 @@ static int do_msdos_rename(struct inode *old_dir, unsigned 
char *old_name,
goto out;
new_i_pos = sinfo.i_pos;
}
-   new_dir->i_version++;
+   inode_inc_iversion(new_dir);
 
fat_detach(old_inode);
fat_attach(old_inode, new_i_pos);
@@ -540,7 +541,7 @@ static int do_msdos_rename(struct inode *old_dir, unsigned 
char *old_name,
old_sinfo.bh = NULL;
if (err)
goto error_dotdot;
-   old_dir->i_version++;
+   inode_inc_iversion(old_dir);
old_dir->i_ctime = old_dir->i_mtime = ts;
if (IS_DIRSYNC(old_dir))
(void)fat_sync_inode(old_dir);
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 02c03a3a..cefea792cde8 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -20,7 +20,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include "fat.h"
 
 static inline unsigned long vfat_d_version(struct dentry *dentry)
@@ -46,7 +46,7 @@ static int vfat_revalidate_shortname(struct dentry *dentry)
 {
int ret = 1;
spin_lock(>d_lock);
-   if (vfat_d_version(dentry) != d_inode(dentry->d_parent)->i_version)
+   if (inode_cmp_iversion(d_inode(dentry->d_parent), 
vfat_d_version(dentry)))
ret = 0;
spin_unlock(>d_lock);
return ret;
@@ -759,7 +759,7 @@ static struct dentry *vfat_lookup(struct inode *dir, struct 
dentry *dentry,
 out:
mutex_unlock(_SB(sb)->s_lock);
if (!inode)
-   vfat_d_version_set

[PATCH v3 02/19] fs: don't take the i_lock in inode_inc_iversion

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

The rationale for taking the i_lock when incrementing this value is
lost in antiquity. The readers of the field don't take it (at least
not universally), so my assumption is that it was only done here to
serialize incrementors.

If that is indeed the case, then we can drop the i_lock from this
codepath and treat it as a atomic64_t for the purposes of
incrementing it. This allows us to use inode_inc_iversion without
any danger of lock inversion.

Note that the read side is not fetched atomically with this change.
The assumption here is that that is not a critical issue since the
i_version is not fully synchronized with anything else anyway.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 include/linux/iversion.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index bb50d27c71f9..e08c634779df 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -104,12 +104,13 @@ inode_set_iversion_queried(struct inode *inode, const u64 
new)
 static inline bool
 inode_maybe_inc_iversion(struct inode *inode, bool force)
 {
-   spin_lock(>i_lock);
-   inode->i_version++;
-   spin_unlock(>i_lock);
+   atomic64_t *ivp = (atomic64_t *)>i_version;
+
+   atomic64_inc(ivp);
return true;
 }
 
+
 /**
  * inode_inc_iversion - forcibly increment i_version
  * @inode: inode that needs to be updated
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 05/19] afs: convert to new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

For AFS, it's generally treated as an opaque value, so we use the
*_raw variants of the API here.

Note that AFS has quite a different definition for this counter. AFS
only increments it on changes to the data, not for the metadata. We'll
need to reconcile that somehow if we ever want to present this to
userspace via statx.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/afs/fsclient.c | 3 ++-
 fs/afs/inode.c| 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index b90ef39ae914..88ec38c2d83c 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 #include "afs_fs.h"
 
@@ -124,7 +125,7 @@ static void xdr_decode_AFSFetchStatus(const __be32 **_bp,
vnode->vfs_inode.i_ctime.tv_sec = status->mtime_client;
vnode->vfs_inode.i_mtime= vnode->vfs_inode.i_ctime;
vnode->vfs_inode.i_atime= vnode->vfs_inode.i_ctime;
-   vnode->vfs_inode.i_version  = data_version;
+   inode_set_iversion_raw(>vfs_inode, data_version);
}
 
expected_version = status->data_version;
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 3415eb7484f6..dcd2e08d6cdb 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 static const struct inode_operations afs_symlink_inode_operations = {
@@ -89,7 +90,7 @@ static int afs_inode_map_status(struct afs_vnode *vnode, 
struct key *key)
inode->i_atime  = inode->i_mtime = inode->i_ctime;
inode->i_blocks = 0;
inode->i_generation = vnode->fid.unique;
-   inode->i_version= vnode->status.data_version;
+   inode_set_iversion_raw(inode, vnode->status.data_version);
inode->i_mapping->a_ops = _fs_aops;
 
read_sequnlock_excl(>cb_lock);
@@ -218,7 +219,7 @@ struct inode *afs_iget_autocell(struct inode *dir, const 
char *dev_name,
inode->i_ctime.tv_nsec  = 0;
inode->i_atime  = inode->i_mtime = inode->i_ctime;
inode->i_blocks = 0;
-   inode->i_version= 0;
+   inode_set_iversion_raw(inode, 0);
inode->i_generation = 0;
 
set_bit(AFS_VNODE_PSEUDODIR, >flags);
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 07/19] exofs: switch to new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/exofs/dir.c   | 9 +
 fs/exofs/super.c | 3 ++-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/exofs/dir.c b/fs/exofs/dir.c
index 98233a97b7b8..c5a53fcc43ea 100644
--- a/fs/exofs/dir.c
+++ b/fs/exofs/dir.c
@@ -31,6 +31,7 @@
  * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
  */
 
+#include 
 #include "exofs.h"
 
 static inline unsigned exofs_chunk_size(struct inode *inode)
@@ -60,7 +61,7 @@ static int exofs_commit_chunk(struct page *page, loff_t pos, 
unsigned len)
struct inode *dir = mapping->host;
int err = 0;
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
 
if (!PageUptodate(page))
SetPageUptodate(page);
@@ -241,7 +242,7 @@ exofs_readdir(struct file *file, struct dir_context *ctx)
unsigned long n = pos >> PAGE_SHIFT;
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(exofs_chunk_size(inode)-1);
-   int need_revalidate = (file->f_version != inode->i_version);
+   bool need_revalidate = inode_cmp_iversion(inode, file->f_version);
 
if (pos > inode->i_size - EXOFS_DIR_REC_LEN(1))
return 0;
@@ -264,8 +265,8 @@ exofs_readdir(struct file *file, struct dir_context *ctx)
chunk_mask);
ctx->pos = (n<<PAGE_SHIFT) + offset;
}
-   file->f_version = inode->i_version;
-   need_revalidate = 0;
+   file->f_version = inode_query_iversion(inode);
+   need_revalidate = false;
}
de = (struct exofs_dir_entry *)(kaddr + offset);
limit = kaddr + exofs_last_byte(inode, n) -
diff --git a/fs/exofs/super.c b/fs/exofs/super.c
index 819624cfc8da..7e244093c0e5 100644
--- a/fs/exofs/super.c
+++ b/fs/exofs/super.c
@@ -38,6 +38,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "exofs.h"
 
@@ -159,7 +160,7 @@ static struct inode *exofs_alloc_inode(struct super_block 
*sb)
if (!oi)
return NULL;
 
-   oi->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
return >vfs_inode;
 }
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 06/19] btrfs: convert to new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/btrfs/delayed-inode.c | 7 +--
 fs/btrfs/file.c  | 1 +
 fs/btrfs/inode.c | 7 +--
 fs/btrfs/ioctl.c | 1 +
 fs/btrfs/tree-log.c  | 4 +++-
 fs/btrfs/xattr.c | 1 +
 6 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 5d73f79ded8b..6a246ae2bcb2 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -18,6 +18,7 @@
  */
 
 #include 
+#include 
 #include "delayed-inode.h"
 #include "disk-io.h"
 #include "transaction.h"
@@ -1700,7 +1701,8 @@ static void fill_stack_inode_item(struct 
btrfs_trans_handle *trans,
btrfs_set_stack_inode_nbytes(inode_item, inode_get_bytes(inode));
btrfs_set_stack_inode_generation(inode_item,
 BTRFS_I(inode)->generation);
-   btrfs_set_stack_inode_sequence(inode_item, inode->i_version);
+   btrfs_set_stack_inode_sequence(inode_item,
+  inode_peek_iversion(inode));
btrfs_set_stack_inode_transid(inode_item, trans->transid);
btrfs_set_stack_inode_rdev(inode_item, inode->i_rdev);
btrfs_set_stack_inode_flags(inode_item, BTRFS_I(inode)->flags);
@@ -1754,7 +1756,8 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
BTRFS_I(inode)->generation = btrfs_stack_inode_generation(inode_item);
 BTRFS_I(inode)->last_trans = btrfs_stack_inode_transid(inode_item);
 
-   inode->i_version = btrfs_stack_inode_sequence(inode_item);
+   inode_set_iversion_queried(inode,
+  btrfs_stack_inode_sequence(inode_item));
inode->i_rdev = 0;
*rdev = btrfs_stack_inode_rdev(inode_item);
BTRFS_I(inode)->flags = btrfs_stack_inode_flags(inode_item);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index eb1bac7c8553..c95d7b2efefb 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e1a7f3cb5be9..ac8692849a81 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
@@ -3777,7 +3778,8 @@ static int btrfs_read_locked_inode(struct inode *inode)
BTRFS_I(inode)->generation = btrfs_inode_generation(leaf, inode_item);
BTRFS_I(inode)->last_trans = btrfs_inode_transid(leaf, inode_item);
 
-   inode->i_version = btrfs_inode_sequence(leaf, inode_item);
+   inode_set_iversion_queried(inode,
+  btrfs_inode_sequence(leaf, inode_item));
inode->i_generation = BTRFS_I(inode)->generation;
inode->i_rdev = 0;
rdev = btrfs_inode_rdev(leaf, inode_item);
@@ -3945,7 +3947,8 @@ static void fill_inode_item(struct btrfs_trans_handle 
*trans,
 );
btrfs_set_token_inode_generation(leaf, item, BTRFS_I(inode)->generation,
 );
-   btrfs_set_token_inode_sequence(leaf, item, inode->i_version, );
+   btrfs_set_token_inode_sequence(leaf, item, inode_peek_iversion(inode),
+  );
btrfs_set_token_inode_transid(leaf, item, trans->transid, );
btrfs_set_token_inode_rdev(leaf, item, inode->i_rdev, );
btrfs_set_token_inode_flags(leaf, item, BTRFS_I(inode)->flags, );
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 2ef8acaac688..aa452c9e2eff 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -43,6 +43,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ctree.h"
 #include "disk-io.h"
 #include "transaction.h"
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 7bf9b31561db..1b7d92075c1f 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "tree-log.h"
 #include "disk-io.h"
 #include "locking.h"
@@ -3609,7 +3610,8 @@ static void fill_inode_item(struct btrfs_trans_handle 
*trans,
btrfs_set_token_inode_nbytes(leaf, item, inode_get_bytes(inode),
 );
 
-   btrfs_set_token_inode_sequence(leaf, item, inode->i_version, );
+   btrfs_set_token_inode_sequence(leaf, item,
+  inode_peek_iversion(inode), );
btrfs_set_token_inode_transid(leaf, item, trans->transid, );
btrfs_set_token_inode_rdev(leaf, item, inode->i_rdev, );
btrfs_set_token_inode_flags(leaf, item, BTRFS_I(inode)->

[PATCH v3 08/19] ext2: convert to new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/ext2/dir.c   | 9 +
 fs/ext2/super.c | 5 +++--
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c
index 987647986f47..4111085a129f 100644
--- a/fs/ext2/dir.c
+++ b/fs/ext2/dir.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 typedef struct ext2_dir_entry_2 ext2_dirent;
 
@@ -92,7 +93,7 @@ static int ext2_commit_chunk(struct page *page, loff_t pos, 
unsigned len)
struct inode *dir = mapping->host;
int err = 0;
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
block_write_end(NULL, mapping, pos, len, len, page, NULL);
 
if (pos+len > dir->i_size) {
@@ -293,7 +294,7 @@ ext2_readdir(struct file *file, struct dir_context *ctx)
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(ext2_chunk_size(inode)-1);
unsigned char *types = NULL;
-   int need_revalidate = file->f_version != inode->i_version;
+   bool need_revalidate = inode_cmp_iversion(inode, file->f_version);
 
if (pos > inode->i_size - EXT2_DIR_REC_LEN(1))
return 0;
@@ -319,8 +320,8 @@ ext2_readdir(struct file *file, struct dir_context *ctx)
offset = ext2_validate_entry(kaddr, offset, 
chunk_mask);
ctx->pos = (n<<PAGE_SHIFT) + offset;
}
-   file->f_version = inode->i_version;
-   need_revalidate = 0;
+   file->f_version = inode_query_iversion(inode);
+   need_revalidate = false;
}
de = (ext2_dirent *)(kaddr+offset);
limit = kaddr + ext2_last_byte(inode, n) - EXT2_DIR_REC_LEN(1);
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 7646818ab266..554c98b8a93a 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ext2.h"
 #include "xattr.h"
 #include "acl.h"
@@ -184,7 +185,7 @@ static struct inode *ext2_alloc_inode(struct super_block 
*sb)
if (!ei)
return NULL;
ei->i_block_alloc_info = NULL;
-   ei->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
 #ifdef CONFIG_QUOTA
memset(>i_dquot, 0, sizeof(ei->i_dquot));
 #endif
@@ -1569,7 +1570,7 @@ static ssize_t ext2_quota_write(struct super_block *sb, 
int type,
return err;
if (inode->i_size < off+len-towrite)
i_size_write(inode, off+len-towrite);
-   inode->i_version++;
+   inode_inc_iversion(inode);
inode->i_mtime = inode->i_ctime = current_time(inode);
mark_inode_dirty(inode);
return len - towrite;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 10/19] nfs: convert to new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

For NFS, we just use the "raw" API since the i_version is mostly
managed by the server. The exception there is when the client
holds a write delegation, but we only need to bump it once
there anyway to handle CB_GETATTR.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/nfs/delegation.c|  3 ++-
 fs/nfs/fscache-index.c |  5 +++--
 fs/nfs/inode.c | 18 +-
 fs/nfs/nfs4proc.c  | 10 ++
 fs/nfs/nfstrace.h  |  5 +++--
 fs/nfs/write.c |  8 +++-
 6 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index ade44ca0c66c..d8b47624fee2 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -347,7 +348,7 @@ int nfs_inode_set_delegation(struct inode *inode, struct 
rpc_cred *cred, struct
nfs4_stateid_copy(>stateid, >delegation);
delegation->type = res->delegation_type;
delegation->pagemod_limit = res->pagemod_limit;
-   delegation->change_attr = inode->i_version;
+   delegation->change_attr = inode_peek_iversion_raw(inode);
delegation->cred = get_rpccred(cred);
delegation->inode = inode;
delegation->flags = 1<<NFS_DELEGATION_REFERENCED;
diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index 3025fe8584a0..0ee4b93d36ea 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 #include "fscache.h"
@@ -211,7 +212,7 @@ static uint16_t nfs_fscache_inode_get_aux(const void 
*cookie_netfs_data,
auxdata.ctime = nfsi->vfs_inode.i_ctime;
 
if (NFS_SERVER(>vfs_inode)->nfs_client->rpc_ops->version == 4)
-   auxdata.change_attr = nfsi->vfs_inode.i_version;
+   auxdata.change_attr = inode_peek_iversion_raw(>vfs_inode);
 
if (bufmax > sizeof(auxdata))
bufmax = sizeof(auxdata);
@@ -243,7 +244,7 @@ enum fscache_checkaux nfs_fscache_inode_check_aux(void 
*cookie_netfs_data,
auxdata.ctime = nfsi->vfs_inode.i_ctime;
 
if (NFS_SERVER(>vfs_inode)->nfs_client->rpc_ops->version == 4)
-   auxdata.change_attr = nfsi->vfs_inode.i_version;
+   auxdata.change_attr = inode_peek_iversion_raw(>vfs_inode);
 
if (memcmp(data, , datalen) != 0)
return FSCACHE_CHECKAUX_OBSOLETE;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index b992d2382ffa..0b85cca1184b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -38,8 +38,8 @@
 #include 
 #include 
 #include 
-
 #include 
+#include 
 
 #include "nfs4_fs.h"
 #include "callback.h"
@@ -483,7 +483,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct 
nfs_fattr *fattr, st
memset(>i_atime, 0, sizeof(inode->i_atime));
memset(>i_mtime, 0, sizeof(inode->i_mtime));
memset(>i_ctime, 0, sizeof(inode->i_ctime));
-   inode->i_version = 0;
+   inode_set_iversion_raw(inode, 0);
inode->i_size = 0;
clear_nlink(inode);
inode->i_uid = make_kuid(_user_ns, -2);
@@ -508,7 +508,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct 
nfs_fattr *fattr, st
else if (nfs_server_capable(inode, NFS_CAP_CTIME))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
if (fattr->valid & NFS_ATTR_FATTR_CHANGE)
-   inode->i_version = fattr->change_attr;
+   inode_set_iversion_raw(inode, fattr->change_attr);
else
nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
| NFS_INO_REVAL_PAGECACHE);
@@ -1289,8 +1289,8 @@ static unsigned long nfs_wcc_update_inode(struct inode 
*inode, struct nfs_fattr
 
if ((fattr->valid & NFS_ATTR_FATTR_PRECHANGE)
&& (fattr->valid & NFS_ATTR_FATTR_CHANGE)
-   && inode->i_version == fattr->pre_change_attr) {
-   inode->i_version = fattr->change_attr;
+   && !inode_cmp_iversion(inode, fattr->pre_change_attr)) {
+   inode_set_iversion_raw(inode, fattr->change_attr);
if (S_ISDIR(inode->i_mode))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_DATA);
ret |= NFS_INO_INVALID_ATTR;
@@ -1348,7 +1348,7 @@ static int nfs_check_inode_attributes(struct inode 
*inode, struct nfs_fattr *fat
 
if (!nfs_file_has_buffered_writers(nfsi)) {
/* Verify a few of the more important attributes */
-  

[PATCH v3 14/19] xfs: convert to new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/xfs/libxfs/xfs_inode_buf.c | 7 +--
 fs/xfs/xfs_icache.c   | 5 +++--
 fs/xfs/xfs_inode.c| 3 ++-
 fs/xfs/xfs_inode_item.c   | 3 ++-
 fs/xfs/xfs_trans_inode.c  | 4 +++-
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 6b7989038d75..b9c0bf80669c 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -32,6 +32,8 @@
 #include "xfs_ialloc.h"
 #include "xfs_dir2.h"
 
+#include 
+
 /*
  * Check that none of the inode's in the buffer have a next
  * unlinked field of 0.
@@ -264,7 +266,8 @@ xfs_inode_from_disk(
to->di_flags= be16_to_cpu(from->di_flags);
 
if (to->di_version == 3) {
-   inode->i_version = be64_to_cpu(from->di_changecount);
+   inode_set_iversion_queried(inode,
+  be64_to_cpu(from->di_changecount));
to->di_crtime.t_sec = be32_to_cpu(from->di_crtime.t_sec);
to->di_crtime.t_nsec = be32_to_cpu(from->di_crtime.t_nsec);
to->di_flags2 = be64_to_cpu(from->di_flags2);
@@ -314,7 +317,7 @@ xfs_inode_to_disk(
to->di_flags = cpu_to_be16(from->di_flags);
 
if (from->di_version == 3) {
-   to->di_changecount = cpu_to_be64(inode->i_version);
+   to->di_changecount = cpu_to_be64(inode_peek_iversion(inode));
to->di_crtime.t_sec = cpu_to_be32(from->di_crtime.t_sec);
to->di_crtime.t_nsec = cpu_to_be32(from->di_crtime.t_nsec);
to->di_flags2 = cpu_to_be64(from->di_flags2);
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 43005fbe8b1e..4c315adb05e6 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -37,6 +37,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * Allocate and initialise an xfs_inode.
@@ -293,14 +294,14 @@ xfs_reinit_inode(
int error;
uint32_tnlink = inode->i_nlink;
uint32_tgeneration = inode->i_generation;
-   uint64_tversion = inode->i_version;
+   uint64_tversion = inode_peek_iversion(inode);
umode_t mode = inode->i_mode;
 
error = inode_init_always(mp->m_super, inode);
 
set_nlink(inode, nlink);
inode->i_generation = generation;
-   inode->i_version = version;
+   inode_set_iversion_queried(inode, version);
inode->i_mode = mode;
return error;
 }
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 801274126648..dfc5e60d8af3 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -16,6 +16,7 @@
  * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
  */
 #include 
+#include 
 
 #include "xfs.h"
 #include "xfs_fs.h"
@@ -833,7 +834,7 @@ xfs_ialloc(
ip->i_d.di_flags = 0;
 
if (ip->i_d.di_version == 3) {
-   inode->i_version = 1;
+   inode_set_iversion(inode, 1);
ip->i_d.di_flags2 = 0;
ip->i_d.di_cowextsize = 0;
ip->i_d.di_crtime.t_sec = (int32_t)tv.tv_sec;
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 6ee5c3bf19ad..7571abf5dfb3 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -30,6 +30,7 @@
 #include "xfs_buf_item.h"
 #include "xfs_log.h"
 
+#include 
 
 kmem_zone_t*xfs_ili_zone;  /* inode log item zone */
 
@@ -354,7 +355,7 @@ xfs_inode_to_log_dinode(
to->di_next_unlinked = NULLAGINO;
 
if (from->di_version == 3) {
-   to->di_changecount = inode->i_version;
+   to->di_changecount = inode_peek_iversion(inode);
to->di_crtime.t_sec = from->di_crtime.t_sec;
to->di_crtime.t_nsec = from->di_crtime.t_nsec;
to->di_flags2 = from->di_flags2;
diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index daa7615497f9..225544327c4f 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -28,6 +28,8 @@
 #include "xfs_inode_item.h"
 #include "xfs_trace.h"
 
+#include 
+
 /*
  * Add a locked inode to the transaction.
  *
@@ -117,7 +119,7 @@ xfs_trans_log_inode(
 */
if (!(ip->i_itemp->ili_item.li_desc->lid_flags & XFS_LID_DIRTY) &&
IS_I_VERSION(VFS_I(ip))) {
-   VFS_I(ip)->i_version++;
+   inode_inc_iversion(VFS_I(ip));
flags |= XFS_ILOG_CORE;
}
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 09/19] ext4: convert to new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Acked-by: Theodore Ts'o <ty...@mit.edu>
---
 fs/ext4/dir.c|  9 +
 fs/ext4/inline.c |  7 ---
 fs/ext4/inode.c  | 13 +
 fs/ext4/ioctl.c  |  3 ++-
 fs/ext4/namei.c  |  5 +++--
 fs/ext4/super.c  |  3 ++-
 fs/ext4/xattr.c  |  5 +++--
 7 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index d5babc9f222b..afda0a0499ce 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "ext4.h"
 #include "xattr.h"
 
@@ -208,7 +209,7 @@ static int ext4_readdir(struct file *file, struct 
dir_context *ctx)
 * readdir(2), then we might be pointing to an invalid
 * dirent right now.  Scan from the start of the block
 * to make sure. */
-   if (file->f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, file->f_version)) {
for (i = 0; i < sb->s_blocksize && i < offset; ) {
de = (struct ext4_dir_entry_2 *)
(bh->b_data + i);
@@ -227,7 +228,7 @@ static int ext4_readdir(struct file *file, struct 
dir_context *ctx)
offset = i;
ctx->pos = (ctx->pos & ~(sb->s_blocksize - 1))
| offset;
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
}
 
while (ctx->pos < inode->i_size
@@ -568,10 +569,10 @@ static int ext4_dx_readdir(struct file *file, struct 
dir_context *ctx)
 * cached entries.
 */
if ((!info->curr_node) ||
-   (file->f_version != inode->i_version)) {
+   inode_cmp_iversion(inode, file->f_version)) {
info->curr_node = NULL;
free_rb_tree_fname(>root);
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
ret = ext4_htree_fill_tree(file, info->curr_hash,
   info->curr_minor_hash,
   >next_hash);
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 1367553c43bb..a8b987b71173 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -14,6 +14,7 @@
 
 #include 
 #include 
+#include 
 
 #include "ext4_jbd2.h"
 #include "ext4.h"
@@ -1042,7 +1043,7 @@ static int ext4_add_dirent_to_inline(handle_t *handle,
 */
dir->i_mtime = dir->i_ctime = current_time(dir);
ext4_update_dx_flag(dir);
-   dir->i_version++;
+   inode_inc_iversion(dir);
return 1;
 }
 
@@ -1494,7 +1495,7 @@ int ext4_read_inline_dir(struct file *file,
 * dirent right now.  Scan from the start of the inline
 * dir to make sure.
 */
-   if (file->f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, file->f_version)) {
for (i = 0; i < extra_size && i < offset;) {
/*
 * "." is with offset 0 and
@@ -1526,7 +1527,7 @@ int ext4_read_inline_dir(struct file *file,
}
offset = i;
ctx->pos = offset;
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
}
 
while (ctx->pos < extra_size) {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7df2c5644e59..1b0d54b372f2 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ext4_jbd2.h"
 #include "xattr.h"
@@ -4873,12 +4874,14 @@ struct inode *ext4_iget(struct super_block *sb, 
unsigned long ino)
EXT4_EINODE_GET_XTIME(i_crtime, ei, raw_inode);
 
if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
-   inode->i_version = le32_to_cpu(raw_inode->i_disk_version);
+   u64 ivers = le32_to_cpu(raw_inode->i_disk_version);
+
if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
-   inode->i_version |=
+   ivers |=
(__u64)(le32_to_cpu(raw_inode->i_version_hi)) << 32;
}
+   inode_set_iversion_queried(inode, ivers);
}
 
ret = 0;
@@ -5164,11 +5167,13 @@ static int ext4_do_update_inode(handle_t *hand

[PATCH v3 11/19] nfsd: convert to new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Mostly just making sure we use the "get" wrappers so we know when
it is being fetched for later use.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/nfsd/nfsfh.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 43f31cf49bae..b8444189223b 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static inline __u32 ino_t_to_u32(ino_t ino)
 {
@@ -259,7 +260,7 @@ static inline u64 nfsd4_change_attribute(struct inode 
*inode)
chattr =  inode->i_ctime.tv_sec;
chattr <<= 30;
chattr += inode->i_ctime.tv_nsec;
-   chattr += inode->i_version;
+   chattr += inode_query_iversion(inode);
return chattr;
 }
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 13/19] ufs: use new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/ufs/dir.c   | 9 +
 fs/ufs/inode.c | 3 ++-
 fs/ufs/super.c | 3 ++-
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/ufs/dir.c b/fs/ufs/dir.c
index 2edc1755b7c5..50dfce000864 100644
--- a/fs/ufs/dir.c
+++ b/fs/ufs/dir.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ufs_fs.h"
 #include "ufs.h"
@@ -47,7 +48,7 @@ static int ufs_commit_chunk(struct page *page, loff_t pos, 
unsigned len)
struct inode *dir = mapping->host;
int err = 0;
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
block_write_end(NULL, mapping, pos, len, len, page, NULL);
if (pos+len > dir->i_size) {
i_size_write(dir, pos+len);
@@ -428,7 +429,7 @@ ufs_readdir(struct file *file, struct dir_context *ctx)
unsigned long n = pos >> PAGE_SHIFT;
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(UFS_SB(sb)->s_uspi->s_dirblksize - 1);
-   int need_revalidate = file->f_version != inode->i_version;
+   bool need_revalidate = inode_cmp_iversion(inode, file->f_version);
unsigned flags = UFS_SB(sb)->s_flags;
 
UFSD("BEGIN\n");
@@ -455,8 +456,8 @@ ufs_readdir(struct file *file, struct dir_context *ctx)
offset = ufs_validate_entry(sb, kaddr, offset, 
chunk_mask);
ctx->pos = (n<<PAGE_SHIFT) + offset;
}
-   file->f_version = inode->i_version;
-   need_revalidate = 0;
+   file->f_version = inode_query_iversion(inode);
+   need_revalidate = false;
}
de = (struct ufs_dir_entry *)(kaddr+offset);
limit = kaddr + ufs_last_byte(inode, n) - UFS_DIR_REC_LEN(1);
diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index afb601c0dda0..c843ec858cf7 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ufs_fs.h"
 #include "ufs.h"
@@ -693,7 +694,7 @@ struct inode *ufs_iget(struct super_block *sb, unsigned 
long ino)
if (err)
goto bad_inode;
 
-   inode->i_version++;
+   inode_inc_iversion(inode);
ufsi->i_lastfrag =
(inode->i_size + uspi->s_fsize - 1) >> uspi->s_fshift;
ufsi->i_dir_start_lookup = 0;
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 4d497e9c6883..b6ba80e05bff 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -88,6 +88,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ufs_fs.h"
 #include "ufs.h"
@@ -1440,7 +1441,7 @@ static struct inode *ufs_alloc_inode(struct super_block 
*sb)
if (!ei)
return NULL;
 
-   ei->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
seqlock_init(>meta_lock);
mutex_init(>truncate_mutex);
return >vfs_inode;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 15/19] IMA: switch IMA over to new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 security/integrity/ima/ima_api.c  | 3 ++-
 security/integrity/ima/ima_main.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index c7e8db0ea4c0..c6ae42266270 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ima.h"
 
@@ -215,7 +216,7 @@ int ima_collect_measurement(struct integrity_iint_cache 
*iint,
 * which do not support i_version, support is limited to an initial
 * measurement/appraisal/audit.
 */
-   i_version = file_inode(file)->i_version;
+   i_version = inode_query_iversion(inode);
hash.hdr.algo = algo;
 
/* Initialize hash digest to 0's in case of failure */
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 50b8254d..06a70c5a2329 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ima.h"
 
@@ -128,7 +129,7 @@ static void ima_check_last_writer(struct 
integrity_iint_cache *iint,
inode_lock(inode);
if (atomic_read(>i_writecount) == 1) {
if (!IS_I_VERSION(inode) ||
-   (iint->version != inode->i_version) ||
+   inode_cmp_iversion(inode, iint->version) ||
(iint->flags & IMA_NEW_FILE)) {
iint->flags &= ~(IMA_DONE_MASK | IMA_NEW_FILE);
iint->measured_pcrs = 0;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 12/19] ocfs2: convert to new i_version API

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Reviewed-by: Jan Kara <j...@suse.cz>
---
 fs/ocfs2/dir.c  | 15 ---
 fs/ocfs2/inode.c|  3 ++-
 fs/ocfs2/namei.c|  3 ++-
 fs/ocfs2/quota_global.c |  3 ++-
 4 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index febe6312ceff..32f9c72dff17 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -1174,7 +1175,7 @@ static int __ocfs2_delete_entry(handle_t *handle, struct 
inode *dir,
le16_add_cpu(>rec_len,
le16_to_cpu(de->rec_len));
de->inode = 0;
-   dir->i_version++;
+   inode_inc_iversion(dir);
ocfs2_journal_dirty(handle, bh);
goto bail;
}
@@ -1729,7 +1730,7 @@ int __ocfs2_add_entry(handle_t *handle,
if (ocfs2_dir_indexed(dir))
ocfs2_recalc_free_list(dir, handle, lookup);
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
ocfs2_journal_dirty(handle, insert_bh);
retval = 0;
goto bail;
@@ -1775,7 +1776,7 @@ static int ocfs2_dir_foreach_blk_id(struct inode *inode,
 * readdir(2), then we might be pointing to an invalid
 * dirent right now.  Scan from the start of the block
 * to make sure. */
-   if (*f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, *f_version)) {
for (i = 0; i < i_size_read(inode) && i < offset; ) {
de = (struct ocfs2_dir_entry *)
(data->id_data + i);
@@ -1791,7 +1792,7 @@ static int ocfs2_dir_foreach_blk_id(struct inode *inode,
i += le16_to_cpu(de->rec_len);
}
ctx->pos = offset = i;
-   *f_version = inode->i_version;
+   *f_version = inode_query_iversion(inode);
}
 
de = (struct ocfs2_dir_entry *) (data->id_data + ctx->pos);
@@ -1869,7 +1870,7 @@ static int ocfs2_dir_foreach_blk_el(struct inode *inode,
 * readdir(2), then we might be pointing to an invalid
 * dirent right now.  Scan from the start of the block
 * to make sure. */
-   if (*f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, *f_version)) {
for (i = 0; i < sb->s_blocksize && i < offset; ) {
de = (struct ocfs2_dir_entry *) (bh->b_data + 
i);
/* It's too expensive to do a full
@@ -1886,7 +1887,7 @@ static int ocfs2_dir_foreach_blk_el(struct inode *inode,
offset = i;
ctx->pos = (ctx->pos & ~(sb->s_blocksize - 1))
| offset;
-   *f_version = inode->i_version;
+   *f_version = inode_query_iversion(inode);
}
 
while (ctx->pos < i_size_read(inode)
@@ -1940,7 +1941,7 @@ static int ocfs2_dir_foreach_blk(struct inode *inode, u64 
*f_version,
  */
 int ocfs2_dir_foreach(struct inode *inode, struct dir_context *ctx)
 {
-   u64 version = inode->i_version;
+   u64 version = inode_query_iversion(inode);
ocfs2_dir_foreach_blk(inode, , ctx, true);
return 0;
 }
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 1a1e0078ab38..d51b80edd972 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -302,7 +303,7 @@ void ocfs2_populate_inode(struct inode *inode, struct 
ocfs2_dinode *fe,
OCFS2_I(inode)->ip_attr = le32_to_cpu(fe->i_attr);
OCFS2_I(inode)->ip_dyn_features = le16_to_cpu(fe->i_dyn_features);
 
-   inode->i_version = 1;
+   inode_set_iversion(inode, 1);
inode->i_generation = le32_to_cpu(fe->i_generation);
inode->i_rdev = huge_decode_dev(le64_to_cpu(fe->id1.dev1.i_rdev));
inode->i_mode = le16_to_cpu(fe->i_mode);
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 3b0a10d9b36f..c801eddc4bf3 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -1520,7 +1521,7 @@ static int ocfs2_rename(struct inode *old_dir,
mlog_errno(status);
goto bail;

[PATCH v3 17/19] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

If XFS_ILOG_CORE is already set then go ahead and increment it.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/xfs/xfs_trans_inode.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index 225544327c4f..4a89da4b6fe7 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -112,15 +112,17 @@ xfs_trans_log_inode(
 
/*
 * First time we log the inode in a transaction, bump the inode change
-* counter if it is configured for this to occur. We don't use
-* inode_inc_version() because there is no need for extra locking around
-* i_version as we already hold the inode locked exclusively for
-* metadata modification.
+* counter if it is configured for this to occur. While we have the
+* inode locked exclusively for metadata modification, we can usually
+* avoid setting XFS_ILOG_CORE if no one has queried the value since
+* the last time it was incremented. If we have XFS_ILOG_CORE already
+* set however, then go ahead and bump the i_version counter
+* unconditionally.
 */
if (!(ip->i_itemp->ili_item.li_desc->lid_flags & XFS_LID_DIRTY) &&
IS_I_VERSION(VFS_I(ip))) {
-   inode_inc_iversion(VFS_I(ip));
-   flags |= XFS_ILOG_CORE;
+   if (inode_maybe_inc_iversion(VFS_I(ip), flags & XFS_ILOG_CORE))
+   flags |= XFS_ILOG_CORE;
}
 
tp->t_flags |= XFS_TRANS_DIRTY;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 16/19] fs: only set S_VERSION when updating times if necessary

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

We only really need to update i_version if someone has queried for it
since we last incremented it. By doing that, we can avoid having to
update the inode if the times haven't changed.

If the times have changed, then we go ahead and forcibly increment the
counter, under the assumption that we'll be going to the storage
anyway, and the increment itself is relatively cheap.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/inode.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 03102d6ef044..83f6cfc3cde7 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 /*
@@ -1634,17 +1635,24 @@ static int relatime_need_update(const struct path 
*path, struct inode *inode,
 int generic_update_time(struct inode *inode, struct timespec *time, int flags)
 {
int iflags = I_DIRTY_TIME;
+   bool dirty = false;
 
-   if (flags & S_ATIME)
+   if (flags & S_ATIME) {
inode->i_atime = *time;
+   dirty |= !(inode->i_sb->s_flags & SB_LAZYTIME);
+   }
if (flags & S_VERSION)
-   inode_inc_iversion(inode);
-   if (flags & S_CTIME)
+   dirty |= inode_maybe_inc_iversion(inode, dirty);
+   if (flags & S_CTIME) {
inode->i_ctime = *time;
-   if (flags & S_MTIME)
+   dirty = true;
+   }
+   if (flags & S_MTIME) {
inode->i_mtime = *time;
+   dirty = true;
+   }
 
-   if (!(inode->i_sb->s_flags & SB_LAZYTIME) || (flags & S_VERSION))
+   if (dirty)
iflags |= I_DIRTY_SYNC;
__mark_inode_dirty(inode, iflags);
return 0;
@@ -1863,7 +1871,7 @@ int file_update_time(struct file *file)
if (!timespec_equal(>i_ctime, ))
sync_it |= S_CTIME;
 
-   if (IS_I_VERSION(inode))
+   if (IS_I_VERSION(inode) && inode_iversion_need_inc(inode))
sync_it |= S_VERSION;
 
if (!sync_it)
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 18/19] btrfs: only dirty the inode in btrfs_update_time if something was changed

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

At this point, we know that "now" and the file times may differ, and we
suspect that the i_version has been flagged to be bumped. Attempt to
bump the i_version, and only mark the inode dirty if that actually
occurred or if one of the times was updated.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/btrfs/inode.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index ac8692849a81..76245323a7c8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6107,19 +6107,20 @@ static int btrfs_update_time(struct inode *inode, 
struct timespec *now,
 int flags)
 {
struct btrfs_root *root = BTRFS_I(inode)->root;
+   bool dirty = flags & ~S_VERSION;
 
if (btrfs_root_readonly(root))
return -EROFS;
 
if (flags & S_VERSION)
-   inode_inc_iversion(inode);
+   dirty |= inode_maybe_inc_iversion(inode, dirty);
if (flags & S_CTIME)
inode->i_ctime = *now;
if (flags & S_MTIME)
inode->i_mtime = *now;
if (flags & S_ATIME)
inode->i_atime = *now;
-   return btrfs_dirty_inode(inode);
+   return dirty ? btrfs_dirty_inode(inode) : 0;
 }
 
 /*
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 19/19] fs: handle inode->i_version more efficiently

2017-12-18 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Since i_version is mostly treated as an opaque value, we can exploit that
fact to avoid incrementing it when no one is watching. With that change,
we can avoid incrementing the counter on writes, unless someone has
queried for it since it was last incremented. If the a/c/mtime don't
change, and the i_version hasn't changed, then there's no need to dirty
the inode metadata on a write.

Convert the i_version counter to an atomic64_t, and use the lowest order
bit to hold a flag that will tell whether anyone has queried the value
since it was last incremented.

When we go to maybe increment it, we fetch the value and check the flag
bit.  If it's clear then we don't need to do anything if the update
isn't being forced.

If we do need to update, then we increment the counter by 2, and clear
the flag bit, and then use a CAS op to swap it into place. If that
works, we return true. If it doesn't then do it again with the value
that we fetch from the CAS operation.

On the query side, if the flag is already set, then we just shift the
value down by 1 bit and return it. Otherwise, we set the flag in our
on-stack value and again use cmpxchg to swap it into place if it hasn't
changed. If it has, then we use the value from the cmpxchg as the new
"old" value and try again.

This method allows us to avoid incrementing the counter on writes (and
dirtying the metadata) under typical workloads. We only need to increment
if it has been queried since it was last changed.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 include/linux/fs.h   |   2 +-
 include/linux/iversion.h | 180 ++-
 2 files changed, 131 insertions(+), 51 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 76382c24e9d0..6804d075933e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -639,7 +639,7 @@ struct inode {
struct hlist_head   i_dentry;
struct rcu_head i_rcu;
};
-   u64 i_version;
+   atomic64_t  i_version;
atomic_ti_count;
atomic_ti_dio_count;
atomic_ti_writecount;
diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index e08c634779df..a9fbf99709df 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -5,6 +5,8 @@
 #include 
 
 /*
+ * The inode->i_version field:
+ * ---
  * The change attribute (i_version) is mandated by NFSv4 and is mostly for
  * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
  * appear different to observers if there was a change to the inode's data or
@@ -27,86 +29,143 @@
  * i_version on namespace changes in directories (mkdir, rmdir, unlink, etc.).
  * We consider these sorts of filesystems to have a kernel-managed i_version.
  *
+ * This implementation uses the low bit in the i_version field as a flag to
+ * track when the value has been queried. If it has not been queried since it
+ * was last incremented, we can skip the increment in most cases.
+ *
+ * In the event that we're updating the ctime, we will usually go ahead and
+ * bump the i_version anyway. Since that has to go to stable storage in some
+ * fashion, we might as well increment it as well.
+ *
+ * With this implementation, the value should always appear to observers to
+ * increase over time if the file has changed. It's recommended to use
+ * inode_cmp_iversion() helper to compare values.
+ *
  * Note that some filesystems (e.g. NFS and AFS) just use the field to store
- * a server-provided value (for the most part). For that reason, those
+ * a server-provided value for the most part. For that reason, those
  * filesystems do not set SB_I_VERSION. These filesystems are considered to
  * have a self-managed i_version.
+ *
+ * Persistently storing the i_version
+ * --
+ * Queries of the i_version field are not gated on them hitting the backing
+ * store. It's always possible that the host could crash after allowing
+ * a query of the value but before it has made it to disk.
+ *
+ * To mitigate this problem, filesystems should always use
+ * inode_set_iversion_queried when loading an existing inode from disk. This
+ * ensures that the next attempted inode increment will result in the value
+ * changing.
+ *
+ * Storing the value to disk therefore does not count as a query, so those
+ * filesystems should use inode_peek_iversion to grab the value to be stored.
+ * There is no need to flag the value as having been queried in that case.
  */
 
+/*
+ * We borrow the lowest bit in the i_version to use as a flag to tell whether
+ * it has been queried since we last incremented it. If it has, then we must
+ * increment it on the next change. After that, we can clear the flag and
+ * avoid incrementing it again until it has again be

Re: [PATCH v2 01/19] fs: new API for handling inode->i_version

2017-12-16 Thread Jeff Layton
On Sun, 2017-12-17 at 09:37 +1100, Dave Chinner wrote:
> On Sat, Dec 16, 2017 at 08:46:38AM -0500, Jeff Layton wrote:
> > From: Jeff Layton <jlay...@redhat.com>
> > 
> > Add a documentation blob that explains what the i_version field is, how
> > it is expected to work, and how it is currently implemented by various
> > filesystems.
> > 
> > We already have inode_inc_iversion. Add several other functions for
> > manipulating and accessing the i_version counter. For now, the
> > implementation is trivial and basically works the way that all of the
> > open-coded i_version accesses work today.
> > 
> > Future patches will convert existing users of i_version to use the new
> > API, and then convert the backend implementation to do things more
> > efficiently.
> > 
> > Signed-off-by: Jeff Layton <jlay...@redhat.com>
> > ---
> >  include/linux/fs.h | 200 
> > ++---
> >  1 file changed, 192 insertions(+), 8 deletions(-)
> 
> Just a random sunday morning coffee musing
> 
> I was just wondering if it would be better to split this stuff out
> into it's own header file now? include/linux/fs.h is aleady a
> massive header file (~3500 lines) and changes cause tree-wide
> rebuilds, so maybe it would be better to split relatively isolated
> functionality like this out while it's being reworked and you're
> already touching every file that uses it?
> 
> Cheers,
> 
> Dave.

That's a good idea. Let me do that and I'll re-post.

Thanks,
-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 05/19] afs: convert to new i_version API

2017-12-16 Thread Jeff Layton
On Sat, 2017-12-16 at 11:18 -0500, Jeffrey Altman wrote:
> Hi Jeff,
> 
> A few thoughts on AFS usage below which might impact a future revision
> of the API.  I hope they are useful.
> 
> On 12/16/2017 8:49 AM, Jeff Layton wrote:
> > On Sat, 2017-12-16 at 08:46 -0500, Jeff Layton wrote:
> > > From: Jeff Layton <jlay...@redhat.com>
> > > 
> > > For AFS, it's generally treated as an opaque value, so we use the
> > > *_raw variants of the API here.
> > > 
> > > Note that AFS has quite a different definition for this counter. AFS
> > > only increments it on changes to the data, not for the metadata. We'll
> > > need to reconcile that somehow if we ever want to present this to
> > > userspace via statx.
> > > 
> 
> From the patch series notes:
> 
> "The inode->i_version field is supposed to be a value that changes
> whenever there is any data or metadata change to the inode. Some
> filesystems use it internally to detect directory changes during
> readdir. knfsd will use it if the filesystem has MS_I_VERSION set. IMA
> will also use it to optimize away some remeasurement if it's available.
> NFS and AFS just use it to store an opaque change attribute from the
> server.
> 
> "Only btrfs, ext4, and xfs increment it for data changes. Because of
> this, these filesystems must log the inode to disk whenever the
> i_version counter changes. That has a non-zero performance impact,
> especially on write-heavy workloads, because we end up dirtying the
> inode metadata on every write, not just when the times change. [1]"
> 
> 
> The AFS/AuriStorFS data version is an unsigned 64-bit value that is
> incremented by the file server as part of a data changing operation. For
> files, a StoreData and for directories entry manipulations such as
> create, rename, delete.  This data version is used to tag the version of
> any subset of the data stream for caching and replication purposes.
> 
> As Jeff notes, the AFS data version is not incremented for metadata
> changes.  Metadata cannot be trusted by clients without acquiring a
> callback promise from a fileserver.  The callback promise will either be
> satisfied by the issuing fileserver sending a CallBack notification that
> the metadata is no longer valid OR the callback promise will expire.
> 
> Something else that is important to note that it is assumed that local
> data changes that occur under a valid callback promise is newer than the
> data on the fileserver.  It might be useful if the new i_version API
> supported major and minor version numbers.  AFS implementations would
> store the fileserver provided data version number as the major version
> and would increment the minor version when local changes have been made
> which have yet to be stored back to the fileserver.  This functionality
> would be especially useful if disconnected operations were implemented
> for the AFS implementation.
> 
> It might also be useful to separate metadata version and data version
> although some filesystems would set the same value to both.  For AFS,
> the metadata major version would the timestamp at which the callback was
> issued.
> 
> Jeffrey Altman

Thanks. That seems like rather specialized use case.

If we did want to go that route, we'd probably need to give filesystems
a way to overload how i_version is handled and queried (maybe some new
inode ops?).

Given that nothing ever looks at the the i_version in kAFS now, I don't
have a lot of incentive to do anything along those lines in this set. I
think this patchset will probably make it simpler to do something like
that in the future, if you were motivated to do so though.
-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 01/19] fs: new API for handling inode->i_version

2017-12-16 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Add a documentation blob that explains what the i_version field is, how
it is expected to work, and how it is currently implemented by various
filesystems.

We already have inode_inc_iversion. Add several other functions for
manipulating and accessing the i_version counter. For now, the
implementation is trivial and basically works the way that all of the
open-coded i_version accesses work today.

Future patches will convert existing users of i_version to use the new
API, and then convert the backend implementation to do things more
efficiently.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 include/linux/fs.h | 200 ++---
 1 file changed, 192 insertions(+), 8 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 511fbaabf624..5001e77342fd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2036,19 +2036,203 @@ static inline void inode_dec_link_count(struct inode 
*inode)
mark_inode_dirty(inode);
 }
 
+/*
+ * The change attribute (i_version) is mandated by NFSv4 and is mostly for
+ * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
+ * appear different to observers if there was a change to the inode's data or
+ * metadata since it was last queried.
+ *
+ * It should be considered an opaque value by observers. If it remains the same
+ * since it was last checked, then nothing has changed in the inode. If it's
+ * different then something has changed. Observers cannot infer anything about
+ * the nature or magnitude of the changes from the value, only that the inode
+ * has changed in some fashion.
+ *
+ * Not all filesystems properly implement the i_version counter. Subsystems 
that
+ * want to use i_version field on an inode should first check whether the
+ * filesystem sets the SB_I_VERSION flag (usually via the IS_I_VERSION macro).
+ *
+ * Those that set SB_I_VERSION will automatically have their i_version counter
+ * incremented on writes to normal files. If the SB_I_VERSION is not set, then
+ * the VFS will not touch it on writes, and the filesystem can use it how it
+ * wishes. Note that the filesystem is always responsible for updating the
+ * i_version on namespace changes in directories (mkdir, rmdir, unlink, etc.).
+ * We consider these sorts of filesystems to have a kernel-managed i_version.
+ *
+ * Note that some filesystems (e.g. NFS and AFS) just use the field to store
+ * a server-provided value (for the most part). For that reason, those
+ * filesystems do not set SB_I_VERSION. These filesystems are considered to
+ * have a self-managed i_version.
+ */
+
+/**
+ * inode_set_iversion_raw - set i_version to the specified raw value
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * Set @inode's i_version field to @new. This function is for use by
+ * filesystems that self-manage the i_version.
+ *
+ * For example, the NFS client stores its NFSv4 change attribute in this way,
+ * and the AFS client stores the data_version from the server here.
+ */
+static inline void
+inode_set_iversion_raw(struct inode *inode, const u64 new)
+{
+   inode->i_version = new;
+}
+
+/**
+ * inode_set_iversion - set i_version to a particular value
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * Set @inode's i_version field to @new. This function is for filesystems with
+ * a kernel-managed i_version.
+ *
+ * For now, this just does the same thing as the _raw variant.
+ */
+static inline void
+inode_set_iversion(struct inode *inode, const u64 new)
+{
+   inode_set_iversion_raw(inode, new);
+}
+
+/**
+ * inode_set_iversion_queried - set i_version to a particular value and set
+ *  flag to indicate that it has been viewed
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * When loading in an i_version value from a backing store, we typically don't
+ * know whether it was previously viewed before being stored or not. Thus, we
+ * must assume that it was, to ensure that any changes will result in the
+ * value changing.
+ *
+ * This function will set the inode's i_version, and possibly flag the value
+ * as if it has already been viewed at least once.
+ *
+ * For now, this just does what inode_set_iversion does.
+ */
+static inline void
+inode_set_iversion_queried(struct inode *inode, const u64 new)
+{
+   inode_set_iversion(inode, new);
+}
+
+/**
+ * inode_maybe_inc_iversion - increments i_version
+ * @inode: inode with the i_version that should be updated
+ * @force: increment the counter even if it's not necessary
+ *
+ * Every time the inode is modified, the i_version field must be seen to have
+ * changed by any observer.
+ *
+ * In this implementation, we always increment it after taking the i_lock to
+ * ensure that we don't race with other incrementors.
+ *
+ * Returns true if counter was bumped, and false if it wasn't.
+ */
+sta

[PATCH v2 03/19] fat: convert to new i_version API

2017-12-16 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/fat/dir.c |  2 +-
 fs/fat/inode.c   |  8 
 fs/fat/namei_msdos.c |  6 +++---
 fs/fat/namei_vfat.c  | 20 ++--
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/fs/fat/dir.c b/fs/fat/dir.c
index b833ffeee1e1..221d85529c5f 100644
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -1055,7 +1055,7 @@ int fat_remove_entries(struct inode *dir, struct 
fat_slot_info *sinfo)
brelse(bh);
if (err)
return err;
-   dir->i_version++;
+   inode_inc_iversion(dir);
 
if (nr_slots) {
/*
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 20a0a89eaca5..a444336c1eed 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -507,7 +507,7 @@ int fat_fill_inode(struct inode *inode, struct 
msdos_dir_entry *de)
MSDOS_I(inode)->i_pos = 0;
inode->i_uid = sbi->options.fs_uid;
inode->i_gid = sbi->options.fs_gid;
-   inode->i_version++;
+   inode_inc_iversion(inode);
inode->i_generation = get_seconds();
 
if ((de->attr & ATTR_DIR) && !IS_FREE(de->name)) {
@@ -590,7 +590,7 @@ struct inode *fat_build_inode(struct super_block *sb,
goto out;
}
inode->i_ino = iunique(sb, MSDOS_ROOT_INO);
-   inode->i_version = 1;
+   inode_set_iversion(inode, 1);
err = fat_fill_inode(inode, de);
if (err) {
iput(inode);
@@ -1377,7 +1377,7 @@ static int fat_read_root(struct inode *inode)
MSDOS_I(inode)->i_pos = MSDOS_ROOT_INO;
inode->i_uid = sbi->options.fs_uid;
inode->i_gid = sbi->options.fs_gid;
-   inode->i_version++;
+   inode_inc_iversion(inode);
inode->i_generation = 0;
inode->i_mode = fat_make_mode(sbi, ATTR_DIR, S_IRWXUGO);
inode->i_op = sbi->dir_ops;
@@ -1828,7 +1828,7 @@ int fat_fill_super(struct super_block *sb, void *data, 
int silent, int isvfat,
if (!root_inode)
goto out_fail;
root_inode->i_ino = MSDOS_ROOT_INO;
-   root_inode->i_version = 1;
+   inode_set_iversion(root_inode, 1);
error = fat_read_root(root_inode);
if (error < 0) {
iput(root_inode);
diff --git a/fs/fat/namei_msdos.c b/fs/fat/namei_msdos.c
index d24d2758a363..7b71c1b242ce 100644
--- a/fs/fat/namei_msdos.c
+++ b/fs/fat/namei_msdos.c
@@ -480,7 +480,7 @@ static int do_msdos_rename(struct inode *old_dir, unsigned 
char *old_name,
} else
mark_inode_dirty(old_inode);
 
-   old_dir->i_version++;
+   inode_inc_iversion(old_dir);
old_dir->i_ctime = old_dir->i_mtime = 
current_time(old_dir);
if (IS_DIRSYNC(old_dir))
(void)fat_sync_inode(old_dir);
@@ -508,7 +508,7 @@ static int do_msdos_rename(struct inode *old_dir, unsigned 
char *old_name,
goto out;
new_i_pos = sinfo.i_pos;
}
-   new_dir->i_version++;
+   inode_inc_iversion(new_dir);
 
fat_detach(old_inode);
fat_attach(old_inode, new_i_pos);
@@ -540,7 +540,7 @@ static int do_msdos_rename(struct inode *old_dir, unsigned 
char *old_name,
old_sinfo.bh = NULL;
if (err)
goto error_dotdot;
-   old_dir->i_version++;
+   inode_inc_iversion(old_dir);
old_dir->i_ctime = old_dir->i_mtime = ts;
if (IS_DIRSYNC(old_dir))
(void)fat_sync_inode(old_dir);
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 02c03a3a..fe1bb65149c8 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -46,7 +46,7 @@ static int vfat_revalidate_shortname(struct dentry *dentry)
 {
int ret = 1;
spin_lock(>d_lock);
-   if (vfat_d_version(dentry) != d_inode(dentry->d_parent)->i_version)
+   if (inode_cmp_iversion(d_inode(dentry->d_parent), 
vfat_d_version(dentry)))
ret = 0;
spin_unlock(>d_lock);
return ret;
@@ -759,7 +759,7 @@ static struct dentry *vfat_lookup(struct inode *dir, struct 
dentry *dentry,
 out:
mutex_unlock(_SB(sb)->s_lock);
if (!inode)
-   vfat_d_version_set(dentry, dir->i_version);
+   vfat_d_version_set(dentry, inode_query_iversion(dir));
return d_splice_alias(inode, dentry);
 error:
mutex_unlock(_SB(sb)->s_lock);
@@ -781,7 +781,7 @@ static int vfat_create(struct inode *dir, struct dentry 
*dentry, umode_t mode,
err = vfat_add_entry(dir, >d_name, 0, 0, , );
if (err)
goto out;
-   dir->i_version++;
+   inode_inc_iversion(dir);
 
inode = fat_build_inode(sb, sinfo.de, sinfo.i_pos);
bre

[PATCH v2 02/19] fs: don't take the i_lock in inode_inc_iversion

2017-12-16 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

The rationale for taking the i_lock when incrementing this value is
lost in antiquity. The readers of the field don't take it (at least
not universally), so my assumption is that it was only done here to
serialize incrementors.

If that is indeed the case, then we can drop the i_lock from this
codepath and treat it as a atomic64_t for the purposes of
incrementing it. This allows us to use inode_inc_iversion without
any danger of lock inversion.

Note that the read side is not fetched atomically with this change.
The assumption here is that that is not a critical issue since the
i_version is not fully synchronized with anything else anyway.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 include/linux/fs.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

I'm not dead-set on this patch, but I was a little leery of adding
the i_lock into all of the places that do inode->i_version++ today,
even as an interim step. We can probably drop this patch if it's too
ugly to live...

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5001e77342fd..c234fac4bb77 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2136,9 +2136,9 @@ inode_set_iversion_queried(struct inode *inode, const u64 
new)
 static inline bool
 inode_maybe_inc_iversion(struct inode *inode, bool force)
 {
-   spin_lock(>i_lock);
-   inode->i_version++;
-   spin_unlock(>i_lock);
+   atomic64_t *ivp = (atomic64_t *)>i_version;
+
+   atomic64_inc(ivp);
return true;
 }
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 05/19] afs: convert to new i_version API

2017-12-16 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

For AFS, it's generally treated as an opaque value, so we use the
*_raw variants of the API here.

Note that AFS has quite a different definition for this counter. AFS
only increments it on changes to the data, not for the metadata. We'll
need to reconcile that somehow if we ever want to present this to
userspace via statx.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/afs/fsclient.c | 2 +-
 fs/afs/inode.c| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
index b90ef39ae914..67ed9bb5fe31 100644
--- a/fs/afs/fsclient.c
+++ b/fs/afs/fsclient.c
@@ -124,7 +124,7 @@ static void xdr_decode_AFSFetchStatus(const __be32 **_bp,
vnode->vfs_inode.i_ctime.tv_sec = status->mtime_client;
vnode->vfs_inode.i_mtime= vnode->vfs_inode.i_ctime;
vnode->vfs_inode.i_atime= vnode->vfs_inode.i_ctime;
-   vnode->vfs_inode.i_version  = data_version;
+   inode_set_iversion_raw(>vfs_inode, data_version);
}
 
expected_version = status->data_version;
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 3415eb7484f6..af9577210a46 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -89,7 +89,7 @@ static int afs_inode_map_status(struct afs_vnode *vnode, 
struct key *key)
inode->i_atime  = inode->i_mtime = inode->i_ctime;
inode->i_blocks = 0;
inode->i_generation = vnode->fid.unique;
-   inode->i_version= vnode->status.data_version;
+   inode_set_iversion_raw(inode, vnode->status.data_version);
inode->i_mapping->a_ops = _fs_aops;
 
read_sequnlock_excl(>cb_lock);
@@ -218,7 +218,7 @@ struct inode *afs_iget_autocell(struct inode *dir, const 
char *dev_name,
inode->i_ctime.tv_nsec  = 0;
inode->i_atime  = inode->i_mtime = inode->i_ctime;
inode->i_blocks = 0;
-   inode->i_version= 0;
+   inode_set_iversion_raw(inode, 0);
inode->i_generation = 0;
 
set_bit(AFS_VNODE_PSEUDODIR, >flags);
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 04/19] affs: convert to new i_version API

2017-12-16 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/affs/amigaffs.c | 4 ++--
 fs/affs/dir.c  | 4 ++--
 fs/affs/super.c| 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/affs/amigaffs.c b/fs/affs/amigaffs.c
index 0f0e6925e97d..085a93bea1e2 100644
--- a/fs/affs/amigaffs.c
+++ b/fs/affs/amigaffs.c
@@ -60,7 +60,7 @@ affs_insert_hash(struct inode *dir, struct buffer_head *bh)
affs_brelse(dir_bh);
 
dir->i_mtime = dir->i_ctime = current_time(dir);
-   dir->i_version++;
+   inode_inc_iversion(dir);
mark_inode_dirty(dir);
 
return 0;
@@ -114,7 +114,7 @@ affs_remove_hash(struct inode *dir, struct buffer_head 
*rem_bh)
affs_brelse(bh);
 
dir->i_mtime = dir->i_ctime = current_time(dir);
-   dir->i_version++;
+   inode_inc_iversion(dir);
mark_inode_dirty(dir);
 
return retval;
diff --git a/fs/affs/dir.c b/fs/affs/dir.c
index a105e77df2c1..1a8c7b067c44 100644
--- a/fs/affs/dir.c
+++ b/fs/affs/dir.c
@@ -80,7 +80,7 @@ affs_readdir(struct file *file, struct dir_context *ctx)
 * we can jump directly to where we left off.
 */
ino = (u32)(long)file->private_data;
-   if (ino && file->f_version == inode->i_version) {
+   if (ino && inode_cmp_iversion(inode, file->f_version) == 0) {
pr_debug("readdir() left off=%d\n", ino);
goto inside;
}
@@ -130,7 +130,7 @@ affs_readdir(struct file *file, struct dir_context *ctx)
} while (ino);
}
 done:
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
file->private_data = (void *)(long)ino;
affs_brelse(fh_bh);
 
diff --git a/fs/affs/super.c b/fs/affs/super.c
index 1117e36134cc..70596d4084de 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -102,7 +102,7 @@ static struct inode *affs_alloc_inode(struct super_block 
*sb)
if (!i)
return NULL;
 
-   i->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
i->i_lc = NULL;
i->i_ext_bh = NULL;
i->i_pa_cnt = 0;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 07/19] exofs: switch to new i_version API

2017-12-16 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/exofs/dir.c   | 8 
 fs/exofs/super.c | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/exofs/dir.c b/fs/exofs/dir.c
index 98233a97b7b8..36e314083b80 100644
--- a/fs/exofs/dir.c
+++ b/fs/exofs/dir.c
@@ -60,7 +60,7 @@ static int exofs_commit_chunk(struct page *page, loff_t pos, 
unsigned len)
struct inode *dir = mapping->host;
int err = 0;
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
 
if (!PageUptodate(page))
SetPageUptodate(page);
@@ -241,7 +241,7 @@ exofs_readdir(struct file *file, struct dir_context *ctx)
unsigned long n = pos >> PAGE_SHIFT;
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(exofs_chunk_size(inode)-1);
-   int need_revalidate = (file->f_version != inode->i_version);
+   bool need_revalidate = inode_cmp_iversion(inode, file->f_version);
 
if (pos > inode->i_size - EXOFS_DIR_REC_LEN(1))
return 0;
@@ -264,8 +264,8 @@ exofs_readdir(struct file *file, struct dir_context *ctx)
chunk_mask);
ctx->pos = (n<<PAGE_SHIFT) + offset;
}
-   file->f_version = inode->i_version;
-   need_revalidate = 0;
+   file->f_version = inode_query_iversion(inode);
+   need_revalidate = false;
}
de = (struct exofs_dir_entry *)(kaddr + offset);
limit = kaddr + exofs_last_byte(inode, n) -
diff --git a/fs/exofs/super.c b/fs/exofs/super.c
index 819624cfc8da..6746999137df 100644
--- a/fs/exofs/super.c
+++ b/fs/exofs/super.c
@@ -159,7 +159,7 @@ static struct inode *exofs_alloc_inode(struct super_block 
*sb)
if (!oi)
return NULL;
 
-   oi->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
return >vfs_inode;
 }
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 06/19] btrfs: convert to new i_version API

2017-12-16 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/btrfs/delayed-inode.c | 6 --
 fs/btrfs/inode.c | 6 --
 fs/btrfs/tree-log.c  | 3 ++-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 5d73f79ded8b..1f0cd7c904c6 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1700,7 +1700,8 @@ static void fill_stack_inode_item(struct 
btrfs_trans_handle *trans,
btrfs_set_stack_inode_nbytes(inode_item, inode_get_bytes(inode));
btrfs_set_stack_inode_generation(inode_item,
 BTRFS_I(inode)->generation);
-   btrfs_set_stack_inode_sequence(inode_item, inode->i_version);
+   btrfs_set_stack_inode_sequence(inode_item,
+  inode_peek_iversion(inode));
btrfs_set_stack_inode_transid(inode_item, trans->transid);
btrfs_set_stack_inode_rdev(inode_item, inode->i_rdev);
btrfs_set_stack_inode_flags(inode_item, BTRFS_I(inode)->flags);
@@ -1754,7 +1755,8 @@ int btrfs_fill_inode(struct inode *inode, u32 *rdev)
BTRFS_I(inode)->generation = btrfs_stack_inode_generation(inode_item);
 BTRFS_I(inode)->last_trans = btrfs_stack_inode_transid(inode_item);
 
-   inode->i_version = btrfs_stack_inode_sequence(inode_item);
+   inode_set_iversion_queried(inode,
+  btrfs_stack_inode_sequence(inode_item));
inode->i_rdev = 0;
*rdev = btrfs_stack_inode_rdev(inode_item);
BTRFS_I(inode)->flags = btrfs_stack_inode_flags(inode_item);
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e1a7f3cb5be9..1dfa299a31a9 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3777,7 +3777,8 @@ static int btrfs_read_locked_inode(struct inode *inode)
BTRFS_I(inode)->generation = btrfs_inode_generation(leaf, inode_item);
BTRFS_I(inode)->last_trans = btrfs_inode_transid(leaf, inode_item);
 
-   inode->i_version = btrfs_inode_sequence(leaf, inode_item);
+   inode_set_iversion_queried(inode,
+  btrfs_inode_sequence(leaf, inode_item));
inode->i_generation = BTRFS_I(inode)->generation;
inode->i_rdev = 0;
rdev = btrfs_inode_rdev(leaf, inode_item);
@@ -3945,7 +3946,8 @@ static void fill_inode_item(struct btrfs_trans_handle 
*trans,
 );
btrfs_set_token_inode_generation(leaf, item, BTRFS_I(inode)->generation,
 );
-   btrfs_set_token_inode_sequence(leaf, item, inode->i_version, );
+   btrfs_set_token_inode_sequence(leaf, item, inode_peek_iversion(inode),
+  );
btrfs_set_token_inode_transid(leaf, item, trans->transid, );
btrfs_set_token_inode_rdev(leaf, item, inode->i_rdev, );
btrfs_set_token_inode_flags(leaf, item, BTRFS_I(inode)->flags, );
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 7bf9b31561db..3c93e456d9f7 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -3609,7 +3609,8 @@ static void fill_inode_item(struct btrfs_trans_handle 
*trans,
btrfs_set_token_inode_nbytes(leaf, item, inode_get_bytes(inode),
 );
 
-   btrfs_set_token_inode_sequence(leaf, item, inode->i_version, );
+   btrfs_set_token_inode_sequence(leaf, item,
+  inode_peek_iversion(inode), );
btrfs_set_token_inode_transid(leaf, item, trans->transid, );
btrfs_set_token_inode_rdev(leaf, item, inode->i_rdev, );
btrfs_set_token_inode_flags(leaf, item, BTRFS_I(inode)->flags, );
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 09/19] ext4: convert to new i_version API

2017-12-16 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
Acked-by: Theodore Ts'o <ty...@mit.edu>
---
 fs/ext4/dir.c|  8 
 fs/ext4/inline.c |  6 +++---
 fs/ext4/inode.c  | 12 
 fs/ext4/ioctl.c  |  2 +-
 fs/ext4/namei.c  |  4 ++--
 fs/ext4/super.c  |  2 +-
 fs/ext4/xattr.c  |  4 ++--
 7 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index d5babc9f222b..18eb37395a45 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -208,7 +208,7 @@ static int ext4_readdir(struct file *file, struct 
dir_context *ctx)
 * readdir(2), then we might be pointing to an invalid
 * dirent right now.  Scan from the start of the block
 * to make sure. */
-   if (file->f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, file->f_version)) {
for (i = 0; i < sb->s_blocksize && i < offset; ) {
de = (struct ext4_dir_entry_2 *)
(bh->b_data + i);
@@ -227,7 +227,7 @@ static int ext4_readdir(struct file *file, struct 
dir_context *ctx)
offset = i;
ctx->pos = (ctx->pos & ~(sb->s_blocksize - 1))
| offset;
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
}
 
while (ctx->pos < inode->i_size
@@ -568,10 +568,10 @@ static int ext4_dx_readdir(struct file *file, struct 
dir_context *ctx)
 * cached entries.
 */
if ((!info->curr_node) ||
-   (file->f_version != inode->i_version)) {
+   inode_cmp_iversion(inode, file->f_version)) {
info->curr_node = NULL;
free_rb_tree_fname(>root);
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
ret = ext4_htree_fill_tree(file, info->curr_hash,
   info->curr_minor_hash,
   >next_hash);
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 1367553c43bb..a4a621eb80c1 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1042,7 +1042,7 @@ static int ext4_add_dirent_to_inline(handle_t *handle,
 */
dir->i_mtime = dir->i_ctime = current_time(dir);
ext4_update_dx_flag(dir);
-   dir->i_version++;
+   inode_inc_iversion(dir);
return 1;
 }
 
@@ -1494,7 +1494,7 @@ int ext4_read_inline_dir(struct file *file,
 * dirent right now.  Scan from the start of the inline
 * dir to make sure.
 */
-   if (file->f_version != inode->i_version) {
+   if (inode_cmp_iversion(inode, file->f_version)) {
for (i = 0; i < extra_size && i < offset;) {
/*
 * "." is with offset 0 and
@@ -1526,7 +1526,7 @@ int ext4_read_inline_dir(struct file *file,
}
offset = i;
ctx->pos = offset;
-   file->f_version = inode->i_version;
+   file->f_version = inode_query_iversion(inode);
}
 
while (ctx->pos < extra_size) {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7df2c5644e59..1f8ec9cd042c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4873,12 +4873,14 @@ struct inode *ext4_iget(struct super_block *sb, 
unsigned long ino)
EXT4_EINODE_GET_XTIME(i_crtime, ei, raw_inode);
 
if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
-   inode->i_version = le32_to_cpu(raw_inode->i_disk_version);
+   u64 ivers = le32_to_cpu(raw_inode->i_disk_version);
+
if (EXT4_INODE_SIZE(inode->i_sb) > EXT4_GOOD_OLD_INODE_SIZE) {
if (EXT4_FITS_IN_INODE(raw_inode, ei, i_version_hi))
-   inode->i_version |=
+   ivers |=
(__u64)(le32_to_cpu(raw_inode->i_version_hi)) << 32;
}
+   inode_set_iversion_queried(inode, ivers);
}
 
ret = 0;
@@ -5164,11 +5166,13 @@ static int ext4_do_update_inode(handle_t *handle,
}
 
if (likely(!test_opt2(inode->i_sb, HURD_COMPAT))) {
-   raw_inode->i_disk_version = cpu_to_le32(inode->i_version);
+   u64 ivers = inode_peek_iversion(inode);
+
+   raw_inode->i_disk_version = cpu_to_le32(ivers);
if (ei->i_extra_isize) {
if (EXT4_FITS_IN_INODE(raw

[PATCH v2 10/19] nfs: convert to new i_version API

2017-12-16 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

For NFS, we just use the "raw" API since the i_version is mostly
managed by the server. The exception there is when the client
holds a write delegation, but we only need to bump it once
there anyway to handle CB_GETATTR.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/nfs/delegation.c|  2 +-
 fs/nfs/fscache-index.c |  4 ++--
 fs/nfs/inode.c | 16 
 fs/nfs/nfs4proc.c  |  9 +
 fs/nfs/nfstrace.h  |  4 ++--
 fs/nfs/write.c |  7 ++-
 6 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
index ade44ca0c66c..89b22b2a111c 100644
--- a/fs/nfs/delegation.c
+++ b/fs/nfs/delegation.c
@@ -347,7 +347,7 @@ int nfs_inode_set_delegation(struct inode *inode, struct 
rpc_cred *cred, struct
nfs4_stateid_copy(>stateid, >delegation);
delegation->type = res->delegation_type;
delegation->pagemod_limit = res->pagemod_limit;
-   delegation->change_attr = inode->i_version;
+   delegation->change_attr = inode_peek_iversion_raw(inode);
delegation->cred = get_rpccred(cred);
delegation->inode = inode;
delegation->flags = 1<<NFS_DELEGATION_REFERENCED;
diff --git a/fs/nfs/fscache-index.c b/fs/nfs/fscache-index.c
index 3025fe8584a0..8bf080ce4984 100644
--- a/fs/nfs/fscache-index.c
+++ b/fs/nfs/fscache-index.c
@@ -211,7 +211,7 @@ static uint16_t nfs_fscache_inode_get_aux(const void 
*cookie_netfs_data,
auxdata.ctime = nfsi->vfs_inode.i_ctime;
 
if (NFS_SERVER(>vfs_inode)->nfs_client->rpc_ops->version == 4)
-   auxdata.change_attr = nfsi->vfs_inode.i_version;
+   auxdata.change_attr = inode_peek_iversion_raw(>vfs_inode);
 
if (bufmax > sizeof(auxdata))
bufmax = sizeof(auxdata);
@@ -243,7 +243,7 @@ enum fscache_checkaux nfs_fscache_inode_check_aux(void 
*cookie_netfs_data,
auxdata.ctime = nfsi->vfs_inode.i_ctime;
 
if (NFS_SERVER(>vfs_inode)->nfs_client->rpc_ops->version == 4)
-   auxdata.change_attr = nfsi->vfs_inode.i_version;
+   auxdata.change_attr = inode_peek_iversion_raw(>vfs_inode);
 
if (memcmp(data, , datalen) != 0)
return FSCACHE_CHECKAUX_OBSOLETE;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index b992d2382ffa..c55b86c87bce 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -483,7 +483,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct 
nfs_fattr *fattr, st
memset(>i_atime, 0, sizeof(inode->i_atime));
memset(>i_mtime, 0, sizeof(inode->i_mtime));
memset(>i_ctime, 0, sizeof(inode->i_ctime));
-   inode->i_version = 0;
+   inode_set_iversion_raw(inode, 0);
inode->i_size = 0;
clear_nlink(inode);
inode->i_uid = make_kuid(_user_ns, -2);
@@ -508,7 +508,7 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct 
nfs_fattr *fattr, st
else if (nfs_server_capable(inode, NFS_CAP_CTIME))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR);
if (fattr->valid & NFS_ATTR_FATTR_CHANGE)
-   inode->i_version = fattr->change_attr;
+   inode_set_iversion_raw(inode, fattr->change_attr);
else
nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
| NFS_INO_REVAL_PAGECACHE);
@@ -1289,8 +1289,8 @@ static unsigned long nfs_wcc_update_inode(struct inode 
*inode, struct nfs_fattr
 
if ((fattr->valid & NFS_ATTR_FATTR_PRECHANGE)
&& (fattr->valid & NFS_ATTR_FATTR_CHANGE)
-   && inode->i_version == fattr->pre_change_attr) {
-   inode->i_version = fattr->change_attr;
+   && !inode_cmp_iversion(inode, fattr->pre_change_attr)) {
+   inode_set_iversion_raw(inode, fattr->change_attr);
if (S_ISDIR(inode->i_mode))
nfs_set_cache_invalid(inode, NFS_INO_INVALID_DATA);
ret |= NFS_INO_INVALID_ATTR;
@@ -1348,7 +1348,7 @@ static int nfs_check_inode_attributes(struct inode 
*inode, struct nfs_fattr *fat
 
if (!nfs_file_has_buffered_writers(nfsi)) {
/* Verify a few of the more important attributes */
-   if ((fattr->valid & NFS_ATTR_FATTR_CHANGE) != 0 && 
inode->i_version != fattr->change_attr)
+   if ((fattr->valid & NFS_ATTR_FATTR_CHANGE) != 0 && 
inode_cmp_iversion(inode, fattr->change_attr))
invalid |= NFS_INO_INVALID_ATTR | 
NFS_INO_REVAL_PAGECACHE;
 

[PATCH v2 11/19] nfsd: convert to new i_version API

2017-12-16 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Mostly just making sure we use the "get" wrappers so we know when
it is being fetched for later use.

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/nfsd/nfsfh.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 43f31cf49bae..7dedda5e3ed4 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -259,7 +259,7 @@ static inline u64 nfsd4_change_attribute(struct inode 
*inode)
chattr =  inode->i_ctime.tv_sec;
chattr <<= 30;
chattr += inode->i_ctime.tv_nsec;
-   chattr += inode->i_version;
+   chattr += inode_query_iversion(inode);
return chattr;
 }
 
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 05/19] afs: convert to new i_version API

2017-12-16 Thread Jeff Layton
On Sat, 2017-12-16 at 08:46 -0500, Jeff Layton wrote:
> From: Jeff Layton <jlay...@redhat.com>
> 
> For AFS, it's generally treated as an opaque value, so we use the
> *_raw variants of the API here.
> 
> Note that AFS has quite a different definition for this counter. AFS
> only increments it on changes to the data, not for the metadata. We'll
> need to reconcile that somehow if we ever want to present this to
> userspace via statx.
> 
> Signed-off-by: Jeff Layton <jlay...@redhat.com>
> ---
>  fs/afs/fsclient.c | 2 +-
>  fs/afs/inode.c| 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c
> index b90ef39ae914..67ed9bb5fe31 100644
> --- a/fs/afs/fsclient.c
> +++ b/fs/afs/fsclient.c
> @@ -124,7 +124,7 @@ static void xdr_decode_AFSFetchStatus(const __be32 **_bp,
>   vnode->vfs_inode.i_ctime.tv_sec = status->mtime_client;
>   vnode->vfs_inode.i_mtime= vnode->vfs_inode.i_ctime;
>   vnode->vfs_inode.i_atime= vnode->vfs_inode.i_ctime;
> - vnode->vfs_inode.i_version  = data_version;
> + inode_set_iversion_raw(>vfs_inode, data_version);
>   }
>  
>   expected_version = status->data_version;
> diff --git a/fs/afs/inode.c b/fs/afs/inode.c
> index 3415eb7484f6..af9577210a46 100644
> --- a/fs/afs/inode.c
> +++ b/fs/afs/inode.c
> @@ -89,7 +89,7 @@ static int afs_inode_map_status(struct afs_vnode *vnode, 
> struct key *key)
>   inode->i_atime  = inode->i_mtime = inode->i_ctime;
>   inode->i_blocks = 0;
>   inode->i_generation = vnode->fid.unique;
> - inode->i_version= vnode->status.data_version;
> + inode_set_iversion_raw(inode, vnode->status.data_version);
>   inode->i_mapping->a_ops = _fs_aops;
>  
>   read_sequnlock_excl(>cb_lock);
> @@ -218,7 +218,7 @@ struct inode *afs_iget_autocell(struct inode *dir, const 
> char *dev_name,
>   inode->i_ctime.tv_nsec  = 0;
>   inode->i_atime  = inode->i_mtime = inode->i_ctime;
>   inode->i_blocks = 0;
> - inode->i_version= 0;
> + inode_set_iversion_raw(inode, 0);
>   inode->i_generation = 0;
>  
>   set_bit(AFS_VNODE_PSEUDODIR, >flags);

TBH, we could just remove the i_version handling here. Nothing ever
looks at the i_version field in AFS inodes.
-- 
Jeff Layton <jlay...@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 13/19] ufs: use new i_version API

2017-12-16 Thread Jeff Layton
From: Jeff Layton <jlay...@redhat.com>

Signed-off-by: Jeff Layton <jlay...@redhat.com>
---
 fs/ufs/dir.c   | 8 
 fs/ufs/inode.c | 2 +-
 fs/ufs/super.c | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ufs/dir.c b/fs/ufs/dir.c
index 2edc1755b7c5..547c4d4c4db8 100644
--- a/fs/ufs/dir.c
+++ b/fs/ufs/dir.c
@@ -47,7 +47,7 @@ static int ufs_commit_chunk(struct page *page, loff_t pos, 
unsigned len)
struct inode *dir = mapping->host;
int err = 0;
 
-   dir->i_version++;
+   inode_inc_iversion(dir);
block_write_end(NULL, mapping, pos, len, len, page, NULL);
if (pos+len > dir->i_size) {
i_size_write(dir, pos+len);
@@ -428,7 +428,7 @@ ufs_readdir(struct file *file, struct dir_context *ctx)
unsigned long n = pos >> PAGE_SHIFT;
unsigned long npages = dir_pages(inode);
unsigned chunk_mask = ~(UFS_SB(sb)->s_uspi->s_dirblksize - 1);
-   int need_revalidate = file->f_version != inode->i_version;
+   bool need_revalidate = inode_cmp_iversion(inode, file->f_version);
unsigned flags = UFS_SB(sb)->s_flags;
 
UFSD("BEGIN\n");
@@ -455,8 +455,8 @@ ufs_readdir(struct file *file, struct dir_context *ctx)
offset = ufs_validate_entry(sb, kaddr, offset, 
chunk_mask);
ctx->pos = (n<<PAGE_SHIFT) + offset;
}
-   file->f_version = inode->i_version;
-   need_revalidate = 0;
+   file->f_version = inode_query_iversion(inode);
+   need_revalidate = false;
}
de = (struct ufs_dir_entry *)(kaddr+offset);
limit = kaddr + ufs_last_byte(inode, n) - UFS_DIR_REC_LEN(1);
diff --git a/fs/ufs/inode.c b/fs/ufs/inode.c
index afb601c0dda0..58cb6dfb116d 100644
--- a/fs/ufs/inode.c
+++ b/fs/ufs/inode.c
@@ -693,7 +693,7 @@ struct inode *ufs_iget(struct super_block *sb, unsigned 
long ino)
if (err)
goto bad_inode;
 
-   inode->i_version++;
+   inode_inc_iversion(inode);
ufsi->i_lastfrag =
(inode->i_size + uspi->s_fsize - 1) >> uspi->s_fshift;
ufsi->i_dir_start_lookup = 0;
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index 4d497e9c6883..7dee0b07a571 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -1440,7 +1440,7 @@ static struct inode *ufs_alloc_inode(struct super_block 
*sb)
if (!ei)
return NULL;
 
-   ei->vfs_inode.i_version = 1;
+   inode_set_iversion(>vfs_inode, 1);
seqlock_init(>meta_lock);
mutex_init(>truncate_mutex);
return >vfs_inode;
-- 
2.14.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   3   4   >