Re: [PATCH net v2 1/2] socket: fix option SO_TIMESTAMPING_NEW

2020-10-12 Thread Deepa Dinamani
> On Mon, Oct 12, 2020 at 5:36 AM Christian Eggers  wrote:
> > v2:
> > -
> > - integrated proposal from Willem de Bruijn
> > - added Reviewed-by: from Willem and Deepa

You may add my
Acked-by: Deepa Dinamani 


-Deepa


Re: [PATCH net 2/2] socket: don't clear SOCK_TSTAMP_NEW when SO_TIMESTAMPNS is disabled

2020-10-09 Thread Deepa Dinamani
On Fri, Oct 9, 2020 at 5:35 PM Willem de Bruijn
 wrote:
>
> On Fri, Oct 9, 2020 at 6:32 AM Christian Eggers  wrote:
> >
> > SOCK_TSTAMP_NEW (timespec64 instead of timespec) is also used for
> > hardware time stamps (configured via SO_TIMESTAMPING_NEW).
> >
> > User space (ptp4l) first configures hardware time stamping via
> > SO_TIMESTAMPING_NEW which sets SOCK_TSTAMP_NEW. In the next step, ptp4l
> > disables SO_TIMESTAMPNS(_NEW) (software time stamps), but this must not
> > switch hardware time stamps back to "32 bit mode".
> >
> > This problem happens on 32 bit platforms were the libc has already
> > switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW
> > socket options). ptp4l complains with "missing timestamp on transmitted
> > peer delay request" because the wrong format is received (and
> > discarded).
> >
> > Fixes: 887feae36aee ("socket: Add SO_TIMESTAMP[NS]_NEW")
> > Fixes: 783da70e8396 ("net: add sock_enable_timestamps")
> > Signed-off-by: Christian Eggers 
>
> Acked-by: Willem de Bruijn 
>
> Yes, we should just select SOCK_TSTAMP_NEW based on which of the two
> syscall variants the process uses.
>
> There is no need to reset on timestamp disable: in the common case the
> selection is immaterial as timestamping is disabled.
>
> As this commit message shows, with SO_TIMESTAMP(NS) and
> SO_TIMESTAMPING that can be independently turned on and off, disabling
> one can incorrectly switch modes while the other is still active.

This will not help the case when a child process that inherits the fd
and then sets SO_TIMESTAMP*_OLD/NEW on it, while the parent uses the
other version.
One of the two processes might still be surprised. But, child and
parent actively using the same socket fd might be expecting trouble
already.

Acked-by: Deepa Dinamani 

-Deepa


Re: [PATCH net 1/2] socket: fix option SO_TIMESTAMPING_NEW

2020-10-09 Thread Deepa Dinamani
On Fri, Oct 9, 2020 at 5:43 PM Willem de Bruijn  wrote:
>
> On Fri, Oct 9, 2020 at 8:30 PM Deepa Dinamani  wrote:
> >
> > On Fri, Oct 9, 2020 at 3:32 AM Christian Eggers  wrote:
> > >
> > > The comparison of optname with SO_TIMESTAMPING_NEW is wrong way around,
> > > so SOCK_TSTAMP_NEW will first be set and than reset again. Additionally
> > > move it out of the test for SOF_TIMESTAMPING_RX_SOFTWARE as this seems
> > > unrelated.
> >
> > The SOCK_TSTAMP_NEW is reset only in the case when
> > SOF_TIMESTAMPING_RX_SOFTWARE is not set.
> > Note that we only call sock_enable_timestamp() at that time.
> >
> > Why would SOCK_TSTAMP_NEW be relevant otherwise?
>
> Other timestamps can be configured, such as hardware timestamps.
>
> As the follow-on patch shows, there is also the issue of overlap
> between SO_TIMESTAMP(NS) and SO_TIMESTAMPING.

I see. Thanks for clarification. I think I had missed that you could
have both software and hardware timestamps enabled at the same time.

> Don't select OLD on timestamp disable, which may only disable
> some of the ongoing timestamping.
>
> Setting based on the syscall is simpler, too. __sock_set_timestamps
> already uses for SO_TIMESTAMP(NS) the valbool approach I
> suggest for SO_TIMESTAMPING.
>
> The fallthrough can also be removed. My rough patch missed that.

Sounds good.

-Deepa


Re: [PATCH net 1/2] socket: fix option SO_TIMESTAMPING_NEW

2020-10-09 Thread Deepa Dinamani
On Fri, Oct 9, 2020 at 3:32 AM Christian Eggers  wrote:
>
> The comparison of optname with SO_TIMESTAMPING_NEW is wrong way around,
> so SOCK_TSTAMP_NEW will first be set and than reset again. Additionally
> move it out of the test for SOF_TIMESTAMPING_RX_SOFTWARE as this seems
> unrelated.

The SOCK_TSTAMP_NEW is reset only in the case when
SOF_TIMESTAMPING_RX_SOFTWARE is not set.
Note that we only call sock_enable_timestamp() at that time.

Why would SOCK_TSTAMP_NEW be relevant otherwise?

-Deepa


Re: [PATCH] fs/namespace.c: fix use-after-free of mount in mnt_warn_timestamp_expiry()

2019-10-13 Thread Deepa Dinamani
Thanks for the fix.

Would it be better to move the check and warning to a place where the
access is still safe?

-Deepa

> On Oct 9, 2019, at 12:19 AM, Eric Biggers  wrote:
>
> From: Eric Biggers 


On Wed, Oct 9, 2019 at 12:19 AM Eric Biggers  wrote:
>
> From: Eric Biggers 
>
> After do_add_mount() returns success, the caller doesn't hold a
> reference to the 'struct mount' anymore.  So it's invalid to access it
> in mnt_warn_timestamp_expiry().
>
> Fix it by instead passing the super_block and the mnt_flags.  It's safe
> to access the super_block because it's pinned by fs_context::root.
>
> Reported-by: syzbot+da4f525235510683d...@syzkaller.appspotmail.com
> Fixes: f8b92ba67c5d ("mount: Add mount warning for impending timestamp 
> expiry")
> Signed-off-by: Eric Biggers 
> ---
>  fs/namespace.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/fs/namespace.c b/fs/namespace.c
> index fe0e9e1410fe..7ef8edaaed69 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2466,12 +2466,11 @@ static void set_mount_attributes(struct mount *mnt, 
> unsigned int mnt_flags)
> unlock_mount_hash();
>  }
>
> -static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct 
> vfsmount *mnt)
> +static void mnt_warn_timestamp_expiry(struct path *mountpoint,
> + struct super_block *sb, int mnt_flags)
>  {
> -   struct super_block *sb = mnt->mnt_sb;
> -
> -   if (!__mnt_is_readonly(mnt) &&
> -  (ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > sb->s_time_max)) 
> {
> +   if (!(mnt_flags & MNT_READONLY) && !sb_rdonly(sb) &&
> +   (ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > 
> sb->s_time_max)) {
> char *buf = (char *)__get_free_page(GFP_KERNEL);
> char *mntpath = buf ? d_path(mountpoint, buf, PAGE_SIZE) : 
> ERR_PTR(-ENOMEM);
> struct tm tm;
> @@ -2512,7 +2511,7 @@ static int do_reconfigure_mnt(struct path *path, 
> unsigned int mnt_flags)
> set_mount_attributes(mnt, mnt_flags);
> up_write(>s_umount);
>
> -   mnt_warn_timestamp_expiry(path, >mnt);
> +   mnt_warn_timestamp_expiry(path, sb, mnt_flags);
>
> return ret;
>  }
> @@ -2555,7 +2554,7 @@ static int do_remount(struct path *path, int ms_flags, 
> int sb_flags,
> up_write(>s_umount);
> }
>
> -   mnt_warn_timestamp_expiry(path, >mnt);
> +   mnt_warn_timestamp_expiry(path, sb, mnt_flags);
>
> put_fs_context(fc);
> return err;
> @@ -2770,7 +2769,7 @@ static int do_new_mount_fc(struct fs_context *fc, 
> struct path *mountpoint,
> return error;
> }
>
> -   mnt_warn_timestamp_expiry(mountpoint, mnt);
> +   mnt_warn_timestamp_expiry(mountpoint, sb, mnt_flags);
>
> return error;
>  }
> --
> 2.23.0
>


[PATCH] ext4: Reduce ext4 timestamp warnings

2019-09-04 Thread Deepa Dinamani
When ext4 file systems were created intentionally with 128 byte inodes,
the rate-limited warning of eventual possible timestamp overflow are
still emitted rather frequently.  Remove the warning for now.

Discussion for whether any warning is needed,
and where it should be emitted, can be found at
https://lore.kernel.org/lkml/1567523922.5576.57.ca...@lca.pw/.
I can post a separate follow-up patch after the conclusion.

Reported-by: Qian Cai 
Signed-off-by: Deepa Dinamani 
---
 fs/ext4/ext4.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9e3ae3be3de9..24b14bd3feab 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -833,10 +833,8 @@ do {   
\
(raw_inode)->xtime ## _extra =  
\
ext4_encode_extra_time(&(inode)->xtime);
\
}   
\
-   else{\
+   else\
(raw_inode)->xtime = cpu_to_le32(clamp_t(int32_t, 
(inode)->xtime.tv_sec, S32_MIN, S32_MAX));\
-   ext4_warning_inode(inode, "inode does not support timestamps 
beyond 2038"); \
-   } \
 } while (0)
 
 #define EXT4_EINODE_SET_XTIME(xtime, einode, raw_inode)
   \
-- 
2.17.1



Re: "beyond 2038" warnings from loopback mount is noisy

2019-09-04 Thread Deepa Dinamani
> On Sep 4, 2019, at 5:58 AM, Theodore Y. Ts'o  wrote:
>
>> On Tue, Sep 03, 2019 at 09:50:09PM -0700, Deepa Dinamani wrote:
>> If we don't care to warn about the timestamps that are clamped in
>> memory, maybe we could just warn when they are being written out.
>> Would something like this be more acceptable? I would also remove the
>> warning in ext4.h. I think we don't have to check if the inode is 128
>> bytes here (Please correct me if I am wrong). If this looks ok, I can
>> post this.
>
> That's better, but it's going to be misleading in many cases.  The
> inode's extra size field is 16 or larger, there will be enough space
> for the timestamps, so talking about "timestamps on this inode beyond
> 2038" when ext4 is unable to expand it from say, 24 to 32, won't be
> true.  Certain certain features won't be available, yes --- such as
> project-id-based quotas, since there won't be room to store the
> project ID.  However, it's not going to impact the ability to store
> timestamps beyond 2038.  The i_extra_isize field is not just about
> timestamps!

I understand that i_extra_isize is not just about timestamps. It’s
evident from EXT4_FITS_IN_INODE(). I think we can check for
EXT4_FITS_IN_INODE() here if that will consistently eliminates false
positives.

But, I hear you. You think this warning is unnecessary. I think there
are many file systems and I don’t think anybody would knows in’s and
outs of each one. I think if I’m mounting an ext4 fs and it has mixed
sizes of inodes, I think I would at least expect a dmesg(with a hint
on how to fix it) considering that this filesystem is restricted in
more ways than just time. Is this the purpose of the warning you
already have?:

if (error && (mnt_count != le16_to_cpu(sbi->s_es->s_mnt_count))) {
   ext4_warning(inode->i_sb, "Unable to expand inode %lu.
Delete some EAs or run e2fsck.",

Maybe there should be a warning, but it has nothing to do with just
time. Do we already have this?

-Deepa


Re: "beyond 2038" warnings from loopback mount is noisy

2019-09-03 Thread Deepa Dinamani
If we don't care to warn about the timestamps that are clamped in
memory, maybe we could just warn when they are being written out.
Would something like this be more acceptable? I would also remove the
warning in ext4.h. I think we don't have to check if the inode is 128
bytes here (Please correct me if I am wrong). If this looks ok, I can
post this.

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9e3ae3be3de9..24b14bd3feab 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -833,10 +833,8 @@ do {
 \
(raw_inode)->xtime ## _extra =
 \

ext4_encode_extra_time(&(inode)->xtime);\
}
 \
-   else{\
+   else\
(raw_inode)->xtime = cpu_to_le32(clamp_t(int32_t,
(inode)->xtime.tv_sec, S32_MIN, S32_MAX));\
-   ext4_warning_inode(inode, "inode does not support
timestamps beyond 2038"); \
-   } \
 } while (0)

 #define EXT4_EINODE_SET_XTIME(xtime, einode, raw_inode)
\
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 491f9ee4040e..cef5b87cc5a6 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -2791,7 +2791,7 @@ int ext4_expand_extra_isize_ea(struct inode
*inode, int new_extra_isize,

 cleanup:
if (error && (mnt_count != le16_to_cpu(sbi->s_es->s_mnt_count))) {
-   ext4_warning(inode->i_sb, "Unable to expand inode %lu.
Delete some EAs or run e2fsck.",
+   ext4_warning(inode->i_sb, "Unable to expand inode %lu.
Delete some EAs or run e2fsck. Timestamps on the inode expire beyond
2038",
 inode->i_ino);
mnt_count = le16_to_cpu(sbi->s_es->s_mnt_count);
}


Re: "beyond 2038" warnings from loopback mount is noisy

2019-09-03 Thread Deepa Dinamani
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 9e3ae3be3de9..5a971d1b6d5e 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -835,7 +835,9 @@ do {
> >  \
> > }
> >  \
> > else{\
> > (raw_inode)->xtime = cpu_to_le32(clamp_t(int32_t,
> > (inode)->xtime.tv_sec, S32_MIN, S32_MAX));\
> > -   ext4_warning_inode(inode, "inode does not support
> > timestamps beyond 2038"); \
> > +   if (((inode)->xtime.tv_sec != (raw_inode)->xtime) && \
> > +   ((inode)->i_sb->s_time_max > S32_MAX))
> >  \
> > +   ext4_warning_inode(inode, "inode does not
> > support timestamps beyond 2038"); \
> > } \
> >  } while (0)
>
> Sure, that's much less objectionable.

The reason it was warning for every update was because of the
ratelimiting. I think ratelimiting is not working well here. I will
check that part.

-Deepa


Re: "beyond 2038" warnings from loopback mount is noisy

2019-09-03 Thread Deepa Dinamani
On Tue, Sep 3, 2019 at 2:18 PM Theodore Y. Ts'o  wrote:
>
> On Tue, Sep 03, 2019 at 09:18:44AM -0700, Deepa Dinamani wrote:
> >
> > This prints a warning for each inode that doesn't extend limits beyond
> > 2038. It is rate limited by the ext4_warning_inode().
> > Looks like your filesystem has inodes that cannot be extended.
> > We could use a different rate limit or ignore this corner case. Do the
> > maintainers have a preference?
>
> We need to drop this commit (ext4: Initialize timestamps limits), or
> at least the portion which adds the call to the EXT4_INODE_SET_XTIME
> macro in ext4.h.

As Arnd said, I think this can be fixed by warning only when the inode
size is not uniformly 128 bytes in ext4.h. Is this an acceptable
solution or we want to drop this warning altogether?

Arnd, should I be sending a pull request again with the fix? Or, we
drop the ext4 patch and I can send it to the maintainers directly?

> I know of a truly vast number of servers in production all over the
> world which are using 128 byte inodes, and spamming the inodes at the
> maximum rate limit is a really bad idea.  This includes at some major
> cloud data centers where the life of individual servers in their data
> centers is well understood (they're not going to last until 2038) and
> nothing stored on the local Linux file systems are long-lived ---
> that's all stored in the cluster file systems.  The choice of 128 byte
> inode was deliberately chosen to maximize storage TCO, and so spamming
> a warning at high rates is going to be extremely unfriendly.
>
> In cases where the inode size is such that there is no chance at all
> to support timestamps beyond 2038, a single warning at mount time, or
> maybe a warning at mkfs time might be acceptable.  But there's no
> point printing a warning time each time we set a timestamp on such a
> file system.  It's not going to change, and past a certain point, we
> need to trust that people who are using 128 byte inodes did so knowing
> what the tradeoffs might be.  After all, it is *not* the default.

We have a single mount time warning already in place here. I did not
realize some people actually chose to use 128 byte inodes on purpose.

-Deepa


Re: "beyond 2038" warnings from loopback mount is noisy

2019-09-03 Thread Deepa Dinamani
We might also want to consider updating the file system the LTP is
being run on here.

-Deepa


Re: "beyond 2038" warnings from loopback mount is noisy

2019-09-03 Thread Deepa Dinamani
Actually this warning is coming from this patch:
https://lore.kernel.org/linux-fsdevel/20190818165817.32634-10-deepa.ker...@gmail.com/
([PATCH v8 09/20] ext4: Initialize timestamps limits).

This is the code generating the warning:

 diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
 index 9c7f4036021b..ae5d0c86aba2 100644
 --- a/fs/ext4/ext4.h
 +++ b/fs/ext4/ext4.h
 @@ -832,11 +832,15 @@ static inline void
ext4_decode_extra_time(struct timespec64 *time,

  #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode)
  \
  do {
  \
  -   (raw_inode)->xtime = cpu_to_le32((inode)->xtime.tv_sec);
   \
  if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ##
_extra)) {\
  +   (raw_inode)->xtime =
cpu_to_le32((inode)->xtime.tv_sec);\
  (raw_inode)->xtime ## _extra =
   \

ext4_encode_extra_time(&(inode)->xtime);\
  }
   \
  +   else{\
  +   (raw_inode)->xtime = cpu_to_le32(clamp_t(int32_t,
(inode)->xtime.tv_sec, S32_MIN, S32_MAX));\
  +   ext4_warning_inode(inode, "inode does not support
timestamps beyond 2038"); \
  +   } \
   } while (0)

This prints a warning for each inode that doesn't extend limits beyond
2038. It is rate limited by the ext4_warning_inode().
Looks like your filesystem has inodes that cannot be extended.
We could use a different rate limit or ignore this corner case. Do the
maintainers have a preference?

-Deepa

On Tue, Sep 3, 2019 at 8:18 AM Qian Cai  wrote:
>
> https://lore.kernel.org/linux-fsdevel/20190818165817.32634-5-deepa.kernel@gmail.
> com/
>
> Running only a subset of the LTP testsuite on today's linux-next with the 
> above
> commit is now generating ~800 warnings on this machine which seems a bit 
> crazy.
>
> [ 2130.970782] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: 
> inode
> #40961: comm statx04: inode does not support timestamps beyond 2038
> [ 2130.970808] EXT4-fs warning (device loop0): ext4_do_update_inode:5262: 
> inode
> #40961: comm statx04: inode does not support timestamps beyond 2038
> [ 2130.970838] EXT4-fs warning (device loop0): ext4_do_update_inode:5263: 
> inode
> #40961: comm statx04: inode does not support timestamps beyond 2038
> [ 2130.971440] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: 
> inode
> #40961: comm statx04: inode does not support timestamps beyond 2038
> [ 2131.847613] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: 
> inode
> #32769: comm statx04: inode does not support timestamps beyond 2038
> [ 2131.847647] EXT4-fs warning (device loop0): ext4_do_update_inode:5262: 
> inode
> #32769: comm statx04: inode does not support timestamps beyond 2038
> [ 2131.847681] EXT4-fs warning (device loop0): ext4_do_update_inode:5263: 
> inode
> #32769: comm statx04: inode does not support timestamps beyond 2038
> [ 2131.847717] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: 
> inode
> #32769: comm statx04: inode does not support timestamps beyond 2038
> [ 2131.847774] EXT4-fs warning (device loop0): ext4_do_update_inode:5262: 
> inode
> #32769: comm statx04: inode does not support timestamps beyond 2038
> [ 2131.847817] EXT4-fs warning (device loop0): ext4_do_update_inode:5263: 
> inode
> #32769: comm statx04: inode does not support timestamps beyond 2038
> [ 2131.847909] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: 
> inode
> #32769: comm statx04: inode does not support timestamps beyond 2038
> [ 2131.847970] EXT4-fs warning (device loop0): ext4_do_update_inode:5262: 
> inode
> #32769: comm statx04: inode does not support timestamps beyond 2038
> [ 2131.848004] EXT4-fs warning (device loop0): ext4_do_update_inode:5263: 
> inode
> #32769: comm statx04: inode does not support timestamps beyond 2038
> [ 2131.848415] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: 
> inode
> #32769: comm statx04: inode does not support timestamps beyond 2038
> [ 2134.753752] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: 
> inode
> #12: comm statx05: inode does not support timestamps beyond 2038
> [ 2134.753783] EXT4-fs warning (device loop0): ext4_do_update_inode:5262: 
> inode
> #12: comm statx05: inode does not support timestamps beyond 2038
> [ 2134.753814] EXT4-fs warning (device loop0): ext4_do_update_inode:5263: 
> inode
> #12: comm statx05: inode does not support timestamps beyond 2038
> [ 2134.753847] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: 
> inode
> #12: comm statx05: inode does not support timestamps beyond 2038
> [ 2134.753889] EXT4-fs warning (device loop0): ext4_do_update_inode:5262: 
> inode
> #12: comm statx05: inode does not support timestamps beyond 2038
> [ 2134.753929] EXT4-fs warning (device loop0): ext4_do_update_inode:5263: 
> inode
> #12: comm statx05: inode does not support timestamps beyond 2038
> [ 2134.754021] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: 
> inode
> #12: comm 

[PATCH] adfs: Fill in max and min timestamps in sb

2019-09-02 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Note that the min timestamp is assumed to be
01 Jan 1970 00:00:00 (Unix epoch). This is consistent
with the way we convert timestamps in adfs_adfs2unix_time().

Signed-off-by: Deepa Dinamani 
---

This depends on the following patch in Arnd's y2038 tree:
https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground y2038
188d20bcd1eb ("vfs: Add file timestamp range support")

 fs/adfs/adfs.h  | 13 +
 fs/adfs/inode.c |  8 
 fs/adfs/super.c |  2 ++
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/fs/adfs/adfs.h b/fs/adfs/adfs.h
index 699c4fa8b78b..504ad80072ef 100644
--- a/fs/adfs/adfs.h
+++ b/fs/adfs/adfs.h
@@ -3,6 +3,19 @@
 #include 
 #include 
 
+/*
+ * 01 Jan 1970 00:00:00 (Unix epoch) as seconds since
+ * 01 Jan 1900 00:00:00 (RISC OS epoch)
+ */
+#define RISC_OS_EPOCH_DELTA 2208988800LL
+
+/*
+ * Convert 40 bit centi seconds to seconds
+ * since 01 Jan 1900 00:00:00 (RISC OS epoch)
+ * The result is 2248-06-03 06:57:57 GMT
+ */
+#define ADFS_MAX_TIMESTAMP ((0xFFLL / 100) - RISC_OS_EPOCH_DELTA)
+
 /* Internal data structures for ADFS */
 
 #define ADFS_FREE_FRAG  0
diff --git a/fs/adfs/inode.c b/fs/adfs/inode.c
index 5b8017ab0a98..11acd74fb099 100644
--- a/fs/adfs/inode.c
+++ b/fs/adfs/inode.c
@@ -162,7 +162,10 @@ static int adfs_mode2atts(struct super_block *sb, struct 
inode *inode,
return attr;
 }
 
-static const s64 nsec_unix_epoch_diff_risc_os_epoch = 22089888000LL;
+/* 01 Jan 1970 00:00:00 (Unix epoch) as nanoseconds since
+ * 01 Jan 1900 00:00:00 (RISC OS epoch)
+ */
+static const s64 nsec_unix_epoch_diff_risc_os_epoch = RISC_OS_EPOCH_DELTA * 
NSEC_PER_SEC;
 
 /*
  * Convert an ADFS time to Unix time.  ADFS has a 40-bit centi-second time
@@ -173,9 +176,6 @@ static void
 adfs_adfs2unix_time(struct timespec64 *tv, struct inode *inode)
 {
unsigned int high, low;
-   /* 01 Jan 1970 00:00:00 (Unix epoch) as nanoseconds since
-* 01 Jan 1900 00:00:00 (RISC OS epoch)
-*/
s64 nsec;
 
if (!adfs_inode_is_stamped(inode))
diff --git a/fs/adfs/super.c b/fs/adfs/super.c
index 7a3e6b394f2a..532e2afbc7a3 100644
--- a/fs/adfs/super.c
+++ b/fs/adfs/super.c
@@ -381,6 +381,8 @@ static int adfs_fill_super(struct super_block *sb, void 
*data, int silent)
sb->s_fs_info = asb;
sb->s_magic = ADFS_SUPER_MAGIC;
sb->s_time_gran = 1000;
+   sb->s_time_min = 0;
+   sb->s_time_max = ADFS_MAX_TIMESTAMP;
 
/* set default options */
asb->s_uid = GLOBAL_ROOT_UID;
-- 
2.17.1



[GIT PULL] vfs: Add support for timestamp limits

2019-08-28 Thread Deepa Dinamani
Hi Al, Arnd,

This is a pull request for filling in min and max timestamps for filesystems.
I've added all the acks, and dropped the adfs patch. That will be merged through
Russell's tree.

Thanks,
Deepa

The following changes since commit 5d18cb62218608a1388858880ad3ec76d6cb0d3b:

  Add linux-next specific files for 20190828 (2019-08-28 19:59:14 +1000)

are available in the Git repository at:

  https://github.com/deepa-hub/vfs limits

for you to fetch changes up to f0f216afa4c7e4dee9121fde52ccf57f76119188:

  isofs: Initialize filesystem timestamp ranges (2019-08-28 19:19:36 -0700)


Deepa Dinamani (19):
  vfs: Add file timestamp range support
  vfs: Add timestamp_truncate() api
  timestamp_truncate: Replace users of timespec64_trunc
  mount: Add mount warning for impending timestamp expiry
  utimes: Clamp the timestamps before update
  fs: Fill in max and min timestamps in superblock
  9p: Fill min and max timestamps in sb
  ext4: Initialize timestamps limits
  fs: nfs: Initialize filesystem timestamp ranges
  fs: cifs: Initialize filesystem timestamp ranges
  fs: fat: Initialize filesystem timestamp ranges
  fs: affs: Initialize filesystem timestamp ranges
  fs: sysv: Initialize filesystem timestamp ranges
  fs: ceph: Initialize filesystem timestamp ranges
  fs: orangefs: Initialize filesystem timestamp ranges
  fs: hpfs: Initialize filesystem timestamp ranges
  fs: omfs: Initialize filesystem timestamp ranges
  pstore: fs superblock limits
  isofs: Initialize filesystem timestamp ranges

 fs/9p/vfs_super.c|  6 +-
 fs/affs/amigaffs.c   |  2 +-
 fs/affs/amigaffs.h   |  3 +++
 fs/affs/inode.c  |  4 ++--
 fs/affs/super.c  |  4 
 fs/attr.c| 21 -
 fs/befs/linuxvfs.c   |  2 ++
 fs/bfs/inode.c   |  2 ++
 fs/ceph/super.c  |  2 ++
 fs/cifs/cifsfs.c | 22 ++
 fs/cifs/netmisc.c| 14 +++---
 fs/coda/inode.c  |  3 +++
 fs/configfs/inode.c  | 12 ++--
 fs/cramfs/inode.c|  2 ++
 fs/efs/super.c   |  2 ++
 fs/ext2/super.c  |  2 ++
 fs/ext4/ext4.h   | 10 +-
 fs/ext4/super.c  | 17 +++--
 fs/f2fs/file.c   | 21 -
 fs/fat/inode.c   | 12 
 fs/freevxfs/vxfs_super.c |  2 ++
 fs/hpfs/hpfs_fn.h|  6 ++
 fs/hpfs/super.c  |  2 ++
 fs/inode.c   | 33 -
 fs/isofs/inode.c |  7 +++
 fs/jffs2/fs.c|  3 +++
 fs/jfs/super.c   |  2 ++
 fs/kernfs/inode.c|  7 +++
 fs/minix/inode.c |  2 ++
 fs/namespace.c   | 33 -
 fs/nfs/super.c   | 20 +++-
 fs/ntfs/inode.c  | 21 -
 fs/omfs/inode.c  |  4 
 fs/orangefs/super.c  |  2 ++
 fs/pstore/ram.c  |  2 ++
 fs/qnx4/inode.c  |  2 ++
 fs/qnx6/inode.c  |  2 ++
 fs/reiserfs/super.c  |  3 +++
 fs/romfs/super.c |  2 ++
 fs/squashfs/super.c  |  2 ++
 fs/super.c   |  2 ++
 fs/sysv/super.c  |  5 -
 fs/ubifs/file.c  | 21 -
 fs/ufs/super.c   |  7 +++
 fs/utimes.c  |  6 ++
 fs/xfs/xfs_super.c   |  2 ++
 include/linux/fs.h   |  5 +
 include/linux/time64.h   |  2 ++
 48 files changed, 298 insertions(+), 72 deletions(-)


Re: [PATCH v8 19/20] pstore: fs superblock limits

2019-08-20 Thread Deepa Dinamani
On Tue, Aug 20, 2019 at 12:20 AM Kees Cook  wrote:
>
> On Sun, Aug 18, 2019 at 09:58:16AM -0700, Deepa Dinamani wrote:
> > Leaving granularity at 1ns because it is dependent on the specific
> > attached backing pstore module. ramoops has microsecond resolution.
> >
> > Fix the readback of ramoops fractional timestamp microseconds,
> > which has incorrectly been reporting the value as nanoseconds since
> > 3f8f80f0 ("pstore/ram: Read and write to the 'compressed' flag of pstore").
>
> As such, this should also have:
>
> Fixes: 3f8f80f0cfeb ("pstore/ram: Read and write to the 'compressed' flag of 
> pstore")

Will add that in. Thanks.

> > Signed-off-by: Deepa Dinamani 
> > Acked-by: Kees Cook 
>
> Also: this is going via some other tree, yes? (Or should I pick this up
> for the pstore tree?)

I am hoping Al can take the series as a whole.

-Deepa


Re: [PATCH v8 08/20] adfs: Fill in max and min timestamps in sb

2019-08-20 Thread Deepa Dinamani
On Tue, Aug 20, 2019 at 9:28 AM Matthew Wilcox  wrote:
>
> On Sun, Aug 18, 2019 at 09:58:05AM -0700, Deepa Dinamani wrote:
> > Note that the min timestamp is assumed to be
> > 01 Jan 1970 00:00:00 (Unix epoch). This is consistent
> > with the way we convert timestamps in adfs_adfs2unix_time().
>
> That's not actually correct.  RISC OS timestamps are centiseconds since
> 1900 stored in 5 bytes.

The timestamp can hold earlier values but the fs implementation
explicitly rejects those in adfs_adfs2unix_time() too_early check.
We could fix the implementation to not throw away times before 1970.
Are you suggesting we want to do this?
I could post a separate patch to fix this or we could do it as part of
the series.

 static void
 adfs_adfs2unix_time(struct timespec64 *tv, struct inode *inode)
 {
 unsigned int high, low;
 static const s64 nsec_unix_epoch_diff_risc_os_epoch =
RISC_OS_EPOCH_DELTA * NSEC_PER_SEC;
 s64 nsec;

 if (!adfs_inode_is_stamped(inode))
 goto cur_time;

 high = ADFS_I(inode)->loadaddr & 0xFF; /* top 8 bits of timestamp */
 low  = ADFS_I(inode)->execaddr;/* bottom 32 bits of timestamp */

 /* convert 40-bit centi-seconds to 32-bit seconds
  * going via nanoseconds to retain precision
  */
 nsec = (((s64) high << 32) | (s64) low) * 1000; /* cs to ns */

 /* Files dated pre  01 Jan 1970 00:00:00. */
 if (nsec < nsec_unix_epoch_diff_risc_os_epoch)
 goto too_early;

 /* convert from RISC OS to Unix epoch */
 nsec -= nsec_unix_epoch_diff_risc_os_epoch;

 *tv = ns_to_timespec64(nsec);
 return;

  cur_time:
 *tv = current_time(inode);
 return;

  too_early:
 tv->tv_sec = tv->tv_nsec = 0;
 return;
 }

-Deepa


[PATCH v8 07/20] 9p: Fill min and max timestamps in sb

2019-08-18 Thread Deepa Dinamani
struct p9_wstat and struct p9_stat_dotl indicate that the
wire transport uses u32 and u64 fields for timestamps.
Fill in the appropriate limits to avoid inconsistencies in
the vfs cached inode times when timestamps are outside the
permitted range.

Note that the upper bound for V9FS_PROTO_2000L is retained as S64_MAX.
This is because that is the upper bound supported by vfs.

Signed-off-by: Deepa Dinamani 
Cc: eri...@gmail.com
Cc: lu...@ionkov.net
Cc: asmad...@codewreck.org
Cc: v9fs-develo...@lists.sourceforge.net
---
 fs/9p/vfs_super.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 08112fbcaece..ca243e658d71 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -69,8 +69,12 @@ v9fs_fill_super(struct super_block *sb, struct 
v9fs_session_info *v9ses,
if (v9fs_proto_dotl(v9ses)) {
sb->s_op = _super_ops_dotl;
sb->s_xattr = v9fs_xattr_handlers;
-   } else
+   } else {
sb->s_op = _super_ops;
+   sb->s_time_max = U32_MAX;
+   }
+
+   sb->s_time_min = 0;
 
ret = super_setup_bdi(sb);
if (ret)
-- 
2.17.1



[PATCH v8 10/20] fs: nfs: Initialize filesystem timestamp ranges

2019-08-18 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

The time formats for various verious is detailed in the
RFCs as below:

https://tools.ietf.org/html/rfc7862(time metadata)
https://tools.ietf.org/html/rfc7530:

nfstime4

   struct nfstime4 {
   int64_t seconds;
   uint32_tnseconds;
   };

https://tools.ietf.org/html/rfc1094

  struct timeval {
  unsigned int seconds;
  unsigned int useconds;
  };

https://tools.ietf.org/html/rfc1813

struct nfstime3 {
 uint32   seconds;
 uint32   nseconds;
  };

Use the limits as per the RFC.

Signed-off-by: Deepa Dinamani 
Cc: trond.mykleb...@hammerspace.com
Cc: anna.schuma...@netapp.com
Cc: linux-...@vger.kernel.org
---
 fs/nfs/super.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 703f595dce90..19a76cfa8b1f 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2382,6 +2382,15 @@ void nfs_fill_super(struct super_block *sb, struct 
nfs_mount_info *mount_info)
sb->s_flags |= SB_POSIXACL;
sb->s_time_gran = 1;
sb->s_export_op = _export_ops;
+   } else
+   sb->s_time_gran = 1000;
+
+   if (server->nfs_client->rpc_ops->version != 4) {
+   sb->s_time_min = 0;
+   sb->s_time_max = U32_MAX;
+   } else {
+   sb->s_time_min = S64_MIN;
+   sb->s_time_max = S64_MAX;
}
 
nfs_initialise_sb(sb);
@@ -2402,7 +2411,6 @@ static void nfs_clone_super(struct super_block *sb,
sb->s_maxbytes = old_sb->s_maxbytes;
sb->s_xattr = old_sb->s_xattr;
sb->s_op = old_sb->s_op;
-   sb->s_time_gran = 1;
sb->s_export_op = old_sb->s_export_op;
 
if (server->nfs_client->rpc_ops->version != 2) {
@@ -2410,6 +2418,16 @@ static void nfs_clone_super(struct super_block *sb,
 * so ourselves when necessary.
 */
sb->s_flags |= SB_POSIXACL;
+   sb->s_time_gran = 1;
+   } else
+   sb->s_time_gran = 1000;
+
+   if (server->nfs_client->rpc_ops->version != 4) {
+   sb->s_time_min = 0;
+   sb->s_time_max = U32_MAX;
+   } else {
+   sb->s_time_min = S64_MIN;
+   sb->s_time_max = S64_MAX;
}
 
nfs_initialise_sb(sb);
-- 
2.17.1



[PATCH v8 11/20] fs: cifs: Initialize filesystem timestamp ranges

2019-08-18 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Also fixed cnvrtDosUnixTm calculations to avoid int overflow
while computing maximum date.

References:

http://cifs.com/

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cifs/d416ff7c-c536-406e-a951-4f04b2fd1d2b

Signed-off-by: Deepa Dinamani 
Cc: sfre...@samba.org
Cc: linux-c...@vger.kernel.org
---
 fs/cifs/cifsfs.c  | 22 ++
 fs/cifs/netmisc.c | 14 +++---
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 3289b566463f..7a75726442ad 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -56,6 +56,15 @@
 #include "dfs_cache.h"
 #endif
 
+/*
+ * DOS dates from 1980/1/1 through 2107/12/31
+ * Protocol specifications indicate the range should be to 119, which
+ * limits maximum year to 2099. But this range has not been checked.
+ */
+#define SMB_DATE_MAX (127<<9 | 12<<5 | 31)
+#define SMB_DATE_MIN (0<<9 | 1<<5 | 1)
+#define SMB_TIME_MAX (23<<11 | 59<<5 | 29)
+
 int cifsFYI = 0;
 bool traceSMB;
 bool enable_oplocks = true;
@@ -142,6 +151,7 @@ cifs_read_super(struct super_block *sb)
struct inode *inode;
struct cifs_sb_info *cifs_sb;
struct cifs_tcon *tcon;
+   struct timespec64 ts;
int rc = 0;
 
cifs_sb = CIFS_SB(sb);
@@ -161,6 +171,18 @@ cifs_read_super(struct super_block *sb)
/* BB FIXME fix time_gran to be larger for LANMAN sessions */
sb->s_time_gran = 100;
 
+   if (tcon->unix_ext) {
+   ts = cifs_NTtimeToUnix(0);
+   sb->s_time_min = ts.tv_sec;
+   ts = cifs_NTtimeToUnix(cpu_to_le64(S64_MAX));
+   sb->s_time_max = ts.tv_sec;
+   } else {
+   ts = cnvrtDosUnixTm(cpu_to_le16(SMB_DATE_MIN), 0, 0);
+   sb->s_time_min = ts.tv_sec;
+   ts = cnvrtDosUnixTm(cpu_to_le16(SMB_DATE_MAX), 
cpu_to_le16(SMB_TIME_MAX), 0);
+   sb->s_time_max = ts.tv_sec;
+   }
+
sb->s_magic = CIFS_MAGIC_NUMBER;
sb->s_op = _super_ops;
sb->s_xattr = cifs_xattr_handlers;
diff --git a/fs/cifs/netmisc.c b/fs/cifs/netmisc.c
index ed92958e842d..49c17ee18254 100644
--- a/fs/cifs/netmisc.c
+++ b/fs/cifs/netmisc.c
@@ -949,8 +949,8 @@ static const int total_days_of_prev_months[] = {
 struct timespec64 cnvrtDosUnixTm(__le16 le_date, __le16 le_time, int offset)
 {
struct timespec64 ts;
-   time64_t sec;
-   int min, days, month, year;
+   time64_t sec, days;
+   int min, day, month, year;
u16 date = le16_to_cpu(le_date);
u16 time = le16_to_cpu(le_time);
SMB_TIME *st = (SMB_TIME *)
@@ -966,15 +966,15 @@ struct timespec64 cnvrtDosUnixTm(__le16 le_date, __le16 
le_time, int offset)
sec += 60 * 60 * st->Hours;
if (st->Hours > 24)
cifs_dbg(VFS, "illegal hours %d\n", st->Hours);
-   days = sd->Day;
+   day = sd->Day;
month = sd->Month;
-   if (days < 1 || days > 31 || month < 1 || month > 12) {
-   cifs_dbg(VFS, "illegal date, month %d day: %d\n", month, days);
-   days = clamp(days, 1, 31);
+   if (day < 1 || day > 31 || month < 1 || month > 12) {
+   cifs_dbg(VFS, "illegal date, month %d day: %d\n", month, day);
+   day = clamp(day, 1, 31);
month = clamp(month, 1, 12);
}
month -= 1;
-   days += total_days_of_prev_months[month];
+   days = day + total_days_of_prev_months[month];
days += 3652; /* account for difference in days between 1980 and 1970 */
year = sd->Year;
days += year * 365;
-- 
2.17.1



[PATCH v8 13/20] fs: affs: Initialize filesystem timestamp ranges

2019-08-18 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Also fix timestamp calculation to avoid overflow
while converting from days to seconds.

Signed-off-by: Deepa Dinamani 
Acked-by: David Sterba 
Cc: dste...@suse.com
---
 fs/affs/amigaffs.c | 2 +-
 fs/affs/amigaffs.h | 3 +++
 fs/affs/inode.c| 4 ++--
 fs/affs/super.c| 4 
 4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/affs/amigaffs.c b/fs/affs/amigaffs.c
index 14a6c1b90c9f..f708c45d5f66 100644
--- a/fs/affs/amigaffs.c
+++ b/fs/affs/amigaffs.c
@@ -375,7 +375,7 @@ affs_secs_to_datestamp(time64_t secs, struct affs_date *ds)
u32  minute;
s32  rem;
 
-   secs -= sys_tz.tz_minuteswest * 60 + ((8 * 365 + 2) * 24 * 60 * 60);
+   secs -= sys_tz.tz_minuteswest * 60 + AFFS_EPOCH_DELTA;
if (secs < 0)
secs = 0;
days= div_s64_rem(secs, 86400, );
diff --git a/fs/affs/amigaffs.h b/fs/affs/amigaffs.h
index f9bef9056659..81fb396d4dfa 100644
--- a/fs/affs/amigaffs.h
+++ b/fs/affs/amigaffs.h
@@ -32,6 +32,9 @@
 
 #define AFFS_ROOT_BMAPS25
 
+/* Seconds since Amiga epoch of 1978/01/01 to UNIX */
+#define AFFS_EPOCH_DELTA ((8 * 365 + 2) * 86400LL)
+
 struct affs_date {
__be32 days;
__be32 mins;
diff --git a/fs/affs/inode.c b/fs/affs/inode.c
index 73598bff8506..a346cf7659f1 100644
--- a/fs/affs/inode.c
+++ b/fs/affs/inode.c
@@ -150,10 +150,10 @@ struct inode *affs_iget(struct super_block *sb, unsigned 
long ino)
}
 
inode->i_mtime.tv_sec = inode->i_atime.tv_sec = inode->i_ctime.tv_sec
-  = (be32_to_cpu(tail->change.days) * (24 * 60 * 60) +
+  = (be32_to_cpu(tail->change.days) * 86400LL +
 be32_to_cpu(tail->change.mins) * 60 +
 be32_to_cpu(tail->change.ticks) / 50 +
-((8 * 365 + 2) * 24 * 60 * 60)) +
+AFFS_EPOCH_DELTA) +
 sys_tz.tz_minuteswest * 60;
inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec = 
inode->i_atime.tv_nsec = 0;
affs_brelse(bh);
diff --git a/fs/affs/super.c b/fs/affs/super.c
index e7d036efbaa1..cc463ae47c12 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -355,6 +355,10 @@ static int affs_fill_super(struct super_block *sb, void 
*data, int silent)
sb->s_op= _sops;
sb->s_flags |= SB_NODIRATIME;
 
+   sb->s_time_gran = NSEC_PER_SEC;
+   sb->s_time_min = sys_tz.tz_minuteswest * 60 + AFFS_EPOCH_DELTA;
+   sb->s_time_max = 86400LL * U32_MAX + 86400 + sb->s_time_min;
+
sbi = kzalloc(sizeof(struct affs_sb_info), GFP_KERNEL);
if (!sbi)
return -ENOMEM;
-- 
2.17.1



[PATCH v8 09/20] ext4: Initialize timestamps limits

2019-08-18 Thread Deepa Dinamani
ext4 has different overflow limits for max filesystem
timestamps based on the extra bytes available.

The timestamp limits are calculated according to the
encoding table in
a4dad1ae24f85i(ext4: Fix handling of extended tv_sec):

* extra  msb of adjust for signed
* epoch  32-bit 32-bit tv_sec to
* bits   timedecoded 64-bit tv_sec  64-bit tv_sec  valid time range
* 0 01-0x8000..-0x0001  0x0   1901-12-13..1969-12-31
* 0 000x0..0x07fff  0x0   1970-01-01..2038-01-19
* 0 110x08000..0x0  0x1   2038-01-19..2106-02-07
* 0 100x1..0x17fff  0x1   2106-02-07..2174-02-25
* 1 010x18000..0x1  0x2   2174-02-25..2242-03-16
* 1 000x2..0x27fff  0x2   2242-03-16..2310-04-04
* 1 110x28000..0x2  0x3   2310-04-04..2378-04-22
* 1 100x3..0x37fff  0x3   2378-04-22..2446-05-10

Note that the time limits are not correct for deletion times.

Added a warn when an inode cannot be extended to incorporate an
extended timestamp.

Signed-off-by: Deepa Dinamani 
Reviewed-by: Andreas Dilger 
Cc: ty...@mit.edu
Cc: adilger.ker...@dilger.ca
Cc: linux-e...@vger.kernel.org
---
 fs/ext4/ext4.h  | 10 +-
 fs/ext4/super.c | 17 +++--
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9c7f4036021b..ae5d0c86aba2 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -832,11 +832,15 @@ static inline void ext4_decode_extra_time(struct 
timespec64 *time,
 
 #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode)  
\
 do {   
\
-   (raw_inode)->xtime = cpu_to_le32((inode)->xtime.tv_sec);
\
if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) 
{\
+   (raw_inode)->xtime = cpu_to_le32((inode)->xtime.tv_sec);
\
(raw_inode)->xtime ## _extra =  
\
ext4_encode_extra_time(&(inode)->xtime);
\
}   
\
+   else{\
+   (raw_inode)->xtime = cpu_to_le32(clamp_t(int32_t, 
(inode)->xtime.tv_sec, S32_MIN, S32_MAX));\
+   ext4_warning_inode(inode, "inode does not support timestamps 
beyond 2038"); \
+   } \
 } while (0)
 
 #define EXT4_EINODE_SET_XTIME(xtime, einode, raw_inode)
   \
@@ -1643,6 +1647,10 @@ static inline bool ext4_verity_in_progress(struct inode 
*inode)
 
 #define EXT4_GOOD_OLD_INODE_SIZE 128
 
+#define EXT4_EXTRA_TIMESTAMP_MAX   (((s64)1 << 34) - 1  + S32_MIN)
+#define EXT4_NON_EXTRA_TIMESTAMP_MAX   S32_MAX
+#define EXT4_TIMESTAMP_MIN S32_MIN
+
 /*
  * Feature set definitions
  */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 27cd622676e7..3db5f17228b7 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4039,8 +4039,21 @@ static int ext4_fill_super(struct super_block *sb, void 
*data, int silent)
   sbi->s_inode_size);
goto failed_mount;
}
-   if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE)
-   sb->s_time_gran = 1 << (EXT4_EPOCH_BITS - 2);
+   /*
+* i_atime_extra is the last extra field available for 
[acm]times in
+* struct ext4_inode. Checking for that field should suffice to 
ensure
+* we have extra space for all three.
+*/
+   if (sbi->s_inode_size >= offsetof(struct ext4_inode, 
i_atime_extra) +
+   sizeof(((struct ext4_inode *)0)->i_atime_extra)) {
+   sb->s_time_gran = 1;
+   sb->s_time_max = EXT4_EXTRA_TIMESTAMP_MAX;
+   } else {
+   sb->s_time_gran = NSEC_PER_SEC;
+   sb->s_time_max = EXT4_NON_EXTRA_TIMESTAMP_MAX;
+   }
+
+   sb->s_time_min = EXT4_TIMESTAMP_MIN;
}
 
sbi->s_desc_size = le16_to_cpu(es->s_desc_size);
-- 
2.17.1



[PATCH v8 03/20] timestamp_truncate: Replace users of timespec64_trunc

2019-08-18 Thread Deepa Dinamani
Update the inode timestamp updates to use timestamp_truncate()
instead of timespec64_trunc().

The change was mostly generated by the following coccinelle
script.

virtual context
virtual patch

@r1 depends on patch forall@
struct inode *inode;
identifier i_xtime =~ "^i_[acm]time$";
expression e;
@@

inode->i_xtime =
- timespec64_trunc(
+ timestamp_truncate(
...,
- e);
+ inode);

Signed-off-by: Deepa Dinamani 
Cc: adrian.hun...@intel.com
Cc: dedeki...@gmail.com
Cc: gre...@linuxfoundation.org
Cc: h...@lst.de
Cc: jaeg...@kernel.org
Cc: jl...@evilplan.org
Cc: rich...@nod.at
Cc: t...@kernel.org
Cc: yuch...@huawei.com
Cc: linux-f2fs-de...@lists.sourceforge.net
Cc: linux-ntfs-...@lists.sourceforge.net
Cc: linux-...@lists.infradead.org
---
 fs/attr.c   | 21 -
 fs/configfs/inode.c | 12 ++--
 fs/f2fs/file.c  | 21 -
 fs/kernfs/inode.c   |  7 +++
 fs/ntfs/inode.c | 21 -
 fs/ubifs/file.c | 21 -
 6 files changed, 57 insertions(+), 46 deletions(-)

diff --git a/fs/attr.c b/fs/attr.c
index 1fcfdcc5b367..97b60ad7f419 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -183,15 +183,18 @@ void setattr_copy(struct inode *inode, const struct iattr 
*attr)
inode->i_uid = attr->ia_uid;
if (ia_valid & ATTR_GID)
inode->i_gid = attr->ia_gid;
-   if (ia_valid & ATTR_ATIME)
-   inode->i_atime = timespec64_trunc(attr->ia_atime,
- inode->i_sb->s_time_gran);
-   if (ia_valid & ATTR_MTIME)
-   inode->i_mtime = timespec64_trunc(attr->ia_mtime,
- inode->i_sb->s_time_gran);
-   if (ia_valid & ATTR_CTIME)
-   inode->i_ctime = timespec64_trunc(attr->ia_ctime,
- inode->i_sb->s_time_gran);
+   if (ia_valid & ATTR_ATIME) {
+   inode->i_atime = timestamp_truncate(attr->ia_atime,
+ inode);
+   }
+   if (ia_valid & ATTR_MTIME) {
+   inode->i_mtime = timestamp_truncate(attr->ia_mtime,
+ inode);
+   }
+   if (ia_valid & ATTR_CTIME) {
+   inode->i_ctime = timestamp_truncate(attr->ia_ctime,
+ inode);
+   }
if (ia_valid & ATTR_MODE) {
umode_t mode = attr->ia_mode;
 
diff --git a/fs/configfs/inode.c b/fs/configfs/inode.c
index ab0284321912..884dcf06cfbe 100644
--- a/fs/configfs/inode.c
+++ b/fs/configfs/inode.c
@@ -76,14 +76,14 @@ int configfs_setattr(struct dentry * dentry, struct iattr * 
iattr)
if (ia_valid & ATTR_GID)
sd_iattr->ia_gid = iattr->ia_gid;
if (ia_valid & ATTR_ATIME)
-   sd_iattr->ia_atime = timespec64_trunc(iattr->ia_atime,
- inode->i_sb->s_time_gran);
+   sd_iattr->ia_atime = timestamp_truncate(iattr->ia_atime,
+ inode);
if (ia_valid & ATTR_MTIME)
-   sd_iattr->ia_mtime = timespec64_trunc(iattr->ia_mtime,
- inode->i_sb->s_time_gran);
+   sd_iattr->ia_mtime = timestamp_truncate(iattr->ia_mtime,
+ inode);
if (ia_valid & ATTR_CTIME)
-   sd_iattr->ia_ctime = timespec64_trunc(iattr->ia_ctime,
- inode->i_sb->s_time_gran);
+   sd_iattr->ia_ctime = timestamp_truncate(iattr->ia_ctime,
+ inode);
if (ia_valid & ATTR_MODE) {
umode_t mode = iattr->ia_mode;
 
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 89a9ee22296d..af8cdd345f3d 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -749,15 +749,18 @@ static void __setattr_copy(struct inode *inode, const 
struct iattr *attr)
inode->i_uid = attr->ia_uid;
if (ia_valid & ATTR_GID)
inode->i_gid = attr->ia_gid;
-   if (ia_valid & ATTR_ATIME)
-   inode->i_atime = timespec64_trunc(attr->ia_atime,
- inode->i_sb->s_time_gran);
-   if (ia_valid & ATTR_MTIME)
-   inode->i_mtime = timespec64_trunc(attr->ia_mtime,
- inode->i_sb->s_time_gran);
-   if (ia_valid & ATTR_CTIME)
-   inode->i_ctime = timespec64_trunc(attr->ia_ctime,
- ino

[PATCH v8 19/20] pstore: fs superblock limits

2019-08-18 Thread Deepa Dinamani
Leaving granularity at 1ns because it is dependent on the specific
attached backing pstore module. ramoops has microsecond resolution.

Fix the readback of ramoops fractional timestamp microseconds,
which has incorrectly been reporting the value as nanoseconds since
3f8f80f0 ("pstore/ram: Read and write to the 'compressed' flag of pstore").

Signed-off-by: Deepa Dinamani 
Acked-by: Kees Cook 
Cc: an...@enomsg.org
Cc: ccr...@android.com
Cc: keesc...@chromium.org
Cc: tony.l...@intel.com
---
 fs/pstore/ram.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index 2bb3468fc93a..8caff834f002 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -144,6 +144,7 @@ static int ramoops_read_kmsg_hdr(char *buffer, struct 
timespec64 *time,
if (sscanf(buffer, RAMOOPS_KERNMSG_HDR "%lld.%lu-%c\n%n",
   (time64_t *)>tv_sec, >tv_nsec, _type,
   _length) == 3) {
+   time->tv_nsec *= 1000;
if (data_type == 'C')
*compressed = true;
else
@@ -151,6 +152,7 @@ static int ramoops_read_kmsg_hdr(char *buffer, struct 
timespec64 *time,
} else if (sscanf(buffer, RAMOOPS_KERNMSG_HDR "%lld.%lu\n%n",
  (time64_t *)>tv_sec, >tv_nsec,
  _length) == 2) {
+   time->tv_nsec *= 1000;
*compressed = false;
} else {
time->tv_sec = 0;
-- 
2.17.1



[PATCH v8 12/20] fs: fat: Initialize filesystem timestamp ranges

2019-08-18 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Some FAT variants indicate that the years after 2099 are not supported.
Since commit 7decd1cb0305 ("fat: Fix and cleanup timestamp conversion")
we support the full range of years that can be represented, up to 2107.

Signed-off-by: Deepa Dinamani 
Cc: hirof...@mail.parknet.co.jp
---
 fs/fat/inode.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 0bc2abc5d453..f27f84e2103f 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -31,6 +31,11 @@
 
 #define KB_IN_SECTORS 2
 
+/* DOS dates from 1980/1/1 through 2107/12/31 */
+#define FAT_DATE_MIN (0<<9 | 1<<5 | 1)
+#define FAT_DATE_MAX (127<<9 | 12<<5 | 31)
+#define FAT_TIME_MAX (23<<11 | 59<<5 | 29)
+
 /*
  * A deserialized copy of the on-disk structure laid out in struct
  * fat_boot_sector.
@@ -1617,6 +1622,7 @@ int fat_fill_super(struct super_block *sb, void *data, 
int silent, int isvfat,
int debug;
long error;
char buf[50];
+   struct timespec64 ts;
 
/*
 * GFP_KERNEL is ok here, because while we do hold the
@@ -1710,6 +1716,12 @@ int fat_fill_super(struct super_block *sb, void *data, 
int silent, int isvfat,
sbi->free_clus_valid = 0;
sbi->prev_free = FAT_START_ENT;
sb->s_maxbytes = 0x;
+   fat_time_fat2unix(sbi, , 0, cpu_to_le16(FAT_DATE_MIN), 0);
+   sb->s_time_min = ts.tv_sec;
+
+   fat_time_fat2unix(sbi, , cpu_to_le16(FAT_TIME_MAX),
+ cpu_to_le16(FAT_DATE_MAX), 0);
+   sb->s_time_max = ts.tv_sec;
 
if (!sbi->fat_length && bpb.fat32_length) {
struct fat_boot_fsinfo *fsinfo;
-- 
2.17.1



[PATCH v8 17/20] fs: hpfs: Initialize filesystem timestamp ranges

2019-08-18 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Also change the local_to_gmt() to use time64_t instead
of time32_t.

Signed-off-by: Deepa Dinamani 
Cc: miku...@artax.karlin.mff.cuni.cz
---
 fs/hpfs/hpfs_fn.h | 6 ++
 fs/hpfs/super.c   | 2 ++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/hpfs/hpfs_fn.h b/fs/hpfs/hpfs_fn.h
index ab2e7cc2ff33..1cca83218fb5 100644
--- a/fs/hpfs/hpfs_fn.h
+++ b/fs/hpfs/hpfs_fn.h
@@ -334,7 +334,7 @@ long hpfs_ioctl(struct file *file, unsigned cmd, unsigned 
long arg);
  * local time (HPFS) to GMT (Unix)
  */
 
-static inline time64_t local_to_gmt(struct super_block *s, time32_t t)
+static inline time64_t local_to_gmt(struct super_block *s, time64_t t)
 {
extern struct timezone sys_tz;
return t + sys_tz.tz_minuteswest * 60 + hpfs_sb(s)->sb_timeshift;
@@ -343,9 +343,7 @@ static inline time64_t local_to_gmt(struct super_block *s, 
time32_t t)
 static inline time32_t gmt_to_local(struct super_block *s, time64_t t)
 {
extern struct timezone sys_tz;
-   t = t - sys_tz.tz_minuteswest * 60 - hpfs_sb(s)->sb_timeshift;
-
-   return clamp_t(time64_t, t, 0, U32_MAX);
+   return t - sys_tz.tz_minuteswest * 60 - hpfs_sb(s)->sb_timeshift;
 }
 
 static inline time32_t local_get_seconds(struct super_block *s)
diff --git a/fs/hpfs/super.c b/fs/hpfs/super.c
index 9db6d84f0d62..0a677a9aaf34 100644
--- a/fs/hpfs/super.c
+++ b/fs/hpfs/super.c
@@ -614,6 +614,8 @@ static int hpfs_fill_super(struct super_block *s, void 
*options, int silent)
s->s_magic = HPFS_SUPER_MAGIC;
s->s_op = _sops;
s->s_d_op = _dentry_operations;
+   s->s_time_min =  local_to_gmt(s, 0);
+   s->s_time_max =  local_to_gmt(s, U32_MAX);
 
sbi->sb_root = le32_to_cpu(superblock->root);
sbi->sb_fs_size = le32_to_cpu(superblock->n_sectors);
-- 
2.17.1



[PATCH v8 14/20] fs: sysv: Initialize filesystem timestamp ranges

2019-08-18 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Signed-off-by: Deepa Dinamani 
Cc: h...@infradead.org
---
 fs/sysv/super.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/sysv/super.c b/fs/sysv/super.c
index d788b1daa7eb..cc8e2ed155c8 100644
--- a/fs/sysv/super.c
+++ b/fs/sysv/super.c
@@ -368,7 +368,8 @@ static int sysv_fill_super(struct super_block *sb, void 
*data, int silent)
sbi->s_block_base = 0;
mutex_init(>s_lock);
sb->s_fs_info = sbi;
-
+   sb->s_time_min = 0;
+   sb->s_time_max = U32_MAX;
sb_set_blocksize(sb, BLOCK_SIZE);
 
for (i = 0; i < ARRAY_SIZE(flavours) && !size; i++) {
@@ -487,6 +488,8 @@ static int v7_fill_super(struct super_block *sb, void 
*data, int silent)
sbi->s_type = FSTYPE_V7;
mutex_init(>s_lock);
sb->s_fs_info = sbi;
+   sb->s_time_min = 0;
+   sb->s_time_max = U32_MAX;

sb_set_blocksize(sb, 512);
 
-- 
2.17.1



[PATCH v8 16/20] fs: orangefs: Initialize filesystem timestamp ranges

2019-08-18 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Assume the limits as unsigned according to the below
commit 98e8eef557a9 ("changed PVFS_time from int64_t to uint64_t")
in https://github.com/waltligon/orangefs

Author: Neill Miller 
Date:   Thu Sep 2 15:00:38 2004 +

Signed-off-by: Deepa Dinamani 
Cc: hub...@omnibond.com
Cc: mar...@omnibond.com
Cc: de...@lists.orangefs.org
---
 fs/orangefs/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index 2811586fafc2..1ec08fe459cf 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -425,6 +425,8 @@ static int orangefs_fill_sb(struct super_block *sb,
sb->s_blocksize = PAGE_SIZE;
sb->s_blocksize_bits = PAGE_SHIFT;
sb->s_maxbytes = MAX_LFS_FILESIZE;
+   sb->s_time_min = 0;
+   sb->s_time_max = S64_MAX;
 
ret = super_setup_bdi(sb);
if (ret)
-- 
2.17.1



[PATCH v8 18/20] fs: omfs: Initialize filesystem timestamp ranges

2019-08-18 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Signed-off-by: Deepa Dinamani 
Acked-by: Bob Copeland 
Cc: m...@bobcopeland.com
Cc: linux-karma-de...@lists.sourceforge.net
---
 fs/omfs/inode.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/fs/omfs/inode.c b/fs/omfs/inode.c
index 08226a835ec3..b76ec6b88ded 100644
--- a/fs/omfs/inode.c
+++ b/fs/omfs/inode.c
@@ -478,6 +478,10 @@ static int omfs_fill_super(struct super_block *sb, void 
*data, int silent)
 
sb->s_maxbytes = 0x;
 
+   sb->s_time_gran = NSEC_PER_MSEC;
+   sb->s_time_min = 0;
+   sb->s_time_max = U64_MAX / MSEC_PER_SEC;
+
sb_set_blocksize(sb, 0x200);
 
bh = sb_bread(sb, 0);
-- 
2.17.1



[PATCH v8 20/20] isofs: Initialize filesystem timestamp ranges

2019-08-18 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Reference: http://www.ecma-international.org/publications/standards/Ecma-119.htm

Signed-off-by: Deepa Dinamani 
---
 fs/isofs/inode.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 9e30d8703735..62c0462dc89f 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -30,6 +30,9 @@
 #include "isofs.h"
 #include "zisofs.h"
 
+/* max tz offset is 13 hours */
+#define MAX_TZ_OFFSET (52*15*60)
+
 #define BEQUIET
 
 static int isofs_hashi(const struct dentry *parent, struct qstr *qstr);
@@ -801,6 +804,10 @@ static int isofs_fill_super(struct super_block *s, void 
*data, int silent)
 */
s->s_maxbytes = 0x800LL;
 
+   /* ECMA-119 timestamp from 1900/1/1 with tz offset */
+   s->s_time_min = mktime64(1900, 1, 1, 0, 0, 0) - MAX_TZ_OFFSET;
+   s->s_time_max = mktime64(U8_MAX+1900, 12, 31, 23, 59, 59) + 
MAX_TZ_OFFSET;
+
/* Set this for reference. Its not currently used except on write
   which we don't have .. */
 
-- 
2.17.1



[PATCH v8 15/20] fs: ceph: Initialize filesystem timestamp ranges

2019-08-18 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

According to the disscussion in
https://patchwork.kernel.org/patch/8308691/ we agreed to use
unsigned 32 bit timestamps on ceph.
Update the limits accordingly.

Signed-off-by: Deepa Dinamani 
Cc: z...@redhat.com
Cc: s...@redhat.com
Cc: idryo...@gmail.com
Cc: ceph-de...@vger.kernel.org
---
 fs/ceph/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 4406ec7018bb..2aa052b7bda7 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -907,6 +907,8 @@ static int ceph_set_super(struct super_block *s, struct 
fs_context *fc)
s->s_export_op = _export_ops;
 
s->s_time_gran = 1;
+   s->s_time_min = 0;
+   s->s_time_max = U32_MAX;
 
ret = set_anon_super_fc(s, fc);
if (ret != 0)
-- 
2.17.1



[PATCH v8 08/20] adfs: Fill in max and min timestamps in sb

2019-08-18 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Note that the min timestamp is assumed to be
01 Jan 1970 00:00:00 (Unix epoch). This is consistent
with the way we convert timestamps in adfs_adfs2unix_time().

Signed-off-by: Deepa Dinamani 
---
 fs/adfs/adfs.h  | 13 +
 fs/adfs/inode.c |  8 ++--
 fs/adfs/super.c |  2 ++
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/adfs/adfs.h b/fs/adfs/adfs.h
index b7e844d2f321..dca8b23aa43f 100644
--- a/fs/adfs/adfs.h
+++ b/fs/adfs/adfs.h
@@ -3,6 +3,19 @@
 #include 
 #include 
 
+/*
+ * 01 Jan 1970 00:00:00 (Unix epoch) as seconds since
+ * 01 Jan 1900 00:00:00 (RISC OS epoch)
+ */
+#define RISC_OS_EPOCH_DELTA 2208988800LL
+
+/*
+ * Convert 40 bit centi seconds to seconds
+ * since 01 Jan 1900 00:00:00 (RISC OS epoch)
+ * The result is 2248-06-03 06:57:57 GMT
+ */
+#define ADFS_MAX_TIMESTAMP ((0xFFLL / 100) - RISC_OS_EPOCH_DELTA)
+
 /* Internal data structures for ADFS */
 
 #define ADFS_FREE_FRAG  0
diff --git a/fs/adfs/inode.c b/fs/adfs/inode.c
index 124de75413a5..41eca1c451dc 100644
--- a/fs/adfs/inode.c
+++ b/fs/adfs/inode.c
@@ -167,11 +167,7 @@ static void
 adfs_adfs2unix_time(struct timespec64 *tv, struct inode *inode)
 {
unsigned int high, low;
-   /* 01 Jan 1970 00:00:00 (Unix epoch) as nanoseconds since
-* 01 Jan 1900 00:00:00 (RISC OS epoch)
-*/
-   static const s64 nsec_unix_epoch_diff_risc_os_epoch =
-   22089888000LL;
+   static const s64 nsec_unix_epoch_diff_risc_os_epoch = 
RISC_OS_EPOCH_DELTA * NSEC_PER_SEC;
s64 nsec;
 
if (!adfs_inode_is_stamped(inode))
@@ -216,7 +212,7 @@ adfs_unix2adfs_time(struct inode *inode, unsigned int secs)
if (adfs_inode_is_stamped(inode)) {
/* convert 32-bit seconds to 40-bit centi-seconds */
low  = (secs & 255) * 100;
-   high = (secs / 256) * 100 + (low >> 8) + 0x336e996a;
+   high = (secs / 256) * 100 + (low >> 8) + 
(RISC_OS_EPOCH_DELTA*100/256);
 
ADFS_I(inode)->loadaddr = (high >> 24) |
(ADFS_I(inode)->loadaddr & ~0xff);
diff --git a/fs/adfs/super.c b/fs/adfs/super.c
index 65b04ebb51c3..f074fe7d7158 100644
--- a/fs/adfs/super.c
+++ b/fs/adfs/super.c
@@ -463,6 +463,8 @@ static int adfs_fill_super(struct super_block *sb, void 
*data, int silent)
asb->s_map_size = dr->nzones | (dr->nzones_high << 8);
asb->s_map2blk  = dr->log2bpmb - dr->log2secsize;
asb->s_log2sharesize= dr->log2sharesize;
+   sb->s_time_min  = 0;
+   sb->s_time_max  = ADFS_MAX_TIMESTAMP;
 
asb->s_map = adfs_read_map(sb, dr);
if (IS_ERR(asb->s_map)) {
-- 
2.17.1



[PATCH v8 02/20] vfs: Add timestamp_truncate() api

2019-08-18 Thread Deepa Dinamani
timespec_trunc() function is used to truncate a
filesystem timestamp to the right granularity.
But, the function does not clamp tv_sec part of the
timestamps according to the filesystem timestamp limits.

The replacement api: timestamp_truncate() also alters the
signature of the function to accommodate filesystem
timestamp clamping according to flesystem limits.

Note that the tv_nsec part is set to 0 if tv_sec is not within
the range supported for the filesystem.

Signed-off-by: Deepa Dinamani 
---
 fs/inode.c | 33 -
 include/linux/fs.h |  2 ++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index ef33fdf0105f..fef457a42882 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2169,6 +2169,37 @@ struct timespec64 timespec64_trunc(struct timespec64 t, 
unsigned gran)
 }
 EXPORT_SYMBOL(timespec64_trunc);
 
+/**
+ * timestamp_truncate - Truncate timespec to a granularity
+ * @t: Timespec
+ * @inode: inode being updated
+ *
+ * Truncate a timespec to the granularity supported by the fs
+ * containing the inode. Always rounds down. gran must
+ * not be 0 nor greater than a second (NSEC_PER_SEC, or 10^9 ns).
+ */
+struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode)
+{
+   struct super_block *sb = inode->i_sb;
+   unsigned int gran = sb->s_time_gran;
+
+   t.tv_sec = clamp(t.tv_sec, sb->s_time_min, sb->s_time_max);
+   if (unlikely(t.tv_sec == sb->s_time_max || t.tv_sec == sb->s_time_min))
+   t.tv_nsec = 0;
+
+   /* Avoid division in the common cases 1 ns and 1 s. */
+   if (gran == 1)
+   ; /* nothing */
+   else if (gran == NSEC_PER_SEC)
+   t.tv_nsec = 0;
+   else if (gran > 1 && gran < NSEC_PER_SEC)
+   t.tv_nsec -= t.tv_nsec % gran;
+   else
+   WARN(1, "invalid file time granularity: %u", gran);
+   return t;
+}
+EXPORT_SYMBOL(timestamp_truncate);
+
 /**
  * current_time - Return FS time
  * @inode: inode.
@@ -2190,7 +2221,7 @@ struct timespec64 current_time(struct inode *inode)
return now;
}
 
-   return timespec64_trunc(now, inode->i_sb->s_time_gran);
+   return timestamp_truncate(now, inode);
 }
 EXPORT_SYMBOL(current_time);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 93c440d22547..1170d8260aa2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -737,6 +737,8 @@ struct inode {
void*i_private; /* fs or device private pointer */
 } __randomize_layout;
 
+struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode);
+
 static inline unsigned int i_blocksize(const struct inode *node)
 {
return (1 << node->i_blkbits);
-- 
2.17.1



[PATCH v8 05/20] utimes: Clamp the timestamps before update

2019-08-18 Thread Deepa Dinamani
POSIX is ambiguous on the behavior of timestamps for
futimens, utimensat and utimes. Whether to return an
error or silently clamp a timestamp beyond the range
supported by the underlying filesystems is not clear.

POSIX.1 section for futimens, utimensat and utimes says:
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html)

The file's relevant timestamp shall be set to the greatest
value supported by the file system that is not greater
than the specified time.

If the tv_nsec field of a timespec structure has the special
value UTIME_NOW, the file's relevant timestamp shall be set
to the greatest value supported by the file system that is
not greater than the current time.

[EINVAL]
A new file timestamp would be a value whose tv_sec
component is not a value supported by the file system.

The patch chooses to clamp the timestamps according to the
filesystem timestamp ranges and does not return an error.
This is in line with the behavior of utime syscall also
since the POSIX 
page(http://pubs.opengroup.org/onlinepubs/009695399/functions/utime.html)
for utime does not mention returning an error or clamping like above.

Same for utimes 
http://pubs.opengroup.org/onlinepubs/009695399/functions/utimes.html

Signed-off-by: Deepa Dinamani 
---
 fs/utimes.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/utimes.c b/fs/utimes.c
index 350c9c16ace1..1ba3f7883870 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -36,16 +36,14 @@ static int utimes_common(const struct path *path, struct 
timespec64 *times)
if (times[0].tv_nsec == UTIME_OMIT)
newattrs.ia_valid &= ~ATTR_ATIME;
else if (times[0].tv_nsec != UTIME_NOW) {
-   newattrs.ia_atime.tv_sec = times[0].tv_sec;
-   newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
+   newattrs.ia_atime = timestamp_truncate(times[0], inode);
newattrs.ia_valid |= ATTR_ATIME_SET;
}
 
if (times[1].tv_nsec == UTIME_OMIT)
newattrs.ia_valid &= ~ATTR_MTIME;
else if (times[1].tv_nsec != UTIME_NOW) {
-   newattrs.ia_mtime.tv_sec = times[1].tv_sec;
-   newattrs.ia_mtime.tv_nsec = times[1].tv_nsec;
+   newattrs.ia_mtime = timestamp_truncate(times[1], inode);
newattrs.ia_valid |= ATTR_MTIME_SET;
}
/*
-- 
2.17.1



[PATCH v8 04/20] mount: Add mount warning for impending timestamp expiry

2019-08-18 Thread Deepa Dinamani
The warning reuses the uptime max of 30 years used by
settimeofday().

Note that the warning is only emitted for writable filesystem mounts
through the mount syscall. Automounts do not have the same warning.

Print out the warning in human readable format using the struct tm.
After discussion with Arnd Bergmann, we chose to print only the year number.
The raw s_time_max is also displayed, and the user can easily decode
it e.g. "date -u -d @$((0x7fff))". We did not want to consolidate
struct rtc_tm and struct tm just to print the date using a format specifier
as part of this series.
Given that the rtc_tm is not compiled on all architectures, this is not a
trivial patch. This can be added in the future.

Signed-off-by: Deepa Dinamani 
---
 fs/namespace.c | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index bfcb4e341257..7fcf289593c5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2468,6 +2468,26 @@ static void set_mount_attributes(struct mount *mnt, 
unsigned int mnt_flags)
unlock_mount_hash();
 }
 
+static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct vfsmount 
*mnt)
+{
+   struct super_block *sb = mnt->mnt_sb;
+
+   if (!__mnt_is_readonly(mnt) &&
+  (ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > sb->s_time_max)) {
+   char *buf = (char *)__get_free_page(GFP_KERNEL);
+   char *mntpath = buf ? d_path(mountpoint, buf, PAGE_SIZE) : 
ERR_PTR(-ENOMEM);
+   struct tm tm;
+
+   time64_to_tm(sb->s_time_max, 0, );
+
+   pr_warn("Mounted %s file system at %s supports timestamps until 
%04ld (0x%llx)\n",
+   sb->s_type->name, mntpath,
+   tm.tm_year+1900, (unsigned long long)sb->s_time_max);
+
+   free_page((unsigned long)buf);
+   }
+}
+
 /*
  * Handle reconfiguration of the mountpoint only without alteration of the
  * superblock it refers to.  This is triggered by specifying MS_REMOUNT|MS_BIND
@@ -2493,6 +2513,9 @@ static int do_reconfigure_mnt(struct path *path, unsigned 
int mnt_flags)
if (ret == 0)
set_mount_attributes(mnt, mnt_flags);
up_write(>s_umount);
+
+   mnt_warn_timestamp_expiry(path, >mnt);
+
return ret;
 }
 
@@ -2533,6 +2556,9 @@ static int do_remount(struct path *path, int ms_flags, 
int sb_flags,
}
up_write(>s_umount);
}
+
+   mnt_warn_timestamp_expiry(path, >mnt);
+
put_fs_context(fc);
return err;
 }
@@ -2741,8 +2767,13 @@ static int do_new_mount_fc(struct fs_context *fc, struct 
path *mountpoint,
return PTR_ERR(mnt);
 
error = do_add_mount(real_mount(mnt), mountpoint, mnt_flags);
-   if (error < 0)
+   if (error < 0) {
mntput(mnt);
+   return error;
+   }
+
+   mnt_warn_timestamp_expiry(mountpoint, mnt);
+
return error;
 }
 
-- 
2.17.1



[PATCH v8 01/20] vfs: Add file timestamp range support

2019-08-18 Thread Deepa Dinamani
Add fields to the superblock to track the min and max
timestamps supported by filesystems.

Initially, when a superblock is allocated, initialize
it to the max and min values the fields can hold.
Individual filesystems override these to match their
actual limits.

Pseudo filesystems are assumed to always support the
min and max allowable values for the fields.

Signed-off-by: Deepa Dinamani 
---
 fs/super.c | 2 ++
 include/linux/fs.h | 3 +++
 include/linux/time64.h | 2 ++
 3 files changed, 7 insertions(+)

diff --git a/fs/super.c b/fs/super.c
index 36cb5aaf6f08..620c1911bb36 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -258,6 +258,8 @@ static struct super_block *alloc_super(struct 
file_system_type *type, int flags,
s->s_maxbytes = MAX_NON_LFS;
s->s_op = _op;
s->s_time_gran = 10;
+   s->s_time_min = TIME64_MIN;
+   s->s_time_max = TIME64_MAX;
s->cleancache_poolid = CLEANCACHE_NO_POOL;
 
s->s_shrink.seeks = DEFAULT_SEEKS;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d97d74f35eb3..93c440d22547 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1463,6 +1463,9 @@ struct super_block {
 
/* Granularity of c/m/atime in ns (cannot be worse than a second) */
u32 s_time_gran;
+   /* Time limits for c/m/atime in seconds */
+   time64_t   s_time_min;
+   time64_t   s_time_max;
 #ifdef CONFIG_FSNOTIFY
__u32   s_fsnotify_mask;
struct fsnotify_mark_connector __rcu*s_fsnotify_marks;
diff --git a/include/linux/time64.h b/include/linux/time64.h
index a620ee610b9f..19125489ae94 100644
--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -30,6 +30,8 @@ struct itimerspec64 {
 
 /* Located here for timespec[64]_valid_strict */
 #define TIME64_MAX ((s64)~((u64)1 << 63))
+#define TIME64_MIN (-TIME64_MAX - 1)
+
 #define KTIME_MAX  ((s64)~((u64)1 << 63))
 #define KTIME_SEC_MAX  (KTIME_MAX / NSEC_PER_SEC)
 
-- 
2.17.1



Re: [PATCH 19/20] pstore: fs superblock limits

2019-08-18 Thread Deepa Dinamani
On Fri, Aug 2, 2019 at 12:15 AM Arnd Bergmann  wrote:
>
> On Fri, Aug 2, 2019 at 4:26 AM Deepa Dinamani  wrote:
> >
> > On Tue, Jul 30, 2019 at 12:36 AM Arnd Bergmann  wrote:
> > >
> > > On Tue, Jul 30, 2019 at 6:31 AM Kees Cook  wrote:
> > > >
> > > > On Mon, Jul 29, 2019 at 06:49:23PM -0700, Deepa Dinamani wrote:
> > > > > Also update the gran since pstore has microsecond granularity.
> > > >
> > > > So, I'm fine with this, but technically the granularity depends on the
> > > > backend storage... many have no actual time keeping, though. My point 
> > > > is,
> > > > pstore's timestamps are really mostly a lie, but the most common backend
> > > > (ramoops) is seconds-granularity.
> > > >
> > > > So, I'm fine with this, but it's a lie but it's a lie that doesn't
> > > > matter, so ...
> > > >
> > > > Acked-by: Kees Cook 
> > > >
> > > > I'm open to suggestions to improve it...
> > >
> > > If we don't care about using sub-second granularity, then setting it
> > > to one second unconditionally here will make it always use that and
> > > report it correctly.
> >
> > Should this printf in ramoops_write_kmsg_hdr() also be fixed then?
> >
> > RAMOOPS_KERNMSG_HDR "%lld.%06lu-%c\n",
> > (time64_t)record->time.tv_sec,
> > record->time.tv_nsec / 1000,
> > record->compressed ? 'C' : 'D');
> > persistent_ram_write(prz, hdr, len);
> >
> > ramoops_read_kmsg_hdr() doesn't read this as microseconds. Seems like
> > a mismatch from above.
>
> Good catch. This seems to go back to commit 3f8f80f0cfeb ("pstore/ram:
> Read and write to the 'compressed' flag of pstore"), which introduced the
> nanosecond read. The write function however has always used
> microseconds, and that was kept when the implementation changed
> from timeval to timespec in commit 1e817fb62cd1 ("time: create
> __getnstimeofday for WARNless calls").
>
> > If we want to agree that we just want seconds granularity for pstore,
> > we could replace the tv_nsec part to be all 0's if anybody else is
> > depending on this format.
> > I could drop this patch from the series and post that patch seperately.
>
> We should definitely fix it to not produce a bogus nanosecond value.
> Whether using full seconds or microsecond resolution is better here,
> I don't know. It seems that pstore records generally get created
> with a nanosecond nanosecond accurate timestamp from
> ktime_get_real_fast_ns() and then truncated to the resolution of the
> backend, rather than the normal jiffies-accurate inode timestamps that
> we have for regular file systems.
>
> This might mean that we do want the highest possible resolution
> and not further truncate here, in case that information ends
> up being useful afterwards.

I made a list of granularities used by pstore drivers using pstore_register():

1. efi - seconds
2. ramoops - microsecond
3. erst - seconds
4. powerpc/nvram64 - seconds

I will leave pstore granularity at nanoseconds and fix the ramoops read.

-Deepa


Re: [Y2038] [PATCH 04/20] mount: Add mount warning for impending timestamp expiry

2019-08-12 Thread Deepa Dinamani
On Mon, Aug 12, 2019 at 9:09 AM Deepa Dinamani  wrote:
>
> On Mon, Aug 12, 2019 at 7:11 AM Arnd Bergmann  wrote:
> >
> > On Mon, Aug 12, 2019 at 3:25 PM Ben Hutchings
> >  wrote:
> > > On Sat, 2019-08-10 at 13:44 -0700, Deepa Dinamani wrote:
> > > > On Mon, Aug 5, 2019 at 7:14 AM Ben Hutchings
> > > >  wrote:
> > > > > On Mon, 2019-07-29 at 18:49 -0700, Deepa Dinamani wrote:
> > > > > > The warning reuses the uptime max of 30 years used by the
> > > > > > setitimeofday().
> > > > > >
> > > > > > Note that the warning is only added for new filesystem mounts
> > > > > > through the mount syscall. Automounts do not have the same warning.
> > > > > [...]
> > > > >
> > > > > Another thing - perhaps this warning should be suppressed for 
> > > > > read-only
> > > > > mounts?
> > > >
> > > > Many filesystems support read only mounts only. We do fill in right
> > > > granularities and limits for these filesystems as well. In keeping
> > > > with the trend, I have added the warning accordingly. I don't think I
> > > > have a preference either way. But, not warning for the red only mounts
> > > > adds another if case. If you have a strong preference, I could add it
> > > > in.
> > >
> > > It seems to me that the warning is needed if there is a possibility of
> > > data loss (incorrect timestamps, potentially leading to incorrect
> > > decisions about which files are newer).  This can happen only when a
> > > filesystem is mounted read-write, or when a filesystem image is
> > > created.
> > >
> > > I think that warning for read-only mounts would be an annoyance to
> > > users retrieving files from old filesystems.
> >
> > I agree, the warning is not helpful for read-only mounts. An earlier
> > plan was to completely disallow writable mounts that might risk an
> > overflow (in some configurations at least). The warning replaces that
> > now, and I think it should also just warn for the cases that would
> > otherwise have been dangerous.
>
> Ok, I will make the change to exclude new read only mounts. I will use
> __mnt_is_readonly() so that it also exculdes filesystems that are
> readonly also.
> The diff looks like below:
>
> -   if (!error && sb->s_time_max &&
> +   if (!error && !__mnt_is_readonly(mnt) &&
> (ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > 
> sb->s_time_max)) {
>
> Note that we can get rid of checking for non zero sb->s_time_max now.

One more thing, we will probably have to add a second warning for when
the filesystem is re-mounted rw after the initial readonly mount.

-Deepa


Re: [Y2038] [PATCH 04/20] mount: Add mount warning for impending timestamp expiry

2019-08-12 Thread Deepa Dinamani
On Mon, Aug 12, 2019 at 7:11 AM Arnd Bergmann  wrote:
>
> On Mon, Aug 12, 2019 at 3:25 PM Ben Hutchings
>  wrote:
> > On Sat, 2019-08-10 at 13:44 -0700, Deepa Dinamani wrote:
> > > On Mon, Aug 5, 2019 at 7:14 AM Ben Hutchings
> > >  wrote:
> > > > On Mon, 2019-07-29 at 18:49 -0700, Deepa Dinamani wrote:
> > > > > The warning reuses the uptime max of 30 years used by the
> > > > > setitimeofday().
> > > > >
> > > > > Note that the warning is only added for new filesystem mounts
> > > > > through the mount syscall. Automounts do not have the same warning.
> > > > [...]
> > > >
> > > > Another thing - perhaps this warning should be suppressed for read-only
> > > > mounts?
> > >
> > > Many filesystems support read only mounts only. We do fill in right
> > > granularities and limits for these filesystems as well. In keeping
> > > with the trend, I have added the warning accordingly. I don't think I
> > > have a preference either way. But, not warning for the red only mounts
> > > adds another if case. If you have a strong preference, I could add it
> > > in.
> >
> > It seems to me that the warning is needed if there is a possibility of
> > data loss (incorrect timestamps, potentially leading to incorrect
> > decisions about which files are newer).  This can happen only when a
> > filesystem is mounted read-write, or when a filesystem image is
> > created.
> >
> > I think that warning for read-only mounts would be an annoyance to
> > users retrieving files from old filesystems.
>
> I agree, the warning is not helpful for read-only mounts. An earlier
> plan was to completely disallow writable mounts that might risk an
> overflow (in some configurations at least). The warning replaces that
> now, and I think it should also just warn for the cases that would
> otherwise have been dangerous.

Ok, I will make the change to exclude new read only mounts. I will use
__mnt_is_readonly() so that it also exculdes filesystems that are
readonly also.
The diff looks like below:

-   if (!error && sb->s_time_max &&
+   if (!error && !__mnt_is_readonly(mnt) &&
(ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > sb->s_time_max)) {

Note that we can get rid of checking for non zero sb->s_time_max now.

-Deepa


Re: [Y2038] [PATCH 04/20] mount: Add mount warning for impending timestamp expiry

2019-08-10 Thread Deepa Dinamani
> This doesn't seem like a helpful way to log the time.  Maybe use
> time64_to_tm() to convert to "broken down" time and then print it with
> "%ptR"... but that wants struct rtc_time.  If you apply the attached
> patch, however, you should then be able to print struct tm with
> "%ptT".

OK. Will print a more user friendly date string here.

Thanks,
-Deepa


Re: [Y2038] [PATCH 04/20] mount: Add mount warning for impending timestamp expiry

2019-08-10 Thread Deepa Dinamani
On Mon, Aug 5, 2019 at 7:14 AM Ben Hutchings
 wrote:
>
> On Mon, 2019-07-29 at 18:49 -0700, Deepa Dinamani wrote:
> > The warning reuses the uptime max of 30 years used by the
> > setitimeofday().
> >
> > Note that the warning is only added for new filesystem mounts
> > through the mount syscall. Automounts do not have the same warning.
> [...]
>
> Another thing - perhaps this warning should be suppressed for read-only
> mounts?

Many filesystems support read only mounts only. We do fill in right
granularities and limits for these filesystems as well. In keeping
with the trend, I have added the warning accordingly. I don't think I
have a preference either way. But, not warning for the red only mounts
adds another if case. If you have a strong preference, I could add it
in.

-Deepa


Re: [Y2038] [PATCH 05/20] utimes: Clamp the timestamps before update

2019-08-10 Thread Deepa Dinamani
On Mon, Aug 5, 2019 at 6:30 AM Ben Hutchings
 wrote:
>
> On Mon, 2019-07-29 at 18:49 -0700, Deepa Dinamani wrote:
> > POSIX is ambiguous on the behavior of timestamps for
> > futimens, utimensat and utimes. Whether to return an
> > error or silently clamp a timestamp beyond the range
> > supported by the underlying filesystems is not clear.
> >
> > POSIX.1 section for futimens, utimensat and utimes says:
> > (http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html)
> >
> > The file's relevant timestamp shall be set to the greatest
> > value supported by the file system that is not greater
> > than the specified time.
> >
> > If the tv_nsec field of a timespec structure has the special
> > value UTIME_NOW, the file's relevant timestamp shall be set
> > to the greatest value supported by the file system that is
> > not greater than the current time.
> >
> > [EINVAL]
> > A new file timestamp would be a value whose tv_sec
> > component is not a value supported by the file system.
> >
> > The patch chooses to clamp the timestamps according to the
> > filesystem timestamp ranges and does not return an error.
> > This is in line with the behavior of utime syscall also
> > since the POSIX 
> > page(http://pubs.opengroup.org/onlinepubs/009695399/functions/utime.html)
> > for utime does not mention returning an error or clamping like above.
> >
> > Same for utimes 
> > http://pubs.opengroup.org/onlinepubs/009695399/functions/utimes.html
> >
> > Signed-off-by: Deepa Dinamani 
> > ---
> >  fs/utimes.c | 17 +
> >  1 file changed, 13 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/utimes.c b/fs/utimes.c
> > index 350c9c16ace1..4c1a2ce90bbc 100644
> > --- a/fs/utimes.c
> > +++ b/fs/utimes.c
> > @@ -21,6 +21,7 @@ static int utimes_common(const struct path *path, struct 
> > timespec64 *times)
> >   int error;
> >   struct iattr newattrs;
> >   struct inode *inode = path->dentry->d_inode;
> > + struct super_block *sb = inode->i_sb;
> >   struct inode *delegated_inode = NULL;
> >
> >   error = mnt_want_write(path->mnt);
> > @@ -36,16 +37,24 @@ static int utimes_common(const struct path *path, 
> > struct timespec64 *times)
> >   if (times[0].tv_nsec == UTIME_OMIT)
> >   newattrs.ia_valid &= ~ATTR_ATIME;
> >   else if (times[0].tv_nsec != UTIME_NOW) {
> > - newattrs.ia_atime.tv_sec = times[0].tv_sec;
> > - newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
> > + newattrs.ia_atime.tv_sec =
> > + clamp(times[0].tv_sec, sb->s_time_min, 
> > sb->s_time_max);
> > + if (times[0].tv_sec == sb->s_time_max || 
> > times[0].tv_sec == sb->s_time_min)
>
> This is testing the un-clamped value.
>
> > + newattrs.ia_atime.tv_nsec = 0;
> > + else
> > + newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
> >   newattrs.ia_valid |= ATTR_ATIME_SET;
> >   }
> >
> >   if (times[1].tv_nsec == UTIME_OMIT)
> >   newattrs.ia_valid &= ~ATTR_MTIME;
> >   else if (times[1].tv_nsec != UTIME_NOW) {
> > - newattrs.ia_mtime.tv_sec = times[1].tv_sec;
> > - newattrs.ia_mtime.tv_nsec = times[1].tv_nsec;
> > + newattrs.ia_mtime.tv_sec =
> > + clamp(times[1].tv_sec, sb->s_time_min, 
> > sb->s_time_max);
> > + if (times[1].tv_sec >= sb->s_time_max || 
> > times[1].tv_sec == sb->s_time_min)
>
> Similarly here, for the minimum.
>
> I suggest testing for clamping like this:
>
> if (newattrs.ia_atime.tv_sec != times[0].tv_sec)
> ...
> if (newattrs.ia_mtime.tv_sec != times[1].tv_sec)
> ...
>
> Ben.
>
> > + newattrs.ia_mtime.tv_nsec = 0;
> > + else
> > + newattrs.ia_mtime.tv_nsec = times[1].tv_nsec;
> >   newattrs.ia_valid |= ATTR_MTIME_SET;
> >   }

Darrick pointed out that maybe we could use timestamp_truncate() here to clamp.
I think it is ok to truncate to the right granularity also here.
setattr callbacks do it already. So the diff here looks like below:

-   newattrs.ia_atime.tv_sec =
-   clamp(times[0].tv_sec, sb->s_time_min,
sb->s_time_max);
-   if (times[0].tv_sec == sb->s_time_max ||
times[0].tv_sec == sb->s_time_min)
-   newattrs.ia_atime.tv_nsec = 0;
-   else
-   newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
+   newattrs.ia_atime = timestamp_truncate(times[0], inode);

Thanks,
Deepa


Re: [PATCH 09/20] ext4: Initialize timestamps limits

2019-08-08 Thread Deepa Dinamani
> Rather than printing a warning at mount time (which may be confusing
> to users for a problem they may never see), it makes sense to only
> print such a warning in the vanishingly small case that someone actually
> tries to modify the inode timestamp but it doesn't fit, rather than on
> the theoretical case that may never happen.

Arnd and I were discussing and we came to a similar conclusion that we
would not warn at mount. Arnd suggested maybe we include a
pr_warn_ratelimited() when we do write to these special inodes.

In general, there is an agreement to leave the fs granularity to a
higher resolution for such super blocks. And hence, the timestamps for
these special cases will never be clamped in memory.(Assuming we do
not want to make more changes and try to do something like
__ext4_expand_extra_isize() for in memory inode updates)
We could easily try to clamp these timestamps when they get written
out to the disk by modifying the EXT4_INODE_SET_XTIME to include such
a clamp. And, at this point we could also warn.

If this is acceptable, I could post an update.

-Deepa


Re: [PATCH 19/20] pstore: fs superblock limits

2019-08-01 Thread Deepa Dinamani
On Tue, Jul 30, 2019 at 12:36 AM Arnd Bergmann  wrote:
>
> On Tue, Jul 30, 2019 at 6:31 AM Kees Cook  wrote:
> >
> > On Mon, Jul 29, 2019 at 06:49:23PM -0700, Deepa Dinamani wrote:
> > > Also update the gran since pstore has microsecond granularity.
> >
> > So, I'm fine with this, but technically the granularity depends on the
> > backend storage... many have no actual time keeping, though. My point is,
> > pstore's timestamps are really mostly a lie, but the most common backend
> > (ramoops) is seconds-granularity.
> >
> > So, I'm fine with this, but it's a lie but it's a lie that doesn't
> > matter, so ...
> >
> > Acked-by: Kees Cook 
> >
> > I'm open to suggestions to improve it...
>
> If we don't care about using sub-second granularity, then setting it
> to one second unconditionally here will make it always use that and
> report it correctly.

Should this printf in ramoops_write_kmsg_hdr() also be fixed then?

RAMOOPS_KERNMSG_HDR "%lld.%06lu-%c\n",
(time64_t)record->time.tv_sec,
record->time.tv_nsec / 1000,
record->compressed ? 'C' : 'D');
persistent_ram_write(prz, hdr, len);

ramoops_read_kmsg_hdr() doesn't read this as microseconds. Seems like
a mismatch from above.

If we want to agree that we just want seconds granularity for pstore,
we could replace the tv_nsec part to be all 0's if anybody else is
depending on this format.
I could drop this patch from the series and post that patch seperately.

Thanks,
-Deepa


Re: [PATCH 09/20] ext4: Initialize timestamps limits

2019-08-01 Thread Deepa Dinamani
On Wed, Jul 31, 2019 at 8:26 AM Darrick J. Wong  wrote:
>
> On Mon, Jul 29, 2019 at 06:49:13PM -0700, Deepa Dinamani wrote:
> > ext4 has different overflow limits for max filesystem
> > timestamps based on the extra bytes available.
> >
> > The timestamp limits are calculated according to the
> > encoding table in
> > a4dad1ae24f85i(ext4: Fix handling of extended tv_sec):
> >
> > * extra  msb of adjust for signed
> > * epoch  32-bit 32-bit tv_sec to
> > * bits   timedecoded 64-bit tv_sec  64-bit tv_sec  valid time range
> > * 0 01-0x8000..-0x0001  0x0   1901-12-13..1969-12-31
> > * 0 000x0..0x07fff  0x0   1970-01-01..2038-01-19
> > * 0 110x08000..0x0  0x1   2038-01-19..2106-02-07
> > * 0 100x1..0x17fff  0x1   2106-02-07..2174-02-25
> > * 1 010x18000..0x1  0x2   2174-02-25..2242-03-16
> > * 1 000x2..0x27fff  0x2   2242-03-16..2310-04-04
> > * 1 110x28000..0x2  0x3   2310-04-04..2378-04-22
> > * 1 100x3..0x37fff  0x3   2378-04-22..2446-05-10
>
> My recollection of ext4 has gotten rusty, so this could be a bogus
> question:
>
> Say you have a filesystem with s_inode_size > 128 where not all of the
> ondisk inodes have been upgraded to i_extra_isize > 0 and therefore
> don't support nanoseconds or times beyond 2038.  I think this happens on
> ext3 filesystems that reserved extra space for inode attrs that are
> subsequently converted to ext4?
>
> In any case, that means that you have some inodes that support 34-bit
> tv_sec and some inodes that only support 32-bit tv_sec.  For the inodes
> with 32-bit tv_sec, I think all that happens is that a large timestamp
> will be truncated further, right?
>
> And no mount time warning because at least /some/ of the inodes are
> ready to go for the next 30 years?

I'm confused about ext3 being converted to ext4. If the converted
inodes have extra space, then ext4_iget() will start using the extra
space when it modifies the on disk inode, won't it?

But, if there are 32 bit tv_sec and 34 bit tv_sec in a superblock then
from this macro below, if an inode has space for extra bits in
timestamps, it uses it. Otherwise, only the first 32 bits are copied
to the on disk timestamp. This matches the behavior today for 32 bit
tv_sec. But, the 34 bit tv_sec has a better behavior after the series
because of the clamping and warning.

#define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode)\
do {\
(raw_inode)->xtime = cpu_to_le32((inode)->xtime.tv_sec);\
if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) {\
(raw_inode)->xtime ## _extra =\
ext4_encode_extra_time(&(inode)->xtime);\
}\
} while (0)

I'm not sure if this corner case if important. Maybe the maintainers
can help me here. If this is important, then the inode time updates
for an ext4 inode should always be through ext4_setattr() and we can
clamp the timestamps there as a special case. And, this patch can be
added separately?

Thanks,
Deepa


Re: [PATCH 05/20] utimes: Clamp the timestamps before update

2019-07-31 Thread Deepa Dinamani
On Wed, Jul 31, 2019 at 8:15 AM Darrick J. Wong  wrote:
>
> On Mon, Jul 29, 2019 at 06:49:09PM -0700, Deepa Dinamani wrote:
> > POSIX is ambiguous on the behavior of timestamps for
> > futimens, utimensat and utimes. Whether to return an
> > error or silently clamp a timestamp beyond the range
> > supported by the underlying filesystems is not clear.
> >
> > POSIX.1 section for futimens, utimensat and utimes says:
> > (http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html)
> >
> > The file's relevant timestamp shall be set to the greatest
> > value supported by the file system that is not greater
> > than the specified time.
> >
> > If the tv_nsec field of a timespec structure has the special
> > value UTIME_NOW, the file's relevant timestamp shall be set
> > to the greatest value supported by the file system that is
> > not greater than the current time.
> >
> > [EINVAL]
> > A new file timestamp would be a value whose tv_sec
> > component is not a value supported by the file system.
> >
> > The patch chooses to clamp the timestamps according to the
> > filesystem timestamp ranges and does not return an error.
> > This is in line with the behavior of utime syscall also
> > since the POSIX 
> > page(http://pubs.opengroup.org/onlinepubs/009695399/functions/utime.html)
> > for utime does not mention returning an error or clamping like above.
> >
> > Same for utimes 
> > http://pubs.opengroup.org/onlinepubs/009695399/functions/utimes.html
> >
> > Signed-off-by: Deepa Dinamani 
> > ---
> >  fs/utimes.c | 17 +
> >  1 file changed, 13 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/utimes.c b/fs/utimes.c
> > index 350c9c16ace1..4c1a2ce90bbc 100644
> > --- a/fs/utimes.c
> > +++ b/fs/utimes.c
> > @@ -21,6 +21,7 @@ static int utimes_common(const struct path *path, struct 
> > timespec64 *times)
> >   int error;
> >   struct iattr newattrs;
> >   struct inode *inode = path->dentry->d_inode;
> > + struct super_block *sb = inode->i_sb;
> >   struct inode *delegated_inode = NULL;
> >
> >   error = mnt_want_write(path->mnt);
> > @@ -36,16 +37,24 @@ static int utimes_common(const struct path *path, 
> > struct timespec64 *times)
> >   if (times[0].tv_nsec == UTIME_OMIT)
> >   newattrs.ia_valid &= ~ATTR_ATIME;
> >   else if (times[0].tv_nsec != UTIME_NOW) {
> > - newattrs.ia_atime.tv_sec = times[0].tv_sec;
> > - newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
> > + newattrs.ia_atime.tv_sec =
> > + clamp(times[0].tv_sec, sb->s_time_min, 
> > sb->s_time_max);
> > + if (times[0].tv_sec == sb->s_time_max || 
> > times[0].tv_sec == sb->s_time_min)
> > + newattrs.ia_atime.tv_nsec = 0;
> > + else
> > + newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
> >   newattrs.ia_valid |= ATTR_ATIME_SET;
> >   }
> >
> >   if (times[1].tv_nsec == UTIME_OMIT)
> >   newattrs.ia_valid &= ~ATTR_MTIME;
> >   else if (times[1].tv_nsec != UTIME_NOW) {
> > - newattrs.ia_mtime.tv_sec = times[1].tv_sec;
> > - newattrs.ia_mtime.tv_nsec = times[1].tv_nsec;
> > + newattrs.ia_mtime.tv_sec =
> > + clamp(times[1].tv_sec, sb->s_time_min, 
> > sb->s_time_max);
> > + if (times[1].tv_sec >= sb->s_time_max || 
> > times[1].tv_sec == sb->s_time_min)
>
> Line length.
>
> Also, didn't you just introduce a function to clamp tv_sec and fix
> granularity?  Why not just use it here?  I think this is the third time
> I've seen this open-coded logic.

Yes, we can use that now. Earlier we were not setting the tv_nsec to 0
in timestamp_truncate() which is why this was opencoded here.
I will make the change to include this.

Thanks,
Deepa


Re: [PATCH 12/20] fs: fat: Initialize filesystem timestamp ranges

2019-07-30 Thread Deepa Dinamani
On Tue, Jul 30, 2019 at 2:31 AM OGAWA Hirofumi
 wrote:
>
> Deepa Dinamani  writes:
>
> > +/* DOS dates from 1980/1/1 through 2107/12/31 */
> > +#define FAT_DATE_MIN (0<<9 | 1<<5 | 1)
> > +#define FAT_DATE_MAX (127<<9 | 12<<5 | 31)
> > +#define FAT_TIME_MAX (23<<11 | 59<<5 | 29)
> > +
> >  /*
> >   * A deserialized copy of the on-disk structure laid out in struct
> >   * fat_boot_sector.
> > @@ -1605,6 +1610,7 @@ int fat_fill_super(struct super_block *sb, void 
> > *data, int silent, int isvfat,
> >   int debug;
> >   long error;
> >   char buf[50];
> > + struct timespec64 ts;
> >
> >   /*
> >* GFP_KERNEL is ok here, because while we do hold the
> > @@ -1698,6 +1704,12 @@ int fat_fill_super(struct super_block *sb, void 
> > *data, int silent, int isvfat,
> >   sbi->free_clus_valid = 0;
> >   sbi->prev_free = FAT_START_ENT;
> >   sb->s_maxbytes = 0x;
> > + fat_time_fat2unix(sbi, , 0, cpu_to_le16(FAT_DATE_MIN), 0);
> > + sb->s_time_min = ts.tv_sec;
> > +
> > + fat_time_fat2unix(sbi, , cpu_to_le16(FAT_TIME_MAX),
> > +   cpu_to_le16(FAT_DATE_MAX), 0);
> > + sb->s_time_max = ts.tv_sec;
>
> At least, it is wrong to call fat_time_fat2unix() before setup parameters
> in sbi.

All the parameters that fat_time_fat2unix() cares in sbi is accessed through

static inline int fat_tz_offset(struct msdos_sb_info *sbi)
{
return (sbi->options.tz_set ?
   -sbi->options.time_offset :
   sys_tz.tz_minuteswest) * SECS_PER_MIN;
}

Both the sbi fields sbi->options.tz_set and sbi->options.time_offset
are set by the call to parse_options(). And, parse_options() is called
before the calls to fat_time_fat2unix().:

int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat,
   void (*setup)(struct super_block *))
{
 

error = parse_options(sb, data, isvfat, silent, , >options);
if (error)
goto out_fail;

   

sbi->prev_free = FAT_START_ENT;
sb->s_maxbytes = 0x;
fat_time_fat2unix(sbi, , 0, cpu_to_le16(FAT_DATE_MIN), 0);
sb->s_time_min = ts.tv_sec;

fat_time_fat2unix(sbi, , cpu_to_le16(FAT_TIME_MAX),
  cpu_to_le16(FAT_DATE_MAX), 0);
sb->s_time_max = ts.tv_sec;

   
}

I do not see what the problem is.

-Deepa


Re: [PATCH 03/20] timestamp_truncate: Replace users of timespec64_trunc

2019-07-30 Thread Deepa Dinamani
On Tue, Jul 30, 2019 at 1:27 AM OGAWA Hirofumi
 wrote:
>
> Deepa Dinamani  writes:
>
> > diff --git a/fs/fat/misc.c b/fs/fat/misc.c
> > index 1e08bd54c5fb..53bb7c6bf993 100644
> > --- a/fs/fat/misc.c
> > +++ b/fs/fat/misc.c
> > @@ -307,8 +307,9 @@ int fat_truncate_time(struct inode *inode, struct 
> > timespec64 *now, int flags)
> >   inode->i_atime = (struct timespec64){ seconds, 0 };
> >   }
> >   if (flags & S_CTIME) {
> > - if (sbi->options.isvfat)
> > - inode->i_ctime = timespec64_trunc(*now, 1000);
> > + if (sbi->options.isvfat) {
> > + inode->i_ctime = timestamp_truncate(*now, inode);
> > + }
> >   else
> >   inode->i_ctime = fat_timespec64_trunc_2secs(*now);
> >   }
>
> Looks like broken. It changed to sb->s_time_gran from 1000, and
> changed coding style.

This is using a new api: timestamp_truncate(). granularity is gotten
by inode->sb->s_time_gran. See Patch [2/20]:
https://lkml.org/lkml/2019/7/29/1853

So this is not broken if fat is filling in the right granularity in the sb.

-Deepa


[PATCH 16/20] fs: orangefs: Initialize filesystem timestamp ranges

2019-07-29 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Assume the limits as unsigned according to the below
commit 98e8eef557a9 ("changed PVFS_time from int64_t to uint64_t")
in https://github.com/waltligon/orangefs

Author: Neill Miller 
Date:   Thu Sep 2 15:00:38 2004 +

Signed-off-by: Deepa Dinamani 
Cc: hub...@omnibond.com
Cc: mar...@omnibond.com
Cc: de...@lists.orangefs.org
---
 fs/orangefs/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c
index ee5efdc35cc1..dcd97e8158b1 100644
--- a/fs/orangefs/super.c
+++ b/fs/orangefs/super.c
@@ -439,6 +439,8 @@ static int orangefs_fill_sb(struct super_block *sb,
sb->s_blocksize = PAGE_SIZE;
sb->s_blocksize_bits = PAGE_SHIFT;
sb->s_maxbytes = MAX_LFS_FILESIZE;
+   sb->s_time_min = 0;
+   sb->s_time_max = S64_MAX;
 
ret = super_setup_bdi(sb);
if (ret)
-- 
2.17.1



[PATCH 15/20] fs: ceph: Initialize filesystem timestamp ranges

2019-07-29 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

According to the disscussion in
https://patchwork.kernel.org/patch/8308691/ we agreed to use
unsigned 32 bit timestamps on ceph.
Update the limits accordingly.

Signed-off-by: Deepa Dinamani 
Cc: z...@redhat.com
Cc: s...@redhat.com
Cc: idryo...@gmail.com
Cc: ceph-de...@vger.kernel.org
---
 fs/ceph/super.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index d57fa60dcd43..6cf3a4fceac1 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -981,6 +981,8 @@ static int ceph_set_super(struct super_block *s, void *data)
s->s_export_op = _export_ops;
 
s->s_time_gran = 1000;  /* 1000 ns == 1 us */
+   s->s_time_min = 0;
+   s->s_time_max = U32_MAX;
 
ret = set_anon_super(s, NULL);  /* what is that second arg for? */
if (ret != 0)
-- 
2.17.1



[PATCH 18/20] fs: omfs: Initialize filesystem timestamp ranges

2019-07-29 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Signed-off-by: Deepa Dinamani 
Cc: m...@bobcopeland.com
Cc: linux-karma-de...@lists.sourceforge.net
---
 fs/omfs/inode.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/fs/omfs/inode.c b/fs/omfs/inode.c
index 08226a835ec3..b76ec6b88ded 100644
--- a/fs/omfs/inode.c
+++ b/fs/omfs/inode.c
@@ -478,6 +478,10 @@ static int omfs_fill_super(struct super_block *sb, void 
*data, int silent)
 
sb->s_maxbytes = 0x;
 
+   sb->s_time_gran = NSEC_PER_MSEC;
+   sb->s_time_min = 0;
+   sb->s_time_max = U64_MAX / MSEC_PER_SEC;
+
sb_set_blocksize(sb, 0x200);
 
bh = sb_bread(sb, 0);
-- 
2.17.1



[PATCH 17/20] fs: hpfs: Initialize filesystem timestamp ranges

2019-07-29 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Also change the local_to_gmt() to use time64_t instead
of time32_t.

Signed-off-by: Deepa Dinamani 
Cc: miku...@artax.karlin.mff.cuni.cz
---
 fs/hpfs/hpfs_fn.h | 6 ++
 fs/hpfs/super.c   | 2 ++
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/hpfs/hpfs_fn.h b/fs/hpfs/hpfs_fn.h
index ab2e7cc2ff33..1cca83218fb5 100644
--- a/fs/hpfs/hpfs_fn.h
+++ b/fs/hpfs/hpfs_fn.h
@@ -334,7 +334,7 @@ long hpfs_ioctl(struct file *file, unsigned cmd, unsigned 
long arg);
  * local time (HPFS) to GMT (Unix)
  */
 
-static inline time64_t local_to_gmt(struct super_block *s, time32_t t)
+static inline time64_t local_to_gmt(struct super_block *s, time64_t t)
 {
extern struct timezone sys_tz;
return t + sys_tz.tz_minuteswest * 60 + hpfs_sb(s)->sb_timeshift;
@@ -343,9 +343,7 @@ static inline time64_t local_to_gmt(struct super_block *s, 
time32_t t)
 static inline time32_t gmt_to_local(struct super_block *s, time64_t t)
 {
extern struct timezone sys_tz;
-   t = t - sys_tz.tz_minuteswest * 60 - hpfs_sb(s)->sb_timeshift;
-
-   return clamp_t(time64_t, t, 0, U32_MAX);
+   return t - sys_tz.tz_minuteswest * 60 - hpfs_sb(s)->sb_timeshift;
 }
 
 static inline time32_t local_get_seconds(struct super_block *s)
diff --git a/fs/hpfs/super.c b/fs/hpfs/super.c
index 9db6d84f0d62..0a677a9aaf34 100644
--- a/fs/hpfs/super.c
+++ b/fs/hpfs/super.c
@@ -614,6 +614,8 @@ static int hpfs_fill_super(struct super_block *s, void 
*options, int silent)
s->s_magic = HPFS_SUPER_MAGIC;
s->s_op = _sops;
s->s_d_op = _dentry_operations;
+   s->s_time_min =  local_to_gmt(s, 0);
+   s->s_time_max =  local_to_gmt(s, U32_MAX);
 
sbi->sb_root = le32_to_cpu(superblock->root);
sbi->sb_fs_size = le32_to_cpu(superblock->n_sectors);
-- 
2.17.1



[PATCH 13/20] fs: affs: Initialize filesystem timestamp ranges

2019-07-29 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Also fix timestamp calculation to avoid overflow
while converting from days to seconds.

Signed-off-by: Deepa Dinamani 
Cc: dste...@suse.com
---
 fs/affs/amigaffs.c | 2 +-
 fs/affs/amigaffs.h | 3 +++
 fs/affs/inode.c| 4 ++--
 fs/affs/super.c| 4 
 4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/affs/amigaffs.c b/fs/affs/amigaffs.c
index 14a6c1b90c9f..f708c45d5f66 100644
--- a/fs/affs/amigaffs.c
+++ b/fs/affs/amigaffs.c
@@ -375,7 +375,7 @@ affs_secs_to_datestamp(time64_t secs, struct affs_date *ds)
u32  minute;
s32  rem;
 
-   secs -= sys_tz.tz_minuteswest * 60 + ((8 * 365 + 2) * 24 * 60 * 60);
+   secs -= sys_tz.tz_minuteswest * 60 + AFFS_EPOCH_DELTA;
if (secs < 0)
secs = 0;
days= div_s64_rem(secs, 86400, );
diff --git a/fs/affs/amigaffs.h b/fs/affs/amigaffs.h
index f9bef9056659..81fb396d4dfa 100644
--- a/fs/affs/amigaffs.h
+++ b/fs/affs/amigaffs.h
@@ -32,6 +32,9 @@
 
 #define AFFS_ROOT_BMAPS25
 
+/* Seconds since Amiga epoch of 1978/01/01 to UNIX */
+#define AFFS_EPOCH_DELTA ((8 * 365 + 2) * 86400LL)
+
 struct affs_date {
__be32 days;
__be32 mins;
diff --git a/fs/affs/inode.c b/fs/affs/inode.c
index 73598bff8506..a346cf7659f1 100644
--- a/fs/affs/inode.c
+++ b/fs/affs/inode.c
@@ -150,10 +150,10 @@ struct inode *affs_iget(struct super_block *sb, unsigned 
long ino)
}
 
inode->i_mtime.tv_sec = inode->i_atime.tv_sec = inode->i_ctime.tv_sec
-  = (be32_to_cpu(tail->change.days) * (24 * 60 * 60) +
+  = (be32_to_cpu(tail->change.days) * 86400LL +
 be32_to_cpu(tail->change.mins) * 60 +
 be32_to_cpu(tail->change.ticks) / 50 +
-((8 * 365 + 2) * 24 * 60 * 60)) +
+AFFS_EPOCH_DELTA) +
 sys_tz.tz_minuteswest * 60;
inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec = 
inode->i_atime.tv_nsec = 0;
affs_brelse(bh);
diff --git a/fs/affs/super.c b/fs/affs/super.c
index e7d036efbaa1..cc463ae47c12 100644
--- a/fs/affs/super.c
+++ b/fs/affs/super.c
@@ -355,6 +355,10 @@ static int affs_fill_super(struct super_block *sb, void 
*data, int silent)
sb->s_op= _sops;
sb->s_flags |= SB_NODIRATIME;
 
+   sb->s_time_gran = NSEC_PER_SEC;
+   sb->s_time_min = sys_tz.tz_minuteswest * 60 + AFFS_EPOCH_DELTA;
+   sb->s_time_max = 86400LL * U32_MAX + 86400 + sb->s_time_min;
+
sbi = kzalloc(sizeof(struct affs_sb_info), GFP_KERNEL);
if (!sbi)
return -ENOMEM;
-- 
2.17.1



[PATCH 07/20] 9p: Fill min and max timestamps in sb

2019-07-29 Thread Deepa Dinamani
struct p9_wstat and struct p9_stat_dotl indicate that the
wire transport uses u32 and u64 fields for timestamps.
Fill in the appropriate limits to avoid inconsistencies in
the vfs cached inode times when timestamps are outside the
permitted range.

Note that the upper bound for V9FS_PROTO_2000L is retained as S64_MAX.
This is because that is the upper bound supported by vfs.

Signed-off-by: Deepa Dinamani 
Cc: eri...@gmail.com
Cc: lu...@ionkov.net
Cc: asmad...@codewreck.org
Cc: v9fs-develo...@lists.sourceforge.net
---
 fs/9p/vfs_super.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 08112fbcaece..ca243e658d71 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -69,8 +69,12 @@ v9fs_fill_super(struct super_block *sb, struct 
v9fs_session_info *v9ses,
if (v9fs_proto_dotl(v9ses)) {
sb->s_op = _super_ops_dotl;
sb->s_xattr = v9fs_xattr_handlers;
-   } else
+   } else {
sb->s_op = _super_ops;
+   sb->s_time_max = U32_MAX;
+   }
+
+   sb->s_time_min = 0;
 
ret = super_setup_bdi(sb);
if (ret)
-- 
2.17.1



[PATCH 19/20] pstore: fs superblock limits

2019-07-29 Thread Deepa Dinamani
Also update the gran since pstore has microsecond granularity.

Signed-off-by: Deepa Dinamani 
Cc: an...@enomsg.org
Cc: ccr...@android.com
Cc: keesc...@chromium.org
Cc: tony.l...@intel.com
---
 fs/pstore/inode.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c
index 89a80b568a17..ee752f9fda57 100644
--- a/fs/pstore/inode.c
+++ b/fs/pstore/inode.c
@@ -388,7 +388,9 @@ static int pstore_fill_super(struct super_block *sb, void 
*data, int silent)
sb->s_blocksize_bits= PAGE_SHIFT;
sb->s_magic = PSTOREFS_MAGIC;
sb->s_op= _ops;
-   sb->s_time_gran = 1;
+   sb->s_time_gran = NSEC_PER_USEC;
+   sb->s_time_min  = S64_MIN;
+   sb->s_time_max  = S64_MAX;
 
parse_options(data);
 
-- 
2.17.1



[PATCH 11/20] fs: cifs: Initialize filesystem timestamp ranges

2019-07-29 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Also fixed cnvrtDosUnixTm calculations to avoid int overflow
while computing maximum date.

References:

http://cifs.com/

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cifs/d416ff7c-c536-406e-a951-4f04b2fd1d2b

Signed-off-by: Deepa Dinamani 
Cc: sfre...@samba.org
Cc: linux-c...@vger.kernel.org
---
 fs/cifs/cifsfs.c  | 22 ++
 fs/cifs/netmisc.c | 14 +++---
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 81a16b4e1b48..94a52a63b9ea 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -56,6 +56,15 @@
 #include "dfs_cache.h"
 #endif
 
+/*
+ * DOS dates from 1980/1/1 through 2107/12/31
+ * Protocol specifications indicate the range should be to 119, which
+ * limits maximum year to 2099. But this range has not been checked.
+ */
+#define SMB_DATE_MAX (127<<9 | 12<<5 | 31)
+#define SMB_DATE_MIN (0<<9 | 1<<5 | 1)
+#define SMB_TIME_MAX (23<<11 | 59<<5 | 29)
+
 int cifsFYI = 0;
 bool traceSMB;
 bool enable_oplocks = true;
@@ -142,6 +151,7 @@ cifs_read_super(struct super_block *sb)
struct inode *inode;
struct cifs_sb_info *cifs_sb;
struct cifs_tcon *tcon;
+   struct timespec64 ts;
int rc = 0;
 
cifs_sb = CIFS_SB(sb);
@@ -161,6 +171,18 @@ cifs_read_super(struct super_block *sb)
/* BB FIXME fix time_gran to be larger for LANMAN sessions */
sb->s_time_gran = 100;
 
+   if (tcon->unix_ext) {
+   ts = cifs_NTtimeToUnix(0);
+   sb->s_time_min = ts.tv_sec;
+   ts = cifs_NTtimeToUnix(cpu_to_le64(S64_MAX));
+   sb->s_time_max = ts.tv_sec;
+   } else {
+   ts = cnvrtDosUnixTm(cpu_to_le16(SMB_DATE_MIN), 0, 0);
+   sb->s_time_min = ts.tv_sec;
+   ts = cnvrtDosUnixTm(cpu_to_le16(SMB_DATE_MAX), 
cpu_to_le16(SMB_TIME_MAX), 0);
+   sb->s_time_max = ts.tv_sec;
+   }
+
sb->s_magic = CIFS_MAGIC_NUMBER;
sb->s_op = _super_ops;
sb->s_xattr = cifs_xattr_handlers;
diff --git a/fs/cifs/netmisc.c b/fs/cifs/netmisc.c
index ed92958e842d..49c17ee18254 100644
--- a/fs/cifs/netmisc.c
+++ b/fs/cifs/netmisc.c
@@ -949,8 +949,8 @@ static const int total_days_of_prev_months[] = {
 struct timespec64 cnvrtDosUnixTm(__le16 le_date, __le16 le_time, int offset)
 {
struct timespec64 ts;
-   time64_t sec;
-   int min, days, month, year;
+   time64_t sec, days;
+   int min, day, month, year;
u16 date = le16_to_cpu(le_date);
u16 time = le16_to_cpu(le_time);
SMB_TIME *st = (SMB_TIME *)
@@ -966,15 +966,15 @@ struct timespec64 cnvrtDosUnixTm(__le16 le_date, __le16 
le_time, int offset)
sec += 60 * 60 * st->Hours;
if (st->Hours > 24)
cifs_dbg(VFS, "illegal hours %d\n", st->Hours);
-   days = sd->Day;
+   day = sd->Day;
month = sd->Month;
-   if (days < 1 || days > 31 || month < 1 || month > 12) {
-   cifs_dbg(VFS, "illegal date, month %d day: %d\n", month, days);
-   days = clamp(days, 1, 31);
+   if (day < 1 || day > 31 || month < 1 || month > 12) {
+   cifs_dbg(VFS, "illegal date, month %d day: %d\n", month, day);
+   day = clamp(day, 1, 31);
month = clamp(month, 1, 12);
}
month -= 1;
-   days += total_days_of_prev_months[month];
+   days = day + total_days_of_prev_months[month];
days += 3652; /* account for difference in days between 1980 and 1970 */
year = sd->Year;
days += year * 365;
-- 
2.17.1



[PATCH 20/20] isofs: Initialize filesystem timestamp ranges

2019-07-29 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Reference: http://www.ecma-international.org/publications/standards/Ecma-119.htm

Signed-off-by: Deepa Dinamani 
---
 fs/isofs/inode.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c
index 9e30d8703735..62c0462dc89f 100644
--- a/fs/isofs/inode.c
+++ b/fs/isofs/inode.c
@@ -30,6 +30,9 @@
 #include "isofs.h"
 #include "zisofs.h"
 
+/* max tz offset is 13 hours */
+#define MAX_TZ_OFFSET (52*15*60)
+
 #define BEQUIET
 
 static int isofs_hashi(const struct dentry *parent, struct qstr *qstr);
@@ -801,6 +804,10 @@ static int isofs_fill_super(struct super_block *s, void 
*data, int silent)
 */
s->s_maxbytes = 0x800LL;
 
+   /* ECMA-119 timestamp from 1900/1/1 with tz offset */
+   s->s_time_min = mktime64(1900, 1, 1, 0, 0, 0) - MAX_TZ_OFFSET;
+   s->s_time_max = mktime64(U8_MAX+1900, 12, 31, 23, 59, 59) + 
MAX_TZ_OFFSET;
+
/* Set this for reference. Its not currently used except on write
   which we don't have .. */
 
-- 
2.17.1



[PATCH 08/20] adfs: Fill in max and min timestamps in sb

2019-07-29 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Note that the min timestamp is assumed to be
01 Jan 1970 00:00:00 (Unix epoch). This is consistent
with the way we convert timestamps in adfs_adfs2unix_time().

Signed-off-by: Deepa Dinamani 
---
 fs/adfs/adfs.h  | 13 +
 fs/adfs/inode.c |  8 ++--
 fs/adfs/super.c |  2 ++
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/fs/adfs/adfs.h b/fs/adfs/adfs.h
index 804c6a77c5db..781b1c3817e0 100644
--- a/fs/adfs/adfs.h
+++ b/fs/adfs/adfs.h
@@ -2,6 +2,19 @@
 #include 
 #include 
 
+/*
+ * 01 Jan 1970 00:00:00 (Unix epoch) as seconds since
+ * 01 Jan 1900 00:00:00 (RISC OS epoch)
+ */
+#define RISC_OS_EPOCH_DELTA 2208988800LL
+
+/*
+ * Convert 40 bit centi seconds to seconds
+ * since 01 Jan 1900 00:00:00 (RISC OS epoch)
+ * The result is 2248-06-03 06:57:57 GMT
+ */
+#define ADFS_MAX_TIMESTAMP ((0xFFLL / 100) - RISC_OS_EPOCH_DELTA)
+
 /* Internal data structures for ADFS */
 
 #define ADFS_FREE_FRAG  0
diff --git a/fs/adfs/inode.c b/fs/adfs/inode.c
index 66621e96f9af..3f75cefc0380 100644
--- a/fs/adfs/inode.c
+++ b/fs/adfs/inode.c
@@ -170,11 +170,7 @@ static void
 adfs_adfs2unix_time(struct timespec64 *tv, struct inode *inode)
 {
unsigned int high, low;
-   /* 01 Jan 1970 00:00:00 (Unix epoch) as nanoseconds since
-* 01 Jan 1900 00:00:00 (RISC OS epoch)
-*/
-   static const s64 nsec_unix_epoch_diff_risc_os_epoch =
-   22089888000LL;
+   static const s64 nsec_unix_epoch_diff_risc_os_epoch = 
RISC_OS_EPOCH_DELTA * NSEC_PER_SEC;
s64 nsec;
 
if (ADFS_I(inode)->stamped == 0)
@@ -219,7 +215,7 @@ adfs_unix2adfs_time(struct inode *inode, unsigned int secs)
if (ADFS_I(inode)->stamped) {
/* convert 32-bit seconds to 40-bit centi-seconds */
low  = (secs & 255) * 100;
-   high = (secs / 256) * 100 + (low >> 8) + 0x336e996a;
+   high = (secs / 256) * 100 + (low >> 8) + 
(RISC_OS_EPOCH_DELTA*100/256);
 
ADFS_I(inode)->loadaddr = (high >> 24) |
(ADFS_I(inode)->loadaddr & ~0xff);
diff --git a/fs/adfs/super.c b/fs/adfs/super.c
index 2a83655c408f..0a0854ef9e3c 100644
--- a/fs/adfs/super.c
+++ b/fs/adfs/super.c
@@ -449,6 +449,8 @@ static int adfs_fill_super(struct super_block *sb, void 
*data, int silent)
asb->s_size = adfs_discsize(dr, sb->s_blocksize_bits);
asb->s_version  = dr->format_version;
asb->s_log2sharesize= dr->log2sharesize;
+   sb->s_time_min  = 0;
+   sb->s_time_max  = ADFS_MAX_TIMESTAMP;
 
asb->s_map = adfs_read_map(sb, dr);
if (IS_ERR(asb->s_map)) {
-- 
2.17.1



[PATCH 04/20] mount: Add mount warning for impending timestamp expiry

2019-07-29 Thread Deepa Dinamani
The warning reuses the uptime max of 30 years used by the
setitimeofday().

Note that the warning is only added for new filesystem mounts
through the mount syscall. Automounts do not have the same warning.

Signed-off-by: Deepa Dinamani 
---
 fs/namespace.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index b26778bdc236..5314fac8035e 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2739,6 +2739,17 @@ static int do_new_mount_fc(struct fs_context *fc, struct 
path *mountpoint,
error = do_add_mount(real_mount(mnt), mountpoint, mnt_flags);
if (error < 0)
mntput(mnt);
+
+   if (!error && sb->s_time_max &&
+   (ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > sb->s_time_max)) {
+   char *buf = (char *)__get_free_page(GFP_KERNEL);
+   char *mntpath = buf ? d_path(mountpoint, buf, PAGE_SIZE) : 
ERR_PTR(-ENOMEM);
+
+   pr_warn("Mounted %s file system at %s supports timestamps until 
0x%llx\n",
+   fc->fs_type->name, mntpath, (unsigned long 
long)sb->s_time_max);
+   free_page((unsigned long)buf);
+   }
+
return error;
 }
 
-- 
2.17.1



[PATCH 09/20] ext4: Initialize timestamps limits

2019-07-29 Thread Deepa Dinamani
ext4 has different overflow limits for max filesystem
timestamps based on the extra bytes available.

The timestamp limits are calculated according to the
encoding table in
a4dad1ae24f85i(ext4: Fix handling of extended tv_sec):

* extra  msb of adjust for signed
* epoch  32-bit 32-bit tv_sec to
* bits   timedecoded 64-bit tv_sec  64-bit tv_sec  valid time range
* 0 01-0x8000..-0x0001  0x0   1901-12-13..1969-12-31
* 0 000x0..0x07fff  0x0   1970-01-01..2038-01-19
* 0 110x08000..0x0  0x1   2038-01-19..2106-02-07
* 0 100x1..0x17fff  0x1   2106-02-07..2174-02-25
* 1 010x18000..0x1  0x2   2174-02-25..2242-03-16
* 1 000x2..0x27fff  0x2   2242-03-16..2310-04-04
* 1 110x28000..0x2  0x3   2310-04-04..2378-04-22
* 1 100x3..0x37fff  0x3   2378-04-22..2446-05-10

Note that the time limits are not correct for deletion times.

Signed-off-by: Deepa Dinamani 
Reviewed-by: Andreas Dilger 
Cc: ty...@mit.edu
Cc: adilger.ker...@dilger.ca
Cc: linux-e...@vger.kernel.org
---
 fs/ext4/ext4.h  |  4 
 fs/ext4/super.c | 17 +++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 1cb67859e051..3f13cf12ae7f 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1631,6 +1631,10 @@ static inline void ext4_clear_state_flags(struct 
ext4_inode_info *ei)
 
 #define EXT4_GOOD_OLD_INODE_SIZE 128
 
+#define EXT4_EXTRA_TIMESTAMP_MAX   (((s64)1 << 34) - 1  + S32_MIN)
+#define EXT4_NON_EXTRA_TIMESTAMP_MAX   S32_MAX
+#define EXT4_TIMESTAMP_MIN S32_MIN
+
 /*
  * Feature set definitions
  */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 4079605d437a..3ea2d60f33aa 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4035,8 +4035,21 @@ static int ext4_fill_super(struct super_block *sb, void 
*data, int silent)
   sbi->s_inode_size);
goto failed_mount;
}
-   if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE)
-   sb->s_time_gran = 1 << (EXT4_EPOCH_BITS - 2);
+   /*
+* i_atime_extra is the last extra field available for 
[acm]times in
+* struct ext4_inode. Checking for that field should suffice to 
ensure
+* we have extra space for all three.
+*/
+   if (sbi->s_inode_size >= offsetof(struct ext4_inode, 
i_atime_extra) +
+   sizeof(((struct ext4_inode *)0)->i_atime_extra)) {
+   sb->s_time_gran = 1;
+   sb->s_time_max = EXT4_EXTRA_TIMESTAMP_MAX;
+   } else {
+   sb->s_time_gran = NSEC_PER_SEC;
+   sb->s_time_max = EXT4_NON_EXTRA_TIMESTAMP_MAX;
+   }
+
+   sb->s_time_min = EXT4_TIMESTAMP_MIN;
}
 
sbi->s_desc_size = le16_to_cpu(es->s_desc_size);
-- 
2.17.1



[PATCH 05/20] utimes: Clamp the timestamps before update

2019-07-29 Thread Deepa Dinamani
POSIX is ambiguous on the behavior of timestamps for
futimens, utimensat and utimes. Whether to return an
error or silently clamp a timestamp beyond the range
supported by the underlying filesystems is not clear.

POSIX.1 section for futimens, utimensat and utimes says:
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html)

The file's relevant timestamp shall be set to the greatest
value supported by the file system that is not greater
than the specified time.

If the tv_nsec field of a timespec structure has the special
value UTIME_NOW, the file's relevant timestamp shall be set
to the greatest value supported by the file system that is
not greater than the current time.

[EINVAL]
A new file timestamp would be a value whose tv_sec
component is not a value supported by the file system.

The patch chooses to clamp the timestamps according to the
filesystem timestamp ranges and does not return an error.
This is in line with the behavior of utime syscall also
since the POSIX 
page(http://pubs.opengroup.org/onlinepubs/009695399/functions/utime.html)
for utime does not mention returning an error or clamping like above.

Same for utimes 
http://pubs.opengroup.org/onlinepubs/009695399/functions/utimes.html

Signed-off-by: Deepa Dinamani 
---
 fs/utimes.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/utimes.c b/fs/utimes.c
index 350c9c16ace1..4c1a2ce90bbc 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -21,6 +21,7 @@ static int utimes_common(const struct path *path, struct 
timespec64 *times)
int error;
struct iattr newattrs;
struct inode *inode = path->dentry->d_inode;
+   struct super_block *sb = inode->i_sb;
struct inode *delegated_inode = NULL;
 
error = mnt_want_write(path->mnt);
@@ -36,16 +37,24 @@ static int utimes_common(const struct path *path, struct 
timespec64 *times)
if (times[0].tv_nsec == UTIME_OMIT)
newattrs.ia_valid &= ~ATTR_ATIME;
else if (times[0].tv_nsec != UTIME_NOW) {
-   newattrs.ia_atime.tv_sec = times[0].tv_sec;
-   newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
+   newattrs.ia_atime.tv_sec =
+   clamp(times[0].tv_sec, sb->s_time_min, 
sb->s_time_max);
+   if (times[0].tv_sec == sb->s_time_max || 
times[0].tv_sec == sb->s_time_min)
+   newattrs.ia_atime.tv_nsec = 0;
+   else
+   newattrs.ia_atime.tv_nsec = times[0].tv_nsec;
newattrs.ia_valid |= ATTR_ATIME_SET;
}
 
if (times[1].tv_nsec == UTIME_OMIT)
newattrs.ia_valid &= ~ATTR_MTIME;
else if (times[1].tv_nsec != UTIME_NOW) {
-   newattrs.ia_mtime.tv_sec = times[1].tv_sec;
-   newattrs.ia_mtime.tv_nsec = times[1].tv_nsec;
+   newattrs.ia_mtime.tv_sec =
+   clamp(times[1].tv_sec, sb->s_time_min, 
sb->s_time_max);
+   if (times[1].tv_sec >= sb->s_time_max || 
times[1].tv_sec == sb->s_time_min)
+   newattrs.ia_mtime.tv_nsec = 0;
+   else
+   newattrs.ia_mtime.tv_nsec = times[1].tv_nsec;
newattrs.ia_valid |= ATTR_MTIME_SET;
}
/*
-- 
2.17.1



[PATCH 10/20] fs: nfs: Initialize filesystem timestamp ranges

2019-07-29 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

The time formats for various verious is detailed in the
RFCs as below:

https://tools.ietf.org/html/rfc7862(time metadata)
https://tools.ietf.org/html/rfc7530:

nfstime4

   struct nfstime4 {
   int64_t seconds;
   uint32_tnseconds;
   };

https://tools.ietf.org/html/rfc1094

  struct timeval {
  unsigned int seconds;
  unsigned int useconds;
  };

https://tools.ietf.org/html/rfc1813

struct nfstime3 {
 uint32   seconds;
 uint32   nseconds;
  };

Use the limits as per the RFC.

Signed-off-by: Deepa Dinamani 
Cc: trond.mykleb...@hammerspace.com
Cc: anna.schuma...@netapp.com
Cc: linux-...@vger.kernel.org
---
 fs/nfs/super.c | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index f88ddac2dcdf..54eb5a47f180 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -2360,6 +2360,15 @@ void nfs_fill_super(struct super_block *sb, struct 
nfs_mount_info *mount_info)
sb->s_flags |= SB_POSIXACL;
sb->s_time_gran = 1;
sb->s_export_op = _export_ops;
+   } else
+   sb->s_time_gran = 1000;
+
+   if (server->nfs_client->rpc_ops->version != 4) {
+   sb->s_time_min = 0;
+   sb->s_time_max = U32_MAX;
+   } else {
+   sb->s_time_min = S64_MIN;
+   sb->s_time_max = S64_MAX;
}
 
nfs_initialise_sb(sb);
@@ -2380,7 +2389,6 @@ static void nfs_clone_super(struct super_block *sb,
sb->s_maxbytes = old_sb->s_maxbytes;
sb->s_xattr = old_sb->s_xattr;
sb->s_op = old_sb->s_op;
-   sb->s_time_gran = 1;
sb->s_export_op = old_sb->s_export_op;
 
if (server->nfs_client->rpc_ops->version != 2) {
@@ -2388,6 +2396,16 @@ static void nfs_clone_super(struct super_block *sb,
 * so ourselves when necessary.
 */
sb->s_flags |= SB_POSIXACL;
+   sb->s_time_gran = 1;
+   } else
+   sb->s_time_gran = 1000;
+
+   if (server->nfs_client->rpc_ops->version != 4) {
+   sb->s_time_min = 0;
+   sb->s_time_max = U32_MAX;
+   } else {
+   sb->s_time_min = S64_MIN;
+   sb->s_time_max = S64_MAX;
}
 
nfs_initialise_sb(sb);
-- 
2.17.1



[PATCH 14/20] fs: sysv: Initialize filesystem timestamp ranges

2019-07-29 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Signed-off-by: Deepa Dinamani 
Cc: h...@infradead.org
---
 fs/sysv/super.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/sysv/super.c b/fs/sysv/super.c
index d788b1daa7eb..cc8e2ed155c8 100644
--- a/fs/sysv/super.c
+++ b/fs/sysv/super.c
@@ -368,7 +368,8 @@ static int sysv_fill_super(struct super_block *sb, void 
*data, int silent)
sbi->s_block_base = 0;
mutex_init(>s_lock);
sb->s_fs_info = sbi;
-
+   sb->s_time_min = 0;
+   sb->s_time_max = U32_MAX;
sb_set_blocksize(sb, BLOCK_SIZE);
 
for (i = 0; i < ARRAY_SIZE(flavours) && !size; i++) {
@@ -487,6 +488,8 @@ static int v7_fill_super(struct super_block *sb, void 
*data, int silent)
sbi->s_type = FSTYPE_V7;
mutex_init(>s_lock);
sb->s_fs_info = sbi;
+   sb->s_time_min = 0;
+   sb->s_time_max = U32_MAX;

sb_set_blocksize(sb, 512);
 
-- 
2.17.1



[PATCH 01/20] vfs: Add file timestamp range support

2019-07-29 Thread Deepa Dinamani
Add fields to the superblock to track the min and max
timestamps supported by filesystems.

Initially, when a superblock is allocated, initialize
it to the max and min values the fields can hold.
Individual filesystems override these to match their
actual limits.

Pseudo filesystems are assumed to always support the
min and max allowable values for the fields.

Signed-off-by: Deepa Dinamani 
---
 fs/super.c | 2 ++
 include/linux/fs.h | 3 +++
 include/linux/time64.h | 2 ++
 3 files changed, 7 insertions(+)

diff --git a/fs/super.c b/fs/super.c
index 2739f57515f8..e5835c38d336 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -257,6 +257,8 @@ static struct super_block *alloc_super(struct 
file_system_type *type, int flags,
s->s_maxbytes = MAX_NON_LFS;
s->s_op = _op;
s->s_time_gran = 10;
+   s->s_time_min = TIME64_MIN;
+   s->s_time_max = TIME64_MAX;
s->cleancache_poolid = CLEANCACHE_NO_POOL;
 
s->s_shrink.seeks = DEFAULT_SEEKS;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f1a6ca872943..e9d04e4e5628 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1448,6 +1448,9 @@ struct super_block {
 
/* Granularity of c/m/atime in ns (cannot be worse than a second) */
u32 s_time_gran;
+   /* Time limits for c/m/atime in seconds */
+   time64_t   s_time_min;
+   time64_t   s_time_max;
 #ifdef CONFIG_FSNOTIFY
__u32   s_fsnotify_mask;
struct fsnotify_mark_connector __rcu*s_fsnotify_marks;
diff --git a/include/linux/time64.h b/include/linux/time64.h
index a620ee610b9f..19125489ae94 100644
--- a/include/linux/time64.h
+++ b/include/linux/time64.h
@@ -30,6 +30,8 @@ struct itimerspec64 {
 
 /* Located here for timespec[64]_valid_strict */
 #define TIME64_MAX ((s64)~((u64)1 << 63))
+#define TIME64_MIN (-TIME64_MAX - 1)
+
 #define KTIME_MAX  ((s64)~((u64)1 << 63))
 #define KTIME_SEC_MAX  (KTIME_MAX / NSEC_PER_SEC)
 
-- 
2.17.1



[PATCH 12/20] fs: fat: Initialize filesystem timestamp ranges

2019-07-29 Thread Deepa Dinamani
Fill in the appropriate limits to avoid inconsistencies
in the vfs cached inode times when timestamps are
outside the permitted range.

Some FAT variants indicate that the years after 2099 are not supported.
Since commit 7decd1cb0305 ("fat: Fix and cleanup timestamp conversion"),
we support the full range of years that can be represented, up to 2107.

Signed-off-by: Deepa Dinamani 
Cc: hirof...@mail.parknet.co.jp
---
 fs/fat/inode.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 05689198f5af..5f04c5c810fb 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -31,6 +31,11 @@
 
 #define KB_IN_SECTORS 2
 
+/* DOS dates from 1980/1/1 through 2107/12/31 */
+#define FAT_DATE_MIN (0<<9 | 1<<5 | 1)
+#define FAT_DATE_MAX (127<<9 | 12<<5 | 31)
+#define FAT_TIME_MAX (23<<11 | 59<<5 | 29)
+
 /*
  * A deserialized copy of the on-disk structure laid out in struct
  * fat_boot_sector.
@@ -1605,6 +1610,7 @@ int fat_fill_super(struct super_block *sb, void *data, 
int silent, int isvfat,
int debug;
long error;
char buf[50];
+   struct timespec64 ts;
 
/*
 * GFP_KERNEL is ok here, because while we do hold the
@@ -1698,6 +1704,12 @@ int fat_fill_super(struct super_block *sb, void *data, 
int silent, int isvfat,
sbi->free_clus_valid = 0;
sbi->prev_free = FAT_START_ENT;
sb->s_maxbytes = 0x;
+   fat_time_fat2unix(sbi, , 0, cpu_to_le16(FAT_DATE_MIN), 0);
+   sb->s_time_min = ts.tv_sec;
+
+   fat_time_fat2unix(sbi, , cpu_to_le16(FAT_TIME_MAX),
+ cpu_to_le16(FAT_DATE_MAX), 0);
+   sb->s_time_max = ts.tv_sec;
 
if (!sbi->fat_length && bpb.fat32_length) {
struct fat_boot_fsinfo *fsinfo;
-- 
2.17.1



[PATCH 00/20] vfs: Add support for timestamp limits

2019-07-29 Thread Deepa Dinamani
The series is an update and a more complete version of the
previously posted series at
https://lore.kernel.org/linux-fsdevel/20180122020426.2988-1-deepa.ker...@gmail.com/

Thanks to Arnd Bergmann for doing a few preliminary reviews.
They helped me fix a few issues I had overlooked.

The limits (sometimes granularity also) for the filesystems updated here are 
according to the
following table:

File system   Time type  Start year Expiration year 
Granularity
cramfsfixed  0  0
romfs fixed  0  0
pstoreascii seconds (27 digit ascii) S64_MINS64_MAX 
NSEC_PER_USEC
coda  INT64  S64_MINS64_MAX 1
omfs  64-bit milliseconds0  U64_MAX/ 1000   
NSEC_PER_MSEC
befs  unsigned 48-bit seconds0  0x  
alloc_super
bfs   unsigned 32-bit seconds0  U32_MAX 
alloc_super
efs   unsigned 32-bit seconds0  U32_MAX 
alloc_super
ext2  signed 32-bit seconds  S32_MINS32_MAX 
alloc_super
ext3  signed 32-bit seconds  S32_MINS32_MAX 
alloc_super
ext4 (old)signed 32-bit seconds  S32_MINS32_MAX 
alloc_super
ext4 (extra)  34-bit seconds, 30-bit ns  S32_MIN0x37fff 1
freevxfs  u32 secs/usecs 0  U32_MAX 
alloc_super
jffs2 unsigned 32-bit seconds0  U32_MAX 
alloc_super
jfs   unsigned 32-bit seconds/ns 0  U32_MAX 1
minix unsigned 32-bit seconds0  U32_MAX 
alloc_super
orangefs  u64 seconds0  U64_MAX 
alloc_super
qnx4  unsigned 32-bit seconds0  U32_MAX 
alloc_super
qnx6  unsigned 32-bit seconds0  U32_MAX 
alloc_super
reiserfs  unsigned 32-bit seconds0  U32_MAX 
alloc_super
squashfs  unsigned 32-bit seconds0  U32_MAX 
alloc_super
ufs1  signed 32-bit seconds  S32_MINS32_MAX 
NSEC_PER_SEC
ufs2  signed 64-bit seconds/u32 ns   S64_MINS64_MAX 1
xfs   signed 32-bit seconds/ns   S32_MINS32_MAX 1
ceph  unsigned 32-bit second/ns  0  U32_MAX 1000
sysv  unsigned 32-bit seconds0  U32_MAX 
alloc_super
affs  u32 day, min, ticks1978   u32_max days
NSEC_PER_SEC
nfsv2 unsigned 32-bit seconds/ns 0  U32_MAX 1
nfsv3 unsigned 32-bit seconds/ns 0  U32_MAX 1000
nfsv4 u64 seconds/u32 ns S64_MINS64_MAX 1000
isofs u8 year since 1900 (fixable)   1900   2155
alloc_super
hpfs  unsigned 32-bit seconds1970   2106
alloc_super
fat   7-bit years, 2s resolution 1980   2107
cifs (smb)7-bit years1980   2107
cifs (modern) 64-bit 100ns since 16011601   30828
adfs  40-bit cs since 1900   1900   2248
9p (9P2000)   unsigned 32-bit seconds1970   2106
9p (9P2000.L) signed 64-bit seconds, ns  1970   S64_MAX

Granularity column filled in by the alloc_super() in the above table indicates 
that
the granularity is NSEC_PER_SEC.
Note that anything not mentioned above still has the default limits
S64_MIN..S64_MAX.

The patches in the series are as structured below:
1. Add vfs support to maintain the limits per filesystem.
2. Add a new timestamp_truncate() api for clamping timestamps
   according to the filesystem limits.
3. Add a warning for mount syscall to indicate the impending
   expiry of timestamps.
4. Modify utimes to clamp the timestamps.
5. Fill in limits for filesystems.

An updated version of the test for checking file system timestamp limits has 
been posted
at https://www.spinics.net/lists/fstests/msg12262.html

Changes from previous version:
* No change in mount behavior because of expiry of timestamps.
* Included limits for more filesystems.

Deepa Dinamani (20):
  vfs: Add file timestamp range support
  vfs: Add timestamp_truncate() api
  timestamp_truncate: Replace users of timespec64_trunc
  mount: Add mount warning for impending timestamp expiry
  utimes: Clamp the timestamps before update
  fs: Fill in max and min timestamps in superblock
  9p: Fill min and max timestamps in sb
  adfs: Fill in max and min timestamps in sb
  ext4: Initialize timestamps limits
  fs: nfs: Initialize filesystem timestamp ranges
  fs: cifs: Initialize filesystem timestamp ranges
  fs: fat: Initialize filesystem timestamp ranges
  fs: affs: Initialize filesystem timestamp ranges
  fs: sysv: Initialize filesystem timestamp ranges
  fs: ceph

[PATCH 02/20] vfs: Add timestamp_truncate() api

2019-07-29 Thread Deepa Dinamani
timespec_trunc() function is used to truncate a
filesystem timestamp to the right granularity.
But, the function does not clamp tv_sec part of the
timestamps according to the filesystem timestamp limits.

The replacement api: timestamp_truncate() also alters the
signature of the function to accommodate filesystem
timestamp clamping according to flesystem limits.

Note that the tv_nsec part is set to 0 if tv_sec is not within
the range supported for the filesystem.

Signed-off-by: Deepa Dinamani 
---
 fs/inode.c | 33 -
 include/linux/fs.h |  2 ++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index 5f5431ec3d62..0fb1f0fb296a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -2166,6 +2166,37 @@ struct timespec64 timespec64_trunc(struct timespec64 t, 
unsigned gran)
 }
 EXPORT_SYMBOL(timespec64_trunc);
 
+/**
+ * timestamp_truncate - Truncate timespec to a granularity
+ * @t: Timespec
+ * @inode: inode being updated
+ *
+ * Truncate a timespec to the granularity supported by the fs
+ * containing the inode. Always rounds down. gran must
+ * not be 0 nor greater than a second (NSEC_PER_SEC, or 10^9 ns).
+ */
+struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode)
+{
+   struct super_block *sb = inode->i_sb;
+   unsigned int gran = sb->s_time_gran;
+
+   t.tv_sec = clamp(t.tv_sec, sb->s_time_min, sb->s_time_max);
+   if (unlikely(t.tv_sec == sb->s_time_max || t.tv_sec == sb->s_time_min))
+   t.tv_nsec = 0;
+
+   /* Avoid division in the common cases 1 ns and 1 s. */
+   if (gran == 1)
+   ; /* nothing */
+   else if (gran == NSEC_PER_SEC)
+   t.tv_nsec = 0;
+   else if (gran > 1 && gran < NSEC_PER_SEC)
+   t.tv_nsec -= t.tv_nsec % gran;
+   else
+   WARN(1, "invalid file time granularity: %u", gran);
+   return t;
+}
+EXPORT_SYMBOL(timestamp_truncate);
+
 /**
  * current_time - Return FS time
  * @inode: inode.
@@ -2187,6 +2218,6 @@ struct timespec64 current_time(struct inode *inode)
return now;
}
 
-   return timespec64_trunc(now, inode->i_sb->s_time_gran);
+   return timestamp_truncate(now, inode);
 }
 EXPORT_SYMBOL(current_time);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e9d04e4e5628..fdfe51d096fa 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -726,6 +726,8 @@ struct inode {
void*i_private; /* fs or device private pointer */
 } __randomize_layout;
 
+struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode);
+
 static inline unsigned int i_blocksize(const struct inode *node)
 {
return (1 << node->i_blkbits);
-- 
2.17.1



Re: [PATCH] signal: remove the wrong signal_pending() check in restore_user_sigmask()

2019-06-04 Thread Deepa Dinamani
> On Tue, Jun 4, 2019 at 3:41 PM Oleg Nesterov  wrote:
> >
> > This is the minimal fix for stable, I'll send cleanups later.
> >
> > The commit 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add
> > restore_user_sigmask()") introduced the visible change which breaks
> > user-space: a signal temporary unblocked by set_user_sigmask() can
> > be delivered even if the caller returns success or timeout.
> >
> > Change restore_user_sigmask() to accept the additional "interrupted"
> > argument which should be used instead of signal_pending() check, and
> > update the callers.
> >
> > Reported-by: Eric Wong 
> > Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add 
> > restore_user_sigmask()")
> > cc: sta...@vger.kernel.org (v5.0+)
> > Signed-off-by: Oleg Nesterov 
>

Acked-by: Deepa Dinamani 

The original fix posted:
https://lore.kernel.org/patchwork/patch/1077355/ would also have been
a correct fix for this problem. But, given the cleanups that are in
the pipeline, this is a better fix.

-Deepa


Re: pselect/etc semantics

2019-05-30 Thread Deepa Dinamani
On Thu, May 30, 2019 at 8:48 AM Deepa Dinamani  wrote:
>
> > On May 30, 2019, at 8:38 AM, Eric W. Biederman  
> > wrote:
> >
> > ebied...@xmission.com (Eric W. Biederman) writes:
> >
> >> Which means I believe we have a semantically valid change in behavior
> >> that is causing a regression.
> >
> > I haven't made a survey of all of the functions yet but
> > fucntions return -ENORESTARTNOHAND will never return -EINTR and are
> > immune from this problem.
> >
> > AKA pselect is fine.  While epoll_pwait can be affected.
>
> This was my understanding as well.

I think I was misremembered here. I had noted this before:
https://lore.kernel.org/linux-fsdevel/CABeXuvq7gCV2qPOo+Q8jvNyRaTvhkRLRbnL_oJ-AuK7Sp=p...@mail.gmail.com/

"sys_io_pgetevents() does not seem to have this problem as we are still
checking signal_pending() here.
sys_pselect6() seems to have a similar problem. The changes to
sys_pselect6() also impact sys_select() as the changes are in the
common code path."

This was the code replaced for io_pgetevents by 854a6ed56839a40f6b is as below.
No matter what events completed, there was signal_pending() check
after the return from do_io_getevents().

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2110,18 +2110,9 @@ SYSCALL_DEFINE6(io_pgetevents,
return ret;

ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   if (signal_pending(current)) {
-   if (ksig.sigmask) {
-   current->saved_sigmask = sigsaved;
-   set_restore_sigmask();
-   }
-
-   if (!ret)
-   ret = -ERESTARTNOHAND;
-   } else {
-   if (ksig.sigmask)
-   sigprocmask(SIG_SETMASK, , NULL);
-   }
+   restore_user_sigmask(ksig.sigmask, );
+   if (signal_pending(current) && !ret)
+   ret = -ERESTARTNOHAND;

Can I ask a simple question for my understanding?

man page for epoll_pwait says

EINTR
The call was interrupted by a signal handler before either any of the
requested events occurred or the timeout expired; see signal(7).

But it is not clear to me if we can figure out(without race) the
chronological order if one of the requested events are completed or a
signal came first.
Is this a correct exectation?

Also like pointed out above, this behavior is not consistent for all
such syscalls(io_pgetevents). Was this also by design?

-Deepa


Re: pselect/etc semantics

2019-05-30 Thread Deepa Dinamani
> On May 30, 2019, at 8:38 AM, Eric W. Biederman  wrote:
>
> ebied...@xmission.com (Eric W. Biederman) writes:
>
>> Which means I believe we have a semantically valid change in behavior
>> that is causing a regression.
>
> I haven't made a survey of all of the functions yet but
> fucntions return -ENORESTARTNOHAND will never return -EINTR and are
> immune from this problem.
>
> AKA pselect is fine.  While epoll_pwait can be affected.

This was my understanding as well.

> Has anyone contacted Omar Kilani to see if that is his issue?
> https://lore.kernel.org/lkml/CA+8F9hicnF=kvjXPZFQy=pa2hjus3js+g9vswfhnqqynpmh...@mail.gmail.com/

Omar was cc-ed when this regression was reported. I did cc him on fix
and asked if he could try it. We have not heard from him.

> So far the only regression report I am seeing is from Eric Wong.
> AKA https://lore.kernel.org/lkml/20190501021405.hfvd7ps623liu25i@dcvr/
> Are there any others?  How did we get to be talking about more
> than just epoll_pwait?

This is the only report that I know of. I’m not sure why people
started talking about pselect. I was also confused why instead of
reviewing the patch and discussing the fix, we ended up talking about
how to simplify the code. We have deviated much from what should have
been a code review.

-Deepa


Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-29 Thread Deepa Dinamani
On Wed, May 29, 2019 at 9:57 AM Oleg Nesterov  wrote:
>
> On 05/28, Deepa Dinamani wrote:
> >
> > I agree that signal handller being called and return value not being
> > altered is an issue with other syscalls also. I was just wondering if
> > some userspace code assumption would be assuming this. This is not a
> > kernel bug.
> >
> > But, I do not think we have an understanding of what was wrong in
> > 854a6ed56839a anymore since you pointed out that my assumption was not
> > correct that the signal handler being called without errno being set
> > is wrong.
>
> Deepa, sorry, I simply can't parse the above... most probably because of
> my bad English.

Ok, All I meant was that I had thought a signal handler being invoked
without the error value reflecting it was wrong. That is what I had
thought was wrong with 854a6ed56839a. Now, that we agree that signal
handler can be invoked without the errno returning success, I thought
I did not know what is wrong with 854a6ed56839a anymore.

But, you now pointed out that the signals we care about should not be
delivered after an event has been ready. This points out to what was
wrong with 854a6ed56839a. Thanks.

> > One open question: this part of epoll_pwait was already broken before
> > 854a6ed56839a. Do you agree?
> >
> > if (err == -EINTR) {
> >memcpy(>saved_sigmask, ,
> >   sizeof(sigsaved));
> > set_restore_sigmask();
> >   } else
> >set_current_blocked();
>
> I do not understand why do you think this part was broken :/

Ok, because of your other statement that the signals the application
cares about do not want to know about signals they care about after an
event is ready this is also not a problem.

> > Or, I could revert the signal_pending() check and provide a fix
> > something like below(not a complete patch)
>
> ...
>
> > -void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
> > +int restore_user_sigmask(const void __user *usigmask, sigset_t
> > *sigsaved, int sig_pending)
> >  {
> >
> > if (!usigmask)
> >return;
> >
> > /*
> >  * When signals are pending, do not restore them here.
> >  * Restoring sigmask here can lead to delivering signals that the 
> > above
> >  * syscalls are intended to block because of the sigmask passed in.
> >  */
> > +   if (sig_pending) {
> > current->saved_sigmask = *sigsaved;
> > set_restore_sigmask();
> >return;
> >}
> >
> > @@ -2330,7 +2330,8 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
> > epoll_event __user *, events,
> >
> > error = do_epoll_wait(epfd, events, maxevents, timeout);
> >
> > -   restore_user_sigmask(sigmask, );
> > +   signal_detected = restore_user_sigmask(sigmask, ,
> > error == -EINTR);
>
> I fail to understand this pseudo-code, sorry. In particular, do not understand
> why restore_user_sigmask() needs to return a boolean.

That was a remnant from the other patch. Return type needs to be void.

> The only thing I _seem to_ understand is the "sig_pending" flag passed by the
> caller which replaces the signal_pending() check.

Correct. This is what is the main change I was proposing.

> Yes, this is what I think we
> should do, and this is what I tried to propose from the very beginning in my
> 1st email in this thread.

This was not clear to me in your first response that you did not want
the signal_pending() check in restore_user_sigmask(). :
https://lore.kernel.org/lkml/20190522150505.ga4...@redhat.com/

"Ugh. I need to re-check, but at first glance I really dislike this change.

I think we can fix the problem _and_ simplify the code. Something like below.
The patch is obviously incomplete, it changes only only one caller of
set_user_sigmask(), epoll_pwait() to explain what I mean.

restore_user_sigmask() should simply die. Although perhaps another helper
makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending)."

-Deepa


Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())

2019-05-29 Thread Deepa Dinamani
On Wed, May 29, 2019 at 9:12 AM Oleg Nesterov  wrote:
>
> Al, Linus, Eric, please help.
>
> The previous discussion was very confusing, we simply can not understand each
> other.
>
> To me everything looks very simple and clear, but perhaps I missed something
> obvious? Please correct me.
>
> I think that the following code is correct
>
> int interrupted = 0;
>
> void sigint_handler(int sig)
> {
> interrupted = 1;
> }
>
> int main(void)
> {
> sigset_t sigint, empty;
>
> sigemptyset();
> sigaddset(, SIGINT);
> sigprocmask(SIG_BLOCK, , NULL);
>
> signal(SIGINT, sigint_handler);
>
> sigemptyset();// so pselect() unblocks SIGINT
>
> ret = pselect(..., );
>
> if (ret >= 0)   // sucess or timeout
> assert(!interrupted);
>
> if (interrupted)
> assert(ret == -EINTR);
> }
>
> IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this
> signal should not be delivered if a ready fd was found or timeout. The signal
> handle should only run if ret == -EINTR.

I do not think we discussed this part earlier. But, if this is true
then this is what is wrong as part of 854a6ed56839a. I missed that
before.

> (pselect() can be interrupted by any other signal which has a handler. In this
>  case the handler can be called even if ret >= 0. This is correct, I fail to
>  understand why some people think this is wrong, and in any case we simply 
> can't
>  avoid this).

This patch is wrong because I did not know that it was ok to deliver a
signal and not set the errno before. I also admitted to this. And
proposed another way to revert the patch.:
https://lore.kernel.org/lkml/CABeXuvouBzZuNarmNcd9JgZgvonL1N_p21gat=O_x0-1hMx=6...@mail.gmail.com/

-Deepa


Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())

2019-05-29 Thread Deepa Dinamani
Resending due to inadvertent conversion of prior message to html.

On Wed, May 29, 2019 at 9:12 AM Oleg Nesterov  wrote:
> Al, Linus, Eric, please help.
>
> The previous discussion was very confusing, we simply can not understand each
> other.
>
> To me everything looks very simple and clear, but perhaps I missed something
> obvious? Please correct me.
>
> I think that the following code is correct
>
> int interrupted = 0;
>
> void sigint_handler(int sig)
> {
> interrupted = 1;
> }
>
> int main(void)
> {
> sigset_t sigint, empty;
>
> sigemptyset();
> sigaddset(, SIGINT);
> sigprocmask(SIG_BLOCK, , NULL);
>
> signal(SIGINT, sigint_handler);
>
> sigemptyset();// so pselect() unblocks SIGINT
>
> ret = pselect(..., );
>
> if (ret >= 0)   // sucess or timeout
> assert(!interrupted);
>
> if (interrupted)
> assert(ret == -EINTR);
> }
>
> IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this
> signal should not be delivered if a ready fd was found or timeout. The signal
> handle should only run if ret == -EINTR.
>
> (pselect() can be interrupted by any other signal which has a handler. In this
>  case the handler can be called even if ret >= 0. This is correct, I fail to
>  understand why some people think this is wrong, and in any case we simply 
> can't
>  avoid this).
>
> This was true until 854a6ed56839a ("signal: Add restore_user_sigmask()"),
> now this is broken by the signal_pending() check in restore_user_sigmask().
>
> This patch 
> https://lore.kernel.org/lkml/20190522032144.10995-1-deepa.ker...@gmail.com/
> turns 0 into -EINTR if signal_pending(), but I think we should simply restore
> the old behaviour and simplify the code.
>
> See the compile-tested patch at the end. Of course, the new _xxx() helpers
> should be renamed somehow. fs/aio.c doesn't look right with or without this
> patch, but iiuc this is what it did before 854a6ed56839a.
>
> Let me show the code with the patch applied. I am using epoll_pwait() as an
> example because it looks very simple.
>
>
> static inline void set_restore_sigmask(void)
> {
> // WARN_ON(!TIF_SIGPENDING) was removed by this patch
> current->restore_sigmask = true;
> }
>
> int set_xxx(const sigset_t __user *umask, size_t sigsetsize)
> {
> sigset_t *kmask;
>
> if (!umask)
> return 0;
> if (sigsetsize != sizeof(sigset_t))
> return -EINVAL;
> if (copy_from_user(kmask, umask, sizeof(sigset_t)))
> return -EFAULT;
>
> // we can safely modify ->saved_sigmask/restore_sigmask, they has no meaning
> // until the syscall returns.
> set_restore_sigmask();
> current->saved_sigmask = current->blocked;
> set_current_blocked(kmask);
>
> return 0;
> }
>
>
> void update_xxx(bool interrupted)
> {
> // the main reason for this helper is WARN_ON(!TIF_SIGPENDING) which was 
> "moved"
> // from set_restore_sigmask() above.
> if (interrupted)
> WARN_ON(!test_thread_flag(TIF_SIGPENDING));
> else
> restore_saved_sigmask();
> }
>
> SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, 
> events,
> int, maxevents, int, timeout, const sigset_t __user 
> *, sigmask,
> size_t, sigsetsize)
> {
> int error;
>
> error = set_xxx(sigmask, sigsetsize);
> if (error)
> return error;
>
> error = do_epoll_wait(epfd, events, maxevents, timeout);
> update_xxx(error == -EINTR);
>
> return error;
> }
>
> Oleg.
> ---
>
>  fs/aio.c | 40 ++-
>  fs/eventpoll.c   | 12 +++
>  fs/io_uring.c| 12 +++
>  fs/select.c  | 40 +++
>  include/linux/compat.h   |  4 +--
>  include/linux/sched/signal.h |  2 --
>  include/linux/signal.h   |  6 ++--
>  kernel/signal.c  | 77 
> 
>  8 files changed, 74 insertions(+), 119 deletions(-)
>
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 3490d1f..8315bd2 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -2093,8 +2093,8 @@ SYSCALL_DEFINE6(io_pgetevents,
> const struct __aio_sigset __user *, usig)
>  {
> struct __aio_sigset ksig = { NULL, };
> -   sigset_tksigmask, sigsaved;
> struct timespec64   ts;
> +  

Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-28 Thread Deepa Dinamani
On Mon, May 27, 2019 at 8:04 AM Oleg Nesterov  wrote:
>
> Deepa,
>
> it seems that we both are saying the same things again and again, and we
> simply can't understand each other.

Oleg, I'm sorry for the confusion.  Maybe I should point out what I
agree with also.

I agree that signal handller being called and return value not being
altered is an issue with other syscalls also. I was just wondering if
some userspace code assumption would be assuming this. This is not a
kernel bug.

But, I do not think we have an understanding of what was wrong in
854a6ed56839a anymore since you pointed out that my assumption was not
correct that the signal handler being called without errno being set
is wrong.

One open question: this part of epoll_pwait was already broken before
854a6ed56839a. Do you agree?

if (err == -EINTR) {
   memcpy(>saved_sigmask, ,
  sizeof(sigsaved));
set_restore_sigmask();
  } else
   set_current_blocked();

What to do next?
We could just see if your optimization patch resolves Eric's issue.
Or, I could revert the signal_pending() check and provide a fix
something like below(not a complete patch) since mainline has this
regression. Eric had tested something like this works also. And, I can
continue to look at what was wrong with 854a6ed56839a in the first
place. Let me know what you prefer:

-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
+int restore_user_sigmask(const void __user *usigmask, sigset_t
*sigsaved, int sig_pending)
 {

if (!usigmask)
   return;

/*
 * When signals are pending, do not restore them here.
 * Restoring sigmask here can lead to delivering signals that the above
 * syscalls are intended to block because of the sigmask passed in.
 */
+   if (sig_pending) {
current->saved_sigmask = *sigsaved;
set_restore_sigmask();
   return;
   }

@@ -2330,7 +2330,8 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
epoll_event __user *, events,

error = do_epoll_wait(epfd, events, maxevents, timeout);

-   restore_user_sigmask(sigmask, );
+   signal_detected = restore_user_sigmask(sigmask, ,
error == -EINTR);

-Deepa


Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-28 Thread Deepa Dinamani



> On May 28, 2019, at 2:12 AM, David Laight  wrote:
> 
> From: Deepa Dinamani
>> Sent: 24 May 2019 18:02
> ...
>> Look at the code before 854a6ed56839a:
>> 
>> /*
>>   * If we changed the signal mask, we need to restore the original one.
>>   * In case we've got a signal while waiting, we do not restore the
>>   * signal mask yet, and we allow do_signal() to deliver the signal on
>>   * the way back to userspace, before the signal mask is restored.
>>   */
>>  if (sigmask) {
>> ### This err has not been changed since ep_poll()
>> ### So if there is a signal before this point, but
>> err = 0, then we goto else.
>>  if (err == -EINTR) {
>>  memcpy(>saved_sigmask, ,
>> sizeof(sigsaved));
>>  set_restore_sigmask();
>>  } else
>> This is a problem if there is signal
>> pending that is sigmask should block.
>>### This is the whole reason we have
>> current->saved_sigmask?
>>  set_current_blocked();
>>  }
> 
> What happens if all that crap is just deleted (I presume from the
> bottom of ep_wait()) ?

Hmm, you have to update the saved_sigmask or the sigmask.

> I'm guessing that on the way back to userspace signal handlers for
> signals enabled in the process's current mask (the one specified
> to epoll_pwait) get called.
> Then the signal mask is loaded from current->saved_sigmask and
> and enabled signal handlers are called again.

Who is saving this saved_sigmask that is being restored on the way back?

> No special code there that depends on the syscall result, errno
> of the syscall number.

I didn’t say this has anything to do with errno.

-Deepa 



Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-24 Thread Deepa Dinamani
On Fri, May 24, 2019 at 9:33 AM Oleg Nesterov  wrote:
>
> On 05/24, Deepa Dinamani wrote:
> >
> > On Fri, May 24, 2019 at 7:11 AM Oleg Nesterov  wrote:
> > >
> > > On 05/23, Deepa Dinamani wrote:
> > > >
> > > > Ok, since there has been quite a bit of argument here, I will
> > > > backtrack a little bit and maybe it will help us understand what's
> > > > happening here.
> > > > There are many scenarios being discussed on this thread:
> > > > a. State of code before 854a6ed56839a
> > >
> > > I think everything was correct,
> >
> > There were 2 things that were wrong:
> >
> > 1. If an unblocked signal was received, after the ep_poll(), then the
> > return status did not indicate that.
>
> Yes,
>
> > This is expected behavior
> > according to man page. If this is indeed what is expected then the man
> > page should note that signal will be delivered in this case and return
> > code will still be 0.
> >
> > "EINTR
> > The call was interrupted by a signal handler before either any of the
> > requested events occurred or the timeout expired; see signal(7)."
>
> and what do you think the man page could say?

Maybe clarify that a signal handler can be invoked even if the syscall
return indicates a success.

Maybe a crude userspace application could do something like this:

sig_handler()
{
  set global abort = 1
}

poll_the_fds()
{
   ret = epoll_pwait()
   if (ret)
  return ret
  if (abort)  # but this abort should be ignored
if ret was 0.
return try_again

}

> This is obviously possible for any syscall, and we can't avoid this. A signal
> can come right after syscall insn completes. The signal handler will be called
> but this won't change $rax, user-space can see return code == 0 or anything 
> else.
>
> And this doesn't differ from the case when the signal comes before syscall 
> returns.

But, these syscalls are depending on there signals. I would assume for
the purpose of these syscalls that the execution is done when we
updated the saved_sigmask. We can pick a different point per syscall
like ep_poll() also, but then we need to probably make it clear for
each such syscall.

> > 2. The restoring of the sigmask is done right in the syscall part and
> > not while exiting the syscall and if you get a blocked signal here,
> > you will deliver this to userspace.
>
> So I assume that this time you are talking about epoll_pwait() and not 
> epoll_wait()...

Yes.

> And I simply can't understand you. But yes, if the original mask doesn't 
> include
> the pending signal it will be delivered while the syscall can return 
> success/timout
> or -EFAULT or anything.
>
> This is correct, see above.

Look at the code before 854a6ed56839a:

  /*
* If we changed the signal mask, we need to restore the original one.
* In case we've got a signal while waiting, we do not restore the
* signal mask yet, and we allow do_signal() to deliver the signal on
* the way back to userspace, before the signal mask is restored.
*/
   if (sigmask) {
  ### This err has not been changed since ep_poll()
  ### So if there is a signal before this point, but
err = 0, then we goto else.
   if (err == -EINTR) {
   memcpy(>saved_sigmask, ,
  sizeof(sigsaved));
   set_restore_sigmask();
   } else
  This is a problem if there is signal
pending that is sigmask should block.
 ### This is the whole reason we have
current->saved_sigmask?
   set_current_blocked();
   }

> > > > b. State after 854a6ed56839a
> > >
> > > obviously buggy,
> >
> > Ok, then can you point out what specifically was wrong with
> > 854a6ed56839a?
>
> Cough. If nothing else the lost -EINTR?

This was my theory. My basis behind the theory was [1](the issue with
return value not being updated) above. And, you are saying this is ok.

854a6ed56839a also has timing differences compared to the original
code. So unless we are sure what was uncovered because of
854a6ed56839a, we might just be masking a pre-existing problem by
reverting it. So I think we should code review 854a6ed56839a and
figure out what is wrong programatically before just reverting it.

> > And, not how it could be more simple?

Oh, I was not asking here. I was saying let's please discuss what's
wrong before simplifying the code.

-Deepa


Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-24 Thread Deepa Dinamani
On Fri, May 24, 2019 at 7:11 AM Oleg Nesterov  wrote:
>
> On 05/23, Deepa Dinamani wrote:
> >
> > Ok, since there has been quite a bit of argument here, I will
> > backtrack a little bit and maybe it will help us understand what's
> > happening here.
> > There are many scenarios being discussed on this thread:
> > a. State of code before 854a6ed56839a
>
> I think everything was correct,

There were 2 things that were wrong:

1. If an unblocked signal was received, after the ep_poll(), then the
return status did not indicate that. This is expected behavior
according to man page. If this is indeed what is expected then the man
page should note that signal will be delivered in this case and return
code will still be 0.

"EINTR
The call was interrupted by a signal handler before either any of the
requested events occurred or the timeout expired; see signal(7)."

2. The restoring of the sigmask is done right in the syscall part and
not while exiting the syscall and if you get a blocked signal here,
you will deliver this to userspace.

> > b. State after 854a6ed56839a
>
> obviously buggy,

Ok, then can you point out what specifically was wrong with
854a6ed56839a? And, not how it could be more simple?

> > c. Proposed fix as per the patchset in question.
>
> > As per [a] and let's consider the case of epoll_pwait only first for 
> > simplicity.
> >
> > As I said before, ep_poll() is what checks for signal_pending() and is
> > responsible for setting errno to -EINTR when there is a signal.
>
> To clarify, if do_epoll_wait() return -EINTR then signal_pending() is true,
> right?

Yes, the case I'm talking about is when do_epoll_wait() returns 0 and
then you get a signal.

> > So if a signal is received after ep_poll() and ep_poll() returns
> > success, it is never noticed by the syscall during execution.
>
> What you are saying looks very confusing to me, I will assume that you
> meant something like
>
> - a signal SIG_XXX was blocked before sys_epoll_pwait() was called
>
> - sys_epoll_pwait(sigmask) unblocks SIG_XXX according to sigmask
>
> - sys_epoll_pwait() calls do_epoll_wait() which returns success
>
> - SIG_XXX comes after that and it is "never noticed"
>
> Yes. Everything is correct. And see my reply to David, SIG_XXX can even
> come _before_ sys_epoll_pwait() was called.

No, I'm talking about a signal that was not blocked.

> > So the question is does the userspace have to know about this signal
> > or not.
>
> If userspace needs to know about SIG_XXX it should not block it, that is all.

What should be the return value if a signal is detected after a fd completed?

> > What [b] does is to move the signal check closer to the restoration of
> > the signal.
>
> FOR NO REASON, afaics (to simplify, lets forget the problem with the wrong
> return value you are trying to fix).

As I already pointed out, the restoring of the sigmask is done during
the syscall and not while exiting the syscall and if you get a blocked
signal here, you will deliver this to userspace.

> And even if there were ANY reason to do this, note that (with or without this
> fix) the signal_pending() check inside restore_user_sigmask() can NOT help,
> simply because SIG_XXX can come right after this check.

This I pointed out already that we should probably make this sequence atomic.


-Deepa


Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-24 Thread Deepa Dinamani
I think you are misunderstanding what I said. You are taking things
out of context. I was saying here what I did was inspired by why the
syscall was designed to begin with. The syscall below refers to
epoll_wait and not epoll_pwait.

-Deepa

On Fri, May 24, 2019 at 7:19 AM Oleg Nesterov  wrote:
>
> On 05/23, Deepa Dinamani wrote:
> >
> > 1. block the signals you don't care about.
> > 2. syscall()
> > 3. unblock the signals blocked in 1.
>
> and even this part of your email is very confusing. because in this case
> we can never miss a signal. I'd say
>
> 1. block the signals you don't care about
> 2. unblock the signals which should interrupt the syscall below
> 3. syscall()
> 4. block the signals unblocked in 2.
>
> Oleg.
>


Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-23 Thread Deepa Dinamani
> Just adding a little more clarification, there is an additional change
> between [a] and [b].
> As per [a] we would just restore the signal instead of changing the
> saved_sigmask and the signal could get delivered right then. [b]
> changes this to happen at syscall exit:

Rewording above, as there seems to be a few misrepresentations:

Just adding a little more clarification, there is an additional change
between [a] and [b].
As per [a] we would just restore the signal mask instead of changing
the saved_sigmask and the even the blocked signals could get delivered
right then. [b] changes the restoration to happen at syscall exit:

> void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
> {
>
>
>
>   /*
>* When signals are pending, do not restore them here.
>* Restoring sigmask here can lead to delivering signals
> that the above
>* syscalls are intended to block because of the sigmask passed in.
>*/
>if (signal_pending(current)) {
>current->saved_sigmask = *sigsaved;
>set_restore_sigmask();
>return;
> }

 -Deepa


Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-23 Thread Deepa Dinamani
On Thu, May 23, 2019 at 11:06 AM Deepa Dinamani  wrote:
>
> Ok, since there has been quite a bit of argument here, I will
> backtrack a little bit and maybe it will help us understand what's
> happening here.
> There are many scenarios being discussed on this thread:
> a. State of code before 854a6ed56839a
> b. State after 854a6ed56839a
> c. Proposed fix as per the patchset in question.
>
> Oleg, I will discuss these first and then we can discuss the
> additional changes you suggested.
>
> Some background on why we have these syscalls that take sigmask as an
> argument. This is just for the sake of completeness of the argument.
>
> These are particularly meant for a scenario(d) such as below:
>
> 1. block the signals you don't care about.
> 2. syscall()
> 3. unblock the signals blocked in 1.
>
> The problem here is that if there is a signal that is not blocked by 1
> and such a signal is delivered between 1 and 2(since they are not
> atomic), the syscall in 2 might block forever as it never found out
> about the signal.
>
> As per [a] and let's consider the case of epoll_pwait only first for 
> simplicity.
>
> As I said before, ep_poll() is what checks for signal_pending() and is
> responsible for setting errno to -EINTR when there is a signal.
>
> So if a signal is received after ep_poll() and ep_poll() returns
> success, it is never noticed by the syscall during execution.
> So the question is does the userspace have to know about this signal
> or not. From scenario [d] above, I would say it should, even if all
> the fd's completed successfully.
> This does not happen in [a]. So this is what I said was already broken.
>
> What [b] does is to move the signal check closer to the restoration of
> the signal. This way it is good. So, if there is a signal after
> ep_poll() returns success, it is noticed and the signal is delivered
> when the syscall exits. But, the syscall error status itself is 0.
>
> So now [c] is adjusting the return values based on whether extra
> signals were detected after ep_poll(). This part was needed even for
> [a].
>
> Let me know if this clarifies things a bit.

Just adding a little more clarification, there is an additional change
between [a] and [b].
As per [a] we would just restore the signal instead of changing the
saved_sigmask and the signal could get delivered right then. [b]
changes this to happen at syscall exit:

void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
{

   

  /*
   * When signals are pending, do not restore them here.
   * Restoring sigmask here can lead to delivering signals
that the above
   * syscalls are intended to block because of the sigmask passed in.
   */
   if (signal_pending(current)) {
   current->saved_sigmask = *sigsaved;
   set_restore_sigmask();
   return;
}

-Deepa


Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-23 Thread Deepa Dinamani
Ok, since there has been quite a bit of argument here, I will
backtrack a little bit and maybe it will help us understand what's
happening here.
There are many scenarios being discussed on this thread:
a. State of code before 854a6ed56839a
b. State after 854a6ed56839a
c. Proposed fix as per the patchset in question.

Oleg, I will discuss these first and then we can discuss the
additional changes you suggested.

Some background on why we have these syscalls that take sigmask as an
argument. This is just for the sake of completeness of the argument.

These are particularly meant for a scenario(d) such as below:

1. block the signals you don't care about.
2. syscall()
3. unblock the signals blocked in 1.

The problem here is that if there is a signal that is not blocked by 1
and such a signal is delivered between 1 and 2(since they are not
atomic), the syscall in 2 might block forever as it never found out
about the signal.

As per [a] and let's consider the case of epoll_pwait only first for simplicity.

As I said before, ep_poll() is what checks for signal_pending() and is
responsible for setting errno to -EINTR when there is a signal.

So if a signal is received after ep_poll() and ep_poll() returns
success, it is never noticed by the syscall during execution.
So the question is does the userspace have to know about this signal
or not. From scenario [d] above, I would say it should, even if all
the fd's completed successfully.
This does not happen in [a]. So this is what I said was already broken.

What [b] does is to move the signal check closer to the restoration of
the signal. This way it is good. So, if there is a signal after
ep_poll() returns success, it is noticed and the signal is delivered
when the syscall exits. But, the syscall error status itself is 0.

So now [c] is adjusting the return values based on whether extra
signals were detected after ep_poll(). This part was needed even for
[a].

Let me know if this clarifies things a bit.

-Deepa


Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-22 Thread Deepa Dinamani
On Wed, May 22, 2019 at 3:18 PM Chris Down  wrote:
>
> +Cc: linux-mm, since this broke mmots tree and has been applied there
>
> This patch is missing a definition for signal_detected in io_cqring_wait, 
> which
> breaks the build.

This patch does not break the build.
The patch the breaks the build was the v2 of this patch since there
was an accidental deletion.
That's what the v3 fixed. I think v3 got picked up today morning into
the mm tree


-Deepa


Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-22 Thread Deepa Dinamani
On Wed, May 22, 2019 at 9:14 AM Oleg Nesterov  wrote:
>
> On 05/22, Deepa Dinamani wrote:
> >
> > -Deepa
> >
> > > On May 22, 2019, at 8:05 AM, Oleg Nesterov  wrote:
> > >
> > >> On 05/21, Deepa Dinamani wrote:
> > >>
> > >> Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
> > >> etc) only when there is no other error. If there is a signal and an error
> > >> like EINVAL, the syscalls return -EINVAL rather than the interrupted
> > >> error codes.
> > >
> > > Ugh. I need to re-check, but at first glance I really dislike this change.
> > >
> > > I think we can fix the problem _and_ simplify the code. Something like 
> > > below.
> > > The patch is obviously incomplete, it changes only only one caller of
> > > set_user_sigmask(), epoll_pwait() to explain what I mean.
> > > restore_user_sigmask() should simply die. Although perhaps another helper
> > > makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending).
> >
> > restore_user_sigmask() was added because of all the variants of these
> > syscalls we added because of y2038 as noted in commit message:
> >
> >   signal: Add restore_user_sigmask()
> >
> > Refactor the logic to restore the sigmask before the syscall
> > returns into an api.
> > This is useful for versions of syscalls that pass in the
> > sigmask and expect the current->sigmask to be changed during
> > the execution and restored after the execution of the syscall.
> >
> > With the advent of new y2038 syscalls in the subsequent patches,
> > we add two more new versions of the syscalls (for pselect, ppoll
> > and io_pgetevents) in addition to the existing native and compat
> > versions. Adding such an api reduces the logic that would need to
> > be replicated otherwise.
>
> Again, I need to re-check, will continue tomorrow. But so far I am not sure
> this helper can actually help.
>
> > > --- a/fs/eventpoll.c
> > > +++ b/fs/eventpoll.c
> > > @@ -2318,19 +2318,19 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct 
> > > epoll_event __user *, events,
> > >size_t, sigsetsize)
> > > {
> > >int error;
> > > -sigset_t ksigmask, sigsaved;
> > >
> > >/*
> > > * If the caller wants a certain signal mask to be set during the wait,
> > > * we apply it here.
> > > */
> > > -error = set_user_sigmask(sigmask, , , sigsetsize);
> > > +error = set_user_sigmask(sigmask, sigsetsize);
> > >if (error)
> > >return error;
> > >
> > >error = do_epoll_wait(epfd, events, maxevents, timeout);
> > >
> > > -restore_user_sigmask(sigmask, );
> > > +if (error != -EINTR)
> >
> > As you address all the other syscalls this condition becomes more and
> > more complicated.
>
> May be.
>
> > > --- a/include/linux/sched/signal.h
> > > +++ b/include/linux/sched/signal.h
> > > @@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task);
> > > static inline void set_restore_sigmask(void)
> > > {
> > >set_thread_flag(TIF_RESTORE_SIGMASK);
> > > -WARN_ON(!test_thread_flag(TIF_SIGPENDING));
> >
> > So you always want do_signal() to be called?
>
> Why do you think so? No. This is just to avoid the warning, because with the
> patch I sent set_restore_sigmask() is called "in advance".
>
> > You will have to check each architecture's implementation of
> > do_signal() to check if that has any side effects.
>
> I don't think so.

Why not?

> > Although this is not what the patch is solving.
>
> Sure. But you know, after I tried to read the changelog, I am not sure
> I understand what exactly you are trying to fix. Could you please explain
> this part
>
> The behavior
> before 854a6ed56839a was that the signals were dropped after the error
> code was decided. This resulted in lost signals but the userspace did 
> not
> notice it
>
> ? I fail to understand it, sorry. It looks as if the code was already buggy 
> before
> that commit and it could miss a signal or something like this, but I do not 
> see how.

Did you read the explanation pointed to in the commit text? :

https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/

Let me know what part you don't understand and I can explain more.

It would be better to understand the isssue before we start discussing the fix.

-Deepa


Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-22 Thread Deepa Dinamani
-Deepa

> On May 22, 2019, at 8:05 AM, Oleg Nesterov  wrote:
>
>> On 05/21, Deepa Dinamani wrote:
>>
>> Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
>> etc) only when there is no other error. If there is a signal and an error
>> like EINVAL, the syscalls return -EINVAL rather than the interrupted
>> error codes.
>
> Ugh. I need to re-check, but at first glance I really dislike this change.
>
> I think we can fix the problem _and_ simplify the code. Something like below.
> The patch is obviously incomplete, it changes only only one caller of
> set_user_sigmask(), epoll_pwait() to explain what I mean.
> restore_user_sigmask() should simply die. Although perhaps another helper
> makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending).

restore_user_sigmask() was added because of all the variants of these
syscalls we added because of y2038 as noted in commit message:

  signal: Add restore_user_sigmask()

Refactor the logic to restore the sigmask before the syscall
returns into an api.
This is useful for versions of syscalls that pass in the
sigmask and expect the current->sigmask to be changed during
the execution and restored after the execution of the syscall.

With the advent of new y2038 syscalls in the subsequent patches,
we add two more new versions of the syscalls (for pselect, ppoll
and io_pgetevents) in addition to the existing native and compat
versions. Adding such an api reduces the logic that would need to
be replicated otherwise.


>
> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> index 4a0e98d..85f56e4 100644
> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -2318,19 +2318,19 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct 
> epoll_event __user *, events,
>size_t, sigsetsize)
> {
>int error;
> -sigset_t ksigmask, sigsaved;
>
>/*
> * If the caller wants a certain signal mask to be set during the wait,
> * we apply it here.
> */
> -error = set_user_sigmask(sigmask, , , sigsetsize);
> +error = set_user_sigmask(sigmask, sigsetsize);
>if (error)
>return error;
>
>error = do_epoll_wait(epfd, events, maxevents, timeout);
>
> -restore_user_sigmask(sigmask, );
> +if (error != -EINTR)

As you address all the other syscalls this condition becomes more and
more complicated.

> +restore_saved_sigmask();
>
>return error;
> }
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index e412c09..1e82ae0 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task);
> static inline void set_restore_sigmask(void)
> {
>set_thread_flag(TIF_RESTORE_SIGMASK);
> -WARN_ON(!test_thread_flag(TIF_SIGPENDING));

So you always want do_signal() to be called?
You will have to check each architecture's implementation of
do_signal() to check if that has any side effects.

Although this is not what the patch is solving. What we want is to
adjust return codes on all these syscalls to user and not drop
signals. Please check v2/v3 of the patch. I've updated the commit text
to provide more context into what is actually being fixed here.

If we really want to simplify, we should rewrite all the internal
logic of all the ppoll, epoll_pwait, io_pgetevent syscall internal
handling where we set the error code.
As new versions of syscalls were added, the internal logic got
reworked rather hapazardly. But, as the current issue points out,
these are delicate changes.

-Deepa
> }
>
> static inline void clear_tsk_restore_sigmask(struct task_struct *tsk)
> @@ -447,7 +446,6 @@ static inline bool test_and_clear_restore_sigmask(void)
> static inline void set_restore_sigmask(void)
> {
>current->restore_sigmask = true;
> -WARN_ON(!test_thread_flag(TIF_SIGPENDING));
> }
> static inline void clear_tsk_restore_sigmask(struct task_struct *tsk)
> {
> diff --git a/include/linux/signal.h b/include/linux/signal.h
> index 9702016..887cea6 100644
> --- a/include/linux/signal.h
> +++ b/include/linux/signal.h
> @@ -273,8 +273,7 @@ extern int group_send_sig_info(int sig, struct 
> kernel_siginfo *info,
>   struct task_struct *p, enum pid_type type);
> extern int __group_send_sig_info(int, struct kernel_siginfo *, struct 
> task_struct *);
> extern int sigprocmask(int, sigset_t *, sigset_t *);
> -extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set,
> -sigset_t *oldset, size_t sigsetsize);
> +extern int set_user_sigmask(const sigset_t __user *umask, size_t sigsetsize);
> extern void restore_user_sigmask(const void __user *usigmask,
> sigset_t *s

[PATCH v3] signal: Adjust error codes according to restore_user_sigmask()

2019-05-22 Thread Deepa Dinamani
A regression caused by 854a6ed56839 ("signal: Add restore_user_sigmask()")
caused users of epoll_pwait, io_pgetevents, and ppoll to notice a
latent problem in signal handling during these syscalls.

That patch (854a6ed56839) moved the signal_pending() check closer
to restoring of the user sigmask. But, it failed to update the error
code accordingly.  From the userspace perspective, the patch increased
the time window for the signal discovery and subsequent delivery to the
userspace, but did not always adjust the errno afterwards. The behavior
before 854a6ed56839a was that the signals were dropped after the error
code was decided. This resulted in lost signals but the userspace did not
notice it as the syscalls had finished executing the core functionality
and the error codes returned notified success.

For all the syscalls that receive a sigmask from the userland,
the user sigmask is to be in effect through the syscall execution.
At the end of syscall, sigmask of the current process is restored
to what it was before the switch over to user sigmask.
But, for this to be true in practice, the sigmask should be restored
only at the the point we change the saved_sigmask. Anything before
that loses signals. And, anything after is just pointless as the
signal is already lost by restoring the sigmask.

Detailed issue discussion permalink:
https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/

Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
etc) only when there is no other error. If there is a signal and an error
like EINVAL, the syscalls return -EINVAL rather than the interrupted
error codes.

Reported-by: Eric Wong 
Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add 
restore_user_sigmask()")
Signed-off-by: Deepa Dinamani 
Reviewed-by: Davidlohr Bueso 
Cc:  # 5.0.x
Cc:  # 5.1.x
---
Applies to 5.1 using patch -p1.
Backport to 5.0 requires dropping the io_uring.c diff.

Changes since v2:
* added back the reviewed-by tag
* fixed missing local definition in io_uring.c
Changes since v1:
* updated the commit text for more context of the pre-existing condition
* added stable tags as requested

 fs/aio.c   | 24 
 fs/eventpoll.c | 14 ++
 fs/io_uring.c  |  7 +--
 fs/select.c| 37 +
 include/linux/signal.h |  2 +-
 kernel/signal.c| 13 ++---
 6 files changed, 59 insertions(+), 38 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 3490d1fa0e16..ebd2b1980161 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2095,7 +2095,7 @@ SYSCALL_DEFINE6(io_pgetevents,
struct __aio_sigset ksig = { NULL, };
sigset_tksigmask, sigsaved;
struct timespec64   ts;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && unlikely(get_timespec64(, timeout)))
return -EFAULT;
@@ -2108,8 +2108,8 @@ SYSCALL_DEFINE6(io_pgetevents,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
@@ -2128,7 +2128,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
struct __aio_sigset ksig = { NULL, };
sigset_tksigmask, sigsaved;
struct timespec64   ts;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && unlikely(get_old_timespec32(, timeout)))
return -EFAULT;
@@ -2142,8 +2142,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
@@ -2193,7 +2193,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && get_old_timespec32(, timeout))
return -EFAULT;
@@ -2206,8 +2206,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
@@ -2226,7 +2226,7 @@ C

[PATCH v2] signal: Adjust error codes according to restore_user_sigmask()

2019-05-21 Thread Deepa Dinamani
A regression caused by 854a6ed56839 ("signal: Add restore_user_sigmask()")
caused users of epoll_pwait, io_pgetevents, and ppoll to notice a
latent problem in signal handling during these syscalls.

That patch (854a6ed56839) moved the signal_pending() check closer
to restoring of the user sigmask. But, it failed to update the error
code accordingly.  From the userspace perspective, the patch increased
the time window for the signal discovery and subsequent delivery to the
userspace, but did not always adjust the errno afterwards. The behavior
before 854a6ed56839a was that the signals were dropped after the error
code was decided. This resulted in lost signals but the userspace did not
notice it as the syscalls had finished executing the core functionality
and the error codes returned notified success.

For all the syscalls that receive a sigmask from the userland,
the user sigmask is to be in effect through the syscall execution.
At the end of syscall, sigmask of the current process is restored
to what it was before the switch over to user sigmask.
But, for this to be true in practice, the sigmask should be restored
only at the the point we change the saved_sigmask. Anything before
that loses signals. And, anything after is just pointless as the
signal is already lost by restoring the sigmask.

Detailed issue discussion permalink:
https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/

Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND,
etc) only when there is no other error. If there is a signal and an error
like EINVAL, the syscalls return -EINVAL rather than the interrupted
error codes.

Reported-by: Eric Wong 
Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add 
restore_user_sigmask()")
Signed-off-by: Deepa Dinamani 
Cc:  # 5.0.x
Cc:  # 5.1.x
---
Changes since v1:
* updated the commit text for more context of the pre-existing condition
* added stable tags as requested

 fs/aio.c   | 24 
 fs/eventpoll.c | 14 ++
 fs/io_uring.c  |  7 +--
 fs/select.c| 37 +
 include/linux/signal.h |  2 +-
 kernel/signal.c| 13 ++---
 6 files changed, 59 insertions(+), 38 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 3490d1fa0e16..ebd2b1980161 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2095,7 +2095,7 @@ SYSCALL_DEFINE6(io_pgetevents,
struct __aio_sigset ksig = { NULL, };
sigset_tksigmask, sigsaved;
struct timespec64   ts;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && unlikely(get_timespec64(, timeout)))
return -EFAULT;
@@ -2108,8 +2108,8 @@ SYSCALL_DEFINE6(io_pgetevents,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
@@ -2128,7 +2128,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
struct __aio_sigset ksig = { NULL, };
sigset_tksigmask, sigsaved;
struct timespec64   ts;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && unlikely(get_old_timespec32(, timeout)))
return -EFAULT;
@@ -2142,8 +2142,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
@@ -2193,7 +2193,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && get_old_timespec32(, timeout))
return -EFAULT;
@@ -2206,8 +2206,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
@@ -2226,7 +2226,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
-   int ret;
+   int ret, signal_detected;
 

Re: [PATCH 1/1] signal: Adjust error codes according to restore_user_sigmask()

2019-05-21 Thread Deepa Dinamani
On Tue, May 21, 2019 at 5:35 PM Deepa Dinamani  wrote:
>
> > > > It's been 2 weeks and this fix hasn't appeared in mmots / mmotm.
> > > > I also noticed it's missing Cc: for stable@ (below)
> > >
> > > Why is a -stable backport needed?  I see some talk above about lost
> > > signals but it is unclear whether these are being observed after fixing
> > > the regression caused by 854a6ed56839a.
> >
> > I guess Deepa's commit messages wasn't clear...
> > I suggest prepending this as the first paragraph to Deepa's
> > original message:
> >
> >   This fixes a bug introduced with 854a6ed56839a which caused
> >   EINTR to not be reported to userspace on epoll_pwait.  Failure
> >   to report EINTR to userspace caused problems with user code
> >   which relies on EINTR to run signal handlers.
>
> This is not what the patch fixed.
>
> The notable change is userspace is that now whenever a signal is
> delivered, the return value is adjusted to reflect the signal
> delivery.
> Prior to this patch, there was a window, however small it might have
> been, when the signal was delivered but the errono was not adjusted
> appropriately.
> This is because of the regression caused by 854a6ed56839a, which
> extended the window of delivery of signals that was delivered to
> userspace.
> The patch also fixes more than sys_epoll_pwait().
>
> I will post a follow up patch.
>
> >
> > > IOW, can we please have a changelog which has a clear and complete
> > > description of the user-visible effects of the change.
> > >
> > > And please Cc Oleg.
>
> I will cc Oleg.

Also the commit message was brief because the issue was explained in
the link that was quoted in the commit message.

Detailed issue discussion permalink:
https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/

-Deepa


Re: [PATCH 1/1] signal: Adjust error codes according to restore_user_sigmask()

2019-05-21 Thread Deepa Dinamani
> > > It's been 2 weeks and this fix hasn't appeared in mmots / mmotm.
> > > I also noticed it's missing Cc: for stable@ (below)
> >
> > Why is a -stable backport needed?  I see some talk above about lost
> > signals but it is unclear whether these are being observed after fixing
> > the regression caused by 854a6ed56839a.
>
> I guess Deepa's commit messages wasn't clear...
> I suggest prepending this as the first paragraph to Deepa's
> original message:
>
>   This fixes a bug introduced with 854a6ed56839a which caused
>   EINTR to not be reported to userspace on epoll_pwait.  Failure
>   to report EINTR to userspace caused problems with user code
>   which relies on EINTR to run signal handlers.

This is not what the patch fixed.

The notable change is userspace is that now whenever a signal is
delivered, the return value is adjusted to reflect the signal
delivery.
Prior to this patch, there was a window, however small it might have
been, when the signal was delivered but the errono was not adjusted
appropriately.
This is because of the regression caused by 854a6ed56839a, which
extended the window of delivery of signals that was delivered to
userspace.
The patch also fixes more than sys_epoll_pwait().

I will post a follow up patch.

>
> > IOW, can we please have a changelog which has a clear and complete
> > description of the user-visible effects of the change.
> >
> > And please Cc Oleg.

I will cc Oleg.


[PATCH 1/1] signal: Adjust error codes according to restore_user_sigmask()

2019-05-06 Thread Deepa Dinamani
For all the syscalls that receive a sigmask from the userland,
the user sigmask is to be in effect through the syscall execution.
At the end of syscall, sigmask of the current process is restored
to what it was before the switch over to user sigmask.
But, for this to be true in practice, the sigmask should be restored
only at the the point we change the saved_sigmask. Anything before
that loses signals. And, anything after is just pointless as the
signal is already lost by restoring the sigmask.

The inherent issue was detected because of a regression caused by
854a6ed56839a.
The patch moved the signal_pending() check closer to restoring of the
user sigmask. But, it failed to update the error code accordingly.

Detailed issue discussion permalink:
https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/

Note that the patch returns interrupted errors (EINTR, ERESTARTNOHAND,
etc) only when there is no other error. If there is a signal and an error
like EINVAL, the syscalls return -EINVAL rather than the interrupted
error codes.

The sys_io_uring_enter() seems to be returning success when there is
a signal and the queue is not empty. This seems to be a bug. I will
follow up with a separate patch for that.

Reported-by: Eric Wong 
Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add 
restore_user_sigmask()")
Signed-off-by: Deepa Dinamani 
Reviewed-by: Davidlohr Bueso 
---

 fs/aio.c   | 24 
 fs/eventpoll.c | 14 ++
 fs/io_uring.c  |  9 ++---
 fs/select.c| 37 +
 include/linux/signal.h |  2 +-
 kernel/signal.c| 13 ++---
 6 files changed, 60 insertions(+), 39 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 3490d1fa0e16..ebd2b1980161 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2095,7 +2095,7 @@ SYSCALL_DEFINE6(io_pgetevents,
struct __aio_sigset ksig = { NULL, };
sigset_tksigmask, sigsaved;
struct timespec64   ts;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && unlikely(get_timespec64(, timeout)))
return -EFAULT;
@@ -2108,8 +2108,8 @@ SYSCALL_DEFINE6(io_pgetevents,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
@@ -2128,7 +2128,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
struct __aio_sigset ksig = { NULL, };
sigset_tksigmask, sigsaved;
struct timespec64   ts;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && unlikely(get_old_timespec32(, timeout)))
return -EFAULT;
@@ -2142,8 +2142,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
@@ -2193,7 +2193,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && get_old_timespec32(, timeout))
return -EFAULT;
@@ -2206,8 +2206,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
@@ -2226,7 +2226,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && get_timespec64(, timeout))
return -EFAULT;
@@ -2239,8 +2239,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
diff --git a/fs/even

Re: [PATCH] signal: Adjust error codes according to restore_user_sigmask()

2019-05-03 Thread Deepa Dinamani
The original patch was merged through the tip tree. Adding tglx just in case.

I will post the revised patch to everyone on this thread.

> >For all the syscalls that receive a sigmask from the userland,
> >the user sigmask is to be in effect through the syscall execution.
> >At the end of syscall, sigmask of the current process is restored
> >to what it was before the switch over to user sigmask.
> >But, for this to be true in practice, the sigmask should be restored
> >only at the the point we change the saved_sigmask. Anything before
> >that loses signals. And, anything after is just pointless as the
> >signal is already lost by restoring the sigmask.
> >
> >The issue was detected because of a regression caused by 854a6ed56839a.
> >The patch moved the signal_pending() check closer to restoring of the
> >user sigmask. But, it failed to update the error code accordingly.
> >
> >Detailed issue discussion permalink:
> >https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/
> >
> >Note that the patch returns interrupted errors (EINTR, ERESTARTNOHAND,
> >etc) only when there is no other error. If there is a signal and an error
> >like EINVAL, the syscalls return -EINVAL rather than the interrupted
> >error codes.
>
> Thanks for doing this; I've reviewed the epoll bits (along with the overall
> idea) and it seems like a sane alternative to reverting the offending patch.

Sorry maybe the description wasn't clear. What I actually am saying is
that all these syscalls were dropping signals before and
854a6ed56839a4 actually did things right by making sure they did not
do so.
But, there was a bug in that it did not communicate to userspace when
the error code was not already set.
However, we could still argue that the check and flipping of the mask
isn't atomic and there is still a way this can theoretically happen.
But, this will also mean that these syscalls will slow down further.
But, they are already expected to be slow so maybe it doesn't matter.
I will note this down in the commit text.
I don't think reverting was an alternative. 854a6ed56839a4 exposed a
bug that was already there.

> Feel free to add:
>
> Reviewed-by: Davidlohr Bueso 
>
> A small nit, I think we should be a bit more verbose about the return 
> semantics
> of restore_user_sigmask()... see at the end.
>
> >
> >Reported-by: Eric Wong 
> >Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add 
> >restore_user_sigmask()")
> >Signed-off-by: Deepa Dinamani 
> >--- a/kernel/signal.c
> >+++ b/kernel/signal.c
> >@@ -2845,15 +2845,16 @@ EXPORT_SYMBOL(set_compat_user_sigmask);
> >  * usigmask: sigmask passed in from userland.
> >  * sigsaved: saved sigmask when the syscall started and changed the sigmask 
> > to
> >  *   usigmask.
> >+ * returns 1 in case a pending signal is detected.
>
> How about:
>
> "
> Callers must carefully coordinate between signal_pending() checks between the
> actual system call and restore_user_sigmask() - for which a new pending signal
> may come in between calls and therefore must consider this for returning a 
> proper
> error code.
>
> Returns 1 in case a signal pending is detected, otherwise 0.

Ok, I will add more verbiage here.

Thanks,
Deepa


Re: [PATCH] signal: Adjust error codes according to restore_user_sigmask()

2019-05-03 Thread Deepa Dinamani
On Thu, May 2, 2019 at 11:34 PM Eric Wong  wrote:
>
> Deepa Dinamani  wrote:
> > Sorry, I was trying a new setup at work. I should have tested it.
> > My bad, I've checked this one.
>
> Thanks.  This is good w.r.t. epoll_pwait and ppoll when applied
> to 5.0.11 (no fs/io_uring.c).

Thanks. Al, would you be picking up this patch? I can resend it.

> Can't think of anything which uses pselect or aio on my system;
> but it looks right to me.
>
> > I've removed the questionable reported-by, since we're not sure if
> > it is actually the same issue.
>
> Yes, I hope Omar can test this, too.

Omar, would you be able to test this?

Thanks,
Deepa


[PATCH] signal: Adjust error codes according to restore_user_sigmask()

2019-05-02 Thread Deepa Dinamani
For all the syscalls that receive a sigmask from the userland,
the user sigmask is to be in effect through the syscall execution.
At the end of syscall, sigmask of the current process is restored
to what it was before the switch over to user sigmask.
But, for this to be true in practice, the sigmask should be restored
only at the the point we change the saved_sigmask. Anything before
that loses signals. And, anything after is just pointless as the
signal is already lost by restoring the sigmask.

The issue was detected because of a regression caused by 854a6ed56839a.
The patch moved the signal_pending() check closer to restoring of the
user sigmask. But, it failed to update the error code accordingly.

Detailed issue discussion permalink:
https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/

Note that the patch returns interrupted errors (EINTR, ERESTARTNOHAND,
etc) only when there is no other error. If there is a signal and an error
like EINVAL, the syscalls return -EINVAL rather than the interrupted
error codes.

Reported-by: Eric Wong 
Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add 
restore_user_sigmask()")
Signed-off-by: Deepa Dinamani 
---
Sorry, I was trying a new setup at work. I should have tested it.
My bad, I've checked this one.
I've removed the questionable reported-by, since we're not sure if
it is actually the same issue.

 fs/aio.c   | 24 
 fs/eventpoll.c | 14 ++
 fs/io_uring.c  |  9 ++---
 fs/select.c| 37 +
 include/linux/signal.h |  2 +-
 kernel/signal.c|  8 +---
 6 files changed, 55 insertions(+), 39 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 38b741aef0bf..7de2f7573d55 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2133,7 +2133,7 @@ SYSCALL_DEFINE6(io_pgetevents,
struct __aio_sigset ksig = { NULL, };
sigset_tksigmask, sigsaved;
struct timespec64   ts;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && unlikely(get_timespec64(, timeout)))
return -EFAULT;
@@ -2146,8 +2146,8 @@ SYSCALL_DEFINE6(io_pgetevents,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
@@ -2166,7 +2166,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
struct __aio_sigset ksig = { NULL, };
sigset_tksigmask, sigsaved;
struct timespec64   ts;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && unlikely(get_old_timespec32(, timeout)))
return -EFAULT;
@@ -2180,8 +2180,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
@@ -2231,7 +2231,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && get_old_timespec32(, timeout))
return -EFAULT;
@@ -2244,8 +2244,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
@@ -2264,7 +2264,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
struct __compat_aio_sigset ksig = { NULL, };
sigset_t ksigmask, sigsaved;
struct timespec64 t;
-   int ret;
+   int ret, signal_detected;
 
if (timeout && get_timespec64(, timeout))
return -EFAULT;
@@ -2277,8 +2277,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
return ret;
 
ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   restore_user_sigmask(ksig.sigmask, );
-   if (signal_pending(current) && !ret)
+   signal_detected = restore_user_sigmask(ksig.sigmask, );
+   if (signal_detected && !ret)
ret = -ERESTARTNOHAND;
 
return ret;
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4a0e98d

Re: Strange issues with epoll since 5.0

2019-05-02 Thread Deepa Dinamani
Eric,
Can you please help test this?
If this solves your problem, I can post the fix.
Thanks,
- Deepa

-8<---

Subject: [PATCH] signal: Adjust error codes according to restore_user_sigmask()

For all the syscalls that receive a sigmask from the userland,
the user sigmask is to be in effect through the syscall execution.
At the end of syscall, sigmask of the current process is restored
to what it was before the switch over to user sigmask.
But, for this to be true in practice, the sigmask should be restored
only at the the point we change the saved_sigmask. Anything before
that loses signals. And, anything after is just pointless as the
signal is already lost by restoring the sigmask.

The issue was detected because of a regression caused by 854a6ed56839a.
The patch moved the signal_pending() check closer to restoring of the
user sigmask. But, it failed to update the error code accordingly.

Detailed issue discussion permalink:
https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/

Note that the patch returns interrupted errors (EINTR, ERESTARTNOHAND, etc)
only when there is no other error. If there is a signal and an error like
EINVAL, the syscalls return -EINVAL rather than the interrupted error codes.

Reported-by: Omar Kilani 
Reported-by: Eric Wong 
Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add
restore_user_sigmask()")
Signed-off-by: Deepa Dinamani 
---
 fs/aio.c   | 24 
 fs/eventpoll.c | 14 ++
 fs/io_uring.c  |  9 ++---
 fs/select.c| 37 +
 include/linux/signal.h |  2 +-
 kernel/signal.c|  8 +---
 6 files changed, 55 insertions(+), 39 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 38b741aef0bf..7de2f7573d55 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2133,7 +2133,7 @@ SYSCALL_DEFINE6(io_pgetevents,
 struct __aio_sigsetksig = { NULL, };
 sigset_tksigmask, sigsaved;
 struct timespec64ts;
-int ret;
+int ret, signal_detected;

 if (timeout && unlikely(get_timespec64(, timeout)))
 return -EFAULT;
@@ -2146,8 +2146,8 @@ SYSCALL_DEFINE6(io_pgetevents,
 return ret;

 ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-restore_user_sigmask(ksig.sigmask, );
-if (signal_pending(current) && !ret)
+signal_detected = restore_user_sigmask(ksig.sigmask, );
+if (signal_detected && !ret)
 ret = -ERESTARTNOHAND;

 return ret;
@@ -2166,7 +2166,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
 struct __aio_sigsetksig = { NULL, };
 sigset_tksigmask, sigsaved;
 struct timespec64ts;
-int ret;
+int ret, signal_detected;

 if (timeout && unlikely(get_old_timespec32(, timeout)))
 return -EFAULT;
@@ -2180,8 +2180,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
 return ret;

 ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-restore_user_sigmask(ksig.sigmask, );
-if (signal_pending(current) && !ret)
+signal_detected = restore_user_sigmask(ksig.sigmask, );
+if (signal_detected && !ret)
 ret = -ERESTARTNOHAND;

 return ret;
@@ -2231,7 +2231,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
 struct __compat_aio_sigset ksig = { NULL, };
 sigset_t ksigmask, sigsaved;
 struct timespec64 t;
-int ret;
+int ret, signal_detected;

 if (timeout && get_old_timespec32(, timeout))
 return -EFAULT;
@@ -2244,8 +2244,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
 return ret;

 ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-restore_user_sigmask(ksig.sigmask, );
-if (signal_pending(current) && !ret)
+signal_detected = restore_user_sigmask(ksig.sigmask, );
+if (signal_detected && !ret)
 ret = -ERESTARTNOHAND;

 return ret;
@@ -2264,7 +2264,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
 struct __compat_aio_sigset ksig = { NULL, };
 sigset_t ksigmask, sigsaved;
 struct timespec64 t;
-int ret;
+int ret, signal_detected;

 if (timeout && get_timespec64(, timeout))
 return -EFAULT;
@@ -2277,8 +2277,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64,
 return ret;

 ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-restore_user_sigmask(ksig.sigmask, );
-if (signal_pending(current) && !ret)
+signal_detected = restore_user_sigmask(ksig.sigmask, );
+if (signal_detected && !ret)
 ret = -ERESTARTNOHAND;

 return ret;
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4a0e98d87fcc..fe5a0724b417 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2317,7 +2317,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
epoll_event __user *, events,
 int, maxevents, int, timeout, const sigset_t __us

Re: Strange issues with epoll since 5.0

2019-05-01 Thread Deepa Dinamani
On Wed, May 1, 2019 at 1:48 PM Eric Wong  wrote:
>
> Deepa Dinamani  wrote:
> > So here is my analysis:
>
> 
>
> > So the 854a6ed56839a40f6 seems to be better than the original code in
> > that it detects the signal.
>
> OTOH, does matter to anybody that a signal is detected slightly
> sooner than it would've been, otherwise?

The original code drops the signal altogether. This is because it
overwrites the current's sigmask with the provided
one(set_current_blocked()). If a signal bit was set, it is lost
forever. It does not detect it sooner. The check for pending signal is
sooner and not just before the syscall returns.
This is what the patch in discussion does: check for signals just
before returning.

>
> > But, the problem is that it doesn't
> > communicate it to the userspace.
>
> Yup, that's a big problem :)
>
> > So a patch like below solves the problem. This is incomplete. I'll
> > verify and send you a proper fix you can test soon. This is just for
> > the sake of discussion:
> >
> > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > index 4a0e98d87fcc..63a387329c3d 100644
> > --- a/fs/eventpoll.c
> > +++ b/fs/eventpoll.c
> > @@ -2317,7 +2317,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
> > epoll_event __user *, events,
> > int, maxevents, int, timeout, const sigset_t __user *, 
> > sigmask,
> > size_t, sigsetsize)
> >  {
> > -   int error;
> > +   int error, signal_detected;
> > sigset_t ksigmask, sigsaved;
> >
> > /*
> > @@ -2330,7 +2330,10 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
> > epoll_event __user *, events,
> >
> > error = do_epoll_wait(epfd, events, maxevents, timeout);
> >
> > -   restore_user_sigmask(sigmask, );
> > +   signal_detected = restore_user_sigmask(sigmask, );
> > +
> > +   if (signal_detected && !error)
> > +   return -EITNR;
> >
> > return error;
>
> Looks like a reasonable API.
>
> > @@ -2862,7 +2862,7 @@ void restore_user_sigmask(const void __user
> > *usigmask, sigset_t *sigsaved)
> > if (signal_pending(current)) {
> > current->saved_sigmask = *sigsaved;
> > set_restore_sigmask();
> > -   return;
> > +   return 0;
>
> Shouldn't that "return 1" if a signal is pending?

Yep, I meant this to be 1.

-Deepa


Re: Strange issues with epoll since 5.0

2019-05-01 Thread Deepa Dinamani
Thanks for trying the fix.

So here is my analysis:

Let's start with epoll_pwait:

ep_poll() is what checks for signal_pending() and is responsible for
setting errno to -EINTR when there is a signal.

So if a signal is received after ep_poll(), it is never noticed by the
syscall during execution.

Moreover, the original code before
854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add
restore_user_sigmask()"), had the following call flow:

 error = do_epoll_wait(epfd, events, maxevents, timeout);

 Here error = 0 if the signal is received after ep_poll().

-   /*
-* If we changed the signal mask, we need to restore the original one.
-* In case we've got a signal while waiting, we do not restore the
-* signal mask yet, and we allow do_signal() to deliver the signal on
-* the way back to userspace, before the signal mask is restored.
-*/
-   if (sigmask) {
-   if (error == -EINTR) {
-   memcpy(>saved_sigmask, ,
-  sizeof(sigsaved));
-   set_restore_sigmask();
-   } else

 Execution reaches this else statement and the sigmask is restored
directly, ignoring the newly generated signal. The signal is never
handled.

-   set_current_blocked();
-   }

In the current execution flow:

error = do_epoll_wait(epfd, events, maxevents, timeout);

 error is still 0 as ep_poll() did not detect the signal.

restore_user_sigmask(sigmask, , error == -EITNR);

void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
{

if (!usigmask)
return;
/*
 * When signals are pending, do not restore them here.
 * Restoring sigmask here can lead to delivering signals that the above
 * syscalls are intended to block because of the sigmask passed in.
 */
if (signal_pending(current)) {

 execution path reaches here and do_signal() actually delivers the
signal to userspace. But the errno is not set. So the userspace fails
to notice it.

current->saved_sigmask = *sigsaved;
set_restore_sigmask();
return;
}

/*
 * This is needed because the fast syscall return path does not restore
 * saved_sigmask when signals are not pending.
 */
set_current_blocked(sigsaved);
}

For other syscalls in the same commit:

sys_io_pgetevents() does not seem to have this problem as we are still
checking signal_pending() here.
sys_pselect6() seems to have a similar problem. The changes to
sys_pselect6() also impact sys_select() as the changes are in the
common code path.

So the 854a6ed56839a40f6 seems to be better than the original code in
that it detects the signal. But, the problem is that it doesn't
communicate it to the userspace.

So a patch like below solves the problem. This is incomplete. I'll
verify and send you a proper fix you can test soon. This is just for
the sake of discussion:

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4a0e98d87fcc..63a387329c3d 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2317,7 +2317,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
epoll_event __user *, events,
int, maxevents, int, timeout, const sigset_t __user *, sigmask,
size_t, sigsetsize)
 {
-   int error;
+   int error, signal_detected;
sigset_t ksigmask, sigsaved;

/*
@@ -2330,7 +2330,10 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
epoll_event __user *, events,

error = do_epoll_wait(epfd, events, maxevents, timeout);

-   restore_user_sigmask(sigmask, );
+   signal_detected = restore_user_sigmask(sigmask, );
+
+   if (signal_detected && !error)
+   return -EITNR;

return error;
 }
@@ -2342,7 +2345,7 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,
const compat_sigset_t __user *, sigmask,
compat_size_t, sigsetsize)
 {
-   long err;
+   long err, signal_detected;
sigset_t ksigmask, sigsaved;

/*
@@ -2355,7 +2358,10 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd,

err = do_epoll_wait(epfd, events, maxevents, timeout);

-   restore_user_sigmask(sigmask, );
+   signal_detected = restore_user_sigmask(sigmask, );
+
+   if (signal_detected && !err)
+   return -EITNR;

return err;
 }
diff --git a/kernel/signal.c b/kernel/signal.c
index 3a9e41197d46..c76ab2a52ebf 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2849,11 +2849,11 @@ EXPORT_SYMBOL(set_compat_user_sigmask);
  * This is useful for syscalls such as ppoll, pselect, io_pgetevents and
  * epoll_pwait where a new sigmask is passed in from userland for the syscalls.
  */
-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
+int restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
 {

Re: Strange issues with epoll since 5.0

2019-04-30 Thread Deepa Dinamani
I was also not able to reproduce this.
Arnd and I were talking about this today morning. Here is something
Arnd noticed:

If there was a signal after do_epoll_wait(), we never were not
entering the if (err = -EINTR) at all before. But, now we do.
We could try with the below patch:

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4a0e98d87fcc..5cfb800cf598 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2330,7 +2330,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct
epoll_event __user *, events,

error = do_epoll_wait(epfd, events, maxevents, timeout);

-   restore_user_sigmask(sigmask, );
+   restore_user_sigmask(sigmask, , error == -EITNR);

return error;
 }

diff --git a/kernel/signal.c b/kernel/signal.c
index 3a9e41197d46..4a8f96f5c1c0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2849,7 +2849,7 @@ EXPORT_SYMBOL(set_compat_user_sigmask);
  * This is useful for syscalls such as ppoll, pselect, io_pgetevents and
  * epoll_pwait where a new sigmask is passed in from userland for the syscalls.
  */
-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
+void restore_user_sigmask(const void __user *usigmask, sigset_t
*sigsaved, int sig_pending)
 {

if (!usigmask)
@@ -2859,7 +2859,7 @@ void restore_user_sigmask(const void __user
*usigmask, sigset_t *sigsaved)
 * Restoring sigmask here can lead to delivering signals that the above
 * syscalls are intended to block because of the sigmask passed in.
 */
-   if (signal_pending(current)) {
+   if (sig_pending) {
current->saved_sigmask = *sigsaved;
set_restore_sigmask();


If this works that means we know what is busted.
I'm not sure what the hang in the userspace is about. Is it because
the syscall did not return an error or the particular signal was
blocked etc.

There are also a few timing differences also. But, can we try this first?

-Deepa


Re: Strange issues with epoll since 5.0

2019-04-27 Thread Deepa Dinamani
I tried to replicate the failure on qemu.
I do not see the failure with N=32.

Does it work for N < 32?
Does any other signal work?
Are there any other architectures that fail?

Could you help me figure out how to run just the one test that is failing?

-Deepa


Re: [PATCH v2] posix-cpu-timers: Avoid undefined behaviour in timespec64_to_ns()

2019-02-27 Thread Deepa Dinamani
On Tue, Feb 26, 2019 at 11:52 PM Xiongfeng Wang
 wrote:
>
> When I ran Syzkaller testsuite, I got the following call trace.
> 
> UBSAN: Undefined behaviour in ./include/linux/time64.h:120:27
> signed integer overflow:
> 8243129037239968815 * 10 cannot be represented in type 'long long int'
> CPU: 5 PID: 28854 Comm: syz-executor.1 Not tainted 4.19.24 #4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0xca/0x13e lib/dump_stack.c:113
>  ubsan_epilogue+0xe/0x81 lib/ubsan.c:159
>  handle_overflow+0x193/0x1e2 lib/ubsan.c:190
>  timespec64_to_ns include/linux/time64.h:120 [inline]
>  posix_cpu_timer_set+0x95a/0xb70 kernel/time/posix-cpu-timers.c:687
>  do_timer_settime+0x198/0x2a0 kernel/time/posix-timers.c:892
>  __do_sys_timer_settime kernel/time/posix-timers.c:918 [inline]
>  __se_sys_timer_settime kernel/time/posix-timers.c:904 [inline]
>  __x64_sys_timer_settime+0x18d/0x260 kernel/time/posix-timers.c:904
>  do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x462eb9
> Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 
> 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7f14e4127c58 EFLAGS: 0246 ORIG_RAX: 00df
> RAX: ffda RBX: 0073bfa0 RCX: 00462eb9
> RDX: 2080 RSI:  RDI: 
> RBP: 0004 R08:  R09: 
> R10:  R11: 0246 R12: 7f14e41286bc
> R13: 004c54cc R14: 00704278 R15: 
> 
>
> It is because 'it_interval.tv_sec' is larger than 'KTIME_SEC_MAX' and
> 'it_interval.tv_sec * NSEC_PER_SEC' overflows in 'timespec64_to_ns()'.
>
> This patch use 'timespec64_valid_restrict()' to check whether
> 'it_interval.tv_sec' is larger than 'KTIME_SEC_MAX'.
>
> Signed-off-by: Xiongfeng Wang 
> ---
>  kernel/time/posix-timers.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> index 0e84bb7..97b773c 100644
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -853,8 +853,8 @@ static int do_timer_settime(timer_t timer_id, int flags,
> unsigned long flag;
> int error = 0;
>
> -   if (!timespec64_valid(_spec64->it_interval) ||
> -   !timespec64_valid(_spec64->it_value))
> +   if (!timespec64_valid_strict(_spec64->it_interval) ||
> +   !timespec64_valid_strict(_spec64->it_value))
> return -EINVAL;
>
> if (old_spec64)

sys_timer_settime() is a POSIX interface:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/timer_settime.html

The timer_settime() function will fail if:

[EINVAL] The timerid argument does not correspond to an id returned by
timer_create() but not yet deleted by timer_delete().

[EINVAL] A value structure specified a nanosecond value less than zero
or greater than or equal to 1000 million.

So we cannot return EINVAL here if we want to maintain POSIX compatibility.
Maybe we should check for limit and saturate here at the syscall interface?

-Deepa


Re: [RFC PATCH] posix-cpu-timers: Avoid undefined behaviour in timespec64_to_ns()

2019-02-26 Thread Deepa Dinamani
On Tue, Feb 26, 2019 at 6:07 PM Xiongfeng Wang
 wrote:
>
> When I ran Syzkaller testsuite, I got the following call trace.
> 
> UBSAN: Undefined behaviour in ./include/linux/time64.h:120:27
> signed integer overflow:
> 8243129037239968815 * 10 cannot be represented in type 'long long int'
> CPU: 5 PID: 28854 Comm: syz-executor.1 Not tainted 4.19.24 #4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0xca/0x13e lib/dump_stack.c:113
>  ubsan_epilogue+0xe/0x81 lib/ubsan.c:159
>  handle_overflow+0x193/0x1e2 lib/ubsan.c:190
>  timespec64_to_ns include/linux/time64.h:120 [inline]
>  posix_cpu_timer_set+0x95a/0xb70 kernel/time/posix-cpu-timers.c:687
>  do_timer_settime+0x198/0x2a0 kernel/time/posix-timers.c:892
>  __do_sys_timer_settime kernel/time/posix-timers.c:918 [inline]
>  __se_sys_timer_settime kernel/time/posix-timers.c:904 [inline]
>  __x64_sys_timer_settime+0x18d/0x260 kernel/time/posix-timers.c:904
>  do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x462eb9
> Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 
> 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7f14e4127c58 EFLAGS: 0246 ORIG_RAX: 00df
> RAX: ffda RBX: 0073bfa0 RCX: 00462eb9
> RDX: 2080 RSI:  RDI: 
> RBP: 0004 R08:  R09: 
> R10:  R11: 0246 R12: 7f14e41286bc
> R13: 004c54cc R14: 00704278 R15: 
> 
>
> This patch use 'timespec64_to_ktime()' to limit 'tv_sec' to avoid
> overflow.
>
> Signed-off-by: Xiongfeng Wang 
> ---
>  kernel/time/posix-cpu-timers.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
> index 80f9552..f7e3929 100644
> --- a/kernel/time/posix-cpu-timers.c
> +++ b/kernel/time/posix-cpu-timers.c
> @@ -684,7 +684,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, 
> int timer_flags,
>  * Install the new reload setting, and
>  * set up the signal and overrun bookkeeping.
>  */
> -   timer->it.cpu.incr = timespec64_to_ns(>it_interval);
> +   timer->it.cpu.incr = 
> ktime_to_ns(timespec64_to_ktime(new->it_interval));
> timer->it_interval = ns_to_ktime(timer->it.cpu.incr);
>
> /*

This seems like a similar bug as the other one
https://lkml.org/lkml/2019/2/24/214. Maybe it makes sense here also to
do some bounds checking when we get the userspace parameter.
This patch just saturates the value.

-Deepa


Re: [PATCH] time64: Avoid undefined behaviour in timespec64_add()

2019-02-25 Thread Deepa Dinamani
On Mon, Feb 25, 2019 at 1:02 AM Arnd Bergmann  wrote:
>
> On Mon, Feb 25, 2019 at 5:53 AM Deepa Dinamani  wrote:
> >
> > On Sun, Feb 24, 2019 at 7:13 PM Hongbo Yao  wrote:
> > >
> > > I ran into this:
> > > 
> > > =
> > > UBSAN: Undefined behaviour in ./include/linux/time64.h:70:2
> > > signed integer overflow:
> > > 1551059291 + 9223372036854775807 cannot be represented in type 
> > > 'long
> > > long int'
> > > CPU: 5 PID: 20064 Comm: syz-executor.2 Not tainted 4.19.24 #4
> > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > > 1.10.2-1ubuntu1 04/01/2014
> > > Call Trace:
> > >  __dump_stack lib/dump_stack.c:77 [inline]
> > >  dump_stack+0xca/0x13e lib/dump_stack.c:113
> > >  ubsan_epilogue+0xe/0x81 lib/ubsan.c:159
> > >  handle_overflow+0x193/0x1e2 lib/ubsan.c:190
> > >  timespec64_add include/linux/time64.h:70 [inline]
> ...
> > > Since lhs.tv_sec and rhs.tv_sec are both time64_t, this is a signed
> > > addition which will cause undefined behaviour on overflow.
>
> I wonder if we should treat this as undefined behavior in the kernel or not:
> The kernel is build with -fno-strict-overflow, so signed integer overflow
> is supposed to behave the same way as unsigned, and assume
> two's-complement arithmetic.

Another option is to saturate or return error when possible.
2's complement arithmetic lets us check for overflow. So I feel this
is nice for timespec64_add().

> > > @@ -67,7 +67,7 @@ static inline struct timespec64 timespec64_add(struct 
> > > timespec64 lhs,
> > > struct timespec64 rhs)
> > >  {
> > > struct timespec64 ts_delta;
> > > -   set_normalized_timespec64(_delta, lhs.tv_sec + rhs.tv_sec,
> > > +   set_normalized_timespec64(_delta, (timeu64_t)lhs.tv_sec + 
> > > rhs.tv_sec,
> > > lhs.tv_nsec + rhs.tv_nsec);
> > > return ts_delta;
> > >  }
> >
> > There is already a timespec64_add_safe() to account for such
> > overflows. That assumes both the timespec64 values are positive.
> > But, timekeeping_inject_offset() cannot use that as one of the values
> > can be negative.
>
> We could perhaps extend timespec64_add_safe() to handle both
> overflow and underflow, and allow negative arguments. It would
> have to use some extra checks then.

I was thinking the reason for having just timespec64_add() is that the
caller makes these checks in case they want to return an error on
overflow. The safe version will just saturate and caller is unaware.

> There are actually only
> a very small number of callers to timespec64_add():
>
> arch/arm/xen/enlighten.c:   *ts = timespec64_add(now, ts_monotonic);
> arch/arm/xen/enlighten.c:   system_time = timespec64_add(now,
> tk->wall_to_monotonic);
> drivers/net/ethernet/cadence/macb_ptp.c:now =
> timespec64_add(now, then);
> drivers/net/ethernet/intel/igb/igb_main.c:  ts =
> timespec64_add(adapter->perout[0].start,
> drivers/net/ethernet/intel/igb/igb_main.c:  ts =
> timespec64_add(adapter->perout[1].start,
> drivers/net/ethernet/intel/igb/igb_ptp.c:   now = timespec64_add(now, 
> then);
> fs/cifs/dfs_cache.c:return timespec64_add(now, ts);
> include/linux/rtc.h:*to_set = timespec64_add(*now, delay);
> include/linux/time64.h:static inline struct timespec64
> timespec64_add(struct timespec64 lhs,
> kernel/time/timekeeping.c:  tmp = timespec64_add(tk_xtime(tk), *ts);
> kernel/time/timekeeping.c:
> timespec64_add(timekeeping_suspend_time, delta_delta);
> net/ceph/messenger.c:   ts =
> timespec64_add(con->last_keepalive_ack, ts);
>
> It looks like an actual overflow would be really bad in most of these,
> regardless
> of the undefined behavior.

I can look at sanitizing whatever you haven't started on.

> > Are you running some kind of a fuzzer that would cause a overflow?
> > You seem to be adding INT64_MAX here. Maybe the right thing to do is
> > to add a check at the syscall interface rather than here.
>
> Returning an error from the syscall here sounds like a good idea. I'm
> not sure what we should do about the time32 version of adjtimex though
> if we decide we want that. Should we just reject any times there that
> result in a time outside of the 1970..2038 range?

utimes_common() does not return an error right now and fails silently
when the filesystem is not able to represent times. And, POSIX seems
ambiguous here.
Should both these syscalls match in behavior?

-Deepa


-Deepa


Re: [PATCH] time64: Avoid undefined behaviour in timespec64_add()

2019-02-24 Thread Deepa Dinamani
On Sun, Feb 24, 2019 at 7:13 PM Hongbo Yao  wrote:
>
> I ran into this:
> 
> =
> UBSAN: Undefined behaviour in ./include/linux/time64.h:70:2
> signed integer overflow:
> 1551059291 + 9223372036854775807 cannot be represented in type 'long
> long int'
> CPU: 5 PID: 20064 Comm: syz-executor.2 Not tainted 4.19.24 #4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.10.2-1ubuntu1 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0xca/0x13e lib/dump_stack.c:113
>  ubsan_epilogue+0xe/0x81 lib/ubsan.c:159
>  handle_overflow+0x193/0x1e2 lib/ubsan.c:190
>  timespec64_add include/linux/time64.h:70 [inline]
>  timekeeping_inject_offset+0x3ed/0x4e0 kernel/time/timekeeping.c:1301
>  do_adjtimex+0x1e5/0x6c0 kernel/time/timekeeping.c:2360
>  __do_sys_clock_adjtime+0x122/0x200 kernel/time/posix-timers.c:1086
>  do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x462eb9
> Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 
> 89
> f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
> f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:7f888aa2dc58 EFLAGS: 0246 ORIG_RAX: 0131
> RAX: ffda RBX: 0073bf00 RCX: 00462eb9
> RDX:  RSI: 23c0 RDI: 
> RBP: 0002 R08:  R09: 
> R10:  R11: 0246 R12: 7f888aa2e6bc
> R13: 004bcae8 R14: 006f6868 R15: 
> 
> ==
>
> Since lhs.tv_sec and rhs.tv_sec are both time64_t, this is a signed
> addition which will cause undefined behaviour on overflow.
>
> The easiest way to avoid the overflow is to cast one of the arguments to
> unsigned (so the addition will be done using unsigned arithmetic).
> This patch doesn't change generated code.
>
> Signed-off-by: Hongbo Yao 
> ---
>  include/linux/time64.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/time64.h b/include/linux/time64.h
> index 05634afba0db..5926bdd4167f 100644
> --- a/include/linux/time64.h
> +++ b/include/linux/time64.h
> @@ -67,7 +67,7 @@ static inline struct timespec64 timespec64_add(struct 
> timespec64 lhs,
> struct timespec64 rhs)
>  {
> struct timespec64 ts_delta;
> -   set_normalized_timespec64(_delta, lhs.tv_sec + rhs.tv_sec,
> +   set_normalized_timespec64(_delta, (timeu64_t)lhs.tv_sec + 
> rhs.tv_sec,
> lhs.tv_nsec + rhs.tv_nsec);
> return ts_delta;
>  }

There is already a timespec64_add_safe() to account for such
overflows. That assumes both the timespec64 values are positive.
But, timekeeping_inject_offset() cannot use that as one of the values
can be negative.

Are you running some kind of a fuzzer that would cause a overflow?
You seem to be adding INT64_MAX here. Maybe the right thing to do is
to add a check at the syscall interface rather than here.

-Deepa


  1   2   3   4   5   6   7   8   9   10   >