Re: [PATCH net v2 1/2] socket: fix option SO_TIMESTAMPING_NEW
> On Mon, Oct 12, 2020 at 5:36 AM Christian Eggers wrote: > > v2: > > - > > - integrated proposal from Willem de Bruijn > > - added Reviewed-by: from Willem and Deepa You may add my Acked-by: Deepa Dinamani -Deepa
Re: [PATCH net 2/2] socket: don't clear SOCK_TSTAMP_NEW when SO_TIMESTAMPNS is disabled
On Fri, Oct 9, 2020 at 5:35 PM Willem de Bruijn wrote: > > On Fri, Oct 9, 2020 at 6:32 AM Christian Eggers wrote: > > > > SOCK_TSTAMP_NEW (timespec64 instead of timespec) is also used for > > hardware time stamps (configured via SO_TIMESTAMPING_NEW). > > > > User space (ptp4l) first configures hardware time stamping via > > SO_TIMESTAMPING_NEW which sets SOCK_TSTAMP_NEW. In the next step, ptp4l > > disables SO_TIMESTAMPNS(_NEW) (software time stamps), but this must not > > switch hardware time stamps back to "32 bit mode". > > > > This problem happens on 32 bit platforms were the libc has already > > switched to struct timespec64 (from SO_TIMExxx_OLD to SO_TIMExxx_NEW > > socket options). ptp4l complains with "missing timestamp on transmitted > > peer delay request" because the wrong format is received (and > > discarded). > > > > Fixes: 887feae36aee ("socket: Add SO_TIMESTAMP[NS]_NEW") > > Fixes: 783da70e8396 ("net: add sock_enable_timestamps") > > Signed-off-by: Christian Eggers > > Acked-by: Willem de Bruijn > > Yes, we should just select SOCK_TSTAMP_NEW based on which of the two > syscall variants the process uses. > > There is no need to reset on timestamp disable: in the common case the > selection is immaterial as timestamping is disabled. > > As this commit message shows, with SO_TIMESTAMP(NS) and > SO_TIMESTAMPING that can be independently turned on and off, disabling > one can incorrectly switch modes while the other is still active. This will not help the case when a child process that inherits the fd and then sets SO_TIMESTAMP*_OLD/NEW on it, while the parent uses the other version. One of the two processes might still be surprised. But, child and parent actively using the same socket fd might be expecting trouble already. Acked-by: Deepa Dinamani -Deepa
Re: [PATCH net 1/2] socket: fix option SO_TIMESTAMPING_NEW
On Fri, Oct 9, 2020 at 5:43 PM Willem de Bruijn wrote: > > On Fri, Oct 9, 2020 at 8:30 PM Deepa Dinamani wrote: > > > > On Fri, Oct 9, 2020 at 3:32 AM Christian Eggers wrote: > > > > > > The comparison of optname with SO_TIMESTAMPING_NEW is wrong way around, > > > so SOCK_TSTAMP_NEW will first be set and than reset again. Additionally > > > move it out of the test for SOF_TIMESTAMPING_RX_SOFTWARE as this seems > > > unrelated. > > > > The SOCK_TSTAMP_NEW is reset only in the case when > > SOF_TIMESTAMPING_RX_SOFTWARE is not set. > > Note that we only call sock_enable_timestamp() at that time. > > > > Why would SOCK_TSTAMP_NEW be relevant otherwise? > > Other timestamps can be configured, such as hardware timestamps. > > As the follow-on patch shows, there is also the issue of overlap > between SO_TIMESTAMP(NS) and SO_TIMESTAMPING. I see. Thanks for clarification. I think I had missed that you could have both software and hardware timestamps enabled at the same time. > Don't select OLD on timestamp disable, which may only disable > some of the ongoing timestamping. > > Setting based on the syscall is simpler, too. __sock_set_timestamps > already uses for SO_TIMESTAMP(NS) the valbool approach I > suggest for SO_TIMESTAMPING. > > The fallthrough can also be removed. My rough patch missed that. Sounds good. -Deepa
Re: [PATCH net 1/2] socket: fix option SO_TIMESTAMPING_NEW
On Fri, Oct 9, 2020 at 3:32 AM Christian Eggers wrote: > > The comparison of optname with SO_TIMESTAMPING_NEW is wrong way around, > so SOCK_TSTAMP_NEW will first be set and than reset again. Additionally > move it out of the test for SOF_TIMESTAMPING_RX_SOFTWARE as this seems > unrelated. The SOCK_TSTAMP_NEW is reset only in the case when SOF_TIMESTAMPING_RX_SOFTWARE is not set. Note that we only call sock_enable_timestamp() at that time. Why would SOCK_TSTAMP_NEW be relevant otherwise? -Deepa
Re: [PATCH] fs/namespace.c: fix use-after-free of mount in mnt_warn_timestamp_expiry()
Thanks for the fix. Would it be better to move the check and warning to a place where the access is still safe? -Deepa > On Oct 9, 2019, at 12:19 AM, Eric Biggers wrote: > > From: Eric Biggers On Wed, Oct 9, 2019 at 12:19 AM Eric Biggers wrote: > > From: Eric Biggers > > After do_add_mount() returns success, the caller doesn't hold a > reference to the 'struct mount' anymore. So it's invalid to access it > in mnt_warn_timestamp_expiry(). > > Fix it by instead passing the super_block and the mnt_flags. It's safe > to access the super_block because it's pinned by fs_context::root. > > Reported-by: syzbot+da4f525235510683d...@syzkaller.appspotmail.com > Fixes: f8b92ba67c5d ("mount: Add mount warning for impending timestamp > expiry") > Signed-off-by: Eric Biggers > --- > fs/namespace.c | 15 +++ > 1 file changed, 7 insertions(+), 8 deletions(-) > > diff --git a/fs/namespace.c b/fs/namespace.c > index fe0e9e1410fe..7ef8edaaed69 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -2466,12 +2466,11 @@ static void set_mount_attributes(struct mount *mnt, > unsigned int mnt_flags) > unlock_mount_hash(); > } > > -static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct > vfsmount *mnt) > +static void mnt_warn_timestamp_expiry(struct path *mountpoint, > + struct super_block *sb, int mnt_flags) > { > - struct super_block *sb = mnt->mnt_sb; > - > - if (!__mnt_is_readonly(mnt) && > - (ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > sb->s_time_max)) > { > + if (!(mnt_flags & MNT_READONLY) && !sb_rdonly(sb) && > + (ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > > sb->s_time_max)) { > char *buf = (char *)__get_free_page(GFP_KERNEL); > char *mntpath = buf ? d_path(mountpoint, buf, PAGE_SIZE) : > ERR_PTR(-ENOMEM); > struct tm tm; > @@ -2512,7 +2511,7 @@ static int do_reconfigure_mnt(struct path *path, > unsigned int mnt_flags) > set_mount_attributes(mnt, mnt_flags); > up_write(>s_umount); > > - mnt_warn_timestamp_expiry(path, >mnt); > + mnt_warn_timestamp_expiry(path, sb, mnt_flags); > > return ret; > } > @@ -2555,7 +2554,7 @@ static int do_remount(struct path *path, int ms_flags, > int sb_flags, > up_write(>s_umount); > } > > - mnt_warn_timestamp_expiry(path, >mnt); > + mnt_warn_timestamp_expiry(path, sb, mnt_flags); > > put_fs_context(fc); > return err; > @@ -2770,7 +2769,7 @@ static int do_new_mount_fc(struct fs_context *fc, > struct path *mountpoint, > return error; > } > > - mnt_warn_timestamp_expiry(mountpoint, mnt); > + mnt_warn_timestamp_expiry(mountpoint, sb, mnt_flags); > > return error; > } > -- > 2.23.0 >
[PATCH] ext4: Reduce ext4 timestamp warnings
When ext4 file systems were created intentionally with 128 byte inodes, the rate-limited warning of eventual possible timestamp overflow are still emitted rather frequently. Remove the warning for now. Discussion for whether any warning is needed, and where it should be emitted, can be found at https://lore.kernel.org/lkml/1567523922.5576.57.ca...@lca.pw/. I can post a separate follow-up patch after the conclusion. Reported-by: Qian Cai Signed-off-by: Deepa Dinamani --- fs/ext4/ext4.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 9e3ae3be3de9..24b14bd3feab 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -833,10 +833,8 @@ do { \ (raw_inode)->xtime ## _extra = \ ext4_encode_extra_time(&(inode)->xtime); \ } \ - else{\ + else\ (raw_inode)->xtime = cpu_to_le32(clamp_t(int32_t, (inode)->xtime.tv_sec, S32_MIN, S32_MAX));\ - ext4_warning_inode(inode, "inode does not support timestamps beyond 2038"); \ - } \ } while (0) #define EXT4_EINODE_SET_XTIME(xtime, einode, raw_inode) \ -- 2.17.1
Re: "beyond 2038" warnings from loopback mount is noisy
> On Sep 4, 2019, at 5:58 AM, Theodore Y. Ts'o wrote: > >> On Tue, Sep 03, 2019 at 09:50:09PM -0700, Deepa Dinamani wrote: >> If we don't care to warn about the timestamps that are clamped in >> memory, maybe we could just warn when they are being written out. >> Would something like this be more acceptable? I would also remove the >> warning in ext4.h. I think we don't have to check if the inode is 128 >> bytes here (Please correct me if I am wrong). If this looks ok, I can >> post this. > > That's better, but it's going to be misleading in many cases. The > inode's extra size field is 16 or larger, there will be enough space > for the timestamps, so talking about "timestamps on this inode beyond > 2038" when ext4 is unable to expand it from say, 24 to 32, won't be > true. Certain certain features won't be available, yes --- such as > project-id-based quotas, since there won't be room to store the > project ID. However, it's not going to impact the ability to store > timestamps beyond 2038. The i_extra_isize field is not just about > timestamps! I understand that i_extra_isize is not just about timestamps. It’s evident from EXT4_FITS_IN_INODE(). I think we can check for EXT4_FITS_IN_INODE() here if that will consistently eliminates false positives. But, I hear you. You think this warning is unnecessary. I think there are many file systems and I don’t think anybody would knows in’s and outs of each one. I think if I’m mounting an ext4 fs and it has mixed sizes of inodes, I think I would at least expect a dmesg(with a hint on how to fix it) considering that this filesystem is restricted in more ways than just time. Is this the purpose of the warning you already have?: if (error && (mnt_count != le16_to_cpu(sbi->s_es->s_mnt_count))) { ext4_warning(inode->i_sb, "Unable to expand inode %lu. Delete some EAs or run e2fsck.", Maybe there should be a warning, but it has nothing to do with just time. Do we already have this? -Deepa
Re: "beyond 2038" warnings from loopback mount is noisy
If we don't care to warn about the timestamps that are clamped in memory, maybe we could just warn when they are being written out. Would something like this be more acceptable? I would also remove the warning in ext4.h. I think we don't have to check if the inode is 128 bytes here (Please correct me if I am wrong). If this looks ok, I can post this. diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 9e3ae3be3de9..24b14bd3feab 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -833,10 +833,8 @@ do { \ (raw_inode)->xtime ## _extra = \ ext4_encode_extra_time(&(inode)->xtime);\ } \ - else{\ + else\ (raw_inode)->xtime = cpu_to_le32(clamp_t(int32_t, (inode)->xtime.tv_sec, S32_MIN, S32_MAX));\ - ext4_warning_inode(inode, "inode does not support timestamps beyond 2038"); \ - } \ } while (0) #define EXT4_EINODE_SET_XTIME(xtime, einode, raw_inode) \ diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 491f9ee4040e..cef5b87cc5a6 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -2791,7 +2791,7 @@ int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize, cleanup: if (error && (mnt_count != le16_to_cpu(sbi->s_es->s_mnt_count))) { - ext4_warning(inode->i_sb, "Unable to expand inode %lu. Delete some EAs or run e2fsck.", + ext4_warning(inode->i_sb, "Unable to expand inode %lu. Delete some EAs or run e2fsck. Timestamps on the inode expire beyond 2038", inode->i_ino); mnt_count = le16_to_cpu(sbi->s_es->s_mnt_count); }
Re: "beyond 2038" warnings from loopback mount is noisy
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > > index 9e3ae3be3de9..5a971d1b6d5e 100644 > > --- a/fs/ext4/ext4.h > > +++ b/fs/ext4/ext4.h > > @@ -835,7 +835,9 @@ do { > > \ > > } > > \ > > else{\ > > (raw_inode)->xtime = cpu_to_le32(clamp_t(int32_t, > > (inode)->xtime.tv_sec, S32_MIN, S32_MAX));\ > > - ext4_warning_inode(inode, "inode does not support > > timestamps beyond 2038"); \ > > + if (((inode)->xtime.tv_sec != (raw_inode)->xtime) && \ > > + ((inode)->i_sb->s_time_max > S32_MAX)) > > \ > > + ext4_warning_inode(inode, "inode does not > > support timestamps beyond 2038"); \ > > } \ > > } while (0) > > Sure, that's much less objectionable. The reason it was warning for every update was because of the ratelimiting. I think ratelimiting is not working well here. I will check that part. -Deepa
Re: "beyond 2038" warnings from loopback mount is noisy
On Tue, Sep 3, 2019 at 2:18 PM Theodore Y. Ts'o wrote: > > On Tue, Sep 03, 2019 at 09:18:44AM -0700, Deepa Dinamani wrote: > > > > This prints a warning for each inode that doesn't extend limits beyond > > 2038. It is rate limited by the ext4_warning_inode(). > > Looks like your filesystem has inodes that cannot be extended. > > We could use a different rate limit or ignore this corner case. Do the > > maintainers have a preference? > > We need to drop this commit (ext4: Initialize timestamps limits), or > at least the portion which adds the call to the EXT4_INODE_SET_XTIME > macro in ext4.h. As Arnd said, I think this can be fixed by warning only when the inode size is not uniformly 128 bytes in ext4.h. Is this an acceptable solution or we want to drop this warning altogether? Arnd, should I be sending a pull request again with the fix? Or, we drop the ext4 patch and I can send it to the maintainers directly? > I know of a truly vast number of servers in production all over the > world which are using 128 byte inodes, and spamming the inodes at the > maximum rate limit is a really bad idea. This includes at some major > cloud data centers where the life of individual servers in their data > centers is well understood (they're not going to last until 2038) and > nothing stored on the local Linux file systems are long-lived --- > that's all stored in the cluster file systems. The choice of 128 byte > inode was deliberately chosen to maximize storage TCO, and so spamming > a warning at high rates is going to be extremely unfriendly. > > In cases where the inode size is such that there is no chance at all > to support timestamps beyond 2038, a single warning at mount time, or > maybe a warning at mkfs time might be acceptable. But there's no > point printing a warning time each time we set a timestamp on such a > file system. It's not going to change, and past a certain point, we > need to trust that people who are using 128 byte inodes did so knowing > what the tradeoffs might be. After all, it is *not* the default. We have a single mount time warning already in place here. I did not realize some people actually chose to use 128 byte inodes on purpose. -Deepa
Re: "beyond 2038" warnings from loopback mount is noisy
We might also want to consider updating the file system the LTP is being run on here. -Deepa
Re: "beyond 2038" warnings from loopback mount is noisy
Actually this warning is coming from this patch: https://lore.kernel.org/linux-fsdevel/20190818165817.32634-10-deepa.ker...@gmail.com/ ([PATCH v8 09/20] ext4: Initialize timestamps limits). This is the code generating the warning: diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 9c7f4036021b..ae5d0c86aba2 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -832,11 +832,15 @@ static inline void ext4_decode_extra_time(struct timespec64 *time, #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode) \ do { \ - (raw_inode)->xtime = cpu_to_le32((inode)->xtime.tv_sec); \ if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) {\ + (raw_inode)->xtime = cpu_to_le32((inode)->xtime.tv_sec);\ (raw_inode)->xtime ## _extra = \ ext4_encode_extra_time(&(inode)->xtime);\ } \ + else{\ + (raw_inode)->xtime = cpu_to_le32(clamp_t(int32_t, (inode)->xtime.tv_sec, S32_MIN, S32_MAX));\ + ext4_warning_inode(inode, "inode does not support timestamps beyond 2038"); \ + } \ } while (0) This prints a warning for each inode that doesn't extend limits beyond 2038. It is rate limited by the ext4_warning_inode(). Looks like your filesystem has inodes that cannot be extended. We could use a different rate limit or ignore this corner case. Do the maintainers have a preference? -Deepa On Tue, Sep 3, 2019 at 8:18 AM Qian Cai wrote: > > https://lore.kernel.org/linux-fsdevel/20190818165817.32634-5-deepa.kernel@gmail. > com/ > > Running only a subset of the LTP testsuite on today's linux-next with the > above > commit is now generating ~800 warnings on this machine which seems a bit > crazy. > > [ 2130.970782] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: > inode > #40961: comm statx04: inode does not support timestamps beyond 2038 > [ 2130.970808] EXT4-fs warning (device loop0): ext4_do_update_inode:5262: > inode > #40961: comm statx04: inode does not support timestamps beyond 2038 > [ 2130.970838] EXT4-fs warning (device loop0): ext4_do_update_inode:5263: > inode > #40961: comm statx04: inode does not support timestamps beyond 2038 > [ 2130.971440] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: > inode > #40961: comm statx04: inode does not support timestamps beyond 2038 > [ 2131.847613] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: > inode > #32769: comm statx04: inode does not support timestamps beyond 2038 > [ 2131.847647] EXT4-fs warning (device loop0): ext4_do_update_inode:5262: > inode > #32769: comm statx04: inode does not support timestamps beyond 2038 > [ 2131.847681] EXT4-fs warning (device loop0): ext4_do_update_inode:5263: > inode > #32769: comm statx04: inode does not support timestamps beyond 2038 > [ 2131.847717] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: > inode > #32769: comm statx04: inode does not support timestamps beyond 2038 > [ 2131.847774] EXT4-fs warning (device loop0): ext4_do_update_inode:5262: > inode > #32769: comm statx04: inode does not support timestamps beyond 2038 > [ 2131.847817] EXT4-fs warning (device loop0): ext4_do_update_inode:5263: > inode > #32769: comm statx04: inode does not support timestamps beyond 2038 > [ 2131.847909] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: > inode > #32769: comm statx04: inode does not support timestamps beyond 2038 > [ 2131.847970] EXT4-fs warning (device loop0): ext4_do_update_inode:5262: > inode > #32769: comm statx04: inode does not support timestamps beyond 2038 > [ 2131.848004] EXT4-fs warning (device loop0): ext4_do_update_inode:5263: > inode > #32769: comm statx04: inode does not support timestamps beyond 2038 > [ 2131.848415] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: > inode > #32769: comm statx04: inode does not support timestamps beyond 2038 > [ 2134.753752] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: > inode > #12: comm statx05: inode does not support timestamps beyond 2038 > [ 2134.753783] EXT4-fs warning (device loop0): ext4_do_update_inode:5262: > inode > #12: comm statx05: inode does not support timestamps beyond 2038 > [ 2134.753814] EXT4-fs warning (device loop0): ext4_do_update_inode:5263: > inode > #12: comm statx05: inode does not support timestamps beyond 2038 > [ 2134.753847] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: > inode > #12: comm statx05: inode does not support timestamps beyond 2038 > [ 2134.753889] EXT4-fs warning (device loop0): ext4_do_update_inode:5262: > inode > #12: comm statx05: inode does not support timestamps beyond 2038 > [ 2134.753929] EXT4-fs warning (device loop0): ext4_do_update_inode:5263: > inode > #12: comm statx05: inode does not support timestamps beyond 2038 > [ 2134.754021] EXT4-fs warning (device loop0): ext4_do_update_inode:5261: > inode > #12: comm
[PATCH] adfs: Fill in max and min timestamps in sb
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Note that the min timestamp is assumed to be 01 Jan 1970 00:00:00 (Unix epoch). This is consistent with the way we convert timestamps in adfs_adfs2unix_time(). Signed-off-by: Deepa Dinamani --- This depends on the following patch in Arnd's y2038 tree: https://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground y2038 188d20bcd1eb ("vfs: Add file timestamp range support") fs/adfs/adfs.h | 13 + fs/adfs/inode.c | 8 fs/adfs/super.c | 2 ++ 3 files changed, 19 insertions(+), 4 deletions(-) diff --git a/fs/adfs/adfs.h b/fs/adfs/adfs.h index 699c4fa8b78b..504ad80072ef 100644 --- a/fs/adfs/adfs.h +++ b/fs/adfs/adfs.h @@ -3,6 +3,19 @@ #include #include +/* + * 01 Jan 1970 00:00:00 (Unix epoch) as seconds since + * 01 Jan 1900 00:00:00 (RISC OS epoch) + */ +#define RISC_OS_EPOCH_DELTA 2208988800LL + +/* + * Convert 40 bit centi seconds to seconds + * since 01 Jan 1900 00:00:00 (RISC OS epoch) + * The result is 2248-06-03 06:57:57 GMT + */ +#define ADFS_MAX_TIMESTAMP ((0xFFLL / 100) - RISC_OS_EPOCH_DELTA) + /* Internal data structures for ADFS */ #define ADFS_FREE_FRAG 0 diff --git a/fs/adfs/inode.c b/fs/adfs/inode.c index 5b8017ab0a98..11acd74fb099 100644 --- a/fs/adfs/inode.c +++ b/fs/adfs/inode.c @@ -162,7 +162,10 @@ static int adfs_mode2atts(struct super_block *sb, struct inode *inode, return attr; } -static const s64 nsec_unix_epoch_diff_risc_os_epoch = 22089888000LL; +/* 01 Jan 1970 00:00:00 (Unix epoch) as nanoseconds since + * 01 Jan 1900 00:00:00 (RISC OS epoch) + */ +static const s64 nsec_unix_epoch_diff_risc_os_epoch = RISC_OS_EPOCH_DELTA * NSEC_PER_SEC; /* * Convert an ADFS time to Unix time. ADFS has a 40-bit centi-second time @@ -173,9 +176,6 @@ static void adfs_adfs2unix_time(struct timespec64 *tv, struct inode *inode) { unsigned int high, low; - /* 01 Jan 1970 00:00:00 (Unix epoch) as nanoseconds since -* 01 Jan 1900 00:00:00 (RISC OS epoch) -*/ s64 nsec; if (!adfs_inode_is_stamped(inode)) diff --git a/fs/adfs/super.c b/fs/adfs/super.c index 7a3e6b394f2a..532e2afbc7a3 100644 --- a/fs/adfs/super.c +++ b/fs/adfs/super.c @@ -381,6 +381,8 @@ static int adfs_fill_super(struct super_block *sb, void *data, int silent) sb->s_fs_info = asb; sb->s_magic = ADFS_SUPER_MAGIC; sb->s_time_gran = 1000; + sb->s_time_min = 0; + sb->s_time_max = ADFS_MAX_TIMESTAMP; /* set default options */ asb->s_uid = GLOBAL_ROOT_UID; -- 2.17.1
[GIT PULL] vfs: Add support for timestamp limits
Hi Al, Arnd, This is a pull request for filling in min and max timestamps for filesystems. I've added all the acks, and dropped the adfs patch. That will be merged through Russell's tree. Thanks, Deepa The following changes since commit 5d18cb62218608a1388858880ad3ec76d6cb0d3b: Add linux-next specific files for 20190828 (2019-08-28 19:59:14 +1000) are available in the Git repository at: https://github.com/deepa-hub/vfs limits for you to fetch changes up to f0f216afa4c7e4dee9121fde52ccf57f76119188: isofs: Initialize filesystem timestamp ranges (2019-08-28 19:19:36 -0700) Deepa Dinamani (19): vfs: Add file timestamp range support vfs: Add timestamp_truncate() api timestamp_truncate: Replace users of timespec64_trunc mount: Add mount warning for impending timestamp expiry utimes: Clamp the timestamps before update fs: Fill in max and min timestamps in superblock 9p: Fill min and max timestamps in sb ext4: Initialize timestamps limits fs: nfs: Initialize filesystem timestamp ranges fs: cifs: Initialize filesystem timestamp ranges fs: fat: Initialize filesystem timestamp ranges fs: affs: Initialize filesystem timestamp ranges fs: sysv: Initialize filesystem timestamp ranges fs: ceph: Initialize filesystem timestamp ranges fs: orangefs: Initialize filesystem timestamp ranges fs: hpfs: Initialize filesystem timestamp ranges fs: omfs: Initialize filesystem timestamp ranges pstore: fs superblock limits isofs: Initialize filesystem timestamp ranges fs/9p/vfs_super.c| 6 +- fs/affs/amigaffs.c | 2 +- fs/affs/amigaffs.h | 3 +++ fs/affs/inode.c | 4 ++-- fs/affs/super.c | 4 fs/attr.c| 21 - fs/befs/linuxvfs.c | 2 ++ fs/bfs/inode.c | 2 ++ fs/ceph/super.c | 2 ++ fs/cifs/cifsfs.c | 22 ++ fs/cifs/netmisc.c| 14 +++--- fs/coda/inode.c | 3 +++ fs/configfs/inode.c | 12 ++-- fs/cramfs/inode.c| 2 ++ fs/efs/super.c | 2 ++ fs/ext2/super.c | 2 ++ fs/ext4/ext4.h | 10 +- fs/ext4/super.c | 17 +++-- fs/f2fs/file.c | 21 - fs/fat/inode.c | 12 fs/freevxfs/vxfs_super.c | 2 ++ fs/hpfs/hpfs_fn.h| 6 ++ fs/hpfs/super.c | 2 ++ fs/inode.c | 33 - fs/isofs/inode.c | 7 +++ fs/jffs2/fs.c| 3 +++ fs/jfs/super.c | 2 ++ fs/kernfs/inode.c| 7 +++ fs/minix/inode.c | 2 ++ fs/namespace.c | 33 - fs/nfs/super.c | 20 +++- fs/ntfs/inode.c | 21 - fs/omfs/inode.c | 4 fs/orangefs/super.c | 2 ++ fs/pstore/ram.c | 2 ++ fs/qnx4/inode.c | 2 ++ fs/qnx6/inode.c | 2 ++ fs/reiserfs/super.c | 3 +++ fs/romfs/super.c | 2 ++ fs/squashfs/super.c | 2 ++ fs/super.c | 2 ++ fs/sysv/super.c | 5 - fs/ubifs/file.c | 21 - fs/ufs/super.c | 7 +++ fs/utimes.c | 6 ++ fs/xfs/xfs_super.c | 2 ++ include/linux/fs.h | 5 + include/linux/time64.h | 2 ++ 48 files changed, 298 insertions(+), 72 deletions(-)
Re: [PATCH v8 19/20] pstore: fs superblock limits
On Tue, Aug 20, 2019 at 12:20 AM Kees Cook wrote: > > On Sun, Aug 18, 2019 at 09:58:16AM -0700, Deepa Dinamani wrote: > > Leaving granularity at 1ns because it is dependent on the specific > > attached backing pstore module. ramoops has microsecond resolution. > > > > Fix the readback of ramoops fractional timestamp microseconds, > > which has incorrectly been reporting the value as nanoseconds since > > 3f8f80f0 ("pstore/ram: Read and write to the 'compressed' flag of pstore"). > > As such, this should also have: > > Fixes: 3f8f80f0cfeb ("pstore/ram: Read and write to the 'compressed' flag of > pstore") Will add that in. Thanks. > > Signed-off-by: Deepa Dinamani > > Acked-by: Kees Cook > > Also: this is going via some other tree, yes? (Or should I pick this up > for the pstore tree?) I am hoping Al can take the series as a whole. -Deepa
Re: [PATCH v8 08/20] adfs: Fill in max and min timestamps in sb
On Tue, Aug 20, 2019 at 9:28 AM Matthew Wilcox wrote: > > On Sun, Aug 18, 2019 at 09:58:05AM -0700, Deepa Dinamani wrote: > > Note that the min timestamp is assumed to be > > 01 Jan 1970 00:00:00 (Unix epoch). This is consistent > > with the way we convert timestamps in adfs_adfs2unix_time(). > > That's not actually correct. RISC OS timestamps are centiseconds since > 1900 stored in 5 bytes. The timestamp can hold earlier values but the fs implementation explicitly rejects those in adfs_adfs2unix_time() too_early check. We could fix the implementation to not throw away times before 1970. Are you suggesting we want to do this? I could post a separate patch to fix this or we could do it as part of the series. static void adfs_adfs2unix_time(struct timespec64 *tv, struct inode *inode) { unsigned int high, low; static const s64 nsec_unix_epoch_diff_risc_os_epoch = RISC_OS_EPOCH_DELTA * NSEC_PER_SEC; s64 nsec; if (!adfs_inode_is_stamped(inode)) goto cur_time; high = ADFS_I(inode)->loadaddr & 0xFF; /* top 8 bits of timestamp */ low = ADFS_I(inode)->execaddr;/* bottom 32 bits of timestamp */ /* convert 40-bit centi-seconds to 32-bit seconds * going via nanoseconds to retain precision */ nsec = (((s64) high << 32) | (s64) low) * 1000; /* cs to ns */ /* Files dated pre 01 Jan 1970 00:00:00. */ if (nsec < nsec_unix_epoch_diff_risc_os_epoch) goto too_early; /* convert from RISC OS to Unix epoch */ nsec -= nsec_unix_epoch_diff_risc_os_epoch; *tv = ns_to_timespec64(nsec); return; cur_time: *tv = current_time(inode); return; too_early: tv->tv_sec = tv->tv_nsec = 0; return; } -Deepa
[PATCH v8 07/20] 9p: Fill min and max timestamps in sb
struct p9_wstat and struct p9_stat_dotl indicate that the wire transport uses u32 and u64 fields for timestamps. Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Note that the upper bound for V9FS_PROTO_2000L is retained as S64_MAX. This is because that is the upper bound supported by vfs. Signed-off-by: Deepa Dinamani Cc: eri...@gmail.com Cc: lu...@ionkov.net Cc: asmad...@codewreck.org Cc: v9fs-develo...@lists.sourceforge.net --- fs/9p/vfs_super.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c index 08112fbcaece..ca243e658d71 100644 --- a/fs/9p/vfs_super.c +++ b/fs/9p/vfs_super.c @@ -69,8 +69,12 @@ v9fs_fill_super(struct super_block *sb, struct v9fs_session_info *v9ses, if (v9fs_proto_dotl(v9ses)) { sb->s_op = _super_ops_dotl; sb->s_xattr = v9fs_xattr_handlers; - } else + } else { sb->s_op = _super_ops; + sb->s_time_max = U32_MAX; + } + + sb->s_time_min = 0; ret = super_setup_bdi(sb); if (ret) -- 2.17.1
[PATCH v8 10/20] fs: nfs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. The time formats for various verious is detailed in the RFCs as below: https://tools.ietf.org/html/rfc7862(time metadata) https://tools.ietf.org/html/rfc7530: nfstime4 struct nfstime4 { int64_t seconds; uint32_tnseconds; }; https://tools.ietf.org/html/rfc1094 struct timeval { unsigned int seconds; unsigned int useconds; }; https://tools.ietf.org/html/rfc1813 struct nfstime3 { uint32 seconds; uint32 nseconds; }; Use the limits as per the RFC. Signed-off-by: Deepa Dinamani Cc: trond.mykleb...@hammerspace.com Cc: anna.schuma...@netapp.com Cc: linux-...@vger.kernel.org --- fs/nfs/super.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/fs/nfs/super.c b/fs/nfs/super.c index 703f595dce90..19a76cfa8b1f 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -2382,6 +2382,15 @@ void nfs_fill_super(struct super_block *sb, struct nfs_mount_info *mount_info) sb->s_flags |= SB_POSIXACL; sb->s_time_gran = 1; sb->s_export_op = _export_ops; + } else + sb->s_time_gran = 1000; + + if (server->nfs_client->rpc_ops->version != 4) { + sb->s_time_min = 0; + sb->s_time_max = U32_MAX; + } else { + sb->s_time_min = S64_MIN; + sb->s_time_max = S64_MAX; } nfs_initialise_sb(sb); @@ -2402,7 +2411,6 @@ static void nfs_clone_super(struct super_block *sb, sb->s_maxbytes = old_sb->s_maxbytes; sb->s_xattr = old_sb->s_xattr; sb->s_op = old_sb->s_op; - sb->s_time_gran = 1; sb->s_export_op = old_sb->s_export_op; if (server->nfs_client->rpc_ops->version != 2) { @@ -2410,6 +2418,16 @@ static void nfs_clone_super(struct super_block *sb, * so ourselves when necessary. */ sb->s_flags |= SB_POSIXACL; + sb->s_time_gran = 1; + } else + sb->s_time_gran = 1000; + + if (server->nfs_client->rpc_ops->version != 4) { + sb->s_time_min = 0; + sb->s_time_max = U32_MAX; + } else { + sb->s_time_min = S64_MIN; + sb->s_time_max = S64_MAX; } nfs_initialise_sb(sb); -- 2.17.1
[PATCH v8 11/20] fs: cifs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Also fixed cnvrtDosUnixTm calculations to avoid int overflow while computing maximum date. References: http://cifs.com/ https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cifs/d416ff7c-c536-406e-a951-4f04b2fd1d2b Signed-off-by: Deepa Dinamani Cc: sfre...@samba.org Cc: linux-c...@vger.kernel.org --- fs/cifs/cifsfs.c | 22 ++ fs/cifs/netmisc.c | 14 +++--- 2 files changed, 29 insertions(+), 7 deletions(-) diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c index 3289b566463f..7a75726442ad 100644 --- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -56,6 +56,15 @@ #include "dfs_cache.h" #endif +/* + * DOS dates from 1980/1/1 through 2107/12/31 + * Protocol specifications indicate the range should be to 119, which + * limits maximum year to 2099. But this range has not been checked. + */ +#define SMB_DATE_MAX (127<<9 | 12<<5 | 31) +#define SMB_DATE_MIN (0<<9 | 1<<5 | 1) +#define SMB_TIME_MAX (23<<11 | 59<<5 | 29) + int cifsFYI = 0; bool traceSMB; bool enable_oplocks = true; @@ -142,6 +151,7 @@ cifs_read_super(struct super_block *sb) struct inode *inode; struct cifs_sb_info *cifs_sb; struct cifs_tcon *tcon; + struct timespec64 ts; int rc = 0; cifs_sb = CIFS_SB(sb); @@ -161,6 +171,18 @@ cifs_read_super(struct super_block *sb) /* BB FIXME fix time_gran to be larger for LANMAN sessions */ sb->s_time_gran = 100; + if (tcon->unix_ext) { + ts = cifs_NTtimeToUnix(0); + sb->s_time_min = ts.tv_sec; + ts = cifs_NTtimeToUnix(cpu_to_le64(S64_MAX)); + sb->s_time_max = ts.tv_sec; + } else { + ts = cnvrtDosUnixTm(cpu_to_le16(SMB_DATE_MIN), 0, 0); + sb->s_time_min = ts.tv_sec; + ts = cnvrtDosUnixTm(cpu_to_le16(SMB_DATE_MAX), cpu_to_le16(SMB_TIME_MAX), 0); + sb->s_time_max = ts.tv_sec; + } + sb->s_magic = CIFS_MAGIC_NUMBER; sb->s_op = _super_ops; sb->s_xattr = cifs_xattr_handlers; diff --git a/fs/cifs/netmisc.c b/fs/cifs/netmisc.c index ed92958e842d..49c17ee18254 100644 --- a/fs/cifs/netmisc.c +++ b/fs/cifs/netmisc.c @@ -949,8 +949,8 @@ static const int total_days_of_prev_months[] = { struct timespec64 cnvrtDosUnixTm(__le16 le_date, __le16 le_time, int offset) { struct timespec64 ts; - time64_t sec; - int min, days, month, year; + time64_t sec, days; + int min, day, month, year; u16 date = le16_to_cpu(le_date); u16 time = le16_to_cpu(le_time); SMB_TIME *st = (SMB_TIME *) @@ -966,15 +966,15 @@ struct timespec64 cnvrtDosUnixTm(__le16 le_date, __le16 le_time, int offset) sec += 60 * 60 * st->Hours; if (st->Hours > 24) cifs_dbg(VFS, "illegal hours %d\n", st->Hours); - days = sd->Day; + day = sd->Day; month = sd->Month; - if (days < 1 || days > 31 || month < 1 || month > 12) { - cifs_dbg(VFS, "illegal date, month %d day: %d\n", month, days); - days = clamp(days, 1, 31); + if (day < 1 || day > 31 || month < 1 || month > 12) { + cifs_dbg(VFS, "illegal date, month %d day: %d\n", month, day); + day = clamp(day, 1, 31); month = clamp(month, 1, 12); } month -= 1; - days += total_days_of_prev_months[month]; + days = day + total_days_of_prev_months[month]; days += 3652; /* account for difference in days between 1980 and 1970 */ year = sd->Year; days += year * 365; -- 2.17.1
[PATCH v8 13/20] fs: affs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Also fix timestamp calculation to avoid overflow while converting from days to seconds. Signed-off-by: Deepa Dinamani Acked-by: David Sterba Cc: dste...@suse.com --- fs/affs/amigaffs.c | 2 +- fs/affs/amigaffs.h | 3 +++ fs/affs/inode.c| 4 ++-- fs/affs/super.c| 4 4 files changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/affs/amigaffs.c b/fs/affs/amigaffs.c index 14a6c1b90c9f..f708c45d5f66 100644 --- a/fs/affs/amigaffs.c +++ b/fs/affs/amigaffs.c @@ -375,7 +375,7 @@ affs_secs_to_datestamp(time64_t secs, struct affs_date *ds) u32 minute; s32 rem; - secs -= sys_tz.tz_minuteswest * 60 + ((8 * 365 + 2) * 24 * 60 * 60); + secs -= sys_tz.tz_minuteswest * 60 + AFFS_EPOCH_DELTA; if (secs < 0) secs = 0; days= div_s64_rem(secs, 86400, ); diff --git a/fs/affs/amigaffs.h b/fs/affs/amigaffs.h index f9bef9056659..81fb396d4dfa 100644 --- a/fs/affs/amigaffs.h +++ b/fs/affs/amigaffs.h @@ -32,6 +32,9 @@ #define AFFS_ROOT_BMAPS25 +/* Seconds since Amiga epoch of 1978/01/01 to UNIX */ +#define AFFS_EPOCH_DELTA ((8 * 365 + 2) * 86400LL) + struct affs_date { __be32 days; __be32 mins; diff --git a/fs/affs/inode.c b/fs/affs/inode.c index 73598bff8506..a346cf7659f1 100644 --- a/fs/affs/inode.c +++ b/fs/affs/inode.c @@ -150,10 +150,10 @@ struct inode *affs_iget(struct super_block *sb, unsigned long ino) } inode->i_mtime.tv_sec = inode->i_atime.tv_sec = inode->i_ctime.tv_sec - = (be32_to_cpu(tail->change.days) * (24 * 60 * 60) + + = (be32_to_cpu(tail->change.days) * 86400LL + be32_to_cpu(tail->change.mins) * 60 + be32_to_cpu(tail->change.ticks) / 50 + -((8 * 365 + 2) * 24 * 60 * 60)) + +AFFS_EPOCH_DELTA) + sys_tz.tz_minuteswest * 60; inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec = inode->i_atime.tv_nsec = 0; affs_brelse(bh); diff --git a/fs/affs/super.c b/fs/affs/super.c index e7d036efbaa1..cc463ae47c12 100644 --- a/fs/affs/super.c +++ b/fs/affs/super.c @@ -355,6 +355,10 @@ static int affs_fill_super(struct super_block *sb, void *data, int silent) sb->s_op= _sops; sb->s_flags |= SB_NODIRATIME; + sb->s_time_gran = NSEC_PER_SEC; + sb->s_time_min = sys_tz.tz_minuteswest * 60 + AFFS_EPOCH_DELTA; + sb->s_time_max = 86400LL * U32_MAX + 86400 + sb->s_time_min; + sbi = kzalloc(sizeof(struct affs_sb_info), GFP_KERNEL); if (!sbi) return -ENOMEM; -- 2.17.1
[PATCH v8 09/20] ext4: Initialize timestamps limits
ext4 has different overflow limits for max filesystem timestamps based on the extra bytes available. The timestamp limits are calculated according to the encoding table in a4dad1ae24f85i(ext4: Fix handling of extended tv_sec): * extra msb of adjust for signed * epoch 32-bit 32-bit tv_sec to * bits timedecoded 64-bit tv_sec 64-bit tv_sec valid time range * 0 01-0x8000..-0x0001 0x0 1901-12-13..1969-12-31 * 0 000x0..0x07fff 0x0 1970-01-01..2038-01-19 * 0 110x08000..0x0 0x1 2038-01-19..2106-02-07 * 0 100x1..0x17fff 0x1 2106-02-07..2174-02-25 * 1 010x18000..0x1 0x2 2174-02-25..2242-03-16 * 1 000x2..0x27fff 0x2 2242-03-16..2310-04-04 * 1 110x28000..0x2 0x3 2310-04-04..2378-04-22 * 1 100x3..0x37fff 0x3 2378-04-22..2446-05-10 Note that the time limits are not correct for deletion times. Added a warn when an inode cannot be extended to incorporate an extended timestamp. Signed-off-by: Deepa Dinamani Reviewed-by: Andreas Dilger Cc: ty...@mit.edu Cc: adilger.ker...@dilger.ca Cc: linux-e...@vger.kernel.org --- fs/ext4/ext4.h | 10 +- fs/ext4/super.c | 17 +++-- 2 files changed, 24 insertions(+), 3 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 9c7f4036021b..ae5d0c86aba2 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -832,11 +832,15 @@ static inline void ext4_decode_extra_time(struct timespec64 *time, #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode) \ do { \ - (raw_inode)->xtime = cpu_to_le32((inode)->xtime.tv_sec); \ if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) {\ + (raw_inode)->xtime = cpu_to_le32((inode)->xtime.tv_sec); \ (raw_inode)->xtime ## _extra = \ ext4_encode_extra_time(&(inode)->xtime); \ } \ + else{\ + (raw_inode)->xtime = cpu_to_le32(clamp_t(int32_t, (inode)->xtime.tv_sec, S32_MIN, S32_MAX));\ + ext4_warning_inode(inode, "inode does not support timestamps beyond 2038"); \ + } \ } while (0) #define EXT4_EINODE_SET_XTIME(xtime, einode, raw_inode) \ @@ -1643,6 +1647,10 @@ static inline bool ext4_verity_in_progress(struct inode *inode) #define EXT4_GOOD_OLD_INODE_SIZE 128 +#define EXT4_EXTRA_TIMESTAMP_MAX (((s64)1 << 34) - 1 + S32_MIN) +#define EXT4_NON_EXTRA_TIMESTAMP_MAX S32_MAX +#define EXT4_TIMESTAMP_MIN S32_MIN + /* * Feature set definitions */ diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 27cd622676e7..3db5f17228b7 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4039,8 +4039,21 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) sbi->s_inode_size); goto failed_mount; } - if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE) - sb->s_time_gran = 1 << (EXT4_EPOCH_BITS - 2); + /* +* i_atime_extra is the last extra field available for [acm]times in +* struct ext4_inode. Checking for that field should suffice to ensure +* we have extra space for all three. +*/ + if (sbi->s_inode_size >= offsetof(struct ext4_inode, i_atime_extra) + + sizeof(((struct ext4_inode *)0)->i_atime_extra)) { + sb->s_time_gran = 1; + sb->s_time_max = EXT4_EXTRA_TIMESTAMP_MAX; + } else { + sb->s_time_gran = NSEC_PER_SEC; + sb->s_time_max = EXT4_NON_EXTRA_TIMESTAMP_MAX; + } + + sb->s_time_min = EXT4_TIMESTAMP_MIN; } sbi->s_desc_size = le16_to_cpu(es->s_desc_size); -- 2.17.1
[PATCH v8 03/20] timestamp_truncate: Replace users of timespec64_trunc
Update the inode timestamp updates to use timestamp_truncate() instead of timespec64_trunc(). The change was mostly generated by the following coccinelle script. virtual context virtual patch @r1 depends on patch forall@ struct inode *inode; identifier i_xtime =~ "^i_[acm]time$"; expression e; @@ inode->i_xtime = - timespec64_trunc( + timestamp_truncate( ..., - e); + inode); Signed-off-by: Deepa Dinamani Cc: adrian.hun...@intel.com Cc: dedeki...@gmail.com Cc: gre...@linuxfoundation.org Cc: h...@lst.de Cc: jaeg...@kernel.org Cc: jl...@evilplan.org Cc: rich...@nod.at Cc: t...@kernel.org Cc: yuch...@huawei.com Cc: linux-f2fs-de...@lists.sourceforge.net Cc: linux-ntfs-...@lists.sourceforge.net Cc: linux-...@lists.infradead.org --- fs/attr.c | 21 - fs/configfs/inode.c | 12 ++-- fs/f2fs/file.c | 21 - fs/kernfs/inode.c | 7 +++ fs/ntfs/inode.c | 21 - fs/ubifs/file.c | 21 - 6 files changed, 57 insertions(+), 46 deletions(-) diff --git a/fs/attr.c b/fs/attr.c index 1fcfdcc5b367..97b60ad7f419 100644 --- a/fs/attr.c +++ b/fs/attr.c @@ -183,15 +183,18 @@ void setattr_copy(struct inode *inode, const struct iattr *attr) inode->i_uid = attr->ia_uid; if (ia_valid & ATTR_GID) inode->i_gid = attr->ia_gid; - if (ia_valid & ATTR_ATIME) - inode->i_atime = timespec64_trunc(attr->ia_atime, - inode->i_sb->s_time_gran); - if (ia_valid & ATTR_MTIME) - inode->i_mtime = timespec64_trunc(attr->ia_mtime, - inode->i_sb->s_time_gran); - if (ia_valid & ATTR_CTIME) - inode->i_ctime = timespec64_trunc(attr->ia_ctime, - inode->i_sb->s_time_gran); + if (ia_valid & ATTR_ATIME) { + inode->i_atime = timestamp_truncate(attr->ia_atime, + inode); + } + if (ia_valid & ATTR_MTIME) { + inode->i_mtime = timestamp_truncate(attr->ia_mtime, + inode); + } + if (ia_valid & ATTR_CTIME) { + inode->i_ctime = timestamp_truncate(attr->ia_ctime, + inode); + } if (ia_valid & ATTR_MODE) { umode_t mode = attr->ia_mode; diff --git a/fs/configfs/inode.c b/fs/configfs/inode.c index ab0284321912..884dcf06cfbe 100644 --- a/fs/configfs/inode.c +++ b/fs/configfs/inode.c @@ -76,14 +76,14 @@ int configfs_setattr(struct dentry * dentry, struct iattr * iattr) if (ia_valid & ATTR_GID) sd_iattr->ia_gid = iattr->ia_gid; if (ia_valid & ATTR_ATIME) - sd_iattr->ia_atime = timespec64_trunc(iattr->ia_atime, - inode->i_sb->s_time_gran); + sd_iattr->ia_atime = timestamp_truncate(iattr->ia_atime, + inode); if (ia_valid & ATTR_MTIME) - sd_iattr->ia_mtime = timespec64_trunc(iattr->ia_mtime, - inode->i_sb->s_time_gran); + sd_iattr->ia_mtime = timestamp_truncate(iattr->ia_mtime, + inode); if (ia_valid & ATTR_CTIME) - sd_iattr->ia_ctime = timespec64_trunc(iattr->ia_ctime, - inode->i_sb->s_time_gran); + sd_iattr->ia_ctime = timestamp_truncate(iattr->ia_ctime, + inode); if (ia_valid & ATTR_MODE) { umode_t mode = iattr->ia_mode; diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 89a9ee22296d..af8cdd345f3d 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -749,15 +749,18 @@ static void __setattr_copy(struct inode *inode, const struct iattr *attr) inode->i_uid = attr->ia_uid; if (ia_valid & ATTR_GID) inode->i_gid = attr->ia_gid; - if (ia_valid & ATTR_ATIME) - inode->i_atime = timespec64_trunc(attr->ia_atime, - inode->i_sb->s_time_gran); - if (ia_valid & ATTR_MTIME) - inode->i_mtime = timespec64_trunc(attr->ia_mtime, - inode->i_sb->s_time_gran); - if (ia_valid & ATTR_CTIME) - inode->i_ctime = timespec64_trunc(attr->ia_ctime, - ino
[PATCH v8 19/20] pstore: fs superblock limits
Leaving granularity at 1ns because it is dependent on the specific attached backing pstore module. ramoops has microsecond resolution. Fix the readback of ramoops fractional timestamp microseconds, which has incorrectly been reporting the value as nanoseconds since 3f8f80f0 ("pstore/ram: Read and write to the 'compressed' flag of pstore"). Signed-off-by: Deepa Dinamani Acked-by: Kees Cook Cc: an...@enomsg.org Cc: ccr...@android.com Cc: keesc...@chromium.org Cc: tony.l...@intel.com --- fs/pstore/ram.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index 2bb3468fc93a..8caff834f002 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -144,6 +144,7 @@ static int ramoops_read_kmsg_hdr(char *buffer, struct timespec64 *time, if (sscanf(buffer, RAMOOPS_KERNMSG_HDR "%lld.%lu-%c\n%n", (time64_t *)>tv_sec, >tv_nsec, _type, _length) == 3) { + time->tv_nsec *= 1000; if (data_type == 'C') *compressed = true; else @@ -151,6 +152,7 @@ static int ramoops_read_kmsg_hdr(char *buffer, struct timespec64 *time, } else if (sscanf(buffer, RAMOOPS_KERNMSG_HDR "%lld.%lu\n%n", (time64_t *)>tv_sec, >tv_nsec, _length) == 2) { + time->tv_nsec *= 1000; *compressed = false; } else { time->tv_sec = 0; -- 2.17.1
[PATCH v8 12/20] fs: fat: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Some FAT variants indicate that the years after 2099 are not supported. Since commit 7decd1cb0305 ("fat: Fix and cleanup timestamp conversion") we support the full range of years that can be represented, up to 2107. Signed-off-by: Deepa Dinamani Cc: hirof...@mail.parknet.co.jp --- fs/fat/inode.c | 12 1 file changed, 12 insertions(+) diff --git a/fs/fat/inode.c b/fs/fat/inode.c index 0bc2abc5d453..f27f84e2103f 100644 --- a/fs/fat/inode.c +++ b/fs/fat/inode.c @@ -31,6 +31,11 @@ #define KB_IN_SECTORS 2 +/* DOS dates from 1980/1/1 through 2107/12/31 */ +#define FAT_DATE_MIN (0<<9 | 1<<5 | 1) +#define FAT_DATE_MAX (127<<9 | 12<<5 | 31) +#define FAT_TIME_MAX (23<<11 | 59<<5 | 29) + /* * A deserialized copy of the on-disk structure laid out in struct * fat_boot_sector. @@ -1617,6 +1622,7 @@ int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat, int debug; long error; char buf[50]; + struct timespec64 ts; /* * GFP_KERNEL is ok here, because while we do hold the @@ -1710,6 +1716,12 @@ int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat, sbi->free_clus_valid = 0; sbi->prev_free = FAT_START_ENT; sb->s_maxbytes = 0x; + fat_time_fat2unix(sbi, , 0, cpu_to_le16(FAT_DATE_MIN), 0); + sb->s_time_min = ts.tv_sec; + + fat_time_fat2unix(sbi, , cpu_to_le16(FAT_TIME_MAX), + cpu_to_le16(FAT_DATE_MAX), 0); + sb->s_time_max = ts.tv_sec; if (!sbi->fat_length && bpb.fat32_length) { struct fat_boot_fsinfo *fsinfo; -- 2.17.1
[PATCH v8 17/20] fs: hpfs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Also change the local_to_gmt() to use time64_t instead of time32_t. Signed-off-by: Deepa Dinamani Cc: miku...@artax.karlin.mff.cuni.cz --- fs/hpfs/hpfs_fn.h | 6 ++ fs/hpfs/super.c | 2 ++ 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/hpfs/hpfs_fn.h b/fs/hpfs/hpfs_fn.h index ab2e7cc2ff33..1cca83218fb5 100644 --- a/fs/hpfs/hpfs_fn.h +++ b/fs/hpfs/hpfs_fn.h @@ -334,7 +334,7 @@ long hpfs_ioctl(struct file *file, unsigned cmd, unsigned long arg); * local time (HPFS) to GMT (Unix) */ -static inline time64_t local_to_gmt(struct super_block *s, time32_t t) +static inline time64_t local_to_gmt(struct super_block *s, time64_t t) { extern struct timezone sys_tz; return t + sys_tz.tz_minuteswest * 60 + hpfs_sb(s)->sb_timeshift; @@ -343,9 +343,7 @@ static inline time64_t local_to_gmt(struct super_block *s, time32_t t) static inline time32_t gmt_to_local(struct super_block *s, time64_t t) { extern struct timezone sys_tz; - t = t - sys_tz.tz_minuteswest * 60 - hpfs_sb(s)->sb_timeshift; - - return clamp_t(time64_t, t, 0, U32_MAX); + return t - sys_tz.tz_minuteswest * 60 - hpfs_sb(s)->sb_timeshift; } static inline time32_t local_get_seconds(struct super_block *s) diff --git a/fs/hpfs/super.c b/fs/hpfs/super.c index 9db6d84f0d62..0a677a9aaf34 100644 --- a/fs/hpfs/super.c +++ b/fs/hpfs/super.c @@ -614,6 +614,8 @@ static int hpfs_fill_super(struct super_block *s, void *options, int silent) s->s_magic = HPFS_SUPER_MAGIC; s->s_op = _sops; s->s_d_op = _dentry_operations; + s->s_time_min = local_to_gmt(s, 0); + s->s_time_max = local_to_gmt(s, U32_MAX); sbi->sb_root = le32_to_cpu(superblock->root); sbi->sb_fs_size = le32_to_cpu(superblock->n_sectors); -- 2.17.1
[PATCH v8 14/20] fs: sysv: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Signed-off-by: Deepa Dinamani Cc: h...@infradead.org --- fs/sysv/super.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/sysv/super.c b/fs/sysv/super.c index d788b1daa7eb..cc8e2ed155c8 100644 --- a/fs/sysv/super.c +++ b/fs/sysv/super.c @@ -368,7 +368,8 @@ static int sysv_fill_super(struct super_block *sb, void *data, int silent) sbi->s_block_base = 0; mutex_init(>s_lock); sb->s_fs_info = sbi; - + sb->s_time_min = 0; + sb->s_time_max = U32_MAX; sb_set_blocksize(sb, BLOCK_SIZE); for (i = 0; i < ARRAY_SIZE(flavours) && !size; i++) { @@ -487,6 +488,8 @@ static int v7_fill_super(struct super_block *sb, void *data, int silent) sbi->s_type = FSTYPE_V7; mutex_init(>s_lock); sb->s_fs_info = sbi; + sb->s_time_min = 0; + sb->s_time_max = U32_MAX; sb_set_blocksize(sb, 512); -- 2.17.1
[PATCH v8 16/20] fs: orangefs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Assume the limits as unsigned according to the below commit 98e8eef557a9 ("changed PVFS_time from int64_t to uint64_t") in https://github.com/waltligon/orangefs Author: Neill Miller Date: Thu Sep 2 15:00:38 2004 + Signed-off-by: Deepa Dinamani Cc: hub...@omnibond.com Cc: mar...@omnibond.com Cc: de...@lists.orangefs.org --- fs/orangefs/super.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c index 2811586fafc2..1ec08fe459cf 100644 --- a/fs/orangefs/super.c +++ b/fs/orangefs/super.c @@ -425,6 +425,8 @@ static int orangefs_fill_sb(struct super_block *sb, sb->s_blocksize = PAGE_SIZE; sb->s_blocksize_bits = PAGE_SHIFT; sb->s_maxbytes = MAX_LFS_FILESIZE; + sb->s_time_min = 0; + sb->s_time_max = S64_MAX; ret = super_setup_bdi(sb); if (ret) -- 2.17.1
[PATCH v8 18/20] fs: omfs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Signed-off-by: Deepa Dinamani Acked-by: Bob Copeland Cc: m...@bobcopeland.com Cc: linux-karma-de...@lists.sourceforge.net --- fs/omfs/inode.c | 4 1 file changed, 4 insertions(+) diff --git a/fs/omfs/inode.c b/fs/omfs/inode.c index 08226a835ec3..b76ec6b88ded 100644 --- a/fs/omfs/inode.c +++ b/fs/omfs/inode.c @@ -478,6 +478,10 @@ static int omfs_fill_super(struct super_block *sb, void *data, int silent) sb->s_maxbytes = 0x; + sb->s_time_gran = NSEC_PER_MSEC; + sb->s_time_min = 0; + sb->s_time_max = U64_MAX / MSEC_PER_SEC; + sb_set_blocksize(sb, 0x200); bh = sb_bread(sb, 0); -- 2.17.1
[PATCH v8 20/20] isofs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Reference: http://www.ecma-international.org/publications/standards/Ecma-119.htm Signed-off-by: Deepa Dinamani --- fs/isofs/inode.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c index 9e30d8703735..62c0462dc89f 100644 --- a/fs/isofs/inode.c +++ b/fs/isofs/inode.c @@ -30,6 +30,9 @@ #include "isofs.h" #include "zisofs.h" +/* max tz offset is 13 hours */ +#define MAX_TZ_OFFSET (52*15*60) + #define BEQUIET static int isofs_hashi(const struct dentry *parent, struct qstr *qstr); @@ -801,6 +804,10 @@ static int isofs_fill_super(struct super_block *s, void *data, int silent) */ s->s_maxbytes = 0x800LL; + /* ECMA-119 timestamp from 1900/1/1 with tz offset */ + s->s_time_min = mktime64(1900, 1, 1, 0, 0, 0) - MAX_TZ_OFFSET; + s->s_time_max = mktime64(U8_MAX+1900, 12, 31, 23, 59, 59) + MAX_TZ_OFFSET; + /* Set this for reference. Its not currently used except on write which we don't have .. */ -- 2.17.1
[PATCH v8 15/20] fs: ceph: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. According to the disscussion in https://patchwork.kernel.org/patch/8308691/ we agreed to use unsigned 32 bit timestamps on ceph. Update the limits accordingly. Signed-off-by: Deepa Dinamani Cc: z...@redhat.com Cc: s...@redhat.com Cc: idryo...@gmail.com Cc: ceph-de...@vger.kernel.org --- fs/ceph/super.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/ceph/super.c b/fs/ceph/super.c index 4406ec7018bb..2aa052b7bda7 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -907,6 +907,8 @@ static int ceph_set_super(struct super_block *s, struct fs_context *fc) s->s_export_op = _export_ops; s->s_time_gran = 1; + s->s_time_min = 0; + s->s_time_max = U32_MAX; ret = set_anon_super_fc(s, fc); if (ret != 0) -- 2.17.1
[PATCH v8 08/20] adfs: Fill in max and min timestamps in sb
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Note that the min timestamp is assumed to be 01 Jan 1970 00:00:00 (Unix epoch). This is consistent with the way we convert timestamps in adfs_adfs2unix_time(). Signed-off-by: Deepa Dinamani --- fs/adfs/adfs.h | 13 + fs/adfs/inode.c | 8 ++-- fs/adfs/super.c | 2 ++ 3 files changed, 17 insertions(+), 6 deletions(-) diff --git a/fs/adfs/adfs.h b/fs/adfs/adfs.h index b7e844d2f321..dca8b23aa43f 100644 --- a/fs/adfs/adfs.h +++ b/fs/adfs/adfs.h @@ -3,6 +3,19 @@ #include #include +/* + * 01 Jan 1970 00:00:00 (Unix epoch) as seconds since + * 01 Jan 1900 00:00:00 (RISC OS epoch) + */ +#define RISC_OS_EPOCH_DELTA 2208988800LL + +/* + * Convert 40 bit centi seconds to seconds + * since 01 Jan 1900 00:00:00 (RISC OS epoch) + * The result is 2248-06-03 06:57:57 GMT + */ +#define ADFS_MAX_TIMESTAMP ((0xFFLL / 100) - RISC_OS_EPOCH_DELTA) + /* Internal data structures for ADFS */ #define ADFS_FREE_FRAG 0 diff --git a/fs/adfs/inode.c b/fs/adfs/inode.c index 124de75413a5..41eca1c451dc 100644 --- a/fs/adfs/inode.c +++ b/fs/adfs/inode.c @@ -167,11 +167,7 @@ static void adfs_adfs2unix_time(struct timespec64 *tv, struct inode *inode) { unsigned int high, low; - /* 01 Jan 1970 00:00:00 (Unix epoch) as nanoseconds since -* 01 Jan 1900 00:00:00 (RISC OS epoch) -*/ - static const s64 nsec_unix_epoch_diff_risc_os_epoch = - 22089888000LL; + static const s64 nsec_unix_epoch_diff_risc_os_epoch = RISC_OS_EPOCH_DELTA * NSEC_PER_SEC; s64 nsec; if (!adfs_inode_is_stamped(inode)) @@ -216,7 +212,7 @@ adfs_unix2adfs_time(struct inode *inode, unsigned int secs) if (adfs_inode_is_stamped(inode)) { /* convert 32-bit seconds to 40-bit centi-seconds */ low = (secs & 255) * 100; - high = (secs / 256) * 100 + (low >> 8) + 0x336e996a; + high = (secs / 256) * 100 + (low >> 8) + (RISC_OS_EPOCH_DELTA*100/256); ADFS_I(inode)->loadaddr = (high >> 24) | (ADFS_I(inode)->loadaddr & ~0xff); diff --git a/fs/adfs/super.c b/fs/adfs/super.c index 65b04ebb51c3..f074fe7d7158 100644 --- a/fs/adfs/super.c +++ b/fs/adfs/super.c @@ -463,6 +463,8 @@ static int adfs_fill_super(struct super_block *sb, void *data, int silent) asb->s_map_size = dr->nzones | (dr->nzones_high << 8); asb->s_map2blk = dr->log2bpmb - dr->log2secsize; asb->s_log2sharesize= dr->log2sharesize; + sb->s_time_min = 0; + sb->s_time_max = ADFS_MAX_TIMESTAMP; asb->s_map = adfs_read_map(sb, dr); if (IS_ERR(asb->s_map)) { -- 2.17.1
[PATCH v8 02/20] vfs: Add timestamp_truncate() api
timespec_trunc() function is used to truncate a filesystem timestamp to the right granularity. But, the function does not clamp tv_sec part of the timestamps according to the filesystem timestamp limits. The replacement api: timestamp_truncate() also alters the signature of the function to accommodate filesystem timestamp clamping according to flesystem limits. Note that the tv_nsec part is set to 0 if tv_sec is not within the range supported for the filesystem. Signed-off-by: Deepa Dinamani --- fs/inode.c | 33 - include/linux/fs.h | 2 ++ 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/fs/inode.c b/fs/inode.c index ef33fdf0105f..fef457a42882 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2169,6 +2169,37 @@ struct timespec64 timespec64_trunc(struct timespec64 t, unsigned gran) } EXPORT_SYMBOL(timespec64_trunc); +/** + * timestamp_truncate - Truncate timespec to a granularity + * @t: Timespec + * @inode: inode being updated + * + * Truncate a timespec to the granularity supported by the fs + * containing the inode. Always rounds down. gran must + * not be 0 nor greater than a second (NSEC_PER_SEC, or 10^9 ns). + */ +struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode) +{ + struct super_block *sb = inode->i_sb; + unsigned int gran = sb->s_time_gran; + + t.tv_sec = clamp(t.tv_sec, sb->s_time_min, sb->s_time_max); + if (unlikely(t.tv_sec == sb->s_time_max || t.tv_sec == sb->s_time_min)) + t.tv_nsec = 0; + + /* Avoid division in the common cases 1 ns and 1 s. */ + if (gran == 1) + ; /* nothing */ + else if (gran == NSEC_PER_SEC) + t.tv_nsec = 0; + else if (gran > 1 && gran < NSEC_PER_SEC) + t.tv_nsec -= t.tv_nsec % gran; + else + WARN(1, "invalid file time granularity: %u", gran); + return t; +} +EXPORT_SYMBOL(timestamp_truncate); + /** * current_time - Return FS time * @inode: inode. @@ -2190,7 +2221,7 @@ struct timespec64 current_time(struct inode *inode) return now; } - return timespec64_trunc(now, inode->i_sb->s_time_gran); + return timestamp_truncate(now, inode); } EXPORT_SYMBOL(current_time); diff --git a/include/linux/fs.h b/include/linux/fs.h index 93c440d22547..1170d8260aa2 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -737,6 +737,8 @@ struct inode { void*i_private; /* fs or device private pointer */ } __randomize_layout; +struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode); + static inline unsigned int i_blocksize(const struct inode *node) { return (1 << node->i_blkbits); -- 2.17.1
[PATCH v8 05/20] utimes: Clamp the timestamps before update
POSIX is ambiguous on the behavior of timestamps for futimens, utimensat and utimes. Whether to return an error or silently clamp a timestamp beyond the range supported by the underlying filesystems is not clear. POSIX.1 section for futimens, utimensat and utimes says: (http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html) The file's relevant timestamp shall be set to the greatest value supported by the file system that is not greater than the specified time. If the tv_nsec field of a timespec structure has the special value UTIME_NOW, the file's relevant timestamp shall be set to the greatest value supported by the file system that is not greater than the current time. [EINVAL] A new file timestamp would be a value whose tv_sec component is not a value supported by the file system. The patch chooses to clamp the timestamps according to the filesystem timestamp ranges and does not return an error. This is in line with the behavior of utime syscall also since the POSIX page(http://pubs.opengroup.org/onlinepubs/009695399/functions/utime.html) for utime does not mention returning an error or clamping like above. Same for utimes http://pubs.opengroup.org/onlinepubs/009695399/functions/utimes.html Signed-off-by: Deepa Dinamani --- fs/utimes.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/fs/utimes.c b/fs/utimes.c index 350c9c16ace1..1ba3f7883870 100644 --- a/fs/utimes.c +++ b/fs/utimes.c @@ -36,16 +36,14 @@ static int utimes_common(const struct path *path, struct timespec64 *times) if (times[0].tv_nsec == UTIME_OMIT) newattrs.ia_valid &= ~ATTR_ATIME; else if (times[0].tv_nsec != UTIME_NOW) { - newattrs.ia_atime.tv_sec = times[0].tv_sec; - newattrs.ia_atime.tv_nsec = times[0].tv_nsec; + newattrs.ia_atime = timestamp_truncate(times[0], inode); newattrs.ia_valid |= ATTR_ATIME_SET; } if (times[1].tv_nsec == UTIME_OMIT) newattrs.ia_valid &= ~ATTR_MTIME; else if (times[1].tv_nsec != UTIME_NOW) { - newattrs.ia_mtime.tv_sec = times[1].tv_sec; - newattrs.ia_mtime.tv_nsec = times[1].tv_nsec; + newattrs.ia_mtime = timestamp_truncate(times[1], inode); newattrs.ia_valid |= ATTR_MTIME_SET; } /* -- 2.17.1
[PATCH v8 04/20] mount: Add mount warning for impending timestamp expiry
The warning reuses the uptime max of 30 years used by settimeofday(). Note that the warning is only emitted for writable filesystem mounts through the mount syscall. Automounts do not have the same warning. Print out the warning in human readable format using the struct tm. After discussion with Arnd Bergmann, we chose to print only the year number. The raw s_time_max is also displayed, and the user can easily decode it e.g. "date -u -d @$((0x7fff))". We did not want to consolidate struct rtc_tm and struct tm just to print the date using a format specifier as part of this series. Given that the rtc_tm is not compiled on all architectures, this is not a trivial patch. This can be added in the future. Signed-off-by: Deepa Dinamani --- fs/namespace.c | 33 - 1 file changed, 32 insertions(+), 1 deletion(-) diff --git a/fs/namespace.c b/fs/namespace.c index bfcb4e341257..7fcf289593c5 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2468,6 +2468,26 @@ static void set_mount_attributes(struct mount *mnt, unsigned int mnt_flags) unlock_mount_hash(); } +static void mnt_warn_timestamp_expiry(struct path *mountpoint, struct vfsmount *mnt) +{ + struct super_block *sb = mnt->mnt_sb; + + if (!__mnt_is_readonly(mnt) && + (ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > sb->s_time_max)) { + char *buf = (char *)__get_free_page(GFP_KERNEL); + char *mntpath = buf ? d_path(mountpoint, buf, PAGE_SIZE) : ERR_PTR(-ENOMEM); + struct tm tm; + + time64_to_tm(sb->s_time_max, 0, ); + + pr_warn("Mounted %s file system at %s supports timestamps until %04ld (0x%llx)\n", + sb->s_type->name, mntpath, + tm.tm_year+1900, (unsigned long long)sb->s_time_max); + + free_page((unsigned long)buf); + } +} + /* * Handle reconfiguration of the mountpoint only without alteration of the * superblock it refers to. This is triggered by specifying MS_REMOUNT|MS_BIND @@ -2493,6 +2513,9 @@ static int do_reconfigure_mnt(struct path *path, unsigned int mnt_flags) if (ret == 0) set_mount_attributes(mnt, mnt_flags); up_write(>s_umount); + + mnt_warn_timestamp_expiry(path, >mnt); + return ret; } @@ -2533,6 +2556,9 @@ static int do_remount(struct path *path, int ms_flags, int sb_flags, } up_write(>s_umount); } + + mnt_warn_timestamp_expiry(path, >mnt); + put_fs_context(fc); return err; } @@ -2741,8 +2767,13 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint, return PTR_ERR(mnt); error = do_add_mount(real_mount(mnt), mountpoint, mnt_flags); - if (error < 0) + if (error < 0) { mntput(mnt); + return error; + } + + mnt_warn_timestamp_expiry(mountpoint, mnt); + return error; } -- 2.17.1
[PATCH v8 01/20] vfs: Add file timestamp range support
Add fields to the superblock to track the min and max timestamps supported by filesystems. Initially, when a superblock is allocated, initialize it to the max and min values the fields can hold. Individual filesystems override these to match their actual limits. Pseudo filesystems are assumed to always support the min and max allowable values for the fields. Signed-off-by: Deepa Dinamani --- fs/super.c | 2 ++ include/linux/fs.h | 3 +++ include/linux/time64.h | 2 ++ 3 files changed, 7 insertions(+) diff --git a/fs/super.c b/fs/super.c index 36cb5aaf6f08..620c1911bb36 100644 --- a/fs/super.c +++ b/fs/super.c @@ -258,6 +258,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags, s->s_maxbytes = MAX_NON_LFS; s->s_op = _op; s->s_time_gran = 10; + s->s_time_min = TIME64_MIN; + s->s_time_max = TIME64_MAX; s->cleancache_poolid = CLEANCACHE_NO_POOL; s->s_shrink.seeks = DEFAULT_SEEKS; diff --git a/include/linux/fs.h b/include/linux/fs.h index d97d74f35eb3..93c440d22547 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1463,6 +1463,9 @@ struct super_block { /* Granularity of c/m/atime in ns (cannot be worse than a second) */ u32 s_time_gran; + /* Time limits for c/m/atime in seconds */ + time64_t s_time_min; + time64_t s_time_max; #ifdef CONFIG_FSNOTIFY __u32 s_fsnotify_mask; struct fsnotify_mark_connector __rcu*s_fsnotify_marks; diff --git a/include/linux/time64.h b/include/linux/time64.h index a620ee610b9f..19125489ae94 100644 --- a/include/linux/time64.h +++ b/include/linux/time64.h @@ -30,6 +30,8 @@ struct itimerspec64 { /* Located here for timespec[64]_valid_strict */ #define TIME64_MAX ((s64)~((u64)1 << 63)) +#define TIME64_MIN (-TIME64_MAX - 1) + #define KTIME_MAX ((s64)~((u64)1 << 63)) #define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC) -- 2.17.1
Re: [PATCH 19/20] pstore: fs superblock limits
On Fri, Aug 2, 2019 at 12:15 AM Arnd Bergmann wrote: > > On Fri, Aug 2, 2019 at 4:26 AM Deepa Dinamani wrote: > > > > On Tue, Jul 30, 2019 at 12:36 AM Arnd Bergmann wrote: > > > > > > On Tue, Jul 30, 2019 at 6:31 AM Kees Cook wrote: > > > > > > > > On Mon, Jul 29, 2019 at 06:49:23PM -0700, Deepa Dinamani wrote: > > > > > Also update the gran since pstore has microsecond granularity. > > > > > > > > So, I'm fine with this, but technically the granularity depends on the > > > > backend storage... many have no actual time keeping, though. My point > > > > is, > > > > pstore's timestamps are really mostly a lie, but the most common backend > > > > (ramoops) is seconds-granularity. > > > > > > > > So, I'm fine with this, but it's a lie but it's a lie that doesn't > > > > matter, so ... > > > > > > > > Acked-by: Kees Cook > > > > > > > > I'm open to suggestions to improve it... > > > > > > If we don't care about using sub-second granularity, then setting it > > > to one second unconditionally here will make it always use that and > > > report it correctly. > > > > Should this printf in ramoops_write_kmsg_hdr() also be fixed then? > > > > RAMOOPS_KERNMSG_HDR "%lld.%06lu-%c\n", > > (time64_t)record->time.tv_sec, > > record->time.tv_nsec / 1000, > > record->compressed ? 'C' : 'D'); > > persistent_ram_write(prz, hdr, len); > > > > ramoops_read_kmsg_hdr() doesn't read this as microseconds. Seems like > > a mismatch from above. > > Good catch. This seems to go back to commit 3f8f80f0cfeb ("pstore/ram: > Read and write to the 'compressed' flag of pstore"), which introduced the > nanosecond read. The write function however has always used > microseconds, and that was kept when the implementation changed > from timeval to timespec in commit 1e817fb62cd1 ("time: create > __getnstimeofday for WARNless calls"). > > > If we want to agree that we just want seconds granularity for pstore, > > we could replace the tv_nsec part to be all 0's if anybody else is > > depending on this format. > > I could drop this patch from the series and post that patch seperately. > > We should definitely fix it to not produce a bogus nanosecond value. > Whether using full seconds or microsecond resolution is better here, > I don't know. It seems that pstore records generally get created > with a nanosecond nanosecond accurate timestamp from > ktime_get_real_fast_ns() and then truncated to the resolution of the > backend, rather than the normal jiffies-accurate inode timestamps that > we have for regular file systems. > > This might mean that we do want the highest possible resolution > and not further truncate here, in case that information ends > up being useful afterwards. I made a list of granularities used by pstore drivers using pstore_register(): 1. efi - seconds 2. ramoops - microsecond 3. erst - seconds 4. powerpc/nvram64 - seconds I will leave pstore granularity at nanoseconds and fix the ramoops read. -Deepa
Re: [Y2038] [PATCH 04/20] mount: Add mount warning for impending timestamp expiry
On Mon, Aug 12, 2019 at 9:09 AM Deepa Dinamani wrote: > > On Mon, Aug 12, 2019 at 7:11 AM Arnd Bergmann wrote: > > > > On Mon, Aug 12, 2019 at 3:25 PM Ben Hutchings > > wrote: > > > On Sat, 2019-08-10 at 13:44 -0700, Deepa Dinamani wrote: > > > > On Mon, Aug 5, 2019 at 7:14 AM Ben Hutchings > > > > wrote: > > > > > On Mon, 2019-07-29 at 18:49 -0700, Deepa Dinamani wrote: > > > > > > The warning reuses the uptime max of 30 years used by the > > > > > > setitimeofday(). > > > > > > > > > > > > Note that the warning is only added for new filesystem mounts > > > > > > through the mount syscall. Automounts do not have the same warning. > > > > > [...] > > > > > > > > > > Another thing - perhaps this warning should be suppressed for > > > > > read-only > > > > > mounts? > > > > > > > > Many filesystems support read only mounts only. We do fill in right > > > > granularities and limits for these filesystems as well. In keeping > > > > with the trend, I have added the warning accordingly. I don't think I > > > > have a preference either way. But, not warning for the red only mounts > > > > adds another if case. If you have a strong preference, I could add it > > > > in. > > > > > > It seems to me that the warning is needed if there is a possibility of > > > data loss (incorrect timestamps, potentially leading to incorrect > > > decisions about which files are newer). This can happen only when a > > > filesystem is mounted read-write, or when a filesystem image is > > > created. > > > > > > I think that warning for read-only mounts would be an annoyance to > > > users retrieving files from old filesystems. > > > > I agree, the warning is not helpful for read-only mounts. An earlier > > plan was to completely disallow writable mounts that might risk an > > overflow (in some configurations at least). The warning replaces that > > now, and I think it should also just warn for the cases that would > > otherwise have been dangerous. > > Ok, I will make the change to exclude new read only mounts. I will use > __mnt_is_readonly() so that it also exculdes filesystems that are > readonly also. > The diff looks like below: > > - if (!error && sb->s_time_max && > + if (!error && !__mnt_is_readonly(mnt) && > (ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > > sb->s_time_max)) { > > Note that we can get rid of checking for non zero sb->s_time_max now. One more thing, we will probably have to add a second warning for when the filesystem is re-mounted rw after the initial readonly mount. -Deepa
Re: [Y2038] [PATCH 04/20] mount: Add mount warning for impending timestamp expiry
On Mon, Aug 12, 2019 at 7:11 AM Arnd Bergmann wrote: > > On Mon, Aug 12, 2019 at 3:25 PM Ben Hutchings > wrote: > > On Sat, 2019-08-10 at 13:44 -0700, Deepa Dinamani wrote: > > > On Mon, Aug 5, 2019 at 7:14 AM Ben Hutchings > > > wrote: > > > > On Mon, 2019-07-29 at 18:49 -0700, Deepa Dinamani wrote: > > > > > The warning reuses the uptime max of 30 years used by the > > > > > setitimeofday(). > > > > > > > > > > Note that the warning is only added for new filesystem mounts > > > > > through the mount syscall. Automounts do not have the same warning. > > > > [...] > > > > > > > > Another thing - perhaps this warning should be suppressed for read-only > > > > mounts? > > > > > > Many filesystems support read only mounts only. We do fill in right > > > granularities and limits for these filesystems as well. In keeping > > > with the trend, I have added the warning accordingly. I don't think I > > > have a preference either way. But, not warning for the red only mounts > > > adds another if case. If you have a strong preference, I could add it > > > in. > > > > It seems to me that the warning is needed if there is a possibility of > > data loss (incorrect timestamps, potentially leading to incorrect > > decisions about which files are newer). This can happen only when a > > filesystem is mounted read-write, or when a filesystem image is > > created. > > > > I think that warning for read-only mounts would be an annoyance to > > users retrieving files from old filesystems. > > I agree, the warning is not helpful for read-only mounts. An earlier > plan was to completely disallow writable mounts that might risk an > overflow (in some configurations at least). The warning replaces that > now, and I think it should also just warn for the cases that would > otherwise have been dangerous. Ok, I will make the change to exclude new read only mounts. I will use __mnt_is_readonly() so that it also exculdes filesystems that are readonly also. The diff looks like below: - if (!error && sb->s_time_max && + if (!error && !__mnt_is_readonly(mnt) && (ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > sb->s_time_max)) { Note that we can get rid of checking for non zero sb->s_time_max now. -Deepa
Re: [Y2038] [PATCH 04/20] mount: Add mount warning for impending timestamp expiry
> This doesn't seem like a helpful way to log the time. Maybe use > time64_to_tm() to convert to "broken down" time and then print it with > "%ptR"... but that wants struct rtc_time. If you apply the attached > patch, however, you should then be able to print struct tm with > "%ptT". OK. Will print a more user friendly date string here. Thanks, -Deepa
Re: [Y2038] [PATCH 04/20] mount: Add mount warning for impending timestamp expiry
On Mon, Aug 5, 2019 at 7:14 AM Ben Hutchings wrote: > > On Mon, 2019-07-29 at 18:49 -0700, Deepa Dinamani wrote: > > The warning reuses the uptime max of 30 years used by the > > setitimeofday(). > > > > Note that the warning is only added for new filesystem mounts > > through the mount syscall. Automounts do not have the same warning. > [...] > > Another thing - perhaps this warning should be suppressed for read-only > mounts? Many filesystems support read only mounts only. We do fill in right granularities and limits for these filesystems as well. In keeping with the trend, I have added the warning accordingly. I don't think I have a preference either way. But, not warning for the red only mounts adds another if case. If you have a strong preference, I could add it in. -Deepa
Re: [Y2038] [PATCH 05/20] utimes: Clamp the timestamps before update
On Mon, Aug 5, 2019 at 6:30 AM Ben Hutchings wrote: > > On Mon, 2019-07-29 at 18:49 -0700, Deepa Dinamani wrote: > > POSIX is ambiguous on the behavior of timestamps for > > futimens, utimensat and utimes. Whether to return an > > error or silently clamp a timestamp beyond the range > > supported by the underlying filesystems is not clear. > > > > POSIX.1 section for futimens, utimensat and utimes says: > > (http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html) > > > > The file's relevant timestamp shall be set to the greatest > > value supported by the file system that is not greater > > than the specified time. > > > > If the tv_nsec field of a timespec structure has the special > > value UTIME_NOW, the file's relevant timestamp shall be set > > to the greatest value supported by the file system that is > > not greater than the current time. > > > > [EINVAL] > > A new file timestamp would be a value whose tv_sec > > component is not a value supported by the file system. > > > > The patch chooses to clamp the timestamps according to the > > filesystem timestamp ranges and does not return an error. > > This is in line with the behavior of utime syscall also > > since the POSIX > > page(http://pubs.opengroup.org/onlinepubs/009695399/functions/utime.html) > > for utime does not mention returning an error or clamping like above. > > > > Same for utimes > > http://pubs.opengroup.org/onlinepubs/009695399/functions/utimes.html > > > > Signed-off-by: Deepa Dinamani > > --- > > fs/utimes.c | 17 + > > 1 file changed, 13 insertions(+), 4 deletions(-) > > > > diff --git a/fs/utimes.c b/fs/utimes.c > > index 350c9c16ace1..4c1a2ce90bbc 100644 > > --- a/fs/utimes.c > > +++ b/fs/utimes.c > > @@ -21,6 +21,7 @@ static int utimes_common(const struct path *path, struct > > timespec64 *times) > > int error; > > struct iattr newattrs; > > struct inode *inode = path->dentry->d_inode; > > + struct super_block *sb = inode->i_sb; > > struct inode *delegated_inode = NULL; > > > > error = mnt_want_write(path->mnt); > > @@ -36,16 +37,24 @@ static int utimes_common(const struct path *path, > > struct timespec64 *times) > > if (times[0].tv_nsec == UTIME_OMIT) > > newattrs.ia_valid &= ~ATTR_ATIME; > > else if (times[0].tv_nsec != UTIME_NOW) { > > - newattrs.ia_atime.tv_sec = times[0].tv_sec; > > - newattrs.ia_atime.tv_nsec = times[0].tv_nsec; > > + newattrs.ia_atime.tv_sec = > > + clamp(times[0].tv_sec, sb->s_time_min, > > sb->s_time_max); > > + if (times[0].tv_sec == sb->s_time_max || > > times[0].tv_sec == sb->s_time_min) > > This is testing the un-clamped value. > > > + newattrs.ia_atime.tv_nsec = 0; > > + else > > + newattrs.ia_atime.tv_nsec = times[0].tv_nsec; > > newattrs.ia_valid |= ATTR_ATIME_SET; > > } > > > > if (times[1].tv_nsec == UTIME_OMIT) > > newattrs.ia_valid &= ~ATTR_MTIME; > > else if (times[1].tv_nsec != UTIME_NOW) { > > - newattrs.ia_mtime.tv_sec = times[1].tv_sec; > > - newattrs.ia_mtime.tv_nsec = times[1].tv_nsec; > > + newattrs.ia_mtime.tv_sec = > > + clamp(times[1].tv_sec, sb->s_time_min, > > sb->s_time_max); > > + if (times[1].tv_sec >= sb->s_time_max || > > times[1].tv_sec == sb->s_time_min) > > Similarly here, for the minimum. > > I suggest testing for clamping like this: > > if (newattrs.ia_atime.tv_sec != times[0].tv_sec) > ... > if (newattrs.ia_mtime.tv_sec != times[1].tv_sec) > ... > > Ben. > > > + newattrs.ia_mtime.tv_nsec = 0; > > + else > > + newattrs.ia_mtime.tv_nsec = times[1].tv_nsec; > > newattrs.ia_valid |= ATTR_MTIME_SET; > > } Darrick pointed out that maybe we could use timestamp_truncate() here to clamp. I think it is ok to truncate to the right granularity also here. setattr callbacks do it already. So the diff here looks like below: - newattrs.ia_atime.tv_sec = - clamp(times[0].tv_sec, sb->s_time_min, sb->s_time_max); - if (times[0].tv_sec == sb->s_time_max || times[0].tv_sec == sb->s_time_min) - newattrs.ia_atime.tv_nsec = 0; - else - newattrs.ia_atime.tv_nsec = times[0].tv_nsec; + newattrs.ia_atime = timestamp_truncate(times[0], inode); Thanks, Deepa
Re: [PATCH 09/20] ext4: Initialize timestamps limits
> Rather than printing a warning at mount time (which may be confusing > to users for a problem they may never see), it makes sense to only > print such a warning in the vanishingly small case that someone actually > tries to modify the inode timestamp but it doesn't fit, rather than on > the theoretical case that may never happen. Arnd and I were discussing and we came to a similar conclusion that we would not warn at mount. Arnd suggested maybe we include a pr_warn_ratelimited() when we do write to these special inodes. In general, there is an agreement to leave the fs granularity to a higher resolution for such super blocks. And hence, the timestamps for these special cases will never be clamped in memory.(Assuming we do not want to make more changes and try to do something like __ext4_expand_extra_isize() for in memory inode updates) We could easily try to clamp these timestamps when they get written out to the disk by modifying the EXT4_INODE_SET_XTIME to include such a clamp. And, at this point we could also warn. If this is acceptable, I could post an update. -Deepa
Re: [PATCH 19/20] pstore: fs superblock limits
On Tue, Jul 30, 2019 at 12:36 AM Arnd Bergmann wrote: > > On Tue, Jul 30, 2019 at 6:31 AM Kees Cook wrote: > > > > On Mon, Jul 29, 2019 at 06:49:23PM -0700, Deepa Dinamani wrote: > > > Also update the gran since pstore has microsecond granularity. > > > > So, I'm fine with this, but technically the granularity depends on the > > backend storage... many have no actual time keeping, though. My point is, > > pstore's timestamps are really mostly a lie, but the most common backend > > (ramoops) is seconds-granularity. > > > > So, I'm fine with this, but it's a lie but it's a lie that doesn't > > matter, so ... > > > > Acked-by: Kees Cook > > > > I'm open to suggestions to improve it... > > If we don't care about using sub-second granularity, then setting it > to one second unconditionally here will make it always use that and > report it correctly. Should this printf in ramoops_write_kmsg_hdr() also be fixed then? RAMOOPS_KERNMSG_HDR "%lld.%06lu-%c\n", (time64_t)record->time.tv_sec, record->time.tv_nsec / 1000, record->compressed ? 'C' : 'D'); persistent_ram_write(prz, hdr, len); ramoops_read_kmsg_hdr() doesn't read this as microseconds. Seems like a mismatch from above. If we want to agree that we just want seconds granularity for pstore, we could replace the tv_nsec part to be all 0's if anybody else is depending on this format. I could drop this patch from the series and post that patch seperately. Thanks, -Deepa
Re: [PATCH 09/20] ext4: Initialize timestamps limits
On Wed, Jul 31, 2019 at 8:26 AM Darrick J. Wong wrote: > > On Mon, Jul 29, 2019 at 06:49:13PM -0700, Deepa Dinamani wrote: > > ext4 has different overflow limits for max filesystem > > timestamps based on the extra bytes available. > > > > The timestamp limits are calculated according to the > > encoding table in > > a4dad1ae24f85i(ext4: Fix handling of extended tv_sec): > > > > * extra msb of adjust for signed > > * epoch 32-bit 32-bit tv_sec to > > * bits timedecoded 64-bit tv_sec 64-bit tv_sec valid time range > > * 0 01-0x8000..-0x0001 0x0 1901-12-13..1969-12-31 > > * 0 000x0..0x07fff 0x0 1970-01-01..2038-01-19 > > * 0 110x08000..0x0 0x1 2038-01-19..2106-02-07 > > * 0 100x1..0x17fff 0x1 2106-02-07..2174-02-25 > > * 1 010x18000..0x1 0x2 2174-02-25..2242-03-16 > > * 1 000x2..0x27fff 0x2 2242-03-16..2310-04-04 > > * 1 110x28000..0x2 0x3 2310-04-04..2378-04-22 > > * 1 100x3..0x37fff 0x3 2378-04-22..2446-05-10 > > My recollection of ext4 has gotten rusty, so this could be a bogus > question: > > Say you have a filesystem with s_inode_size > 128 where not all of the > ondisk inodes have been upgraded to i_extra_isize > 0 and therefore > don't support nanoseconds or times beyond 2038. I think this happens on > ext3 filesystems that reserved extra space for inode attrs that are > subsequently converted to ext4? > > In any case, that means that you have some inodes that support 34-bit > tv_sec and some inodes that only support 32-bit tv_sec. For the inodes > with 32-bit tv_sec, I think all that happens is that a large timestamp > will be truncated further, right? > > And no mount time warning because at least /some/ of the inodes are > ready to go for the next 30 years? I'm confused about ext3 being converted to ext4. If the converted inodes have extra space, then ext4_iget() will start using the extra space when it modifies the on disk inode, won't it? But, if there are 32 bit tv_sec and 34 bit tv_sec in a superblock then from this macro below, if an inode has space for extra bits in timestamps, it uses it. Otherwise, only the first 32 bits are copied to the on disk timestamp. This matches the behavior today for 32 bit tv_sec. But, the 34 bit tv_sec has a better behavior after the series because of the clamping and warning. #define EXT4_INODE_SET_XTIME(xtime, inode, raw_inode)\ do {\ (raw_inode)->xtime = cpu_to_le32((inode)->xtime.tv_sec);\ if (EXT4_FITS_IN_INODE(raw_inode, EXT4_I(inode), xtime ## _extra)) {\ (raw_inode)->xtime ## _extra =\ ext4_encode_extra_time(&(inode)->xtime);\ }\ } while (0) I'm not sure if this corner case if important. Maybe the maintainers can help me here. If this is important, then the inode time updates for an ext4 inode should always be through ext4_setattr() and we can clamp the timestamps there as a special case. And, this patch can be added separately? Thanks, Deepa
Re: [PATCH 05/20] utimes: Clamp the timestamps before update
On Wed, Jul 31, 2019 at 8:15 AM Darrick J. Wong wrote: > > On Mon, Jul 29, 2019 at 06:49:09PM -0700, Deepa Dinamani wrote: > > POSIX is ambiguous on the behavior of timestamps for > > futimens, utimensat and utimes. Whether to return an > > error or silently clamp a timestamp beyond the range > > supported by the underlying filesystems is not clear. > > > > POSIX.1 section for futimens, utimensat and utimes says: > > (http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html) > > > > The file's relevant timestamp shall be set to the greatest > > value supported by the file system that is not greater > > than the specified time. > > > > If the tv_nsec field of a timespec structure has the special > > value UTIME_NOW, the file's relevant timestamp shall be set > > to the greatest value supported by the file system that is > > not greater than the current time. > > > > [EINVAL] > > A new file timestamp would be a value whose tv_sec > > component is not a value supported by the file system. > > > > The patch chooses to clamp the timestamps according to the > > filesystem timestamp ranges and does not return an error. > > This is in line with the behavior of utime syscall also > > since the POSIX > > page(http://pubs.opengroup.org/onlinepubs/009695399/functions/utime.html) > > for utime does not mention returning an error or clamping like above. > > > > Same for utimes > > http://pubs.opengroup.org/onlinepubs/009695399/functions/utimes.html > > > > Signed-off-by: Deepa Dinamani > > --- > > fs/utimes.c | 17 + > > 1 file changed, 13 insertions(+), 4 deletions(-) > > > > diff --git a/fs/utimes.c b/fs/utimes.c > > index 350c9c16ace1..4c1a2ce90bbc 100644 > > --- a/fs/utimes.c > > +++ b/fs/utimes.c > > @@ -21,6 +21,7 @@ static int utimes_common(const struct path *path, struct > > timespec64 *times) > > int error; > > struct iattr newattrs; > > struct inode *inode = path->dentry->d_inode; > > + struct super_block *sb = inode->i_sb; > > struct inode *delegated_inode = NULL; > > > > error = mnt_want_write(path->mnt); > > @@ -36,16 +37,24 @@ static int utimes_common(const struct path *path, > > struct timespec64 *times) > > if (times[0].tv_nsec == UTIME_OMIT) > > newattrs.ia_valid &= ~ATTR_ATIME; > > else if (times[0].tv_nsec != UTIME_NOW) { > > - newattrs.ia_atime.tv_sec = times[0].tv_sec; > > - newattrs.ia_atime.tv_nsec = times[0].tv_nsec; > > + newattrs.ia_atime.tv_sec = > > + clamp(times[0].tv_sec, sb->s_time_min, > > sb->s_time_max); > > + if (times[0].tv_sec == sb->s_time_max || > > times[0].tv_sec == sb->s_time_min) > > + newattrs.ia_atime.tv_nsec = 0; > > + else > > + newattrs.ia_atime.tv_nsec = times[0].tv_nsec; > > newattrs.ia_valid |= ATTR_ATIME_SET; > > } > > > > if (times[1].tv_nsec == UTIME_OMIT) > > newattrs.ia_valid &= ~ATTR_MTIME; > > else if (times[1].tv_nsec != UTIME_NOW) { > > - newattrs.ia_mtime.tv_sec = times[1].tv_sec; > > - newattrs.ia_mtime.tv_nsec = times[1].tv_nsec; > > + newattrs.ia_mtime.tv_sec = > > + clamp(times[1].tv_sec, sb->s_time_min, > > sb->s_time_max); > > + if (times[1].tv_sec >= sb->s_time_max || > > times[1].tv_sec == sb->s_time_min) > > Line length. > > Also, didn't you just introduce a function to clamp tv_sec and fix > granularity? Why not just use it here? I think this is the third time > I've seen this open-coded logic. Yes, we can use that now. Earlier we were not setting the tv_nsec to 0 in timestamp_truncate() which is why this was opencoded here. I will make the change to include this. Thanks, Deepa
Re: [PATCH 12/20] fs: fat: Initialize filesystem timestamp ranges
On Tue, Jul 30, 2019 at 2:31 AM OGAWA Hirofumi wrote: > > Deepa Dinamani writes: > > > +/* DOS dates from 1980/1/1 through 2107/12/31 */ > > +#define FAT_DATE_MIN (0<<9 | 1<<5 | 1) > > +#define FAT_DATE_MAX (127<<9 | 12<<5 | 31) > > +#define FAT_TIME_MAX (23<<11 | 59<<5 | 29) > > + > > /* > > * A deserialized copy of the on-disk structure laid out in struct > > * fat_boot_sector. > > @@ -1605,6 +1610,7 @@ int fat_fill_super(struct super_block *sb, void > > *data, int silent, int isvfat, > > int debug; > > long error; > > char buf[50]; > > + struct timespec64 ts; > > > > /* > >* GFP_KERNEL is ok here, because while we do hold the > > @@ -1698,6 +1704,12 @@ int fat_fill_super(struct super_block *sb, void > > *data, int silent, int isvfat, > > sbi->free_clus_valid = 0; > > sbi->prev_free = FAT_START_ENT; > > sb->s_maxbytes = 0x; > > + fat_time_fat2unix(sbi, , 0, cpu_to_le16(FAT_DATE_MIN), 0); > > + sb->s_time_min = ts.tv_sec; > > + > > + fat_time_fat2unix(sbi, , cpu_to_le16(FAT_TIME_MAX), > > + cpu_to_le16(FAT_DATE_MAX), 0); > > + sb->s_time_max = ts.tv_sec; > > At least, it is wrong to call fat_time_fat2unix() before setup parameters > in sbi. All the parameters that fat_time_fat2unix() cares in sbi is accessed through static inline int fat_tz_offset(struct msdos_sb_info *sbi) { return (sbi->options.tz_set ? -sbi->options.time_offset : sys_tz.tz_minuteswest) * SECS_PER_MIN; } Both the sbi fields sbi->options.tz_set and sbi->options.time_offset are set by the call to parse_options(). And, parse_options() is called before the calls to fat_time_fat2unix().: int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat, void (*setup)(struct super_block *)) { error = parse_options(sb, data, isvfat, silent, , >options); if (error) goto out_fail; sbi->prev_free = FAT_START_ENT; sb->s_maxbytes = 0x; fat_time_fat2unix(sbi, , 0, cpu_to_le16(FAT_DATE_MIN), 0); sb->s_time_min = ts.tv_sec; fat_time_fat2unix(sbi, , cpu_to_le16(FAT_TIME_MAX), cpu_to_le16(FAT_DATE_MAX), 0); sb->s_time_max = ts.tv_sec; } I do not see what the problem is. -Deepa
Re: [PATCH 03/20] timestamp_truncate: Replace users of timespec64_trunc
On Tue, Jul 30, 2019 at 1:27 AM OGAWA Hirofumi wrote: > > Deepa Dinamani writes: > > > diff --git a/fs/fat/misc.c b/fs/fat/misc.c > > index 1e08bd54c5fb..53bb7c6bf993 100644 > > --- a/fs/fat/misc.c > > +++ b/fs/fat/misc.c > > @@ -307,8 +307,9 @@ int fat_truncate_time(struct inode *inode, struct > > timespec64 *now, int flags) > > inode->i_atime = (struct timespec64){ seconds, 0 }; > > } > > if (flags & S_CTIME) { > > - if (sbi->options.isvfat) > > - inode->i_ctime = timespec64_trunc(*now, 1000); > > + if (sbi->options.isvfat) { > > + inode->i_ctime = timestamp_truncate(*now, inode); > > + } > > else > > inode->i_ctime = fat_timespec64_trunc_2secs(*now); > > } > > Looks like broken. It changed to sb->s_time_gran from 1000, and > changed coding style. This is using a new api: timestamp_truncate(). granularity is gotten by inode->sb->s_time_gran. See Patch [2/20]: https://lkml.org/lkml/2019/7/29/1853 So this is not broken if fat is filling in the right granularity in the sb. -Deepa
[PATCH 16/20] fs: orangefs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Assume the limits as unsigned according to the below commit 98e8eef557a9 ("changed PVFS_time from int64_t to uint64_t") in https://github.com/waltligon/orangefs Author: Neill Miller Date: Thu Sep 2 15:00:38 2004 + Signed-off-by: Deepa Dinamani Cc: hub...@omnibond.com Cc: mar...@omnibond.com Cc: de...@lists.orangefs.org --- fs/orangefs/super.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/orangefs/super.c b/fs/orangefs/super.c index ee5efdc35cc1..dcd97e8158b1 100644 --- a/fs/orangefs/super.c +++ b/fs/orangefs/super.c @@ -439,6 +439,8 @@ static int orangefs_fill_sb(struct super_block *sb, sb->s_blocksize = PAGE_SIZE; sb->s_blocksize_bits = PAGE_SHIFT; sb->s_maxbytes = MAX_LFS_FILESIZE; + sb->s_time_min = 0; + sb->s_time_max = S64_MAX; ret = super_setup_bdi(sb); if (ret) -- 2.17.1
[PATCH 15/20] fs: ceph: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. According to the disscussion in https://patchwork.kernel.org/patch/8308691/ we agreed to use unsigned 32 bit timestamps on ceph. Update the limits accordingly. Signed-off-by: Deepa Dinamani Cc: z...@redhat.com Cc: s...@redhat.com Cc: idryo...@gmail.com Cc: ceph-de...@vger.kernel.org --- fs/ceph/super.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/ceph/super.c b/fs/ceph/super.c index d57fa60dcd43..6cf3a4fceac1 100644 --- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -981,6 +981,8 @@ static int ceph_set_super(struct super_block *s, void *data) s->s_export_op = _export_ops; s->s_time_gran = 1000; /* 1000 ns == 1 us */ + s->s_time_min = 0; + s->s_time_max = U32_MAX; ret = set_anon_super(s, NULL); /* what is that second arg for? */ if (ret != 0) -- 2.17.1
[PATCH 18/20] fs: omfs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Signed-off-by: Deepa Dinamani Cc: m...@bobcopeland.com Cc: linux-karma-de...@lists.sourceforge.net --- fs/omfs/inode.c | 4 1 file changed, 4 insertions(+) diff --git a/fs/omfs/inode.c b/fs/omfs/inode.c index 08226a835ec3..b76ec6b88ded 100644 --- a/fs/omfs/inode.c +++ b/fs/omfs/inode.c @@ -478,6 +478,10 @@ static int omfs_fill_super(struct super_block *sb, void *data, int silent) sb->s_maxbytes = 0x; + sb->s_time_gran = NSEC_PER_MSEC; + sb->s_time_min = 0; + sb->s_time_max = U64_MAX / MSEC_PER_SEC; + sb_set_blocksize(sb, 0x200); bh = sb_bread(sb, 0); -- 2.17.1
[PATCH 17/20] fs: hpfs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Also change the local_to_gmt() to use time64_t instead of time32_t. Signed-off-by: Deepa Dinamani Cc: miku...@artax.karlin.mff.cuni.cz --- fs/hpfs/hpfs_fn.h | 6 ++ fs/hpfs/super.c | 2 ++ 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/fs/hpfs/hpfs_fn.h b/fs/hpfs/hpfs_fn.h index ab2e7cc2ff33..1cca83218fb5 100644 --- a/fs/hpfs/hpfs_fn.h +++ b/fs/hpfs/hpfs_fn.h @@ -334,7 +334,7 @@ long hpfs_ioctl(struct file *file, unsigned cmd, unsigned long arg); * local time (HPFS) to GMT (Unix) */ -static inline time64_t local_to_gmt(struct super_block *s, time32_t t) +static inline time64_t local_to_gmt(struct super_block *s, time64_t t) { extern struct timezone sys_tz; return t + sys_tz.tz_minuteswest * 60 + hpfs_sb(s)->sb_timeshift; @@ -343,9 +343,7 @@ static inline time64_t local_to_gmt(struct super_block *s, time32_t t) static inline time32_t gmt_to_local(struct super_block *s, time64_t t) { extern struct timezone sys_tz; - t = t - sys_tz.tz_minuteswest * 60 - hpfs_sb(s)->sb_timeshift; - - return clamp_t(time64_t, t, 0, U32_MAX); + return t - sys_tz.tz_minuteswest * 60 - hpfs_sb(s)->sb_timeshift; } static inline time32_t local_get_seconds(struct super_block *s) diff --git a/fs/hpfs/super.c b/fs/hpfs/super.c index 9db6d84f0d62..0a677a9aaf34 100644 --- a/fs/hpfs/super.c +++ b/fs/hpfs/super.c @@ -614,6 +614,8 @@ static int hpfs_fill_super(struct super_block *s, void *options, int silent) s->s_magic = HPFS_SUPER_MAGIC; s->s_op = _sops; s->s_d_op = _dentry_operations; + s->s_time_min = local_to_gmt(s, 0); + s->s_time_max = local_to_gmt(s, U32_MAX); sbi->sb_root = le32_to_cpu(superblock->root); sbi->sb_fs_size = le32_to_cpu(superblock->n_sectors); -- 2.17.1
[PATCH 13/20] fs: affs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Also fix timestamp calculation to avoid overflow while converting from days to seconds. Signed-off-by: Deepa Dinamani Cc: dste...@suse.com --- fs/affs/amigaffs.c | 2 +- fs/affs/amigaffs.h | 3 +++ fs/affs/inode.c| 4 ++-- fs/affs/super.c| 4 4 files changed, 10 insertions(+), 3 deletions(-) diff --git a/fs/affs/amigaffs.c b/fs/affs/amigaffs.c index 14a6c1b90c9f..f708c45d5f66 100644 --- a/fs/affs/amigaffs.c +++ b/fs/affs/amigaffs.c @@ -375,7 +375,7 @@ affs_secs_to_datestamp(time64_t secs, struct affs_date *ds) u32 minute; s32 rem; - secs -= sys_tz.tz_minuteswest * 60 + ((8 * 365 + 2) * 24 * 60 * 60); + secs -= sys_tz.tz_minuteswest * 60 + AFFS_EPOCH_DELTA; if (secs < 0) secs = 0; days= div_s64_rem(secs, 86400, ); diff --git a/fs/affs/amigaffs.h b/fs/affs/amigaffs.h index f9bef9056659..81fb396d4dfa 100644 --- a/fs/affs/amigaffs.h +++ b/fs/affs/amigaffs.h @@ -32,6 +32,9 @@ #define AFFS_ROOT_BMAPS25 +/* Seconds since Amiga epoch of 1978/01/01 to UNIX */ +#define AFFS_EPOCH_DELTA ((8 * 365 + 2) * 86400LL) + struct affs_date { __be32 days; __be32 mins; diff --git a/fs/affs/inode.c b/fs/affs/inode.c index 73598bff8506..a346cf7659f1 100644 --- a/fs/affs/inode.c +++ b/fs/affs/inode.c @@ -150,10 +150,10 @@ struct inode *affs_iget(struct super_block *sb, unsigned long ino) } inode->i_mtime.tv_sec = inode->i_atime.tv_sec = inode->i_ctime.tv_sec - = (be32_to_cpu(tail->change.days) * (24 * 60 * 60) + + = (be32_to_cpu(tail->change.days) * 86400LL + be32_to_cpu(tail->change.mins) * 60 + be32_to_cpu(tail->change.ticks) / 50 + -((8 * 365 + 2) * 24 * 60 * 60)) + +AFFS_EPOCH_DELTA) + sys_tz.tz_minuteswest * 60; inode->i_mtime.tv_nsec = inode->i_ctime.tv_nsec = inode->i_atime.tv_nsec = 0; affs_brelse(bh); diff --git a/fs/affs/super.c b/fs/affs/super.c index e7d036efbaa1..cc463ae47c12 100644 --- a/fs/affs/super.c +++ b/fs/affs/super.c @@ -355,6 +355,10 @@ static int affs_fill_super(struct super_block *sb, void *data, int silent) sb->s_op= _sops; sb->s_flags |= SB_NODIRATIME; + sb->s_time_gran = NSEC_PER_SEC; + sb->s_time_min = sys_tz.tz_minuteswest * 60 + AFFS_EPOCH_DELTA; + sb->s_time_max = 86400LL * U32_MAX + 86400 + sb->s_time_min; + sbi = kzalloc(sizeof(struct affs_sb_info), GFP_KERNEL); if (!sbi) return -ENOMEM; -- 2.17.1
[PATCH 07/20] 9p: Fill min and max timestamps in sb
struct p9_wstat and struct p9_stat_dotl indicate that the wire transport uses u32 and u64 fields for timestamps. Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Note that the upper bound for V9FS_PROTO_2000L is retained as S64_MAX. This is because that is the upper bound supported by vfs. Signed-off-by: Deepa Dinamani Cc: eri...@gmail.com Cc: lu...@ionkov.net Cc: asmad...@codewreck.org Cc: v9fs-develo...@lists.sourceforge.net --- fs/9p/vfs_super.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c index 08112fbcaece..ca243e658d71 100644 --- a/fs/9p/vfs_super.c +++ b/fs/9p/vfs_super.c @@ -69,8 +69,12 @@ v9fs_fill_super(struct super_block *sb, struct v9fs_session_info *v9ses, if (v9fs_proto_dotl(v9ses)) { sb->s_op = _super_ops_dotl; sb->s_xattr = v9fs_xattr_handlers; - } else + } else { sb->s_op = _super_ops; + sb->s_time_max = U32_MAX; + } + + sb->s_time_min = 0; ret = super_setup_bdi(sb); if (ret) -- 2.17.1
[PATCH 19/20] pstore: fs superblock limits
Also update the gran since pstore has microsecond granularity. Signed-off-by: Deepa Dinamani Cc: an...@enomsg.org Cc: ccr...@android.com Cc: keesc...@chromium.org Cc: tony.l...@intel.com --- fs/pstore/inode.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c index 89a80b568a17..ee752f9fda57 100644 --- a/fs/pstore/inode.c +++ b/fs/pstore/inode.c @@ -388,7 +388,9 @@ static int pstore_fill_super(struct super_block *sb, void *data, int silent) sb->s_blocksize_bits= PAGE_SHIFT; sb->s_magic = PSTOREFS_MAGIC; sb->s_op= _ops; - sb->s_time_gran = 1; + sb->s_time_gran = NSEC_PER_USEC; + sb->s_time_min = S64_MIN; + sb->s_time_max = S64_MAX; parse_options(data); -- 2.17.1
[PATCH 11/20] fs: cifs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Also fixed cnvrtDosUnixTm calculations to avoid int overflow while computing maximum date. References: http://cifs.com/ https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-cifs/d416ff7c-c536-406e-a951-4f04b2fd1d2b Signed-off-by: Deepa Dinamani Cc: sfre...@samba.org Cc: linux-c...@vger.kernel.org --- fs/cifs/cifsfs.c | 22 ++ fs/cifs/netmisc.c | 14 +++--- 2 files changed, 29 insertions(+), 7 deletions(-) diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c index 81a16b4e1b48..94a52a63b9ea 100644 --- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -56,6 +56,15 @@ #include "dfs_cache.h" #endif +/* + * DOS dates from 1980/1/1 through 2107/12/31 + * Protocol specifications indicate the range should be to 119, which + * limits maximum year to 2099. But this range has not been checked. + */ +#define SMB_DATE_MAX (127<<9 | 12<<5 | 31) +#define SMB_DATE_MIN (0<<9 | 1<<5 | 1) +#define SMB_TIME_MAX (23<<11 | 59<<5 | 29) + int cifsFYI = 0; bool traceSMB; bool enable_oplocks = true; @@ -142,6 +151,7 @@ cifs_read_super(struct super_block *sb) struct inode *inode; struct cifs_sb_info *cifs_sb; struct cifs_tcon *tcon; + struct timespec64 ts; int rc = 0; cifs_sb = CIFS_SB(sb); @@ -161,6 +171,18 @@ cifs_read_super(struct super_block *sb) /* BB FIXME fix time_gran to be larger for LANMAN sessions */ sb->s_time_gran = 100; + if (tcon->unix_ext) { + ts = cifs_NTtimeToUnix(0); + sb->s_time_min = ts.tv_sec; + ts = cifs_NTtimeToUnix(cpu_to_le64(S64_MAX)); + sb->s_time_max = ts.tv_sec; + } else { + ts = cnvrtDosUnixTm(cpu_to_le16(SMB_DATE_MIN), 0, 0); + sb->s_time_min = ts.tv_sec; + ts = cnvrtDosUnixTm(cpu_to_le16(SMB_DATE_MAX), cpu_to_le16(SMB_TIME_MAX), 0); + sb->s_time_max = ts.tv_sec; + } + sb->s_magic = CIFS_MAGIC_NUMBER; sb->s_op = _super_ops; sb->s_xattr = cifs_xattr_handlers; diff --git a/fs/cifs/netmisc.c b/fs/cifs/netmisc.c index ed92958e842d..49c17ee18254 100644 --- a/fs/cifs/netmisc.c +++ b/fs/cifs/netmisc.c @@ -949,8 +949,8 @@ static const int total_days_of_prev_months[] = { struct timespec64 cnvrtDosUnixTm(__le16 le_date, __le16 le_time, int offset) { struct timespec64 ts; - time64_t sec; - int min, days, month, year; + time64_t sec, days; + int min, day, month, year; u16 date = le16_to_cpu(le_date); u16 time = le16_to_cpu(le_time); SMB_TIME *st = (SMB_TIME *) @@ -966,15 +966,15 @@ struct timespec64 cnvrtDosUnixTm(__le16 le_date, __le16 le_time, int offset) sec += 60 * 60 * st->Hours; if (st->Hours > 24) cifs_dbg(VFS, "illegal hours %d\n", st->Hours); - days = sd->Day; + day = sd->Day; month = sd->Month; - if (days < 1 || days > 31 || month < 1 || month > 12) { - cifs_dbg(VFS, "illegal date, month %d day: %d\n", month, days); - days = clamp(days, 1, 31); + if (day < 1 || day > 31 || month < 1 || month > 12) { + cifs_dbg(VFS, "illegal date, month %d day: %d\n", month, day); + day = clamp(day, 1, 31); month = clamp(month, 1, 12); } month -= 1; - days += total_days_of_prev_months[month]; + days = day + total_days_of_prev_months[month]; days += 3652; /* account for difference in days between 1980 and 1970 */ year = sd->Year; days += year * 365; -- 2.17.1
[PATCH 20/20] isofs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Reference: http://www.ecma-international.org/publications/standards/Ecma-119.htm Signed-off-by: Deepa Dinamani --- fs/isofs/inode.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/fs/isofs/inode.c b/fs/isofs/inode.c index 9e30d8703735..62c0462dc89f 100644 --- a/fs/isofs/inode.c +++ b/fs/isofs/inode.c @@ -30,6 +30,9 @@ #include "isofs.h" #include "zisofs.h" +/* max tz offset is 13 hours */ +#define MAX_TZ_OFFSET (52*15*60) + #define BEQUIET static int isofs_hashi(const struct dentry *parent, struct qstr *qstr); @@ -801,6 +804,10 @@ static int isofs_fill_super(struct super_block *s, void *data, int silent) */ s->s_maxbytes = 0x800LL; + /* ECMA-119 timestamp from 1900/1/1 with tz offset */ + s->s_time_min = mktime64(1900, 1, 1, 0, 0, 0) - MAX_TZ_OFFSET; + s->s_time_max = mktime64(U8_MAX+1900, 12, 31, 23, 59, 59) + MAX_TZ_OFFSET; + /* Set this for reference. Its not currently used except on write which we don't have .. */ -- 2.17.1
[PATCH 08/20] adfs: Fill in max and min timestamps in sb
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Note that the min timestamp is assumed to be 01 Jan 1970 00:00:00 (Unix epoch). This is consistent with the way we convert timestamps in adfs_adfs2unix_time(). Signed-off-by: Deepa Dinamani --- fs/adfs/adfs.h | 13 + fs/adfs/inode.c | 8 ++-- fs/adfs/super.c | 2 ++ 3 files changed, 17 insertions(+), 6 deletions(-) diff --git a/fs/adfs/adfs.h b/fs/adfs/adfs.h index 804c6a77c5db..781b1c3817e0 100644 --- a/fs/adfs/adfs.h +++ b/fs/adfs/adfs.h @@ -2,6 +2,19 @@ #include #include +/* + * 01 Jan 1970 00:00:00 (Unix epoch) as seconds since + * 01 Jan 1900 00:00:00 (RISC OS epoch) + */ +#define RISC_OS_EPOCH_DELTA 2208988800LL + +/* + * Convert 40 bit centi seconds to seconds + * since 01 Jan 1900 00:00:00 (RISC OS epoch) + * The result is 2248-06-03 06:57:57 GMT + */ +#define ADFS_MAX_TIMESTAMP ((0xFFLL / 100) - RISC_OS_EPOCH_DELTA) + /* Internal data structures for ADFS */ #define ADFS_FREE_FRAG 0 diff --git a/fs/adfs/inode.c b/fs/adfs/inode.c index 66621e96f9af..3f75cefc0380 100644 --- a/fs/adfs/inode.c +++ b/fs/adfs/inode.c @@ -170,11 +170,7 @@ static void adfs_adfs2unix_time(struct timespec64 *tv, struct inode *inode) { unsigned int high, low; - /* 01 Jan 1970 00:00:00 (Unix epoch) as nanoseconds since -* 01 Jan 1900 00:00:00 (RISC OS epoch) -*/ - static const s64 nsec_unix_epoch_diff_risc_os_epoch = - 22089888000LL; + static const s64 nsec_unix_epoch_diff_risc_os_epoch = RISC_OS_EPOCH_DELTA * NSEC_PER_SEC; s64 nsec; if (ADFS_I(inode)->stamped == 0) @@ -219,7 +215,7 @@ adfs_unix2adfs_time(struct inode *inode, unsigned int secs) if (ADFS_I(inode)->stamped) { /* convert 32-bit seconds to 40-bit centi-seconds */ low = (secs & 255) * 100; - high = (secs / 256) * 100 + (low >> 8) + 0x336e996a; + high = (secs / 256) * 100 + (low >> 8) + (RISC_OS_EPOCH_DELTA*100/256); ADFS_I(inode)->loadaddr = (high >> 24) | (ADFS_I(inode)->loadaddr & ~0xff); diff --git a/fs/adfs/super.c b/fs/adfs/super.c index 2a83655c408f..0a0854ef9e3c 100644 --- a/fs/adfs/super.c +++ b/fs/adfs/super.c @@ -449,6 +449,8 @@ static int adfs_fill_super(struct super_block *sb, void *data, int silent) asb->s_size = adfs_discsize(dr, sb->s_blocksize_bits); asb->s_version = dr->format_version; asb->s_log2sharesize= dr->log2sharesize; + sb->s_time_min = 0; + sb->s_time_max = ADFS_MAX_TIMESTAMP; asb->s_map = adfs_read_map(sb, dr); if (IS_ERR(asb->s_map)) { -- 2.17.1
[PATCH 04/20] mount: Add mount warning for impending timestamp expiry
The warning reuses the uptime max of 30 years used by the setitimeofday(). Note that the warning is only added for new filesystem mounts through the mount syscall. Automounts do not have the same warning. Signed-off-by: Deepa Dinamani --- fs/namespace.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/fs/namespace.c b/fs/namespace.c index b26778bdc236..5314fac8035e 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2739,6 +2739,17 @@ static int do_new_mount_fc(struct fs_context *fc, struct path *mountpoint, error = do_add_mount(real_mount(mnt), mountpoint, mnt_flags); if (error < 0) mntput(mnt); + + if (!error && sb->s_time_max && + (ktime_get_real_seconds() + TIME_UPTIME_SEC_MAX > sb->s_time_max)) { + char *buf = (char *)__get_free_page(GFP_KERNEL); + char *mntpath = buf ? d_path(mountpoint, buf, PAGE_SIZE) : ERR_PTR(-ENOMEM); + + pr_warn("Mounted %s file system at %s supports timestamps until 0x%llx\n", + fc->fs_type->name, mntpath, (unsigned long long)sb->s_time_max); + free_page((unsigned long)buf); + } + return error; } -- 2.17.1
[PATCH 09/20] ext4: Initialize timestamps limits
ext4 has different overflow limits for max filesystem timestamps based on the extra bytes available. The timestamp limits are calculated according to the encoding table in a4dad1ae24f85i(ext4: Fix handling of extended tv_sec): * extra msb of adjust for signed * epoch 32-bit 32-bit tv_sec to * bits timedecoded 64-bit tv_sec 64-bit tv_sec valid time range * 0 01-0x8000..-0x0001 0x0 1901-12-13..1969-12-31 * 0 000x0..0x07fff 0x0 1970-01-01..2038-01-19 * 0 110x08000..0x0 0x1 2038-01-19..2106-02-07 * 0 100x1..0x17fff 0x1 2106-02-07..2174-02-25 * 1 010x18000..0x1 0x2 2174-02-25..2242-03-16 * 1 000x2..0x27fff 0x2 2242-03-16..2310-04-04 * 1 110x28000..0x2 0x3 2310-04-04..2378-04-22 * 1 100x3..0x37fff 0x3 2378-04-22..2446-05-10 Note that the time limits are not correct for deletion times. Signed-off-by: Deepa Dinamani Reviewed-by: Andreas Dilger Cc: ty...@mit.edu Cc: adilger.ker...@dilger.ca Cc: linux-e...@vger.kernel.org --- fs/ext4/ext4.h | 4 fs/ext4/super.c | 17 +++-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 1cb67859e051..3f13cf12ae7f 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1631,6 +1631,10 @@ static inline void ext4_clear_state_flags(struct ext4_inode_info *ei) #define EXT4_GOOD_OLD_INODE_SIZE 128 +#define EXT4_EXTRA_TIMESTAMP_MAX (((s64)1 << 34) - 1 + S32_MIN) +#define EXT4_NON_EXTRA_TIMESTAMP_MAX S32_MAX +#define EXT4_TIMESTAMP_MIN S32_MIN + /* * Feature set definitions */ diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 4079605d437a..3ea2d60f33aa 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4035,8 +4035,21 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) sbi->s_inode_size); goto failed_mount; } - if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE) - sb->s_time_gran = 1 << (EXT4_EPOCH_BITS - 2); + /* +* i_atime_extra is the last extra field available for [acm]times in +* struct ext4_inode. Checking for that field should suffice to ensure +* we have extra space for all three. +*/ + if (sbi->s_inode_size >= offsetof(struct ext4_inode, i_atime_extra) + + sizeof(((struct ext4_inode *)0)->i_atime_extra)) { + sb->s_time_gran = 1; + sb->s_time_max = EXT4_EXTRA_TIMESTAMP_MAX; + } else { + sb->s_time_gran = NSEC_PER_SEC; + sb->s_time_max = EXT4_NON_EXTRA_TIMESTAMP_MAX; + } + + sb->s_time_min = EXT4_TIMESTAMP_MIN; } sbi->s_desc_size = le16_to_cpu(es->s_desc_size); -- 2.17.1
[PATCH 05/20] utimes: Clamp the timestamps before update
POSIX is ambiguous on the behavior of timestamps for futimens, utimensat and utimes. Whether to return an error or silently clamp a timestamp beyond the range supported by the underlying filesystems is not clear. POSIX.1 section for futimens, utimensat and utimes says: (http://pubs.opengroup.org/onlinepubs/9699919799/functions/futimens.html) The file's relevant timestamp shall be set to the greatest value supported by the file system that is not greater than the specified time. If the tv_nsec field of a timespec structure has the special value UTIME_NOW, the file's relevant timestamp shall be set to the greatest value supported by the file system that is not greater than the current time. [EINVAL] A new file timestamp would be a value whose tv_sec component is not a value supported by the file system. The patch chooses to clamp the timestamps according to the filesystem timestamp ranges and does not return an error. This is in line with the behavior of utime syscall also since the POSIX page(http://pubs.opengroup.org/onlinepubs/009695399/functions/utime.html) for utime does not mention returning an error or clamping like above. Same for utimes http://pubs.opengroup.org/onlinepubs/009695399/functions/utimes.html Signed-off-by: Deepa Dinamani --- fs/utimes.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/fs/utimes.c b/fs/utimes.c index 350c9c16ace1..4c1a2ce90bbc 100644 --- a/fs/utimes.c +++ b/fs/utimes.c @@ -21,6 +21,7 @@ static int utimes_common(const struct path *path, struct timespec64 *times) int error; struct iattr newattrs; struct inode *inode = path->dentry->d_inode; + struct super_block *sb = inode->i_sb; struct inode *delegated_inode = NULL; error = mnt_want_write(path->mnt); @@ -36,16 +37,24 @@ static int utimes_common(const struct path *path, struct timespec64 *times) if (times[0].tv_nsec == UTIME_OMIT) newattrs.ia_valid &= ~ATTR_ATIME; else if (times[0].tv_nsec != UTIME_NOW) { - newattrs.ia_atime.tv_sec = times[0].tv_sec; - newattrs.ia_atime.tv_nsec = times[0].tv_nsec; + newattrs.ia_atime.tv_sec = + clamp(times[0].tv_sec, sb->s_time_min, sb->s_time_max); + if (times[0].tv_sec == sb->s_time_max || times[0].tv_sec == sb->s_time_min) + newattrs.ia_atime.tv_nsec = 0; + else + newattrs.ia_atime.tv_nsec = times[0].tv_nsec; newattrs.ia_valid |= ATTR_ATIME_SET; } if (times[1].tv_nsec == UTIME_OMIT) newattrs.ia_valid &= ~ATTR_MTIME; else if (times[1].tv_nsec != UTIME_NOW) { - newattrs.ia_mtime.tv_sec = times[1].tv_sec; - newattrs.ia_mtime.tv_nsec = times[1].tv_nsec; + newattrs.ia_mtime.tv_sec = + clamp(times[1].tv_sec, sb->s_time_min, sb->s_time_max); + if (times[1].tv_sec >= sb->s_time_max || times[1].tv_sec == sb->s_time_min) + newattrs.ia_mtime.tv_nsec = 0; + else + newattrs.ia_mtime.tv_nsec = times[1].tv_nsec; newattrs.ia_valid |= ATTR_MTIME_SET; } /* -- 2.17.1
[PATCH 10/20] fs: nfs: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. The time formats for various verious is detailed in the RFCs as below: https://tools.ietf.org/html/rfc7862(time metadata) https://tools.ietf.org/html/rfc7530: nfstime4 struct nfstime4 { int64_t seconds; uint32_tnseconds; }; https://tools.ietf.org/html/rfc1094 struct timeval { unsigned int seconds; unsigned int useconds; }; https://tools.ietf.org/html/rfc1813 struct nfstime3 { uint32 seconds; uint32 nseconds; }; Use the limits as per the RFC. Signed-off-by: Deepa Dinamani Cc: trond.mykleb...@hammerspace.com Cc: anna.schuma...@netapp.com Cc: linux-...@vger.kernel.org --- fs/nfs/super.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/fs/nfs/super.c b/fs/nfs/super.c index f88ddac2dcdf..54eb5a47f180 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -2360,6 +2360,15 @@ void nfs_fill_super(struct super_block *sb, struct nfs_mount_info *mount_info) sb->s_flags |= SB_POSIXACL; sb->s_time_gran = 1; sb->s_export_op = _export_ops; + } else + sb->s_time_gran = 1000; + + if (server->nfs_client->rpc_ops->version != 4) { + sb->s_time_min = 0; + sb->s_time_max = U32_MAX; + } else { + sb->s_time_min = S64_MIN; + sb->s_time_max = S64_MAX; } nfs_initialise_sb(sb); @@ -2380,7 +2389,6 @@ static void nfs_clone_super(struct super_block *sb, sb->s_maxbytes = old_sb->s_maxbytes; sb->s_xattr = old_sb->s_xattr; sb->s_op = old_sb->s_op; - sb->s_time_gran = 1; sb->s_export_op = old_sb->s_export_op; if (server->nfs_client->rpc_ops->version != 2) { @@ -2388,6 +2396,16 @@ static void nfs_clone_super(struct super_block *sb, * so ourselves when necessary. */ sb->s_flags |= SB_POSIXACL; + sb->s_time_gran = 1; + } else + sb->s_time_gran = 1000; + + if (server->nfs_client->rpc_ops->version != 4) { + sb->s_time_min = 0; + sb->s_time_max = U32_MAX; + } else { + sb->s_time_min = S64_MIN; + sb->s_time_max = S64_MAX; } nfs_initialise_sb(sb); -- 2.17.1
[PATCH 14/20] fs: sysv: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Signed-off-by: Deepa Dinamani Cc: h...@infradead.org --- fs/sysv/super.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/sysv/super.c b/fs/sysv/super.c index d788b1daa7eb..cc8e2ed155c8 100644 --- a/fs/sysv/super.c +++ b/fs/sysv/super.c @@ -368,7 +368,8 @@ static int sysv_fill_super(struct super_block *sb, void *data, int silent) sbi->s_block_base = 0; mutex_init(>s_lock); sb->s_fs_info = sbi; - + sb->s_time_min = 0; + sb->s_time_max = U32_MAX; sb_set_blocksize(sb, BLOCK_SIZE); for (i = 0; i < ARRAY_SIZE(flavours) && !size; i++) { @@ -487,6 +488,8 @@ static int v7_fill_super(struct super_block *sb, void *data, int silent) sbi->s_type = FSTYPE_V7; mutex_init(>s_lock); sb->s_fs_info = sbi; + sb->s_time_min = 0; + sb->s_time_max = U32_MAX; sb_set_blocksize(sb, 512); -- 2.17.1
[PATCH 01/20] vfs: Add file timestamp range support
Add fields to the superblock to track the min and max timestamps supported by filesystems. Initially, when a superblock is allocated, initialize it to the max and min values the fields can hold. Individual filesystems override these to match their actual limits. Pseudo filesystems are assumed to always support the min and max allowable values for the fields. Signed-off-by: Deepa Dinamani --- fs/super.c | 2 ++ include/linux/fs.h | 3 +++ include/linux/time64.h | 2 ++ 3 files changed, 7 insertions(+) diff --git a/fs/super.c b/fs/super.c index 2739f57515f8..e5835c38d336 100644 --- a/fs/super.c +++ b/fs/super.c @@ -257,6 +257,8 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags, s->s_maxbytes = MAX_NON_LFS; s->s_op = _op; s->s_time_gran = 10; + s->s_time_min = TIME64_MIN; + s->s_time_max = TIME64_MAX; s->cleancache_poolid = CLEANCACHE_NO_POOL; s->s_shrink.seeks = DEFAULT_SEEKS; diff --git a/include/linux/fs.h b/include/linux/fs.h index f1a6ca872943..e9d04e4e5628 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1448,6 +1448,9 @@ struct super_block { /* Granularity of c/m/atime in ns (cannot be worse than a second) */ u32 s_time_gran; + /* Time limits for c/m/atime in seconds */ + time64_t s_time_min; + time64_t s_time_max; #ifdef CONFIG_FSNOTIFY __u32 s_fsnotify_mask; struct fsnotify_mark_connector __rcu*s_fsnotify_marks; diff --git a/include/linux/time64.h b/include/linux/time64.h index a620ee610b9f..19125489ae94 100644 --- a/include/linux/time64.h +++ b/include/linux/time64.h @@ -30,6 +30,8 @@ struct itimerspec64 { /* Located here for timespec[64]_valid_strict */ #define TIME64_MAX ((s64)~((u64)1 << 63)) +#define TIME64_MIN (-TIME64_MAX - 1) + #define KTIME_MAX ((s64)~((u64)1 << 63)) #define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC) -- 2.17.1
[PATCH 12/20] fs: fat: Initialize filesystem timestamp ranges
Fill in the appropriate limits to avoid inconsistencies in the vfs cached inode times when timestamps are outside the permitted range. Some FAT variants indicate that the years after 2099 are not supported. Since commit 7decd1cb0305 ("fat: Fix and cleanup timestamp conversion"), we support the full range of years that can be represented, up to 2107. Signed-off-by: Deepa Dinamani Cc: hirof...@mail.parknet.co.jp --- fs/fat/inode.c | 12 1 file changed, 12 insertions(+) diff --git a/fs/fat/inode.c b/fs/fat/inode.c index 05689198f5af..5f04c5c810fb 100644 --- a/fs/fat/inode.c +++ b/fs/fat/inode.c @@ -31,6 +31,11 @@ #define KB_IN_SECTORS 2 +/* DOS dates from 1980/1/1 through 2107/12/31 */ +#define FAT_DATE_MIN (0<<9 | 1<<5 | 1) +#define FAT_DATE_MAX (127<<9 | 12<<5 | 31) +#define FAT_TIME_MAX (23<<11 | 59<<5 | 29) + /* * A deserialized copy of the on-disk structure laid out in struct * fat_boot_sector. @@ -1605,6 +1610,7 @@ int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat, int debug; long error; char buf[50]; + struct timespec64 ts; /* * GFP_KERNEL is ok here, because while we do hold the @@ -1698,6 +1704,12 @@ int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat, sbi->free_clus_valid = 0; sbi->prev_free = FAT_START_ENT; sb->s_maxbytes = 0x; + fat_time_fat2unix(sbi, , 0, cpu_to_le16(FAT_DATE_MIN), 0); + sb->s_time_min = ts.tv_sec; + + fat_time_fat2unix(sbi, , cpu_to_le16(FAT_TIME_MAX), + cpu_to_le16(FAT_DATE_MAX), 0); + sb->s_time_max = ts.tv_sec; if (!sbi->fat_length && bpb.fat32_length) { struct fat_boot_fsinfo *fsinfo; -- 2.17.1
[PATCH 00/20] vfs: Add support for timestamp limits
The series is an update and a more complete version of the previously posted series at https://lore.kernel.org/linux-fsdevel/20180122020426.2988-1-deepa.ker...@gmail.com/ Thanks to Arnd Bergmann for doing a few preliminary reviews. They helped me fix a few issues I had overlooked. The limits (sometimes granularity also) for the filesystems updated here are according to the following table: File system Time type Start year Expiration year Granularity cramfsfixed 0 0 romfs fixed 0 0 pstoreascii seconds (27 digit ascii) S64_MINS64_MAX NSEC_PER_USEC coda INT64 S64_MINS64_MAX 1 omfs 64-bit milliseconds0 U64_MAX/ 1000 NSEC_PER_MSEC befs unsigned 48-bit seconds0 0x alloc_super bfs unsigned 32-bit seconds0 U32_MAX alloc_super efs unsigned 32-bit seconds0 U32_MAX alloc_super ext2 signed 32-bit seconds S32_MINS32_MAX alloc_super ext3 signed 32-bit seconds S32_MINS32_MAX alloc_super ext4 (old)signed 32-bit seconds S32_MINS32_MAX alloc_super ext4 (extra) 34-bit seconds, 30-bit ns S32_MIN0x37fff 1 freevxfs u32 secs/usecs 0 U32_MAX alloc_super jffs2 unsigned 32-bit seconds0 U32_MAX alloc_super jfs unsigned 32-bit seconds/ns 0 U32_MAX 1 minix unsigned 32-bit seconds0 U32_MAX alloc_super orangefs u64 seconds0 U64_MAX alloc_super qnx4 unsigned 32-bit seconds0 U32_MAX alloc_super qnx6 unsigned 32-bit seconds0 U32_MAX alloc_super reiserfs unsigned 32-bit seconds0 U32_MAX alloc_super squashfs unsigned 32-bit seconds0 U32_MAX alloc_super ufs1 signed 32-bit seconds S32_MINS32_MAX NSEC_PER_SEC ufs2 signed 64-bit seconds/u32 ns S64_MINS64_MAX 1 xfs signed 32-bit seconds/ns S32_MINS32_MAX 1 ceph unsigned 32-bit second/ns 0 U32_MAX 1000 sysv unsigned 32-bit seconds0 U32_MAX alloc_super affs u32 day, min, ticks1978 u32_max days NSEC_PER_SEC nfsv2 unsigned 32-bit seconds/ns 0 U32_MAX 1 nfsv3 unsigned 32-bit seconds/ns 0 U32_MAX 1000 nfsv4 u64 seconds/u32 ns S64_MINS64_MAX 1000 isofs u8 year since 1900 (fixable) 1900 2155 alloc_super hpfs unsigned 32-bit seconds1970 2106 alloc_super fat 7-bit years, 2s resolution 1980 2107 cifs (smb)7-bit years1980 2107 cifs (modern) 64-bit 100ns since 16011601 30828 adfs 40-bit cs since 1900 1900 2248 9p (9P2000) unsigned 32-bit seconds1970 2106 9p (9P2000.L) signed 64-bit seconds, ns 1970 S64_MAX Granularity column filled in by the alloc_super() in the above table indicates that the granularity is NSEC_PER_SEC. Note that anything not mentioned above still has the default limits S64_MIN..S64_MAX. The patches in the series are as structured below: 1. Add vfs support to maintain the limits per filesystem. 2. Add a new timestamp_truncate() api for clamping timestamps according to the filesystem limits. 3. Add a warning for mount syscall to indicate the impending expiry of timestamps. 4. Modify utimes to clamp the timestamps. 5. Fill in limits for filesystems. An updated version of the test for checking file system timestamp limits has been posted at https://www.spinics.net/lists/fstests/msg12262.html Changes from previous version: * No change in mount behavior because of expiry of timestamps. * Included limits for more filesystems. Deepa Dinamani (20): vfs: Add file timestamp range support vfs: Add timestamp_truncate() api timestamp_truncate: Replace users of timespec64_trunc mount: Add mount warning for impending timestamp expiry utimes: Clamp the timestamps before update fs: Fill in max and min timestamps in superblock 9p: Fill min and max timestamps in sb adfs: Fill in max and min timestamps in sb ext4: Initialize timestamps limits fs: nfs: Initialize filesystem timestamp ranges fs: cifs: Initialize filesystem timestamp ranges fs: fat: Initialize filesystem timestamp ranges fs: affs: Initialize filesystem timestamp ranges fs: sysv: Initialize filesystem timestamp ranges fs: ceph
[PATCH 02/20] vfs: Add timestamp_truncate() api
timespec_trunc() function is used to truncate a filesystem timestamp to the right granularity. But, the function does not clamp tv_sec part of the timestamps according to the filesystem timestamp limits. The replacement api: timestamp_truncate() also alters the signature of the function to accommodate filesystem timestamp clamping according to flesystem limits. Note that the tv_nsec part is set to 0 if tv_sec is not within the range supported for the filesystem. Signed-off-by: Deepa Dinamani --- fs/inode.c | 33 - include/linux/fs.h | 2 ++ 2 files changed, 34 insertions(+), 1 deletion(-) diff --git a/fs/inode.c b/fs/inode.c index 5f5431ec3d62..0fb1f0fb296a 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -2166,6 +2166,37 @@ struct timespec64 timespec64_trunc(struct timespec64 t, unsigned gran) } EXPORT_SYMBOL(timespec64_trunc); +/** + * timestamp_truncate - Truncate timespec to a granularity + * @t: Timespec + * @inode: inode being updated + * + * Truncate a timespec to the granularity supported by the fs + * containing the inode. Always rounds down. gran must + * not be 0 nor greater than a second (NSEC_PER_SEC, or 10^9 ns). + */ +struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode) +{ + struct super_block *sb = inode->i_sb; + unsigned int gran = sb->s_time_gran; + + t.tv_sec = clamp(t.tv_sec, sb->s_time_min, sb->s_time_max); + if (unlikely(t.tv_sec == sb->s_time_max || t.tv_sec == sb->s_time_min)) + t.tv_nsec = 0; + + /* Avoid division in the common cases 1 ns and 1 s. */ + if (gran == 1) + ; /* nothing */ + else if (gran == NSEC_PER_SEC) + t.tv_nsec = 0; + else if (gran > 1 && gran < NSEC_PER_SEC) + t.tv_nsec -= t.tv_nsec % gran; + else + WARN(1, "invalid file time granularity: %u", gran); + return t; +} +EXPORT_SYMBOL(timestamp_truncate); + /** * current_time - Return FS time * @inode: inode. @@ -2187,6 +2218,6 @@ struct timespec64 current_time(struct inode *inode) return now; } - return timespec64_trunc(now, inode->i_sb->s_time_gran); + return timestamp_truncate(now, inode); } EXPORT_SYMBOL(current_time); diff --git a/include/linux/fs.h b/include/linux/fs.h index e9d04e4e5628..fdfe51d096fa 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -726,6 +726,8 @@ struct inode { void*i_private; /* fs or device private pointer */ } __randomize_layout; +struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode); + static inline unsigned int i_blocksize(const struct inode *node) { return (1 << node->i_blkbits); -- 2.17.1
Re: [PATCH] signal: remove the wrong signal_pending() check in restore_user_sigmask()
> On Tue, Jun 4, 2019 at 3:41 PM Oleg Nesterov wrote: > > > > This is the minimal fix for stable, I'll send cleanups later. > > > > The commit 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add > > restore_user_sigmask()") introduced the visible change which breaks > > user-space: a signal temporary unblocked by set_user_sigmask() can > > be delivered even if the caller returns success or timeout. > > > > Change restore_user_sigmask() to accept the additional "interrupted" > > argument which should be used instead of signal_pending() check, and > > update the callers. > > > > Reported-by: Eric Wong > > Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add > > restore_user_sigmask()") > > cc: sta...@vger.kernel.org (v5.0+) > > Signed-off-by: Oleg Nesterov > Acked-by: Deepa Dinamani The original fix posted: https://lore.kernel.org/patchwork/patch/1077355/ would also have been a correct fix for this problem. But, given the cleanups that are in the pipeline, this is a better fix. -Deepa
Re: pselect/etc semantics
On Thu, May 30, 2019 at 8:48 AM Deepa Dinamani wrote: > > > On May 30, 2019, at 8:38 AM, Eric W. Biederman > > wrote: > > > > ebied...@xmission.com (Eric W. Biederman) writes: > > > >> Which means I believe we have a semantically valid change in behavior > >> that is causing a regression. > > > > I haven't made a survey of all of the functions yet but > > fucntions return -ENORESTARTNOHAND will never return -EINTR and are > > immune from this problem. > > > > AKA pselect is fine. While epoll_pwait can be affected. > > This was my understanding as well. I think I was misremembered here. I had noted this before: https://lore.kernel.org/linux-fsdevel/CABeXuvq7gCV2qPOo+Q8jvNyRaTvhkRLRbnL_oJ-AuK7Sp=p...@mail.gmail.com/ "sys_io_pgetevents() does not seem to have this problem as we are still checking signal_pending() here. sys_pselect6() seems to have a similar problem. The changes to sys_pselect6() also impact sys_select() as the changes are in the common code path." This was the code replaced for io_pgetevents by 854a6ed56839a40f6b is as below. No matter what events completed, there was signal_pending() check after the return from do_io_getevents(). --- a/fs/aio.c +++ b/fs/aio.c @@ -2110,18 +2110,9 @@ SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - if (signal_pending(current)) { - if (ksig.sigmask) { - current->saved_sigmask = sigsaved; - set_restore_sigmask(); - } - - if (!ret) - ret = -ERESTARTNOHAND; - } else { - if (ksig.sigmask) - sigprocmask(SIG_SETMASK, , NULL); - } + restore_user_sigmask(ksig.sigmask, ); + if (signal_pending(current) && !ret) + ret = -ERESTARTNOHAND; Can I ask a simple question for my understanding? man page for epoll_pwait says EINTR The call was interrupted by a signal handler before either any of the requested events occurred or the timeout expired; see signal(7). But it is not clear to me if we can figure out(without race) the chronological order if one of the requested events are completed or a signal came first. Is this a correct exectation? Also like pointed out above, this behavior is not consistent for all such syscalls(io_pgetevents). Was this also by design? -Deepa
Re: pselect/etc semantics
> On May 30, 2019, at 8:38 AM, Eric W. Biederman wrote: > > ebied...@xmission.com (Eric W. Biederman) writes: > >> Which means I believe we have a semantically valid change in behavior >> that is causing a regression. > > I haven't made a survey of all of the functions yet but > fucntions return -ENORESTARTNOHAND will never return -EINTR and are > immune from this problem. > > AKA pselect is fine. While epoll_pwait can be affected. This was my understanding as well. > Has anyone contacted Omar Kilani to see if that is his issue? > https://lore.kernel.org/lkml/CA+8F9hicnF=kvjXPZFQy=pa2hjus3js+g9vswfhnqqynpmh...@mail.gmail.com/ Omar was cc-ed when this regression was reported. I did cc him on fix and asked if he could try it. We have not heard from him. > So far the only regression report I am seeing is from Eric Wong. > AKA https://lore.kernel.org/lkml/20190501021405.hfvd7ps623liu25i@dcvr/ > Are there any others? How did we get to be talking about more > than just epoll_pwait? This is the only report that I know of. I’m not sure why people started talking about pselect. I was also confused why instead of reviewing the patch and discussing the fix, we ended up talking about how to simplify the code. We have deviated much from what should have been a code review. -Deepa
Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
On Wed, May 29, 2019 at 9:57 AM Oleg Nesterov wrote: > > On 05/28, Deepa Dinamani wrote: > > > > I agree that signal handller being called and return value not being > > altered is an issue with other syscalls also. I was just wondering if > > some userspace code assumption would be assuming this. This is not a > > kernel bug. > > > > But, I do not think we have an understanding of what was wrong in > > 854a6ed56839a anymore since you pointed out that my assumption was not > > correct that the signal handler being called without errno being set > > is wrong. > > Deepa, sorry, I simply can't parse the above... most probably because of > my bad English. Ok, All I meant was that I had thought a signal handler being invoked without the error value reflecting it was wrong. That is what I had thought was wrong with 854a6ed56839a. Now, that we agree that signal handler can be invoked without the errno returning success, I thought I did not know what is wrong with 854a6ed56839a anymore. But, you now pointed out that the signals we care about should not be delivered after an event has been ready. This points out to what was wrong with 854a6ed56839a. Thanks. > > One open question: this part of epoll_pwait was already broken before > > 854a6ed56839a. Do you agree? > > > > if (err == -EINTR) { > >memcpy(>saved_sigmask, , > > sizeof(sigsaved)); > > set_restore_sigmask(); > > } else > >set_current_blocked(); > > I do not understand why do you think this part was broken :/ Ok, because of your other statement that the signals the application cares about do not want to know about signals they care about after an event is ready this is also not a problem. > > Or, I could revert the signal_pending() check and provide a fix > > something like below(not a complete patch) > > ... > > > -void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) > > +int restore_user_sigmask(const void __user *usigmask, sigset_t > > *sigsaved, int sig_pending) > > { > > > > if (!usigmask) > >return; > > > > /* > > * When signals are pending, do not restore them here. > > * Restoring sigmask here can lead to delivering signals that the > > above > > * syscalls are intended to block because of the sigmask passed in. > > */ > > + if (sig_pending) { > > current->saved_sigmask = *sigsaved; > > set_restore_sigmask(); > >return; > >} > > > > @@ -2330,7 +2330,8 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct > > epoll_event __user *, events, > > > > error = do_epoll_wait(epfd, events, maxevents, timeout); > > > > - restore_user_sigmask(sigmask, ); > > + signal_detected = restore_user_sigmask(sigmask, , > > error == -EINTR); > > I fail to understand this pseudo-code, sorry. In particular, do not understand > why restore_user_sigmask() needs to return a boolean. That was a remnant from the other patch. Return type needs to be void. > The only thing I _seem to_ understand is the "sig_pending" flag passed by the > caller which replaces the signal_pending() check. Correct. This is what is the main change I was proposing. > Yes, this is what I think we > should do, and this is what I tried to propose from the very beginning in my > 1st email in this thread. This was not clear to me in your first response that you did not want the signal_pending() check in restore_user_sigmask(). : https://lore.kernel.org/lkml/20190522150505.ga4...@redhat.com/ "Ugh. I need to re-check, but at first glance I really dislike this change. I think we can fix the problem _and_ simplify the code. Something like below. The patch is obviously incomplete, it changes only only one caller of set_user_sigmask(), epoll_pwait() to explain what I mean. restore_user_sigmask() should simply die. Although perhaps another helper makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending)." -Deepa
Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())
On Wed, May 29, 2019 at 9:12 AM Oleg Nesterov wrote: > > Al, Linus, Eric, please help. > > The previous discussion was very confusing, we simply can not understand each > other. > > To me everything looks very simple and clear, but perhaps I missed something > obvious? Please correct me. > > I think that the following code is correct > > int interrupted = 0; > > void sigint_handler(int sig) > { > interrupted = 1; > } > > int main(void) > { > sigset_t sigint, empty; > > sigemptyset(); > sigaddset(, SIGINT); > sigprocmask(SIG_BLOCK, , NULL); > > signal(SIGINT, sigint_handler); > > sigemptyset();// so pselect() unblocks SIGINT > > ret = pselect(..., ); > > if (ret >= 0) // sucess or timeout > assert(!interrupted); > > if (interrupted) > assert(ret == -EINTR); > } > > IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this > signal should not be delivered if a ready fd was found or timeout. The signal > handle should only run if ret == -EINTR. I do not think we discussed this part earlier. But, if this is true then this is what is wrong as part of 854a6ed56839a. I missed that before. > (pselect() can be interrupted by any other signal which has a handler. In this > case the handler can be called even if ret >= 0. This is correct, I fail to > understand why some people think this is wrong, and in any case we simply > can't > avoid this). This patch is wrong because I did not know that it was ok to deliver a signal and not set the errno before. I also admitted to this. And proposed another way to revert the patch.: https://lore.kernel.org/lkml/CABeXuvouBzZuNarmNcd9JgZgvonL1N_p21gat=O_x0-1hMx=6...@mail.gmail.com/ -Deepa
Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())
Resending due to inadvertent conversion of prior message to html. On Wed, May 29, 2019 at 9:12 AM Oleg Nesterov wrote: > Al, Linus, Eric, please help. > > The previous discussion was very confusing, we simply can not understand each > other. > > To me everything looks very simple and clear, but perhaps I missed something > obvious? Please correct me. > > I think that the following code is correct > > int interrupted = 0; > > void sigint_handler(int sig) > { > interrupted = 1; > } > > int main(void) > { > sigset_t sigint, empty; > > sigemptyset(); > sigaddset(, SIGINT); > sigprocmask(SIG_BLOCK, , NULL); > > signal(SIGINT, sigint_handler); > > sigemptyset();// so pselect() unblocks SIGINT > > ret = pselect(..., ); > > if (ret >= 0) // sucess or timeout > assert(!interrupted); > > if (interrupted) > assert(ret == -EINTR); > } > > IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this > signal should not be delivered if a ready fd was found or timeout. The signal > handle should only run if ret == -EINTR. > > (pselect() can be interrupted by any other signal which has a handler. In this > case the handler can be called even if ret >= 0. This is correct, I fail to > understand why some people think this is wrong, and in any case we simply > can't > avoid this). > > This was true until 854a6ed56839a ("signal: Add restore_user_sigmask()"), > now this is broken by the signal_pending() check in restore_user_sigmask(). > > This patch > https://lore.kernel.org/lkml/20190522032144.10995-1-deepa.ker...@gmail.com/ > turns 0 into -EINTR if signal_pending(), but I think we should simply restore > the old behaviour and simplify the code. > > See the compile-tested patch at the end. Of course, the new _xxx() helpers > should be renamed somehow. fs/aio.c doesn't look right with or without this > patch, but iiuc this is what it did before 854a6ed56839a. > > Let me show the code with the patch applied. I am using epoll_pwait() as an > example because it looks very simple. > > > static inline void set_restore_sigmask(void) > { > // WARN_ON(!TIF_SIGPENDING) was removed by this patch > current->restore_sigmask = true; > } > > int set_xxx(const sigset_t __user *umask, size_t sigsetsize) > { > sigset_t *kmask; > > if (!umask) > return 0; > if (sigsetsize != sizeof(sigset_t)) > return -EINVAL; > if (copy_from_user(kmask, umask, sizeof(sigset_t))) > return -EFAULT; > > // we can safely modify ->saved_sigmask/restore_sigmask, they has no meaning > // until the syscall returns. > set_restore_sigmask(); > current->saved_sigmask = current->blocked; > set_current_blocked(kmask); > > return 0; > } > > > void update_xxx(bool interrupted) > { > // the main reason for this helper is WARN_ON(!TIF_SIGPENDING) which was > "moved" > // from set_restore_sigmask() above. > if (interrupted) > WARN_ON(!test_thread_flag(TIF_SIGPENDING)); > else > restore_saved_sigmask(); > } > > SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, > events, > int, maxevents, int, timeout, const sigset_t __user > *, sigmask, > size_t, sigsetsize) > { > int error; > > error = set_xxx(sigmask, sigsetsize); > if (error) > return error; > > error = do_epoll_wait(epfd, events, maxevents, timeout); > update_xxx(error == -EINTR); > > return error; > } > > Oleg. > --- > > fs/aio.c | 40 ++- > fs/eventpoll.c | 12 +++ > fs/io_uring.c| 12 +++ > fs/select.c | 40 +++ > include/linux/compat.h | 4 +-- > include/linux/sched/signal.h | 2 -- > include/linux/signal.h | 6 ++-- > kernel/signal.c | 77 > > 8 files changed, 74 insertions(+), 119 deletions(-) > > > diff --git a/fs/aio.c b/fs/aio.c > index 3490d1f..8315bd2 100644 > --- a/fs/aio.c > +++ b/fs/aio.c > @@ -2093,8 +2093,8 @@ SYSCALL_DEFINE6(io_pgetevents, > const struct __aio_sigset __user *, usig) > { > struct __aio_sigset ksig = { NULL, }; > - sigset_tksigmask, sigsaved; > struct timespec64 ts; > +
Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
On Mon, May 27, 2019 at 8:04 AM Oleg Nesterov wrote: > > Deepa, > > it seems that we both are saying the same things again and again, and we > simply can't understand each other. Oleg, I'm sorry for the confusion. Maybe I should point out what I agree with also. I agree that signal handller being called and return value not being altered is an issue with other syscalls also. I was just wondering if some userspace code assumption would be assuming this. This is not a kernel bug. But, I do not think we have an understanding of what was wrong in 854a6ed56839a anymore since you pointed out that my assumption was not correct that the signal handler being called without errno being set is wrong. One open question: this part of epoll_pwait was already broken before 854a6ed56839a. Do you agree? if (err == -EINTR) { memcpy(>saved_sigmask, , sizeof(sigsaved)); set_restore_sigmask(); } else set_current_blocked(); What to do next? We could just see if your optimization patch resolves Eric's issue. Or, I could revert the signal_pending() check and provide a fix something like below(not a complete patch) since mainline has this regression. Eric had tested something like this works also. And, I can continue to look at what was wrong with 854a6ed56839a in the first place. Let me know what you prefer: -void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) +int restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved, int sig_pending) { if (!usigmask) return; /* * When signals are pending, do not restore them here. * Restoring sigmask here can lead to delivering signals that the above * syscalls are intended to block because of the sigmask passed in. */ + if (sig_pending) { current->saved_sigmask = *sigsaved; set_restore_sigmask(); return; } @@ -2330,7 +2330,8 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events, error = do_epoll_wait(epfd, events, maxevents, timeout); - restore_user_sigmask(sigmask, ); + signal_detected = restore_user_sigmask(sigmask, , error == -EINTR); -Deepa
Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
> On May 28, 2019, at 2:12 AM, David Laight wrote: > > From: Deepa Dinamani >> Sent: 24 May 2019 18:02 > ... >> Look at the code before 854a6ed56839a: >> >> /* >> * If we changed the signal mask, we need to restore the original one. >> * In case we've got a signal while waiting, we do not restore the >> * signal mask yet, and we allow do_signal() to deliver the signal on >> * the way back to userspace, before the signal mask is restored. >> */ >> if (sigmask) { >> ### This err has not been changed since ep_poll() >> ### So if there is a signal before this point, but >> err = 0, then we goto else. >> if (err == -EINTR) { >> memcpy(>saved_sigmask, , >> sizeof(sigsaved)); >> set_restore_sigmask(); >> } else >> This is a problem if there is signal >> pending that is sigmask should block. >>### This is the whole reason we have >> current->saved_sigmask? >> set_current_blocked(); >> } > > What happens if all that crap is just deleted (I presume from the > bottom of ep_wait()) ? Hmm, you have to update the saved_sigmask or the sigmask. > I'm guessing that on the way back to userspace signal handlers for > signals enabled in the process's current mask (the one specified > to epoll_pwait) get called. > Then the signal mask is loaded from current->saved_sigmask and > and enabled signal handlers are called again. Who is saving this saved_sigmask that is being restored on the way back? > No special code there that depends on the syscall result, errno > of the syscall number. I didn’t say this has anything to do with errno. -Deepa
Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
On Fri, May 24, 2019 at 9:33 AM Oleg Nesterov wrote: > > On 05/24, Deepa Dinamani wrote: > > > > On Fri, May 24, 2019 at 7:11 AM Oleg Nesterov wrote: > > > > > > On 05/23, Deepa Dinamani wrote: > > > > > > > > Ok, since there has been quite a bit of argument here, I will > > > > backtrack a little bit and maybe it will help us understand what's > > > > happening here. > > > > There are many scenarios being discussed on this thread: > > > > a. State of code before 854a6ed56839a > > > > > > I think everything was correct, > > > > There were 2 things that were wrong: > > > > 1. If an unblocked signal was received, after the ep_poll(), then the > > return status did not indicate that. > > Yes, > > > This is expected behavior > > according to man page. If this is indeed what is expected then the man > > page should note that signal will be delivered in this case and return > > code will still be 0. > > > > "EINTR > > The call was interrupted by a signal handler before either any of the > > requested events occurred or the timeout expired; see signal(7)." > > and what do you think the man page could say? Maybe clarify that a signal handler can be invoked even if the syscall return indicates a success. Maybe a crude userspace application could do something like this: sig_handler() { set global abort = 1 } poll_the_fds() { ret = epoll_pwait() if (ret) return ret if (abort) # but this abort should be ignored if ret was 0. return try_again } > This is obviously possible for any syscall, and we can't avoid this. A signal > can come right after syscall insn completes. The signal handler will be called > but this won't change $rax, user-space can see return code == 0 or anything > else. > > And this doesn't differ from the case when the signal comes before syscall > returns. But, these syscalls are depending on there signals. I would assume for the purpose of these syscalls that the execution is done when we updated the saved_sigmask. We can pick a different point per syscall like ep_poll() also, but then we need to probably make it clear for each such syscall. > > 2. The restoring of the sigmask is done right in the syscall part and > > not while exiting the syscall and if you get a blocked signal here, > > you will deliver this to userspace. > > So I assume that this time you are talking about epoll_pwait() and not > epoll_wait()... Yes. > And I simply can't understand you. But yes, if the original mask doesn't > include > the pending signal it will be delivered while the syscall can return > success/timout > or -EFAULT or anything. > > This is correct, see above. Look at the code before 854a6ed56839a: /* * If we changed the signal mask, we need to restore the original one. * In case we've got a signal while waiting, we do not restore the * signal mask yet, and we allow do_signal() to deliver the signal on * the way back to userspace, before the signal mask is restored. */ if (sigmask) { ### This err has not been changed since ep_poll() ### So if there is a signal before this point, but err = 0, then we goto else. if (err == -EINTR) { memcpy(>saved_sigmask, , sizeof(sigsaved)); set_restore_sigmask(); } else This is a problem if there is signal pending that is sigmask should block. ### This is the whole reason we have current->saved_sigmask? set_current_blocked(); } > > > > b. State after 854a6ed56839a > > > > > > obviously buggy, > > > > Ok, then can you point out what specifically was wrong with > > 854a6ed56839a? > > Cough. If nothing else the lost -EINTR? This was my theory. My basis behind the theory was [1](the issue with return value not being updated) above. And, you are saying this is ok. 854a6ed56839a also has timing differences compared to the original code. So unless we are sure what was uncovered because of 854a6ed56839a, we might just be masking a pre-existing problem by reverting it. So I think we should code review 854a6ed56839a and figure out what is wrong programatically before just reverting it. > > And, not how it could be more simple? Oh, I was not asking here. I was saying let's please discuss what's wrong before simplifying the code. -Deepa
Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
On Fri, May 24, 2019 at 7:11 AM Oleg Nesterov wrote: > > On 05/23, Deepa Dinamani wrote: > > > > Ok, since there has been quite a bit of argument here, I will > > backtrack a little bit and maybe it will help us understand what's > > happening here. > > There are many scenarios being discussed on this thread: > > a. State of code before 854a6ed56839a > > I think everything was correct, There were 2 things that were wrong: 1. If an unblocked signal was received, after the ep_poll(), then the return status did not indicate that. This is expected behavior according to man page. If this is indeed what is expected then the man page should note that signal will be delivered in this case and return code will still be 0. "EINTR The call was interrupted by a signal handler before either any of the requested events occurred or the timeout expired; see signal(7)." 2. The restoring of the sigmask is done right in the syscall part and not while exiting the syscall and if you get a blocked signal here, you will deliver this to userspace. > > b. State after 854a6ed56839a > > obviously buggy, Ok, then can you point out what specifically was wrong with 854a6ed56839a? And, not how it could be more simple? > > c. Proposed fix as per the patchset in question. > > > As per [a] and let's consider the case of epoll_pwait only first for > > simplicity. > > > > As I said before, ep_poll() is what checks for signal_pending() and is > > responsible for setting errno to -EINTR when there is a signal. > > To clarify, if do_epoll_wait() return -EINTR then signal_pending() is true, > right? Yes, the case I'm talking about is when do_epoll_wait() returns 0 and then you get a signal. > > So if a signal is received after ep_poll() and ep_poll() returns > > success, it is never noticed by the syscall during execution. > > What you are saying looks very confusing to me, I will assume that you > meant something like > > - a signal SIG_XXX was blocked before sys_epoll_pwait() was called > > - sys_epoll_pwait(sigmask) unblocks SIG_XXX according to sigmask > > - sys_epoll_pwait() calls do_epoll_wait() which returns success > > - SIG_XXX comes after that and it is "never noticed" > > Yes. Everything is correct. And see my reply to David, SIG_XXX can even > come _before_ sys_epoll_pwait() was called. No, I'm talking about a signal that was not blocked. > > So the question is does the userspace have to know about this signal > > or not. > > If userspace needs to know about SIG_XXX it should not block it, that is all. What should be the return value if a signal is detected after a fd completed? > > What [b] does is to move the signal check closer to the restoration of > > the signal. > > FOR NO REASON, afaics (to simplify, lets forget the problem with the wrong > return value you are trying to fix). As I already pointed out, the restoring of the sigmask is done during the syscall and not while exiting the syscall and if you get a blocked signal here, you will deliver this to userspace. > And even if there were ANY reason to do this, note that (with or without this > fix) the signal_pending() check inside restore_user_sigmask() can NOT help, > simply because SIG_XXX can come right after this check. This I pointed out already that we should probably make this sequence atomic. -Deepa
Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
I think you are misunderstanding what I said. You are taking things out of context. I was saying here what I did was inspired by why the syscall was designed to begin with. The syscall below refers to epoll_wait and not epoll_pwait. -Deepa On Fri, May 24, 2019 at 7:19 AM Oleg Nesterov wrote: > > On 05/23, Deepa Dinamani wrote: > > > > 1. block the signals you don't care about. > > 2. syscall() > > 3. unblock the signals blocked in 1. > > and even this part of your email is very confusing. because in this case > we can never miss a signal. I'd say > > 1. block the signals you don't care about > 2. unblock the signals which should interrupt the syscall below > 3. syscall() > 4. block the signals unblocked in 2. > > Oleg. >
Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
> Just adding a little more clarification, there is an additional change > between [a] and [b]. > As per [a] we would just restore the signal instead of changing the > saved_sigmask and the signal could get delivered right then. [b] > changes this to happen at syscall exit: Rewording above, as there seems to be a few misrepresentations: Just adding a little more clarification, there is an additional change between [a] and [b]. As per [a] we would just restore the signal mask instead of changing the saved_sigmask and the even the blocked signals could get delivered right then. [b] changes the restoration to happen at syscall exit: > void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) > { > > > > /* >* When signals are pending, do not restore them here. >* Restoring sigmask here can lead to delivering signals > that the above >* syscalls are intended to block because of the sigmask passed in. >*/ >if (signal_pending(current)) { >current->saved_sigmask = *sigsaved; >set_restore_sigmask(); >return; > } -Deepa
Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
On Thu, May 23, 2019 at 11:06 AM Deepa Dinamani wrote: > > Ok, since there has been quite a bit of argument here, I will > backtrack a little bit and maybe it will help us understand what's > happening here. > There are many scenarios being discussed on this thread: > a. State of code before 854a6ed56839a > b. State after 854a6ed56839a > c. Proposed fix as per the patchset in question. > > Oleg, I will discuss these first and then we can discuss the > additional changes you suggested. > > Some background on why we have these syscalls that take sigmask as an > argument. This is just for the sake of completeness of the argument. > > These are particularly meant for a scenario(d) such as below: > > 1. block the signals you don't care about. > 2. syscall() > 3. unblock the signals blocked in 1. > > The problem here is that if there is a signal that is not blocked by 1 > and such a signal is delivered between 1 and 2(since they are not > atomic), the syscall in 2 might block forever as it never found out > about the signal. > > As per [a] and let's consider the case of epoll_pwait only first for > simplicity. > > As I said before, ep_poll() is what checks for signal_pending() and is > responsible for setting errno to -EINTR when there is a signal. > > So if a signal is received after ep_poll() and ep_poll() returns > success, it is never noticed by the syscall during execution. > So the question is does the userspace have to know about this signal > or not. From scenario [d] above, I would say it should, even if all > the fd's completed successfully. > This does not happen in [a]. So this is what I said was already broken. > > What [b] does is to move the signal check closer to the restoration of > the signal. This way it is good. So, if there is a signal after > ep_poll() returns success, it is noticed and the signal is delivered > when the syscall exits. But, the syscall error status itself is 0. > > So now [c] is adjusting the return values based on whether extra > signals were detected after ep_poll(). This part was needed even for > [a]. > > Let me know if this clarifies things a bit. Just adding a little more clarification, there is an additional change between [a] and [b]. As per [a] we would just restore the signal instead of changing the saved_sigmask and the signal could get delivered right then. [b] changes this to happen at syscall exit: void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) { /* * When signals are pending, do not restore them here. * Restoring sigmask here can lead to delivering signals that the above * syscalls are intended to block because of the sigmask passed in. */ if (signal_pending(current)) { current->saved_sigmask = *sigsaved; set_restore_sigmask(); return; } -Deepa
Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
Ok, since there has been quite a bit of argument here, I will backtrack a little bit and maybe it will help us understand what's happening here. There are many scenarios being discussed on this thread: a. State of code before 854a6ed56839a b. State after 854a6ed56839a c. Proposed fix as per the patchset in question. Oleg, I will discuss these first and then we can discuss the additional changes you suggested. Some background on why we have these syscalls that take sigmask as an argument. This is just for the sake of completeness of the argument. These are particularly meant for a scenario(d) such as below: 1. block the signals you don't care about. 2. syscall() 3. unblock the signals blocked in 1. The problem here is that if there is a signal that is not blocked by 1 and such a signal is delivered between 1 and 2(since they are not atomic), the syscall in 2 might block forever as it never found out about the signal. As per [a] and let's consider the case of epoll_pwait only first for simplicity. As I said before, ep_poll() is what checks for signal_pending() and is responsible for setting errno to -EINTR when there is a signal. So if a signal is received after ep_poll() and ep_poll() returns success, it is never noticed by the syscall during execution. So the question is does the userspace have to know about this signal or not. From scenario [d] above, I would say it should, even if all the fd's completed successfully. This does not happen in [a]. So this is what I said was already broken. What [b] does is to move the signal check closer to the restoration of the signal. This way it is good. So, if there is a signal after ep_poll() returns success, it is noticed and the signal is delivered when the syscall exits. But, the syscall error status itself is 0. So now [c] is adjusting the return values based on whether extra signals were detected after ep_poll(). This part was needed even for [a]. Let me know if this clarifies things a bit. -Deepa
Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
On Wed, May 22, 2019 at 3:18 PM Chris Down wrote: > > +Cc: linux-mm, since this broke mmots tree and has been applied there > > This patch is missing a definition for signal_detected in io_cqring_wait, > which > breaks the build. This patch does not break the build. The patch the breaks the build was the v2 of this patch since there was an accidental deletion. That's what the v3 fixed. I think v3 got picked up today morning into the mm tree -Deepa
Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
On Wed, May 22, 2019 at 9:14 AM Oleg Nesterov wrote: > > On 05/22, Deepa Dinamani wrote: > > > > -Deepa > > > > > On May 22, 2019, at 8:05 AM, Oleg Nesterov wrote: > > > > > >> On 05/21, Deepa Dinamani wrote: > > >> > > >> Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND, > > >> etc) only when there is no other error. If there is a signal and an error > > >> like EINVAL, the syscalls return -EINVAL rather than the interrupted > > >> error codes. > > > > > > Ugh. I need to re-check, but at first glance I really dislike this change. > > > > > > I think we can fix the problem _and_ simplify the code. Something like > > > below. > > > The patch is obviously incomplete, it changes only only one caller of > > > set_user_sigmask(), epoll_pwait() to explain what I mean. > > > restore_user_sigmask() should simply die. Although perhaps another helper > > > makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending). > > > > restore_user_sigmask() was added because of all the variants of these > > syscalls we added because of y2038 as noted in commit message: > > > > signal: Add restore_user_sigmask() > > > > Refactor the logic to restore the sigmask before the syscall > > returns into an api. > > This is useful for versions of syscalls that pass in the > > sigmask and expect the current->sigmask to be changed during > > the execution and restored after the execution of the syscall. > > > > With the advent of new y2038 syscalls in the subsequent patches, > > we add two more new versions of the syscalls (for pselect, ppoll > > and io_pgetevents) in addition to the existing native and compat > > versions. Adding such an api reduces the logic that would need to > > be replicated otherwise. > > Again, I need to re-check, will continue tomorrow. But so far I am not sure > this helper can actually help. > > > > --- a/fs/eventpoll.c > > > +++ b/fs/eventpoll.c > > > @@ -2318,19 +2318,19 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct > > > epoll_event __user *, events, > > >size_t, sigsetsize) > > > { > > >int error; > > > -sigset_t ksigmask, sigsaved; > > > > > >/* > > > * If the caller wants a certain signal mask to be set during the wait, > > > * we apply it here. > > > */ > > > -error = set_user_sigmask(sigmask, , , sigsetsize); > > > +error = set_user_sigmask(sigmask, sigsetsize); > > >if (error) > > >return error; > > > > > >error = do_epoll_wait(epfd, events, maxevents, timeout); > > > > > > -restore_user_sigmask(sigmask, ); > > > +if (error != -EINTR) > > > > As you address all the other syscalls this condition becomes more and > > more complicated. > > May be. > > > > --- a/include/linux/sched/signal.h > > > +++ b/include/linux/sched/signal.h > > > @@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task); > > > static inline void set_restore_sigmask(void) > > > { > > >set_thread_flag(TIF_RESTORE_SIGMASK); > > > -WARN_ON(!test_thread_flag(TIF_SIGPENDING)); > > > > So you always want do_signal() to be called? > > Why do you think so? No. This is just to avoid the warning, because with the > patch I sent set_restore_sigmask() is called "in advance". > > > You will have to check each architecture's implementation of > > do_signal() to check if that has any side effects. > > I don't think so. Why not? > > Although this is not what the patch is solving. > > Sure. But you know, after I tried to read the changelog, I am not sure > I understand what exactly you are trying to fix. Could you please explain > this part > > The behavior > before 854a6ed56839a was that the signals were dropped after the error > code was decided. This resulted in lost signals but the userspace did > not > notice it > > ? I fail to understand it, sorry. It looks as if the code was already buggy > before > that commit and it could miss a signal or something like this, but I do not > see how. Did you read the explanation pointed to in the commit text? : https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/ Let me know what part you don't understand and I can explain more. It would be better to understand the isssue before we start discussing the fix. -Deepa
Re: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
-Deepa > On May 22, 2019, at 8:05 AM, Oleg Nesterov wrote: > >> On 05/21, Deepa Dinamani wrote: >> >> Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND, >> etc) only when there is no other error. If there is a signal and an error >> like EINVAL, the syscalls return -EINVAL rather than the interrupted >> error codes. > > Ugh. I need to re-check, but at first glance I really dislike this change. > > I think we can fix the problem _and_ simplify the code. Something like below. > The patch is obviously incomplete, it changes only only one caller of > set_user_sigmask(), epoll_pwait() to explain what I mean. > restore_user_sigmask() should simply die. Although perhaps another helper > makes sense to add WARN_ON(test_tsk_restore_sigmask() && !signal_pending). restore_user_sigmask() was added because of all the variants of these syscalls we added because of y2038 as noted in commit message: signal: Add restore_user_sigmask() Refactor the logic to restore the sigmask before the syscall returns into an api. This is useful for versions of syscalls that pass in the sigmask and expect the current->sigmask to be changed during the execution and restored after the execution of the syscall. With the advent of new y2038 syscalls in the subsequent patches, we add two more new versions of the syscalls (for pselect, ppoll and io_pgetevents) in addition to the existing native and compat versions. Adding such an api reduces the logic that would need to be replicated otherwise. > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > index 4a0e98d..85f56e4 100644 > --- a/fs/eventpoll.c > +++ b/fs/eventpoll.c > @@ -2318,19 +2318,19 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct > epoll_event __user *, events, >size_t, sigsetsize) > { >int error; > -sigset_t ksigmask, sigsaved; > >/* > * If the caller wants a certain signal mask to be set during the wait, > * we apply it here. > */ > -error = set_user_sigmask(sigmask, , , sigsetsize); > +error = set_user_sigmask(sigmask, sigsetsize); >if (error) >return error; > >error = do_epoll_wait(epfd, events, maxevents, timeout); > > -restore_user_sigmask(sigmask, ); > +if (error != -EINTR) As you address all the other syscalls this condition becomes more and more complicated. > +restore_saved_sigmask(); > >return error; > } > diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h > index e412c09..1e82ae0 100644 > --- a/include/linux/sched/signal.h > +++ b/include/linux/sched/signal.h > @@ -416,7 +416,6 @@ void task_join_group_stop(struct task_struct *task); > static inline void set_restore_sigmask(void) > { >set_thread_flag(TIF_RESTORE_SIGMASK); > -WARN_ON(!test_thread_flag(TIF_SIGPENDING)); So you always want do_signal() to be called? You will have to check each architecture's implementation of do_signal() to check if that has any side effects. Although this is not what the patch is solving. What we want is to adjust return codes on all these syscalls to user and not drop signals. Please check v2/v3 of the patch. I've updated the commit text to provide more context into what is actually being fixed here. If we really want to simplify, we should rewrite all the internal logic of all the ppoll, epoll_pwait, io_pgetevent syscall internal handling where we set the error code. As new versions of syscalls were added, the internal logic got reworked rather hapazardly. But, as the current issue points out, these are delicate changes. -Deepa > } > > static inline void clear_tsk_restore_sigmask(struct task_struct *tsk) > @@ -447,7 +446,6 @@ static inline bool test_and_clear_restore_sigmask(void) > static inline void set_restore_sigmask(void) > { >current->restore_sigmask = true; > -WARN_ON(!test_thread_flag(TIF_SIGPENDING)); > } > static inline void clear_tsk_restore_sigmask(struct task_struct *tsk) > { > diff --git a/include/linux/signal.h b/include/linux/signal.h > index 9702016..887cea6 100644 > --- a/include/linux/signal.h > +++ b/include/linux/signal.h > @@ -273,8 +273,7 @@ extern int group_send_sig_info(int sig, struct > kernel_siginfo *info, > struct task_struct *p, enum pid_type type); > extern int __group_send_sig_info(int, struct kernel_siginfo *, struct > task_struct *); > extern int sigprocmask(int, sigset_t *, sigset_t *); > -extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set, > -sigset_t *oldset, size_t sigsetsize); > +extern int set_user_sigmask(const sigset_t __user *umask, size_t sigsetsize); > extern void restore_user_sigmask(const void __user *usigmask, > sigset_t *s
[PATCH v3] signal: Adjust error codes according to restore_user_sigmask()
A regression caused by 854a6ed56839 ("signal: Add restore_user_sigmask()") caused users of epoll_pwait, io_pgetevents, and ppoll to notice a latent problem in signal handling during these syscalls. That patch (854a6ed56839) moved the signal_pending() check closer to restoring of the user sigmask. But, it failed to update the error code accordingly. From the userspace perspective, the patch increased the time window for the signal discovery and subsequent delivery to the userspace, but did not always adjust the errno afterwards. The behavior before 854a6ed56839a was that the signals were dropped after the error code was decided. This resulted in lost signals but the userspace did not notice it as the syscalls had finished executing the core functionality and the error codes returned notified success. For all the syscalls that receive a sigmask from the userland, the user sigmask is to be in effect through the syscall execution. At the end of syscall, sigmask of the current process is restored to what it was before the switch over to user sigmask. But, for this to be true in practice, the sigmask should be restored only at the the point we change the saved_sigmask. Anything before that loses signals. And, anything after is just pointless as the signal is already lost by restoring the sigmask. Detailed issue discussion permalink: https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/ Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND, etc) only when there is no other error. If there is a signal and an error like EINVAL, the syscalls return -EINVAL rather than the interrupted error codes. Reported-by: Eric Wong Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()") Signed-off-by: Deepa Dinamani Reviewed-by: Davidlohr Bueso Cc: # 5.0.x Cc: # 5.1.x --- Applies to 5.1 using patch -p1. Backport to 5.0 requires dropping the io_uring.c diff. Changes since v2: * added back the reviewed-by tag * fixed missing local definition in io_uring.c Changes since v1: * updated the commit text for more context of the pre-existing condition * added stable tags as requested fs/aio.c | 24 fs/eventpoll.c | 14 ++ fs/io_uring.c | 7 +-- fs/select.c| 37 + include/linux/signal.h | 2 +- kernel/signal.c| 13 ++--- 6 files changed, 59 insertions(+), 38 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 3490d1fa0e16..ebd2b1980161 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -2095,7 +2095,7 @@ SYSCALL_DEFINE6(io_pgetevents, struct __aio_sigset ksig = { NULL, }; sigset_tksigmask, sigsaved; struct timespec64 ts; - int ret; + int ret, signal_detected; if (timeout && unlikely(get_timespec64(, timeout))) return -EFAULT; @@ -2108,8 +2108,8 @@ SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2128,7 +2128,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32, struct __aio_sigset ksig = { NULL, }; sigset_tksigmask, sigsaved; struct timespec64 ts; - int ret; + int ret, signal_detected; if (timeout && unlikely(get_old_timespec32(, timeout))) return -EFAULT; @@ -2142,8 +2142,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2193,7 +2193,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, struct __compat_aio_sigset ksig = { NULL, }; sigset_t ksigmask, sigsaved; struct timespec64 t; - int ret; + int ret, signal_detected; if (timeout && get_old_timespec32(, timeout)) return -EFAULT; @@ -2206,8 +2206,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2226,7 +2226,7 @@ C
[PATCH v2] signal: Adjust error codes according to restore_user_sigmask()
A regression caused by 854a6ed56839 ("signal: Add restore_user_sigmask()") caused users of epoll_pwait, io_pgetevents, and ppoll to notice a latent problem in signal handling during these syscalls. That patch (854a6ed56839) moved the signal_pending() check closer to restoring of the user sigmask. But, it failed to update the error code accordingly. From the userspace perspective, the patch increased the time window for the signal discovery and subsequent delivery to the userspace, but did not always adjust the errno afterwards. The behavior before 854a6ed56839a was that the signals were dropped after the error code was decided. This resulted in lost signals but the userspace did not notice it as the syscalls had finished executing the core functionality and the error codes returned notified success. For all the syscalls that receive a sigmask from the userland, the user sigmask is to be in effect through the syscall execution. At the end of syscall, sigmask of the current process is restored to what it was before the switch over to user sigmask. But, for this to be true in practice, the sigmask should be restored only at the the point we change the saved_sigmask. Anything before that loses signals. And, anything after is just pointless as the signal is already lost by restoring the sigmask. Detailed issue discussion permalink: https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/ Note that this patch returns interrupted errors (EINTR, ERESTARTNOHAND, etc) only when there is no other error. If there is a signal and an error like EINVAL, the syscalls return -EINVAL rather than the interrupted error codes. Reported-by: Eric Wong Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()") Signed-off-by: Deepa Dinamani Cc: # 5.0.x Cc: # 5.1.x --- Changes since v1: * updated the commit text for more context of the pre-existing condition * added stable tags as requested fs/aio.c | 24 fs/eventpoll.c | 14 ++ fs/io_uring.c | 7 +-- fs/select.c| 37 + include/linux/signal.h | 2 +- kernel/signal.c| 13 ++--- 6 files changed, 59 insertions(+), 38 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 3490d1fa0e16..ebd2b1980161 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -2095,7 +2095,7 @@ SYSCALL_DEFINE6(io_pgetevents, struct __aio_sigset ksig = { NULL, }; sigset_tksigmask, sigsaved; struct timespec64 ts; - int ret; + int ret, signal_detected; if (timeout && unlikely(get_timespec64(, timeout))) return -EFAULT; @@ -2108,8 +2108,8 @@ SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2128,7 +2128,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32, struct __aio_sigset ksig = { NULL, }; sigset_tksigmask, sigsaved; struct timespec64 ts; - int ret; + int ret, signal_detected; if (timeout && unlikely(get_old_timespec32(, timeout))) return -EFAULT; @@ -2142,8 +2142,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2193,7 +2193,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, struct __compat_aio_sigset ksig = { NULL, }; sigset_t ksigmask, sigsaved; struct timespec64 t; - int ret; + int ret, signal_detected; if (timeout && get_old_timespec32(, timeout)) return -EFAULT; @@ -2206,8 +2206,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2226,7 +2226,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64, struct __compat_aio_sigset ksig = { NULL, }; sigset_t ksigmask, sigsaved; struct timespec64 t; - int ret; + int ret, signal_detected;
Re: [PATCH 1/1] signal: Adjust error codes according to restore_user_sigmask()
On Tue, May 21, 2019 at 5:35 PM Deepa Dinamani wrote: > > > > > It's been 2 weeks and this fix hasn't appeared in mmots / mmotm. > > > > I also noticed it's missing Cc: for stable@ (below) > > > > > > Why is a -stable backport needed? I see some talk above about lost > > > signals but it is unclear whether these are being observed after fixing > > > the regression caused by 854a6ed56839a. > > > > I guess Deepa's commit messages wasn't clear... > > I suggest prepending this as the first paragraph to Deepa's > > original message: > > > > This fixes a bug introduced with 854a6ed56839a which caused > > EINTR to not be reported to userspace on epoll_pwait. Failure > > to report EINTR to userspace caused problems with user code > > which relies on EINTR to run signal handlers. > > This is not what the patch fixed. > > The notable change is userspace is that now whenever a signal is > delivered, the return value is adjusted to reflect the signal > delivery. > Prior to this patch, there was a window, however small it might have > been, when the signal was delivered but the errono was not adjusted > appropriately. > This is because of the regression caused by 854a6ed56839a, which > extended the window of delivery of signals that was delivered to > userspace. > The patch also fixes more than sys_epoll_pwait(). > > I will post a follow up patch. > > > > > > IOW, can we please have a changelog which has a clear and complete > > > description of the user-visible effects of the change. > > > > > > And please Cc Oleg. > > I will cc Oleg. Also the commit message was brief because the issue was explained in the link that was quoted in the commit message. Detailed issue discussion permalink: https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/ -Deepa
Re: [PATCH 1/1] signal: Adjust error codes according to restore_user_sigmask()
> > > It's been 2 weeks and this fix hasn't appeared in mmots / mmotm. > > > I also noticed it's missing Cc: for stable@ (below) > > > > Why is a -stable backport needed? I see some talk above about lost > > signals but it is unclear whether these are being observed after fixing > > the regression caused by 854a6ed56839a. > > I guess Deepa's commit messages wasn't clear... > I suggest prepending this as the first paragraph to Deepa's > original message: > > This fixes a bug introduced with 854a6ed56839a which caused > EINTR to not be reported to userspace on epoll_pwait. Failure > to report EINTR to userspace caused problems with user code > which relies on EINTR to run signal handlers. This is not what the patch fixed. The notable change is userspace is that now whenever a signal is delivered, the return value is adjusted to reflect the signal delivery. Prior to this patch, there was a window, however small it might have been, when the signal was delivered but the errono was not adjusted appropriately. This is because of the regression caused by 854a6ed56839a, which extended the window of delivery of signals that was delivered to userspace. The patch also fixes more than sys_epoll_pwait(). I will post a follow up patch. > > > IOW, can we please have a changelog which has a clear and complete > > description of the user-visible effects of the change. > > > > And please Cc Oleg. I will cc Oleg.
[PATCH 1/1] signal: Adjust error codes according to restore_user_sigmask()
For all the syscalls that receive a sigmask from the userland, the user sigmask is to be in effect through the syscall execution. At the end of syscall, sigmask of the current process is restored to what it was before the switch over to user sigmask. But, for this to be true in practice, the sigmask should be restored only at the the point we change the saved_sigmask. Anything before that loses signals. And, anything after is just pointless as the signal is already lost by restoring the sigmask. The inherent issue was detected because of a regression caused by 854a6ed56839a. The patch moved the signal_pending() check closer to restoring of the user sigmask. But, it failed to update the error code accordingly. Detailed issue discussion permalink: https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/ Note that the patch returns interrupted errors (EINTR, ERESTARTNOHAND, etc) only when there is no other error. If there is a signal and an error like EINVAL, the syscalls return -EINVAL rather than the interrupted error codes. The sys_io_uring_enter() seems to be returning success when there is a signal and the queue is not empty. This seems to be a bug. I will follow up with a separate patch for that. Reported-by: Eric Wong Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()") Signed-off-by: Deepa Dinamani Reviewed-by: Davidlohr Bueso --- fs/aio.c | 24 fs/eventpoll.c | 14 ++ fs/io_uring.c | 9 ++--- fs/select.c| 37 + include/linux/signal.h | 2 +- kernel/signal.c| 13 ++--- 6 files changed, 60 insertions(+), 39 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 3490d1fa0e16..ebd2b1980161 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -2095,7 +2095,7 @@ SYSCALL_DEFINE6(io_pgetevents, struct __aio_sigset ksig = { NULL, }; sigset_tksigmask, sigsaved; struct timespec64 ts; - int ret; + int ret, signal_detected; if (timeout && unlikely(get_timespec64(, timeout))) return -EFAULT; @@ -2108,8 +2108,8 @@ SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2128,7 +2128,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32, struct __aio_sigset ksig = { NULL, }; sigset_tksigmask, sigsaved; struct timespec64 ts; - int ret; + int ret, signal_detected; if (timeout && unlikely(get_old_timespec32(, timeout))) return -EFAULT; @@ -2142,8 +2142,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2193,7 +2193,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, struct __compat_aio_sigset ksig = { NULL, }; sigset_t ksigmask, sigsaved; struct timespec64 t; - int ret; + int ret, signal_detected; if (timeout && get_old_timespec32(, timeout)) return -EFAULT; @@ -2206,8 +2206,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2226,7 +2226,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64, struct __compat_aio_sigset ksig = { NULL, }; sigset_t ksigmask, sigsaved; struct timespec64 t; - int ret; + int ret, signal_detected; if (timeout && get_timespec64(, timeout)) return -EFAULT; @@ -2239,8 +2239,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; diff --git a/fs/even
Re: [PATCH] signal: Adjust error codes according to restore_user_sigmask()
The original patch was merged through the tip tree. Adding tglx just in case. I will post the revised patch to everyone on this thread. > >For all the syscalls that receive a sigmask from the userland, > >the user sigmask is to be in effect through the syscall execution. > >At the end of syscall, sigmask of the current process is restored > >to what it was before the switch over to user sigmask. > >But, for this to be true in practice, the sigmask should be restored > >only at the the point we change the saved_sigmask. Anything before > >that loses signals. And, anything after is just pointless as the > >signal is already lost by restoring the sigmask. > > > >The issue was detected because of a regression caused by 854a6ed56839a. > >The patch moved the signal_pending() check closer to restoring of the > >user sigmask. But, it failed to update the error code accordingly. > > > >Detailed issue discussion permalink: > >https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/ > > > >Note that the patch returns interrupted errors (EINTR, ERESTARTNOHAND, > >etc) only when there is no other error. If there is a signal and an error > >like EINVAL, the syscalls return -EINVAL rather than the interrupted > >error codes. > > Thanks for doing this; I've reviewed the epoll bits (along with the overall > idea) and it seems like a sane alternative to reverting the offending patch. Sorry maybe the description wasn't clear. What I actually am saying is that all these syscalls were dropping signals before and 854a6ed56839a4 actually did things right by making sure they did not do so. But, there was a bug in that it did not communicate to userspace when the error code was not already set. However, we could still argue that the check and flipping of the mask isn't atomic and there is still a way this can theoretically happen. But, this will also mean that these syscalls will slow down further. But, they are already expected to be slow so maybe it doesn't matter. I will note this down in the commit text. I don't think reverting was an alternative. 854a6ed56839a4 exposed a bug that was already there. > Feel free to add: > > Reviewed-by: Davidlohr Bueso > > A small nit, I think we should be a bit more verbose about the return > semantics > of restore_user_sigmask()... see at the end. > > > > >Reported-by: Eric Wong > >Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add > >restore_user_sigmask()") > >Signed-off-by: Deepa Dinamani > >--- a/kernel/signal.c > >+++ b/kernel/signal.c > >@@ -2845,15 +2845,16 @@ EXPORT_SYMBOL(set_compat_user_sigmask); > > * usigmask: sigmask passed in from userland. > > * sigsaved: saved sigmask when the syscall started and changed the sigmask > > to > > * usigmask. > >+ * returns 1 in case a pending signal is detected. > > How about: > > " > Callers must carefully coordinate between signal_pending() checks between the > actual system call and restore_user_sigmask() - for which a new pending signal > may come in between calls and therefore must consider this for returning a > proper > error code. > > Returns 1 in case a signal pending is detected, otherwise 0. Ok, I will add more verbiage here. Thanks, Deepa
Re: [PATCH] signal: Adjust error codes according to restore_user_sigmask()
On Thu, May 2, 2019 at 11:34 PM Eric Wong wrote: > > Deepa Dinamani wrote: > > Sorry, I was trying a new setup at work. I should have tested it. > > My bad, I've checked this one. > > Thanks. This is good w.r.t. epoll_pwait and ppoll when applied > to 5.0.11 (no fs/io_uring.c). Thanks. Al, would you be picking up this patch? I can resend it. > Can't think of anything which uses pselect or aio on my system; > but it looks right to me. > > > I've removed the questionable reported-by, since we're not sure if > > it is actually the same issue. > > Yes, I hope Omar can test this, too. Omar, would you be able to test this? Thanks, Deepa
[PATCH] signal: Adjust error codes according to restore_user_sigmask()
For all the syscalls that receive a sigmask from the userland, the user sigmask is to be in effect through the syscall execution. At the end of syscall, sigmask of the current process is restored to what it was before the switch over to user sigmask. But, for this to be true in practice, the sigmask should be restored only at the the point we change the saved_sigmask. Anything before that loses signals. And, anything after is just pointless as the signal is already lost by restoring the sigmask. The issue was detected because of a regression caused by 854a6ed56839a. The patch moved the signal_pending() check closer to restoring of the user sigmask. But, it failed to update the error code accordingly. Detailed issue discussion permalink: https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/ Note that the patch returns interrupted errors (EINTR, ERESTARTNOHAND, etc) only when there is no other error. If there is a signal and an error like EINVAL, the syscalls return -EINVAL rather than the interrupted error codes. Reported-by: Eric Wong Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()") Signed-off-by: Deepa Dinamani --- Sorry, I was trying a new setup at work. I should have tested it. My bad, I've checked this one. I've removed the questionable reported-by, since we're not sure if it is actually the same issue. fs/aio.c | 24 fs/eventpoll.c | 14 ++ fs/io_uring.c | 9 ++--- fs/select.c| 37 + include/linux/signal.h | 2 +- kernel/signal.c| 8 +--- 6 files changed, 55 insertions(+), 39 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 38b741aef0bf..7de2f7573d55 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -2133,7 +2133,7 @@ SYSCALL_DEFINE6(io_pgetevents, struct __aio_sigset ksig = { NULL, }; sigset_tksigmask, sigsaved; struct timespec64 ts; - int ret; + int ret, signal_detected; if (timeout && unlikely(get_timespec64(, timeout))) return -EFAULT; @@ -2146,8 +2146,8 @@ SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2166,7 +2166,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32, struct __aio_sigset ksig = { NULL, }; sigset_tksigmask, sigsaved; struct timespec64 ts; - int ret; + int ret, signal_detected; if (timeout && unlikely(get_old_timespec32(, timeout))) return -EFAULT; @@ -2180,8 +2180,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2231,7 +2231,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, struct __compat_aio_sigset ksig = { NULL, }; sigset_t ksigmask, sigsaved; struct timespec64 t; - int ret; + int ret, signal_detected; if (timeout && get_old_timespec32(, timeout)) return -EFAULT; @@ -2244,8 +2244,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2264,7 +2264,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64, struct __compat_aio_sigset ksig = { NULL, }; sigset_t ksigmask, sigsaved; struct timespec64 t; - int ret; + int ret, signal_detected; if (timeout && get_timespec64(, timeout)) return -EFAULT; @@ -2277,8 +2277,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); - restore_user_sigmask(ksig.sigmask, ); - if (signal_pending(current) && !ret) + signal_detected = restore_user_sigmask(ksig.sigmask, ); + if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 4a0e98d
Re: Strange issues with epoll since 5.0
Eric, Can you please help test this? If this solves your problem, I can post the fix. Thanks, - Deepa -8<--- Subject: [PATCH] signal: Adjust error codes according to restore_user_sigmask() For all the syscalls that receive a sigmask from the userland, the user sigmask is to be in effect through the syscall execution. At the end of syscall, sigmask of the current process is restored to what it was before the switch over to user sigmask. But, for this to be true in practice, the sigmask should be restored only at the the point we change the saved_sigmask. Anything before that loses signals. And, anything after is just pointless as the signal is already lost by restoring the sigmask. The issue was detected because of a regression caused by 854a6ed56839a. The patch moved the signal_pending() check closer to restoring of the user sigmask. But, it failed to update the error code accordingly. Detailed issue discussion permalink: https://lore.kernel.org/linux-fsdevel/20190427093319.sgicqik2oqkez3wk@dcvr/ Note that the patch returns interrupted errors (EINTR, ERESTARTNOHAND, etc) only when there is no other error. If there is a signal and an error like EINVAL, the syscalls return -EINVAL rather than the interrupted error codes. Reported-by: Omar Kilani Reported-by: Eric Wong Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()") Signed-off-by: Deepa Dinamani --- fs/aio.c | 24 fs/eventpoll.c | 14 ++ fs/io_uring.c | 9 ++--- fs/select.c| 37 + include/linux/signal.h | 2 +- kernel/signal.c| 8 +--- 6 files changed, 55 insertions(+), 39 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 38b741aef0bf..7de2f7573d55 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -2133,7 +2133,7 @@ SYSCALL_DEFINE6(io_pgetevents, struct __aio_sigsetksig = { NULL, }; sigset_tksigmask, sigsaved; struct timespec64ts; -int ret; +int ret, signal_detected; if (timeout && unlikely(get_timespec64(, timeout))) return -EFAULT; @@ -2146,8 +2146,8 @@ SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); -restore_user_sigmask(ksig.sigmask, ); -if (signal_pending(current) && !ret) +signal_detected = restore_user_sigmask(ksig.sigmask, ); +if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2166,7 +2166,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32, struct __aio_sigsetksig = { NULL, }; sigset_tksigmask, sigsaved; struct timespec64ts; -int ret; +int ret, signal_detected; if (timeout && unlikely(get_old_timespec32(, timeout))) return -EFAULT; @@ -2180,8 +2180,8 @@ SYSCALL_DEFINE6(io_pgetevents_time32, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); -restore_user_sigmask(ksig.sigmask, ); -if (signal_pending(current) && !ret) +signal_detected = restore_user_sigmask(ksig.sigmask, ); +if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2231,7 +2231,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, struct __compat_aio_sigset ksig = { NULL, }; sigset_t ksigmask, sigsaved; struct timespec64 t; -int ret; +int ret, signal_detected; if (timeout && get_old_timespec32(, timeout)) return -EFAULT; @@ -2244,8 +2244,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); -restore_user_sigmask(ksig.sigmask, ); -if (signal_pending(current) && !ret) +signal_detected = restore_user_sigmask(ksig.sigmask, ); +if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; @@ -2264,7 +2264,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64, struct __compat_aio_sigset ksig = { NULL, }; sigset_t ksigmask, sigsaved; struct timespec64 t; -int ret; +int ret, signal_detected; if (timeout && get_timespec64(, timeout)) return -EFAULT; @@ -2277,8 +2277,8 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_time64, return ret; ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? : NULL); -restore_user_sigmask(ksig.sigmask, ); -if (signal_pending(current) && !ret) +signal_detected = restore_user_sigmask(ksig.sigmask, ); +if (signal_detected && !ret) ret = -ERESTARTNOHAND; return ret; diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 4a0e98d87fcc..fe5a0724b417 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -2317,7 +2317,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events, int, maxevents, int, timeout, const sigset_t __us
Re: Strange issues with epoll since 5.0
On Wed, May 1, 2019 at 1:48 PM Eric Wong wrote: > > Deepa Dinamani wrote: > > So here is my analysis: > > > > > So the 854a6ed56839a40f6 seems to be better than the original code in > > that it detects the signal. > > OTOH, does matter to anybody that a signal is detected slightly > sooner than it would've been, otherwise? The original code drops the signal altogether. This is because it overwrites the current's sigmask with the provided one(set_current_blocked()). If a signal bit was set, it is lost forever. It does not detect it sooner. The check for pending signal is sooner and not just before the syscall returns. This is what the patch in discussion does: check for signals just before returning. > > > But, the problem is that it doesn't > > communicate it to the userspace. > > Yup, that's a big problem :) > > > So a patch like below solves the problem. This is incomplete. I'll > > verify and send you a proper fix you can test soon. This is just for > > the sake of discussion: > > > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c > > index 4a0e98d87fcc..63a387329c3d 100644 > > --- a/fs/eventpoll.c > > +++ b/fs/eventpoll.c > > @@ -2317,7 +2317,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct > > epoll_event __user *, events, > > int, maxevents, int, timeout, const sigset_t __user *, > > sigmask, > > size_t, sigsetsize) > > { > > - int error; > > + int error, signal_detected; > > sigset_t ksigmask, sigsaved; > > > > /* > > @@ -2330,7 +2330,10 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct > > epoll_event __user *, events, > > > > error = do_epoll_wait(epfd, events, maxevents, timeout); > > > > - restore_user_sigmask(sigmask, ); > > + signal_detected = restore_user_sigmask(sigmask, ); > > + > > + if (signal_detected && !error) > > + return -EITNR; > > > > return error; > > Looks like a reasonable API. > > > @@ -2862,7 +2862,7 @@ void restore_user_sigmask(const void __user > > *usigmask, sigset_t *sigsaved) > > if (signal_pending(current)) { > > current->saved_sigmask = *sigsaved; > > set_restore_sigmask(); > > - return; > > + return 0; > > Shouldn't that "return 1" if a signal is pending? Yep, I meant this to be 1. -Deepa
Re: Strange issues with epoll since 5.0
Thanks for trying the fix. So here is my analysis: Let's start with epoll_pwait: ep_poll() is what checks for signal_pending() and is responsible for setting errno to -EINTR when there is a signal. So if a signal is received after ep_poll(), it is never noticed by the syscall during execution. Moreover, the original code before 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()"), had the following call flow: error = do_epoll_wait(epfd, events, maxevents, timeout); Here error = 0 if the signal is received after ep_poll(). - /* -* If we changed the signal mask, we need to restore the original one. -* In case we've got a signal while waiting, we do not restore the -* signal mask yet, and we allow do_signal() to deliver the signal on -* the way back to userspace, before the signal mask is restored. -*/ - if (sigmask) { - if (error == -EINTR) { - memcpy(>saved_sigmask, , - sizeof(sigsaved)); - set_restore_sigmask(); - } else Execution reaches this else statement and the sigmask is restored directly, ignoring the newly generated signal. The signal is never handled. - set_current_blocked(); - } In the current execution flow: error = do_epoll_wait(epfd, events, maxevents, timeout); error is still 0 as ep_poll() did not detect the signal. restore_user_sigmask(sigmask, , error == -EITNR); void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) { if (!usigmask) return; /* * When signals are pending, do not restore them here. * Restoring sigmask here can lead to delivering signals that the above * syscalls are intended to block because of the sigmask passed in. */ if (signal_pending(current)) { execution path reaches here and do_signal() actually delivers the signal to userspace. But the errno is not set. So the userspace fails to notice it. current->saved_sigmask = *sigsaved; set_restore_sigmask(); return; } /* * This is needed because the fast syscall return path does not restore * saved_sigmask when signals are not pending. */ set_current_blocked(sigsaved); } For other syscalls in the same commit: sys_io_pgetevents() does not seem to have this problem as we are still checking signal_pending() here. sys_pselect6() seems to have a similar problem. The changes to sys_pselect6() also impact sys_select() as the changes are in the common code path. So the 854a6ed56839a40f6 seems to be better than the original code in that it detects the signal. But, the problem is that it doesn't communicate it to the userspace. So a patch like below solves the problem. This is incomplete. I'll verify and send you a proper fix you can test soon. This is just for the sake of discussion: diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 4a0e98d87fcc..63a387329c3d 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -2317,7 +2317,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events, int, maxevents, int, timeout, const sigset_t __user *, sigmask, size_t, sigsetsize) { - int error; + int error, signal_detected; sigset_t ksigmask, sigsaved; /* @@ -2330,7 +2330,10 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events, error = do_epoll_wait(epfd, events, maxevents, timeout); - restore_user_sigmask(sigmask, ); + signal_detected = restore_user_sigmask(sigmask, ); + + if (signal_detected && !error) + return -EITNR; return error; } @@ -2342,7 +2345,7 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd, const compat_sigset_t __user *, sigmask, compat_size_t, sigsetsize) { - long err; + long err, signal_detected; sigset_t ksigmask, sigsaved; /* @@ -2355,7 +2358,10 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int, epfd, err = do_epoll_wait(epfd, events, maxevents, timeout); - restore_user_sigmask(sigmask, ); + signal_detected = restore_user_sigmask(sigmask, ); + + if (signal_detected && !err) + return -EITNR; return err; } diff --git a/kernel/signal.c b/kernel/signal.c index 3a9e41197d46..c76ab2a52ebf 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2849,11 +2849,11 @@ EXPORT_SYMBOL(set_compat_user_sigmask); * This is useful for syscalls such as ppoll, pselect, io_pgetevents and * epoll_pwait where a new sigmask is passed in from userland for the syscalls. */ -void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) +int restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) {
Re: Strange issues with epoll since 5.0
I was also not able to reproduce this. Arnd and I were talking about this today morning. Here is something Arnd noticed: If there was a signal after do_epoll_wait(), we never were not entering the if (err = -EINTR) at all before. But, now we do. We could try with the below patch: diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 4a0e98d87fcc..5cfb800cf598 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -2330,7 +2330,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, events, error = do_epoll_wait(epfd, events, maxevents, timeout); - restore_user_sigmask(sigmask, ); + restore_user_sigmask(sigmask, , error == -EITNR); return error; } diff --git a/kernel/signal.c b/kernel/signal.c index 3a9e41197d46..4a8f96f5c1c0 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2849,7 +2849,7 @@ EXPORT_SYMBOL(set_compat_user_sigmask); * This is useful for syscalls such as ppoll, pselect, io_pgetevents and * epoll_pwait where a new sigmask is passed in from userland for the syscalls. */ -void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) +void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved, int sig_pending) { if (!usigmask) @@ -2859,7 +2859,7 @@ void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved) * Restoring sigmask here can lead to delivering signals that the above * syscalls are intended to block because of the sigmask passed in. */ - if (signal_pending(current)) { + if (sig_pending) { current->saved_sigmask = *sigsaved; set_restore_sigmask(); If this works that means we know what is busted. I'm not sure what the hang in the userspace is about. Is it because the syscall did not return an error or the particular signal was blocked etc. There are also a few timing differences also. But, can we try this first? -Deepa
Re: Strange issues with epoll since 5.0
I tried to replicate the failure on qemu. I do not see the failure with N=32. Does it work for N < 32? Does any other signal work? Are there any other architectures that fail? Could you help me figure out how to run just the one test that is failing? -Deepa
Re: [PATCH v2] posix-cpu-timers: Avoid undefined behaviour in timespec64_to_ns()
On Tue, Feb 26, 2019 at 11:52 PM Xiongfeng Wang wrote: > > When I ran Syzkaller testsuite, I got the following call trace. > > UBSAN: Undefined behaviour in ./include/linux/time64.h:120:27 > signed integer overflow: > 8243129037239968815 * 10 cannot be represented in type 'long long int' > CPU: 5 PID: 28854 Comm: syz-executor.1 Not tainted 4.19.24 #4 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0xca/0x13e lib/dump_stack.c:113 > ubsan_epilogue+0xe/0x81 lib/ubsan.c:159 > handle_overflow+0x193/0x1e2 lib/ubsan.c:190 > timespec64_to_ns include/linux/time64.h:120 [inline] > posix_cpu_timer_set+0x95a/0xb70 kernel/time/posix-cpu-timers.c:687 > do_timer_settime+0x198/0x2a0 kernel/time/posix-timers.c:892 > __do_sys_timer_settime kernel/time/posix-timers.c:918 [inline] > __se_sys_timer_settime kernel/time/posix-timers.c:904 [inline] > __x64_sys_timer_settime+0x18d/0x260 kernel/time/posix-timers.c:904 > do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x462eb9 > Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 > 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 > 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 > RSP: 002b:7f14e4127c58 EFLAGS: 0246 ORIG_RAX: 00df > RAX: ffda RBX: 0073bfa0 RCX: 00462eb9 > RDX: 2080 RSI: RDI: > RBP: 0004 R08: R09: > R10: R11: 0246 R12: 7f14e41286bc > R13: 004c54cc R14: 00704278 R15: > > > It is because 'it_interval.tv_sec' is larger than 'KTIME_SEC_MAX' and > 'it_interval.tv_sec * NSEC_PER_SEC' overflows in 'timespec64_to_ns()'. > > This patch use 'timespec64_valid_restrict()' to check whether > 'it_interval.tv_sec' is larger than 'KTIME_SEC_MAX'. > > Signed-off-by: Xiongfeng Wang > --- > kernel/time/posix-timers.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c > index 0e84bb7..97b773c 100644 > --- a/kernel/time/posix-timers.c > +++ b/kernel/time/posix-timers.c > @@ -853,8 +853,8 @@ static int do_timer_settime(timer_t timer_id, int flags, > unsigned long flag; > int error = 0; > > - if (!timespec64_valid(_spec64->it_interval) || > - !timespec64_valid(_spec64->it_value)) > + if (!timespec64_valid_strict(_spec64->it_interval) || > + !timespec64_valid_strict(_spec64->it_value)) > return -EINVAL; > > if (old_spec64) sys_timer_settime() is a POSIX interface: http://pubs.opengroup.org/onlinepubs/7908799/xsh/timer_settime.html The timer_settime() function will fail if: [EINVAL] The timerid argument does not correspond to an id returned by timer_create() but not yet deleted by timer_delete(). [EINVAL] A value structure specified a nanosecond value less than zero or greater than or equal to 1000 million. So we cannot return EINVAL here if we want to maintain POSIX compatibility. Maybe we should check for limit and saturate here at the syscall interface? -Deepa
Re: [RFC PATCH] posix-cpu-timers: Avoid undefined behaviour in timespec64_to_ns()
On Tue, Feb 26, 2019 at 6:07 PM Xiongfeng Wang wrote: > > When I ran Syzkaller testsuite, I got the following call trace. > > UBSAN: Undefined behaviour in ./include/linux/time64.h:120:27 > signed integer overflow: > 8243129037239968815 * 10 cannot be represented in type 'long long int' > CPU: 5 PID: 28854 Comm: syz-executor.1 Not tainted 4.19.24 #4 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0xca/0x13e lib/dump_stack.c:113 > ubsan_epilogue+0xe/0x81 lib/ubsan.c:159 > handle_overflow+0x193/0x1e2 lib/ubsan.c:190 > timespec64_to_ns include/linux/time64.h:120 [inline] > posix_cpu_timer_set+0x95a/0xb70 kernel/time/posix-cpu-timers.c:687 > do_timer_settime+0x198/0x2a0 kernel/time/posix-timers.c:892 > __do_sys_timer_settime kernel/time/posix-timers.c:918 [inline] > __se_sys_timer_settime kernel/time/posix-timers.c:904 [inline] > __x64_sys_timer_settime+0x18d/0x260 kernel/time/posix-timers.c:904 > do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x462eb9 > Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 > 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 > 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 > RSP: 002b:7f14e4127c58 EFLAGS: 0246 ORIG_RAX: 00df > RAX: ffda RBX: 0073bfa0 RCX: 00462eb9 > RDX: 2080 RSI: RDI: > RBP: 0004 R08: R09: > R10: R11: 0246 R12: 7f14e41286bc > R13: 004c54cc R14: 00704278 R15: > > > This patch use 'timespec64_to_ktime()' to limit 'tv_sec' to avoid > overflow. > > Signed-off-by: Xiongfeng Wang > --- > kernel/time/posix-cpu-timers.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c > index 80f9552..f7e3929 100644 > --- a/kernel/time/posix-cpu-timers.c > +++ b/kernel/time/posix-cpu-timers.c > @@ -684,7 +684,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, > int timer_flags, > * Install the new reload setting, and > * set up the signal and overrun bookkeeping. > */ > - timer->it.cpu.incr = timespec64_to_ns(>it_interval); > + timer->it.cpu.incr = > ktime_to_ns(timespec64_to_ktime(new->it_interval)); > timer->it_interval = ns_to_ktime(timer->it.cpu.incr); > > /* This seems like a similar bug as the other one https://lkml.org/lkml/2019/2/24/214. Maybe it makes sense here also to do some bounds checking when we get the userspace parameter. This patch just saturates the value. -Deepa
Re: [PATCH] time64: Avoid undefined behaviour in timespec64_add()
On Mon, Feb 25, 2019 at 1:02 AM Arnd Bergmann wrote: > > On Mon, Feb 25, 2019 at 5:53 AM Deepa Dinamani wrote: > > > > On Sun, Feb 24, 2019 at 7:13 PM Hongbo Yao wrote: > > > > > > I ran into this: > > > > > > = > > > UBSAN: Undefined behaviour in ./include/linux/time64.h:70:2 > > > signed integer overflow: > > > 1551059291 + 9223372036854775807 cannot be represented in type > > > 'long > > > long int' > > > CPU: 5 PID: 20064 Comm: syz-executor.2 Not tainted 4.19.24 #4 > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > > > 1.10.2-1ubuntu1 04/01/2014 > > > Call Trace: > > > __dump_stack lib/dump_stack.c:77 [inline] > > > dump_stack+0xca/0x13e lib/dump_stack.c:113 > > > ubsan_epilogue+0xe/0x81 lib/ubsan.c:159 > > > handle_overflow+0x193/0x1e2 lib/ubsan.c:190 > > > timespec64_add include/linux/time64.h:70 [inline] > ... > > > Since lhs.tv_sec and rhs.tv_sec are both time64_t, this is a signed > > > addition which will cause undefined behaviour on overflow. > > I wonder if we should treat this as undefined behavior in the kernel or not: > The kernel is build with -fno-strict-overflow, so signed integer overflow > is supposed to behave the same way as unsigned, and assume > two's-complement arithmetic. Another option is to saturate or return error when possible. 2's complement arithmetic lets us check for overflow. So I feel this is nice for timespec64_add(). > > > @@ -67,7 +67,7 @@ static inline struct timespec64 timespec64_add(struct > > > timespec64 lhs, > > > struct timespec64 rhs) > > > { > > > struct timespec64 ts_delta; > > > - set_normalized_timespec64(_delta, lhs.tv_sec + rhs.tv_sec, > > > + set_normalized_timespec64(_delta, (timeu64_t)lhs.tv_sec + > > > rhs.tv_sec, > > > lhs.tv_nsec + rhs.tv_nsec); > > > return ts_delta; > > > } > > > > There is already a timespec64_add_safe() to account for such > > overflows. That assumes both the timespec64 values are positive. > > But, timekeeping_inject_offset() cannot use that as one of the values > > can be negative. > > We could perhaps extend timespec64_add_safe() to handle both > overflow and underflow, and allow negative arguments. It would > have to use some extra checks then. I was thinking the reason for having just timespec64_add() is that the caller makes these checks in case they want to return an error on overflow. The safe version will just saturate and caller is unaware. > There are actually only > a very small number of callers to timespec64_add(): > > arch/arm/xen/enlighten.c: *ts = timespec64_add(now, ts_monotonic); > arch/arm/xen/enlighten.c: system_time = timespec64_add(now, > tk->wall_to_monotonic); > drivers/net/ethernet/cadence/macb_ptp.c:now = > timespec64_add(now, then); > drivers/net/ethernet/intel/igb/igb_main.c: ts = > timespec64_add(adapter->perout[0].start, > drivers/net/ethernet/intel/igb/igb_main.c: ts = > timespec64_add(adapter->perout[1].start, > drivers/net/ethernet/intel/igb/igb_ptp.c: now = timespec64_add(now, > then); > fs/cifs/dfs_cache.c:return timespec64_add(now, ts); > include/linux/rtc.h:*to_set = timespec64_add(*now, delay); > include/linux/time64.h:static inline struct timespec64 > timespec64_add(struct timespec64 lhs, > kernel/time/timekeeping.c: tmp = timespec64_add(tk_xtime(tk), *ts); > kernel/time/timekeeping.c: > timespec64_add(timekeeping_suspend_time, delta_delta); > net/ceph/messenger.c: ts = > timespec64_add(con->last_keepalive_ack, ts); > > It looks like an actual overflow would be really bad in most of these, > regardless > of the undefined behavior. I can look at sanitizing whatever you haven't started on. > > Are you running some kind of a fuzzer that would cause a overflow? > > You seem to be adding INT64_MAX here. Maybe the right thing to do is > > to add a check at the syscall interface rather than here. > > Returning an error from the syscall here sounds like a good idea. I'm > not sure what we should do about the time32 version of adjtimex though > if we decide we want that. Should we just reject any times there that > result in a time outside of the 1970..2038 range? utimes_common() does not return an error right now and fails silently when the filesystem is not able to represent times. And, POSIX seems ambiguous here. Should both these syscalls match in behavior? -Deepa -Deepa
Re: [PATCH] time64: Avoid undefined behaviour in timespec64_add()
On Sun, Feb 24, 2019 at 7:13 PM Hongbo Yao wrote: > > I ran into this: > > = > UBSAN: Undefined behaviour in ./include/linux/time64.h:70:2 > signed integer overflow: > 1551059291 + 9223372036854775807 cannot be represented in type 'long > long int' > CPU: 5 PID: 20064 Comm: syz-executor.2 Not tainted 4.19.24 #4 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > 1.10.2-1ubuntu1 04/01/2014 > Call Trace: > __dump_stack lib/dump_stack.c:77 [inline] > dump_stack+0xca/0x13e lib/dump_stack.c:113 > ubsan_epilogue+0xe/0x81 lib/ubsan.c:159 > handle_overflow+0x193/0x1e2 lib/ubsan.c:190 > timespec64_add include/linux/time64.h:70 [inline] > timekeeping_inject_offset+0x3ed/0x4e0 kernel/time/timekeeping.c:1301 > do_adjtimex+0x1e5/0x6c0 kernel/time/timekeeping.c:2360 > __do_sys_clock_adjtime+0x122/0x200 kernel/time/posix-timers.c:1086 > do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > RIP: 0033:0x462eb9 > Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 > 89 > f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 > f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 > RSP: 002b:7f888aa2dc58 EFLAGS: 0246 ORIG_RAX: 0131 > RAX: ffda RBX: 0073bf00 RCX: 00462eb9 > RDX: RSI: 23c0 RDI: > RBP: 0002 R08: R09: > R10: R11: 0246 R12: 7f888aa2e6bc > R13: 004bcae8 R14: 006f6868 R15: > > == > > Since lhs.tv_sec and rhs.tv_sec are both time64_t, this is a signed > addition which will cause undefined behaviour on overflow. > > The easiest way to avoid the overflow is to cast one of the arguments to > unsigned (so the addition will be done using unsigned arithmetic). > This patch doesn't change generated code. > > Signed-off-by: Hongbo Yao > --- > include/linux/time64.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/linux/time64.h b/include/linux/time64.h > index 05634afba0db..5926bdd4167f 100644 > --- a/include/linux/time64.h > +++ b/include/linux/time64.h > @@ -67,7 +67,7 @@ static inline struct timespec64 timespec64_add(struct > timespec64 lhs, > struct timespec64 rhs) > { > struct timespec64 ts_delta; > - set_normalized_timespec64(_delta, lhs.tv_sec + rhs.tv_sec, > + set_normalized_timespec64(_delta, (timeu64_t)lhs.tv_sec + > rhs.tv_sec, > lhs.tv_nsec + rhs.tv_nsec); > return ts_delta; > } There is already a timespec64_add_safe() to account for such overflows. That assumes both the timespec64 values are positive. But, timekeeping_inject_offset() cannot use that as one of the values can be negative. Are you running some kind of a fuzzer that would cause a overflow? You seem to be adding INT64_MAX here. Maybe the right thing to do is to add a check at the syscall interface rather than here. -Deepa