Re: compat_sys_times() bogus until jiffies >= 0.
On Fri, Dec 21, 2007 at 08:36:40PM -0800, David Miller wrote: > From: Andi Kleen <[EMAIL PROTECTED]> > Date: Sat, 22 Dec 2007 02:53:11 +0100 > > > > And to handle potentially ambiguous cases we, for a long time, have > > > the force_successful_syscall_return() arch hook. > > > > Ah I see what you mean now. > > > > Thanks for the clarification. > > Thanks for continuing to insist it's "impossible" :-) It's still hard -- e.g. i'm not sure your condition flag setting would be even possible for i386 SYSEXIT which does not restore EFLAGS from memory and has some other constraints too. And there is no free other register to use for this either on i386 nor x86-64. Ok you could always disable SYSEXIT on force_successfull_return(), but then e.g. all lseek()s would use the slow path which might not be a good idea. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
On Fri, Dec 21, 2007 at 08:36:40PM -0800, David Miller wrote: From: Andi Kleen [EMAIL PROTECTED] Date: Sat, 22 Dec 2007 02:53:11 +0100 And to handle potentially ambiguous cases we, for a long time, have the force_successful_syscall_return() arch hook. Ah I see what you mean now. Thanks for the clarification. Thanks for continuing to insist it's impossible :-) It's still hard -- e.g. i'm not sure your condition flag setting would be even possible for i386 SYSEXIT which does not restore EFLAGS from memory and has some other constraints too. And there is no free other register to use for this either on i386 nor x86-64. Ok you could always disable SYSEXIT on force_successfull_return(), but then e.g. all lseek()s would use the slow path which might not be a good idea. -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
From: Andi Kleen <[EMAIL PROTECTED]> Date: Sat, 22 Dec 2007 02:53:11 +0100 > > And to handle potentially ambiguous cases we, for a long time, have > > the force_successful_syscall_return() arch hook. > > Ah I see what you mean now. > > Thanks for the clarification. Thanks for continuing to insist it's "impossible" :-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
> And to handle potentially ambiguous cases we, for a long time, have > the force_successful_syscall_return() arch hook. Ah I see what you mean now. Thanks for the clarification. Ok that could be in theory made to work yes. The migration would probably be ugly though (how would glibc figure out if the kernel does that or not?) -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
> I'm suggesting that you set the condition codes based upon whether > there is an error or not. And the only way the syscall code could find out if there is an error is by checking err < 0 && err >= -4096 like glibc (except for the compat syscall on 64bit kernel case) Or rewrite all code that returns errors to system calls to pass a separate flag too. > That is the critical thing x86 doesn't do > that all the other platforms do. It doesn't do it because it's useless without a kernel rewrite. I frankly doubt it really works on Sparc :-) Maybe it could work there on a hypothetical rewritten kernel, but not today. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
From: David Miller <[EMAIL PROTECTED]> Date: Fri, 21 Dec 2007 17:41:24 -0800 (PST) > I'm suggesting that you set the condition codes based upon whether > there is an error or not. That is the critical thing x86 doesn't do > that all the other platforms do. And if you still don't get it, I'm saying that x86, in the syscall trap return path, should set the conditon codes based upon whether the system call is really signalling an error or not. And to handle potentially ambiguous cases we, for a long time, have the force_successful_syscall_return() arch hook. System call implementations use this when the return values they give could be mis-construed as error values. And if you'll notice x86 makes no attempt to implement that hook, because it currently can't. That's what needs to be fixed. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
From: Andi Kleen <[EMAIL PROTECTED]> Date: Sat, 22 Dec 2007 01:42:02 +0100 > David Miller <[EMAIL PROTECTED]> writes: > > > Only on x86 platforms. Sparc, IA64, MIPS, powerpc, etc. all get this > > case right. > > Do they for 32bit kernels or for 64bit userland? Both. This is not a compat issue. > > Yes it's another unfortunate side effect of how error status is > > indicated for x86 system calls. > > Maybe I'm dense, but doesn't all the kernel code pass it the > same way as the x86 syscall code? For your proposal you > would need a separate error bit coming out of the sys_* to > handle this case. Basically rewrite all code that ever returns > errors in the kernel. Or do I miss something? I'm suggesting that you set the condition codes based upon whether there is an error or not. That is the critical thing x86 doesn't do that all the other platforms do. x86 relies on interpretation of purely the integer returned from the system call to userspace, and that means a certain chunk of the return value space can never represent valid values. If you use the condition codes to signal "the return value is an error" you don't have these problems. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
David Miller <[EMAIL PROTECTED]> writes: > Only on x86 platforms. Sparc, IA64, MIPS, powerpc, etc. all get this > case right. Do they for 32bit kernels or for 64bit userland? > Yes it's another unfortunate side effect of how error status is > indicated for x86 system calls. Maybe I'm dense, but doesn't all the kernel code pass it the same way as the x86 syscall code? For your proposal you would need a separate error bit coming out of the sys_* to handle this case. Basically rewrite all code that ever returns errors in the kernel. Or do I miss something? Or are you talking about solving it only for 32bit compat emulation on 64bit kernels? There it would be possible, but only doing it there and not on native 32bit systems wouldn't seem clean to me. At least on x86-64 the compat code's (near) only goal in live is to be as compatible to 32it as possible not better. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
David Miller [EMAIL PROTECTED] writes: Only on x86 platforms. Sparc, IA64, MIPS, powerpc, etc. all get this case right. Do they for 32bit kernels or for 64bit userland? Yes it's another unfortunate side effect of how error status is indicated for x86 system calls. Maybe I'm dense, but doesn't all the kernel code pass it the same way as the x86 syscall code? For your proposal you would need a separate error bit coming out of the sys_* to handle this case. Basically rewrite all code that ever returns errors in the kernel. Or do I miss something? Or are you talking about solving it only for 32bit compat emulation on 64bit kernels? There it would be possible, but only doing it there and not on native 32bit systems wouldn't seem clean to me. At least on x86-64 the compat code's (near) only goal in live is to be as compatible to 32it as possible not better. -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
From: Andi Kleen [EMAIL PROTECTED] Date: Sat, 22 Dec 2007 01:42:02 +0100 David Miller [EMAIL PROTECTED] writes: Only on x86 platforms. Sparc, IA64, MIPS, powerpc, etc. all get this case right. Do they for 32bit kernels or for 64bit userland? Both. This is not a compat issue. Yes it's another unfortunate side effect of how error status is indicated for x86 system calls. Maybe I'm dense, but doesn't all the kernel code pass it the same way as the x86 syscall code? For your proposal you would need a separate error bit coming out of the sys_* to handle this case. Basically rewrite all code that ever returns errors in the kernel. Or do I miss something? I'm suggesting that you set the condition codes based upon whether there is an error or not. That is the critical thing x86 doesn't do that all the other platforms do. x86 relies on interpretation of purely the integer returned from the system call to userspace, and that means a certain chunk of the return value space can never represent valid values. If you use the condition codes to signal the return value is an error you don't have these problems. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
From: David Miller [EMAIL PROTECTED] Date: Fri, 21 Dec 2007 17:41:24 -0800 (PST) I'm suggesting that you set the condition codes based upon whether there is an error or not. That is the critical thing x86 doesn't do that all the other platforms do. And if you still don't get it, I'm saying that x86, in the syscall trap return path, should set the conditon codes based upon whether the system call is really signalling an error or not. And to handle potentially ambiguous cases we, for a long time, have the force_successful_syscall_return() arch hook. System call implementations use this when the return values they give could be mis-construed as error values. And if you'll notice x86 makes no attempt to implement that hook, because it currently can't. That's what needs to be fixed. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
I'm suggesting that you set the condition codes based upon whether there is an error or not. And the only way the syscall code could find out if there is an error is by checking err 0 err = -4096 like glibc (except for the compat syscall on 64bit kernel case) Or rewrite all code that returns errors to system calls to pass a separate flag too. That is the critical thing x86 doesn't do that all the other platforms do. It doesn't do it because it's useless without a kernel rewrite. I frankly doubt it really works on Sparc :-) Maybe it could work there on a hypothetical rewritten kernel, but not today. -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
And to handle potentially ambiguous cases we, for a long time, have the force_successful_syscall_return() arch hook. Ah I see what you mean now. Thanks for the clarification. Ok that could be in theory made to work yes. The migration would probably be ugly though (how would glibc figure out if the kernel does that or not?) -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
From: Andi Kleen [EMAIL PROTECTED] Date: Sat, 22 Dec 2007 02:53:11 +0100 And to handle potentially ambiguous cases we, for a long time, have the force_successful_syscall_return() arch hook. Ah I see what you mean now. Thanks for the clarification. Thanks for continuing to insist it's impossible :-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
From: Michael Kerrisk <[EMAIL PROTECTED]> Date: Thu, 20 Dec 2007 12:36:52 +0100 > Some testing just now shows me that lseek() on /dev/mem suffers similar > problems when seeking to bytes 0xf001 through to 0x. Only on x86 platforms. Sparc, IA64, MIPS, powerpc, etc. all get this case right. Yes it's another unfortunate side effect of how error status is indicated for x86 system calls. I would suggest, that we put something in place to fix this in the long term: 1) Start setting the condition codes properly to indicate error in the system call return path on x86 like other platforms do now. Make sure that it tips off on force_successful_syscall_return(), as needed. 2) Come up with a transition plan for glibc et al. to take advantage of this. It actually sounds like the kind of problem that could be solved well using the VDSO page. :-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
David, Andrew, Paul, A late coda to this thread, but I'll just note some changes I'm making to the man page (which I'd like you to review -- please see below), and note a few other points. Andrew, you asked about what happens for x86 with the -1 to -4095 return for other syscalls. At least two other syscalls suffer the same problem. >From the fcntl(2) man page: BUGS A limitation of the Linux system call conventions on some architectures (notably x86) means that if a (negative) process group ID to be returned by F_GETOWN falls in the range -1 to -4095, then the return value is wrongly interpreted by glibc as an error in the system call; that is, the return value of fcntl() will be -1, and errno will contain the (positive) process group ID. Some testing just now shows me that lseek() on /dev/mem suffers similar problems when seeking to bytes 0xf001 through to 0x. Ulrich Drepper wrote: > Chris Friesen wrote: >>> A possible remedy is to return the ticks since process start time, which >>> delays the wrap around much further. POSIX only demands consistency >>> within the same process. >> This would be an interesting solution. > >> The man page for linux states that the return code is time since system >> boot, so that could realistically be expected to correlate between >> different processes. > > The Linux man page is documenting existing functionality on top of what > the standard requires. Programmers should ever only require what the > standard guarantees. > > I am perfectly willing to support a solution where the time is measured >>from process startup time. The only code using times() I found is > cross-platform and most likely does not depend on the value returned is > usable in isolation (only in a difference). Did I miss anything? Is anyone actually working on a solution along these lines? In the meantime, for man-pages-2.74, I've reworked the description of the return value: RETURN VALUE times() returns the number of clock ticks that have elapsed since an arbitrary point in the past. The return value may overflow the possible range of type clock_t. On error, (clock_t) -1 is returned, and errno is set appropriately. I moved the Linux specific details of the return value to NOTES, and added a warning about relying on those details: NOTES ... On Linux, the "arbitrary point in the past" from which the return value of times() is measured has varied across kernel versions. On Linux 2.4 and earlier this point is the moment the system was booted. Since Linux 2.6, this point is (2^32/HZ) - 300 (i.e., about 429 million) sec- onds before system boot time. This variability across kernel versions (and across Unix implementations), com- bined with the fact that the returned value may overflow the range of clock_t, means that a portable application would be wise to avoid using this value. To measure changes in elapsed time, use gettimeofday(2) instead. Under BUGS I added: BUGS A limitation of the Linux system call conventions on some architectures (notably x86) means that on Linux 2.6 there is a small time window (41 seconds) soon after boot when times(2) can return -1, falsely indicating that an error occurred. The same problem can occur when the return value wraps passed the maximum value that can be stored in clockid_t. Look okay to you folks? Cheers, Michael -- Michael Kerrisk Maintainer of the Linux man-pages project http://www.kernel.org/doc/man-pages/ Want to report a man-pages bug? Look here: http://www.kernel.org/doc/man-pages/reporting_bugs.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
David, Andrew, Paul, A late coda to this thread, but I'll just note some changes I'm making to the man page (which I'd like you to review -- please see below), and note a few other points. Andrew, you asked about what happens for x86 with the -1 to -4095 return for other syscalls. At least two other syscalls suffer the same problem. From the fcntl(2) man page: BUGS A limitation of the Linux system call conventions on some architectures (notably x86) means that if a (negative) process group ID to be returned by F_GETOWN falls in the range -1 to -4095, then the return value is wrongly interpreted by glibc as an error in the system call; that is, the return value of fcntl() will be -1, and errno will contain the (positive) process group ID. Some testing just now shows me that lseek() on /dev/mem suffers similar problems when seeking to bytes 0xf001 through to 0x. Ulrich Drepper wrote: Chris Friesen wrote: A possible remedy is to return the ticks since process start time, which delays the wrap around much further. POSIX only demands consistency within the same process. This would be an interesting solution. The man page for linux states that the return code is time since system boot, so that could realistically be expected to correlate between different processes. The Linux man page is documenting existing functionality on top of what the standard requires. Programmers should ever only require what the standard guarantees. I am perfectly willing to support a solution where the time is measured from process startup time. The only code using times() I found is cross-platform and most likely does not depend on the value returned is usable in isolation (only in a difference). Did I miss anything? Is anyone actually working on a solution along these lines? In the meantime, for man-pages-2.74, I've reworked the description of the return value: RETURN VALUE times() returns the number of clock ticks that have elapsed since an arbitrary point in the past. The return value may overflow the possible range of type clock_t. On error, (clock_t) -1 is returned, and errno is set appropriately. I moved the Linux specific details of the return value to NOTES, and added a warning about relying on those details: NOTES ... On Linux, the arbitrary point in the past from which the return value of times() is measured has varied across kernel versions. On Linux 2.4 and earlier this point is the moment the system was booted. Since Linux 2.6, this point is (2^32/HZ) - 300 (i.e., about 429 million) sec- onds before system boot time. This variability across kernel versions (and across Unix implementations), com- bined with the fact that the returned value may overflow the range of clock_t, means that a portable application would be wise to avoid using this value. To measure changes in elapsed time, use gettimeofday(2) instead. Under BUGS I added: BUGS A limitation of the Linux system call conventions on some architectures (notably x86) means that on Linux 2.6 there is a small time window (41 seconds) soon after boot when times(2) can return -1, falsely indicating that an error occurred. The same problem can occur when the return value wraps passed the maximum value that can be stored in clockid_t. Look okay to you folks? Cheers, Michael -- Michael Kerrisk Maintainer of the Linux man-pages project http://www.kernel.org/doc/man-pages/ Want to report a man-pages bug? Look here: http://www.kernel.org/doc/man-pages/reporting_bugs.html -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
From: Michael Kerrisk [EMAIL PROTECTED] Date: Thu, 20 Dec 2007 12:36:52 +0100 Some testing just now shows me that lseek() on /dev/mem suffers similar problems when seeking to bytes 0xf001 through to 0x. Only on x86 platforms. Sparc, IA64, MIPS, powerpc, etc. all get this case right. Yes it's another unfortunate side effect of how error status is indicated for x86 system calls. I would suggest, that we put something in place to fix this in the long term: 1) Start setting the condition codes properly to indicate error in the system call return path on x86 like other platforms do now. Make sure that it tips off on force_successful_syscall_return(), as needed. 2) Come up with a transition plan for glibc et al. to take advantage of this. It actually sounds like the kind of problem that could be solved well using the VDSO page. :-) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Chris Friesen wrote: >> A possible remedy is to return the ticks since process start time, which >> delays the wrap around much further. POSIX only demands consistency >> within the same process. > > This would be an interesting solution. > > The man page for linux states that the return code is time since system > boot, so that could realistically be expected to correlate between > different processes. The Linux man page is documenting existing functionality on top of what the standard requires. Programmers should ever only require what the standard guarantees. I am perfectly willing to support a solution where the time is measured from process startup time. The only code using times() I found is cross-platform and most likely does not depend on the value returned is usable in isolation (only in a difference). - -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) iD8DBQFHNKTZ2ijCOnn/RHQRAv2wAJsHOnWRrbE2N2Z4R35bsU1+BIZEGQCguaxL zY9f4XEhJnAoNF5jFxm76qI= =0nsU -END PGP SIGNATURE- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Chris Friesen wrote: A possible remedy is to return the ticks since process start time, which delays the wrap around much further. POSIX only demands consistency within the same process. This would be an interesting solution. The man page for linux states that the return code is time since system boot, so that could realistically be expected to correlate between different processes. The Linux man page is documenting existing functionality on top of what the standard requires. Programmers should ever only require what the standard guarantees. I am perfectly willing to support a solution where the time is measured from process startup time. The only code using times() I found is cross-platform and most likely does not depend on the value returned is usable in isolation (only in a difference). - -- ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖ -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.7 (GNU/Linux) iD8DBQFHNKTZ2ijCOnn/RHQRAv2wAJsHOnWRrbE2N2Z4R35bsU1+BIZEGQCguaxL zY9f4XEhJnAoNF5jFxm76qI= =0nsU -END PGP SIGNATURE- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
On Thursday 08 November 2007 03:07, Andrew Morton wrote: > > On Thu, 8 Nov 2007 12:53:57 +1100 Paul Mackerras <[EMAIL PROTECTED]> wrote: > > Andrew Morton writes: > > > > > Given all this stuff, the return value from sys_times() doesn't seem a > > > particularly useful or reliable kernel interface. > > > > I think the best thing would be to ignore any error from copy_to_user > > and always return the number of clock ticks. We should call > > force_successful_syscall_return, and glibc on x86 should be taught not > > to interpret negative values as an error. > > Changing glibc might be hard ;) > > > POSIX doesn't require us to return an EFAULT error if the buf argument > > is bogus. If userspace does supply a bogus buf pointer, then either > > it will dereference it itself and get a segfault, or it won't > > dereference it, in which case it obviously didn't care about the > > values we tried to put there. > > > > If we try to return an error under some circumstances, then there is > > at least one 32-bit value for the number of ticks that will cause > > confusion. We can either change that value (or values) to some other > > value, which seems pretty bogus, or we can just decide not to return > > any errors. The latter seems to me to have no significant downside > > and to be the simplest solution to the problem. > > "the latter" is what my protopatch does isn't it? It wraps at 0x7fff. > It appears that glibc treats all of 0x8000-0x as an error. The best solution is to change the kernel to never return an error and to change glibc to never treat return as an error. -- vda - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
On Thursday 08 November 2007 02:09, David Miller wrote: > > I think the best thing would be to ignore any error from copy_to_user > > and always return the number of clock ticks. We should call > > force_successful_syscall_return, and glibc on x86 should be taught not > > to interpret negative values as an error. > > > > POSIX doesn't require us to return an EFAULT error if the buf argument > > is bogus. If userspace does supply a bogus buf pointer, then either > > it will dereference it itself and get a segfault, or it won't > > dereference it, in which case it obviously didn't care about the > > values we tried to put there. > > > > If we try to return an error under some circumstances, then there is > > at least one 32-bit value for the number of ticks that will cause > > confusion. We can either change that value (or values) to some other > > value, which seems pretty bogus, or we can just decide not to return > > any errors. The latter seems to me to have no significant downside > > and to be the simplest solution to the problem. > > I agree with this analysis. > > The Linux man page for times() explicitly lists (clock_t) -1 as a > return value meaning error. > > So even if we did make some effort to return errors "properly" (via > force_successful_syscall_return() et al.) userspace would still be > screwed because (clock_t) -1 would be interpreted as an error. > > Actually I think this basically proves we cannot return (clock_t) -1 > ever because all existing userland (I'm not talking about inside > glibc, I'm talking about inside of applications) will see this as an > error. What error? I'd argue it's perfectly sane for application to assume that times() never fails. struct tms t; clock_t start = times(); ... clock_t end = times(); clock_t delta = end - start; The only error form kernel POV is that passed pointer can be invalid. But from application POV in the above example it cannot be true and if (start == -1) error("error in times!"); would be and exercise in wasting CPU cycles, producing dead code and feeding one's paranoia. > User applications have no other way to check for error. And in all realistic scenarios it doesn't need to. In this particular case, it makes sense to ignore standards and never return an error. If user indeed passed invalid pointer, just don't store anything there, but still return valid value. -- vda - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
Andreas Schwab wrote: A possible remedy is to return the ticks since process start time, which delays the wrap around much further. POSIX only demands consistency within the same process. This would be an interesting solution. The man page for linux states that the return code is time since system boot, so that could realistically be expected to correlate between different processes. Could we get away with changing the man page and breaking any apps relying on this previously-documented behaviour? Chris - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
David Miller <[EMAIL PROTECTED]> writes: > I agree with this analysis. > > The Linux man page for times() explicitly lists (clock_t) -1 as a > return value meaning error. > > So even if we did make some effort to return errors "properly" (via > force_successful_syscall_return() et al.) userspace would still be > screwed because (clock_t) -1 would be interpreted as an error. > > Actually I think this basically proves we cannot return (clock_t) -1 > ever because all existing userland (I'm not talking about inside > glibc, I'm talking about inside of applications) will see this as an > error. > > User applications have no other way to check for error. > > This API is definitely very poorly designed, no matter which way we > "fix" this some case will remain broken. A possible remedy is to return the ticks since process start time, which delays the wrap around much further. POSIX only demands consistency within the same process. Andreas. -- Andreas Schwab, SuSE Labs, [EMAIL PROTECTED] SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
David Miller writes: > On x86 only. We could use force_successful_syscall_return() > to make sure the condition codes get set correctly on > other platforms. > > But even in that case we'd still be broken when the return > value is exactly -1 and that's what the application is going > to compare against to test for errors. We could special-case that and turn it into 0. That would introduce a 0.01 second blip, which would be better than a 41 second window for bad behaviour like we have at the moment. It's also possible that many applications already don't check for errors. For example, glibc deliberately doesn't check for errors when it calls __times in the clock() implementation. There is a comment in sysdeps/unix/sysv/linux/clock.c that says this: /* We don't check for errors here. The only error the kernel returns is EFAULT if the value cannot be written to the struct we pass a pointer to. Otherwise the kernel returns an `unsigned long' value which is the number of jiffies since system start. But this number can be negative (when read as `long') when the system is up for some time. Ignoring errors should therefore have no negative impacts but solve the problem. */ __times (); Paul. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
David Miller writes: On x86 only. We could use force_successful_syscall_return() to make sure the condition codes get set correctly on other platforms. But even in that case we'd still be broken when the return value is exactly -1 and that's what the application is going to compare against to test for errors. We could special-case that and turn it into 0. That would introduce a 0.01 second blip, which would be better than a 41 second window for bad behaviour like we have at the moment. It's also possible that many applications already don't check for errors. For example, glibc deliberately doesn't check for errors when it calls __times in the clock() implementation. There is a comment in sysdeps/unix/sysv/linux/clock.c that says this: /* We don't check for errors here. The only error the kernel returns is EFAULT if the value cannot be written to the struct we pass a pointer to. Otherwise the kernel returns an `unsigned long' value which is the number of jiffies since system start. But this number can be negative (when read as `long') when the system is up for some time. Ignoring errors should therefore have no negative impacts but solve the problem. */ __times (buf); Paul. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
David Miller [EMAIL PROTECTED] writes: I agree with this analysis. The Linux man page for times() explicitly lists (clock_t) -1 as a return value meaning error. So even if we did make some effort to return errors properly (via force_successful_syscall_return() et al.) userspace would still be screwed because (clock_t) -1 would be interpreted as an error. Actually I think this basically proves we cannot return (clock_t) -1 ever because all existing userland (I'm not talking about inside glibc, I'm talking about inside of applications) will see this as an error. User applications have no other way to check for error. This API is definitely very poorly designed, no matter which way we fix this some case will remain broken. A possible remedy is to return the ticks since process start time, which delays the wrap around much further. POSIX only demands consistency within the same process. Andreas. -- Andreas Schwab, SuSE Labs, [EMAIL PROTECTED] SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
Andreas Schwab wrote: A possible remedy is to return the ticks since process start time, which delays the wrap around much further. POSIX only demands consistency within the same process. This would be an interesting solution. The man page for linux states that the return code is time since system boot, so that could realistically be expected to correlate between different processes. Could we get away with changing the man page and breaking any apps relying on this previously-documented behaviour? Chris - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
On Thursday 08 November 2007 02:09, David Miller wrote: I think the best thing would be to ignore any error from copy_to_user and always return the number of clock ticks. We should call force_successful_syscall_return, and glibc on x86 should be taught not to interpret negative values as an error. POSIX doesn't require us to return an EFAULT error if the buf argument is bogus. If userspace does supply a bogus buf pointer, then either it will dereference it itself and get a segfault, or it won't dereference it, in which case it obviously didn't care about the values we tried to put there. If we try to return an error under some circumstances, then there is at least one 32-bit value for the number of ticks that will cause confusion. We can either change that value (or values) to some other value, which seems pretty bogus, or we can just decide not to return any errors. The latter seems to me to have no significant downside and to be the simplest solution to the problem. I agree with this analysis. The Linux man page for times() explicitly lists (clock_t) -1 as a return value meaning error. So even if we did make some effort to return errors properly (via force_successful_syscall_return() et al.) userspace would still be screwed because (clock_t) -1 would be interpreted as an error. Actually I think this basically proves we cannot return (clock_t) -1 ever because all existing userland (I'm not talking about inside glibc, I'm talking about inside of applications) will see this as an error. What error? I'd argue it's perfectly sane for application to assume that times() never fails. struct tms t; clock_t start = times(t); ... clock_t end = times(t); clock_t delta = end - start; The only error form kernel POV is that passed pointer can be invalid. But from application POV in the above example it cannot be true and if (start == -1) error(error in times!); would be and exercise in wasting CPU cycles, producing dead code and feeding one's paranoia. User applications have no other way to check for error. And in all realistic scenarios it doesn't need to. In this particular case, it makes sense to ignore standards and never return an error. If user indeed passed invalid pointer, just don't store anything there, but still return valid value. -- vda - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
On Thursday 08 November 2007 03:07, Andrew Morton wrote: On Thu, 8 Nov 2007 12:53:57 +1100 Paul Mackerras [EMAIL PROTECTED] wrote: Andrew Morton writes: Given all this stuff, the return value from sys_times() doesn't seem a particularly useful or reliable kernel interface. I think the best thing would be to ignore any error from copy_to_user and always return the number of clock ticks. We should call force_successful_syscall_return, and glibc on x86 should be taught not to interpret negative values as an error. Changing glibc might be hard ;) POSIX doesn't require us to return an EFAULT error if the buf argument is bogus. If userspace does supply a bogus buf pointer, then either it will dereference it itself and get a segfault, or it won't dereference it, in which case it obviously didn't care about the values we tried to put there. If we try to return an error under some circumstances, then there is at least one 32-bit value for the number of ticks that will cause confusion. We can either change that value (or values) to some other value, which seems pretty bogus, or we can just decide not to return any errors. The latter seems to me to have no significant downside and to be the simplest solution to the problem. the latter is what my protopatch does isn't it? It wraps at 0x7fff. It appears that glibc treats all of 0x8000-0x as an error. The best solution is to change the kernel to never return an error and to change glibc to never treat return as an error. -- vda - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
From: Andrew Morton <[EMAIL PROTECTED]> Date: Wed, 7 Nov 2007 23:09:16 -0800 > I don't think that's a big problem? This syscall can (oddly) return any > 32-bit (64-bit) number and a smart application developer (after saying wtf) > would realise that he just can't check for errors and have correctly > working code. > > Then again, if he was smart he just wouldn't use times(2)'s return value > for anything. But what is the alternative? I don't think there is one, > apart from much saner things like gettimeofday(). You and I would say "wtf", but the manual states what it does: On error, (clock_t) -1 is returned, and errno is set appro- priately. And I think this (obviously bogus) convention is something we are really stuck with. Another awful aspect of this is that glibc is going to overwrite 'errno' for this return value range. That will likely cause more application misbehavior than some of the other side effects we've been discussing. In short we have two problems: 1) glibc thinks -4096 < x < 0 is an error, and will write this value into errno and return -1 to the application 2) the manual states that -1 means error - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
> On Wed, 07 Nov 2007 22:25:30 -0800 (PST) David Miller <[EMAIL PROTECTED]> > wrote: > From: Andrew Morton <[EMAIL PROTECTED]> > Date: Wed, 7 Nov 2007 21:20:05 -0800 > > > Yup. But userspace will already have a fit if either the start or end time > > advanced into the glibc-thought-that-was-an-error range. > > On x86 only. We could use force_successful_syscall_return() > to make sure the condition codes get set correctly on > other platforms. > > But even in that case we'd still be broken when the return > value is exactly -1 and that's what the application is going > to compare against to test for errors. I don't think that's a big problem? This syscall can (oddly) return any 32-bit (64-bit) number and a smart application developer (after saying wtf) would realise that he just can't check for errors and have correctly working code. Then again, if he was smart he just wouldn't use times(2)'s return value for anything. But what is the alternative? I don't think there is one, apart from much saner things like gettimeofday(). - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
From: Andrew Morton <[EMAIL PROTECTED]> Date: Wed, 7 Nov 2007 21:20:05 -0800 > Yup. But userspace will already have a fit if either the start or end time > advanced into the glibc-thought-that-was-an-error range. On x86 only. We could use force_successful_syscall_return() to make sure the condition codes get set correctly on other platforms. But even in that case we'd still be broken when the return value is exactly -1 and that's what the application is going to compare against to test for errors. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
From: Paul Mackerras <[EMAIL PROTECTED]> Date: Thu, 8 Nov 2007 16:15:51 +1100 > David Miller writes: > > > I can't see where x86 is doing this though, so perhaps for x86 > > glibc does make the negative value check. But I doubt it is > > checking the range 0x8000-0x, otherwise mmap() would > > be busted. > > At least for the INTERNAL_SYSCALL macro in glibc, the error check is: > > #define INTERNAL_SYSCALL_ERROR_P(val, err) \ > ((unsigned int) (val) >= 0xf001u) > > in sysdeps/unix/sysv/linux/i386/sysdep.h. Similarly the PSEUDO macro > in that file does a cmpl $-4095,%eax to test for error. (There is also > a PSEUDO_NOERRNO which doesn't test for error.) > > So the convention on (32-bit) x86 is that -4095 .. -1 are error > values, and other values are successful return values. Thanks for figuring that out. Really there is no way to fix sys_times() return values universally. Each proposed solution either doesn't fix the problem, or adds a new failure mode. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
From: Paul Mackerras <[EMAIL PROTECTED]> Date: Thu, 8 Nov 2007 15:59:12 +1100 > Not on powerpc. On powerpc the error indication is carried separately > in a condition register bit. So a force_successful_syscall_return() > call will make glibc automatically do the right thing without any > glibc changes on powerpc. It still won't fix the problem. When the return value is (clock_t) -1, all the force_successful_syscall_return() calls and glibc condition codes checks in the world are not going to fix the application code which checks for error using -1. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
> On Thu, 8 Nov 2007 16:36:08 +1100 Paul Mackerras <[EMAIL PROTECTED]> wrote: > Andrew Morton writes: > > > Yup. But userspace will already have a fit if either the start or end time > > advanced into the glibc-thought-that-was-an-error range. > > Not nearly as much of a fit. The effect on x86 is that values between > -4095 and -1 are reported as -1, so the end-start difference will be > out by less than 41 seconds. That's not nearly as dramatic as a > difference of 21 million seconds (over 16 years). :) > > I really think that wrapping at 0x7fff makes the situation worse, > not better. > Sure. So we need to do what you say: never return an error from sys_times() and change glibc to not perform error-interpretation on sys_times() return values and recommend that people bypass libc and go direct to the syscall so they'll work correctly on older glibc. Lovely. I wonder what happens with things like F_GETOWN, shmat() and lseek(/dev/mem) on x86 (things which use force_successful_syscall_return()). According to the comment in include/linux/ptrace.h, glibc should be special-casing these. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
On Wed, Nov 07, 2007 at 03:28:33PM -0800, Andrew Morton wrote: On Wed, 7 Nov 2007 14:47:22 -0800 David Brown <[EMAIL PROTECTED]> wrote: will return '-1' to user space and set the negated clock_t value to errno. At minimum, perhaps it should return a sane errno value. RETURN VALUE times() returns the number of clock ticks that have elapsed since an arbitrary point in the past. For Linux 2.4 and earlier this point is the moment the system was booted. Since Linux 2.6, this point is (2^32/HZ) - 300 (i.e., about 429 million) seconds before system boot time. The return value may overflow the possible range of type clock_t. On error, (clock_t) -1 is returned, and errno is set appro- priately. The strange -1 behavior is enshrined in history. I think a better answer is to tell people to use getrusage() if they want a return result without this problem. Adding INITIAL_JIFFIES will fix the case where an embedded system is booted up to run a test and then shut down, and the mask, although it causes discontinuities periodically at least moves them away from the early boot. INITIAL_JIFFIES was a good idea, but it is probably best to keep it inside of the kernel. David Brown - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
Andrew Morton writes: > Yup. But userspace will already have a fit if either the start or end time > advanced into the glibc-thought-that-was-an-error range. Not nearly as much of a fit. The effect on x86 is that values between -4095 and -1 are reported as -1, so the end-start difference will be out by less than 41 seconds. That's not nearly as dramatic as a difference of 21 million seconds (over 16 years). :) I really think that wrapping at 0x7fff makes the situation worse, not better. Paul. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
> On Thu, 8 Nov 2007 15:59:12 +1100 Paul Mackerras <[EMAIL PROTECTED]> wrote: > Andrew Morton writes: > > > "the latter" is what my protopatch does isn't it? It wraps at 0x7fff. > > It appears that glibc treats all of 0x8000-0x as an error. > > Not on powerpc. On powerpc the error indication is carried separately > in a condition register bit. So a force_successful_syscall_return() > call will make glibc automatically do the right thing without any > glibc changes on powerpc. OK > Wrapping at 0x7fff will cause programs to see large negative > deltas between successive calls when the wrap occurs. I can see that > giving userspace fits. :) > Yup. But userspace will already have a fit if either the start or end time advanced into the glibc-thought-that-was-an-error range. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
David Miller writes: > I can't see where x86 is doing this though, so perhaps for x86 > glibc does make the negative value check. But I doubt it is > checking the range 0x8000-0x, otherwise mmap() would > be busted. At least for the INTERNAL_SYSCALL macro in glibc, the error check is: #define INTERNAL_SYSCALL_ERROR_P(val, err) \ ((unsigned int) (val) >= 0xf001u) in sysdeps/unix/sysv/linux/i386/sysdep.h. Similarly the PSEUDO macro in that file does a cmpl $-4095,%eax to test for error. (There is also a PSEUDO_NOERRNO which doesn't test for error.) So the convention on (32-bit) x86 is that -4095 .. -1 are error values, and other values are successful return values. Paul. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
Andrew Morton writes: > "the latter" is what my protopatch does isn't it? It wraps at 0x7fff. > It appears that glibc treats all of 0x8000-0x as an error. Not on powerpc. On powerpc the error indication is carried separately in a condition register bit. So a force_successful_syscall_return() call will make glibc automatically do the right thing without any glibc changes on powerpc. Wrapping at 0x7fff will cause programs to see large negative deltas between successive calls when the wrap occurs. I can see that giving userspace fits. :) Paul. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
From: Andrew Morton <[EMAIL PROTECTED]> Date: Wed, 7 Nov 2007 19:07:14 -0800 > It appears that glibc treats all of 0x8000-0x as an > error. glibc treats it as an error if the system call returns with the carry condition code set. At least that's how I've understood it to work and at a minimum this is how it works on sparc, ppc, ia64, mips, etc. The error indication is being created by the system call return path in the kernel. It tests for values between -512 and 0, and marks those as errors unless force_successful_syscall() has been called. I can't see where x86 is doing this though, so perhaps for x86 glibc does make the negative value check. But I doubt it is checking the range 0x8000-0x, otherwise mmap() would be busted. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
> On Thu, 8 Nov 2007 12:53:57 +1100 Paul Mackerras <[EMAIL PROTECTED]> wrote: > Andrew Morton writes: > > > Given all this stuff, the return value from sys_times() doesn't seem a > > particularly useful or reliable kernel interface. > > I think the best thing would be to ignore any error from copy_to_user > and always return the number of clock ticks. We should call > force_successful_syscall_return, and glibc on x86 should be taught not > to interpret negative values as an error. Changing glibc might be hard ;) > POSIX doesn't require us to return an EFAULT error if the buf argument > is bogus. If userspace does supply a bogus buf pointer, then either > it will dereference it itself and get a segfault, or it won't > dereference it, in which case it obviously didn't care about the > values we tried to put there. > > If we try to return an error under some circumstances, then there is > at least one 32-bit value for the number of ticks that will cause > confusion. We can either change that value (or values) to some other > value, which seems pretty bogus, or we can just decide not to return > any errors. The latter seems to me to have no significant downside > and to be the simplest solution to the problem. "the latter" is what my protopatch does isn't it? It wraps at 0x7fff. It appears that glibc treats all of 0x8000-0x as an error. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
From: Paul Mackerras <[EMAIL PROTECTED]> Date: Thu, 8 Nov 2007 12:53:57 +1100 > Andrew Morton writes: > > > Given all this stuff, the return value from sys_times() doesn't seem a > > particularly useful or reliable kernel interface. > > I think the best thing would be to ignore any error from copy_to_user > and always return the number of clock ticks. We should call > force_successful_syscall_return, and glibc on x86 should be taught not > to interpret negative values as an error. > > POSIX doesn't require us to return an EFAULT error if the buf argument > is bogus. If userspace does supply a bogus buf pointer, then either > it will dereference it itself and get a segfault, or it won't > dereference it, in which case it obviously didn't care about the > values we tried to put there. > > If we try to return an error under some circumstances, then there is > at least one 32-bit value for the number of ticks that will cause > confusion. We can either change that value (or values) to some other > value, which seems pretty bogus, or we can just decide not to return > any errors. The latter seems to me to have no significant downside > and to be the simplest solution to the problem. I agree with this analysis. The Linux man page for times() explicitly lists (clock_t) -1 as a return value meaning error. So even if we did make some effort to return errors "properly" (via force_successful_syscall_return() et al.) userspace would still be screwed because (clock_t) -1 would be interpreted as an error. Actually I think this basically proves we cannot return (clock_t) -1 ever because all existing userland (I'm not talking about inside glibc, I'm talking about inside of applications) will see this as an error. User applications have no other way to check for error. This API is definitely very poorly designed, no matter which way we "fix" this some case will remain broken. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
Andrew Morton writes: > Given all this stuff, the return value from sys_times() doesn't seem a > particularly useful or reliable kernel interface. I think the best thing would be to ignore any error from copy_to_user and always return the number of clock ticks. We should call force_successful_syscall_return, and glibc on x86 should be taught not to interpret negative values as an error. POSIX doesn't require us to return an EFAULT error if the buf argument is bogus. If userspace does supply a bogus buf pointer, then either it will dereference it itself and get a segfault, or it won't dereference it, in which case it obviously didn't care about the values we tried to put there. If we try to return an error under some circumstances, then there is at least one 32-bit value for the number of ticks that will cause confusion. We can either change that value (or values) to some other value, which seems pretty bogus, or we can just decide not to return any errors. The latter seems to me to have no significant downside and to be the simplest solution to the problem. Paul. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
> On Thu, 08 Nov 2007 01:54:40 +0100 Andreas Schwab <[EMAIL PROTECTED]> wrote: > Andrew Morton <[EMAIL PROTECTED]> writes: > > > diff -puN kernel/compat.c~a kernel/compat.c > > --- a/kernel/compat.c~a > > +++ a/kernel/compat.c > > @@ -162,7 +162,8 @@ asmlinkage long compat_sys_times(struct > > if (copy_to_user(tbuf, , sizeof(tmp))) > > return -EFAULT; > > } > > - return compat_jiffies_to_clock_t(jiffies); > > + return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) & > > + LONG_MAX); > > Are you sure you want LONG_MAX here, not 0x7fff? > I'm not sure of anything - I'm just trolling ;) That's 0x7fff for architectures which implement this function. I think that lines up correctly with jiffies and the return value from compat_sys_times(). Perhaps formally it should be USERSPACE_CLOCK_T_MAX, but we don't have that. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
> On Wed, 07 Nov 2007 16:50:22 -0800 (PST) David Miller <[EMAIL PROTECTED]> > wrote: > From: Andrew Morton <[EMAIL PROTECTED]> > Date: Wed, 7 Nov 2007 15:28:33 -0800 > > > Perhaps this is a bug in glibc: it is interpreting the times() return value > > in the same way as other syscalls. > > The problem is more likely that we are failing to > invoke force_successful_syscall_return() here. > > Otherwise the syscall return path interprets negative > values as errors, and sets the cpu condition codes. > > And that is what userspace is actually checking for > to determine if there is an error or not. hm, I'd forgotten about that. It seems to be a no-op on lots of architectures? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
Andrew Morton <[EMAIL PROTECTED]> writes: > diff -puN kernel/compat.c~a kernel/compat.c > --- a/kernel/compat.c~a > +++ a/kernel/compat.c > @@ -162,7 +162,8 @@ asmlinkage long compat_sys_times(struct > if (copy_to_user(tbuf, , sizeof(tmp))) > return -EFAULT; > } > - return compat_jiffies_to_clock_t(jiffies); > + return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) & > + LONG_MAX); Are you sure you want LONG_MAX here, not 0x7fff? Andreas. -- Andreas Schwab, SuSE Labs, [EMAIL PROTECTED] SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
From: Andrew Morton <[EMAIL PROTECTED]> Date: Wed, 7 Nov 2007 15:28:33 -0800 > Perhaps this is a bug in glibc: it is interpreting the times() return value > in the same way as other syscalls. The problem is more likely that we are failing to invoke force_successful_syscall_return() here. Otherwise the syscall return path interprets negative values as errors, and sets the cpu condition codes. And that is what userspace is actually checking for to determine if there is an error or not. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
> On Wed, 7 Nov 2007 15:28:33 -0800 Andrew Morton <[EMAIL PROTECTED]> wrote: > > On Wed, 7 Nov 2007 14:47:22 -0800 David Brown <[EMAIL PROTECTED]> wrote: > > compat_sys_times() has bogus return until jiffies is >= 0. I discovered > > this running LTP within 5 minutes of booting. > > > > The return result > > > > return compat_jiffies_to_clock_t(jiffies); > > > > will return '-1' to user space and set the negated clock_t value to errno. > > > > I'm not sure what the correct fix for this is. I can come up with a patch > > if anyone has ideas on how to fix it. > > > > At minimum, perhaps it should return a sane errno value. > > RETURN VALUE >times() returns the number of clock ticks that have elapsed since an >arbitrary point in the past. For Linux 2.4 and earlier this point is >the moment the system was booted. Since Linux 2.6, this point is >(2^32/HZ) - 300 (i.e., about 429 million) seconds before system boot >time. The return value may overflow the possible range of type >clock_t. On error, (clock_t) -1 is returned, and errno is set appro- >priately. > > > Perhaps this is a bug in glibc: it is interpreting the times() return value > in the same way as other syscalls. > > It would have been sensible for us to add INITIAL_JIFFIES to the value > instead of exposing this kernel-only detail to the world, although the > problem will of course reoccur once jiffies hits 0x8000. Unfortunately > we've even gone and enshrined this bogon in the manpage. > > Proposed fix: > > -return compat_jiffies_to_clock_t(jiffies); > +return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) & > + 0x7fff); > > ? Like this? It gets messy. From: Andrew Morton <[EMAIL PROTECTED]> David Brown points out that compat_sys_times() (and sys_times()) can return arbitrary 32-bit (or 64-bit values). If these happen to be negative (jiffy wrap, or before INITIAL_JIFFIES) then libc will interpret this as an error and will return -1 to the libc user and will set errno. The manpage for times(2) says: times() returns the number of clock ticks that have elapsed since an arbitrary point in the past. For Linux 2.4 and earlier this point is the moment the system was booted. Since Linux 2.6, this point is (2^32/HZ) - 300 (i.e., about 429 million) seconds before system boot time. The return value may overflow the possible range of type clock_t. On error, (clock_t) -1 is returned, and errno is set appro- priately. We can fix this by masking the return value down to a 31-bit (63-bit) value. Also, let's correct for INTIAL_JIFFIES - this isn't a detail which should be exposed to userspace. Unfortunately this change can break userspace. If a program was (correctly) doing: unsigned long start = times(...); ... unsigned long end = times(...); unsigned long delta = end - start; then `delta' can be grossly wrong if we wrapped in the interval. Instead userspace will need to mask `delta' by 0x7fff (0x7fff) to get the correct number. But userspace was already busted in the presence of wraparound, due to glibc's convert-to-negative-one behaviour. Given all this stuff, the return value from sys_times() doesn't seem a particularly useful or reliable kernel interface. Cc: David Brown <[EMAIL PROTECTED]> Cc: Ulrich Drepper <[EMAIL PROTECTED]> Cc: Michael Kerrisk <[EMAIL PROTECTED]> Signed-off-by: Andrew Morton <[EMAIL PROTECTED]> --- kernel/compat.c |3 ++- kernel/sys.c|3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff -puN kernel/sys.c~a kernel/sys.c --- a/kernel/sys.c~a +++ a/kernel/sys.c @@ -897,7 +897,8 @@ asmlinkage long sys_times(struct tms __u if (copy_to_user(tbuf, , sizeof(struct tms))) return -EFAULT; } - return (long) jiffies_64_to_clock_t(get_jiffies_64()); + return jiffies_64_to_clock_t((get_jiffies_64() + INITIAL_JIFFIES) & + LONG_MAX); } /* diff -puN kernel/compat.c~a kernel/compat.c --- a/kernel/compat.c~a +++ a/kernel/compat.c @@ -162,7 +162,8 @@ asmlinkage long compat_sys_times(struct if (copy_to_user(tbuf, , sizeof(tmp))) return -EFAULT; } - return compat_jiffies_to_clock_t(jiffies); + return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) & + LONG_MAX); } /* _ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies >= 0.
> On Wed, 7 Nov 2007 14:47:22 -0800 David Brown <[EMAIL PROTECTED]> wrote: > compat_sys_times() has bogus return until jiffies is >= 0. I discovered > this running LTP within 5 minutes of booting. > > The return result > > return compat_jiffies_to_clock_t(jiffies); > > will return '-1' to user space and set the negated clock_t value to errno. > > I'm not sure what the correct fix for this is. I can come up with a patch > if anyone has ideas on how to fix it. > > At minimum, perhaps it should return a sane errno value. RETURN VALUE times() returns the number of clock ticks that have elapsed since an arbitrary point in the past. For Linux 2.4 and earlier this point is the moment the system was booted. Since Linux 2.6, this point is (2^32/HZ) - 300 (i.e., about 429 million) seconds before system boot time. The return value may overflow the possible range of type clock_t. On error, (clock_t) -1 is returned, and errno is set appro- priately. Perhaps this is a bug in glibc: it is interpreting the times() return value in the same way as other syscalls. It would have been sensible for us to add INITIAL_JIFFIES to the value instead of exposing this kernel-only detail to the world, although the problem will of course reoccur once jiffies hits 0x8000. Unfortunately we've even gone and enshrined this bogon in the manpage. Proposed fix: -return compat_jiffies_to_clock_t(jiffies); +return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) & + 0x7fff); ? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
compat_sys_times() bogus until jiffies >= 0.
compat_sys_times() has bogus return until jiffies is >= 0. I discovered this running LTP within 5 minutes of booting. The return result return compat_jiffies_to_clock_t(jiffies); will return '-1' to user space and set the negated clock_t value to errno. I'm not sure what the correct fix for this is. I can come up with a patch if anyone has ideas on how to fix it. At minimum, perhaps it should return a sane errno value. Thanks, David Brown - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
compat_sys_times() bogus until jiffies = 0.
compat_sys_times() has bogus return until jiffies is = 0. I discovered this running LTP within 5 minutes of booting. The return result return compat_jiffies_to_clock_t(jiffies); will return '-1' to user space and set the negated clock_t value to errno. I'm not sure what the correct fix for this is. I can come up with a patch if anyone has ideas on how to fix it. At minimum, perhaps it should return a sane errno value. Thanks, David Brown - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
On Wed, 7 Nov 2007 14:47:22 -0800 David Brown [EMAIL PROTECTED] wrote: compat_sys_times() has bogus return until jiffies is = 0. I discovered this running LTP within 5 minutes of booting. The return result return compat_jiffies_to_clock_t(jiffies); will return '-1' to user space and set the negated clock_t value to errno. I'm not sure what the correct fix for this is. I can come up with a patch if anyone has ideas on how to fix it. At minimum, perhaps it should return a sane errno value. RETURN VALUE times() returns the number of clock ticks that have elapsed since an arbitrary point in the past. For Linux 2.4 and earlier this point is the moment the system was booted. Since Linux 2.6, this point is (2^32/HZ) - 300 (i.e., about 429 million) seconds before system boot time. The return value may overflow the possible range of type clock_t. On error, (clock_t) -1 is returned, and errno is set appro- priately. Perhaps this is a bug in glibc: it is interpreting the times() return value in the same way as other syscalls. It would have been sensible for us to add INITIAL_JIFFIES to the value instead of exposing this kernel-only detail to the world, although the problem will of course reoccur once jiffies hits 0x8000. Unfortunately we've even gone and enshrined this bogon in the manpage. Proposed fix: -return compat_jiffies_to_clock_t(jiffies); +return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) + 0x7fff); ? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
On Thu, 8 Nov 2007 15:59:12 +1100 Paul Mackerras [EMAIL PROTECTED] wrote: Andrew Morton writes: the latter is what my protopatch does isn't it? It wraps at 0x7fff. It appears that glibc treats all of 0x8000-0x as an error. Not on powerpc. On powerpc the error indication is carried separately in a condition register bit. So a force_successful_syscall_return() call will make glibc automatically do the right thing without any glibc changes on powerpc. OK Wrapping at 0x7fff will cause programs to see large negative deltas between successive calls when the wrap occurs. I can see that giving userspace fits. :) Yup. But userspace will already have a fit if either the start or end time advanced into the glibc-thought-that-was-an-error range. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
Andrew Morton writes: Yup. But userspace will already have a fit if either the start or end time advanced into the glibc-thought-that-was-an-error range. Not nearly as much of a fit. The effect on x86 is that values between -4095 and -1 are reported as -1, so the end-start difference will be out by less than 41 seconds. That's not nearly as dramatic as a difference of 21 million seconds (over 16 years). :) I really think that wrapping at 0x7fff makes the situation worse, not better. Paul. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
From: Andrew Morton [EMAIL PROTECTED] Date: Wed, 7 Nov 2007 19:07:14 -0800 It appears that glibc treats all of 0x8000-0x as an error. glibc treats it as an error if the system call returns with the carry condition code set. At least that's how I've understood it to work and at a minimum this is how it works on sparc, ppc, ia64, mips, etc. The error indication is being created by the system call return path in the kernel. It tests for values between -512 and 0, and marks those as errors unless force_successful_syscall() has been called. I can't see where x86 is doing this though, so perhaps for x86 glibc does make the negative value check. But I doubt it is checking the range 0x8000-0x, otherwise mmap() would be busted. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
From: Paul Mackerras [EMAIL PROTECTED] Date: Thu, 8 Nov 2007 12:53:57 +1100 Andrew Morton writes: Given all this stuff, the return value from sys_times() doesn't seem a particularly useful or reliable kernel interface. I think the best thing would be to ignore any error from copy_to_user and always return the number of clock ticks. We should call force_successful_syscall_return, and glibc on x86 should be taught not to interpret negative values as an error. POSIX doesn't require us to return an EFAULT error if the buf argument is bogus. If userspace does supply a bogus buf pointer, then either it will dereference it itself and get a segfault, or it won't dereference it, in which case it obviously didn't care about the values we tried to put there. If we try to return an error under some circumstances, then there is at least one 32-bit value for the number of ticks that will cause confusion. We can either change that value (or values) to some other value, which seems pretty bogus, or we can just decide not to return any errors. The latter seems to me to have no significant downside and to be the simplest solution to the problem. I agree with this analysis. The Linux man page for times() explicitly lists (clock_t) -1 as a return value meaning error. So even if we did make some effort to return errors properly (via force_successful_syscall_return() et al.) userspace would still be screwed because (clock_t) -1 would be interpreted as an error. Actually I think this basically proves we cannot return (clock_t) -1 ever because all existing userland (I'm not talking about inside glibc, I'm talking about inside of applications) will see this as an error. User applications have no other way to check for error. This API is definitely very poorly designed, no matter which way we fix this some case will remain broken. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
On Wed, 07 Nov 2007 22:25:30 -0800 (PST) David Miller [EMAIL PROTECTED] wrote: From: Andrew Morton [EMAIL PROTECTED] Date: Wed, 7 Nov 2007 21:20:05 -0800 Yup. But userspace will already have a fit if either the start or end time advanced into the glibc-thought-that-was-an-error range. On x86 only. We could use force_successful_syscall_return() to make sure the condition codes get set correctly on other platforms. But even in that case we'd still be broken when the return value is exactly -1 and that's what the application is going to compare against to test for errors. I don't think that's a big problem? This syscall can (oddly) return any 32-bit (64-bit) number and a smart application developer (after saying wtf) would realise that he just can't check for errors and have correctly working code. Then again, if he was smart he just wouldn't use times(2)'s return value for anything. But what is the alternative? I don't think there is one, apart from much saner things like gettimeofday(). - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
From: Paul Mackerras [EMAIL PROTECTED] Date: Thu, 8 Nov 2007 16:15:51 +1100 David Miller writes: I can't see where x86 is doing this though, so perhaps for x86 glibc does make the negative value check. But I doubt it is checking the range 0x8000-0x, otherwise mmap() would be busted. At least for the INTERNAL_SYSCALL macro in glibc, the error check is: #define INTERNAL_SYSCALL_ERROR_P(val, err) \ ((unsigned int) (val) = 0xf001u) in sysdeps/unix/sysv/linux/i386/sysdep.h. Similarly the PSEUDO macro in that file does a cmpl $-4095,%eax to test for error. (There is also a PSEUDO_NOERRNO which doesn't test for error.) So the convention on (32-bit) x86 is that -4095 .. -1 are error values, and other values are successful return values. Thanks for figuring that out. Really there is no way to fix sys_times() return values universally. Each proposed solution either doesn't fix the problem, or adds a new failure mode. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
On Thu, 8 Nov 2007 16:36:08 +1100 Paul Mackerras [EMAIL PROTECTED] wrote: Andrew Morton writes: Yup. But userspace will already have a fit if either the start or end time advanced into the glibc-thought-that-was-an-error range. Not nearly as much of a fit. The effect on x86 is that values between -4095 and -1 are reported as -1, so the end-start difference will be out by less than 41 seconds. That's not nearly as dramatic as a difference of 21 million seconds (over 16 years). :) I really think that wrapping at 0x7fff makes the situation worse, not better. Sure. So we need to do what you say: never return an error from sys_times() and change glibc to not perform error-interpretation on sys_times() return values and recommend that people bypass libc and go direct to the syscall so they'll work correctly on older glibc. Lovely. I wonder what happens with things like F_GETOWN, shmat() and lseek(/dev/mem) on x86 (things which use force_successful_syscall_return()). According to the comment in include/linux/ptrace.h, glibc should be special-casing these. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
On Wed, 7 Nov 2007 15:28:33 -0800 Andrew Morton [EMAIL PROTECTED] wrote: On Wed, 7 Nov 2007 14:47:22 -0800 David Brown [EMAIL PROTECTED] wrote: compat_sys_times() has bogus return until jiffies is = 0. I discovered this running LTP within 5 minutes of booting. The return result return compat_jiffies_to_clock_t(jiffies); will return '-1' to user space and set the negated clock_t value to errno. I'm not sure what the correct fix for this is. I can come up with a patch if anyone has ideas on how to fix it. At minimum, perhaps it should return a sane errno value. RETURN VALUE times() returns the number of clock ticks that have elapsed since an arbitrary point in the past. For Linux 2.4 and earlier this point is the moment the system was booted. Since Linux 2.6, this point is (2^32/HZ) - 300 (i.e., about 429 million) seconds before system boot time. The return value may overflow the possible range of type clock_t. On error, (clock_t) -1 is returned, and errno is set appro- priately. Perhaps this is a bug in glibc: it is interpreting the times() return value in the same way as other syscalls. It would have been sensible for us to add INITIAL_JIFFIES to the value instead of exposing this kernel-only detail to the world, although the problem will of course reoccur once jiffies hits 0x8000. Unfortunately we've even gone and enshrined this bogon in the manpage. Proposed fix: -return compat_jiffies_to_clock_t(jiffies); +return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) + 0x7fff); ? Like this? It gets messy. From: Andrew Morton [EMAIL PROTECTED] David Brown points out that compat_sys_times() (and sys_times()) can return arbitrary 32-bit (or 64-bit values). If these happen to be negative (jiffy wrap, or before INITIAL_JIFFIES) then libc will interpret this as an error and will return -1 to the libc user and will set errno. The manpage for times(2) says: times() returns the number of clock ticks that have elapsed since an arbitrary point in the past. For Linux 2.4 and earlier this point is the moment the system was booted. Since Linux 2.6, this point is (2^32/HZ) - 300 (i.e., about 429 million) seconds before system boot time. The return value may overflow the possible range of type clock_t. On error, (clock_t) -1 is returned, and errno is set appro- priately. We can fix this by masking the return value down to a 31-bit (63-bit) value. Also, let's correct for INTIAL_JIFFIES - this isn't a detail which should be exposed to userspace. Unfortunately this change can break userspace. If a program was (correctly) doing: unsigned long start = times(...); ... unsigned long end = times(...); unsigned long delta = end - start; then `delta' can be grossly wrong if we wrapped in the interval. Instead userspace will need to mask `delta' by 0x7fff (0x7fff) to get the correct number. But userspace was already busted in the presence of wraparound, due to glibc's convert-to-negative-one behaviour. Given all this stuff, the return value from sys_times() doesn't seem a particularly useful or reliable kernel interface. Cc: David Brown [EMAIL PROTECTED] Cc: Ulrich Drepper [EMAIL PROTECTED] Cc: Michael Kerrisk [EMAIL PROTECTED] Signed-off-by: Andrew Morton [EMAIL PROTECTED] --- kernel/compat.c |3 ++- kernel/sys.c|3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff -puN kernel/sys.c~a kernel/sys.c --- a/kernel/sys.c~a +++ a/kernel/sys.c @@ -897,7 +897,8 @@ asmlinkage long sys_times(struct tms __u if (copy_to_user(tbuf, tmp, sizeof(struct tms))) return -EFAULT; } - return (long) jiffies_64_to_clock_t(get_jiffies_64()); + return jiffies_64_to_clock_t((get_jiffies_64() + INITIAL_JIFFIES) + LONG_MAX); } /* diff -puN kernel/compat.c~a kernel/compat.c --- a/kernel/compat.c~a +++ a/kernel/compat.c @@ -162,7 +162,8 @@ asmlinkage long compat_sys_times(struct if (copy_to_user(tbuf, tmp, sizeof(tmp))) return -EFAULT; } - return compat_jiffies_to_clock_t(jiffies); + return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) + LONG_MAX); } /* _ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
Andrew Morton writes: Given all this stuff, the return value from sys_times() doesn't seem a particularly useful or reliable kernel interface. I think the best thing would be to ignore any error from copy_to_user and always return the number of clock ticks. We should call force_successful_syscall_return, and glibc on x86 should be taught not to interpret negative values as an error. POSIX doesn't require us to return an EFAULT error if the buf argument is bogus. If userspace does supply a bogus buf pointer, then either it will dereference it itself and get a segfault, or it won't dereference it, in which case it obviously didn't care about the values we tried to put there. If we try to return an error under some circumstances, then there is at least one 32-bit value for the number of ticks that will cause confusion. We can either change that value (or values) to some other value, which seems pretty bogus, or we can just decide not to return any errors. The latter seems to me to have no significant downside and to be the simplest solution to the problem. Paul. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
From: Andrew Morton [EMAIL PROTECTED] Date: Wed, 7 Nov 2007 15:28:33 -0800 Perhaps this is a bug in glibc: it is interpreting the times() return value in the same way as other syscalls. The problem is more likely that we are failing to invoke force_successful_syscall_return() here. Otherwise the syscall return path interprets negative values as errors, and sets the cpu condition codes. And that is what userspace is actually checking for to determine if there is an error or not. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
Andrew Morton [EMAIL PROTECTED] writes: diff -puN kernel/compat.c~a kernel/compat.c --- a/kernel/compat.c~a +++ a/kernel/compat.c @@ -162,7 +162,8 @@ asmlinkage long compat_sys_times(struct if (copy_to_user(tbuf, tmp, sizeof(tmp))) return -EFAULT; } - return compat_jiffies_to_clock_t(jiffies); + return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) + LONG_MAX); Are you sure you want LONG_MAX here, not 0x7fff? Andreas. -- Andreas Schwab, SuSE Labs, [EMAIL PROTECTED] SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
On Wed, 07 Nov 2007 16:50:22 -0800 (PST) David Miller [EMAIL PROTECTED] wrote: From: Andrew Morton [EMAIL PROTECTED] Date: Wed, 7 Nov 2007 15:28:33 -0800 Perhaps this is a bug in glibc: it is interpreting the times() return value in the same way as other syscalls. The problem is more likely that we are failing to invoke force_successful_syscall_return() here. Otherwise the syscall return path interprets negative values as errors, and sets the cpu condition codes. And that is what userspace is actually checking for to determine if there is an error or not. hm, I'd forgotten about that. It seems to be a no-op on lots of architectures? - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
On Thu, 08 Nov 2007 01:54:40 +0100 Andreas Schwab [EMAIL PROTECTED] wrote: Andrew Morton [EMAIL PROTECTED] writes: diff -puN kernel/compat.c~a kernel/compat.c --- a/kernel/compat.c~a +++ a/kernel/compat.c @@ -162,7 +162,8 @@ asmlinkage long compat_sys_times(struct if (copy_to_user(tbuf, tmp, sizeof(tmp))) return -EFAULT; } - return compat_jiffies_to_clock_t(jiffies); + return compat_jiffies_to_clock_t((jiffies + INITIAL_JIFFIES) + LONG_MAX); Are you sure you want LONG_MAX here, not 0x7fff? I'm not sure of anything - I'm just trolling ;) That's 0x7fff for architectures which implement this function. I think that lines up correctly with jiffies and the return value from compat_sys_times(). Perhaps formally it should be USERSPACE_CLOCK_T_MAX, but we don't have that. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
On Wed, Nov 07, 2007 at 03:28:33PM -0800, Andrew Morton wrote: On Wed, 7 Nov 2007 14:47:22 -0800 David Brown [EMAIL PROTECTED] wrote: will return '-1' to user space and set the negated clock_t value to errno. At minimum, perhaps it should return a sane errno value. RETURN VALUE times() returns the number of clock ticks that have elapsed since an arbitrary point in the past. For Linux 2.4 and earlier this point is the moment the system was booted. Since Linux 2.6, this point is (2^32/HZ) - 300 (i.e., about 429 million) seconds before system boot time. The return value may overflow the possible range of type clock_t. On error, (clock_t) -1 is returned, and errno is set appro- priately. The strange -1 behavior is enshrined in history. I think a better answer is to tell people to use getrusage() if they want a return result without this problem. Adding INITIAL_JIFFIES will fix the case where an embedded system is booted up to run a test and then shut down, and the mask, although it causes discontinuities periodically at least moves them away from the early boot. INITIAL_JIFFIES was a good idea, but it is probably best to keep it inside of the kernel. David Brown - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
From: Paul Mackerras [EMAIL PROTECTED] Date: Thu, 8 Nov 2007 15:59:12 +1100 Not on powerpc. On powerpc the error indication is carried separately in a condition register bit. So a force_successful_syscall_return() call will make glibc automatically do the right thing without any glibc changes on powerpc. It still won't fix the problem. When the return value is (clock_t) -1, all the force_successful_syscall_return() calls and glibc condition codes checks in the world are not going to fix the application code which checks for error using -1. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
From: Andrew Morton [EMAIL PROTECTED] Date: Wed, 7 Nov 2007 21:20:05 -0800 Yup. But userspace will already have a fit if either the start or end time advanced into the glibc-thought-that-was-an-error range. On x86 only. We could use force_successful_syscall_return() to make sure the condition codes get set correctly on other platforms. But even in that case we'd still be broken when the return value is exactly -1 and that's what the application is going to compare against to test for errors. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
On Thu, 8 Nov 2007 12:53:57 +1100 Paul Mackerras [EMAIL PROTECTED] wrote: Andrew Morton writes: Given all this stuff, the return value from sys_times() doesn't seem a particularly useful or reliable kernel interface. I think the best thing would be to ignore any error from copy_to_user and always return the number of clock ticks. We should call force_successful_syscall_return, and glibc on x86 should be taught not to interpret negative values as an error. Changing glibc might be hard ;) POSIX doesn't require us to return an EFAULT error if the buf argument is bogus. If userspace does supply a bogus buf pointer, then either it will dereference it itself and get a segfault, or it won't dereference it, in which case it obviously didn't care about the values we tried to put there. If we try to return an error under some circumstances, then there is at least one 32-bit value for the number of ticks that will cause confusion. We can either change that value (or values) to some other value, which seems pretty bogus, or we can just decide not to return any errors. The latter seems to me to have no significant downside and to be the simplest solution to the problem. the latter is what my protopatch does isn't it? It wraps at 0x7fff. It appears that glibc treats all of 0x8000-0x as an error. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
From: Andrew Morton [EMAIL PROTECTED] Date: Wed, 7 Nov 2007 23:09:16 -0800 I don't think that's a big problem? This syscall can (oddly) return any 32-bit (64-bit) number and a smart application developer (after saying wtf) would realise that he just can't check for errors and have correctly working code. Then again, if he was smart he just wouldn't use times(2)'s return value for anything. But what is the alternative? I don't think there is one, apart from much saner things like gettimeofday(). You and I would say wtf, but the manual states what it does: On error, (clock_t) -1 is returned, and errno is set appro- priately. And I think this (obviously bogus) convention is something we are really stuck with. Another awful aspect of this is that glibc is going to overwrite 'errno' for this return value range. That will likely cause more application misbehavior than some of the other side effects we've been discussing. In short we have two problems: 1) glibc thinks -4096 x 0 is an error, and will write this value into errno and return -1 to the application 2) the manual states that -1 means error - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
Andrew Morton writes: the latter is what my protopatch does isn't it? It wraps at 0x7fff. It appears that glibc treats all of 0x8000-0x as an error. Not on powerpc. On powerpc the error indication is carried separately in a condition register bit. So a force_successful_syscall_return() call will make glibc automatically do the right thing without any glibc changes on powerpc. Wrapping at 0x7fff will cause programs to see large negative deltas between successive calls when the wrap occurs. I can see that giving userspace fits. :) Paul. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: compat_sys_times() bogus until jiffies = 0.
David Miller writes: I can't see where x86 is doing this though, so perhaps for x86 glibc does make the negative value check. But I doubt it is checking the range 0x8000-0x, otherwise mmap() would be busted. At least for the INTERNAL_SYSCALL macro in glibc, the error check is: #define INTERNAL_SYSCALL_ERROR_P(val, err) \ ((unsigned int) (val) = 0xf001u) in sysdeps/unix/sysv/linux/i386/sysdep.h. Similarly the PSEUDO macro in that file does a cmpl $-4095,%eax to test for error. (There is also a PSEUDO_NOERRNO which doesn't test for error.) So the convention on (32-bit) x86 is that -4095 .. -1 are error values, and other values are successful return values. Paul. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/