Re: svn commit: r236026 - in head/sys: amd64/linux32 compat/freebsd32 kern
On Mon, 28 May 2012, Robert N. M. Watson wrote: On 28 May 2012, at 09:36, Konstantin Belousov wrote: On Sun, May 27, 2012 at 07:49:36AM +1000, Bruce Evans wrote: On Sat, 26 May 2012, Konstantin Belousov wrote: On Sat, May 26, 2012 at 10:21:25PM +1000, Bruce Evans wrote: The 'low level' AKA magic happens in several *_fetch_syscall_args() functions. For both linux32 and freebsd32, the magic code automatically zero-extends the arguments into 64bit entities. Linux passes args in registers, while FreeBSD uses words on stack. Actually, the amd64 linux_fetch32_fetch_syscall_args() just copies from 64-bit registers frame->tf_r* to 64-bit sa->args[*]. I can't see how this gives anything except garbage in the top bits. Is there magic in the switch to 64-bit mode that sets the top bits? Anyway, sign extension would give garbage for unsigned args, and zero-extension would give garbage for negative signed args. Hardware zero-extends any register touched in the 32bit mode. In fact, please see r217991 for related bug. This may well be true on Intel, but is not true of MIPS -- which we probably don't care about currently for the purposes of Linux emulation, but maybe someday we will. On MIPS, 32-bit values are sign-extended rather than zero-extended. Please use the unix newline character in mail. The translation is MD so "this" (not the newline) isn't a problem. I see a somewhat complex thread here, but am not sure I quite understand the import for Capsicum. Is the 64-bit rights mask as part of system call arguments not working properly in compat32 scenarios? Or are there issues outside of the compat environment? Right now compat32 is not well-supported with Capsicum, but fixing that is quite important to productionising Capsicum. 64-bit args need special translation, and the rights mask doesn't have any. At best, the low 32 bits of it might work accidentally. 64-bit args in ia32 start as 2 32-bit ones. These get turned into 2 64-bit ones (zero extended). These must be combined (in non-automatically generated code) into a 32-bit one. The automatically-generated args struct promotes the 32-bit args to 64 bits so that it can be passed directly to native syscalls. This doesn't work if there are any specially specially-translated args. Then the whole args struct must be translated (by simple copying for 32-bit args) lseek() shows how this should be done: % #ifdef COMPAT_43 % int % ofreebsd32_lseek(struct thread *td, struct ofreebsd32_lseek_args *uap) % { % struct lseek_args nuap; % % nuap.fd = uap->fd; % nuap.offset = uap->offset; % nuap.whence = uap->whence; % return (sys_lseek(td, &nuap)); % } % #endif This one seems to be to work around olseek() being broken in kern/syscalls.master. Its offset arg is only 32 bits, so it should be possible to call olseek() directly, but the bug makes the args structs incompatible, so the above does a simple translation by copying args and calls the full native lseek. olseek() uses the same copying code, but never worked since it starts with a wrong args struct. Apparently no one uses olseek(), but this compat lseek() is used a bit so the bug was noticed. The args stucts are: % struct ofreebsd32_lseek_args { % char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)]; % char offset_l_[PADL_(int)]; int offset; char offset_r_[PADR_(int)]; % char whence_l_[PADL_(int)]; int whence; char whence_r_[PADR_(int)]; % }; The padding is to 64 bits. All the types are correct, although the type of `offset' is confusing. Its type is that of the pre-4.4BSD off_t, which was long, which was 32 bits on ia32 back then. This is now spelled as int, so that we can't easily see either of the 4 layers of translation in it (off_t = long -> int32_t back then (type pun) -> int32_t now (type pun) = int now. % struct olseek_args { % char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)]; % char offset_l_[PADL_(long)]; long offset; char offset_r_[PADR_(long)]; % char whence_l_[PADL_(int)]; int whence; char whence_r_[PADR_(int)]; % }; The type of `offset' is broken. It needs to be 32 bits since that is what it was on ia32 before 4.4BSD, as above. amd64 shouldn't support pre-4.4BSD, and making an olseek() call is close to impossible on it. However, on amd64 with ia32 support, you have to enable COMPAT_43 for the ofreebsd32_lseek() to be compiled, and this gives olseek() too, so you might as well use it after fixing it. If a native olseek() call is somehow made on amd64, then it would start with a 32-bit offset and this might be extended correctly in userland, and then the wrong `long offset' in the above would work. But there is no need to risk this. By declaring the offset as signed 32 bits, it will be extended correctly when it is loaded from the args struct. Note that the above translation `nuap.offset = uap->offset;' does a critical non-null currect extension and this requires the olseek() being declared
Re: svn commit: r236026 - in head/sys: amd64/linux32 compat/freebsd32 kern
On 28 May 2012, at 09:36, Konstantin Belousov wrote: > On Sun, May 27, 2012 at 07:49:36AM +1000, Bruce Evans wrote: >> On Sat, 26 May 2012, Konstantin Belousov wrote: >> >>> On Sat, May 26, 2012 at 10:21:25PM +1000, Bruce Evans wrote: >>> The 'low level' AKA magic happens in several *_fetch_syscall_args() >>> functions. For both linux32 and freebsd32, the magic code automatically >>> zero-extends the arguments into 64bit entities. Linux passes args in >>> registers, while FreeBSD uses words on stack. >> >> Actually, the amd64 linux_fetch32_fetch_syscall_args() just copies from >> 64-bit registers frame->tf_r* to 64-bit sa->args[*]. I can't see how >> this gives anything except garbage in the top bits. Is there magic in >> the switch to 64-bit mode that sets the top bits? Anyway, sign extension >> would give garbage for unsigned args, and zero-extension would give >> garbage for negative signed args. > Hardware zero-extends any register touched in the 32bit mode. > > In fact, please see r217991 for related bug. This may well be true on Intel, but is not true of MIPS -- which we probably don't care about currently for the purposes of Linux emulation, but maybe someday we will. On MIPS, 32-bit values are sign-extended rather than zero-extended. I see a somewhat complex thread here, but am not sure I quite understand the import for Capsicum. Is the 64-bit rights mask as part of system call arguments not working properly in compat32 scenarios? Or are there issues outside of the compat environment? Right now compat32 is not well-supported with Capsicum, but fixing that is quite important to productionising Capsicum. Robert___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r236026 - in head/sys: amd64/linux32 compat/freebsd32 kern
On Mon, 28 May 2012, Konstantin Belousov wrote: On Sun, May 27, 2012 at 07:49:36AM +1000, Bruce Evans wrote: On Sat, 26 May 2012, Konstantin Belousov wrote: On Sat, May 26, 2012 at 10:21:25PM +1000, Bruce Evans wrote: The 'low level' AKA magic happens in several *_fetch_syscall_args() functions. For both linux32 and freebsd32, the magic code automatically zero-extends the arguments into 64bit entities. Linux passes args in registers, while FreeBSD uses words on stack. Actually, the amd64 linux_fetch32_fetch_syscall_args() just copies from 64-bit registers frame->tf_r* to 64-bit sa->args[*]. I can't see how this gives anything except garbage in the top bits. Is there magic in the switch to 64-bit mode that sets the top bits? Anyway, sign extension would give garbage for unsigned args, and zero-extension would give garbage for negative signed args. Hardware zero-extends any register touched in the 32bit mode. So they have garbage extension when not touched? Or maybe the kernel extends them. In fact, please see r217991 for related bug. That seems to be the kernel extending them. I tested on a kernel built on 3 Mar 2012. It is much later than that, and shows nonzero extensions (about half of the wrong cases sign extensions). Bruce ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r236026 - in head/sys: amd64/linux32 compat/freebsd32 kern
On Sun, May 27, 2012 at 07:49:36AM +1000, Bruce Evans wrote: > On Sat, 26 May 2012, Konstantin Belousov wrote: > > >On Sat, May 26, 2012 at 10:21:25PM +1000, Bruce Evans wrote: > >The 'low level' AKA magic happens in several *_fetch_syscall_args() > >functions. For both linux32 and freebsd32, the magic code automatically > >zero-extends the arguments into 64bit entities. Linux passes args in > >registers, while FreeBSD uses words on stack. > > Actually, the amd64 linux_fetch32_fetch_syscall_args() just copies from > 64-bit registers frame->tf_r* to 64-bit sa->args[*]. I can't see how > this gives anything except garbage in the top bits. Is there magic in > the switch to 64-bit mode that sets the top bits? Anyway, sign extension > would give garbage for unsigned args, and zero-extension would give > garbage for negative signed args. Hardware zero-extends any register touched in the 32bit mode. In fact, please see r217991 for related bug. pgpS3viGKMIHx.pgp Description: PGP signature
Re: svn commit: r236026 - in head/sys: amd64/linux32 compat/freebsd32 kern
On Sun, 27 May 2012, Bruce Evans wrote: On Sat, 26 May 2012, Konstantin Belousov wrote: On Sat, May 26, 2012 at 10:21:25PM +1000, Bruce Evans wrote: ... All the non-indirect "char *"s for pathnames and other things seem to be completely wrong on amd64 too. These pointers start as 32 bits, and it takes more than a bad type pun to turn then into kernel 64-bit pointers. The magic for this seems to be: - all args are converted to 64 bits (by zero-extension?) at a low level ... The 'low level' AKA magic happens in several *_fetch_syscall_args() functions. For both linux32 and freebsd32, the magic code automatically zero-extends the arguments into 64bit entities. Linux passes args in registers, while FreeBSD uses words on stack. Actually, the amd64 linux_fetch32_fetch_syscall_args() just copies from 64-bit registers frame->tf_r* to 64-bit sa->args[*]. I can't see how this gives anything except garbage in the top bits. Is there magic in the switch to 64-bit mode that sets the top bits? Anyway, sign extension would give garbage for unsigned args, and zero-extension would give garbage for negative signed args. I checked what actually happens. There is no magic. The registers just have garbage in the top bits. Mostly this is 0. Sometimes it is all top bits 1. Sometimes it is some top bits 1 (starting at the top). I traced a bit of the execution of ld-linux.so... The first syscall was an linux_mmap. It had all top bits 1 in tf_rbx, then some top bits 1 in a later register, and all top bits 0 in 4 registers. Then there were several syscalls with all top bits in all 6 syscall arg registers 0. Then there was a linux_write syscall with 1's back in some top bits. Then there were many syscalls with all top bits in all arg registers 0. This behaviour is probably due to the initial state being more random (but why doesn't exec clear all top bits?). Then normal activity like xorl %ebx,%ebx tends to clear top bits (does this xorl clear top bits even in 32-bit mode?). Then something occasionally sets top bits to 1. The amd64 ia32_fetch_syscall_args() is quite different. Now the args ... Bruce ___ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"
Re: svn commit: r236026 - in head/sys: amd64/linux32 compat/freebsd32 kern
On Sat, 26 May 2012, Konstantin Belousov wrote: On Sat, May 26, 2012 at 10:21:25PM +1000, Bruce Evans wrote: Please don't quote the whole thing. On Fri, 25 May 2012, Ed Schouten wrote: Log: Remove use of non-ISO-C integer types from system call tables. These files already use ISO-C-style integer types, so make them less inconsistent by preferring the standard types. These should actually be Linux types l_foo_t. ISO-C-style integer types [... but only in the low-level linux headers where this is possible] Modified: head/sys/amd64/linux32/syscalls.master == --- head/sys/amd64/linux32/syscalls.master Fri May 25 21:12:24 2012 (r236025) +++ head/sys/amd64/linux32/syscalls.master Fri May 25 21:50:48 2012 (r236026) @@ -54,8 +54,8 @@ l_int mode); } 9 AUE_LINKSTD { int linux_link(char *path, char *to); } 10 AUE_UNLINK STD { int linux_unlink(char *path); } -11 AUE_EXECVE STD { int linux_execve(char *path, u_int32_t *argp, \ - u_int32_t *envp); } +11 AUE_EXECVE STD { int linux_execve(char *path, uint32_t *argp, \ + uint32_t *envp); } argp and envp aren't uintany_t * in either Linux or FreeBSD. They start as "char * const *". There is no Linux type for an indirect "char *", and one was hacked up here by pretending that "char *" is u_int32_t (it is actually just 32 bits). Using l_caddr_t seems to be the best hack available (since by abusing l_caddr_t, we know that it is actually char *). The `const' in the type for argp and envp is further from being handled correctly. Most or all syscall.master's just type-pun it away. Similarly for "const char *path". All the non-indirect "char *"s for pathnames and other things seem to be completely wrong on amd64 too. These pointers start as 32 bits, and it takes more than a bad type pun to turn then into kernel 64-bit pointers. The magic for this seems to be: - all args are converted to 64 bits (by zero-extension?) at a low level ... The 'low level' AKA magic happens in several *_fetch_syscall_args() functions. For both linux32 and freebsd32, the magic code automatically zero-extends the arguments into 64bit entities. Linux passes args in registers, while FreeBSD uses words on stack. Actually, the amd64 linux_fetch32_fetch_syscall_args() just copies from 64-bit registers frame->tf_r* to 64-bit sa->args[*]. I can't see how this gives anything except garbage in the top bits. Is there magic in the switch to 64-bit mode that sets the top bits? Anyway, sign extension would give garbage for unsigned args, and zero-extension would give garbage for negative signed args. The amd64 ia32_fetch_syscall_args() is quite different. Now the args stack as 32 bits on the stack, so normal C accesses naturally extend them when assigning them to 64-bit memory sa->args[*]. The stack is in user space, so normal C accesses are unavailable at first. sa->code is read using fuword32(), which gives zero-extension. Then the stack is copied in and normal C accesses become available. Finally, all args are copied from 32-bit args[i] to 64-bit sa->args[i]. args[i] is u_int32_t, so this indeed gives zero-extension. But args[i] is signed (int64_t), since it is register_t which is bogusly signed. So for negative args, overflow occurs when int32_t accessed as u_int32_t via the type pun in the declaration of args[]. There is no further overflow when the result is assigned to sa->args[i], but the result is wrong (no longer negative). It takes further magic to undo this. BTW, struct syscall_args has bad names and formatting. It is missing indentation and sa_prefixes for all its members. These bugs are missing for the 64-bit register side of the copy (struct trapframe). The types in the syscalls.master prototype should be in fact selected to fit the in-kernel prototypes for the functions implementing the syscalls, No, they have to match the types actually passed so as to ignore any garbage bits created by previous magic steps. The lower levels cannot do a correct conversion since they don't have access to the type info generated from syscalls.master, and they shouldn't do a correct conversion since this is easier to do later -- only the functions implementing the syscalls can do it easily. The type info is mostly encoded in syscall args structs for the individual functions, in a way that the conversions only need to access fields in the structs to do the conversions (this only works for simple conversions but handles ignoring any leading and trailing bits up to reasonably boundaries, since this is needed for endianness handling). esp. for NOPROTO cases, and not to the low-level layout of the syscall entry data. Of course the layout must match the one actually used by the function implementing the syscall. For NOPROTO cases
Re: svn commit: r236026 - in head/sys: amd64/linux32 compat/freebsd32 kern
On Sat, May 26, 2012 at 10:21:25PM +1000, Bruce Evans wrote: > On Fri, 25 May 2012, Ed Schouten wrote: > > >Log: > > Remove use of non-ISO-C integer types from system call tables. > > > > These files already use ISO-C-style integer types, so make them less > > inconsistent by preferring the standard types. > > These should actually be Linux types l_foo_t. ISO-C-style integer types > seem to have only been used for a couple of uintptr_t's, and these uses > are more than just style bugs on amd64 since uintptr_t is for the host > (64 bits on amd64) while the target uintptr_t is only 32 bits. There > are also a few misuses of the abominable caddr_t instead of l_caddr_t, > so syscalls that don't even take a caddr_t. Otherwise, Linux types > are used a lot to avoid size mismatches. > > >Modified: head/sys/amd64/linux32/syscalls.master > >== > >--- head/sys/amd64/linux32/syscalls.master Fri May 25 21:12:24 2012 > >(r236025) > >+++ head/sys/amd64/linux32/syscalls.master Fri May 25 21:50:48 2012 > >(r236026) > >@@ -54,8 +54,8 @@ > > l_int mode); } > >9AUE_LINKSTD { int linux_link(char *path, char *to); } > >10 AUE_UNLINK STD { int linux_unlink(char *path); } > >-11 AUE_EXECVE STD { int linux_execve(char *path, u_int32_t > >*argp, \ > >-u_int32_t *envp); } > >+11 AUE_EXECVE STD { int linux_execve(char *path, uint32_t > >*argp, \ > >+uint32_t *envp); } > > argp and envp aren't uintany_t * in either Linux or FreeBSD. They start as > "char * const *". There is no Linux type for an indirect "char *", and one > was hacked up here by pretending that "char *" is u_int32_t (it is actually > just 32 bits). Using l_caddr_t seems to be the best hack available (since > by abusing l_caddr_t, we know that it is actually char *). > > The `const' in the type for argp and envp is further from being handled > correctly. Most or all syscall.master's just type-pun it away. Similarly > for "const char *path". > > All the non-indirect "char *"s for pathnames and other things seem to be > completely wrong on amd64 too. These pointers start as 32 bits, and it > takes more than a bad type pun to turn then into kernel 64-bit pointers. > The magic for this seems to be: > - all args are converted to 64 bits (by zero-extension?) at a low level The 'low level' AKA magic happens in several *_fetch_syscall_args() functions. For both linux32 and freebsd32, the magic code automatically zero-extends the arguments into 64bit entities. Linux passes args in registers, while FreeBSD uses words on stack. The types in the syscalls.master prototype should be in fact selected to fit the in-kernel prototypes for the functions implementing the syscalls, esp. for NOPROTO cases, and not to the low-level layout of the syscall entry data. > - the args struct for a pathname is > [left padding]; char *; [right padding]; > Since the char * is misdeclared, the explicit padding is null, but the > layout of the args struct is correct because the wrong arg type in it > supplies equivalent padding (extra 32 bits on the right). The arg struct layout is irrelevant, since fetch_syscall_args() functions perform the needed translation from process ABI to kernel ABI. I think that the padding could be completely eliminated for translated ABI, but since it is easier to reuse makesyscalls.sh instead of creating ABI-specific script, and since there is quite non-trivial count of NOPROTO declarations that just match the native-ABI syscall handlers, it is better not to start that. > - the "char *" in the args struct is not actually a char *, and is unusable > directly in the kernel. However, it is only used in copyin() and > copyout(), where it becomes a user address and works correctly. (An > older bug in this that the user address for copy*() is declared as > "void *". "void *" means a kernel pointer. The type of a user > address should be more like vm_offset_t, but even that needs logical > translation for linux32). It is char *, but in different address space. Linux-style type qualifiers like __usermode would be appropriate there (remember far/near ?), but we do not have static checkers that do understand the difference. > > The same mechanism presumably avoids problems when raw caddr_t is used > instead of l_caddr_t, and when uintptr_t is used instead of l_uintptr_t. > > >12 AUE_CHDIR STD { int linux_chdir(char *path); } > >13 AUE_NULLSTD { int linux_time(l_time_t *tm); } > > Example of a correct use of a linux type. Again, the first-level pointer > is handled by the above magic, but for the second level we need an l_foo_t > to describe its size correctly. > > >14 AUE_MKNOD STD { int linux_mknod(char *path, l_int mode, \ > > Broken except in the K&R case, but amd64 is t
Re: svn commit: r236026 - in head/sys: amd64/linux32 compat/freebsd32 kern
On Fri, 25 May 2012, Ed Schouten wrote: Log: Remove use of non-ISO-C integer types from system call tables. These files already use ISO-C-style integer types, so make them less inconsistent by preferring the standard types. These should actually be Linux types l_foo_t. ISO-C-style integer types seem to have only been used for a couple of uintptr_t's, and these uses are more than just style bugs on amd64 since uintptr_t is for the host (64 bits on amd64) while the target uintptr_t is only 32 bits. There are also a few misuses of the abominable caddr_t instead of l_caddr_t, so syscalls that don't even take a caddr_t. Otherwise, Linux types are used a lot to avoid size mismatches. Modified: head/sys/amd64/linux32/syscalls.master == --- head/sys/amd64/linux32/syscalls.master Fri May 25 21:12:24 2012 (r236025) +++ head/sys/amd64/linux32/syscalls.master Fri May 25 21:50:48 2012 (r236026) @@ -54,8 +54,8 @@ l_int mode); } 9 AUE_LINKSTD { int linux_link(char *path, char *to); } 10 AUE_UNLINK STD { int linux_unlink(char *path); } -11 AUE_EXECVE STD { int linux_execve(char *path, u_int32_t *argp, \ - u_int32_t *envp); } +11 AUE_EXECVE STD { int linux_execve(char *path, uint32_t *argp, \ + uint32_t *envp); } argp and envp aren't uintany_t * in either Linux or FreeBSD. They start as "char * const *". There is no Linux type for an indirect "char *", and one was hacked up here by pretending that "char *" is u_int32_t (it is actually just 32 bits). Using l_caddr_t seems to be the best hack available (since by abusing l_caddr_t, we know that it is actually char *). The `const' in the type for argp and envp is further from being handled correctly. Most or all syscall.master's just type-pun it away. Similarly for "const char *path". All the non-indirect "char *"s for pathnames and other things seem to be completely wrong on amd64 too. These pointers start as 32 bits, and it takes more than a bad type pun to turn then into kernel 64-bit pointers. The magic for this seems to be: - all args are converted to 64 bits (by zero-extension?) at a low level - the args struct for a pathname is [left padding]; char *; [right padding]; Since the char * is misdeclared, the explicit padding is null, but the layout of the args struct is correct because the wrong arg type in it supplies equivalent padding (extra 32 bits on the right). - the "char *" in the args struct is not actually a char *, and is unusable directly in the kernel. However, it is only used in copyin() and copyout(), where it becomes a user address and works correctly. (An older bug in this that the user address for copy*() is declared as "void *". "void *" means a kernel pointer. The type of a user address should be more like vm_offset_t, but even that needs logical translation for linux32). The same mechanism presumably avoids problems when raw caddr_t is used instead of l_caddr_t, and when uintptr_t is used instead of l_uintptr_t. 12 AUE_CHDIR STD { int linux_chdir(char *path); } 13 AUE_NULLSTD { int linux_time(l_time_t *tm); } Example of a correct use of a linux type. Again, the first-level pointer is handled by the above magic, but for the second level we need an l_foo_t to describe its size correctly. 14 AUE_MKNOD STD { int linux_mknod(char *path, l_int mode, \ Broken except in the K&R case, but amd64 is too new to pretend to support K&R. Bug for bug compatible with mknod() and some other syscalls in kern/syscalls.master. mknod() takes an arg of type mode_t, not one of type int. l_mode_t exists so that the above can be declared correctly, and is already used for linux_chmod() and linux_chmodat(). OTOH, int for the mode arg is correct for open(), since open() is variadic at a level before the syscall, so its mode_t arg gets promoted to int. Similarly, if mknod() is compiled by a K&R compiler, or by a STDC compiler with no prototype in scope, then its mode_t arg gets promoted to int (but for the STDC case, the behaviour is then undefined once mknod() is reached). Normally this doesn't cause any problems, but it is easy to declare things correctly. Modified: head/sys/compat/freebsd32/syscalls.master == --- head/sys/compat/freebsd32/syscalls.master Fri May 25 21:12:24 2012 (r236025) +++ head/sys/compat/freebsd32/syscalls.master Fri May 25 21:50:48 2012 (r236026) @@ -104,9 +104,9 @@ int flags); } 28 AUE_SENDMSG STD { int freebsd32_sendmsg(int s, struct msghdr32 *msg, \ int flags); } -29 AUE_RECVFROMSTD { int fre