On Tue, Aug 23, 2011 at 9:48 AM, Al Viro <v...@zeniv.linux.org.uk> wrote:
>
> Um...  How would it know which syscall variant had that been, to start
> with?

Just read the instruction, for chissake. UML *already* does that, to see if it's "int80" or "sysenter" ('is_syscall()').

Now, I do agree that if we had designed the ptrace interface with these kinds of issues in mind, then we would have added a "state" field to the thing that could have this kind of information as part of the GETREGS interface. There is no question that that would have been a good idea - but we have what we have.

I mean, technically, we could also have always just given "raw user space register state" to ptrace, and then just said that "anybody who traces system calls needs to know the exact calling conventions for *that* kind of system call". But instead of that, we give the "cooked" pt_regs values on read-out, to make it simpler for strace and friends.

And it's actually simpler for UML too. If we *didn't* give that cooked register set information, then UML would *still* have to look at the actual instruction in order to emulate the system call correctly ("it's sysenter, so now I need to take some of the system call arguments from the stack"). So the fact that we do that register state swizzling actually helps not just strace, but UML too.

It would be *nice* if we did the swizzling automatically at setregs() time too, but we simply don't have enough information in the kernel to do that. Again, exactly because pt_regs doesn't have a "state" variable, when user-space does the SETREGS call, we simply don't know whether we are in "normal" code or in some system call entry or exit state.

So the kernel does the swizzling at GETREGS time (by virtue of always having the registers in a "canonical" state for system call entry), but we fundamentally *cannot* do the unswizzle, because we don't know what the SETREGS caller actually did.

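As a rough illustration of the "just read the instruction" point, here is a minimal sketch of how a tracer might tell the x86 system call instructions apart from their opcode bytes. The names `syscall_insn` and `classify_syscall_insn` are made up for this example and are not UML's actual 'is_syscall()' code. How the bytes get fetched from the tracee (for instance with PTRACE_PEEKTEXT) and which address to read them from are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not taken from UML or the kernel. */
enum syscall_insn {
    INSN_NOT_SYSCALL = 0,
    INSN_INT80,     /* int $0x80: bytes cd 80 */
    INSN_SYSENTER,  /* sysenter:  bytes 0f 34 */
    INSN_SYSCALL    /* syscall:   bytes 0f 05 */
};

/*
 * Classify the two opcode bytes at insn[0] and insn[1].
 * The caller is assumed to have already copied them out of the tracee.
 */
static enum syscall_insn classify_syscall_insn(const uint8_t *insn, size_t len)
{
    assert(insn != NULL);

    if (len < 2)
        return INSN_NOT_SYSCALL;
    if (insn[0] == 0xcd && insn[1] == 0x80)
        return INSN_INT80;
    if (insn[0] == 0x0f && insn[1] == 0x34)
        return INSN_SYSENTER;
    if (insn[0] == 0x0f && insn[1] == 0x05)
        return INSN_SYSCALL;
    return INSN_NOT_SYSCALL;
}
```

With a result like this in hand, the tracer would know which register layout to apply before it writes registers back, which is the swizzling that the kernel cannot do on its behalf.
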
So I think the current state is actually the best we could possibly do, with the caveat that *if* we had known about the "different system calls have different register layouts" originally and had thought of it, we could have added a 'state' word that the kernel could set at GETREGS time, and use at SETREGS time to decide whether swizzling is needed or not.

But not only would that have required time travel (ptrace existed before the multiple system calls did), even then it's not 100% clear that the current simpler model (with the admittedly subtle case of implicit state and its effect on register state) isn't actually the better solution.

*Somebody* has to do the register swizzling, and the current "kernel canonicalizes registers at read time, you need to swizzle them if you change state" may simply be the RightThing(tm).

                    Linus

_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel