On Sun, Aug 21, 2011 at 12:41 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> On Sun, Aug 21, 2011 at 03:43:52PM +0100, Al Viro wrote:
>
>> We do not lie to ptrace and iret.  At all.  We do just what you have
>> described.  And fuck up when restart returns us to the SYSCALL / SYSENTER
>> instruction again, which expects the different calling conventions,
>> so the values arranged in registers in the way int 0x80 would expect
>> do us no good.
>
> FWIW, what really happens (for 32bit task on amd64) is this:

I think I believe your analysis...

> * Both codepaths start with arranging the same thing on the kernel
> stack frame; one 64bit int 0x80 would create.  For the good and simple
> reason: they all have to be able to leave via IRET.  Stack layout is the
> same, but we need to fill it accordingly to calling conventions we are
> stuck with.  I.e. ->cx should be initialized with arg2 and ->bp with
> arg6, wherever those currently are on given codepath.  _That_ is what
> "lying to ptrace" is about - we store there registers according to how
> they were when we entered __kernel_vsyscall(), not as they are at the
> moment of actual SYSCALL insn.  Which is precisely the right thing to do,
> since if we *are* ptraced, the tracer expects to find the syscall argument
> in the same places, whichever variant of syscall tracee happens to be using.

This is, IMO, gross -- if the values in pt_regs matched what they were
when sysenter / syscall was issued, then we'd be fine -- we could
restart the syscall and everything would work.  Apparently ptrace users
have a problem with that, so we're stuck with the "lie" (i.e. reporting
values as of __kernel_vsyscall, not as of the actual kernel entry).

> * If there *was* a syscall restart to be done, we are guaranteed to
> have left via IRET path.  In all cases the syscall arguments end up in
> registers, in the same way int 0x80 expected them.  What happens afterwards
> depends on how we entered, though.
> + int 0x80: all registers are restored (with ptrace
> manipulations, if any, having left their effect) as they'd been the last
> time around.  In we go and that's it.

Which suggests an easy-ish fix: if sysenter is used, or if syscall is
entered from the EIP it is supposed to be entered from, then just
change the ip in the saved register frame to point to the int 0x80
instruction.  This might also require tweaking the userspace stack.
That way, restart would hit int 0x80 instead of syscall/sysenter and
the registers would be exactly as expected.
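A rough sketch of that idea, as a userspace toy model rather than kernel
code.  Everything named here (toy_regs, entry_kind, the two address
constants, the function itself) is made up for illustration and does not
correspond to real kernel identifiers or real vsyscall-page offsets:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags for how the task entered the kernel. */
enum entry_kind { ENTRY_INT80, ENTRY_SYSENTER, ENTRY_SYSCALL };

/* Toy stand-in for the saved register frame; only ip is modelled. */
struct toy_regs {
    uint32_t ip;    /* user-space address execution resumes at */
};

/* Made-up addresses inside a pretend vsyscall page (assumptions). */
#define VDSO_SYSCALL_RET 0xffffe405u /* where SYSCALL is expected to return */
#define VDSO_INT80_INSN  0xffffe410u /* an int 0x80 insn in the same page */

/*
 * When a restart is pending, redirect the resume address to the
 * int 0x80 instruction if the entry was SYSENTER, or SYSCALL issued
 * from the expected EIP.  Returns true if ip was redirected.
 * The ordinary rewind done for a plain int 0x80 restart is out of
 * scope for this sketch.
 */
static bool redirect_restart_to_int80(struct toy_regs *regs,
                                      enum entry_kind how)
{
    assert(regs != NULL);

    if (how == ENTRY_SYSENTER ||
        (how == ENTRY_SYSCALL && regs->ip == VDSO_SYSCALL_RET)) {
        regs->ip = VDSO_INT80_INSN;
        return true;
    }

    /* int 0x80, or SYSCALL from an unexpected EIP: leave ip alone. */
    return false;
}
```

In this model a SYSCALL issued from somewhere other than the expected
EIP is deliberately left untouched, since nothing is known about the
register layout the caller had in mind.  The possible userspace stack
adjustment mentioned above is not modelled at all.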

Getting this right in the case where ptrace attaches during the
syscall might be tricky, though.

--Andy

_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel