On Tuesday 06 March 2007 00:26, Blaisorblade wrote:
> On Tuesday 06 March 2007 00:10, Jeff Dike wrote:
> > On Tue, Mar 06, 2007 at 12:03:26AM +0100, Blaisorblade wrote:
> > > > No, RCX corruption is different - that happens when a sysexit is done
> > > > from a system call where userspace wasn't prepared to save and
> > > > restore RCX.  sigreturn is the best example.
> > >
> > > Hmm... we should finally fix that, at some point. Or... now that you
> > > explain it this way, it could even seem unfixable... is it? Or maybe
> > > sigreturn should become a syscall where the return must happen through
> > > the slow return path (iret), if that exists for x86_64.
> >
> > This is fixed, and has been for a while.  The fix was, as you suggest,
> > return through iret in this case.

Hmm, return through IRET is implemented for sys_rt_sigreturn since 2.6.0 (with 
a couple of changes, yeah, but...).

Was Bodo's original report bogus? No, he actually found a much harder 
issue.

I've attached the log of that IRC here for reference.

> Also the 32bit emulation case? That would be interesting for SKAS with
> 64bit host and 32bit guest (which I haven't tested for a long time). Also
> this means that I could test the needed trivial fixes for 64 on 64 (like
> opening /proc/mm64, using PTRACE_EX_FAULTINFO which I introduced...).

I looked and it doesn't seem to have been fixed. Andi, can you take a look at 
this problem (sigreturn not returning through iret, and thus corrupting ECX, 
for 32-bit processes)?

If I added a call to set_thread_flag(TIF_IRET) in sys32_sigreturn() 
(arch/x86_64/ia32/ia32_signal.c), would that fix the problem?

I see no use of this flag in x86_64, even though it is defined and 
(implicitly) implemented in *entry.S: it is never mentioned by name, but it is 
tested through _TIF_WORK_MASK / _TIF_ALLWORK_MASK, and separate stubs are used 
for execve and sigreturn. Is there a good reason not to use IRET there?
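
To make the question concrete, here is a minimal user-space sketch of the idea, 
not kernel code. All MODEL_* names, the struct and the bit numbers are 
hypothetical and chosen only for illustration; the real definitions live in the 
kernel headers and are not reproduced here. The sketch only shows that setting 
an "IRET" flag which is part of a work mask is enough to push the exit onto the 
slow return path, with no explicit test for that flag by name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit numbers only (assumption, not the kernel's values). */
enum {
    MODEL_TIF_SIGPENDING   = 2,
    MODEL_TIF_NEED_RESCHED = 3,
    MODEL_TIF_IRET         = 5,
};

#define MODEL_BIT(n) (UINT32_C(1) << (n))

/* Hypothetical stand-in for a work mask: any flag set in it
 * forces the exit code onto the slow (IRET) return path. */
#define MODEL_WORK_MASK                   \
    (MODEL_BIT(MODEL_TIF_SIGPENDING) |    \
     MODEL_BIT(MODEL_TIF_NEED_RESCHED) |  \
     MODEL_BIT(MODEL_TIF_IRET))

/* Hypothetical per-thread state holding only the flag word. */
struct model_thread {
    uint32_t flags;
};

/* Models set_thread_flag() for the current thread. */
static void model_set_thread_flag(struct model_thread *t, int bit)
{
    t->flags |= MODEL_BIT(bit);
}

/* Models the mask test done on the way out of a syscall. */
static bool model_needs_slow_return(const struct model_thread *t)
{
    return (t->flags & MODEL_WORK_MASK) != 0;
}

/* Models a sigreturn that requests a full register restore. */
static void model_sigreturn(struct model_thread *t)
{
    /* Restoring the signal frame would happen here. */
    model_set_thread_flag(t, MODEL_TIF_IRET);
    assert(model_needs_slow_return(t));
}
```

Calling model_sigreturn() on a thread with no flags set leaves only the 
hypothetical IRET bit set, and model_needs_slow_return() then reports true. 
Whether the real 32-bit entry code behaves the same way is exactly the open 
question above.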

Bye
-- 
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade
[16:08:52] Channels in common with bodo [EMAIL PROTECTED]: #uml
[16:09:02] <Blaisorblade> Hey, Bodo!
[16:09:51] <bodo> Hello Blaisorblade!
[16:09:59] <Blaisorblade> Ok, so?
[16:10:14] <Blaisorblade> I've tried guessing the problem:
[16:10:33] <Blaisorblade> But didn't find it...
[16:10:34] <bodo> Blaisorblade: First, let us check: entry.S on x86_64 isn't 
modified by the patch, true?
[16:10:43] <Blaisorblade> Exactly.
[16:11:16] <bodo> Blaisorblade: So, assume a process was scheduled out not on a 
syscall, but on an interrupt
[16:11:17] <Blaisorblade> But for 32-bit binaries the file to look at is 
ia32/ia32entry.S. And normal SKAS doesn't touch that file either.
[16:11:30] <Blaisorblade> Hmm, ok...
[16:11:44] <Blaisorblade> The interrupt handler will lazily share the MM.
[16:12:03] <bodo> Blaisorblade: now, assume when it is scheduled next, this 
will happen on a syscall done by the outgoing process
[16:12:17] <Blaisorblade> Ok.
[16:12:40] <bodo> Blaisorblade: in that case, RCX and R11 of the process will 
be destroyed
[16:12:55] <Blaisorblade> Hmm...
[16:13:22] <Blaisorblade> What is the difference between SKAS and normal 
processes?
[16:13:53] <bodo> Blaisorblade: SKAS does a UML context-switch while the 
host-process still is the same
[16:14:23] <bodo> this happens in SKAS0, too. In case of threads sharing the 
same mm
[16:14:25] <Blaisorblade> Yes, exactly... by "outgoing process" do you refer to 
the UML one or the host one?
[16:14:38] <bodo> host one
[16:14:48] <bodo> err. UML one
[16:15:15] <Blaisorblade> Because UML will read the values saved by the 
interrupt handler context, right?
[16:15:34] <Blaisorblade> But is the interrupt handler preemptible?
[16:15:44] <bodo> yes. And *all* registers need to be restored. But this won't 
be done on a return from syscall
[16:16:24] <bodo> Blaisorblade: think of timer tick interrupting the process
[16:16:43] <Blaisorblade> You mean a host interrupt or a UML interrupt, i.e. a 
signal?
[16:17:32] <bodo> both.
[16:17:46] <bodo> think of a timer tick interrupting the host.
[16:18:12] <bodo> host sees a timer running out and queues a SIGALRM for 
UML-process
[16:18:44] <bodo> on return from interrupt, this signal will be processed, so 
the UML kernel is started
[16:18:48] <Blaisorblade> Ok. A host interrupt must be finished before UML 
continues running, except on a multiprocessor host...
[16:19:22] <Blaisorblade> bodo: the UML kernel is started, it does the context 
switch, and the registers of the process have been saved somewhere on the 
host...
[16:19:23] <bodo> UML decides to schedule another process
[16:20:07] <bodo> and while the kernel runs, the user-process still is "in 
interrupt"
[16:20:37] <bodo> as do_signal is called before returning to user
[16:21:02] <Blaisorblade> bodo: Wait a moment: what you're describing is a race 
condition on ptrace()...
[16:21:02] <Blaisorblade> Anyhow, hardware interrupts *don't* send signals 
anywhere... they must schedule softirqs or something else to do the long work.
[16:21:09] <bodo> the host schedules out the user-process and schedules the 
kernel-process
[16:21:20] <bodo> No, no race condition!!!
[16:21:36] <bodo> let's try again:
[16:21:46] <bodo> assume in UML process A is running
[16:21:50] <Blaisorblade> Wait a moment: where do you see the call to do_signal?
[16:21:59] <bodo> in the host
[16:22:13] <Blaisorblade> and where in the code?
[16:22:34] <Blaisorblade> In entry.S this isn't found.
[16:23:39] <Blaisorblade> Ok, found sysret_signal and do_notify_resume().
[16:23:57] <Blaisorblade> -> and do_signal.
[16:24:17] <bodo> retint_signal
[16:24:59] <bodo> May I try to explain better?
[16:25:06] <Blaisorblade> Ok, let me look...
[16:25:29] <Blaisorblade> Ok, go... I'm starting to understand the problem...
[16:25:52] <bodo> Let's assume, process A on UML is running
[16:26:18] <bodo> user-process is interrupted by a timer-tick
[16:26:27] <Blaisorblade> Yes, perfectly... And then it's interrupted by a host 
timer tick and a SIGALRM sent to UML which then schedules another process, 
right?
[16:26:40] <bodo> yes!!!
[16:27:34] <Blaisorblade> The problem is that at that point, PTRACE_GETREGS 
will access the interrupt values of the registers, while saving the registers 
for the old process, right?
[16:27:34] <bodo> now, UML's process is interrupted while *running*, that means, 
we have to save *all* regs and also to restore *all* regs later
[16:28:04] <bodo> Blaisorblade: yes, and that is OK, as we will read all 
register values.
[16:28:28] <bodo> the problem will come up later
[16:28:39] <bodo> assume, now process B is running on UML
[16:28:44] <Blaisorblade> Ok, it will be later, so the values we read are 
correct.
[16:29:01] <bodo> process B does a syscall, that leads to process A being 
scheduled
[16:29:44] <bodo> A's registers are read in a "syscall-context", which is no 
problem
[16:29:51] <Blaisorblade> Hmm, ok, this means that we must restore all 
registers of process A...
[16:30:23] <Blaisorblade> bodo: You said that we interrupted A while it was 
inside the *interrupt* code...
[16:30:43] <bodo> but B's registers are written to a syscall-context, which is 
wrong, as RCX must contain return-address on syscall and R11 must contain RFLAGS
[16:31:31] <bodo> yes. we interrupted A while it was inside the *interrupt* 
code, but we bring it back in syscall-code --> ERROR
[16:31:58] <bodo> this is specific x86_64
[16:32:52] <bodo> I don't know, if this is *your* problem, but it is *one* 
problem
[16:34:10] <Blaisorblade> Hmm...
[16:34:31] <Blaisorblade> Ok, what I'm currently testing for now is support for 
32-bit binaries...
[16:34:49] <bodo> AFAICS, Jeff also should see problems with sys_rt_sigreturn, 
if the process was interrupted while not doing a syscall
[16:35:22] <Blaisorblade> I.e. I mean 32-bit UML binaries, which run inside 
ia32entry.S...
[16:35:51] <Blaisorblade> Wait a moment: what's the problem about B's registers?
[16:36:16] <bodo> No problem about B, only about A
[16:36:48] <Blaisorblade> What I understood is that we restored our registers 
of process A, not the registers of the interrupt handler, thus preventing the 
interrupt handler from returning, right?
[16:37:24] <Blaisorblade> bodo: You said "B's registers are written to a 
syscall context, which is wrong, [.....]RCX[...] R11[...]"
[16:37:32] <bodo> the registers of the interrupt handler *are* A's registers
[16:37:53] <bodo> the problem is, on x86_64 a syscall *will* clobber RCX and R11
[16:38:21] <bodo> a syscall returns using SYSRET, while an interrupt returns 
using IRET
[16:39:28] <Blaisorblade> Well, wait a moment, the registers we(UML) saved with 
ptrace about A are the ones which were on A's stack, right?
[16:39:41] <bodo> yes.
[16:40:12] <Blaisorblade> And B is a UML process?
[16:40:16] <bodo> yes.
[16:40:42] <Blaisorblade> So B is doing a syscall which is captured by 
PTRACE_SYSCALL, hmmm, ok...
[16:40:52] <bodo> yes.
[16:41:43] <Blaisorblade> on x86_64 a syscall will *save* and clobber RCX and 
R11, right? And those values will be used by SYSRET.
[16:41:57] <bodo> yes.
[16:42:00] <Blaisorblade> The registers we read on A's stack were the ones from 
the userspace process before the syscall...
[16:42:26] <Blaisorblade> right?
[16:42:27] <bodo> yes. And RCX and R11 have to contain the previous value on 
resume
[16:43:13] <bodo> But we return from a syscall, that clobbers them
[16:43:56] <bodo> Unfortunately, I have no x86_64, so all this is from reading 
the source only.
[16:44:00] <Blaisorblade> Ok. In normal activity, or even TT or SKAS0, when 
process A is suspended the host saves again the registers, this time the ones 
from inside the current frame.
[16:44:29] <Blaisorblade> bodo: again, would this affect 32bit processes in 
your opinion?
[16:44:46] <Blaisorblade> They are handled by ia32entry.S, which is different.
[16:44:54] <bodo> Wait a moment, I'll try to find out
[16:46:49] <Blaisorblade> Well, what the hell! It seems that ia32 emulated 
syscalls through vsyscall page are done by either sysenter or syscall... while 
the ones in libc are done by int 0x80.
[16:47:57] <bodo> I see. At least SYSCALL method should be affected, maybe 
sysenter also, wait a moment
[16:48:55] <Blaisorblade> Ok, sysenter is enabled only if the vendor is INTEL. 
arch/x86_64/ia32/syscall32.c
[16:50:27] <Blaisorblade> So, this means that saving the registers of a tracee 
while it's in interrupt context will read the ones from syscall context, i.e. 
the one from userspace (except for a few clobbered ones). I.e. we return 
abruptly from the interrupt handler.
[16:51:08] <bodo> Maybe, vsyscall-page will repair all changed values, as for 
i386-programs, it has to look like an i386
[16:52:31] <bodo> What are the problems you see on your x86_64?
[16:52:48] <Blaisorblade> bodo: from looking at 
arch/i386/kernel/vsyscall-sysenter.S, it seems that a syscall done through 
sysenter will save and clobber some i386 registers...
[16:53:19] <Blaisorblade> bodo: the problem I see is IIRC a crash during boot...
[16:53:46] <Blaisorblade> with a BUG in mmap.c (or memory.c)...
[16:54:19] <Blaisorblade> For instance, an old 2.4 kernel spit out, some time 
ago:
[16:54:19] <Blaisorblade> 
[16:54:19] <Blaisorblade> VFS: Mounted root (ext2 filesystem) readonly.
[16:54:19] <Blaisorblade> Unable to load interpreter
[16:54:19] <Blaisorblade> Kernel panic: kernel BUG at memory.c:377!
[16:54:21] <bodo> Do you boot a UML/i386 or UML/x86_64?
[16:54:27] <Blaisorblade> UML/i386...
[16:54:41] <Blaisorblade> I've not yet modified UML/x86_64...
[16:55:16] <bodo> UML/x86_64 isn't yet ready for SKAS3?
[16:55:37] <Blaisorblade> The changes are at least:
[16:55:37] <Blaisorblade> 1) use /proc/mm64
[16:55:37] <Blaisorblade> 2) use PTRACE_EX_FAULTINFO which also returns trap_no
[16:55:37] <Blaisorblade> 3) use trap_no
[16:55:37] <Blaisorblade> 4) do everything else that might be needed, which I 
must investigate.
[16:56:06] <Blaisorblade> For 4), I guess that there will be some SKAS specific 
code in sys-i386 or in sysdep, but I still must find out.
[16:57:11] <bodo> So, currently you use UML/i386 on an x86_64
[16:57:22] <Blaisorblade> Exactly...
[16:57:29] <bodo> and that crashes on boot
[16:57:52] <Blaisorblade> Yes... actually maybe I missed testing the last 
version but I think I did...
[16:58:37] <Blaisorblade> I'll post updated results on the status when I 
have time, but for now I must go back to more urgent stuff, sorry... I have to 
finish this by today...
[16:58:54] <Blaisorblade> Thanks for the help anyway, I'll save this chat and 
look more carefully after...
[16:59:12] <Blaisorblade> Anyhow, this is something which *can* be solved by 
fiddling with entry.S, right?
[16:59:47] <Blaisorblade> I'll understand it more fully when I have studied 
the SYSENTER and SYSCALL instructions, anyhow.
[17:00:28] <bodo> right. It should use int_ret_from_sys_call. which kind of 
syscall does the glibc on UML use?
[17:01:17] <bodo> and don't forget: I couldn't test anything, so maybe I'm 
totally wrong ...
[17:02:31] <Blaisorblade> Ok, I'll look at what happens... glibc inside UML 
uses int 0x80 however...
[17:02:43] <Blaisorblade> Because I mostly tested with a slack10...
[17:03:25] <Blaisorblade> Slackware 10.0, which has an old glibc... however 
maybe I tested with Sarge more recently... earlier tests didn't work for a 
double >> PAGE_SHIFT problem.
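
For reference, the scenario from the log can be sketched as a toy model in 
plain C. This is only an illustration under stated assumptions: the struct and 
all model_* functions are hypothetical, nothing here touches real registers or 
ptrace, and the register values are made up. It models a process whose full 
register set was saved in interrupt context, and compares resuming it through 
an IRET-style path (everything restored) with a SYSRET-style path (RCX and R11 
carry the return address and RFLAGS, so the user's own values are lost).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of a saved user register frame. */
struct model_regs {
    uint64_t rip;
    uint64_t rflags;
    uint64_t rcx;
    uint64_t r11;
    uint64_t rax;
};

/* IRET-style return: the user sees every saved register unchanged. */
static struct model_regs model_return_iret(struct model_regs saved)
{
    return saved;
}

/* SYSRET-style return: the return address travels in RCX and RFLAGS
 * in R11, so whatever the user had in those two registers is gone. */
static struct model_regs model_return_sysret(struct model_regs saved)
{
    struct model_regs user = saved;
    user.rcx = saved.rip;
    user.r11 = saved.rflags;
    return user;
}

/* True if RCX and R11 survived the return unchanged. */
static int model_rcx_r11_preserved(struct model_regs before,
                                   struct model_regs after)
{
    return before.rcx == after.rcx && before.r11 == after.r11;
}

/* Process A is interrupted while running (full frame saved) and later
 * resumed on a syscall return path. Returns 1 if A's RCX/R11 end up
 * corrupted. The register values are invented for the example. */
static int model_scenario_corrupts(void)
{
    struct model_regs a = {
        .rip = 0x400123, .rflags = 0x246,
        .rcx = 0x1111, .r11 = 0x2222, .rax = 7,
    };

    /* Sanity check: the slow path would have been fine. */
    assert(model_rcx_r11_preserved(a, model_return_iret(a)));

    return !model_rcx_r11_preserved(a, model_return_sysret(a));
}
```

In this model model_scenario_corrupts() returns 1, which matches bodo's point: 
the values read for A are correct, and the damage only happens when A is 
brought back through the syscall return path instead of the interrupt one.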
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel
