Re: [uml-devel] [RFC] weird crap with vdso on uml/i386

Richard Weinberger Sat, 20 Aug 2011 08:23:45 -0700

On 20.08.2011 03:18, Al Viro wrote:
> 3) with the previous two issues dealt with, we get the following magical
> mystery shite when running 32bit uml kernel + userland on 64bit host:
>       * the system boots all the way to getty/login and sshd (i.e. gets
> through the debian /etc/init.d (squeeze/i386))
>       * one can log into it, both on terminals and over ssh.  shell and
> a bunch of other stuff works.  Mostly.
>       * /bin/bash -c "echo *" reliably segfaults.  Always.  So does tab
> completion in bash, for that matter.
>       * said segfault is reproducible both from shell and under gdb.
> For /bin/bash -c "echo *" under gdb it's always the 10th call of brk(3).
> What happens there apparently boils down to __kernel_vsyscall() getting
> called (and yes, sys_brk() is called, succeeds and results in expected
> value in %eax) and corrupting the living hell out of %ecx.  Namely, on
> return from what presumably is __kernel_vsyscall() I'm seeing %ecx equal
> to (original value of) %ebp.  All registers except %eax and %ecx (including
> %esp and %ebp) remain unchanged.
>       Again, that happens only on the same call of brk(3) - all previous
> calls succeed as expected.  I don't believe that it's a race.  I also
> very much doubt that we are calling the wrong location - it's hard to tell
> with the call being call *%gs:0x10 (is there any way to find what that
> is equal to in gdb, BTW?  Short of hot-patching movl *%gs:0x10,%eax in place
> of that call and single-stepping it, that is...) but it *does* end up
> making the system call that ought to have been made, so I suspect that it
> does hit __kernel_vsyscall(), after all...
>
> The text of __kernel_vsyscall() is
>       0xffffe420<__kernel_vsyscall+0>:       push   %ebp
>       0xffffe421<__kernel_vsyscall+1>:       mov    %ecx,%ebp
>       0xffffe423<__kernel_vsyscall+3>:       syscall
>       0xffffe425<__kernel_vsyscall+5>:       mov    $0x2b,%ecx
>       0xffffe42a<__kernel_vsyscall+10>:      mov    %ecx,%ss
>       0xffffe42c<__kernel_vsyscall+12>:      mov    %ebp,%ecx
>       0xffffe42e<__kernel_vsyscall+14>:      pop    %ebp
>       0xffffe42f<__kernel_vsyscall+15>:      ret
> so %ecx on the way out becoming equal to original %ebp is bloody curious -
> it would smell like entering that sucker 3 bytes too late and skipping
> mov %ecx, %ebp, but... we would also skip push %ebp, so we'd get trashed
> on the way out - wrong return address, wrong value in %ebp, changed %esp.
> None of that happens.  And we are executing that code in userland - i.e.
> to get corrupted it would have to get corrupted in *HOST* 32bit VDSO.  Which
> would have much more visible effects, starting with the next attempt to
> run the testcase blowing up immediately instead of waiting (as it actually
> does) for the same 10th call of brk()...
>
> I'm at a loss, to be honest.  The sucker is nicely reproducible, but bisecting
> doesn't help at all - it seems to be present all the way back at least to
> 2.6.33.  I hadn't tried to go back further and I hadn't tried to go for
> older host kernels, but I wouldn't put too much faith into that...  The
> reason it hadn't been noticed much earlier is that it works fine on i386
> host - aforementioned shit happens only when the entire thing (identical
> binary, identical fs image, identical options) is run on amd64.  However,
> on i386 I have a different __kernel_vsyscall, which might easily be the
> reason it doesn't happen there.  It's a K7 box with sysenter-based
> variant ending up as __kernel_vsyscall().  Hell knows what's going on...
> Behaviour is really weird and I'd appreciate any pointers re debugging
> that crap.  Suggestions?
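
The "entered 3 bytes too late" reasoning in the quoted text can be checked
with a toy model. The sketch below is only an illustration, not anything
from the kernel or from gdb: it replays the register and stack effects of
the eight instructions listed above, once from offset 0 and once from
offset 3. The addresses, the stack contents and the helper name run_stub
are all made up, and syscall is simply assumed to clobber %ecx (which is
presumably why the stub parks %ecx in %ebp around it).

```python
# Toy model of the __kernel_vsyscall stub quoted above. Only register and
# stack effects are modelled; every address and value here is hypothetical.
RETURN_ADDR = 0x08049000   # made-up address of the instruction after the call
CALLER_WORD = 0x11111111   # made-up caller data sitting above the return address
STACK_TOP = 0xBFFFF000     # made-up %esp just before the call


def run_stub(ecx, ebp, entry_offset=0):
    """Run the stub starting at entry_offset and return the final registers."""
    regs = {"eax": 45, "ecx": ecx, "ebp": ebp, "esp": STACK_TOP, "eip": None}
    stack = {STACK_TOP: CALLER_WORD}

    def push(value):
        regs["esp"] -= 4
        stack[regs["esp"]] = value

    def pop():
        value = stack[regs["esp"]]
        regs["esp"] += 4
        return value

    def do_syscall():
        # Assumption: syscall clobbers %ecx and the result lands in %eax.
        regs["ecx"] = 0xFFFFE425
        regs["eax"] = 0

    stub = [
        (0, lambda: push(regs["ebp"])),              # push %ebp
        (1, lambda: regs.update(ebp=regs["ecx"])),   # mov  %ecx,%ebp
        (3, do_syscall),                             # syscall
        (5, lambda: regs.update(ecx=0x2B)),          # mov  $0x2b,%ecx
        (10, lambda: None),                          # mov  %ecx,%ss (not modelled)
        (12, lambda: regs.update(ecx=regs["ebp"])),  # mov  %ebp,%ecx
        (14, lambda: regs.update(ebp=pop())),        # pop  %ebp
        (15, lambda: regs.update(eip=pop())),        # ret
    ]

    push(RETURN_ADDR)  # effect of the caller's call instruction
    for offset, step in stub:
        if offset >= entry_offset:
            step()
    return regs


if __name__ == "__main__":
    for entry in (0, 3):
        result = run_stub(ecx=0x1000, ebp=0x2000, entry_offset=entry)
        print(entry, {name: hex(value) for name, value in result.items()})
```

In this model a normal entry leaves %ecx, %ebp and %esp as they were, while
a late entry does leave %ecx equal to the original %ebp but also returns to
the wrong address with %ebp and %esp changed. That is the mismatch with the
observed behaviour described above, so the model does not explain the bug;
it only shows why that particular theory does not fit.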


Hmmm, very strange.
Sadly I cannot reproduce the issue. :(
Everything works fine within UML.
(Of course, I've applied your vDSO/i386 patches.)

My test setup:
Host kernel: 2.6.37 and 3.0.1
Distro: openSUSE 11.4/x86_64

UML kernel: 3.1-rc2
Distro: openSUSE 11.1/i386

Does the problem also occur with another host kernel or a different 
guest image?

Thanks,
//richard

