> > 2. when kqemu isn't used, I've noticed that the 64-bit S-x86 guest
> > os kernel randomly reports "EFAULT" errors for system calls.
> > This results in svc.startd / svc.configd startup errors, telling me that
> > there was a disk error reading the repository DB.  Or fsck errors,
> > reporting block read errors, when checking the root filesystem.
> > 
> > What I found out so far is that this problem seems to happen only
> > when the standard C-library  /lib/libc.so.1 is used.
> > 
> > When I mount the /usr/lib/libc/libc_hwcap2.so.1 shared C-library on
> > top of  /lib/libc.so.1 the EFAULT errors are gone.
> > 
> > A qemu problem with the "int $0x91" system calls in  /lib/libc.so.1,
> > which doesn't happen with the "syscall" system calls from
> > libc_hwcap2.so.1 ?
> 
> I've started looking at this. The most likely source of EFAULT errors 
> seemed to be the copyin of args failing but that didn't pan out. So now 
> I'm looking into whether any of the system calls are actually returning 
> EFAULT (likely due to munged arguments), or if something is happening in 
> the post-syscall processing to return EFAULT after a good syscall. The 
> post-syscall processing isn't performed in the case of a syscall or 
> > sysenter instruction. Otherwise there are few differences since we 
> don't use the syscall/sysenter argument passing methods but instead 
> appear to always retrieve the arguments from the user stack.

I also made some progress with this issue.

It seems to be the copyin_nowatch() call in copyin_args32() which is
returning an error.

http://cvs.opensolaris.org/source/xref/on/usr/src/uts/intel/ia32/os/syscall.c#copyin_args32


copyin_nowatch() is failing because the user address is above "kernelbase".
The user address is the process' stack pointer plus one word:

    greg32_t *sp = 1 + (greg32_t *)rp->r_sp;        /* skip ret addr */
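
To make the "stack pointer plus one word" arithmetic concrete, here is a
minimal sketch in plain integer arithmetic. The typedef and the helper name
first_arg_addr() are assumptions made for illustration; they are not taken
from syscall.c.

```c
#include <stdint.h>

/* Stand-in for the kernel's 32-bit register type (assumed to be 4 bytes). */
typedef int32_t greg32_t;

/*
 * Hypothetical helper: computes the same address as
 *     1 + (greg32_t *)rp->r_sp
 * Adding 1 to a greg32_t pointer advances it by sizeof (greg32_t) bytes,
 * which skips the return address at the top of the 32-bit user stack.
 */
static uint64_t
first_arg_addr(uint64_t r_sp)
{
	return (r_sp + sizeof (greg32_t));
}
```

With a clean 32-bit stack pointer of 0x80475a0 the first argument would be
read from 0x80475a4.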


And the error is noticed in copyin(), which is called by copyin_nowatch().

I see that the register structure contains rp->r_rsp = 0xfffffe80080475a0,
which looks like a 32-bit user stack address of 0x80475a0 with some
unexpected bits set in the upper 32 bits of the 64-bit register.  That
address is above kernelbase (0xfffffd8000000000), so copyin() fails,
copyin_nowatch() fails, and copyin_args32() fails.
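
The comparison can be sketched with the two values quoted above. The helper
names is_user_addr() and low32() are made up for this example, and the check
is only a simplified model of the kernelbase test, not the actual copyin()
code.

```c
#include <stdbool.h>
#include <stdint.h>

/* kernelbase value as quoted in this message. */
#define	KERNELBASE	0xfffffd8000000000ULL

/* Simplified model: an address is acceptable only if it is below kernelbase. */
static bool
is_user_addr(uint64_t addr)
{
	return (addr < KERNELBASE);
}

/* Keep only the low 32 bits, i.e. what a 32-bit process can actually address. */
static uint64_t
low32(uint64_t reg)
{
	return (reg & 0xffffffffULL);
}
```

Under this model is_user_addr(0xfffffe80080475a0) is false, while
is_user_addr(low32(0xfffffe80080475a0)) is true, because the low 32 bits are
0x80475a0.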

copyin_args32() is called from syscall_entry(), which signals the EFAULT
error when copyin_args32() has failed.



