> > 2. when kqemu isn't used, I've noticed that the 64-bit S-x86 guest
> > os kernel randomly reports "EFAULT" errors for system calls.
> > This results in svc.startd / svc.configd startup errors, telling me that
> > there were disk errors reading the repository DB. Or fsck errors,
> > reporting block read errors, when checking the root filesystem.
> >
> > What I found out so far is that this problem seems to happen only
> > when the standard C-library /lib/libc.so.1 is used.
> >
> > When I mount the /usr/lib/libc/libc_hwcap2.so.1 shared C-library on
> > top of /lib/libc.so.1 the EFAULT errors are gone.
> >
> > A qemu problem with the "int $0x91" system calls in /lib/libc.so.1,
> > which doesn't happen with the "syscall" system calls from
> > libc_hwcap2.so.1 ?
>
> I've started looking at this. The most likely source of EFAULT errors
> seemed to be the copyin of args failing but that didn't pan out. So now
> I'm looking into whether any of the system calls are actually returning
> EFAULT (likely due to munged arguments), or if something is happening in
> the post-syscall processing to return EFAULT after a good syscall. The
> post-syscall processing isn't performed in the case of a syscall or
> sysenter instruction; otherwise there are few differences since we
> don't use the syscall/sysenter argument passing methods but instead
> appear to always retrieve the arguments from the user stack.

I also made some progress with this issue. It seems to be the
copyin_nowatch() call in copyin_args32() that is returning an error:

http://cvs.opensolaris.org/source/xref/on/usr/src/uts/intel/ia32/os/syscall.c#copyin_args32

copyin_nowatch() is failing because the user address is above
"kernelbase". The user address is the process' stack pointer plus one
word:

    greg32_t *sp = 1 + (greg32_t *)rp->r_sp; /* skip ret addr */

The error is detected in copyin(), which is called by copyin_nowatch().

The register structure contains rp->r_rsp = 0xfffffe80080475a0,
which looks like a 32-bit user stack address of 0x80475a0 with some
unexpected bits set in the upper 32 bits of the 64-bit register.

The address r_rsp = 0xfffffe80080475a0 is above kernelbase
(0xfffffd8000000000), so copyin() fails, copyin_nowatch() fails, and
copyin_args32() fails. copyin_args32() is called from syscall_entry(),
which signals the EFAULT error when copyin_args32() has failed.
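
To make the address arithmetic concrete, here is a minimal user-space sketch of the comparison described above. It is not the kernel's copyin() code: the helper names (is_user_address, truncate_to_32bit, first_arg_address) are hypothetical, and the range check is a simplified stand-in that assumes copyin() rejects any address at or above kernelbase. The two constants are the values quoted in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* kernelbase value quoted in the message; the macro name is illustrative. */
#define KERNELBASE 0xfffffd8000000000ULL

/* Simplified stand-in for the range check: an address is treated as a
 * user address only if it lies below kernelbase (assumption). */
bool is_user_address(uint64_t addr)
{
    return addr < KERNELBASE;
}

/* Keep only the low 32 bits of a 64-bit register value. */
uint64_t truncate_to_32bit(uint64_t reg)
{
    return reg & 0xffffffffULL;
}

/* Mirrors "1 + (greg32_t *)rp->r_sp": skip one 32-bit word
 * (the return address) to reach the first argument. */
uint64_t first_arg_address(uint64_t sp)
{
    return sp + sizeof(uint32_t);
}

/* Replays the values reported above. */
void check_reported_values(void)
{
    uint64_t rsp = 0xfffffe80080475a0ULL;

    /* With the stray upper bits, the argument address is rejected. */
    assert(!is_user_address(first_arg_address(rsp)));

    /* The low 32 bits look like an ordinary 32-bit stack address. */
    assert(truncate_to_32bit(rsp) == 0x080475a0ULL);
    assert(is_user_address(first_arg_address(truncate_to_32bit(rsp))));
}
```

Calling check_reported_values() passes silently. It only shows that the upper 32 bits of r_rsp are what push the address past kernelbase; it says nothing about where those bits come from or how they should be cleared.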