> Date: Tue, 19 Apr 2011 15:43:01 +0200
> From: Christian Ehrhardt <[email protected]>
> 
> Hi,
> 
> On Mon, Apr 18, 2011 at 10:00:02PM -0700, Philip Guenther wrote:
> > > The sti was introduced in revision 1.97 of locore.s in March 2006 by
> > > mickey@. Commit message:
> > >
> > > | prevent the faults on iret to run w/ disabled intrs and cause
> > > | deadlocks; niklas toby tom ok
> > >
> > > Maybe mickey or one of the people giving oks back then want to comment?
> > 
> > That predates my involvement with OpenBSD, but I think get the gist.
> > iret can fault if, for example, the user corrupts certain members of
> > the sigcontext passed to a signal handler.  If that happens the the
> > CPU generates an exception on the iret and you end up in trap() where
> > it peeks at the instruction that triggered the trap and, seeing it was
> > iret, arranges to 'return' to the resume_iret label to generate what
> > looks like a normal fault against the process.  The goal of that sti
> > is to keep that 'normal' fault from being processed with interrupts
> > blocked.
> 
> Point taken, I missed the sigreturn path. However, note that iret can
> only fault if the values of the segment registers stored on the kernel
> stack are bogus. In particular iret will not fault on the new EIP address.
> (The address can page fault but the fault will be on the "instruction"
> pointed to by the new CS:EIP and not in iret).
> 
> With that in mind, the question is: Is it really a problem if the trap
> handler runs with interrupts disabled? See below.

Yes, that will be a problem.  IPIs will be blocked.  So if that trap
requires us to grab the kernel lock, and another process (that holds
the kernel lock) tries to do a TLB shootdown, we'll deadlock.

> > Possibility two would be to try to handle this from, uh, inside
> > resume_iret I guess.  I'm not 100% sure whether it can be unilateral
> > there though.

This seems to be what amd64 does, and since calling uvm_fault() with
interrupts disabled is really really bad, I don't think this would be
a problem.

> This would be my preferred solution. In particular, I see two possibilities:
> 
> * Maybe it is not even a problem is irqs are sometimes disabled during
>   trap.

naddy@ is seeing hangs on the i386 ports building machine, and I
suspect that is because interrupts are disabled while handling a page
fault trap.

> * It is a problem if trap runs with irqs disabled. In this case, userland
>   can easily trigger this by trapping on one of the pop instructions for
>   ds, es, fs or gs. Those instructions in the interrupt return path run
>   with irqs disabled. Thus moving the sti in the interupt return path down
>   one instruction immediately before the iret does not really help.
>   The proper solution would probably be to add explicit calls to sti in all
>   of the resume paths (or even explicitly before calling "trap").

So here is a diff that adds the sti to both resume paths.  Like I
said, amd64 already does this (but only has the one resume path).  Now
the i386 ports machine is a bad machine to test this on.  So it would
be nice if somebody with the appropriate hardware (an 8-way i386
machine I presume) could:

1. try to reproduce the problem that naddy@ is seeing

2. and then try to reproduce the problem with this diff applied

Cheers,

Mark


Index: locore.s
===================================================================
RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v
retrieving revision 1.134
diff -u -p -r1.134 locore.s
--- locore.s    27 Apr 2011 11:30:53 -0000      1.134
+++ locore.s    2 May 2011 15:37:06 -0000
@@ -1461,6 +1461,7 @@ IDTVEC(align)
  * This will cause the process to get a SIGBUS.
  */
 NENTRY(resume_iret)
+       sti
        ZTRAP(T_PROTFLT)
 NENTRY(resume_pop_ds)
        pushl   %es
@@ -1476,6 +1477,7 @@ NENTRY(resume_pop_gs)
        movw    %ax,%fs
 NENTRY(resume_pop_fs)
        movl    $T_PROTFLT,TF_TRAPNO(%esp)
+       sti
        jmp     calltrap
 
 NENTRY(alltraps)

Reply via email to