> Date: Mon, 2 May 2011 19:21:40 +0200
> From: Christian Ehrhardt <christian_ehrha...@genua.de>
> 
> Hello Mark,
> 
> On Mon, May 02, 2011 at 05:42:10PM +0200, Mark Kettenis wrote:
> > > Date: Tue, 19 Apr 2011 15:43:01 +0200
> > > From: Christian Ehrhardt <christian_ehrha...@genua.de>
> > > 
> > > Hi,
> > > 
> > > On Mon, Apr 18, 2011 at 10:00:02PM -0700, Philip Guenther wrote:
> > > > > The sti was introduced in revision 1.97 of locore.s in March 2006 by
> > > > > mickey@. Commit message:
> > > > >
> > > > > | prevent the faults on iret to run w/ disabled intrs and cause
> > > > > | deadlocks; niklas toby tom ok
> > > > >
> > > > > Maybe mickey or one of the people giving oks back then want to 
> > > > > comment?
> > > > 
> > > > That predates my involvement with OpenBSD, but I think get the gist.
> > > > iret can fault if, for example, the user corrupts certain members of
> > > > the sigcontext passed to a signal handler.  If that happens the the
> > > > CPU generates an exception on the iret and you end up in trap() where
> > > > it peeks at the instruction that triggered the trap and, seeing it was
> > > > iret, arranges to 'return' to the resume_iret label to generate what
> > > > looks like a normal fault against the process.  The goal of that sti
> > > > is to keep that 'normal' fault from being processed with interrupts
> > > > blocked.
> > > 
> > > Point taken, I missed the sigreturn path. However, note that iret can
> > > only fault if the values of the segment registers stored on the kernel
> > > stack are bogus. In particular iret will not fault on the new EIP address.
> > > (The address can page fault but the fault will be on the "instruction"
> > > pointed to by the new CS:EIP and not in iret).
> > > 
> > > With that in mind, the question is: Is it really a problem if the trap
> > > handler runs with interrupts disabled? See below.
> > 
> > Yes, that will be a problem.  IPIs will be blocked.  So if that trap
> > requires us to grab the kernel lock, and another process (that holds
> > the kernel lock) tries to do a TLB shootdown, we'll deadlock.
> 
> Ok, but ...
> 
> > > > Possibility two would be to try to handle this from, uh, inside
> > > > resume_iret I guess.  I'm not 100% sure whether it can be unilateral
> > > > there though.
> > 
> > This seems to be what amd64 does, and since calling uvm_fault() with
> > interrupts disabled is really really bad, I don't think this would be
> > a problem.
> > 
> > > This would be my preferred solution. In particular, I see two 
> > > possibilities:
> > > 
> > > * Maybe it is not even a problem if irqs are sometimes disabled during
> > >   trap.
> > 
> > naddy@ is seeing hangs on the i386 ports building machine, and I
> > suspect that is because interrupts are disabled while handling a page
> > fault trap.
> 
> ... I really don't believe this, unless there is some path that I still
> miss. Faulting on iret itself should IMHO be a very rare thing. The one
> and only path that is see, where this can happen, is sigreturn with
> bogus segment registers provided by userland.

I wouldn't call myself an expert on the i386 architecter.  The fact
that I confuse page faults and protection faults proves this.  Anyway,
the resume_xxx paths exist for a reason.

> > So here is a diff that adds the sti to both resume paths.  Like I
> > said, amd64 already does this (but only has the one resume path).  Now
> > the i386 ports machine is a bad machine to test this on.  So it would
> > be nice if somebody with the appropriate hardware (an 8-way i386
> > machine I presume) could:
> 
> In fact it might be a good idea to try this patch instead (i.e.
> unconditionally enable irqs before calling trap):

Hmm, the amd64 code does have an sti just after the alltraps entry
point, but just before the calltrap label.  I don't think there is any
reason for i386 to be different here.  Dunno if there is a reason for
the amd64 code to be like it is.  Naively I'd think that the amd64
code can indeed be simplified by moving the sti after the calltrap
label and remove the sti instructions just before the jmp calltrap
instructions.

> Index: i386/locore.s
> ===================================================================
> RCS file: /cvs/src/sys/arch/i386/i386/locore.s,v
> retrieving revision 1.130
> diff -u -r1.130 locore.s
> --- i386/locore.s     3 Jul 2010 04:54:32 -0000       1.130
> +++ i386/locore.s     2 May 2011 16:46:14 -0000
> @@ -1511,6 +1511,7 @@
>  #ifdef DIAGNOSTIC
>       movl    CPL,%ebx
>  #endif /* DIAGNOSTIC */
> +     sti
>       pushl   %esp
>       call    _C_LABEL(trap)
>       addl    $4,%esp
> 
> 
>    regards     Christian

Reply via email to