Re: POSIX-compliant page fault error codes

2014-06-29 Thread Theo de Raadt
> To prevent this from happening, the X server will install a signal
> handler for SIGBUS, check if a shared memory object is being accessed
> and patch things up (by mmap'ing anonymous memory on top of the
> mapping).  This code can be extended of course by handling SIGSEGV as
> well.  But this means more work in xenocara and ports, and we might
> miss some places where this needs to be done.

I actually don't believe this theory about a SIGBUS (or even SIGSEGV)
handler "fixing things up".

Over the last two decades, I've done more than a little auditing of
signal handlers.  The only general principle I can report back about
them is that safe ones are exceedingly rare.

Fixups are a myth.  If it does happen, SIGBUS and SIGSEGV can be
handled the same unsafe way...



Re: POSIX-compliant page fault error codes

2014-06-29 Thread Mark Kettenis
> Date: Tue, 24 Jun 2014 15:53:20 -0700
> From: Matthew Dempsky 
> 
> On Tue, Jun 24, 2014 at 11:04:10AM -0700, Matthew Dempsky wrote:
> >   SIGBUS/BUS_ADRERR: Accessing a mapped page that exceeds the end of
> >   the underlying mapped file.
> 
> Generating SIGBUS for this case has proven controversial due to
> concern that this is Linux invented behavior and not compatible with
> Solaris, so I decided to collect some more background information on
> the subject.
> 
> - SunOS 4.1.3's mmap() manual specifies: "Any reference to addresses
> beyond the end of the object, however, will result in the delivery of
> a SIGBUS signal." This wording was relaxed to "SIGBUS or SIGSEGV" in
> SunOS 5.6 and remains in current manuals. (I'm not sure, but I suspect
> this may be to simply reflect that memory protection violations take
> priority over bounds checking.)

It makes sense that memory protection violations take priority over
bounds checking.

>   SunOS 4.1.3: 
> http://www.freebsd.org/cgi/man.cgi?query=mmap&sektion=2&manpath=SunOS+4.1.3
>   SunOS 5.6: 
> http://www.freebsd.org/cgi/man.cgi?query=mmap&sektion=2&manpath=SunOS+5.6
>   Solaris 11: http://docs.oracle.com/cd/E23824_01/html/821-1463/mmap-2.html
> 
> - Many other SVR-derived OSes similarly document SIGBUS in their
> mmap() manuals too:
> 
>   AIX: 
> http://www-01.ibm.com/support/knowledgecenter/ssw_aix_53/com.ibm.aix.basetechref/doc/basetrf1/mmap.htm?lang=en
>   HPUX: 
> http://h20566.www2.hp.com/portal/site/hpsc/template.BINARYPORTLET/public/kb/docDisplay/resource.process/?spf_p.tpst=kbDocDisplay_ws_BI&spf_p.rid_kbDocDisplay=docDisplayResURL&javax.portlet.begCacheTok=com.vignette.cachetoken&spf_p.rst_kbDocDisplay=wsrp-resourceState%3DdocId%253Demr_na-c02261243-2%257CdocLocale%253D&javax.portlet.endCacheTok=com.vignette.cachetoken
>   UnixWare: http://uw714doc.sco.com/en/man/html.2/mmap.2.html
> 
> - This behavior has been (awkwardly) specified for mmap() since SUSv2:
> "References within the address range starting at pa and continuing for
> len bytes to whole pages following the end of an object shall result
> in delivery of a SIGBUS signal." Later versions of POSIX have the same
> wording.
> 
>   SUSv2: http://pubs.opengroup.org/onlinepubs/007908799/xsh/mmap.html
>   POSIX.2001: 
> http://pubs.opengroup.org/onlinepubs/009695399/functions/mmap.html
>   POSIX.2008: 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html
> 
> - More generally, POSIX explains the SIGBUS/SIGSEGV distinction
> thusly: "When an object is mapped, various application accesses to the
> mapped region may result in signals. In this context, SIGBUS is used
> to indicate an error using the mapped object, and SIGSEGV is used to
> indicate a protection violation or misuse of an address." Specific
> examples are provided too:
> 
>   Memory Protection: 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_08_03_03
>

Generating SIGBUS for access beyond the end of an object makes some
sense.  In this case there is a valid mapping; it's just that the
underlying physical memory pages aren't there.  It is not dissimilar
to having mapped a physical address that maps to say the PCI bus.  On
real hardware accessing such a mapping will lead to a failed bus
transaction for which the logical representation is a SIGBUS.  (On
PeeCee hardware you'll probably get back an all-ones bit-pattern).
From a hardware-oriented perspective, SIGSEGV is generated by the MMU
and SIGBUS is generated by the underlying hardware.

So I don't think the Sun engineers made a totally unreasonable
decision here.  Unfortunately the CSRG made a different decision when
they reimplemented mmap support in 4.3BSD-Reno.  Or perhaps things got
broken after that...

In my view, generating SIGBUS under these circumstances is a bit
unfortunate.  Currently, SIGBUS on OpenBSD is a very clear indication
of an alignment issue.  If we generated SIGBUS for access beyond
the end of an mmap'ed object, this would no longer be the case.  We'd
actually have to look at the siginfo, which isn't printed by the shell.
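
To make "look at the siginfo" concrete, here is a minimal sketch of a
handler that decodes si_code for SIGBUS.  The names bus_code_name() and
report_sigbus() are hypothetical, invented for this illustration; the
BUS_* constants are the ones POSIX defines in <signal.h>.  Whether a
given kernel actually reports these codes is a separate question.

```c
#define _DEFAULT_SOURCE 1	/* glibc: expose the BUS_* si_code values */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Map a SIGBUS si_code to a short human-readable description. */
static const char *
bus_code_name(int code)
{
	switch (code) {
	case BUS_ADRALN:
		return "BUS_ADRALN: invalid address alignment";
	case BUS_ADRERR:
		return "BUS_ADRERR: nonexistent physical address";
	case BUS_OBJERR:
		return "BUS_OBJERR: object-specific hardware error";
	default:
		return "unknown SIGBUS code";
	}
}

/*
 * SA_SIGINFO-style handler: report why SIGBUS was raised, then exit.
 * It only calls write(), strlen() and _exit().
 */
static void
report_sigbus(int signo, siginfo_t *info, void *ctx)
{
	const char *name = bus_code_name(info->si_code);
	ssize_t n1, n2;

	(void)signo;
	(void)ctx;
	n1 = write(STDERR_FILENO, name, strlen(name));
	n2 = write(STDERR_FILENO, "\n", 1);
	_exit((n1 == -1 || n2 == -1) ? 2 : 1);
}
```

A program would install report_sigbus with sigaction() and SA_SIGINFO
set in sa_flags.  Without such a handler, the signal name reported for
the dead process is all there is to go on.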

On the other hand, passing memory objects by fd is getting more
common.  Xorg recently modernized its shared memory interface
(MIT-SHM, aka XShm) to support mmap'ing file descriptors passed over
sockets.  And DRM is moving in the same direction to solve security
issues with access to graphics objects.  But this approach has a
downside.  A malicious client could pass an fd to the X server and
subsequently truncate it after the X server mapped it.  If the X
server accesses this mapping, it will crash.  To prevent this from
happening, the X server will install a signal handler for SIGBUS,
check if a shared memory object is being accessed and patch things up
(by mmap'ing anonymous memory on top of the mapping).  This code can
be extended of course by handling SIGSEGV as well.  But this means
more work in xenocara and ports, and we might miss some places where
this needs to be done.
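
The fixup described above could look roughly like the sketch below.
This is not the X server's code; every name in it (shm_base,
sigbus_fixup, install_sigbus_fixup, read_after_truncate) is made up
for illustration, and it tracks only one mapping.  It assumes the
kernel delivers SIGBUS with si_addr set to the faulting address when a
mapped file has been truncated.  On a kernel that delivers SIGSEGV
instead, the same handler would have to be installed for that signal.

```c
#define _DEFAULT_SOURCE 1	/* glibc: expose MAP_ANON and mkstemp() */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for a single client-supplied mapping. */
static char *shm_base;
static size_t shm_len;
static size_t shm_pagesz;
static volatile sig_atomic_t shm_fixups;

static void
sigbus_fixup(int signo, siginfo_t *info, void *ctx)
{
	uintptr_t addr = (uintptr_t)info->si_addr;
	uintptr_t base = (uintptr_t)shm_base;
	void *page;

	(void)ctx;
	if (shm_base == NULL || addr < base || addr >= base + shm_len) {
		/* Not our mapping: restore the default action and return,
		 * so the faulting access repeats and kills the process. */
		signal(signo, SIG_DFL);
		return;
	}
	page = (void *)(addr & ~((uintptr_t)shm_pagesz - 1));
	/* Note: mmap() is not on POSIX's async-signal-safe list. */
	if (mmap(page, shm_pagesz, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0) == MAP_FAILED)
		_exit(1);
	shm_fixups++;
}

static int
install_sigbus_fixup(void *base, size_t len)
{
	struct sigaction sa;

	shm_base = base;
	shm_len = len;
	shm_pagesz = (size_t)sysconf(_SC_PAGESIZE);
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_fixup;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}

/* Play both roles: map a one-page file, truncate it, then read it. */
static int
read_after_truncate(void)
{
	char path[] = "/tmp/shm-fixup.XXXXXX";
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);
	volatile char *p;

	if (fd == -1)
		return -1;
	unlink(path);
	if (ftruncate(fd, (off_t)len) == -1)
		return -1;
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return -1;
	p[0] = 'x';
	if (install_sigbus_fixup((void *)p, len) == -1)
		return -1;
	if (ftruncate(fd, 0) == -1)	/* the "malicious client" */
		return -1;
	close(fd);
	return p[0];	/* faults; the handler swaps in a zeroed page */
}
```

The handler deliberately does very little, but it still calls mmap()
from signal context and trusts global state that the interrupted code
may have been in the middle of updating.  A real server with many
mappings would need a lookup structure that is safe to read from a
handler.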

Theo ha

Re: POSIX-compliant page fault error codes

2014-06-24 Thread Theo de Raadt
Matthew -- fine, you collected information.  Thank you.

It is quite clear that POSIX set in stone an accident, a significant
error in my opinion.  Anyone with enough expertise can recognize this
is an accident in the SVR4 codebase, which ended up being "ratified"
(in quotes, because the more mistakes you make, the less value there
is).  This specific refinement may help a few pieces of code which
require specific detail in siginfo, but it introduces a lot more
accidental risk in those which only use the signal number/handler.

It is complicated enough that it requires experts to review how
(typically poorly) programs (written by non-experts) use signals to
deal with this added kernel behaviour.

As in, it is bad enough that I am scared even for the way that SIGBUS
and SIGSEGV handlers in crap programs in base handle it.  The issue of
unsafe terminal signal handlers returns IN FORCE, and we need to cope
with those.  Nothing ever changes, no one ever learns, no one cares.

Where we go from here continues to be a big question mark.  Compatible?
one issue.. not compatible?   another issue..  Thanks POSIX, whoever
you are.  What favors did you do us recently?



Re: POSIX-compliant page fault error codes

2014-06-24 Thread Matthew Dempsky
On Tue, Jun 24, 2014 at 11:04:10AM -0700, Matthew Dempsky wrote:
>   SIGBUS/BUS_ADRERR: Accessing a mapped page that exceeds the end of
>   the underlying mapped file.

Generating SIGBUS for this case has proven controversial due to
concern that this is Linux invented behavior and not compatible with
Solaris, so I decided to collect some more background information on
the subject.

- SunOS 4.1.3's mmap() manual specifies: "Any reference to addresses
beyond the end of the object, however, will result in the delivery of
a SIGBUS signal." This wording was relaxed to "SIGBUS or SIGSEGV" in
SunOS 5.6 and remains in current manuals. (I'm not sure, but I suspect
this may be to simply reflect that memory protection violations take
priority over bounds checking.)

  SunOS 4.1.3: 
http://www.freebsd.org/cgi/man.cgi?query=mmap&sektion=2&manpath=SunOS+4.1.3
  SunOS 5.6: 
http://www.freebsd.org/cgi/man.cgi?query=mmap&sektion=2&manpath=SunOS+5.6
  Solaris 11: http://docs.oracle.com/cd/E23824_01/html/821-1463/mmap-2.html

- Many other SVR-derived OSes similarly document SIGBUS in their
mmap() manuals too:

  AIX: 
http://www-01.ibm.com/support/knowledgecenter/ssw_aix_53/com.ibm.aix.basetechref/doc/basetrf1/mmap.htm?lang=en
  HPUX: 
http://h20566.www2.hp.com/portal/site/hpsc/template.BINARYPORTLET/public/kb/docDisplay/resource.process/?spf_p.tpst=kbDocDisplay_ws_BI&spf_p.rid_kbDocDisplay=docDisplayResURL&javax.portlet.begCacheTok=com.vignette.cachetoken&spf_p.rst_kbDocDisplay=wsrp-resourceState%3DdocId%253Demr_na-c02261243-2%257CdocLocale%253D&javax.portlet.endCacheTok=com.vignette.cachetoken
  UnixWare: http://uw714doc.sco.com/en/man/html.2/mmap.2.html

- This behavior has been (awkwardly) specified for mmap() since SUSv2:
"References within the address range starting at pa and continuing for
len bytes to whole pages following the end of an object shall result
in delivery of a SIGBUS signal." Later versions of POSIX have the same
wording.

  SUSv2: http://pubs.opengroup.org/onlinepubs/007908799/xsh/mmap.html
  POSIX.2001: http://pubs.opengroup.org/onlinepubs/009695399/functions/mmap.html
  POSIX.2008: 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html

- More generally, POSIX explains the SIGBUS/SIGSEGV distinction
thusly: "When an object is mapped, various application accesses to the
mapped region may result in signals. In this context, SIGBUS is used
to indicate an error using the mapped object, and SIGSEGV is used to
indicate a protection violation or misuse of an address." Specific
examples are provided too:

  Memory Protection: 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_08_03_03



POSIX-compliant page fault error codes

2014-06-24 Thread Matthew Dempsky
POSIX specifies these error cases for memory faults:

  SIGSEGV/SEGV_MAPERR: Accessing an unmapped page.

  SIGSEGV/SEGV_ACCERR: Reading from a non-readable or writing to a
  non-writable page. 

  SIGBUS/BUS_ADRERR: Accessing a mapped page that exceeds the end of
  the underlying mapped file.
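
For illustration, the three cases can be provoked from userland along
the following lines.  This is only a sketch in the spirit of such a
test, not the contents of regress/sys/kern/siginfo-fault; all function
names are hypothetical.  It catches the fault with an SA_SIGINFO
handler, records the signal number and si_code, and jumps back out
with siglongjmp().

```c
#define _DEFAULT_SOURCE 1	/* glibc: expose MAP_ANON and mkstemp() */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct fault { int signo; int code; };	/* 0/0 means "no fault" */

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_signo, fault_code;

static void
on_fault(int signo, siginfo_t *info, void *ctx)
{
	(void)ctx;
	fault_signo = signo;
	fault_code = info->si_code;
	siglongjmp(fault_env, 1);
}

/* Read one byte at p and report which signal and si_code resulted. */
static struct fault
probe_read(const volatile char *p)
{
	struct sigaction sa, old_segv, old_bus;
	struct fault f = { 0, 0 };

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_fault;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old_segv);
	sigaction(SIGBUS, &sa, &old_bus);

	if (sigsetjmp(fault_env, 1) == 0) {
		(void)*p;		/* the access under test */
	} else {
		f.signo = fault_signo;
		f.code = fault_code;
	}

	sigaction(SIGSEGV, &old_segv, NULL);
	sigaction(SIGBUS, &old_bus, NULL);
	return f;
}

/* An address that was mapped and then unmapped again. */
static char *
make_unmapped(size_t len)
{
	char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);

	assert(p != MAP_FAILED);
	munmap(p, len);
	return p;
}

/* A mapped page with no access permissions at all. */
static char *
make_unreadable(size_t len)
{
	char *p = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);

	assert(p != MAP_FAILED);
	return p;
}

/* A mapping of an empty file, so every page lies past its end. */
static char *
make_past_eof(size_t len)
{
	char path[] = "/tmp/siginfo-fault.XXXXXX";
	int fd = mkstemp(path);
	char *p;

	assert(fd != -1);
	unlink(path);
	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	assert(p != MAP_FAILED);
	close(fd);
	return p;
}
```

Calling probe_read() on each of the three pointers, with len set to
the page size, should yield SIGSEGV/SEGV_MAPERR, SIGSEGV/SEGV_ACCERR
and SIGBUS/BUS_ADRERR respectively on a system that follows the rules
listed above.  A kernel that deviates will show up as a different
signo or code in the returned struct.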

I added a regress test at regress/sys/kern/siginfo-fault to cover
these cases, but unfortunately we're non-compliant in a few ways, and
fixing it is somewhat MD.  With the diff below, the tests pass on
amd64, but other platforms will need similar changes.

Currently VM_PAGER_BAD is only returned by pgo_get() in the case of
uvn_get() trying to access a page beyond the end of the file, so this
diff changes uvm_fault() to recognize this and return ENOSPC
(arbitrary unused error code) and then the MD trap() code needs to
know to map this error to BUS_ADRERR.

Additionally, some MD trap()s already know to map EACCES to
SEGV_ACCERR instead of SEGV_MAPERR, but amd64 wasn't one of them.  So
this diff fixes that too.
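
Pulled out of the kernel context, the mapping this diff implements can
be sketched as a small standalone function.  The struct and function
names here are hypothetical; in the diff the logic sits inline in the
amd64 trap() code, and ENOSPC is the arbitrary marker value chosen
above for "beyond the end of the file".

```c
#define _DEFAULT_SOURCE 1	/* glibc: expose BUS_* and SEGV_* codes */
#include <assert.h>
#include <errno.h>
#include <signal.h>

struct fault_signal {
	int signo;	/* signal to deliver */
	int code;	/* si_code to report in siginfo */
};

/*
 * Translate the error returned by the fault routine into the signal
 * and si_code that should be delivered to the process.
 */
static struct fault_signal
fault_error_to_signal(int error)
{
	struct fault_signal fs;

	if (error == ENOSPC) {
		/* pager reported an access past the end of the object */
		fs.signo = SIGBUS;
		fs.code = BUS_ADRERR;
	} else {
		fs.signo = SIGSEGV;
		fs.code = (error == EACCES) ? SEGV_ACCERR : SEGV_MAPERR;
	}
	return fs;
}
```

Any error other than ENOSPC or EACCES falls through to
SIGSEGV/SEGV_MAPERR, which matches what the old code reported for
every fault.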


Index: uvm/uvm_fault.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/uvm/uvm_fault.c,v
retrieving revision 1.73
diff -u -p -r1.73 uvm_fault.c
--- uvm/uvm_fault.c 8 May 2014 20:08:50 -   1.73
+++ uvm/uvm_fault.c 23 Jun 2014 21:29:24 -
@@ -1125,7 +1125,8 @@ Case2:
goto ReFault;
}
 
-   return (EACCES); /* XXX i/o error */
+   /* XXX i/o error */
+   return (result == VM_PAGER_BAD ? ENOSPC : EACCES);
}
 
/* re-verify the state of the world.  */
Index: arch/amd64/amd64/trap.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/arch/amd64/amd64/trap.c,v
retrieving revision 1.40
diff -u -p -r1.40 trap.c
--- arch/amd64/amd64/trap.c 15 Jun 2014 11:43:24 -  1.40
+++ arch/amd64/amd64/trap.c 23 Jun 2014 21:38:31 -
@@ -387,9 +387,6 @@ faultcommon:
KERNEL_UNLOCK();
goto out;
}
-   if (error == EACCES) {
-   error = EFAULT;
-   }
 
if (type == T_PAGEFLT) {
if (pcb->pcb_onfault != 0) {
@@ -407,13 +404,23 @@ faultcommon:
sv.sival_ptr = (void *)fa;
trapsignal(p, SIGKILL, T_PAGEFLT, SEGV_MAPERR, sv);
} else {
+   int signo, code;
+   if (error == ENOSPC) {
+   signo = SIGBUS;
+   code = BUS_ADRERR;
+   } else {
+   signo = SIGSEGV;
+   code = (error == EACCES) ? SEGV_ACCERR :
+   SEGV_MAPERR;
+   }
 #ifdef TRAP_SIGDEBUG
-   printf("pid %d (%s): SEGV at rip %lx addr %lx\n",
-   p->p_pid, p->p_comm, frame->tf_rip, fa);
+   printf("pid %d (%s): %s at rip %lx addr %lx\n",
+   p->p_pid, p->p_comm, (signo == SIGBUS) ?
+   "BUS" : "SEGV", frame->tf_rip, fa);
frame_dump(frame);
 #endif
sv.sival_ptr = (void *)fa;
-   trapsignal(p, SIGSEGV, T_PAGEFLT, SEGV_MAPERR, sv);
+   trapsignal(p, signo, T_PAGEFLT, code, sv);
}
KERNEL_UNLOCK();
break;