Re: POSIX-compliant page fault error codes
> To prevent this from happening, the X server will install a signal
> handler for SIGBUS, check if a shared memory object is being accessed
> and patch things up (by mmap'ing anonymous memory on top of the
> mapping). This code can be extended of course by handling SIGSEGV as
> well. But this means more work in xenocara and ports, and we might
> miss some places where this needs to be done.

I actually don't believe this theory about a SIGBUS (or even SIGSEGV)
handler "fixing things up".

Over the last two decades, I've done more than a little auditing of
signal handlers. The only general principle I can report back about
them is that safe ones are exceedingly rare.

Fixups are a myth. If it does happen, SIGBUS and SIGSEGV can be
handled the same unsafe way...
Re: POSIX-compliant page fault error codes
> Date: Tue, 24 Jun 2014 15:53:20 -0700
> From: Matthew Dempsky
>
> On Tue, Jun 24, 2014 at 11:04:10AM -0700, Matthew Dempsky wrote:
> > SIGBUS/BUS_ADRERR: Accessing a mapped page that exceeds the end of
> > the underlying mapped file.
>
> Generating SIGBUS for this case has proven controversial due to
> concern that this is Linux invented behavior and not compatible with
> Solaris, so I decided to collect some more background information on
> the subject.
>
> - SunOS 4.1.3's mmap() manual specifies: "Any reference to addresses
> beyond the end of the object, however, will result in the delivery of
> a SIGBUS signal." This wording was relaxed to "SIGBUS or SIGSEGV" in
> SunOS 5.6 and remains in current manuals. (I'm not sure, but I suspect
> this may be to simply reflect that memory protection violations take
> priority over bounds checking.)

It makes sense that memory protection violations take priority over
bounds checking.

> SunOS 4.1.3:
> http://www.freebsd.org/cgi/man.cgi?query=mmap&sektion=2&manpath=SunOS+4.1.3
> SunOS 5.6:
> http://www.freebsd.org/cgi/man.cgi?query=mmap&sektion=2&manpath=SunOS+5.6
> Solaris 11: http://docs.oracle.com/cd/E23824_01/html/821-1463/mmap-2.html
>
> - Many other SVR-derived OSes similarly document SIGBUS in their
> mmap() manuals too:
>
> AIX:
> http://www-01.ibm.com/support/knowledgecenter/ssw_aix_53/com.ibm.aix.basetechref/doc/basetrf1/mmap.htm?lang=en
> HPUX:
> http://h20566.www2.hp.com/portal/site/hpsc/template.BINARYPORTLET/public/kb/docDisplay/resource.process/?spf_p.tpst=kbDocDisplay_ws_BI&spf_p.rid_kbDocDisplay=docDisplayResURL&javax.portlet.begCacheTok=com.vignette.cachetoken&spf_p.rst_kbDocDisplay=wsrp-resourceState%3DdocId%253Demr_na-c02261243-2%257CdocLocale%253D&javax.portlet.endCacheTok=com.vignette.cachetoken
> UnixWare: http://uw714doc.sco.com/en/man/html.2/mmap.2.html
>
> - This behavior has been (awkwardly) specified for mmap() since SUSv2:
> "References within the address range starting at pa and continuing for
> len bytes to whole pages following the end of an object shall result
> in delivery of a SIGBUS signal." Later versions of POSIX have the same
> wording.
>
> SUSv2: http://pubs.opengroup.org/onlinepubs/007908799/xsh/mmap.html
> POSIX.2001:
> http://pubs.opengroup.org/onlinepubs/009695399/functions/mmap.html
> POSIX.2008:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html
>
> - More generally, POSIX explains the SIGBUS/SIGSEGV distinction
> thusly: "When an object is mapped, various application accesses to the
> mapped region may result in signals. In this context, SIGBUS is used
> to indicate an error using the mapped object, and SIGSEGV is used to
> indicate a protection violation or misuse of an address." Specific
> examples are provided too:
>
> Memory Protection:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_08_03_03
>

Generating SIGBUS for access beyond the end of an object makes some
sense. In this case there is a valid mapping; it's just that the
underlying physical memory pages aren't there. It is not dissimilar to
having mapped a physical address that maps to, say, the PCI bus. On
real hardware accessing such a mapping will lead to a failed bus
transaction for which the logical representation is a SIGBUS. (On
PeeCee hardware you'll probably get back an all-ones bit-pattern).
From a hardware-oriented perspective, SIGSEGV is generated by the MMU
and SIGBUS is generated by the underlying hardware.

So I don't think the Sun engineers made a totally unreasonable
decision here. Unfortunately the CSRG made a different decision when
they reimplemented mmap support in 4.3BSD-Reno. Or perhaps things got
broken after that...

In my view, generating SIGBUS under these circumstances is a bit
unfortunate. Currently, SIGBUS on OpenBSD is a very clear indication
of an alignment issue. If we generated SIGBUS for access beyond the
end of an mmap'ed object, this would no longer be the case. We'd
actually have to look at the siginfo, which isn't printed by the
shell.

On the other hand, passing memory objects by fd is getting more
common. Xorg recently modernized its shared memory interface (MIT-SHM,
aka XShm) to support mmap'ing file descriptors passed over sockets.
And DRM is moving in the same direction to solve security issues with
access to graphics objects. But this approach has a downside. A
malicious client could pass an fd to the X server and subsequently
truncate it after the X server mapped it. If the X server accesses
this mapping, it will crash.

To prevent this from happening, the X server will install a signal
handler for SIGBUS, check if a shared memory object is being accessed
and patch things up (by mmap'ing anonymous memory on top of the
mapping). This code can be extended of course by handling SIGSEGV as
well. But this means more work in xenocara and ports, and we might
miss some places where this needs to be done.

Theo ha
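As a rough illustration of the fixup described in the message above (a SIGBUS handler that maps anonymous memory over the faulting page), here is a minimal sketch in C. The names guard_start, guard_end, busfault_handler and demo_truncated_read are made up for this example and are not taken from the X server. It assumes a system that delivers SIGBUS for access beyond the end of a mapped file (Linux does; OpenBSD did not at the time of this thread), and it calls mmap() from a signal handler, which POSIX does not list as async-signal-safe.

```c
#define _DEFAULT_SOURCE		/* glibc: expose MAP_ANON, mkstemp, siginfo_t */
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for a single guarded mapping. */
static uintptr_t guard_start, guard_end;
static long guard_pagesz;
static volatile sig_atomic_t fixups;

static void
busfault_handler(int signo, siginfo_t *si, void *ctx)
{
	uintptr_t addr = (uintptr_t)si->si_addr;
	void *page;

	(void)ctx;
	if (addr < guard_start || addr >= guard_end) {
		/* Not our mapping: restore the default action and return,
		 * so the retried access terminates the process. */
		signal(signo, SIG_DFL);
		return;
	}
	/* Replace the faulting page with zero-filled anonymous memory. */
	page = (void *)(addr & ~((uintptr_t)guard_pagesz - 1));
	if (mmap(page, (size_t)guard_pagesz, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON | MAP_FIXED, -1, 0) == MAP_FAILED)
		_exit(1);
	fixups++;
}

/* Map a one-page file, truncate it, then read from the mapping.
 * Returns the byte read, or -1 if the setup fails. */
int
demo_truncated_read(void)
{
	char path[] = "/tmp/busfault.XXXXXX";
	struct sigaction sa;
	volatile char *p;
	int fd;

	guard_pagesz = sysconf(_SC_PAGESIZE);
	if ((fd = mkstemp(path)) == -1)
		return -1;
	unlink(path);
	if (ftruncate(fd, guard_pagesz) == -1)
		return -1;
	p = mmap(NULL, (size_t)guard_pagesz, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return -1;
	guard_start = (uintptr_t)p;
	guard_end = guard_start + (uintptr_t)guard_pagesz;

	sa.sa_sigaction = busfault_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, NULL) == -1)
		return -1;

	if (ftruncate(fd, 0) == -1)	/* what a hostile client would do */
		return -1;
	return p[0];			/* faults, is patched, then retried */
}
```

When the handler returns, the faulting load is retried against the new anonymous page. Whether a handler like this can be called safe in a real program is exactly what is disputed elsewhere in this thread.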
Re: POSIX-compliant page fault error codes
Matthew -- fine, you collected information. Thank you.

It is quite clear that POSIX set in stone an accident, a significant
error in my opinion. Anyone with enough expertise can recognize this
is an accident in the SVR4 codebase, which ended up being "ratified"
(in quotes, because the more mistakes you make, the less value there
is).

This specific refinement may help a few pieces of code which require
specific detail in siginfo, but it introduces a lot more accidental
risk in those which only use the signal number/handler. It is
complicated enough that it requires experts to review how (typically
poorly) programs (written by non-experts) use signals to deal with
this added kernel behaviour.

As in, it is bad enough that I am scared even for the way that SIGBUS
and SIGSEGV handlers in crap programs in base handle it. The issue of
unsafe terminal signal handlers returns IN FORCE, and we need to cope
with those. Nothing ever changes, no one ever learns, no one cares.

Where we go from here continues to be a big question mark.
Compatible? one issue.. not compatible? another issue..

Thanks POSIX, whoever you are. What favors did you do us recently?
Re: POSIX-compliant page fault error codes
On Tue, Jun 24, 2014 at 11:04:10AM -0700, Matthew Dempsky wrote:
> SIGBUS/BUS_ADRERR: Accessing a mapped page that exceeds the end of
> the underlying mapped file.

Generating SIGBUS for this case has proven controversial due to
concern that this is Linux invented behavior and not compatible with
Solaris, so I decided to collect some more background information on
the subject.

- SunOS 4.1.3's mmap() manual specifies: "Any reference to addresses
beyond the end of the object, however, will result in the delivery of
a SIGBUS signal." This wording was relaxed to "SIGBUS or SIGSEGV" in
SunOS 5.6 and remains in current manuals. (I'm not sure, but I suspect
this may be to simply reflect that memory protection violations take
priority over bounds checking.)

SunOS 4.1.3:
http://www.freebsd.org/cgi/man.cgi?query=mmap&sektion=2&manpath=SunOS+4.1.3
SunOS 5.6:
http://www.freebsd.org/cgi/man.cgi?query=mmap&sektion=2&manpath=SunOS+5.6
Solaris 11: http://docs.oracle.com/cd/E23824_01/html/821-1463/mmap-2.html

- Many other SVR-derived OSes similarly document SIGBUS in their
mmap() manuals too:

AIX:
http://www-01.ibm.com/support/knowledgecenter/ssw_aix_53/com.ibm.aix.basetechref/doc/basetrf1/mmap.htm?lang=en
HPUX:
http://h20566.www2.hp.com/portal/site/hpsc/template.BINARYPORTLET/public/kb/docDisplay/resource.process/?spf_p.tpst=kbDocDisplay_ws_BI&spf_p.rid_kbDocDisplay=docDisplayResURL&javax.portlet.begCacheTok=com.vignette.cachetoken&spf_p.rst_kbDocDisplay=wsrp-resourceState%3DdocId%253Demr_na-c02261243-2%257CdocLocale%253D&javax.portlet.endCacheTok=com.vignette.cachetoken
UnixWare: http://uw714doc.sco.com/en/man/html.2/mmap.2.html

- This behavior has been (awkwardly) specified for mmap() since SUSv2:
"References within the address range starting at pa and continuing for
len bytes to whole pages following the end of an object shall result
in delivery of a SIGBUS signal." Later versions of POSIX have the same
wording.

SUSv2: http://pubs.opengroup.org/onlinepubs/007908799/xsh/mmap.html
POSIX.2001:
http://pubs.opengroup.org/onlinepubs/009695399/functions/mmap.html
POSIX.2008:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/mmap.html

- More generally, POSIX explains the SIGBUS/SIGSEGV distinction
thusly: "When an object is mapped, various application accesses to the
mapped region may result in signals. In this context, SIGBUS is used
to indicate an error using the mapped object, and SIGSEGV is used to
indicate a protection violation or misuse of an address." Specific
examples are provided too:

Memory Protection:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_08_03_03
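To make the SIGBUS/SIGSEGV distinction quoted above concrete, the following is a small userland sketch that provokes one fault of each kind and records the signal number and si_code the kernel reports. It is not the regress test mentioned in this thread; the names fault_handler, probe, probe_unmapped, probe_readonly_write and probe_past_eof are invented for this example, and leaving the handler with siglongjmp() is a simplification for a single-threaded test program. On a system that follows the POSIX wording, the three probes are expected to report SIGSEGV/SEGV_MAPERR, SIGSEGV/SEGV_ACCERR and SIGBUS/BUS_ADRERR respectively.

```c
#define _DEFAULT_SOURCE		/* glibc: expose MAP_ANON, mkstemp, siginfo_t */
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

struct fault_info { int signo; int code; };

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_signo, fault_code;

static void
fault_handler(int signo, siginfo_t *si, void *ctx)
{
	(void)ctx;
	fault_signo = signo;
	fault_code = si->si_code;
	siglongjmp(fault_env, 1);	/* skip the faulting access */
}

/* Read or write one byte at addr and report the signal raised, if any. */
static struct fault_info
probe(volatile char *addr, int do_write)
{
	struct sigaction sa;
	struct fault_info fi;

	sa.sa_sigaction = fault_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, NULL);
	sigaction(SIGBUS, &sa, NULL);

	fault_signo = fault_code = 0;
	if (sigsetjmp(fault_env, 1) == 0) {
		if (do_write) {
			*addr = 1;
		} else {
			char c = *addr;
			(void)c;
		}
	}
	fi.signo = fault_signo;
	fi.code = fault_code;
	return fi;
}

/* Case 1: access a page that has been unmapped. */
struct fault_info
probe_unmapped(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);

	munmap(p, len);
	return probe(p, 0);
}

/* Case 2: write to a page mapped read-only. */
struct fault_info
probe_readonly_write(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);

	return probe(p, 1);
}

/* Case 3: read a mapped page that lies beyond the end of an empty file. */
struct fault_info
probe_past_eof(void)
{
	char path[] = "/tmp/sigfault.XXXXXX";
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);
	char *p;

	unlink(path);
	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	return probe(p, 0);
}
```

A program that only looks at the signal number cannot tell the first two cases apart, and on a system that reports SIGSEGV for the third it cannot tell that one apart either; the si_code value is what carries the distinction.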
POSIX-compliant page fault error codes
POSIX specifies these error cases for memory faults:

    SIGSEGV/SEGV_MAPERR: Accessing an unmapped page.

    SIGSEGV/SEGV_ACCERR: Reading from a non-readable or writing to a
    non-writable page.

    SIGBUS/BUS_ADRERR: Accessing a mapped page that exceeds the end of
    the underlying mapped file.

I added a regress test at regress/sys/kern/siginfo-fault to cover
these cases, but unfortunately we're non-compliant in a few ways, and
fixing it is somewhat MD. With the diff below, the tests pass on
amd64, but other platforms will need similar changes.

Currently VM_PAGER_BAD is only returned by pgo_get() in the case of
uvn_get() trying to access a page beyond the end of the file, so this
diff changes uvm_fault() to recognize this and return ENOSPC
(arbitrary unused error code) and then the MD trap() code needs to
know to map this error to BUS_ADRERR.

Additionally, some MD trap()s already know to map EACCES to
SEGV_ACCERR instead of SEGV_MAPERR, but amd64 wasn't one of them. So
this diff fixes that too.

Index: uvm/uvm_fault.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/uvm/uvm_fault.c,v
retrieving revision 1.73
diff -u -p -r1.73 uvm_fault.c
--- uvm/uvm_fault.c	8 May 2014 20:08:50 -	1.73
+++ uvm/uvm_fault.c	23 Jun 2014 21:29:24 -
@@ -1125,7 +1125,8 @@ Case2:
 				goto ReFault;
 			}
 
-			return (EACCES); /* XXX i/o error */
+			/* XXX i/o error */
+			return (result == VM_PAGER_BAD ? ENOSPC : EACCES);
 		}
 
 		/* re-verify the state of the world. */
Index: arch/amd64/amd64/trap.c
===
RCS file: /home/matthew/cvs-mirror/cvs/src/sys/arch/amd64/amd64/trap.c,v
retrieving revision 1.40
diff -u -p -r1.40 trap.c
--- arch/amd64/amd64/trap.c	15 Jun 2014 11:43:24 -	1.40
+++ arch/amd64/amd64/trap.c	23 Jun 2014 21:38:31 -
@@ -387,9 +387,6 @@ faultcommon:
 				KERNEL_UNLOCK();
 				goto out;
 			}
-			if (error == EACCES) {
-				error = EFAULT;
-			}
 
 			if (type == T_PAGEFLT) {
 				if (pcb->pcb_onfault != 0) {
@@ -407,13 +404,23 @@ faultcommon:
 			sv.sival_ptr = (void *)fa;
 			trapsignal(p, SIGKILL, T_PAGEFLT, SEGV_MAPERR, sv);
 		} else {
+			int signo, code;
+			if (error == ENOSPC) {
+				signo = SIGBUS;
+				code = BUS_ADRERR;
+			} else {
+				signo = SIGSEGV;
+				code = (error == EACCES) ? SEGV_ACCERR :
+				    SEGV_MAPERR;
+			}
 #ifdef TRAP_SIGDEBUG
-			printf("pid %d (%s): SEGV at rip %lx addr %lx\n",
-			    p->p_pid, p->p_comm, frame->tf_rip, fa);
+			printf("pid %d (%s): %s at rip %lx addr %lx\n",
+			    p->p_pid, p->p_comm, (signo == SIGBUS) ?
+			    "BUS" : "SEGV", frame->tf_rip, fa);
 			frame_dump(frame);
 #endif
 			sv.sival_ptr = (void *)fa;
-			trapsignal(p, SIGSEGV, T_PAGEFLT, SEGV_MAPERR, sv);
+			trapsignal(p, signo, T_PAGEFLT, code, sv);
 		}
 		KERNEL_UNLOCK();
 		break;
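The two-stage mapping in the diff above (pager result to errno in uvm_fault(), then errno to signal and code in trap()) can be restated as a pair of small pure functions. This is only a userland paraphrase for reading along, not kernel code: the enum pager_result values are hypothetical stand-ins for the kernel's VM_PAGER_* constants, both function names are invented here, and the separate out-of-memory path that the diff leaves unchanged is omitted.

```c
#define _DEFAULT_SOURCE		/* glibc: expose the SEGV_* and BUS_* codes */
#include <errno.h>
#include <signal.h>

/* Hypothetical stand-ins for the kernel's VM_PAGER_* result values. */
enum pager_result { PAGER_BAD, PAGER_ERROR };

struct fault_signal { int signo; int code; };

/* Stage 1 (uvm_fault): a pager failure becomes an errno value.
 * PAGER_BAD models "page beyond the end of the file". */
int
pager_failure_to_errno(enum pager_result result)
{
	return (result == PAGER_BAD) ? ENOSPC : EACCES;
}

/* Stage 2 (MD trap code): the errno value selects signal and si_code. */
struct fault_signal
fault_errno_to_signal(int error)
{
	struct fault_signal fs;

	if (error == ENOSPC) {
		fs.signo = SIGBUS;
		fs.code = BUS_ADRERR;
	} else {
		fs.signo = SIGSEGV;
		fs.code = (error == EACCES) ? SEGV_ACCERR : SEGV_MAPERR;
	}
	return fs;
}
```

Read this way, ENOSPC is purely an internal token that carries "beyond end of file" from the MI fault code to the MD trap code, which is why every platform's trap() would need the second stage.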