I am sponsoring the following self-reviewed case for myself.
It removes support for the PT_SUNWBSS ELF program header
from Solaris. I believe this qualifies for self review:
- This feature has never been used.
- We (linker group) believe that the design of
PT_SUNWBSS prevents it from being used in the future.
- Its presence complicates the implementation of the
new mmapobj() system call described in PSARC/2008/226.
-----
Release Binding: Patch/Micro
PT_SUNWBSS removal Committed
---------------------------------------------------------------------------
Problem Description:
--------------------
[This section was written by Rod Evans, and is quoted verbatim
from "6736890 PT_SUNWBSS should be disabled"].
ELF defines SHT_SUNW_MOVE sections. These sections allow
large data buffers to be expressed as NOBITS sections that
can be initialized at runtime with small numbers of data items.
The concept stemmed from a desire to provide for Fortran
common blocks, which can often be large blocks of zeros
with a small number of non-zero items. These blocks are
traditionally defined as .data items, which means they can
take up substantial disc space.
The SHT_SUNW_MOVE implementation is described in the
Linker and Libraries Guide:
http://docs.sun.com/app/docs/doc/819-0690/6n33n7fci?l=en&a=view&q=SHT_SUNW_MOVE
The use of these sections isn't limited to Fortran. In fact,
our SPARC compilers use Move sections to express large (mostly
zero) data arrays (our Intel compilers don't seem to use Move
sections at all):
% cat main1.c
#include <stdio.h>
int move_1[0x1000] = { 0, 0x11, 0x12 };
static int move_2[0x1000] = { 0x21, 0, 0x22 };
int main()
{
(void) printf("move_1[0] = 0x%x 0x%x\n", &move_1[0], move_1[0]);
(void) printf("move_1[1] = 0x%x 0x%x\n", &move_1[1], move_1[1]);
(void) printf("move_1[2] = 0x%x 0x%x\n", &move_1[2], move_1[2]);
(void) printf("move_1[3] = 0x%x 0x%x\n", &move_1[3], move_1[3]);
(void) printf("\n");
(void) printf("move_2[0] = 0x%x 0x%x\n", &move_2[0], move_2[0]);
(void) printf("move_2[1] = 0x%x 0x%x\n", &move_2[1], move_2[1]);
(void) printf("move_2[2] = 0x%x 0x%x\n", &move_2[2], move_2[2]);
(void) printf("move_2[3] = 0x%x 0x%x\n", &move_2[3], move_2[3]);
return (0);
}
% cc -V -o main1 main1.c
cc: Sun C 5.8 Patch 121015-04 2007/01/10
% elfdump -m main1
Move Section: .SUNW_move
symndx offset size repeat stride value with respect to
[1] 16384 4 1 1 0x000000000000000021 .bss (section)
[1] 16392 4 1 1 0x000000000000000022 .bss (section)
[11] 4 4 1 1 0x000000000000000011 move_1
[11] 8 4 1 1 0x000000000000000012 move_1
% ./main1
move_1[0] = 0x212a0 0x0
move_1[1] = 0x212a4 0x11
move_1[2] = 0x212a8 0x12
move_1[3] = 0x212ac 0x0
move_2[0] = 0x252a0 0x21
move_2[1] = 0x252a4 0x0
move_2[2] = 0x252a8 0x22
move_2[3] = 0x252ac 0x0
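The effect of the Move records shown by elfdump above can be
sketched in a few lines of C. This is only an illustration of the
idea, not the ld.so.1 implementation: move_rec_t, store_item() and
apply_moves() are hypothetical names, and the record layout is a
simplification of the real Elf32_Move/Elf64_Move structures (in
particular, "step" is a plain byte distance and does not claim to
match the semantics of the real stride field).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified move record. The real layout is described
 * in the Linker and Libraries Guide.
 */
typedef struct {
	size_t   offset;	/* byte offset into the zero-filled buffer */
	size_t   size;		/* bytes per item: 1, 2, 4 or 8 */
	size_t   repeat;	/* number of items to store */
	size_t   step;		/* byte distance between items (assumption) */
	uint64_t value;		/* value to store, truncated to size */
} move_rec_t;

/* Store the low-order 'size' bytes of value at dst, in host byte order. */
static void
store_item(unsigned char *dst, size_t size, uint64_t value)
{
	uint8_t  v1 = (uint8_t)value;
	uint16_t v2 = (uint16_t)value;
	uint32_t v4 = (uint32_t)value;

	switch (size) {
	case 1: memcpy(dst, &v1, sizeof (v1)); break;
	case 2: memcpy(dst, &v2, sizeof (v2)); break;
	case 4: memcpy(dst, &v4, sizeof (v4)); break;
	case 8: memcpy(dst, &value, sizeof (value)); break;
	default: assert(0 && "unsupported item size");
	}
}

/* Apply every record to a buffer that starts out as all zeros. */
static void
apply_moves(unsigned char *buf, size_t buflen,
    const move_rec_t *recs, size_t nrecs)
{
	for (size_t i = 0; i < nrecs; i++) {
		const move_rec_t *m = &recs[i];
		size_t off = m->offset;

		for (size_t r = 0; r < m->repeat; r++) {
			/* Refuse to write outside the target buffer. */
			assert(off <= buflen && m->size <= buflen - off);
			store_item(buf + off, m->size, m->value);
			off += m->step;
		}
	}
}
```

Applied to a zeroed int array with the two move_1 records from the
elfdump output (offset 4 value 0x11, offset 8 value 0x12), this
leaves elements 1 and 2 set and every other element zero, which is
what ./main1 prints.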
The original design for SHT_SUNW_MOVE assumed that symbols that
were to have data moved to them would either be defined as
.bss (local) or COMMON (global), as is the standard for "tentative"
symbols. However, as can be seen from the above example, and
has been confirmed by the Fortran folks, the compiler implementations
only emit .bss symbols for Move information.
ld(1) has an implementation detail that would place any COMMON Move
symbols into their own PT_SUNWBSS segment. This has little
documentation other than a table entry within the Linker and
Libraries Guide. The idea was that if multiple shared objects
defined the same PT_SUNWBSS segment, then only the first loaded
shared object would have to instantiate this segment. All other
segments would effectively be interposed upon.
However, ld.so.1 was never implemented to trigger off of this
design. Any PT_SUNWBSS segments are treated as PT_LOAD segments.
And, given the compilers don't create COMMON Move symbols, the only
way we've ever generated a PT_SUNWBSS segment is to essentially
elfedit() it into existence.
But, let's revisit the PT_SUNWBSS idea, as there seem to be
some "limitations":
i. You can't defer providing a PT_SUNWBSS segment. An object
that defines a PT_SUNWBSS segment must make a reservation
for the address space when the object is loaded. Trying
to claim the address space later (when a relocation indicates
you need the Move data) is too late, as the space may have
been used for another mapping.
ii. Although a reservation must be made, the writing of the Move
data could be deferred. However, the PT_SUNWBSS segment
should really reside on a page boundary to provide any
potential savings. Today's implementation simply
butts the PT_SUNWBSS segment up against the last
PT_LOAD segment.
iii. And can we assume that every shared object that defines
Move data, defines the same Move data, so that one complete
PT_SUNWBSS segment can interpose on all others?
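Points i and ii can be illustrated with a small user-level sketch.
It is not taken from ld.so.1 or the kernel, and the helper names
(round_up_to_page, reserve_range, commit_range) are hypothetical.
It assumes a POSIX system that provides MAP_ANON. The address range
is claimed at load time with no access permissions, rounded up to
a page boundary, and only made writable later when the data is
actually needed.

```c
#define _DEFAULT_SOURCE		/* expose MAP_ANON on glibc (assumption) */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a length up to a whole number of pages. */
static size_t
round_up_to_page(size_t len)
{
	long pagesize = sysconf(_SC_PAGESIZE);

	assert(pagesize > 0);
	return ((len + (size_t)pagesize - 1) / (size_t)pagesize *
	    (size_t)pagesize);
}

/*
 * Reserve address space without making it accessible. This is the
 * step that cannot be deferred: once another mapping takes the
 * range, it is gone. Returns NULL on failure.
 */
static void *
reserve_range(size_t len)
{
	void *base = mmap(NULL, len, PROT_NONE,
	    MAP_PRIVATE | MAP_ANON, -1, 0);

	return (base == MAP_FAILED ? NULL : base);
}

/*
 * Make a previously reserved range usable. This step can be deferred
 * until the data is first needed. Returns 0 on success.
 */
static int
commit_range(void *base, size_t len)
{
	return (mprotect(base, len, PROT_READ | PROT_WRITE));
}
```

A caller would reserve the range while loading the object, and call
commit_range() before writing any Move data into it. Because
permissions change per page, a range that does not start on a page
boundary shares its first page with whatever precedes it, which is
the concern raised in point ii.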
Mitigation
----------
PT_SUNWBSS was introduced in 1998 (Solaris 7), as part of the
work implementing partial initialization. As described above,
PT_SUNWBSS segments have never been used. In fact, they cannot be,
because their design makes assumptions that do not hold
in practice. The code that supports this has therefore been
unused and largely undocumented. It has sat unnoticed for years.
Recently, we became aware of it again as part of the work for
2008/226: mmapfd(2) - mmap file descriptor
The issue of how to handle PT_SUNWBSS within the mmapobj() system
call described in PSARC/2008/226 has refocused attention on PT_SUNWBSS,
and whether it is worth embedding its questionable semantics in
new kernel code. It seems clear that it is not, and that PT_SUNWBSS
should simply be removed. This case will therefore:
- Leave the PT_SUNWBSS definition in our headers, but comment
the definition as unused.
- Remove the PT_SUNWBSS creation from ld(1), and the PT_SUNWBSS
evaluation from ld.so.1(1) and mmapobj(2).
- Remove the PT_SUNWBSS table entries from the Linker and
Libraries Guide.