I note that I've written over a hundred lines of rant in response to his previous email already. I should dig back through this and turn it into proper documentation at some point. (Especially since Elliott knows more of this stuff than I do so I'm likely to get corrected a lot here...)
On 1/2/24 20:54, enh wrote:
>> You can look at /proc/self/maps (and /proc/self/smaps, and
>> /proc/self/smaps_rollup) to see them for a running process (replace
>> "self" with any running PID, self is a symlink to your current PID).
>> The six sections are:
>>
>> text - the executable functions: mmap(MAP_PRIVATE, PROT_READ|PROT_EXEC)
>> rodata - const globals, string constants, etc: mmap(MAP_PRIVATE, PROT_READ)
>> data - writeable data initialized to nonzero: mmap(MAP_PRIVATE, PROT_WRITE)
>> bss - writeable data initialized to zero: mmap(MAP_ANON, PROT_WRITE)
>> stack - function call stack, also contains environment data
>> heap - backing store for malloc() and free()
>
> (Android and modern linux distros require the relro section too.

I thought that was only needed for dynamic linking? Then again you don't
allow a lot of static stuff to run on the final system anyway...

(The line between PIE and dynamic linking confuses even me. How does static
PIE relocate itself? I _think_ I looked it up once, but "it statically
links in a dynamic linker in the pile of crt1.o and begin.o files" _can't_
be right...)

> interestingly, there _is_ an elf program header for the stack, to
> signal that you don't want an executable stack. iirc Android and [very
> very recently] modern linux distros won't let you start a process with
> an executable main stack, but afaik the code for the option no-one has
> wanted/needed for a very long time is still in the kernel.)

Cool.

These days there's also vdso and vvar, which are provided by the kernel at
runtime. The first is a .text section with magic functions you can call as
an alternative to syscalls, and the second is a magic .rodata section that
provides volatile variables the OS updates which you can just reach out and
look at. Between the two of them you can do things like check the current
timestamp without a system call.
What they actually provide varies by OS (and then your libc has to be
taught to use each new capability out of there instead of making the
syscalls). "cat /proc/self/maps" and they're the last two entries if
present. There is a "man 7 vdso" but I dunno how up to date it is. (Which
gets us back to Michael Kerrisk's retirement and the new guy NOT
MAINTAINING A WEB COPY. Grrr.)

Maintaining backwards compatibility means keeping a lot of old stuff.

I had a talk with Rich Felker last night on IRC about what musl-libc's
syscall requirements actually _are_, and what it would take to repot it on
top of a posix-ish RTOS du jour. (Makes the trusting trust cleansing cycle
smaller if you can cross compile Linux from an RTOS...) We didn't come to a
conclusion, but I _did_ get permission from skarnet to use his
git://git.skarnet.org/mdevd under 0BSD. (Porting that to toybox seems
easier than bringing my old mdev code up to speed for all the
https://github.com/slashbeast/mdev-like-a-boss stuff it's grown since I
handed it off.)

>> The first three of those literally exist in the ELF file, as in it
>> mmap()s a block of data out of the file at a starting offset, and the
>> memory is thus automatically populated with data from the file. The text
>> and rodata ones don't really care if it's MAP_PRIVATE or MAP_SHARED
>> because they can never write anything back to the file, but the data one
>> cares that it's MAP_PRIVATE: any changes stay local and do NOT get
>> written back to the file. And the bss is an anonymous mapping so starts
>> zeroed, the file doesn't bother wasting space on a run of zeroes when
>> the OS can just provide that on request. (It stands for Block Starting
>> Symbol which I assume meant something useful 40 years ago on DEC
>> hardware.)
>
> (close, but it was IBM and the name was slightly different:
> https://en.wikipedia.org/wiki/.bss#Origin)

That says United Aircraft Corporation named it using IBM 704 hardware in an
assembler and then in fortran.
(I only give wikipedia[citation needed] about an 80% chance to be accurate
about any given fact, but am not root causing it right now. :)

I like to track down magic acronyms, ala grep meaning "global regular
expression print" (from the ed command g/re/p). I once emailed Dennis
Ritchie to ask what "inode" meant:

https://lkml.iu.edu/hypermail/linux/kernel/0207.2/1182.html

But in this case I stopped paying attention once I confirmed it doesn't
mean anything of modern relevance. The interesting part (to me) is that the
name predates unix by almost 20 years (mainframe legacy predating even the
PDP-1), and predates ELF by 40 years. (The first OS with ELF binaries was
Solaris 2.0 released in 1992. Linux switched over 3-4 years later.)

If it wasn't a legacy acronym from shortly after world war II, it would
probably be called something like the "zero section" and we wouldn't have
to memorize what it means. :)

>> The stack is also set up by the kernel, and is funny in three ways:
>>
>> 1) it has environment data at the end (so all your inherited environment
>> variables, and your argv[] arguments, plus an array of pointers to the
>> start of each string which is what char *argv[] and char *environ[]
>> actually point to. The kernel's task struct also used to live there, but
>> these days there's a separate "kernel stack" and I'd have to look up
>> where things physically are now and what's user visible.
>
> (plus the confusingly named "ELF aux values", which come from the
> kernel, and aren't really anything to do with ELF --- almost by
> definition, since they're things that the binary _can't_ know like
> "what's the actual page size of the system i'm _running_ on?" or
> "what's the l1d cache size of the system i'm _running_ on?".)

Are they in the stack? I know the pointer is passed to _start() (often not
in a proper argument, in a REGISTER), but hadn't tracked down where it
actually lived. Stack makes sense...
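For reference, a minimal sketch of asking libc for two of those aux values
from userspace. It assumes a Linux libc that provides getauxval() in
<sys/auxv.h> (glibc and musl both do); show_auxv() is a made-up helper name,
not something from the thread. On Linux the vector itself sits on the
initial stack just past the NULL that terminates envp[], which is where
libc picks it up at startup.

```c
#include <elf.h>       /* AT_PAGESZ, AT_SYSINFO_EHDR */
#include <stdio.h>
#include <sys/auxv.h>  /* getauxval() */

/* Hypothetical helper: print two auxiliary vector entries the kernel handed
 * this process at exec time. Returns the AT_PAGESZ value (0 if missing). */
unsigned long show_auxv(void)
{
  unsigned long pagesz = getauxval(AT_PAGESZ);
  unsigned long vdso = getauxval(AT_SYSINFO_EHDR);

  printf("AT_PAGESZ: %lu\n", pagesz);
  /* AT_SYSINFO_EHDR is the address of the vdso's ELF header, if there is one */
  if (vdso) printf("AT_SYSINFO_EHDR: %#lx\n", vdso);
  else printf("AT_SYSINFO_EHDR: not provided\n");

  return pagesz;
}
```

Calling show_auxv() from main() and comparing the second address against the
[vdso] line in /proc/self/maps is a quick way to see the two views agree.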
Sadly, I have had to care about the auxiliary vector on far too many
occasions:

  man 3 getauxval

>> 3) The stack generally has _two_ pointers, a "stack pointer" and a "base
>> pointer" which I always get confused. One of them points to the start of
>> the mapping (kinda important to keep track of where your mappings are),
>> and the other one moves (gets subtracted from and added to and offset to
>> access local variables).
>
> (s/base pointer/frame pointer/ for everything except x86. and actually
> _both_ change. it's the "base" of the current stack _frame_, not the
> whole stack. for a concrete example: alloca() changes the stack
> pointer, but not the frame pointer. so local variables offsets
> relative to fp will be constant throughout the function, whereas
> offsets relative to sp can change. [stacked values of] fp is also what
> you're using when you're unwinding.)

I only implemented alloca() for my tinycc fork on 32-bit x86, and that was
back in 2008. I'm hoping to sit in on tonight's https://meet.jit.si/golug at
6pm about creating a compiler with a recursive descent parser, and someday
hope to read https://norasandler.com/2017/11/29/Write-a-Compiler.html and
the corresponding https://nostarch.com/writing-c-compiler and
https://github.com/nlsandler/nqcc but right now restarting my
https://landley.net/code/qcc is not even on the back burner...

>> All this is ignoring dynamic linking, in which case EACH library has
>> those first four sections (plus a PLT and GOT which have to nest since
>> the shared libraries are THEMSELVES dynamically linked, which is why you
>> need to run ldd recursively when harvesting binaries, although what it
>> does to them at runtime I try not to examine too closely after eating).
>> There should still only be one stack and heap shared by each process
>> though.
>
> (one stack _per thread_ in the process. and the main thread stack is
> very different from thread stacks.)
A thread is a process with brain damage inherited from solaris' limitations,
but you're right. I just mentally gloss over threads as "process with
training wheels and 5x the debugging effort".

Even before the ~7 year period where I thought java was a good idea, I had
to use threading VERY EXTENSIVELY on OS/2. The "workplace shell" desktop was
a single process with many, many threads, so any desktop programming there
meant creating a shared library the workplace shell process would dlopen()
and launch threads for. I got very, very good at debugging thread issues,
once upon a time.

(And I've debugged a lot of OTHER people's threading issues as a consultant.
The oil exploration company that bought three different programs and mushed
them together into a single highly threaded process that leaked like a sieve
and segfaulted randomly. The 2018 project that replaced WinCE with Linux
when microsoft end-of-lifed wince, resulting in an 80 thread application
process, half of which were C# code running in mono and the other half were
linux native code sharing the same address space, and the PROBLEM was that
on the ~200 mhz deployment hardware they had a warehouse full of and wanted
to keep selling, fork() caused a 75 millisecond latency spike in EVERY OTHER
THREAD because the kernel took one look at that mess and locked the whole
vma until fork() had finished copying everything, which meant a thread
spawning a child process caused the token-ring-like bus to timeout and drop
connection. Which meant I got to do a real world use of vfork() on a system
with an MMU, because that only suspends the PARENT thread, not all the other
threads in the process, and vfork()/exec() isn't that much harder to program
around than fork()/exec().)

My modern reaction to dealing with threads is...

https://www.youtube.com/watch?v=hlVwbpm4eHI

They're SOMETIMES the right tool for the job? Occasionally? Maybe?
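The vfork()/exec() pattern mentioned above can be sketched roughly like
this. It is an illustration under assumptions, not the code from that
project: run_vfork() is a made-up name, and _DEFAULT_SOURCE is defined only
so glibc and musl declare vfork() under strict -std= settings.

```c
#define _DEFAULT_SOURCE  /* assumption: needed for vfork() with strict -std= */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] via vfork()+execvp(), wait for it, and
 * return its exit status (-1 on vfork/wait failure or abnormal exit). */
int run_vfork(char *const argv[])
{
  int status;
  pid_t pid = vfork();

  if (pid < 0) return -1;
  if (!pid) {
    /* Child: borrows the parent's memory until it execs, so modify nothing
     * and only call execvp() or _exit() (never exit(), never return). */
    execvp(argv[0], argv);
    _exit(127);  /* exec failed; 127 is the shell's "command not found" */
  }

  /* The calling thread resumes here once the child has exec'd or exited. */
  if (waitpid(pid, &status, 0) < 0) return -1;

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The main thing to program around is the child's restriction: any setup
(redirecting file descriptors, changing directories) has to be limited to
calls that don't disturb state the parent will see when it wakes up.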
>> If you launch dozens of instances of the same program, the read only
>> sections (text and rodata) are shared between all the instances. (This
>> is why nommu systems needed to invent fdpic: in conventional ELF
>> everything uses absolute addresses, which is fine when you've got an MMU
>> because each process has its own virtual address range starting at zero.
>> (Generally libc or something will mmap() about 64k of "cannot read,
>> cannot write, cannot execute" memory there so any attempt to dereference
>> a NULL pointer segfaults, but other than that...)
>>
>> But shared libraries need to move so they can fit around stuff. Back in
>> the a.out days each shared library was also linked at an absolute
>> address (just one well above zero, out of the way of most programs),
>> meaning when putting together a system you needed a registry of what
>> addresses were used by each library, and you'd have to supply an address
>> range to each library you were building as part of the compiler options
>> (or linker script or however that build did it). This sucked
>> tremendously.
>
> (funnily enough, this gets reinvented as an optimization every couple
> of decades. iirc macOS has "prelinking" again, but Android is
> currently in the no-prelinking phase of the cycle.)

The old line about how there are two hard problems in computer science:
naming things, cache invalidation, and fencepost errors. This falls under
"cache invalidation", which more generically is "object lifetime rules".

The really FUN one is the horrible trick people did on various embedded
systems for fast boot, or on OpenVZ as part of the live migration, where
they'd basically core dump a process, load it into a debugger, and resume.
Thus skipping all the setup! (Assuming NOTHING HAS CHANGED in the context
the resumed process expects around it. Luckily X11 has "detach and restart"
plumbing that lets it reopen a process's network pipe without killing the
window or the process, because network connections hanging and needing
retry isn't a new thing.)

Sigh, I did a whole rant about what would be involved in kernel upgrades
without reboots way back in 2002:

https://lkml.iu.edu/hypermail/linux/kernel/0206.2/0610.html
https://lkml.iu.edu/hypermail/linux/kernel/0206.2/0835.html
https://lkml.iu.edu/hypermail/linux/kernel/0206.2/1244.html

And I was just going "this is _hard_" but people tracked me down from that
and had me help IMPLEMENT some of that stuff over the years. The hard part
was that processes act in GROUPS: parent/child relationships and pipelines
and so on, and the kernel had no way to group processes. Enter "container"
support, and me helping the parallels/OpenVZ guys explain _why_ the kernel
could benefit from it. (The number of times I've been hired as a programmer
and wound up spending most of my energy as a combination tech writer and
marketer...)

Sigh, I gotta go get on an airplane now, so stopping here for the moment...

Rob
