See usr/src/uts/chrp/os/startup.c

Solaris/PPC 2.6 had startup code, but in this area the differences between 2.6 and 2.11 were greater than the differences among processors. So, startup.c for PowerPC got its start from existing 2.11 startup.c for another processor.

Sparc or x86?
-------------

For some code, we have taken modern Solaris/x86 as the starting point, and made the changes needed for PowerPC. In other instances, we started from the corresponding Sparc code.

Sometimes Sparc seems like the best choice because it is more similar to PowerPC in some way. Both Sparc and PowerPC use OpenFirmware interfaces; x86 is different. Sparc and PowerPC have 32 general-purpose registers, so, for example, they tend to pass most function parameters in registers; x86 is register starved. Sparc and PowerPC are both RISC ISAs; they have fixed-size instructions, and they both need more than one instruction to synthesize a 32-bit constant.

For other purposes, it seems that x86 is a better starting point. IA32 and PPC32 are 32-bit kernels, and they both reserve the upper portion of the address space for the kernel, so that it is always present. They have similar requirements that dictate how kernel virtual address space will be carved up. They both have strictly physically indexed caches -- no VAC.

For startup.c, I decided to start with the IA32 startup.c. Overall, it seemed the closest fit.

Differences
===========

OpenFirmware
------------

Although we started with x86 startup code, there are places where we grafted in the code from Sparc, because we still want to do things the OpenFirmware way. For example, memsegs can be arrays or linked lists. I retained code to get along with both.

Nucleus allocation
------------------

Before we more or less settled on inetboot+VOF for our bootware, I had a scheme where I assumed that the kernel would be mapped by a small number of BAT registers. That meant it was pretty likely that there would be some leftover pages at the end of the large mappings for both kernel text and data. That didn't seem to be a problem, because the "wasted" text pages could be used for module text, and the leftover data pages would constitute a "nucleus"; certain kinds of kernel data structures would be allocated from there first.

The term "nucleus" was borrowed, rather loosely, from Sparc kernel startup. One bit of logic that was retained is that allocations that come only from the valloc area on other processors come from a combination of nucleus and valloc on PowerPC. However, with inetboot+VOF, the nucleus size is 0, which is a degenerate case, but it works.

HAT resources from nucleus+valloc
---------------------------------

PowerPC has inverted pagetables. We use one global pagetable, and it is inherited from VOF. Capacity planning is done based on the total amount of physical memory. Because of all this, we know the total size to plan for HAT resources, such as HMEs, per-PTEG-pair structures, etc. So, some of the data structures that get allocated in the nucleus and/or valloc area are HAT data structures. This kind of capacity planning up front would not be done on x86.

Module text -- limited range
----------------------------

On PowerPC, there is good reason to reserve kernel virtual address space for module text that is VA-close to kernel text. This is because the PowerPC PC-relative branch instruction has limited range. A single branch instruction cannot reach a branch target more than +- 32 MBytes away. If kernel text were mapped by BAT register, so that there were a kernel text nucleus from which we could allocate module text, then that would take care of all or part of it. But, without that, we need to do more allocations up front and try to get memory as close as we can to kernel text.

No ASCII art
------------

You will not see big ASCII art drawings of the layout of kernel virtual address space, or physical memory. That is deliberate.

See usr/src/uts/pmdb/amap.c
See usr/src/uts/pmdb/pmdb.texinfo section on amap.

If you really miss the ASCII art, or if amap is unsatisfactory in some other way, you might consider improving amap, rather than investing in hand-made ASCII art.

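Returning to the branch-range limit described under "Module text -- limited range": here is a minimal sketch, in C, of the check that makes "VA-close" matter. It assumes the relative branch carries a signed 26-bit byte displacement (a 24-bit word offset), which is where the +- 32 MBytes comes from. The name branch_reaches() is hypothetical; it is not taken from startup.c.

```c
#include <stdint.h>

/* Assumed reach of a single PC-relative branch: signed 26-bit byte offset. */
#define	BRANCH_MIN	(-(int64_t)0x02000000)	/* -32 MBytes */
#define	BRANCH_MAX	((int64_t)0x01fffffc)	/* +32 MBytes, less one word */

/*
 * Hypothetical helper: can one relative branch located at 'from'
 * reach the instruction at 'to'?  Returns 1 if so, 0 if not.
 */
static int
branch_reaches(uint32_t from, uint32_t to)
{
	int64_t disp = (int64_t)to - (int64_t)from;

	if ((disp & 3) != 0)
		return (0);	/* instructions are word aligned */
	return (disp >= BRANCH_MIN && disp <= BRANCH_MAX);
}
```

With kernel text at an illustrative address of 0xe0000000, module text 16 MBytes above it is reachable with one branch; module text 64 MBytes above it is not, and would need some longer call sequence.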
Switchable bop_alloc
--------------------

On PowerPC, BOP_ALLOC() resolves to the PROM services function, bop_alloc().

See usr/src/uts/ppc/os/bootops.c

I made that a simple wrapper for a call through a function pointer. Up to a certain point, it resolves to "normal" bop_alloc(). It pays attention to debug flags, so you can get trace messages for all bop_alloc() requests. They are simple trace messages; I did not create a bop_alloc flight recorder.

When it is time to forbid calls to BOP_ALLOC(), bop_alloc_disable() is called. From then on, any call to BOP_ALLOC() (bop_alloc()) complains that bop_alloc is dead.

Multiple allocators
-------------------

As it is for Sparc and x86, Solaris/PPC has multiple, more or less independent, systems with their own internal memory allocators, all trying to get along with each other, sharing the same total pool of physical memory and virtual address space. I hate that. And, some day, I would do something completely different. But, what can you do? There is limited time. Let's not try to take over the world, just yet.

Because of this, there comes a point when the kernel has to take over all mappings, and go through a process of discovery: just what pages have already been mapped by some other subsystem (inetboot or VOF, whatever). The only agreement we can rely on is that the kernel "owns" a certain va-range, 0xe000_0000 to 0xefff_ffff.

Since we have to set up page_t's, page cache, etc. before everything settles down, there will be some pages allocated between the last snapshot that is practical and the very last instant, when the kernel takes over completely. On x86 and on Sparc, they have learned how messy this is. I thought I had learned from them. But still, I tried to get that last snapshot from the PROM translations. Among the more costly lessons for me was that I cannot use the PROM translations; I had better get my last view of the way things are directly from the hardware pagetables. It's the only way to be sure.

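Taking that last view directly from the pagetable can be sketched as follows. This is a toy model, not the real HAT code: the toy_pte_t layout, TOY_KERNELBASE (assumed here to be the low end of the kernel's va-range mentioned above), the 4K page size, and the page_is_free[] array are all hypothetical stand-ins. Real pagetable entries would first have to be decoded to recover a virtual address.

```c
#include <stddef.h>
#include <stdint.h>

#define	TOY_KERNELBASE	0xe0000000u	/* assumed: low end of kernel va-range */
#define	TOY_PAGESHIFT	12		/* assumed: 4K pages */
#define	TOY_NPAGES	16		/* tiny physical memory for the sketch */

/* Hypothetical, already-decoded pagetable entry. */
typedef struct toy_pte {
	int		valid;
	uint32_t	va;	/* virtual address of the mapping */
	uint32_t	pa;	/* physical address it maps to */
} toy_pte_t;

/* 1 = page is on the "free list", 0 = page is claimed by somebody. */
static unsigned char page_is_free[TOY_NPAGES];

/*
 * Sweep every entry.  Any valid mapping below TOY_KERNELBASE was made
 * by somebody else, so its physical page comes off the free list.
 * Returns the number of pages taken off the free list.
 */
static size_t
toy_sweep(const toy_pte_t *pt, size_t nentries)
{
	size_t i, claimed = 0;

	for (i = 0; i < nentries; i++) {
		uint32_t pfn;

		if (!pt[i].valid || pt[i].va >= TOY_KERNELBASE)
			continue;
		pfn = pt[i].pa >> TOY_PAGESHIFT;
		if (pfn < TOY_NPAGES && page_is_free[pfn]) {
			page_is_free[pfn] = 0;
			claimed++;
		}
	}
	return (claimed);
}
```

The sketch looks at every entry, rather than trusting a list that somebody else maintains. A page mapped twice is only counted once.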
On PowerPC, with a global pagetable, that means sweeping through the entire pagetable, looking for any mappings with virtual addresses below KERNELBASE, and making sure those underlying physical pages are taken out of the "free list".

I/O address space
-----------------

A small amount of kernel va space is carved out for I/O. The way things are now, it is pretty much hard-wired for the requirements of the ODW board, with its Discovery-2 I/O chipset.

See usr/src/uts/chrp/os/startup.c, function: startup_iomem().

There must be a better way -- more flexible, more general-purpose. But, I just went along with something quick and dirty to get it up and running on ODW, and let Brian Horn make progress on his work on I/O in general, and the 'vfe' network driver, in particular.

-- Guy Shaw