See usr/src/uts/chrp/os/startup.c

Solaris/PPC 2.6 had startup code, but in this area the
differences between 2.6 and 2.11 were greater than the
differences among processors.  So, startup.c for PowerPC
got its start from an existing 2.11 startup.c for another
processor.


Sparc or x86?
-------------
For some code, we have taken modern Solaris/x86 as the
starting point, and made the changes needed for PowerPC.
In other instances, we started from the corresponding Sparc
code.  Sometimes Sparc seems like the best choice because
it is more similar to PowerPC, in some way.  Both Sparc
and PowerPC use OpenFirmware interfaces; x86 is different.
Sparc and PowerPC have 32 general-purpose registers, so
for example, they tend to pass most function parameters in
registers;  x86 is register starved.  Sparc and PowerPC
are both RISC ISAs; they have fixed size instructions,
and they both need more than one instruction to synthesize
a 32-bit constant.

For other purposes, it seems that x86 is a better starting
point.  IA32 and PPC32 are 32-bit kernels, and they both
reserve the upper portion of the address space for the
kernel, so that the kernel is always mapped.  They have
similar requirements that dictate how kernel virtual address
space will be carved up.  They both have strictly physically
indexed caches -- no VAC (virtually addressed cache).

For startup.c, I decided to start with the IA32 startup.c.
Overall, it seemed the closest fit.


Differences
===========

OpenFirmware
------------
Although we started with x86 startup code, there are places
where we grafted in the code from Sparc, because we still
want to do things the OpenFirmware way.  For example,
memsegs can be arrays or linked lists.  I retained code
to get along with both.


Nucleus allocation
------------------
Before we sort of settled on inetboot+VOF for our bootware,
I had a scheme where I assumed that the kernel would be
mapped by a small number of BAT registers.  That means
it is pretty likely that there would be some leftover pages
at the end of the large mappings for both kernel text and
data.  That didn't seem to be a problem, because the "wasted"
text pages could be used for module text, and the leftover
data pages would constitute a "nucleus"; certain kinds of
kernel data structures would be allocated from there, first.
The term "nucleus" was borrowed, rather loosely, from Sparc
kernel startup.

One bit of logic that was retained is that allocations that
come only from the valloc area on other processors come from
a combination of nucleus and valloc, on PowerPC.  However,
with inetboot+VOF, the nucleus size is 0, which is a degenerate
case, but it works.
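
The fallback logic can be sketched roughly as follows.  This is
only an illustration, not the code in startup.c: struct region,
region_alloc() and startup_alloc() are hypothetical names, and
both areas are modeled as simple bump allocators.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator over one contiguous area. */
struct region {
	uintptr_t next;		/* next free address */
	uintptr_t end;		/* one past the last usable address */
};

/* align must be a power of two; returns 0 if the request does not fit. */
static uintptr_t
region_alloc(struct region *r, size_t size, size_t align)
{
	uintptr_t base = (r->next + (align - 1)) & ~((uintptr_t)align - 1);

	if (base < r->next || base > r->end || size > r->end - base)
		return (0);
	r->next = base + size;
	return (base);
}

/* Try the nucleus first; fall back to the valloc area. */
static uintptr_t
startup_alloc(struct region *nucleus, struct region *valloc_area,
    size_t size, size_t align)
{
	uintptr_t p = region_alloc(nucleus, size, align);

	if (p == 0)
		p = region_alloc(valloc_area, size, align);
	return (p);
}
```

With an empty nucleus (next == end), every request simply falls
through to the valloc area, which is the degenerate case
described above.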


HAT resources from nucleus+valloc
---------------------------------
PowerPC has inverted pagetables.  We use the one global
pagetable, and it is inherited from VOF.  Capacity planning
is done based on the total amount of physical memory.
Because of all this, we know the total size to plan for HAT
resources, such as HMEs, per-PTEG-pair structures, etc.
So, some of the data structures that get allocated in
the nucleus and/or valloc area are HAT data structures.
This kind of capacity planning up front would not be done
on x86.


Module text -- limited range
----------------------------
On PowerPC, there is good reason to reserve kernel
virtual address space for module text that is VA-close
to kernel text.  This is because the PowerPC PC-relative
branch instruction has limited range.  A single branch
instruction cannot deal with a branch target more than
+- 32 MBytes away.
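
To make the limit concrete, here is a small sketch of the range
check.  The helper name branch_reaches() is hypothetical; the
arithmetic follows from the I-form branch encoding, which holds
a 24-bit word displacement (a signed 26-bit byte displacement).
Address wraparound is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * An I-form branch (b, bl) encodes a 24-bit word displacement.
 * Shifted left by 2, that is a signed 26-bit byte displacement:
 * -32 MBytes through +32 MBytes - 4.
 */
#define	BR_DISP_MIN	(-(INT64_C(1) << 25))
#define	BR_DISP_MAX	((INT64_C(1) << 25) - 4)

/* Hypothetical helper: can one branch at 'from' reach 'to'? */
static bool
branch_reaches(uint32_t from, uint32_t to)
{
	int64_t disp = (int64_t)to - (int64_t)from;

	if ((disp & 3) != 0)
		return (false);		/* instructions are word aligned */
	return (disp >= BR_DISP_MIN && disp <= BR_DISP_MAX);
}
```

A module text address for which branch_reaches() is false with
respect to kernel text would need some longer calling sequence
instead of a single branch.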

If kernel text were mapped by BAT register, and so there
were a kernel text nucleus from which we could allocate
module text, then that would take care of all or part
of it.  But, without that, we need to do more allocations
up front and try to get memory as close as we can to
kernel text.


No ASCII art
------------
You will not see big ASCII art drawings of the layout
of kernel virtual address space, or physical memory.
That is deliberate.

See usr/src/uts/pmdb/amap.c

See usr/src/uts/pmdb/pmdb.texinfo section on amap.

If you really miss the ASCII art, or if amap is
unsatisfactory in some other way, you might consider
improving amap, rather than investing in hand-made
ASCII art.


Switchable bop_alloc
--------------------
On PowerPC, BOP_ALLOC() resolves to the PROM services
function, bop_alloc().

See usr/src/uts/ppc/os/bootops.c

I made that a simple wrapper for a call through a function
pointer.  Up to a certain point, it resolves to "normal"
bop_alloc().  It pays attention to debug flags, so you
can get trace messages for all bop_alloc() requests.
They are simple trace messages; I did not create a
bop_alloc flight recorder.  When it is time to forbid
calls to BOP_ALLOC(), bop_alloc_disable() is called.
After that, any call to BOP_ALLOC() (that is, bop_alloc())
complains that bop_alloc is dead.
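
The shape of that wrapper is roughly as follows.  This is a
user-space model, not the code in bootops.c: all of the model_*
names, the simplified (size, align) signature, and the static
arena standing in for the PROM allocator are assumptions made
for illustration.

```c
#include <stddef.h>
#include <stdio.h>

typedef void *(*alloc_fn_t)(size_t size, size_t align);

static int model_debug;			/* stand-in for the debug flags */
static unsigned int dead_calls;		/* calls made after disable */

static unsigned char arena[4096];	/* stand-in for PROM memory */
static size_t arena_used;

/* "Normal" allocator; align is a power of two, relative to arena start. */
static void *
model_alloc_live(size_t size, size_t align)
{
	size_t off = (arena_used + (align - 1)) & ~(align - 1);

	if (off > sizeof (arena) || size > sizeof (arena) - off)
		return (NULL);
	arena_used = off + size;
	return (&arena[off]);
}

/* Installed by disable: refuse the request and complain. */
static void *
model_alloc_dead(size_t size, size_t align)
{
	(void) align;
	fprintf(stderr, "bop_alloc is dead: %zu bytes refused\n", size);
	dead_calls++;
	return (NULL);
}

static alloc_fn_t alloc_fn = model_alloc_live;

/* The wrapper: every request goes through the function pointer. */
static void *
model_bop_alloc(size_t size, size_t align)
{
	void *p = alloc_fn(size, align);

	if (model_debug)
		fprintf(stderr, "bop_alloc(%zu, %zu) = %p\n", size, align, p);
	return (p);
}

static void
model_bop_alloc_disable(void)
{
	alloc_fn = model_alloc_dead;
}
```

Switching the pointer, rather than testing a flag on every call,
keeps the normal path simple and makes any late caller easy to
catch.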


Multiple allocators
-------------------
As it is for Sparc and x86, Solaris/PPC has multiple,
more or less independent, systems with their own internal
memory allocators, all trying to get along with each other,
sharing the same total pool of physical memory and virtual
address space.  I hate that.  And, some day, I would do
something completely different.  But, what can you do?
There is limited time.  Let's not try to take over the
world, just yet.

Because of this, there comes a point when the kernel
has to take over all mappings, and go through a process
of discovery: just what pages have already been mapped
by some other subsystem (inetboot or VOF, whatever).
The only agreement we can rely on is that the kernel "owns"
a certain va-range, 0xe000_0000 to 0xefff_ffff.  Since we
have to set up page_t's, page cache, etc., before everything
settles down, some pages will be allocated between the last
practical snapshot and the very last instant, when the
kernel takes over completely.

On x86 and on Sparc, they have learned how messy this is.
I thought I learned from them.  But still, I tried
to get that last snapshot from the PROM translations.
Among the more costly lessons for me was that I cannot
use the PROM translations; I had better get my last
view of the way things are directly from the hardware
pagetables.  It's the only way to be sure.  On PowerPC,
with a global pagetable, that means sweeping through the
entire pagetable, looking for any mappings with virtual
addresses below KERNELBASE, and making sure those
underlying physical pages are taken out of the "free list".
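
A much simplified model of that sweep is sketched below.  It is
not the real code: struct model_pte stores the virtual address
directly, whereas a real PowerPC PTE holds a VSID and an
abbreviated page index, so the real sweep has more work to do to
recover the virtual address.  The value of MODEL_KERNELBASE and
the boolean free map are also assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: kernel-owned range starts here (see the range above). */
#define	MODEL_KERNELBASE	0xe0000000u

/* Simplified stand-in for a hardware pagetable entry. */
struct model_pte {
	bool		valid;
	uint32_t	va;	/* virtual address of the mapping */
	uint32_t	pfn;	/* physical page frame number */
};

/*
 * Sweep the whole pagetable.  For each valid mapping below
 * MODEL_KERNELBASE, mark the underlying physical page as not free.
 * Returns the number of pages taken off the free map.
 */
static size_t
sweep_pagetable(const struct model_pte *pt, size_t npte,
    bool *page_free, size_t npages)
{
	size_t claimed = 0;
	size_t i;

	for (i = 0; i < npte; i++) {
		if (!pt[i].valid || pt[i].va >= MODEL_KERNELBASE)
			continue;
		if (pt[i].pfn >= npages)
			continue;	/* not a page we manage */
		if (page_free[pt[i].pfn]) {
			page_free[pt[i].pfn] = false;
			claimed++;
		}
	}
	return (claimed);
}
```

A page mapped more than once is counted only once, since the
second mapping finds it already removed from the free map.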


I/O address space
-----------------
A small amount of kernel va space is carved out for I/O.
The way things are now, it is pretty much hard-wired for
the requirements of the ODW board, with its Discovery-2
I/O chipset.

See usr/src/uts/chrp/os/startup.c, function: startup_iomem().

There must be a better way -- more flexible, more
general-purpose.  But, I just went along with something
quick and dirty to get it up and running on ODW, and let
Brian Horn make progress on his work on I/O in general,
and the 'vfe' network driver, in particular.

-- Guy Shaw

