There are a couple of big decisions that were made in Solaris/PPC 2.6: 1) big-endian; 2) always-resident kernel address space.
Endian
------

Originally, Solaris/PPC was implemented little-endian.  This was because IBM had some reason they needed it that way, and Sun was not that committed, either way.  Little-endian is a bit more complicated, because data access to the page table is always big-endian, no matter what the mode of the processor (MSR[LE]).

Solaris/PPC 2.11 is big-endian.  Sun had a policy of trying to maintain the code so that it would work in either mode.  But, there were a few places where things needed to be adjusted, once it was really put to the test.

kernel address space
--------------------

Kernel address space is always mapped, and therefore reduces usable address space for all user-land processes.  The kernel reserves the highest two segments (14 and 15).  I bet this is an unpopular decision.  The reasons are partly for performance, partly due to limitations of the PowerPC instruction set architecture, and partly historical.  Sun already had a port to x86 that carved out user address space in this way.  IBM already had Unix code for other processors that worked this way.

Moving data between the kernel and user-land could not be done as efficiently without always-resident kernel mappings.  Sparc has load and store instructions through Address Space Identifiers (ASI), but neither PowerPC nor x86 has any such thing.  I have seen slideware for new PowerPC models that have something like ASIs, but I can't see designing the Solaris HAT code around that.  Not now, anyway.

For 2.11, this was not changed.  A cool project would be to redesign Solaris/PPC to allow user-land to use all 16 segments, and do a fast shuffle of mappings, as needed, to implement copyin() and copyout() and related functions.  That would be a big project, considering the amount of kernel code that "knows" about these two segments, and relies on just having its way with the remaining user-land address space while accessing kernel address space at the same time.  And then, there is the fact that the ABI has baked that in.  But, for some embedded applications, it may be that nobody cares about legacy code.

Aspect: Compiler
----------------

2.6 used a Sun compiler that I know little about.  I do know that I never did like the inlining facilities of Sun compilers, and judging by the amount of use it got, a lot of other people had issues with it, too.  A great deal of code was written in assembly language that would not have been necessary if there were an inlining mechanism that had "arrived" -- that is, if it were easier to use, more standard, more easily understood.

For Polaris, I made heavy use of GCC extensions, especially __inline__ and __asm__().  It is not as if we have to worry about getting along with Sun's current compiler for PowerPC.  I will worry about that problem when the time comes.  It would be a good problem to have, because it would mean that Solaris/PPC made it as a product.  Anyway, all uses of __inline__ and __asm__() are easy to reverse.  The fall-back position, in the worst case, is to revert to calling external functions.  A less drastic reform would be to adhere strictly to a convention for using the preprocessor to make things work either way, at all times.  It is conceptually easy.  I should have done that.  I just didn't get around to it.  Sorry.

Pretty much any single PowerPC instruction that does something that can't be done in C was made into a function of the same name as the opcode.  One exception is that I had to name sync() something else, because that name is taken: sync() is called by common code, and it is the function that syncs up filesystems.  The inline function, ppc_sync(), is the single PowerPC instruction, 'sync'.  In addition, a few two-instruction sequences were added.  See usr/src/uts/ppc/sys/ppc_instr.h.
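To make that concrete, here is a minimal sketch in the style of usr/src/uts/ppc/sys/ppc_instr.h; the definitions shown are illustrative, not necessarily the exact contents of that file.

    #include <sys/types.h>

    /* The single PowerPC 'sync' instruction, renamed to avoid sync(). */
    static __inline__ void
    ppc_sync(void)
    {
            __asm__ __volatile__("sync" : : : "memory");
    }

    /* Read and write the Machine State Register. */
    static __inline__ uint32_t
    mfmsr(void)
    {
            uint32_t msr;

            __asm__ __volatile__("mfmsr %0" : "=r" (msr));
            return (msr);
    }

    static __inline__ void
    mtmsr(uint32_t msr)
    {
            __asm__ __volatile__("mtmsr %0" : : "r" (msr) : "memory");
    }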
I did go overboard.  I tried to implement even non-local flow-of-control functions, such as setjmp() and longjmp(), with GCC extensions.  That was a big mistake.  What can I say?  I was overly enthusiastic, and didn't think it through very well.  I backed off on that.

By the way, the heavy use of GCC extensions is not confined to the HAT layer.  It gets mention here because the HAT code was modified to exploit them first, and still makes the most use of them, for things like segment register and BAT register management, and so on.

Besides getting the hang of __inline__ and __asm__(), you have to have some confidence in the optimizer.  I played around with -xO3, -xO4, and -xO5, and a few other command line options, and examined the resulting code.  It falls short sometimes, but seems good enough.  We make very little use of GCC optimization, so far, for two reasons.  First, because it is just generally a bad idea to be concerned about optimization at that level of abstraction, early on.  Second, because we were trying to make use of a debugger that could not understand how to relate back to source code when any optimization was done, and objdump -S did not work very well on optimized code.  All that I felt was needed at the time was a sample of GCC optimization, to gain sufficient confidence in it and make a decision about how much I could get away with writing in C.  But, actually doing any optimization, now, is not so important.

Aspect: Constructors and accessors
----------------------------------

2.6 made use of bit masks and C bit fields in an ad hoc way, as needed.  I guess that is pretty standard practice.  But, I have always had trouble with bit fields.  They can be different, depending on big-endian vs. little-endian, and even with the same endianness, the C compiler can assign bits in either order.  Also, C bit fields are not well-defined for long or long long integral types.

When I did the Solaris/IA64 HAT, I did everything that might ordinarily be done with C bit fields, shifts, and masks using constructor and accessor functions, instead.  I used a very simple mini-language to describe bit fields, and generated header files from those specifications.  It worked out well.  So, on Solaris/PPC, I did the same thing, and systematically converted all HAT code to do things that way.  See usr/src/uts/ppc/sysgen/*.fd.

There are several little decisions about what notation to use.  My rule is: use whatever notation is in the hardware reference manuals.  For IA64, bits are numbered right to left, so I did that.  The PowerPC reference manuals number bits from left to right (bit 0 is the most significant bit), so I use that notation.  The idea is to make it as easy as possible to transcribe directly and faithfully from the reference manual.

Since header files are generated, naming conventions are enforced.  Extracting a field is always done by calling the accessor function, <OBJECT>_GET_<FIELD>(value).  For example:

    SR_GET_VSID(x)
        extracts the VSID field from a segment register.

    SR_SET_VSID(sr, v)
        deposits the value v in the VSID field of a segment register.
        sr is not modified; the modified value is the return value of
        the SR_SET_VSID function.

    SR_NEW(t, ks, kp, n, vsid)
        constructs a segment register.
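For illustration, generated accessors for the 32-bit PowerPC segment register could look roughly like this (T, Ks, Kp, N in bits 0-3 and VSID in bits 8-31, using the manuals' MSB-0 numbering; the macro bodies here are a sketch, not the actual output of the .fd specifications):

    /* Hypothetical sketch of generated accessors. */
    #define SR_VSID_MASK    0x00ffffffu     /* VSID: bits 8..31, MSB-0 numbering */

    #define SR_GET_VSID(x)          ((uint32_t)(x) & SR_VSID_MASK)

    #define SR_SET_VSID(sr, v)      \
            (((uint32_t)(sr) & ~SR_VSID_MASK) | ((uint32_t)(v) & SR_VSID_MASK))

    #define SR_NEW(t, ks, kp, n, vsid)      \
            (((uint32_t)(t)  << 31) | ((uint32_t)(ks) << 30) |  \
             ((uint32_t)(kp) << 29) | ((uint32_t)(n)  << 28) |  \
             ((uint32_t)(vsid) & SR_VSID_MASK))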
Accessor functions can always be implemented as macros that use C bit fields and/or shifts and masks, simply by changing the definition; but you cannot switch the other way around -- not in C.  Accessor functions can be extended with extra validity checking, or with extra instrumentation.  And, of course, that can be controlled by a preprocessor symbol.

If I had it to do over, I would make even heavier use of constructor and accessor functions.  One thing I would do, for sure, is to use a notation for field specifications that has more redundancy.

This approach is not without cost.  Generated header files make the build process more complex, especially since make, in general, and the ON build in particular, are not very smart about generated header files.  But, there are other generated header files, for example, headers related to RPC and XDR.  And, there are other reasons to move in the direction of even more generated header files.  Once the price is paid for even one generated .h file, and the work is done to come up with a workable, if not ideal, way of living with make and ON makefiles, then the price is paid.  The marginal cost for more generated files is negligible.

Opaque struct hat
-----------------

The HAT interface, as defined in common code, uses a 'struct hat *' as the first argument to a large class of HAT functions.  See usr/src/uts/common/vm/hat.h.  Unless you go out of your way to ensure that 'struct hat' is opaque, this exposes information about the members of a HAT-internal bookkeeping data structure.  The 2.6 HAT code did not take any measures to prevent that.

In Solaris/PPC 2.11, 'struct hat' is declared, but never defined; that is, it remains an incomplete data type.  Some other part of Solaris can pass a 'struct hat *' to HAT interface functions, but non-HAT code cannot refer to any members of a 'struct hat', because "there is no there there".  HAT functions use a different structure, and all interface functions must assign or cast to the internal 'hat_t *', in order to really get at any members.

In Solaris/PPC 2.11, the type used internally is 'hat_t'.  In the paragraphs that follow, I will refer to hat_t, just for brevity.  Except for visibility, 'hat_t' is a synonym for 'struct hat', and I will use 'hat_t' to refer to 'struct hat' even when describing 2.6 code, which never used that type definition.
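The pattern, in outline (the internal structure tag and member shown here are hypothetical; hat_swapin() is just one example of an interface function that takes a 'struct hat *'):

    /* What common code sees: an incomplete type, declared but never defined. */
    struct hat;

    /* Inside the PPC HAT: the real bookkeeping structure (members invented here). */
    typedef struct ppcmmu_hat {
            uint_t  hat_vsidrange;          /* base of this address space's VSID block */
            /* ... */
    } hat_t;

    /* Interface functions cast to get at the members. */
    void
    hat_swapin(struct hat *arg)
    {
            hat_t *hat = (hat_t *)arg;
            /* ... use hat->hat_vsidrange, etc. ... */
    }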
page table allocation
---------------------

The 2.6 HAT code allocated and initialized its own page table.  When it was time to take over translations from the firmware, all existing translations, managed by the firmware, were mapped in the new page table, and then the hardware was switched over to the new page table.  Then, the firmware was notified that, from then on, it was to use callback functions, provided by the HAT, to do any mapping or unmapping on its behalf.

The 2.11 HAT just inherits the page table, in place.  There is no need for the code, the memory, or the time to be used establishing a new page table.

There is a slight risk that we would regret not having the kernel allocate its own page table: we may run into a situation where we run, not under VOF, but under an implementation of "real" Open Firmware by a hardware manufacturer, and something about its page table may not be suitable for our purposes.  It may not be where we want it; it may not be big enough; or perhaps it is bigger than we need or would like.  I believe we do not have to worry, because Open Firmware is dead, for all practical purposes.  If we run under VOF, we control it, so it is not a problem.  If we run with some completely different bootware scheme, then several things need to be changed in startup, anyway.  If we move to a more stand-alone approach, then we are in control.  The worst case is that we re-introduce the code to allocate a second page table and copy translations, or something like it.

PTEs
----

"Don't fight the hardware" is a pretty well established rule for HAT code.  And, in many ways the 2.6 HAT code did an admirable job of doing things the PowerPC way.

You know how they say that some programmers can write BASIC in any programming language.  Advocates of one language or another talk about "the <language> way", and stress that it is not enough to learn how to write programs that run correctly; it is highly desirable to adapt to the style encouraged by the language.  There is something like that with MMU hardware, as well.  The reference books give you all the facts you need to know, but are pretty skimpy on explanations, rationale, and code examples showing "the way".  Maybe it is another case of not needing to state these things explicitly, because they are obvious.  Or it could be part of an overall trend toward confining documentation to sterile, hard facts, and leaving out any "extra" expository writing.  For example, there used to be a Rationale section in the ANSI C specifications -- but no more.

One way in which the 2.6 code seemed to fall short of doing things the PowerPC way is in functions that search for PTEs.  There were several places where the code worked, but was unnecessarily complex -- both slow and non-obvious.  This was not a case of trading performance against added complexity; it was one of those win-win cases, where the simpler, more straightforward code is also faster and more compact.

One observation about the "obvious" is that all the information contained in an 8-byte PTE is partitioned into the two 32-bit words so that everything you need to know about the virtual address is in the first word, and the second word describes properties of the physical page or properties of the translation.  The first word is the key, the second word is the data.  This is no accident.  The hardware has no need to do anything more complex than a simple 32-bit compare on the first word of a PTE.  There are small code examples that show how simple the search logic can be, once things are preconditioned by constructing the right 32-bit key -- for example, the code for TLB miss handling on platforms with no hardware page-table walker.  The 2.6 code had searches of PTEs involving complex logical combinations of conditions to be tested.  But all that logic can be hoisted, and the loop can be reduced to a simple 32-bit compare.  Never compute that which can be precomputed.

In 2.11, many of the PTE searching functions were reformed, and there is more work that can be done along these lines.
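As an illustration of the precomputed-key style -- not the actual Solaris/PPC code; the type and symbol names here are made up -- a PTEG search collapses to:

    #include <sys/types.h>

    #define NPTEPERPTEG     8               /* 8 PTEs per PTE group */

    typedef struct pte {
            uint32_t pte_word0;     /* V | VSID | H | API: the virtual-address key   */
            uint32_t pte_word1;     /* RPN | R | C | WIMG | PP: the translation data */
    } pte_t;

    /*
     * The caller builds 'key' once -- valid bit, VSID, hash-function ID,
     * and abbreviated page index -- and the loop is a bare 32-bit compare.
     */
    static pte_t *
    pteg_search(pte_t *pteg, uint32_t key)
    {
            int i;

            for (i = 0; i < NPTEPERPTEG; i++) {
                    if (pteg[i].pte_word0 == key)
                            return (&pteg[i]);
            }
            return (NULL);
    }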
Software PTEs
-------------

2.6 HAT code defined two flavors of PTE: hardware PTE and software PTE.  Software PTEs (swpte) differed from hardware PTEs in two ways:

  1. swptes were stored in native data layout.  When running in little-endian mode, this mattered, because the PowerPC page tables are always accessed big-endian, regardless of the current processor state.

  2. swptes carried extra bits of information that otherwise would have to be kept in a separate place, such as in an HME.

In 2.11, I got rid of the swpte.  If it were just a matter of big-endian vs. little-endian access, I would feel an obligation to keep swptes, because I think it is important to retain the ability to work in either mode.  I have not continually tested that my changes are truly endian-independent, and so I would bet that some new dependencies on big-endian have crept in.  But, at least I have kept in mind the policy, "do no harm".

However, the swpte data structure was mostly used for a purpose that has gone away.  Swptes were used primarily to maintain a separate linear page table for kernel address space.  But, that applied only to the 603, which had no hardware page-table walker.  That optimization will likely make a comeback, say on an Efika box.  But, when that time comes, the code will have to be redone, because the way of describing the range of kernel addresses for which this optimization applies has changed between 2.6 and 2.11.  So, for now, swptes have just been eliminated.  They will almost certainly not be brought back, as is.  New and better code would be created to do the same kind of optimization.  Also, I have not given up on mapping kernel text and data with BAT registers, at least as an option, at least for some embedded applications.  In that case, a linear page table for kernel address space is utterly useless, even on a platform with no hardware page-table walker.

struct hat
----------

In 2.6, 'struct hat' data structures were pre-allocated.  The capacity planning was based on the maximum number of processes, which was a configurable parameter.

In Solaris/PPC 2.11, hat_t's are allocated dynamically.  They have their own kmem cache, "ppcmmu_hat_cache".  The hat_t for the kernel, 'khat', is the one hat_t that is not dynamically allocated.  It is of storage class 'extern', so it is always at a known location.  Some HAT operations treat the kernel address space differently.  Testing for hat == &khat is a cheap test to discriminate between kernel and non-kernel.  By the time ppcmmu_hat_cache needs to be ready, the kernel memory allocator has been up and running for a long time.
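A sketch of the dynamic allocation (the cache name and the khat test are as described above; the function names, cache arguments, and initialization details are assumptions):

    #include <sys/kmem.h>

    static kmem_cache_t *ppcmmu_hat_cache;

    extern hat_t khat;                      /* the kernel's hat, statically allocated */

    void
    ppcmmu_hat_cache_init(void)
    {
            ppcmmu_hat_cache = kmem_cache_create("ppcmmu_hat_cache",
                sizeof (hat_t), 0, NULL, NULL, NULL, NULL, NULL, 0);
    }

    hat_t *
    ppcmmu_hat_alloc(void)
    {
            hat_t *hat;

            hat = kmem_cache_alloc(ppcmmu_hat_cache, KM_SLEEP);
            /* ... initialize VSID range, counts, etc. ... */
            return (hat);
    }

    /* Kernel vs. non-kernel address space is a one-compare test. */
    #define IS_KHAT(hat)    ((hat) == &khat)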
HMEs
----

Solaris needs to keep more information about translations than is provided for in the hardware PTE.  That is where the HAT Mapping Entry (HME) comes in.

Other aspects of Solaris VM design, particularly pageout, mean that it is necessary to navigate quickly from a physical page to all translations to that page.  Linux does not have that requirement.  But, even if Solaris did away with this navigation overhead, there is always some amount of information that needs to be kept about each translation, in addition to what is supported by hardware.  It has to be kept somewhere, and it has to be quick and easy to get from a PTE to the supplemental information.

Aside from the navigation overhead, the supplemental information per PTE could be just a few bits.  On some MMU architectures, there are unused bits in the PTE.  Of those that have unused bits, some are reserved and others are explicitly made available for use by software.  For example, the IA64, operating in linear page table mode (VHPT Short Format), has 11 bits that are available for use by the HAT layer.  That was sufficient, so that no additional storage was needed for an HME, except for pure navigation overhead.

Note that PTEs and HMEs should not contain information about the virtual address.  Nor should they contain information about the underlying physical page; that sort of information belongs in a page_t, or failing that, some other data structure that keeps data about physical pages.  PTEs and HMEs should contain information ONLY about the _relationship_ between the virtual address and the physical page.

On PowerPC, PTEs are quite full.  This is mostly because the PowerPC MMU architecture uses an inverted page table, and so the bulk of the fully-qualified virtual address must be contained in each PTE.

Since the PowerPC hardware page table is an inverted page table, and there is only one, which is global for all address spaces, Solaris/PPC 2.6 keeps all HMEs in an array, parallel to the hardware page table, with the same number of entries.  This way, fast and simple address arithmetic is all that is needed to move back and forth between a PTE and its corresponding HME.

In 2.6, each HME contains:

    next pointer
    prev pointer
    page pointer
    hat index
    payload

This was common to HAT code for Sparc and for x86, at the time.  The Solaris/PPC 2.6 HME data structure was just copied from srmmu or x86 code, and then modified slightly, as needed.  The 'next' and 'prev' pointers are provided so that a doubly-linked list of HMEs can be maintained for each page_t.  A pointer to the page_t is there for quick navigation, because you may arrive at an HME not by traversing the doubly-linked list of translations for a page, but by way of a PTE lookup, in which case you would not necessarily know in advance what page_t is involved.  Also, there is a way to navigate from an HME to the hat_t for the address space for this translation.  And then there are those extra bits that we could not fit into a PTE, which, for our immediate purposes, we can just lump together and call the 'payload'.  For, in a sense, this is the only "real" information in an HME; the rest is navigation overhead, and could be considered redundant.

Altogether, that adds up to 20 bytes per HME, for a total of 28 bytes per potential translation.  Notice that I said _potential_ translation.  Since PTEs and HMEs are both pre-allocated in an array, HMEs occupy space even when they correspond to unused (invalid) PTEs.
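In outline, the 2.6 arrangement looks something like this; the member names and widths are illustrative, not the actual 2.6 declarations (which, per the accounting above, total 20 bytes), and pte_t is as in the earlier sketch:

    typedef struct hme {
            struct hme  *hme_next;          /* list of mappings to a shared page */
            struct hme  *hme_prev;
            struct page *hme_page;          /* back-pointer to the page_t        */
            uint16_t     hme_hatid;         /* index into the table of hat_t's   */
            uint16_t     hme_payload;       /* the extra per-translation bits    */
    } hme_t;

    extern pte_t pte_table[];               /* the hardware page table            */
    extern hme_t hme_table[];               /* parallel array, same no. of entries */

    /* Parallel arrays make PTE-to-HME navigation pure pointer arithmetic. */
    #define PTE_TO_HME(pte)     (&hme_table[(pte) - pte_table])
    #define HME_TO_PTE(hme)     (&pte_table[(hme) - hme_table])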
In 2.6, they did manage to save a bit of space by storing a 16-bit index into a table of hat_t data structures instead of a 32-bit pointer.  But, in 2.11, we actually would need more space for that, because the scheme for allocating hat_t instances was changed from a fixed-size array to truly dynamic allocation, so now we really would need a pointer to navigate back to the hat_t, and that would mean 32 bits.  On a 64-bit machine, storage requirements for HMEs would, of course, be much worse -- almost double.

In 2.11, many preparations were made to move to a scheme that involves much less overhead per translation.  We still need some room for the 'payload', those extra bits that we could not sneak into unused space in a PTE.  One byte per PTE is sufficient.  It would still be organized as a parallel array of 1-byte HMEs.  OK, but what about the 'next' and 'prev' pointers, and the 'page' pointer, and what about the pointer back to the hat_t?

page pointer
---

The PTE contains the RPN (PowerPC MMU terminology, Real Page Number), the physical page frame number (pfn, in Solaris terminology).  It is not that expensive to navigate from a pfn to a page_t.  There are times when it is necessary, but these are not the most common cases, and it is not _that_ performance critical.  It is unlikely that simply eliminating the HME page pointer will be a performance problem; but if it were, a better investment would be in improving the performance of pfn-to-page_t navigation.  There are plenty of opportunities for that, if it should be needed.

'next' and 'prev'
---

At any time, it is pretty likely that a significant fraction of HMEs are not in use.  In a way, they don't count.  Of the HMEs in use, most will be for translations to a page that is not shared; that is, the doubly-linked list of HMEs would be:

    { next=NULL; prev=NULL; payload }

A page_t contains the head of the linked list of translations to that page (p_mapping), and a share count (p_share).  For pages with only one translation, the p_mapping field can be a pointer to the PTE, directly, and we can dispense with the doubly-linked list altogether.  Now, all we have to worry about is providing HMEs for the small (but important) minority of pages that are shared.  At this stage, even a great deal more overhead per shared page would be acceptable.  But, even that is not necessary.  In fact, there are performance reasons for reducing per-HME overhead even further, by using unrolled linked lists with node sizes of one or two cache lines.

That just leaves navigation from an HME to its hat_t.  If we were writing a HAT for per-address-space page tables, either linear or forward-mapped, then we would not need to navigate back to the hat_t.  It is only because there are situations where we are rummaging around in a global page table that we don't know what address space applies to a random PTE.  But, the PTE contains the VSID.  We are already paying the price of storing the VSID in every PTE.  There is a many-to-one correspondence between VSIDs and hat_t's.  In 2.6, the many-to-one mapping is simple: the mapping of VSID _ranges_ to hat_t's is bijective.  There might be reasons to change to a more flexible way of allocating VSIDs, one at a time.  I see no reason to be in a hurry to change the current scheme of using blocks of 16 VSIDs.  The situations where you need to navigate back to the hat_t are not the most common, and do not require performance at any cost.  The performance of navigating from HME (PTE) to hat_t just has to be "good enough".  An extensible array or hash table to map a VSID range to a hat_t would be good enough, unless its implementation is messed up, somehow.  Also, the kernel VSID range can be made a special case.

PTEG pairs
----------

The PowerPC MMU organizes PTEs into groups of 8.  Each group is called a PTE Group, or PTEG for short.  The hardware hash functions resolve to a group, so within each PTEG it is necessary to search for a matching PTE.  Without having any other information on the side, that pretty much means a linear search of up to 8 8-byte PTEs.  PTEGs are paired up, so that if a PTEG gets full, a secondary hash function is used to navigate to the other PTEG in that PTE-Group pair, or PTEGP for short.

Because a PTEG-pair is an important natural unit for operating on PTEs, Solaris/PPC 2.6 keeps some bookkeeping data for each PTEG-pair.  The per-PTEG-pair data structure contains:

  1. ptegp_mutex: a mutex protecting the PTEG-pair, to regulate concurrent accesses to the page table at the granularity of a PTEG-pair.

  2. ptegp_validmap: one bit per PTE in the PTEG-pair, all in one 16-bit short int, indicating whether the corresponding PTE is known to be invalid (available).  This makes it unnecessary to scan the PTEG-pair for a free slot, except when all entries have been used.  If none of the entries in a PTEG-pair are available, the PTEG-pair is scanned and an attempt is made to unload any mappings that have become "stale" (no longer associated with in-use VSIDs).

  3. ptegp_lockmap: a 2-bit lock count for each PTE.  A PTE with a non-zero lock count is for a locked translation, that is, one not subject to displacement.  Since a PTE can be locked multiple times, we need to maintain a lock count for each PTE.  On PPC, the PTEs of the same address space are not grouped as they are in other architectures (e.g., x86, srmmu), so we need a separate counter for each PTE.  The occurrence of multiple locks on a PTE is not common, so we use a two-level scheme to minimize memory for the PTE lock counters.  We use 2 bits per PTE in the PTEG-pair structure, which keeps a lock count of up to 2; a value of 3 indicates an overflow.  For lock counts greater than 2, we use a small hash table to maintain the true lock count for those PTEs.

This part of Solaris/PPC HAT bookkeeping has not changed, but that is only for lack of time.  The plan is to change the granularity of locking from a PTEG-pair to a single PTEG, so that the PTEG valid map and the PTEG lock map combined all fit in a 32-bit word.  That way, operations on the bookkeeping data for each PTEG are done atomically, and no mutex is required.  The 32-bit word contains all the bookkeeping data and acts as its own lock.  No other functionality of a mutex is required.  No blocking is required.  There is no need to know anything about the owner of the "lock".  A 32-bit word to cover 8 PTEs means that we have a budget of 4 bits per PTE.  That budget can be used for 1 valid bit plus a 3-bit lock count, where the value 7 would indicate overflow.  Also, the size of the hash table for lock counts is fixed, based on a fixed fraction of the number of entries in the page table.  I have this thing about fixed-size allocations.  I would change that, given time.
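A sketch of the planned bookkeeping word, following the 4-bits-per-PTE budget described above (the names and the helper function are made up; atomic_cas_32() is the usual Solaris compare-and-swap primitive):

    #include <sys/types.h>
    #include <sys/atomic.h>

    #define PTEG_PTE_BITS   4               /* 4 bookkeeping bits per PTE        */
    #define PTEG_PTE_VALID  0x8u            /* per-PTE: slot is in use           */
    #define PTEG_PTE_LCKMSK 0x7u            /* per-PTE: lock count, 7 = overflow */

    /*
     * Bump the lock count for PTE 'slot' (0..7) in one PTEG's bookkeeping
     * word.  The word is its own lock: update it with compare-and-swap,
     * retry on collision; no mutex, no blocking.
     */
    static void
    pteg_lock_enter(volatile uint32_t *bkp, uint_t slot)
    {
            uint32_t old, new, count;
            uint_t shift = slot * PTEG_PTE_BITS;

            do {
                    old = *bkp;
                    count = (old >> shift) & PTEG_PTE_LCKMSK;
                    if (count == PTEG_PTE_LCKMSK)
                            return;         /* overflow: true count lives in the hash table */
                    new = old + (1u << shift);
            } while (atomic_cas_32(bkp, old, new) != old);
    }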
VSID ranges
-----------

Solaris/PPC 2.6 HAT had its own allocator for VSID ranges.  You cannot blame them for that.  Vmem was not around when the HAT was designed, nor even when the code was written.  The same is true for use of the kernel memory allocator.  Jeff Bonwick's slab allocator was brand new at the time; its development was concurrent with Solaris/PPC HAT development.  It had not yet "arrived".

In Solaris/PPC 2.11, the scheme for allocating VSIDs was changed to just use the vmem allocator.

The 2.6 design had a scheme for cycling through all VSID ranges and delaying the actual removal of PTEs until a VSID range needed to be reused.  Although this looks to me like a slick idea, and might be revisited sometime, it did not seem to really be useful, because common code is not aware of it; common code calls upon the HAT layer to unload all the mappings of an address space on exit.  So, the whole notion of keeping "stale" PTEs in the page table has to be rethought.  Either there has to be more communication between common code and the HAT layer, or the HAT layer has to be made smarter.  For now, I just use the vmem allocator for VSID ranges, do not bother trying to do lazy unloading of PTEs, and just reuse VSID ranges, rather than trying to cycle through all of them and then do VSID-range garbage collection.

It is almost certain that we can get almost all of the benefit of a lazy PTE-unload scheme just by being smart about batch processing of cross calls for TLB shoot-down.  For a single processor, I don't think the lazy PTE-unload buys much at all; almost certainly not enough to justify its complexity, its profligate use of VSID ranges, the resources needed to keep track of them, and the need to garbage collect "stale" PTEs.
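The vmem usage would be along these lines (a sketch only: the arena name, bounds, and helper functions are assumptions; one arena unit stands for one block of 16 VSIDs, as described earlier):

    #include <sys/vmem.h>

    static vmem_t *vsid_range_arena;

    void
    vsid_range_init(uint_t nranges)
    {
            /*
             * Arena of abstract identifiers: one unit == one VSID range.
             * Base is 1 so that a return value of 0 (NULL) can still mean
             * "allocation failed".
             */
            vsid_range_arena = vmem_create("ppcmmu_vsid_ranges",
                (void *)1, nranges, 1, NULL, NULL, NULL, 0, VM_SLEEP);
    }

    uint_t
    vsid_range_alloc(void)
    {
            return ((uint_t)(uintptr_t)vmem_alloc(vsid_range_arena, 1, VM_SLEEP));
    }

    void
    vsid_range_free(uint_t range)
    {
            vmem_free(vsid_range_arena, (void *)(uintptr_t)range, 1);
    }

Using an arena of abstract identifiers, rather than addresses, is the usual vmem idiom for managing any numbered resource, and it gets the bit maps and linked lists of the old custom allocator for free.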
BATs
----

Solaris/PPC 2.6 had a simple array of information about the BAT registers in use.  Information about both instruction and data BAT registers was kept in a single array.  One of the properties stored about each entry was which type it was: Instruction (IBAT) or Data (DBAT).

Very little use is made of BAT registers, unfortunately.  However, some updates were made, and while I was at it, I made some other changes.  First, there is now room for up to 8 IBATs and 8 DBATs, and configuration variables are used to tell whether there are 4 or 8.  Those configuration variables are set by cpuid(), which dispatches to model-specific code.  The code for the MPC7450 knows how to interrogate and set the appropriate HID0 bits and then set the configuration variables.  Second, there are two separate arrays, one for IBATs and one for DBATs.  Most searches are for a specific purpose, where it is known in advance whether it is for IBATs only or for DBATs only.  Those searches need not look in the array for the other type of BAT.  Also, there is no need to store the type in each BAT entry.  Other optimizations are planned, but not done.  They are not useful, yet.

------------------------------------------------------------------------

Table: Summary of changes
-------------------------

Resource                2.6                     2.11
----------------------  ----------------------  --------------------
Page table              allocated               inherited from VOF
                        translations copied
HMEs                    20 bytes                still 20 bytes
                                                Future: 1 byte
hat_t                   fixed size pool         dynamically allocated
                                                kmem_cache_alloc()
                                                "ppcmmu_hat_cache"
VSID ranges             custom allocator        vmem
                        bit maps, linked lists
PTEGP bookkeeping       protected by mutex      unchanged
                        validmap + lockmap      Future: data modified
                                                atomically; no mutex
BAT registers           4                       4 or 8
data and instr          Unified I and D         Segregated I and D

------------------------------------------------------------------------

-- Guy Shaw