Re: [plex86] Architectural ramblings

Robert W. Cunningham Sun, 07 Jan 2001 10:40:41 -0800
Kevin Lawton wrote:

> "Robert W. Cunningham" wrote:
>
> > By the time Plex86 makes it to 1.0, I suspect the P6 architecture will be dead
> > and gone.
>
> Ye have little faith.

Intel has already announced it intends to make no P6 PC CPUs after 2001 (the
architecture will survive in laptops for a bit longer).  That's less than 12 months
of "new" life on the desktop.  Will Plex86 1.0 be done in less than a year?

Sure, there will be P6 processors and clones out there for many years to come, but
they will represent a dwindling portion of the market, especially after Plex86 1.0
is released, just as '386, '486 and Pentium 1 processors do today.


> > Make that decision, to effectively restrict the target platform to a single
> > processor architecture, then we can talk about more precise cache optimization!
>
> Changing the set of programs you run inside the VM will greatly affect
> the performance.  Thus, it's best to let people be able to tweak things
> where possible.  First attempts generally are less flexible, but are
> great testbeds for gathering valuable instrumentation data.  But gearing
> for one architecture doesn't solve the problem, because there can be
> a huge amount of interdependency between the host architecture and
> the user specific workload.  The person using (and compiling) the
> program should decide such parameters, not you.

All I'm talking about is code within Plex86 itself, especially code that must cross
rings and/or must dump and restore the processor state.  I believe there are tricks
on the P6 architecture that allow you to dump and restore the processor state
without ever leaving the caches (if a pair of state switches are fast enough in
time, and delayed cache writes are used), something I believe may be impossible on
any earlier Intel architecture.

As far as guest code goes, Plex86 will have to do the best it can with it.  It
should, however, use all the facilities of the host platform to the greatest extent
possible, no matter if the guest code does so or not.

From the little I've seen so far, that seems to be pretty much the case:  Plex86
appears (to me) to demand a processor with SOME cache, but goes to great lengths to
minimize the use of that cache (or at least the thrashing of it), thus leaving more
for use by the guest code.  An excellent strategy overall:  My suggestion is simply
to assume Plex86 can use a bit more cache if it needs to, and optimize accordingly.
This may already be done!  I'm mainly relying on documentation and comments in my
first pass through the code, so I'm certain there are many, many subtleties I've yet
to grasp.
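To make the "minimize Plex86's own cache footprint" idea concrete, here is a minimal C sketch, assuming a 32-byte cache line (the P6 line size) and a hypothetical set of per-exit fields; the struct and field names are illustrative and are not taken from the Plex86 source.  The idea is simply to keep everything the monitor touches on each guest exit inside one aligned line, so a round trip through the monitor evicts one line of guest data rather than several.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 32  /* assumed P6 cache line size, in bytes */

/* Hypothetical "hot" monitor state: the fields read or written on every
 * guest exit, packed together so they share a single cache line. */
typedef struct {
    _Alignas(CACHE_LINE) uint32_t guest_eip;
    uint32_t guest_eflags;
    uint32_t exit_reason;
    uint32_t fault_addr;
    uint16_t guest_cs;
    uint16_t guest_ss;
    uint32_t pending_irq;
    uint32_t flags;
} hot_state_t;

/* Fail the build if someone adds a field that spills into a second line. */
_Static_assert(sizeof(hot_state_t) == CACHE_LINE,
               "hot state must fit in exactly one cache line");

static hot_state_t monitor_hot;  /* line-aligned by virtue of its type */

static bool is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}

/* Clear the hot state; the assert documents the layout assumption. */
static void hot_state_reset(hot_state_t *s)
{
    assert(is_line_aligned(s));
    memset(s, 0, sizeof *s);
}
```

The compile-time check is the useful part: the footprint stays one line only as long as someone is forced to notice when it grows.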


> If the code is real smart, it could even be put into a special
> instrumentation mode, where at the end of the run, it spits out
> statistics which can be used to fine-tine the parameters for the
> workload that was run.  So subsequent runs will be more optimal.

Yes!  Self-instrumentation is a royal pain in the arse if it isn't designed in from
the start.  Once you have multiple levels of cache humming along, it can be next to
impossible to gather meaningful information and save it without drastically
affecting the operation of those very same caches (and thus making much of the
information gathered invalid or at least irrelevant).  Some architectures, such as
the Hitachi SuperH processor family (BTW, my favorite embedded processor, with such
a sweet instruction set and ideal architecture), include debug and scratchpad
registers that can be used to obtain and accumulate significant amounts of run-time
information without seriously affecting the operation of the application or the
state of the processor (redundant hardware is employed).
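A minimal sketch of what "designed in from the start" might look like in C, assuming hypothetical event names and a hypothetical PLEX86_INSTRUMENT build switch (neither comes from the Plex86 source).  The counters live in one small aligned array, so recording an event touches a single cache line, and the results are only formatted and written out at the end of the run, when disturbing the caches no longer matters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical build switch: -DPLEX86_INSTRUMENT=0 compiles counting out. */
#ifndef PLEX86_INSTRUMENT
#define PLEX86_INSTRUMENT 1
#endif

/* Hypothetical event classes a monitor might want to count. */
enum stat_event {
    EV_GUEST_EXIT,
    EV_PAGE_FAULT,
    EV_IRQ_REFLECT,
    EV_CODE_RETRANSLATE,
    EV_COUNT
};

static const char *const stat_names[EV_COUNT] = {
    "guest_exit", "page_fault", "irq_reflect", "code_retranslate"
};

/* All counters share one 32-byte-aligned block (line size assumed). */
static _Alignas(32) uint32_t stats[EV_COUNT];
_Static_assert(sizeof(stats) <= 32, "counters must fit in one cache line");

#if PLEX86_INSTRUMENT
#define STAT_INC(ev) (stats[(ev)]++)
#else
#define STAT_INC(ev) ((void)0)
#endif

/* Called once at end of run: write "name count" lines that a tuning
 * script could read back.  Returns lines written, or -1 on error. */
static int stats_dump(FILE *out)
{
    assert(out != NULL);
    int lines = 0;
    for (int i = 0; i < EV_COUNT; i++) {
        if (fprintf(out, "%s %lu\n", stat_names[i],
                    (unsigned long)stats[i]) < 0)
            return -1;
        lines++;
    }
    return lines;
}
```

The hot path is a single increment into memory that is already cached; all the expensive work (formatting, I/O) is deferred until it can no longer skew the numbers being collected.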

Does the P6 have similar full-speed and transparent debug and/or monitoring
facilities?  Or do the existing facilities (if any) cause disruption to the
processor or cache states?


> There is no one optimal strategy, even for one specific stepping
> model of any given CPU.  A web server running inside the VM may
> require completely different parameters than a compression algorithm.
> This is further complicated and constrained by other attributes
> of the VM environment, such as the workload imposed on the host
> OS, interrupt rates used by the guest OS where each context switch
> necessitates certain decoupling/recoupling of translated code
> fragments, clock skewing which magnifies the interrupt rate,
> cache competition between the host and VM/guest, etc.

Yes, but we can certainly optimize the Plex86 code for a given minimum target!  If
we *know* we can count on having at least 128 KB of cache, then it might be worth at
least considering a strategy where Plex86 can "own" up to (say) half of the cache,
and manipulate it as needed?  (Can the x86 lock individual cache lines, or groups of
cache lines, under program control?  The later 68K processors could, as can many
contemporary embedded processors.  But if the x86, and the P6 in particular, cannot
do so, then that may sink the entire cache optimization boat before it has a chance
to float.)
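The arithmetic behind "own half of a 128 KB cache", as a C sketch that keeps the numbers as compile-time tunables so whoever builds the monitor can change them.  The macro and function names are hypothetical, and a 32-byte line is assumed.  With the defaults, 128 KB / 32 bytes = 4096 lines, half of which (2048 lines, or 64 KB) would be the monitor's budget.  Note that this only lets the monitor check its own footprint; it does not reserve anything in hardware, which is exactly what line locking would be needed for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tunables; override at build time, e.g. -DPLEX86_MIN_CACHE_KB=256 */
#ifndef PLEX86_MIN_CACHE_KB
#define PLEX86_MIN_CACHE_KB 128   /* minimum target cache size */
#endif
#ifndef PLEX86_CACHE_SHARE_PCT
#define PLEX86_CACHE_SHARE_PCT 50 /* share of that cache the monitor may use */
#endif
#ifndef PLEX86_LINE_BYTES
#define PLEX86_LINE_BYTES 32      /* assumed cache line size */
#endif

/* Number of cache lines the monitor may budget for its own hot data. */
static size_t monitor_line_budget(size_t cache_kb, unsigned share_pct,
                                  size_t line_bytes)
{
    assert(share_pct <= 100 && line_bytes > 0);
    return (cache_kb * 1024 / line_bytes) * share_pct / 100;
}

/* Does a structure of `bytes` fit in the budget?  Rounds up to whole lines. */
static int fits_budget(size_t bytes)
{
    size_t lines = (bytes + PLEX86_LINE_BYTES - 1) / PLEX86_LINE_BYTES;
    return lines <= monitor_line_budget(PLEX86_MIN_CACHE_KB,
                                        PLEX86_CACHE_SHARE_PCT,
                                        PLEX86_LINE_BYTES);
}
```

Keeping the figures as overridable macros rather than constants buried in the code is one way to reconcile a fixed minimum target with letting the person compiling the program choose the parameters.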

Such optimizations would likely be needed only where the guest code needs lots of
emulation and virtualization, which will generally be within the OS itself, and not
within the application (unless you are running apps that directly access the
hardware, such as by using VxDs).  Let's look at a specific case:  What will it take
for Plex86 to be optimized to perform well with WinModem code?  I suspect some of
the greatest uses for Plex86 will be exactly those situations where higher-level
products such as Wine and Win4Lin fail:  Direct hardware access and interaction.

Another such aspect will be multimedia in general:  Will Plex86 do well with DirectX
video and sound?  What optimizations will allow it to be useful in these areas
(WinModems, DirectX, RealAudio/Video, MS MediaPlayer, etc.)?  I suspect (but
certainly do not know for sure) the most important optimizations (in these specific
areas, at least) will require Plex86 to minimize its access to system DRAM, even if
it greatly reduces the cache available to the guest code.

When it comes to word processors, most disk access, and the 2-D GUI, speed will
likely not be a top priority, since such code is often operating at "human speed",
and tens of milliseconds can be frittered away without a visible or noticeable
penalty.

How will Plex86 deal with applications that already contain much platform
optimization?  How well will Plex86 do running the Windows SETI@Home client?


> <non-sequitur>
> I think it was Mark Twain who said something like "Never let
> school get in the way of a good education."
> </non-sequitur>

Exactly.  School is merely the start:  Learning must never end.  However, no
education, formal or otherwise, is ever a wasted effort, which is why I encourage
the students I mentor (ranging from third grade to college) to ALWAYS pursue and
complete a college degree in SOMETHING:  At the very least it proves to the world
they can be taught and can learn within a structured environment.  Some of the best
programmers I have *ever* worked with had degrees in areas such as music,
philosophy, and biology (which was my minor).  Get a degree in ANYTHING, then go do
whatever you want.  Doors will open, where without a degree they will be shut by
default and may have to be pried open.

One very memorable associate had a degree in Medieval French Literature!  He
programmed at guru level for five years, then went to medical school, then did his
residency and dual board certifications.  After which he went and got his PhD in
Cognitive Science, in just 18 months.  Then he started a medical equipment company
outside of Boston that may revolutionize patient care in the coming years.  And I'm
still not sure he knows what he wants to be when he grows up.  But I do know he's
having a blast figuring it out!

A good brain will be a useful brain if you feed it SOMETHING, using as many
different avenues as possible (Marshall McLuhan:  "The medium IS the message.").
However, a formal education is one of the best ways to learn the most stuff in the
least time.  An absence of formal education is never good news, since to be
adequately self-taught requires that you be both teacher and student, an extremely
rare skill pairing.

I also recommend that everyone should take at least two years of pure Philosophy:
Much of college teaches you facts and theorems, but (IMHO) only Philosophy
(specifically, the history of Philosophy, and the philosophy surrounding Man and
Society) teaches you how to THINK, and how to think about thinking.  Becoming aware
of one's own mental processes is the most important key to enabling a lifetime of
learning:  It allows you to become an effective teacher to yourself, to understand
how and why you best learn things.  This is very different from Cognitive Science,
which studies how OTHER people learn:  Philosophy, ultimately, makes you think about
yourself.  Then, if you are so inspired, take an AI course or two.

No learning is ever wasted:  I pursued years of frustrating piano lessons before I
realized I had absolutely no talent.  But along the way I stumbled onto MIDI, which
led me to study optimized real-time communication systems, and how they are best
designed, used and controlled.  Without understanding the subtle timing and
magnitude of piano keystrokes (something I completely lacked in practice, but
understood in theory), I would have never come to appreciate the power and
limitations of MIDI.  And that has shaped much of my career in surprising ways.  It
also allowed me to truly understand that some interface devices "just don't work"
for some people (such as myself and piano keyboards), while a different device may
do far better.  Thus, my music lessons also taught me about ergonomics, and led me
to read works by Don Norman and others, which directly affected how I specify,
design, implement and test my systems.

Though I am still extremely frustrated by being unable to create music (it lives
inside me and just can't get out!), my "futile" piano lessons have proven to be some
of the most important lessons of my life.  They also taught me to never yield to
momentary "failure" and frustration, but to seek the broadest perspective possible.
When true failure ultimately does arrive, those piano lessons also taught me to
accept it with grace, without regret, and to then move on.

Go ahead, do and pursue "crazy" things.  You don't always have to reach or achieve
them:  The pursuit may prove to be more important than the goal, and the journey
more important than the path taken.  It all comes down to being open to
possibilities and perspectives that are not your own, but are new or come from
others.  It means letting outside events and people have an impact and effect on
your life (and you on theirs).  It means having an Open Life, something that offers
tantalizing parallels to the current revolution of Open Source software.

Now, what was I saying about cache optimization?  I seem to have strayed from my
point...



-BobC