RE: PowerPC KVM build directions

2009-04-02 Thread Hollis Blanchard
On Thu, 2009-04-02 at 10:52 +0800, Liu Yu-B13201 wrote:
 
  Since KVM supports a NetBSD 4.0  guest (I 
  think) and 8544/e500 emulation is already present in qemu -- 
  theoretically the first part should work...but I recall Liu 
  mentioning that there might be some OS specific quirks 
  present in the port...and that was what my question was 
  hinting at earlier.. 
 
 There must be a lot of differences between Linux and NetBSD,
 so the design of kvmppc may not have carefully considered them.
 
 The main trick is to hijack the system interrupts.
 You should check whether this part of the code caters to NetBSD.
 Context switching needs to be taken care of as well.
 
 At least there is one thing I can point out.
 See comments in the file booke_interrupts.S  line 195. :-)

Liu, you're talking about BSD as the *host*. Rahul is asking about BSD
as the guest.

Rahul, one major quirk we exploit is that Linux does not use the MSR[AS]
bit at all. One way that bit could be used is to give 32-bit userspace a
separate 4GB address space from the kernel. Instead, Linux puts both
kernel and userspace into the same 4GB address space (with Linux
mappings above 0xc000 and user mappings below). If NetBSD uses
MSR[AS]=1 for userspace (which I think is what the hardware architects
envisioned), you're going to have a lot of MMU fun.

Another potential issue could be the initial environment (described
earlier as option 2) not being what BSD expects. Do you use u-boot? You
can see the initial environment set up in kvm_arch_vcpu_setup() in KVM
and mpc8544ds_init() in Qemu.

Does NetBSD use flattened device trees at all? KVM (Qemu) supplies a
stripped-down device tree to the guest so that the guest won't try to
access IO devices not currently emulated by qemu. If BSD has a hardcoded
device configuration system (e.g. we built for 8544, therefore we
always have the following SoC devices) that will be an issue.

As a concrete example, qemu doesn't emulate the TSEC ethernet
controller. You need to either convince your guest not to try to use the
TSEC (and use e1000 or some other qemu-supported device instead), or add
just enough TSEC emulation to qemu to make your guest happy. That could
be as simple as always reporting link down so the guest doesn't try to
use it.
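
To make "always reporting link down" concrete, here is a minimal sketch
of the PHY side only. It assumes the guest driver decides link state
from the standard MII status register (BMSR, register 1, link status in
bit 2). The function name is hypothetical, and this is not qemu code: a
real stub would also have to model the controller's MDIO registers
through which the guest reaches the PHY, which is not shown here.

```c
#include <stdint.h>

/* Standard MII management register numbers and BMSR bits (IEEE 802.3). */
#define MII_BMCR          0
#define MII_BMSR          1
#define BMSR_100FULL      0x4000  /* PHY is capable of 100BASE-TX full duplex */
#define BMSR_ANEGCAPABLE  0x0008  /* PHY is capable of autonegotiation */
#define BMSR_LSTATUS      0x0004  /* link is up */

/*
 * Hypothetical stub PHY: advertises some capabilities but never sets
 * the link status bit, so a well-behaved driver never brings the
 * interface up. All other registers read as zero in this sketch.
 */
static uint16_t stub_phy_read(unsigned reg)
{
    switch (reg) {
    case MII_BMSR:
        return BMSR_100FULL | BMSR_ANEGCAPABLE; /* BMSR_LSTATUS stays clear */
    default:
        return 0;
    }
}
```

Whether this is enough depends on how the guest driver probes the
device. Some drivers give up if the PHY ID registers read as zero, so
treat it as a starting point only.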

Please keep us posted about any issues you encounter. Also, documenting
the hurdles and how you overcame them might be an interesting conference
paper, if you're into that sort of thing. :)

-- 
Hollis Blanchard
IBM Linux Technology Center

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


NetBSD and device trees

2009-04-02 Thread Hollis Blanchard
(I'll address the MMU issue in a separate mail.)

On Thu, 2009-04-02 at 11:56 -0700, Rahul Kulkarni wrote:
 Another potential issue could be the initial environment (described
 earlier as option 2) not being what BSD expects. Do you use u-boot?
 You can see the initial environment set up in kvm_arch_vcpu_setup()
 in KVM and mpc8544ds_init() in Qemu.
 
 Rahul: Yes, I will look into those functions. We do use u-boot. Are
 you hinting that we should go with option 1?

If you use u-boot then you might not have much work to do (option 2 will
probably work for you with few changes).

 Does NetBSD use flattened device trees at all? KVM (Qemu) supplies a
 stripped-down device tree to the guest so that the guest won't try to
 access IO devices not currently emulated by qemu. If BSD has a
 hardcoded device configuration system (e.g. we built for 8544,
 therefore we always have the following SoC devices) that will be an
 issue.
 
 Rahul: The device config is hardcoded in our NetBSD code base (more so
 because of the embedded nature it's a preferred way), but since I see
 NetBSD supported on Qemu, I would think there is support available
 for a flattened device tree to be passed in from qemu. I'll look at
 the x86 implementations.

Really quick history: Traditionally, desktop/server PowerPC had Open
Firmware (IEEE1275). Open Firmware provides runtime services (sometimes
including IP stack, disk drivers, filesystems, etc), and those services
allow the kernel to retrieve a device tree describing the physical
topology of the system. The runtime services (callbacks) are relatively
high overhead for embedded systems, so traditionally embedded PowerPC
systems used something simpler (ppcboot/u-boot, redboot, CFE, homebrew,
etc). These systems usually hardcoded the expected set of IO devices at
build time.

However, in recent years Linux developers have found that the
flexibility granted by the device tree is invaluable, even without the
runtime services. So they developed a flat device tree data structure
(flat because it's a contiguous in-memory format representing a tree),
and had firmware (especially u-boot) pass that tree to the kernel as a
binary blob.
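
To give a feel for what "binary blob" means here, the following is a
minimal sketch of checking the header of such a blob. The field offsets
follow the flattened device tree header as I understand it (big-endian
32-bit fields, magic 0xd00dfeed first). The struct and function names
are made up for illustration and are not from Linux, u-boot or libfdt.

```c
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC       0xd00dfeedu
#define FDT_HEADER_SIZE 40  /* ten 32-bit fields in a version 17 header */

/* All header fields are big-endian regardless of the host CPU. */
static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical summary of the fields a kernel needs first. */
struct fdt_summary {
    uint32_t totalsize;      /* size of the whole blob in bytes */
    uint32_t off_dt_struct;  /* offset of the structure block (the tree) */
    uint32_t off_dt_strings; /* offset of the property-name strings */
    uint32_t version;
};

/* Returns 0 if the blob looks like a flat device tree, -1 otherwise. */
static int fdt_peek_header(const uint8_t *blob, size_t len,
                           struct fdt_summary *out)
{
    if (len < FDT_HEADER_SIZE || be32_at(blob) != FDT_MAGIC)
        return -1;
    out->totalsize      = be32_at(blob + 4);
    out->off_dt_struct  = be32_at(blob + 8);
    out->off_dt_strings = be32_at(blob + 12);
    out->version        = be32_at(blob + 20);
    if (out->totalsize > len ||
        out->off_dt_struct >= out->totalsize ||
        out->off_dt_strings >= out->totalsize)
        return -1;
    return 0;
}
```

Because everything is addressed by offsets from the start of the blob,
firmware can hand the kernel a single pointer, and no callbacks into
firmware are needed to read the tree.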

The takeaway here is that the flat device tree is so far mostly a
PowerPC Linux specific concept. Although the idea is beginning to catch
on with other architectures and kernels, I expect that NetBSD doesn't
know anything about it, and x86 Linux doesn't either.

Since PowerPC NetBSD has build-time tables describing the hardware it
will try to use, I see the following options:
1) Teach NetBSD about flat device trees. Probably a lot of work.
2) Emulate more 85xx hardware in qemu. Maybe an easy to medium amount of
work, depending on the complexity and number of the IO devices.
3) Build a special NetBSD kernel with modified tables appropriate for
qemu. Probably the easiest/quickest way, but if your long-term goal is
to run unmodified NetBSD kernels built for real hardware, this is only a
prototyping step.

If you have more than one person playing with this, #2 could be done in
conjunction with #3, until you've emulated all the necessary devices.

Also, if you do #2, you could actually use qemu (without KVM) as a
development environment on normal x86 Linux or Windows workstations (I
think virtual prototyping or virtual platforms is the buzzword these
days). This might be a benefit for your internal software development
processes.

If there is interest (or maybe even existing work) in the NetBSD
community for flat device tree support, you may be able to team up with
other developers to tackle problem #1. To find out, I would post to
devicetree-disc...@ozlabs.org asking if they've heard of NetBSD work,
and also NetBSD/PowerPC mailing lists to see if they've heard of device
tree work.

-- 
Hollis Blanchard
IBM Linux Technology Center



MMU tricks for NetBSD guests

2009-04-02 Thread Hollis Blanchard
On Thu, 2009-04-02 at 11:56 -0700, Rahul Kulkarni wrote:
 
 Rahul, one major quirk we exploit is that Linux does not use the
 MSR[AS] bit at all. One way that bit could be used is to give 32-bit
 userspace a separate 4GB address space from the kernel. Instead, Linux
 puts both kernel and userspace into the same 4GB address space (with
 Linux mappings above 0xc000 and user mappings below). If NetBSD uses
 MSR[AS]=1 for userspace (which I think is what the hardware architects
 envisioned), you're going to have a lot of MMU fun.
 
 Rahul: The NetBSD port for e500/85xx which we have uses MSR[AS]
 (IS/DS) for user/kernel address space separation, which keeps the
 address spaces split. So that's a major problem to start with. How we
 get creative with this to provide guest mappings is something that
 has to be explored. Let me know if you have any thoughts.

OK, so this is going to be a fun one if you like this sort of thing. (I
like this sort of thing, but unfortunately don't have any time I can
commit to it.) I haven't thought through the details all the way, but at
a high level here are my thoughts:

First, to understand the architecture and the shortcut we're using
today, read http://www.linux-kvm.org/page/PowerPC_Book_E_MMU .

Now if you don't have the AS shortcut (which you don't), the key
observation is that the guest really is a collection of 4GB address
spaces, and those are identified by the 9-bit AS|PID.

(By the way, does NetBSD use PID1 and PID2? I sure hope not... :)

You can treat the 2^9 guest spaces as separate host spaces. When the
guest uses a space, reserve a host space for it, and then map guest
AS|PID to the host spaces.

So for example:
  * Guest creates a new process and gives it PID 7.
  * KVM reserves a new host PID. Let's say host PID 23 is available.
  * Guest creates a mapping (tlbwe) for PID 7, EA 0xc, RA
    0x0.
  * Host intercepts this (it's a privilege violation because guest
    is running with MSR[PR]=1).
  * Host already translates real address from guest physical to host
    physical. Let's say guest physical 0 corresponds to host
    physical 128M.
  * Your new code: host *also* translates guest PID (7) to host PID
    (23).
  * Resulting shadow mapping: PID 23, EA 0xc000, RA 0x0200.

You'll probably want all shadow mappings to have AS=1. In that case, you
would treat guest AS=0 PID=7 as a separate host address space from guest
AS=1 PID=7. gAS|gPID 0|7 would be hAS|hPID 1|23, and gAS|gPID 1|7 would
be hAS|hPID 1|24. In other words, each guest task (PID) will consume two
host address spaces (two different host PIDs, one for each guest AS
value).
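
As a rough sketch of the translation step in the example above (this is
not existing KVM code, and every name in it is hypothetical), building
the shadow entry could look like the following. It assumes a simple
linear guest-physical to host-physical offset of 128M as in the example,
and the EA used in the comment below is just an illustrative value.

```c
#include <stdint.h>

/* Simplified TLB entry: only the fields discussed here. */
struct tlb_entry {
    uint8_t  as;   /* address space bit */
    uint8_t  pid;  /* 8-bit process ID */
    uint32_t ea;   /* effective address */
    uint64_t ra;   /* real (physical) address */
};

/* Assumption for this sketch: guest physical 0 sits at host physical 128M. */
#define GUEST_RAM_HOST_BASE 0x08000000ull

static uint64_t gpa_to_hpa(uint64_t gpa)
{
    return gpa + GUEST_RAM_HOST_BASE;
}

/*
 * Build the shadow entry the host actually installs for a guest tlbwe.
 * host_pid is whatever host PID was reserved for this guest AS|PID.
 */
static struct tlb_entry make_shadow(struct tlb_entry guest, uint8_t host_pid)
{
    struct tlb_entry shadow;

    shadow.as  = 1;                    /* all shadow mappings live in AS=1 */
    shadow.pid = host_pid;             /* guest PID replaced by host PID */
    shadow.ea  = guest.ea;             /* effective address is unchanged */
    shadow.ra  = gpa_to_hpa(guest.ra); /* guest physical -> host physical */
    return shadow;
}
```

For instance, a guest entry with AS=0, PID 7, EA 0xc0000000 and RA 0
would become a shadow entry with AS=1, PID 23, the same EA, and RA
0x08000000 under these assumptions.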

Alexander Graf has already done something like this for his 970 work, so
he might be able to provide more details or issues to be aware of in a
scheme like this.

It would be easier to whiteboard, but obviously that's not really an
option...

-- 
Hollis Blanchard
IBM Linux Technology Center



Re: MMU tricks for NetBSD guests

2009-04-02 Thread Alexander Graf


On 02.04.2009, at 22:08, Hollis Blanchard wrote:


Alexander Graf has already done something like this for his 970 work, so
he might be able to provide more details or issues to be aware of in a
scheme like this.



That sounds a lot like what I implemented for real mode on 970. I
assume the PID is similar to a full SLB context and AS=1/AS=0 is just
another bit that could as well be in the PID?


So what we do on 970[1] is we treat real mode as yet another VSID.
970 translates EA -> VA -> RA. It looks like booke does the same, with
the VSID coming from the PID.
This basically means that if we're getting into real mode in the
guest, we just switch to guest VSID 0x (which doesn't exist in guests)
and map that as one of our host VSIDs available in the pool.


You could do the same. Just OR the AS bit into the guest PID you
use to map things and allocate whatever PID you need on the host
dynamically :-).
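
A minimal sketch of that idea, with made-up names and no claim to match
any existing code: the AS bit becomes bit 8 of a 9-bit key, and host
PIDs are handed out on first use.

```c
#include <stdint.h>

#define GUEST_SPACES 512  /* 2^9 possible guest AS|PID values */
#define HOST_PIDS    256  /* 8-bit host PID */
#define NO_PID       (-1)

/* Hypothetical per-VM table from guest AS|PID to host PID. */
struct pid_map {
    int16_t host_pid[GUEST_SPACES];
    int     next_free;    /* next unused host PID in the pool */
};

static void pid_map_init(struct pid_map *m, int first_free)
{
    for (int i = 0; i < GUEST_SPACES; i++)
        m->host_pid[i] = NO_PID;
    m->next_free = first_free;
}

/* Look up the host PID for a guest space, allocating one if needed. */
static int pid_map_get(struct pid_map *m, unsigned as, unsigned pid)
{
    unsigned key = ((as & 1u) << 8) | (pid & 0xffu); /* OR AS into the PID */

    if (m->host_pid[key] == NO_PID) {
        if (m->next_free >= HOST_PIDS)
            return NO_PID; /* pool exhausted; see note below */
        m->host_pid[key] = (int16_t)m->next_free++;
    }
    return m->host_pid[key];
}
```

Since there are up to 512 guest spaces but only 256 host PIDs, the pool
can run out. A real implementation would presumably have to recycle
host PIDs and flush the shadow mappings that used them, which this
sketch leaves out.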


Alex

[1] Sources at http://www.powerkvm.org
