Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-19 Thread Ira W. Snyder
On Wed, Aug 19, 2009 at 08:40:33AM +0300, Avi Kivity wrote:
 On 08/19/2009 03:38 AM, Ira W. Snyder wrote:
 On Wed, Aug 19, 2009 at 12:26:23AM +0300, Avi Kivity wrote:

 On 08/18/2009 11:59 PM, Ira W. Snyder wrote:
  
 On a non shared-memory system (where the guest's RAM is not just a chunk
 of userspace RAM in the host system), virtio's management model seems to
 fall apart. Feature negotiation doesn't work as one would expect.


 In your case, virtio-net on the main board accesses PCI config space
 registers to perform the feature negotiation; software on your PCI cards
 needs to trap these config space accesses and respond to them according
 to virtio ABI.

  
 Is this real PCI (physical hardware) or fake PCI (software PCI
 emulation) that you are describing?



 Real PCI.

 The host (x86, PCI master) must use real PCI to actually configure the
 boards, enable bus mastering, etc. Just like any other PCI device, such
 as a network card.

 On the guests (ppc, PCI agents) I cannot add/change PCI functions (the
 last .[0-9] in the PCI address) nor can I change PCI BAR's once the
 board has started. I'm pretty sure that would violate the PCI spec,
 since the PCI master would need to re-scan the bus, and re-assign
 addresses, which is a task for the BIOS.


 Yes.  Can the boards respond to PCI config space cycles coming from the  
 host, or is the config space implemented in silicon and immutable?   
 (reading on, I see the answer is no).  virtio-pci uses the PCI config  
 space to configure the hardware.


Yes, the PCI config space is implemented in silicon. I can change a few
things (mostly PCI BAR attributes), but not much.

 (There's no real guest on your setup, right?  just a kernel running on
 an x86 system and other kernels running on the PCI cards?)

  
 Yes, the x86 (PCI master) runs Linux (booted via PXELinux). The ppc's
 (PCI agents) also run Linux (booted via U-Boot). They are independent
 Linux systems, with a physical PCI interconnect.

 The x86 has CONFIG_PCI=y, however the ppc's have CONFIG_PCI=n. Linux's
 PCI stack does bad things as a PCI agent. It always assumes it is a PCI
 master.

 It is possible for me to enable CONFIG_PCI=y on the ppc's by removing
 the PCI bus from their list of devices provided by OpenFirmware. They
 can not access PCI via normal methods. PCI drivers cannot work on the
 ppc's, because Linux assumes it is a PCI master.

 To the best of my knowledge, I cannot trap configuration space accesses
 on the PCI agents. I haven't needed that for anything I've done thus
 far.



 Well, if you can't do that, you can't use virtio-pci on the host.   
 You'll need another virtio transport (equivalent to fake pci you  
 mentioned above).


Ok.

Is there something similar that I can study as an example? Should I look
at virtio-pci?

 This does appear to be solved by vbus, though I haven't written a
 vbus-over-PCI implementation, so I cannot be completely sure.


 Even if virtio-pci doesn't work out for some reason (though it should),
 you can write your own virtio transport and implement its config space
 however you like.

  
 This is what I did with virtio-over-PCI. The way virtio-net negotiates
 features makes this work non-intuitively.


 I think you tried to take two virtio-nets and make them talk together?   
 That won't work.  You need the code from qemu to talk to virtio-net  
 config space, and vhost-net to pump the rings.


It *is* possible to make two unmodified virtio-net's talk together. I've
done it, and it is exactly what the virtio-over-PCI patch does. Study it
and you'll see how I connected the rx/tx queues together.

The feature negotiation code also works, but in a very unintuitive
manner. I made it work in the virtio-over-PCI patch, but the devices are
hardcoded into the driver. It would be quite a bit of work to swap
virtio-net and virtio-console, for example.

 I'm not at all clear on how to get feature negotiation to work on a
 system like mine. From my study of lguest and kvm (see below) it looks
 like userspace will need to be involved, via a miscdevice.


 I don't see why.  Is the kernel on the PCI cards in full control of all
 accesses?

  
 I'm not sure what you mean by this. Could you be more specific? This is
 a normal, unmodified vanilla Linux kernel running on the PCI agents.


 I meant, does board software implement the config space accesses issued  
 from the host, and it seems the answer is no.


 In my virtio-over-PCI patch, I hooked two virtio-net's together. I wrote
 an algorithm to pair the tx/rx queues together. Since virtio-net
 pre-fills its rx queues with buffers, I was able to use the DMA engine
 to copy from the tx queue into the pre-allocated memory in the rx queue.



 Please find a name other than virtio-over-PCI since it conflicts with  
 virtio-pci.  You're tunnelling virtio config cycles (which are usually  
 done on pci config cycles) on a new protocol which is itself tunnelled
 over PCI shared memory.

Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-19 Thread Avi Kivity

On 08/19/2009 06:28 PM, Ira W. Snyder wrote:



Well, if you can't do that, you can't use virtio-pci on the host.
You'll need another virtio transport (equivalent to fake pci you
mentioned above).

 

Ok.

Is there something similar that I can study as an example? Should I look
at virtio-pci?

   


There's virtio-lguest, virtio-s390, and virtio-vbus.


I think you tried to take two virtio-nets and make them talk together?
That won't work.  You need the code from qemu to talk to virtio-net
config space, and vhost-net to pump the rings.

 

It *is* possible to make two unmodified virtio-net's talk together. I've
done it, and it is exactly what the virtio-over-PCI patch does. Study it
and you'll see how I connected the rx/tx queues together.
   


Right, crossing the cables works, but feature negotiation is screwed up, 
and both sides think the data is in their RAM.


vhost-net doesn't do negotiation and doesn't assume the data lives in 
its address space.



Please find a name other than virtio-over-PCI since it conflicts with
virtio-pci.  You're tunnelling virtio config cycles (which are usually
done on pci config cycles) on a new protocol which is itself tunnelled
over PCI shared memory.

 

Sorry about that. Do you have suggestions for a better name?

   


virtio-$yourhardware or maybe virtio-dma


I called it virtio-over-PCI in my previous postings to LKML, so until a
new patch is written and posted, I'll keep referring to it by the name
used in the past, so people can search for it.

When I post virtio patches, should I CC another mailing list in addition
to LKML?
   


virtualizat...@lists.linux-foundation.org is virtio's home.


That said, I'm not sure how qemu-system-ppc running on x86 could
possibly communicate using virtio-net. This would mean the guest is an
emulated big-endian PPC, while the host is a little-endian x86. I
haven't actually tested this situation, so perhaps I am wrong.
   


I'm confused now.  You don't actually have any guest, do you, so why 
would you run qemu at all?



The x86 side only needs to run virtio-net, which is present in RHEL 5.3.
You'd only need to run virtio-tunnel or however it's called.  All the
eventfd magic takes place on the PCI agents.

 

I can upgrade the kernel to anything I want on both the x86 and ppc's.
I'd like to avoid changing the x86 (RHEL5) userspace, though. On the
ppc's, I have full control over the userspace environment.
   


You don't need any userspace on virtio-net's side.

Your ppc boards emulate a virtio-net device, so all you need is the 
virtio-net module (and virtio bindings).  If you chose to emulate, say, 
an e1000 card all you'd need is the e1000 driver.


--
error compiling committee.c: too many arguments to function



Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-19 Thread Ira W. Snyder
On Wed, Aug 19, 2009 at 06:37:06PM +0300, Avi Kivity wrote:
 On 08/19/2009 06:28 PM, Ira W. Snyder wrote:

 Well, if you can't do that, you can't use virtio-pci on the host.
 You'll need another virtio transport (equivalent to fake pci you
 mentioned above).

  
 Ok.

 Is there something similar that I can study as an example? Should I look
 at virtio-pci?



 There's virtio-lguest, virtio-s390, and virtio-vbus.

 I think you tried to take two virtio-nets and make them talk together?
 That won't work.  You need the code from qemu to talk to virtio-net
 config space, and vhost-net to pump the rings.

  
 It *is* possible to make two unmodified virtio-net's talk together. I've
 done it, and it is exactly what the virtio-over-PCI patch does. Study it
 and you'll see how I connected the rx/tx queues together.


 Right, crossing the cables works, but feature negotiation is screwed up,  
 and both sides think the data is in their RAM.

 vhost-net doesn't do negotiation and doesn't assume the data lives in  
 its address space.


Yes, that is exactly what I did: crossed the cables (in software).

I'll take a closer look at vhost-net now, and make sure I understand how
it works.

 Please find a name other than virtio-over-PCI since it conflicts with
 virtio-pci.  You're tunnelling virtio config cycles (which are usually
 done on pci config cycles) on a new protocol which is itself tunnelled
 over PCI shared memory.

  
 Sorry about that. Do you have suggestions for a better name?



 virtio-$yourhardware or maybe virtio-dma


How about virtio-phys?

Arnd and BenH are both looking at PPC systems (similar to mine). Grant
Likely is looking at talking to a processor core running on an FPGA,
IIRC. Most of the code can be shared, very little should need to be
board-specific, I hope.

 I called it virtio-over-PCI in my previous postings to LKML, so until a
 new patch is written and posted, I'll keep referring to it by the name
 used in the past, so people can search for it.

 When I post virtio patches, should I CC another mailing list in addition
 to LKML?


 virtualizat...@lists.linux-foundation.org is virtio's home.

 That said, I'm not sure how qemu-system-ppc running on x86 could
 possibly communicate using virtio-net. This would mean the guest is an
 emulated big-endian PPC, while the host is a little-endian x86. I
 haven't actually tested this situation, so perhaps I am wrong.


 I'm confused now.  You don't actually have any guest, do you, so why  
 would you run qemu at all?


I do not run qemu. I am just stating a problem with virtio-net that I
noticed. This is just so someone more knowledgeable can be aware of the
problem.

 The x86 side only needs to run virtio-net, which is present in RHEL 5.3.
 You'd only need to run virtio-tunnel or however it's called.  All the
 eventfd magic takes place on the PCI agents.

  
 I can upgrade the kernel to anything I want on both the x86 and ppc's.
 I'd like to avoid changing the x86 (RHEL5) userspace, though. On the
 ppc's, I have full control over the userspace environment.


 You don't need any userspace on virtio-net's side.

 Your ppc boards emulate a virtio-net device, so all you need is the  
 virtio-net module (and virtio bindings).  If you chose to emulate, say,  
 an e1000 card all you'd need is the e1000 driver.


Thanks for the replies.
Ira


Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-19 Thread Avi Kivity

On 08/19/2009 07:29 PM, Ira W. Snyder wrote:


   

virtio-$yourhardware or maybe virtio-dma

 

How about virtio-phys?
   


Could work.


Arnd and BenH are both looking at PPC systems (similar to mine). Grant
Likely is looking at talking to a processor core running on an FPGA,
IIRC. Most of the code can be shared, very little should need to be
board-specific, I hope.
   


Excellent.


That said, I'm not sure how qemu-system-ppc running on x86 could
possibly communicate using virtio-net. This would mean the guest is an
emulated big-endian PPC, while the host is a little-endian x86. I
haven't actually tested this situation, so perhaps I am wrong.

   

I'm confused now.  You don't actually have any guest, do you, so why
would you run qemu at all?

 

I do not run qemu. I am just stating a problem with virtio-net that I
noticed. This is just so someone more knowledgeable can be aware of the
problem.

   


Ah, it certainly doesn't byteswap.  Maybe nobody tried it.  Hollis?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Ira W. Snyder
On Tue, Aug 18, 2009 at 11:46:06AM +0300, Michael S. Tsirkin wrote:
 On Mon, Aug 17, 2009 at 04:17:09PM -0400, Gregory Haskins wrote:
  Michael S. Tsirkin wrote:
   On Mon, Aug 17, 2009 at 10:14:56AM -0400, Gregory Haskins wrote:
   Case in point: Take an upstream kernel and you can modprobe the
   vbus-pcibridge in and virtio devices will work over that transport
   unmodified.
  
   See http://lkml.org/lkml/2009/8/6/244 for details.
   
   The modprobe you are talking about would need
   to be done in guest kernel, correct?
  
  Yes, and your point is? "unmodified" (pardon the pseudo pun) modifies
  virtio, not guest.
   It means you can take an off-the-shelf kernel
  with off-the-shelf virtio (ala distro-kernel) and modprobe
  vbus-pcibridge and get alacrityvm acceleration.
 
 Heh, by that logic ksplice does not modify running kernel either :)
 
  It is not a design goal of mine to forbid the loading of a new driver,
  so I am ok with that requirement.
  
   OTOH, Michael's patch is purely targeted at improving virtio-net on kvm,
   and its likewise constrained by various limitations of that decision
   (such as its reliance of the PCI model, and the kvm memory scheme).
   
   vhost is actually not related to PCI in any way. It simply leaves all
   setup for userspace to do.  And the memory scheme was intentionally
   separated from kvm so that it can easily support e.g. lguest.
   
  
  I think you have missed my point. I mean that vhost requires a separate
  bus-model (ala qemu-pci).
 
 So? That can be in userspace, and can be anything including vbus.
 
  And no, your memory scheme is not separated,
  at least, not very well.  It still assumes memory-regions and
  copy_to_user(), which is very kvm-esque.
 
 I don't think so: works for lguest, kvm, UML and containers
 
  Vbus has people using things
  like userspace containers (no regions),
 
 vhost by default works without regions
 
  and physical hardware (dma
  controllers, so no regions or copy_to_user) so your scheme quickly falls
  apart once you get away from KVM.
 
 Someone took a driver and is building hardware for it ... so what?
 

I think Greg is referring to something like my virtio-over-PCI patch.
I'm pretty sure that vhost is completely useless for my situation. I'd
like to see vhost work for my use, so I'll try to explain what I'm
doing.

I've got a system where I have about 20 computers connected via PCI. The
PCI master is a normal x86 system, and the PCI agents are PowerPC
systems. The PCI agents act just like any other PCI card, except they
are running Linux, and have their own RAM and peripherals.

I wrote a custom driver which imitated a network interface and a serial
port. I tried to push it towards mainline, and DavidM rejected it, with
the argument, use virtio, don't add another virtualization layer to the
kernel. I think he has a decent argument, so I wrote virtio-over-PCI.

Now, there are some things about virtio that don't work over PCI.
Mainly, memory is not truly shared. It is extremely slow to access
memory that is far away, meaning across the PCI bus. This can be
worked around by using a DMA controller to transfer all data, along with
an intelligent scheme to perform only writes across the bus. If you're
careful, reads are never needed.

So, in my system, copy_(to|from)_user() is completely wrong. There is no
userspace, only a physical system. In fact, because normal x86 computers
do not have DMA controllers, the host system doesn't actually handle any
data transfer!

I used virtio-net in both the guest and host systems in my example
virtio-over-PCI patch, and succeeded in getting them to communicate.
However, the lack of any setup interface means that the devices must be
hardcoded into both drivers, when the decision could be up to userspace.
I think this is a problem that vbus could solve.

For my own selfish reasons (I don't want to maintain an out-of-tree
driver) I'd like to see *something* useful in mainline Linux. I'm happy
to answer questions about my setup, just ask.

Ira


Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Avi Kivity

On 08/18/2009 06:53 PM, Ira W. Snyder wrote:

So, in my system, copy_(to|from)_user() is completely wrong. There is no
userspace, only a physical system. In fact, because normal x86 computers
do not have DMA controllers, the host system doesn't actually handle any
data transfer!
   


In fact, modern x86s do have dma engines these days (google for Intel 
I/OAT), and one of our plans for vhost-net is to allow their use for 
packets above a certain size.  So a patch allowing vhost-net to 
optionally use a dma engine is a good thing.



I used virtio-net in both the guest and host systems in my example
virtio-over-PCI patch, and succeeded in getting them to communicate.
However, the lack of any setup interface means that the devices must be
hardcoded into both drivers, when the decision could be up to userspace.
I think this is a problem that vbus could solve.
   


Exposing a knob to userspace is not an insurmountable problem; vhost-net 
already allows changing the memory layout, for example.
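
For reference, that knob is essentially an ioctl that hands the kernel a
table of memory regions.  A minimal userspace sketch; the struct and ioctl
names are taken from the vhost interface as later merged (linux/vhost.h),
so treat them as an assumption for the patchset being discussed:

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Describe one memory region to vhost: "guest physical" addresses
 * 0..size map to the mmap()ed window at 'base' in this process. */
static int describe_memory(int vhost_fd, void *base, unsigned long size)
{
        struct vhost_memory *mem;
        int ret;

        mem = calloc(1, sizeof(*mem) + sizeof(struct vhost_memory_region));
        if (!mem)
                return -1;
        mem->nregions = 1;
        mem->regions[0].guest_phys_addr = 0;
        mem->regions[0].memory_size = size;
        mem->regions[0].userspace_addr = (unsigned long)base;

        ret = ioctl(vhost_fd, VHOST_SET_MEM_TABLE, mem);
        free(mem);
        return ret;
}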


--
error compiling committee.c: too many arguments to function



Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Ira W. Snyder
On Tue, Aug 18, 2009 at 07:51:21PM +0300, Avi Kivity wrote:
 On 08/18/2009 06:53 PM, Ira W. Snyder wrote:
 So, in my system, copy_(to|from)_user() is completely wrong. There is no
 userspace, only a physical system. In fact, because normal x86 computers
 do not have DMA controllers, the host system doesn't actually handle any
 data transfer!


 In fact, modern x86s do have dma engines these days (google for Intel  
 I/OAT), and one of our plans for vhost-net is to allow their use for  
 packets above a certain size.  So a patch allowing vhost-net to  
 optionally use a dma engine is a good thing.


Yes, I'm aware that very modern x86 PCs have general purpose DMA
engines, even though I don't have any capable hardware. However, I think
it is better to support using any PC (with or without DMA engine, any
architecture) as the PCI master, and just handle the DMA all from the
PCI agent, which is known to have DMA?

 I used virtio-net in both the guest and host systems in my example
 virtio-over-PCI patch, and succeeded in getting them to communicate.
 However, the lack of any setup interface means that the devices must be
 hardcoded into both drivers, when the decision could be up to userspace.
 I think this is a problem that vbus could solve.


 Exposing a knob to userspace is not an insurmountable problem; vhost-net  
 already allows changing the memory layout, for example.


Let me explain the most obvious problem I ran into: setting the MAC
addresses used in virtio.

On the host (PCI master), I want eth0 (virtio-net) to get a random MAC
address.

On the guest (PCI agent), I want eth0 (virtio-net) to get a specific MAC
address, aa:bb:cc:dd:ee:ff.

The virtio feature negotiation code handles this, by seeing the
VIRTIO_NET_F_MAC feature in its configuration space. If BOTH drivers do
not have VIRTIO_NET_F_MAC set, then NEITHER will use the specified MAC
address. This is because the feature negotiation code only accepts a
feature if it is offered by both sides of the connection.
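
For reference, the relevant logic on the driver side looks roughly like
this, simplified from the mainline virtio-net probe path:

#include <linux/etherdevice.h>
#include <linux/stddef.h>
#include <linux/virtio.h>
#include <linux/virtio_net.h>

/* Sketch: virtio-net uses the MAC from config space only when
 * VIRTIO_NET_F_MAC was successfully negotiated; otherwise it
 * generates a random address. */
static void sketch_setup_mac(struct virtio_device *vdev,
                             struct net_device *dev)
{
        if (virtio_has_feature(vdev, VIRTIO_NET_F_MAC))
                vdev->config->get(vdev,
                                  offsetof(struct virtio_net_config, mac),
                                  dev->dev_addr, dev->addr_len);
        else
                random_ether_addr(dev->dev_addr);
}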

In this case, I must have the guest generate a random MAC address and
have the host put aa:bb:cc:dd:ee:ff into the guest's configuration
space. This basically means hardcoding the MAC addresses in the Linux
drivers, which is a big no-no.

What would I expose to userspace to make this situation manageable?

Thanks for the response,
Ira


Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Avi Kivity

On 08/18/2009 08:27 PM, Ira W. Snyder wrote:

In fact, modern x86s do have dma engines these days (google for Intel
I/OAT), and one of our plans for vhost-net is to allow their use for
packets above a certain size.  So a patch allowing vhost-net to
optionally use a dma engine is a good thing.
 

Yes, I'm aware that very modern x86 PCs have general purpose DMA
engines, even though I don't have any capable hardware. However, I think
it is better to support using any PC (with or without DMA engine, any
architecture) as the PCI master, and just handle the DMA all from the
PCI agent, which is known to have DMA?
   


Certainly; but if your PCI agent will support the DMA API, then the same 
vhost code will work with both I/OAT and your specialized hardware.



Exposing a knob to userspace is not an insurmountable problem; vhost-net
already allows changing the memory layout, for example.

 

Let me explain the most obvious problem I ran into: setting the MAC
addresses used in virtio.

On the host (PCI master), I want eth0 (virtio-net) to get a random MAC
address.

On the guest (PCI agent), I want eth0 (virtio-net) to get a specific MAC
address, aa:bb:cc:dd:ee:ff.

The virtio feature negotiation code handles this, by seeing the
VIRTIO_NET_F_MAC feature in its configuration space. If BOTH drivers do
not have VIRTIO_NET_F_MAC set, then NEITHER will use the specified MAC
address. This is because the feature negotiation code only accepts a
feature if it is offered by both sides of the connection.

In this case, I must have the guest generate a random MAC address and
have the host put aa:bb:cc:dd:ee:ff into the guest's configuration
space. This basically means hardcoding the MAC addresses in the Linux
drivers, which is a big no-no.

What would I expose to userspace to make this situation manageable?

   


I think in this case you want one side to be virtio-net (I'm guessing 
the x86) and the other side vhost-net (the ppc boards with the dma 
engine).  virtio-net on x86 would communicate with userspace on the ppc 
board to negotiate features and get a mac address, the fast path would 
be between virtio-net and vhost-net (which would use the dma engine to 
push and pull data).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Ira W. Snyder
On Tue, Aug 18, 2009 at 08:47:04PM +0300, Avi Kivity wrote:
 On 08/18/2009 08:27 PM, Ira W. Snyder wrote:
 In fact, modern x86s do have dma engines these days (google for Intel
 I/OAT), and one of our plans for vhost-net is to allow their use for
 packets above a certain size.  So a patch allowing vhost-net to
 optionally use a dma engine is a good thing.
  
 Yes, I'm aware that very modern x86 PCs have general purpose DMA
 engines, even though I don't have any capable hardware. However, I think
 it is better to support using any PC (with or without DMA engine, any
 architecture) as the PCI master, and just handle the DMA all from the
 PCI agent, which is known to have DMA?


 Certainly; but if your PCI agent will support the DMA API, then the same  
 vhost code will work with both I/OAT and your specialized hardware.


Yes, that's true. My ppc is a Freescale MPC8349EMDS. It has a Linux
DMAEngine driver in mainline, which I've used. That's excellent.

 Exposing a knob to userspace is not an insurmountable problem; vhost-net
 already allows changing the memory layout, for example.

  
 Let me explain the most obvious problem I ran into: setting the MAC
 addresses used in virtio.

 On the host (PCI master), I want eth0 (virtio-net) to get a random MAC
 address.

 On the guest (PCI agent), I want eth0 (virtio-net) to get a specific MAC
 address, aa:bb:cc:dd:ee:ff.

 The virtio feature negotiation code handles this, by seeing the
 VIRTIO_NET_F_MAC feature in its configuration space. If BOTH drivers do
 not have VIRTIO_NET_F_MAC set, then NEITHER will use the specified MAC
 address. This is because the feature negotiation code only accepts a
 feature if it is offered by both sides of the connection.

 In this case, I must have the guest generate a random MAC address and
 have the host put aa:bb:cc:dd:ee:ff into the guest's configuration
 space. This basically means hardcoding the MAC addresses in the Linux
 drivers, which is a big no-no.

 What would I expose to userspace to make this situation manageable?



 I think in this case you want one side to be virtio-net (I'm guessing  
 the x86) and the other side vhost-net (the ppc boards with the dma  
 engine).  virtio-net on x86 would communicate with userspace on the ppc  
 board to negotiate features and get a mac address, the fast path would  
 be between virtio-net and vhost-net (which would use the dma engine to  
 push and pull data).


Ah, that seems backwards, but it should work after vhost-net learns how
to use the DMAEngine API.

I haven't studied vhost-net very carefully yet. As soon as I saw the
copy_(to|from)_user() I stopped reading, because it seemed useless for
my case. I'll look again and try to find where vhost-net supports
setting MAC addresses and other features.

Also, in my case I'd like to boot Linux with my rootfs over NFS. Is
vhost-net capable of this?

I've had Arnd, BenH, and Grant Likely (and others, privately) contact me
about devices they are working with that would benefit from something
like virtio-over-PCI. I'd like to see vhost-net be merged with the
capability to support my use case. There are plenty of others that would
benefit, not just myself.

I'm not sure vhost-net is being written with this kind of future use in
mind. I'd hate to see it get merged, and then have to change the ABI to
support physical-device-to-device usage. It would be better to keep
future use in mind now, rather than try and hack it in later.

Thanks for the comments.
Ira


Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Avi Kivity

On 08/18/2009 09:27 PM, Ira W. Snyder wrote:

I think in this case you want one side to be virtio-net (I'm guessing
the x86) and the other side vhost-net (the ppc boards with the dma
engine).  virtio-net on x86 would communicate with userspace on the ppc
board to negotiate features and get a mac address, the fast path would
be between virtio-net and vhost-net (which would use the dma engine to
push and pull data).

 


Ah, that seems backwards, but it should work after vhost-net learns how
to use the DMAEngine API.

I haven't studied vhost-net very carefully yet. As soon as I saw the
copy_(to|from)_user() I stopped reading, because it seemed useless for
my case. I'll look again and try to find where vhost-net supports
setting MAC addresses and other features.
   


It doesn't; all it does is pump the rings, leaving everything else to 
userspace.



Also, in my case I'd like to boot Linux with my rootfs over NFS. Is
vhost-net capable of this?
   


It's just another network interface.  You'd need an initramfs though to 
contain the needed userspace.



I've had Arnd, BenH, and Grant Likely (and others, privately) contact me
about devices they are working with that would benefit from something
like virtio-over-PCI. I'd like to see vhost-net be merged with the
capability to support my use case. There are plenty of others that would
benefit, not just myself.

I'm not sure vhost-net is being written with this kind of future use in
mind. I'd hate to see it get merged, and then have to change the ABI to
support physical-device-to-device usage. It would be better to keep
future use in mind now, rather than try and hack it in later.
   


Please review and comment then.  I'm fairly confident there won't be any 
ABI issues since vhost-net does so little outside pumping the rings.


Note the signalling paths go through eventfd: when vhost-net wants the 
other side to look at its ring, it tickles an eventfd which is supposed 
to trigger an interrupt on the other side.  Conversely, when another 
eventfd is signalled, vhost-net will look at the ring and process any 
data there.  You'll need to wire your signalling to those eventfds, 
either in userspace or in the kernel.
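
Roughly, the kernel-side glue on a PCI agent could look like the sketch
below.  The device structure, ioctl plumbing and IRQ setup are
hypothetical; only the eventfd calls are the real API:

#include <linux/err.h>
#include <linux/eventfd.h>
#include <linux/interrupt.h>

struct demo_dev {
        struct eventfd_ctx *call_ctx;   /* signalled towards vhost-net */
};

/* userspace hands us the eventfd file descriptor, e.g. via an ioctl */
static int demo_set_call_eventfd(struct demo_dev *d, int fd)
{
        struct eventfd_ctx *ctx = eventfd_ctx_fdget(fd);

        if (IS_ERR(ctx))
                return PTR_ERR(ctx);
        d->call_ctx = ctx;
        return 0;
}

/* hardware doorbell fired: tell the other side to look at its ring */
static irqreturn_t demo_doorbell_irq(int irq, void *data)
{
        struct demo_dev *d = data;

        if (d->call_ctx)
                eventfd_signal(d->call_ctx, 1);
        return IRQ_HANDLED;
}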


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Michael S. Tsirkin
On Tue, Aug 18, 2009 at 11:27:35AM -0700, Ira W. Snyder wrote:
 I haven't studied vhost-net very carefully yet. As soon as I saw the
 copy_(to|from)_user() I stopped reading, because it seemed useless for
 my case. I'll look again and try to find where vhost-net supports
 setting MAC addresses and other features.

vhost-net doesn't do this at all. You bind a raw socket to a network
device and program it with the usual userspace interfaces.

 Also, in my case I'd like to boot Linux with my rootfs over NFS. Is
 vhost-net capable of this?
 
 I've had Arnd, BenH, and Grant Likely (and others, privately) contact me
 about devices they are working with that would benefit from something
 like virtio-over-PCI. I'd like to see vhost-net be merged with the
 capability to support my use case. There are plenty of others that would
 benefit, not just myself.
 
 I'm not sure vhost-net is being written with this kind of future use in
 mind. I'd hate to see it get merged, and then have to change the ABI to
 support physical-device-to-device usage. It would be better to keep
 future use in mind now, rather than try and hack it in later.

I still need to think your usage over. I am not so sure this fits what
vhost is trying to do. If not, possibly it's better to just have a
separate driver for your device.


 Thanks for the comments.
 Ira


Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Michael S. Tsirkin
On Tue, Aug 18, 2009 at 10:27:52AM -0700, Ira W. Snyder wrote:
 On Tue, Aug 18, 2009 at 07:51:21PM +0300, Avi Kivity wrote:
  On 08/18/2009 06:53 PM, Ira W. Snyder wrote:
  So, in my system, copy_(to|from)_user() is completely wrong. There is no
  userspace, only a physical system. In fact, because normal x86 computers
  do not have DMA controllers, the host system doesn't actually handle any
  data transfer!
 
 
  In fact, modern x86s do have dma engines these days (google for Intel  
  I/OAT), and one of our plans for vhost-net is to allow their use for  
  packets above a certain size.  So a patch allowing vhost-net to  
  optionally use a dma engine is a good thing.
 
 
 Yes, I'm aware that very modern x86 PCs have general purpose DMA
 engines, even though I don't have any capable hardware. However, I think
 it is better to support using any PC (with or without DMA engine, any
 architecture) as the PCI master, and just handle the DMA all from the
 PCI agent, which is known to have DMA?
 
  I used virtio-net in both the guest and host systems in my example
  virtio-over-PCI patch, and succeeded in getting them to communicate.
  However, the lack of any setup interface means that the devices must be
  hardcoded into both drivers, when the decision could be up to userspace.
  I think this is a problem that vbus could solve.
 
 
  Exposing a knob to userspace is not an insurmountable problem; vhost-net  
  already allows changing the memory layout, for example.
 
 
 Let me explain the most obvious problem I ran into: setting the MAC
 addresses used in virtio.
 
 On the host (PCI master), I want eth0 (virtio-net) to get a random MAC
 address.
 
 On the guest (PCI agent), I want eth0 (virtio-net) to get a specific MAC
 address, aa:bb:cc:dd:ee:ff.
 
 The virtio feature negotiation code handles this, by seeing the
 VIRTIO_NET_F_MAC feature in its configuration space. If BOTH drivers do
 not have VIRTIO_NET_F_MAC set, then NEITHER will use the specified MAC
 address. This is because the feature negotiation code only accepts a
 feature if it is offered by both sides of the connection.
 
 In this case, I must have the guest generate a random MAC address and
 have the host put aa:bb:cc:dd:ee:ff into the guest's configuration
 space. This basically means hardcoding the MAC addresses in the Linux
 drivers, which is a big no-no.
 
 What would I expose to userspace to make this situation manageable?
 
 Thanks for the response,
 Ira

This calls for some kind of change in guest virtio.  vhost, being a
host-kernel-only feature, does not deal with this problem.  But assuming
virtio in the guest supports this somehow, vhost will not interfere: you do
the setup in qemu userspace anyway, and vhost will happily use a network
device however you choose to set it up.

-- 
MST


Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Ira W. Snyder
On Tue, Aug 18, 2009 at 09:52:48PM +0300, Avi Kivity wrote:
 On 08/18/2009 09:27 PM, Ira W. Snyder wrote:
 I think in this case you want one side to be virtio-net (I'm guessing
 the x86) and the other side vhost-net (the ppc boards with the dma
 engine).  virtio-net on x86 would communicate with userspace on the ppc
 board to negotiate features and get a mac address, the fast path would
 be between virtio-net and vhost-net (which would use the dma engine to
 push and pull data).

  

 Ah, that seems backwards, but it should work after vhost-net learns how
 to use the DMAEngine API.

 I haven't studied vhost-net very carefully yet. As soon as I saw the
 copy_(to|from)_user() I stopped reading, because it seemed useless for
 my case. I'll look again and try to find where vhost-net supports
 setting MAC addresses and other features.


 It doesn't; all it does is pump the rings, leaving everything else to  
 userspace.


Ok.

On a non shared-memory system (where the guest's RAM is not just a chunk
of userspace RAM in the host system), virtio's management model seems to
fall apart. Feature negotiation doesn't work as one would expect.

This does appear to be solved by vbus, though I haven't written a
vbus-over-PCI implementation, so I cannot be completely sure.

I'm not at all clear on how to get feature negotiation to work on a
system like mine. From my study of lguest and kvm (see below) it looks
like userspace will need to be involved, via a miscdevice.

 Also, in my case I'd like to boot Linux with my rootfs over NFS. Is
 vhost-net capable of this?


 It's just another network interface.  You'd need an initramfs though to  
 contain the needed userspace.


Ok. I'm using an initramfs already, so adding some more userspace to it
isn't a problem.

 I've had Arnd, BenH, and Grant Likely (and others, privately) contact me
 about devices they are working with that would benefit from something
 like virtio-over-PCI. I'd like to see vhost-net be merged with the
 capability to support my use case. There are plenty of others that would
 benefit, not just myself.

 I'm not sure vhost-net is being written with this kind of future use in
 mind. I'd hate to see it get merged, and then have to change the ABI to
 support physical-device-to-device usage. It would be better to keep
 future use in mind now, rather than try and hack it in later.


 Please review and comment then.  I'm fairly confident there won't be any  
 ABI issues since vhost-net does so little outside pumping the rings.


Ok. I thought I should at least express my concerns while we're
discussing this, rather than being too late after finding the time to
study the driver.

Off the top of my head, I would think that transporting userspace
addresses in the ring (for copy_(to|from)_user()) vs. physical addresses
(for DMAEngine) might be a problem. Pinning userspace pages into memory
for DMA is a bit of a pain, though it is possible.

There is also the problem of different endianness between host and guest
in virtio-net. The struct virtio_net_hdr (include/linux/virtio_net.h)
defines fields in host byte order. Which totally breaks if the guest has
a different endianness. This is a virtio-net problem though, and is not
transport specific.
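
If a transport wanted to paper over that, it would have to byteswap every
multi-byte field of the header; the field list below is from
include/linux/virtio_net.h:

#include <linux/swab.h>
#include <linux/virtio_net.h>

/* Sketch: fix up a virtio_net_hdr when the two sides disagree on
 * endianness.  flags and gso_type are single bytes and need no swap. */
static void virtio_net_hdr_swab(struct virtio_net_hdr *hdr)
{
        hdr->hdr_len     = swab16(hdr->hdr_len);
        hdr->gso_size    = swab16(hdr->gso_size);
        hdr->csum_start  = swab16(hdr->csum_start);
        hdr->csum_offset = swab16(hdr->csum_offset);
}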

 Note the signalling paths go through eventfd: when vhost-net wants the  
 other side to look at its ring, it tickles an eventfd which is supposed  
 to trigger an interrupt on the other side.  Conversely, when another  
 eventfd is signalled, vhost-net will look at the ring and process any  
 data there.  You'll need to wire your signalling to those eventfds,  
 either in userspace or in the kernel.


Ok. I've never used eventfd before, so that'll take yet more studying.

I've browsed over both the kvm and lguest code, and it looks like they
each re-invent a mechanism for transporting interrupts between the host
and guest, using eventfd. They both do this by implementing a
miscdevice, which is basically their management interface.

See drivers/lguest/lguest_user.c (see write() and LHREQ_EVENTFD) and
kvm-kmod-devel-88/x86/kvm_main.c (see kvm_vm_ioctl(), called via
kvm_dev_ioctl()) for how they hook up eventfd's.
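
Stripped down, that pattern is just a miscdevice whose ioctl accepts an
eventfd file descriptor; everything below (names, ioctl number) is made
up for illustration:

#include <linux/err.h>
#include <linux/eventfd.h>
#include <linux/fs.h>
#include <linux/miscdevice.h>
#include <linux/module.h>
#include <linux/uaccess.h>

#define DEMO_SET_KICK_EVENTFD _IOW('d', 0x01, int)

static struct eventfd_ctx *demo_kick_ctx;

static long demo_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
        struct eventfd_ctx *ctx;
        int fd;

        switch (cmd) {
        case DEMO_SET_KICK_EVENTFD:
                /* userspace passes the eventfd it wants us to signal */
                if (get_user(fd, (int __user *)arg))
                        return -EFAULT;
                ctx = eventfd_ctx_fdget(fd);
                if (IS_ERR(ctx))
                        return PTR_ERR(ctx);
                demo_kick_ctx = ctx;
                return 0;
        default:
                return -ENOTTY;
        }
}

static const struct file_operations demo_fops = {
        .owner          = THIS_MODULE,
        .unlocked_ioctl = demo_ioctl,
};

static struct miscdevice demo_misc = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "demo-mgmt",
        .fops  = &demo_fops,
};

static int __init demo_init(void)
{
        return misc_register(&demo_misc);
}
module_init(demo_init);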

I can now imagine how two userspace programs (host and guest) could work
together to implement a management interface, including hotplug of
devices, etc. Of course, this would basically reinvent the vbus
management interface into a specific driver.

I think this is partly what Greg is trying to abstract out into generic
code. I haven't studied the actual data transport mechanisms in vbus,
though I have studied virtio's transport mechanism. I think a generic
management interface for virtio might be a good thing to consider,
because it seems there are at least two implementations already: kvm and
lguest.

Thanks for answering my questions. It helps to talk with someone more
familiar with the issues than I am.

Ira

Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Michael S. Tsirkin
On Tue, Aug 18, 2009 at 08:53:29AM -0700, Ira W. Snyder wrote:
 I think Greg is referring to something like my virtio-over-PCI patch.
 I'm pretty sure that vhost is completely useless for my situation. I'd
 like to see vhost work for my use, so I'll try to explain what I'm
 doing.
 
 I've got a system where I have about 20 computers connected via PCI. The
 PCI master is a normal x86 system, and the PCI agents are PowerPC
 systems. The PCI agents act just like any other PCI card, except they
 are running Linux, and have their own RAM and peripherals.
 
 I wrote a custom driver which imitated a network interface and a serial
 port. I tried to push it towards mainline, and DavidM rejected it, with
 the argument, use virtio, don't add another virtualization layer to the
 kernel. I think he has a decent argument, so I wrote virtio-over-PCI.
 
 Now, there are some things about virtio that don't work over PCI.
 Mainly, memory is not truly shared. It is extremely slow to access
 memory that is far away, meaning across the PCI bus. This can be
 worked around by using a DMA controller to transfer all data, along with
 an intelligent scheme to perform only writes across the bus. If you're
 careful, reads are never needed.
 
 So, in my system, copy_(to|from)_user() is completely wrong.
 There is no userspace, only a physical system.

Can guests do DMA to random host memory? Or is there some kind of IOMMU
and DMA API involved? If the latter, then note that you'll still need
some kind of driver for your device. The question we need to ask
ourselves then is whether this driver can reuse bits from vhost.

 In fact, because normal x86 computers
 do not have DMA controllers, the host system doesn't actually handle any
 data transfer!

Is it true that PPC has to initiate all DMA then? How do you
manage not to do DMA reads then?

 I used virtio-net in both the guest and host systems in my example
 virtio-over-PCI patch, and succeeded in getting them to communicate.
 However, the lack of any setup interface means that the devices must be
 hardcoded into both drivers, when the decision could be up to userspace.
 I think this is a problem that vbus could solve.

What you describe (passing setup from host to guest) seems like
a feature that guest devices need to support. It seems unlikely that
vbus, being a transport layer, can address this.

 
 For my own selfish reasons (I don't want to maintain an out-of-tree
 driver) I'd like to see *something* useful in mainline Linux. I'm happy
 to answer questions about my setup, just ask.
 
 Ira

Thanks Ira, I'll think about it.
A couple of questions:
- Could you please describe what kind of communication needs to happen?
- I'm not familiar with the DMA engine in question. I'm guessing it's the
  usual thing: in/out buffers need to be kernel memory, interface is
  asynchronous, small limited number of outstanding requests? Is there a
  userspace interface for it and if yes how does it work?



-- 
MST



Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Arnd Bergmann
On Tuesday 18 August 2009 20:35:22 Michael S. Tsirkin wrote:
 On Tue, Aug 18, 2009 at 10:27:52AM -0700, Ira W. Snyder wrote:
  Also, in my case I'd like to boot Linux with my rootfs over NFS. Is
  vhost-net capable of this?
  
  I've had Arnd, BenH, and Grant Likely (and others, privately) contact me
  about devices they are working with that would benefit from something
  like virtio-over-PCI. I'd like to see vhost-net be merged with the
  capability to support my use case. There are plenty of others that would
  benefit, not just myself.

yes.

  I'm not sure vhost-net is being written with this kind of future use in
  mind. I'd hate to see it get merged, and then have to change the ABI to
  support physical-device-to-device usage. It would be better to keep
  future use in mind now, rather than try and hack it in later.
 
 I still need to think your usage over. I am not so sure this fits what
 vhost is trying to do. If not, possibly it's better to just have a
 separate driver for your device.

I now think we need both. virtio-over-PCI does it the right way for its
purpose and can be rather generic. It could certainly be extended to
support virtio-net on both sides (host and guest) of KVM, but I think
it better fits the use where a kernel wants to communicate with some
other machine where you normally wouldn't think of using qemu.

Vhost-net OTOH is great in the way that it serves as an easy way to
move the virtio-net code from qemu into the kernel, without changing
its behaviour. It should even be straightforward to do live-migration between
hosts with and without it, something that would be much harder with
the virtio-over-PCI logic. Also, its internal state is local to the
process owning its file descriptor, which makes it much easier to
manage permissions and cleanup of its resources.

Arnd 


Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Avi Kivity

On 08/18/2009 11:59 PM, Ira W. Snyder wrote:

On a non shared-memory system (where the guest's RAM is not just a chunk
of userspace RAM in the host system), virtio's management model seems to
fall apart. Feature negotiation doesn't work as one would expect.
   


In your case, virtio-net on the main board accesses PCI config space 
registers to perform the feature negotiation; software on your PCI cards 
needs to trap these config space accesses and respond to them according 
to virtio ABI.


(There's no real guest on your setup, right?  just a kernel running on 
an x86 system and other kernels running on the PCI cards?)



This does appear to be solved by vbus, though I haven't written a
vbus-over-PCI implementation, so I cannot be completely sure.
   


Even if virtio-pci doesn't work out for some reason (though it should), 
you can write your own virtio transport and implement its config space 
however you like.



I'm not at all clear on how to get feature negotiation to work on a
system like mine. From my study of lguest and kvm (see below) it looks
like userspace will need to be involved, via a miscdevice.
   


I don't see why.  Is the kernel on the PCI cards in full control of all 
accesses?



Ok. I thought I should at least express my concerns while we're
discussing this, rather than being too late after finding the time to
study the driver.

Off the top of my head, I would think that transporting userspace
addresses in the ring (for copy_(to|from)_user()) vs. physical addresses
(for DMAEngine) might be a problem. Pinning userspace pages into memory
for DMA is a bit of a pain, though it is possible.
   


Oh, the ring doesn't transport userspace addresses.  It transports guest 
addresses, and it's up to vhost to do something with them.


Currently vhost supports two translation modes:

1. virtio address == host virtual address (using copy_to_user)
2. virtio address == offsetted host virtual address (using copy_to_user)

The latter mode is used for kvm guests (with multiple offsets, skipping 
some details).


I think you need to add a third mode, virtio address == host physical 
address (using dma engine).  Once you do that, and wire up the 
signalling, things should work.
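
Conceptually the only difference between the modes is how an address
pulled out of the ring turns into data; a purely hypothetical sketch
(none of these names are from the real vhost code, and demo_dma_copy()
stands in for the DMA-engine glue):

#include <linux/errno.h>
#include <linux/types.h>
#include <linux/uaccess.h>

enum xlate_mode {
        MODE_HVA,       /* virtio address == host virtual address        */
        MODE_OFFSET,    /* virtio address == offsetted host virtual addr */
        MODE_PHYS_DMA,  /* virtio address == host physical, use DMA      */
};

/* provided elsewhere: asynchronous copy via the DMA engine (hypothetical) */
static int demo_dma_copy(void *dst, unsigned long bus_addr, size_t len);

static int pull_from_ring(enum xlate_mode mode, unsigned long ring_addr,
                          void *buf, size_t len, unsigned long offset)
{
        switch (mode) {
        case MODE_HVA:
                return copy_from_user(buf, (void __user *)ring_addr, len) ?
                        -EFAULT : 0;
        case MODE_OFFSET:
                return copy_from_user(buf,
                                      (void __user *)(ring_addr + offset),
                                      len) ? -EFAULT : 0;
        case MODE_PHYS_DMA:
                return demo_dma_copy(buf, ring_addr, len);
        }
        return -EINVAL;
}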



There is also the problem of different endianness between host and guest
in virtio-net. The struct virtio_net_hdr (include/linux/virtio_net.h)
defines fields in host byte order. Which totally breaks if the guest has
a different endianness. This is a virtio-net problem though, and is not
transport specific.
   


Yeah.  You'll need to add byteswaps.


I've browsed over both the kvm and lguest code, and it looks like they
each re-invent a mechanism for transporting interrupts between the host
and guest, using eventfd. They both do this by implementing a
miscdevice, which is basically their management interface.

See drivers/lguest/lguest_user.c (see write() and LHREQ_EVENTFD) and
kvm-kmod-devel-88/x86/kvm_main.c (see kvm_vm_ioctl(), called via
kvm_dev_ioctl()) for how they hook up eventfd's.

I can now imagine how two userspace programs (host and guest) could work
together to implement a management interface, including hotplug of
devices, etc. Of course, this would basically reinvent the vbus
management interface into a specific driver.
   


You don't need anything in the guest userspace (virtio-net) side.


I think this is partly what Greg is trying to abstract out into generic
code. I haven't studied the actual data transport mechanisms in vbus,
though I have studied virtio's transport mechanism. I think a generic
management interface for virtio might be a good thing to consider,
because it seems there are at least two implementations already: kvm and
lguest.
   


Management code in the kernel doesn't really help unless you plan to 
manage things with echo and cat.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Avi Kivity

On 08/19/2009 12:26 AM, Avi Kivity wrote:


Off the top of my head, I would think that transporting userspace
addresses in the ring (for copy_(to|from)_user()) vs. physical addresses
(for DMAEngine) might be a problem. Pinning userspace pages into memory
for DMA is a bit of a pain, though it is possible.



Oh, the ring doesn't transport userspace addresses.  It transports 
guest addresses, and it's up to vhost to do something with them.


Currently vhost supports two translation modes:

1. virtio address == host virtual address (using copy_to_user)
2. virtio address == offsetted host virtual address (using copy_to_user)

The latter mode is used for kvm guests (with multiple offsets, 
skipping some details).


I think you need to add a third mode, virtio address == host physical 
address (using dma engine).  Once you do that, and wire up the 
signalling, things should work.



You don't in fact need a third mode.  You can mmap the x86 address space
into your ppc userspace and use the second mode.  All you need then is 
the dma engine glue and byte swapping.
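
For illustration, the userspace end of that on the ppc could be as simple
as mapping the outbound PCI window that covers the x86's RAM.  The window
address and size below are examples, and a real setup would more likely
get the mapping from a small driver's mmap() than from /dev/mem:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define PCI_WINDOW_BASE 0x80000000UL    /* example: outbound window to host RAM */
#define PCI_WINDOW_SIZE (256UL << 20)   /* example: first 256 MB of host memory */

int main(void)
{
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        void *host_ram;

        if (fd < 0) {
                perror("open /dev/mem");
                return 1;
        }

        host_ram = mmap(NULL, PCI_WINDOW_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, PCI_WINDOW_BASE);
        if (host_ram == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /* host_ram + x now aliases x86 physical address x, and can be
         * handed to vhost as the "offsetted host virtual address" base */
        printf("host RAM mapped at %p\n", host_ram);

        munmap(host_ram, PCI_WINDOW_SIZE);
        close(fd);
        return 0;
}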


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Ira W. Snyder
On Tue, Aug 18, 2009 at 11:57:48PM +0300, Michael S. Tsirkin wrote:
 On Tue, Aug 18, 2009 at 08:53:29AM -0700, Ira W. Snyder wrote:
  I think Greg is referring to something like my virtio-over-PCI patch.
  I'm pretty sure that vhost is completely useless for my situation. I'd
  like to see vhost work for my use, so I'll try to explain what I'm
  doing.
  
  I've got a system where I have about 20 computers connected via PCI. The
  PCI master is a normal x86 system, and the PCI agents are PowerPC
  systems. The PCI agents act just like any other PCI card, except they
  are running Linux, and have their own RAM and peripherals.
  
  I wrote a custom driver which imitated a network interface and a serial
  port. I tried to push it towards mainline, and DavidM rejected it, with
  the argument, use virtio, don't add another virtualization layer to the
  kernel. I think he has a decent argument, so I wrote virtio-over-PCI.
  
  Now, there are some things about virtio that don't work over PCI.
  Mainly, memory is not truly shared. It is extremely slow to access
  memory that is far away, meaning across the PCI bus. This can be
  worked around by using a DMA controller to transfer all data, along with
  an intelligent scheme to perform only writes across the bus. If you're
  careful, reads are never needed.
  
  So, in my system, copy_(to|from)_user() is completely wrong.
  There is no userspace, only a physical system.
 
 Can guests do DMA to random host memory? Or is there some kind of IOMMU
 and DMA API involved? If the latter, then note that you'll still need
 some kind of driver for your device. The question we need to ask
 ourselves then is whether this driver can reuse bits from vhost.
 

Mostly. All of my systems are 32 bit (both x86 and ppc). From the ppc's
point of view (and its DMAEngine), I can see the first 1GB of host memory.

This limited view is due to address space limitations on the ppc. The
view of PCI memory must live somewhere in the ppc address space, along
with the ppc's SDRAM, flash, and other peripherals. Since this is a
32bit processor, I only have 4GB of address space to work with.

The PCI address space could be up to 4GB in size. If I tried to allow
the ppc boards to view all 4GB of PCI address space, then they would
have no address space left for their onboard SDRAM, etc.

Hopefully that makes sense.

I use dma_set_mask(dev, DMA_BIT_MASK(30)) on the host system to ensure
that when dma_map_sg() is called, it returns addresses that can be
accessed directly by the device.

The DMAEngine can access any local (ppc) memory without any restriction.

I have used the Linux DMAEngine API (include/linux/dmaengine.h) to
handle all data transfer across the PCI bus. The Intel I/OAT (and many
others) use the same API.
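
For reference, a single transfer through that API looks roughly like the
sketch below; channel allocation, completion waiting and error handling
are trimmed, and the flag choice is only illustrative:

#include <linux/dmaengine.h>
#include <linux/errno.h>

/* Sketch: queue one memcpy on an already-acquired DMA channel.  Both
 * addresses must already be DMA-mapped bus addresses the channel can
 * reach (e.g. from dma_map_sg()/dma_map_single()). */
static int demo_dma_memcpy(struct dma_chan *chan, dma_addr_t dest,
                           dma_addr_t src, size_t len)
{
        struct dma_device *dev = chan->device;
        struct dma_async_tx_descriptor *tx;
        dma_cookie_t cookie;

        tx = dev->device_prep_dma_memcpy(chan, dest, src, len, DMA_CTRL_ACK);
        if (!tx)
                return -ENOMEM;

        cookie = tx->tx_submit(tx);     /* queue the descriptor */
        dma_async_issue_pending(chan);  /* and kick the engine  */

        return dma_submit_error(cookie) ? -EIO : 0;
}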

  In fact, because normal x86 computers
  do not have DMA controllers, the host system doesn't actually handle any
  data transfer!
 
 Is it true that PPC has to initiate all DMA then? How do you
 manage not to do DMA reads then?
 

Yes, the ppc initiates all DMA. It handles all data transfer (both reads
and writes) across the PCI bus, for speed reasons. A CPU cannot create
burst transactions on the PCI bus. This is the reason that most (all?)
network cards (as a familiar example) use DMA to transfer packet
contents into RAM.

Sorry if I made a confusing statement (no reads are necessary)
earlier. What I meant to say was: If you are very careful, it is not
necessary for the CPU to do any reads over the PCI bus to maintain
state. Writes are the only necessary CPU-initiated transaction.

I implemented this in my virtio-over-PCI patch, copying as much as
possible from the virtio vring structure. The descriptors in the rings
are only changed by one side of the connection, therefore they can be
cached as they are written (via the CPU) across the PCI bus, with the
knowledge that both sides will have a consistent view.

I'm sorry, this is hard to explain via email. It is much easier in a
room with a whiteboard. :)

  I used virtio-net in both the guest and host systems in my example
  virtio-over-PCI patch, and succeeded in getting them to communicate.
  However, the lack of any setup interface means that the devices must be
  hardcoded into both drivers, when the decision could be up to userspace.
  I think this is a problem that vbus could solve.
 
 What you describe (passing setup from host to guest) seems like
 a feature that guest devices need to support. It seems unlikely that
 vbus, being a transport layer, can address this.
 

I think I explained this poorly as well.

Virtio needs two things to function:
1) a set of descriptor rings (1 or more)
2) a way to kick each ring.

With the amount of space available in the ppc's PCI BAR's (which point
at a small chunk of SDRAM), I could potentially make ~6 virtqueues + 6
kick interrupts available.

Right now, my virtio-over-PCI driver hardcoded the first and second
virtqueues to be for virtio-net only, and nothing else.

What if the user wanted 2 

Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Ira W. Snyder
On Wed, Aug 19, 2009 at 12:26:23AM +0300, Avi Kivity wrote:
 On 08/18/2009 11:59 PM, Ira W. Snyder wrote:
 On a non shared-memory system (where the guest's RAM is not just a chunk
 of userspace RAM in the host system), virtio's management model seems to
 fall apart. Feature negotiation doesn't work as one would expect.


 In your case, virtio-net on the main board accesses PCI config space  
 registers to perform the feature negotiation; software on your PCI cards  
 needs to trap these config space accesses and respond to them according  
 to virtio ABI.


Is this real PCI (physical hardware) or fake PCI (software PCI
emulation) that you are describing?

The host (x86, PCI master) must use real PCI to actually configure the
boards, enable bus mastering, etc. Just like any other PCI device, such
as a network card.

On the guests (ppc, PCI agents) I cannot add/change PCI functions (the
last .[0-9] in the PCI address) nor can I change PCI BAR's once the
board has started. I'm pretty sure that would violate the PCI spec,
since the PCI master would need to re-scan the bus, and re-assign
addresses, which is a task for the BIOS.

 (There's no real guest on your setup, right?  just a kernel running on  
 an x86 system and other kernels running on the PCI cards?)


Yes, the x86 (PCI master) runs Linux (booted via PXELinux). The ppc's
(PCI agents) also run Linux (booted via U-Boot). They are independent
Linux systems, with a physical PCI interconnect.

The x86 has CONFIG_PCI=y, however the ppc's have CONFIG_PCI=n. Linux's
PCI stack does bad things as a PCI agent. It always assumes it is a PCI
master.

It is possible for me to enable CONFIG_PCI=y on the ppc's by removing
the PCI bus from their list of devices provided by OpenFirmware. They
can not access PCI via normal methods. PCI drivers cannot work on the
ppc's, because Linux assumes it is a PCI master.

To the best of my knowledge, I cannot trap configuration space accesses
on the PCI agents. I haven't needed that for anything I've done thus
far.

 This does appear to be solved by vbus, though I haven't written a
 vbus-over-PCI implementation, so I cannot be completely sure.


 Even if virtio-pci doesn't work out for some reason (though it should),  
 you can write your own virtio transport and implement its config space  
 however you like.


This is what I did with virtio-over-PCI. The way virtio-net negotiates
features makes this work non-intuitively.

 I'm not at all clear on how to get feature negotiation to work on a
 system like mine. From my study of lguest and kvm (see below) it looks
 like userspace will need to be involved, via a miscdevice.


 I don't see why.  Is the kernel on the PCI cards in full control of all  
 accesses?


I'm not sure what you mean by this. Could you be more specific? This is
a normal, unmodified vanilla Linux kernel running on the PCI agents.

 Ok. I thought I should at least express my concerns while we're
 discussing this, rather than being too late after finding the time to
 study the driver.

 Off the top of my head, I would think that transporting userspace
 addresses in the ring (for copy_(to|from)_user()) vs. physical addresses
 (for DMAEngine) might be a problem. Pinning userspace pages into memory
 for DMA is a bit of a pain, though it is possible.


 Oh, the ring doesn't transport userspace addresses.  It transports guest  
 addresses, and it's up to vhost to do something with them.

 Currently vhost supports two translation modes:

 1. virtio address == host virtual address (using copy_to_user)
 2. virtio address == offsetted host virtual address (using copy_to_user)

 The latter mode is used for kvm guests (with multiple offsets, skipping  
 some details).

 I think you need to add a third mode, virtio address == host physical  
 address (using dma engine).  Once you do that, and wire up the  
 signalling, things should work.


Ok.
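
Just to check my understanding of the three modes, a throwaway sketch
(this is not vhost's real code or types; all names are invented):

/* Invented names; only meant to restate the three translation modes. */
enum xlat_mode {
        XLAT_HVA,          /* virtio addr == host virtual address          */
        XLAT_OFFSET_HVA,   /* virtio addr == offsetted host virtual addr   */
        XLAT_PHYS_DMA,     /* virtio addr == host physical addr (DMA copy) */
};

static unsigned long xlat(enum xlat_mode mode,
                          unsigned long virtio_addr, unsigned long offset)
{
        switch (mode) {
        case XLAT_OFFSET_HVA:
                return virtio_addr + offset;    /* then copy_to_user()     */
        case XLAT_PHYS_DMA:
                return virtio_addr;             /* then hand to DMA engine */
        case XLAT_HVA:
        default:
                return virtio_addr;             /* then copy_to_user()     */
        }
}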

In my virtio-over-PCI patch, I hooked two virtio-net's together. I wrote
an algorithm to pair the tx/rx queues together. Since virtio-net
pre-fills its rx queues with buffers, I was able to use the DMA engine
to copy from the tx queue into the pre-allocated memory in the rx queue.
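
In rough pseudo-C, the pairing rule was the following (every name here
is invented, the real patch is structured differently, and
dma_copy_across_pci() is the sketch from earlier in the thread):

#include <linux/dmaengine.h>
#include <linux/kernel.h>

/* Invented types and helpers, for illustration only: for each buffer
 * the remote side puts on its tx ring, take the next pre-posted rx
 * buffer on our side and queue a DMA copy between them. */
struct xpci_buf {
        dma_addr_t dma_addr;
        size_t len;
};

struct xpci_pair;                                    /* opaque ring pair */
struct xpci_buf *xpci_next_tx(struct xpci_pair *p);  /* hypothetical     */
struct xpci_buf *xpci_next_rx(struct xpci_pair *p);  /* hypothetical     */
void xpci_complete(struct xpci_pair *p, size_t len); /* hypothetical     */
int dma_copy_across_pci(struct dma_chan *chan,
                        dma_addr_t dst, dma_addr_t src, size_t len);

static void xpci_pump(struct xpci_pair *p, struct dma_chan *chan)
{
        struct xpci_buf *tx, *rx;

        while ((tx = xpci_next_tx(p)) && (rx = xpci_next_rx(p))) {
                size_t len = min(tx->len, rx->len);

                dma_copy_across_pci(chan, rx->dma_addr, tx->dma_addr, len);
                xpci_complete(p, len);  /* publish used entries back */
        }
}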

I have a rough idea of how vhost-net would work in this case.

 There is also the problem of different endianness between host and guest
 in virtio-net. The struct virtio_net_hdr (include/linux/virtio_net.h)
 defines fields in host byte order. Which totally breaks if the guest has
 a different endianness. This is a virtio-net problem though, and is not
 transport specific.


 Yeah.  You'll need to add byteswaps.


I wonder if Rusty would accept a new feature:
VIRTIO_F_NET_LITTLE_ENDIAN, which would allow the virtio-net driver to
use LE for all of its multi-byte fields.

I don't think the transport should have to care about the endianness.
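
Something like this is what I have in mind (the struct and helper are
invented; the real struct virtio_net_hdr defines its fields in native
byte order today):

#include <linux/types.h>
#include <asm/byteorder.h>

/* Invented fixed-endian variant of the header: fields are defined as
 * little-endian on the wire, and each side converts explicitly. */
struct virtio_net_hdr_le {
        __u8   flags;
        __u8   gso_type;
        __le16 hdr_len;
        __le16 gso_size;
        __le16 csum_start;
        __le16 csum_offset;
};

static inline u16 xpci_hdr_len(const struct virtio_net_hdr_le *h)
{
        return le16_to_cpu(h->hdr_len);  /* byteswap only on big-endian */
}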

 I've browsed over both the kvm and lguest code, and it looks like they
 each re-invent a mechanism for transporting interrupts between 

Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Ira W. Snyder
On Wed, Aug 19, 2009 at 01:06:45AM +0300, Avi Kivity wrote:
 On 08/19/2009 12:26 AM, Avi Kivity wrote:

 Off the top of my head, I would think that transporting userspace
 addresses in the ring (for copy_(to|from)_user()) vs. physical addresses
 (for DMAEngine) might be a problem. Pinning userspace pages into memory
 for DMA is a bit of a pain, though it is possible.


 Oh, the ring doesn't transport userspace addresses.  It transports  
 guest addresses, and it's up to vhost to do something with them.

 Currently vhost supports two translation modes:

 1. virtio address == host virtual address (using copy_to_user)
 2. virtio address == offsetted host virtual address (using copy_to_user)

 The latter mode is used for kvm guests (with multiple offsets,  
 skipping some details).

 I think you need to add a third mode, virtio address == host physical  
 address (using dma engine).  Once you do that, and wire up the  
 signalling, things should work.


 You don't need in fact a third mode.  You can mmap the x86 address space  
 into your ppc userspace and use the second mode.  All you need then is  
 the dma engine glue and byte swapping.


Hmm, I'll have to think about that.

The ppc is a 32-bit processor, so it has 4GB of address space for
everything, including PCI, SDRAM, flash memory, and all other
peripherals.

This is exactly like 32bit x86, where you cannot have a PCI card that
exposes a 4GB PCI BAR. The system would have no address space left for
its own SDRAM.

On my x86 computers, I only have 1GB of physical RAM, and so the ppc's
have plenty of room in their address spaces to map the entire x86 RAM
into their own address space. That is exactly what I do now. Accesses to
ppc physical address 0x8000 magically hit x86 physical address
0x0.

Ira


Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects

2009-08-18 Thread Avi Kivity

On 08/19/2009 03:44 AM, Ira W. Snyder wrote:

You don't need in fact a third mode.  You can mmap the x86 address space
into your ppc userspace and use the second mode.  All you need then is
the dma engine glue and byte swapping.

 

Hmm, I'll have to think about that.

The ppc is a 32-bit processor, so it has 4GB of address space for
everything, including PCI, SDRAM, flash memory, and all other
peripherals.

This is exactly like 32bit x86, where you cannot have a PCI card that
exposes a 4GB PCI BAR. The system would have no address space left for
its own SDRAM.
   


(you actually can, since x86 has a 36-40 bit physical address space even 
with a 32-bit virtual address space, but that doesn't help you).



On my x86 computers, I only have 1GB of physical RAM, and so the ppc's
have plenty of room in their address spaces to map the entire x86 RAM
into their own address space. That is exactly what I do now. Accesses to
ppc physical address 0x8000 magically hit x86 physical address
0x0.
   


So if you mmap() that, you could work with virtual addresses.  It may be 
more efficient to work with physical addresses directly though.
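
A userspace sketch of that idea (the window offset and size are
assumptions for illustration, not your board's real values):

#define _FILE_OFFSET_BITS 64    /* 32-bit ppc userspace, >2GB offsets */
#include <sys/types.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

/* Sketch only: map the outbound PCI window (assumed here to sit at
 * ppc physical 0x80000000 and cover 1GB of x86 RAM) into userspace,
 * so vhost-style code can use "guest address + offset" translation. */
int main(void)
{
        const off_t  win_base = 0x80000000ULL;  /* assumed window offset  */
        const size_t win_size = 1UL << 30;      /* assumed 1GB of x86 RAM */
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        void *x86_ram;

        if (fd < 0)
                return 1;

        x86_ram = mmap(NULL, win_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, win_base);
        if (x86_ram == MAP_FAILED)
                return 1;

        /* ... x86_ram + guest_addr is now a host virtual address ... */

        munmap(x86_ram, win_size);
        close(fd);
        return 0;
}

Whether /dev/mem is the right hole to poke through depends on the
kernel configuration; a small char driver (or UIO) exporting the window
via mmap() would serve the same purpose.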


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
