from:"Tom Lyon"

[systemd-devel] PATCH: src/udev/ata_id/ata_id.c: WWN recognition fails on big-endian systems

2015-09-18 Thread Tom Lyon

--- a/src/udev/ata_id/ata_id.c
+++ b/src/udev/ata_id/ata_id.c
@@ -485,6 +485,10 @@ int main(int argc, char *argv[])
 disk_identify_fixup_uint16(identify.byte,  90); /*
time required for enhanced SECURITY ERASE UNIT */
 disk_identify_fixup_uint16(identify.byte,  91); /*
current APM values */
 disk_identify_fixup_uint16(identify.byte,  94); /*
current AAM value */
+disk_identify_fixup_uint16(identify.byte, 108); /* wwn
*/
+disk_identify_fixup_uint16(identify.byte, 109); /* wwn
*/
+disk_identify_fixup_uint16(identify.byte, 110); /* wwn
*/
+disk_identify_fixup_uint16(identify.byte, 111); /* wwn
*/
 disk_identify_fixup_uint16(identify.byte, 128); /*
device lock function */
 disk_identify_fixup_uint16(identify.byte, 217); /*
nominal media rotation rate */
 memcpy(, identify.byte, sizeof id);
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel

BUG: virtio_mmio multi-queue competely broken -- virtio registers considered harmful

2013-05-02 Thread Tom Lyon

Virtiio_mmio attempts to mimic the layout of some control registers from 
virtio_pci.  These registers, in particular VIRTIO_MMIO_QUEUE_SEL and 
VIRTIO_PCI_QUEUE_SEL,
are active in nature, and not just passive like a normal memory 
location.  Thus, the host side must react immediately upon write of 
these registers to map some other registers (queue address, size, etc) 
to queue-specific locations.  This is just not possible for mmio, and, I 
would argue, not desirable for PCI either.


Because the queue selector register doesn't work in mmio, it is clear 
that only single queue virtio devices can work.  This means no 
virtio_net - I've seen a few messages

complaining that it doesn't work but nothing so far on why.

It seems from some messages back in March that there is a register 
re-layout in the works for virtio_pci.  I think that virtio_pci could 
become just one of the
various ways to configure a virtio_mmio device and there would no need 
for any registers, just memory locations acting like memory. The one 
gotcha is in
figuring out the kick/notify mechanism for the guest to notify the host 
when there is work on a queue.  For notify, using a hypervisor call 
could unify the pci and mmio

cases, but comes with the cost of leaving the pure pci domain.

I got into this code because I am looking at the possibility of using an 
off the shelf embedded processor sitting on a PCIe port to emulate the 
virtio pci interface.  The
notion of active registers makes this a non-starter, whereas if there 
was a purely memory based system like mmio (with mq fixes), a real PCI 
device could easily emulate it.
Excepting, of course, whatever the notify mechanism is.  If it were 
hypercall based, then the hypervisor could call a transport or device 
specific way of notifying and a small

notify driver could poke the PCI device is some way.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: BUG: virtio_mmio multi-queue competely broken -- virtio registers considered harmful

2013-05-02 Thread Tom Lyon


On 05/01/2013 10:25 PM, Michael S. Tsirkin wrote:

On Wed, May 01, 2013 at 08:40:54PM -0700, Tom Lyon wrote:

Virtiio_mmio attempts to mimic the layout of some control registers
from virtio_pci.  These registers, in particular
VIRTIO_MMIO_QUEUE_SEL and VIRTIO_PCI_QUEUE_SEL,
are active in nature, and not just passive like a normal memory
location.  Thus, the host side must react immediately upon write of
these registers to map some other registers (queue address, size,
etc) to queue-specific locations.  This is just not possible for
mmio, and, I would argue, not desirable for PCI either.

Because the queue selector register doesn't work in mmio, it is
clear that only single queue virtio devices can work.  This means no
virtio_net - I've seen a few messages
complaining that it doesn't work but nothing so far on why.

It seems from some messages back in March that there is a register
re-layout in the works for virtio_pci.  I think that virtio_pci
could become just one of the
various ways to configure a virtio_mmio device and there would no
need for any registers, just memory locations acting like memory.
The one gotcha is in
figuring out the kick/notify mechanism for the guest to notify the
host when there is work on a queue.  For notify, using a hypervisor
call could unify the pci and mmio
cases, but comes with the cost of leaving the pure pci domain.

I got into this code because I am looking at the possibility of
using an off the shelf embedded processor sitting on a PCIe port to
emulate the virtio pci interface.  The
notion of active registers makes this a non-starter, whereas if
there was a purely memory based system like mmio (with mq fixes), a
real PCI device could easily emulate it.
Excepting, of course, whatever the notify mechanism is.  If it were
hypercall based, then the hypervisor could call a transport or
device specific way of notifying and a small
notify driver could poke the PCI device is some way.


This was discussed on this thread:
'[PATCH 16/22] virtio_pci: use separate notification offsets for each 
vq'
Please take a look there and confirm that this addresses your concern.
I'm working on making memory io as fast as pio on x86,
implemented on intel, once I do it on AMD too and assuming it's
as fast as PIO, we'll do mmio everywhere.
Then with a PCI card, you won't have exits for notification just normal
passthrough.


Yes, I had seen that thread. It addresses my concerns for pci, but not 
mmio, although I slightly favor a hypercall notify mechanism over a pci 
write.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: BUG: virtio_mmio multi-queue competely broken -- virtio registers considered harmful

2013-05-02 Thread Tom Lyon


On 05/02/2013 04:37 AM, Pawel Moll wrote:

Hi Tom,

On Thu, 2013-05-02 at 04:40 +0100, Tom Lyon wrote:

Virtiio_mmio attempts to mimic the layout of some control registers from
virtio_pci.  These registers, in particular VIRTIO_MMIO_QUEUE_SEL and
VIRTIO_PCI_QUEUE_SEL,
are active in nature, and not just passive like a normal memory
location.  Thus, the host side must react immediately upon write of
these registers to map some other registers (queue address, size, etc)
to queue-specific locations.  This is just not possible for mmio, and, I
would argue, not desirable for PCI either.

Could you, please, elaborate more about the environment you are talking
about here?

The intention of the MMIO device is to behave like a normal memory
mapped peripheral, say serial port. In the world of architecture without
separate I/O address space (like ARM, MIPS, SH-4 to name only those I
know anything about), such peripherals are usually mapped into the
virtual address space with special attributes, eg. guaranteeing
transactions order. That's why the host can react immediately and to
my knowledge multi-queue virtio devices work just fine.

I'd love see comments from someone with x86 expertise in such areas.
Maybe we are missing some memory barriers here? So the host
implementation would have a chance to react to the QUEUE_SEL access
before servicing other transactions?

Regards

Paweł


Ah, my mistake. I was assuming that mmio just used shared memory 
regions, not that there was emulated IO registers.  I was driven to that
assumption by looking at the rpmsg use case in which a main processor 
talks to satellite processors - there seems to be no
hypervisor and therefore no emulated IO in that case.  However, looking 
deeper, it seems that rpmsg has its own notify mechanism anyways, so it
is not so cleanly layered on virtio.  Also, for my needs, I want rpmsg 
and virtio_net.


In my desire to use a PCIe card to emulate virtio I am also talking 
about a non-hypervisor use case - just  leveraging the virtio framework 
for inter-processor
communication.  Virtio is so close to being the answer it woud be a 
shame if it wasn't. So I would propose that virtio be restructured such 
that the configuration
layout is identical among virtio transports and acts like memory, the 
configuration location is transport-specific, and the notification 
mechanism is transport specific - but I
could live writing to some queue specific memory/io location (for both 
pci and mmio).



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: BUG: virtio_mmio multi-queue competely broken -- virtio registers considered harmful

2013-05-02 Thread Tom Lyon


On 05/01/2013 10:25 PM, Michael S. Tsirkin wrote:

On Wed, May 01, 2013 at 08:40:54PM -0700, Tom Lyon wrote:

Virtiio_mmio attempts to mimic the layout of some control registers
from virtio_pci.  These registers, in particular
VIRTIO_MMIO_QUEUE_SEL and VIRTIO_PCI_QUEUE_SEL,
are active in nature, and not just passive like a normal memory
location.  Thus, the host side must react immediately upon write of
these registers to map some other registers (queue address, size,
etc) to queue-specific locations.  This is just not possible for
mmio, and, I would argue, not desirable for PCI either.

Because the queue selector register doesn't work in mmio, it is
clear that only single queue virtio devices can work.  This means no
virtio_net - I've seen a few messages
complaining that it doesn't work but nothing so far on why.

It seems from some messages back in March that there is a register
re-layout in the works for virtio_pci.  I think that virtio_pci
could become just one of the
various ways to configure a virtio_mmio device and there would no
need for any registers, just memory locations acting like memory.
The one gotcha is in
figuring out the kick/notify mechanism for the guest to notify the
host when there is work on a queue.  For notify, using a hypervisor
call could unify the pci and mmio
cases, but comes with the cost of leaving the pure pci domain.

I got into this code because I am looking at the possibility of
using an off the shelf embedded processor sitting on a PCIe port to
emulate the virtio pci interface.  The
notion of active registers makes this a non-starter, whereas if
there was a purely memory based system like mmio (with mq fixes), a
real PCI device could easily emulate it.
Excepting, of course, whatever the notify mechanism is.  If it were
hypercall based, then the hypervisor could call a transport or
device specific way of notifying and a small
notify driver could poke the PCI device is some way.


This was discussed on this thread:
'[PATCH 16/22] virtio_pci: use separate notification offsets for each 
vq'
Please take a look there and confirm that this addresses your concern.
I'm working on making memory io as fast as pio on x86,
implemented on intel, once I do it on AMD too and assuming it's
as fast as PIO, we'll do mmio everywhere.
Then with a PCI card, you won't have exits for notification just normal
passthrough.


Yes, I had seen that thread. It addresses my concerns for pci, but not 
mmio, although I slightly favor a hypercall notify mechanism over a pci 
write.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: BUG: virtio_mmio multi-queue competely broken -- virtio registers considered harmful

2013-05-02 Thread Tom Lyon


On 05/02/2013 04:37 AM, Pawel Moll wrote:

Hi Tom,

On Thu, 2013-05-02 at 04:40 +0100, Tom Lyon wrote:

Virtiio_mmio attempts to mimic the layout of some control registers from
virtio_pci.  These registers, in particular VIRTIO_MMIO_QUEUE_SEL and
VIRTIO_PCI_QUEUE_SEL,
are active in nature, and not just passive like a normal memory
location.  Thus, the host side must react immediately upon write of
these registers to map some other registers (queue address, size, etc)
to queue-specific locations.  This is just not possible for mmio, and, I
would argue, not desirable for PCI either.

Could you, please, elaborate more about the environment you are talking
about here?

The intention of the MMIO device is to behave like a normal memory
mapped peripheral, say serial port. In the world of architecture without
separate I/O address space (like ARM, MIPS, SH-4 to name only those I
know anything about), such peripherals are usually mapped into the
virtual address space with special attributes, eg. guaranteeing
transactions order. That's why the host can react immediately and to
my knowledge multi-queue virtio devices work just fine.

I'd love see comments from someone with x86 expertise in such areas.
Maybe we are missing some memory barriers here? So the host
implementation would have a chance to react to the QUEUE_SEL access
before servicing other transactions?

Regards

Paweł


Ah, my mistake. I was assuming that mmio just used shared memory 
regions, not that there was emulated IO registers.  I was driven to that
assumption by looking at the rpmsg use case in which a main processor 
talks to satellite processors - there seems to be no
hypervisor and therefore no emulated IO in that case.  However, looking 
deeper, it seems that rpmsg has its own notify mechanism anyways, so it
is not so cleanly layered on virtio.  Also, for my needs, I want rpmsg 
and virtio_net.


In my desire to use a PCIe card to emulate virtio I am also talking 
about a non-hypervisor use case - just  leveraging the virtio framework 
for inter-processor
communication.  Virtio is so close to being the answer it woud be a 
shame if it wasn't. So I would propose that virtio be restructured such 
that the configuration
layout is identical among virtio transports and acts like memory, the 
configuration location is transport-specific, and the notification 
mechanism is transport specific - but I
could live writing to some queue specific memory/io location (for both 
pci and mmio).



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

BUG: virtio_mmio multi-queue competely broken -- virtio registers considered harmful

2013-05-01 Thread Tom Lyon

Virtiio_mmio attempts to mimic the layout of some control registers from 
virtio_pci.  These registers, in particular VIRTIO_MMIO_QUEUE_SEL and 
VIRTIO_PCI_QUEUE_SEL,
are active in nature, and not just passive like a normal memory 
location.  Thus, the host side must react immediately upon write of 
these registers to map some other registers (queue address, size, etc) 
to queue-specific locations.  This is just not possible for mmio, and, I 
would argue, not desirable for PCI either.


Because the queue selector register doesn't work in mmio, it is clear 
that only single queue virtio devices can work.  This means no 
virtio_net - I've seen a few messages

complaining that it doesn't work but nothing so far on why.

It seems from some messages back in March that there is a register 
re-layout in the works for virtio_pci.  I think that virtio_pci could 
become just one of the
various ways to configure a virtio_mmio device and there would no need 
for any registers, just memory locations acting like memory. The one 
gotcha is in
figuring out the kick/notify mechanism for the guest to notify the host 
when there is work on a queue.  For notify, using a hypervisor call 
could unify the pci and mmio

cases, but comes with the cost of leaving the pure pci domain.

I got into this code because I am looking at the possibility of using an 
off the shelf embedded processor sitting on a PCIe port to emulate the 
virtio pci interface.  The
notion of active registers makes this a non-starter, whereas if there 
was a purely memory based system like mmio (with mq fixes), a real PCI 
device could easily emulate it.
Excepting, of course, whatever the notify mechanism is.  If it were 
hypercall based, then the hypervisor could call a transport or device 
specific way of notifying and a small

notify driver could poke the PCI device is some way.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH] vfio: Fix free in dma mapping error path

2011-05-04 Thread Tom Lyon

On Friday, April 29, 2011 03:05:54 pm Alex Williamson wrote:
 This is allocated via vmalloc, so needs vfree, not kfree.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
  drivers/vfio/vfio_dma.c |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/vfio/vfio_dma.c b/drivers/vfio/vfio_dma.c
 index cee1a25..4a488b6 100644
 --- a/drivers/vfio/vfio_dma.c
 +++ b/drivers/vfio/vfio_dma.c
 @@ -322,7 +322,7 @@ int vfio_dma_map_common(struct vfio_listener *listener,
   if (ret != npage) {
   printk(KERN_ERR %s: get_user_pages_fast returns %d, not %d\n,
   __func__, ret, npage);
 - kfree(pages);
 + vfree(pages);
   ret = -EFAULT;
   goto out_lock;
   }
Applied. Thanks.

Re: [Qemu-devel] [PATCH] vfio: Add an ioctl to reset the device

2011-04-19 Thread Tom Lyon

On Tuesday, April 19, 2011 01:32:59 pm Alex Williamson wrote:
 When using VFIO to assign a device to a guest, we want to make sure
 the device is quiesced on VM reset to stop all DMA within the guest
 mapped memory.  Add an ioctl which just calls pci_reset_function()
 and returns whether it succeeds.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
 We've recently needed to add this functionality for current KVM
 based device assignment, VFIO should provide a way to do this too.
 An example of it being used in the Qemu VFIO driver can be found
 here:
 
 https://github.com/awilliam/qemu-vfio/blob/vfio/hw/vfio.c
 
  drivers/vfio/vfio_main.c |4 
  include/linux/vfio.h |3 +++
  2 files changed, 7 insertions(+), 0 deletions(-)
 
 diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
 index 7e427fc..b9bb692 100644
 --- a/drivers/vfio/vfio_main.c
 +++ b/drivers/vfio/vfio_main.c
 @@ -490,6 +490,10 @@ static long vfio_unl_ioctl(struct file *filep,
   ret = vfio_irq_eoi_eventfd(vdev, fd);
   break;
 
 + case VFIO_RESET_FUNCTION:
 + ret = pci_reset_function(vdev-pdev);
 + break;
 +
   default:
   return -EINVAL;
   }
 diff --git a/include/linux/vfio.h b/include/linux/vfio.h
 index f07d8fe..47d9bb9 100644
 --- a/include/linux/vfio.h
 +++ b/include/linux/vfio.h
 @@ -215,6 +215,9 @@ struct vfio_dma_map {
  /* Re-enable INTx via eventfd */
  #define  VFIO_IRQ_EOI_EVENTFD_IOW(';', 110, int)
 
 +/* Reset PCI function */
 +#define VFIO_RESET_FUNCTION  _IO(';', 111)
 +
  /*
   * Reads, writes, and mmaps determine which PCI BAR (or config space)
   * from the high level bits of the file offset

Applied.

Re: [ANNOUNCE] VFIO V6 public VFIO repositories

2010-12-21 Thread Tom Lyon

On Monday, December 20, 2010 09:37:33 pm Benjamin Herrenschmidt wrote:
 Hi Tom, just wrote that to linux-pci in reply to your VFIO annouce,
 but your email bounced. Alex gave me your ieee one instead, I'm sending
 this copy to you, please feel free to reply on the list !
 
 Cheers,
 Ben.
 
 On Tue, 2010-12-21 at 16:29 +1100, Benjamin Herrenschmidt wrote:
  On Mon, 2010-11-22 at 15:21 -0800, Tom Lyon wrote:
   VFIO driver development has moved to a publicly accessible
   respository
   
   on github:
 git://github.com/pugs/vfio-linux-2.6.git
   
   This is a clone of the Linux-2.6 tree with all VFIO changes on the vfio
   branch (which is the default). There is a tag 'vfio-v6' marking the
   latest release of VFIO.
   
   In addition, I am open-sourcing my user level code which uses VFIO.
   It is a simple UDP/IP/Ethernet stack supporting 3 different VFIO based
   
   hardware drivers. This code is available at:
 git://github.com/pugs/vfio-user-level-drivers.git
  
  So I do have some concerns about this...
  
  So first, before I go into the meat of my issues, let's just drop a
  quick one about the interface: why netlink ? I find it horrible
  myself... Just confuses everything and adds overhead. ioctl's would have
  been a better choice imho.
  
  Now, my actual issues, which in fact extend to the whole generic iommu
  APIs that have been added to drivers/pci for domains, and that in
  turns stains VFIO in ways that I'm not sure I can use on POWER...
  
  I would appreciate your input on how you think is the best way for me to
  solve some of these mismatches between our HW and this design.
  
  Basically, the whole iommu domain stuff has been entirely designed
  around the idea that you can create those domains which are each an
  entire address space, and put devices in there.
  
  This is sadly not how the IBM iommus work on POWER today...
  
  I have currently one shared DMA address space (per host bridge), but I
  can assign regions of it to different devices (and I have limited
  filtering capabilities so basically, a bus per region, a device per
  region or a function per region).
  
  That means essentially that I cannot just create a mapping for the DMA
  addresses I want, but instead, need to have some kind of allocator for
  DMA translations (which we have in the kernel, ie, dma_map/unmap use a
  bitmap allocator).
  
  I generally have 2 regions per device, one in 32-bit space of quite
  limited size (some times as small as 128M window) and one in 64-bit
  space that I can make quite large if I need to, enough to map all of
  memory if that's really desired, using large pages or something like
  that).
  
  Now that has various consequences vs. the interfaces betweem iommu
  
  domains and qemu, and VFIO:
   - I don't quite see how I can translate the concept of domains and
  
  attaching devices to such domains. The basic idea won't work. The
  domains in my case are essentially pre-existing, not created on-the-fly,
  and may contain multiple devices tho I suppose I can assume for now that
  we only support KVM pass-through with 1 device == 1 domain.
  
  I don't know how to sort that one out if the userspace or kvm code
  assumes it can put multiple devices in one domain and they start to
  magically share the translations...
  
  Not sure what the right approach here is. I could make the Linux
  domain some artifical SW construct that contains a list of the real
  iommu's it's bound to and establish translations in all of them... but
  that isn't very efficient. If the guest kernel explicitely use some
  iommu PV ops targeting a device, I need to only setup translations for
  -that- device, not everything in the domain.
  
   - The code in virt/kvm/iommu.c that assumes it can map the entire guest
  
  memory 1:1 in the IOMMU is just not usable for us that way. We -might-
  be able to do that for 64-bit capable devices as we can create quite
  large regions in the 64-bit space, but at the very least we need some
  kind of offset, and the guest must know about it...
  
   - Similar deal with all the code that currently assume it can pick up a
  
  virtual address and create a mapping from that. Either we provide an
  allocator, or if we want to keep the flexibility of userspace/kvm
  choosing the virtual addresses (preferable), we need to convey some
  ranges information down to the user.
  
   - Finally, my guest are always paravirt. There's well defined Hcalls
  
  for inserting/removing DMA translations and we're implementing these
  since existing kernels already know how to use them. That means that
  overall, I might simply not need to use any of the above.
  
  IE. I could have my own infrastructure for iommu, my H-calls populating
  the target iommu directly from the kernel (kvm) or qemu (via ioctls in
  the non-kvm case). Might be the best option ... but that would mean
  somewhat disentangling VFIO from uiommu...
  
  Any suggestions ? Great ideas ?

Ben - I don't have any

change of email address: p...@cisco.com - p...@ieee.org

2010-12-16 Thread Tom Lyon

eom
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[ANNOUNCE] VFIO V6 public VFIO repositories

2010-11-22 Thread Tom Lyon

VFIO driver development has moved to a publicly accessible respository
on github:

git://github.com/pugs/vfio-linux-2.6.git

This is a clone of the Linux-2.6 tree with all VFIO changes on the vfio 
branch (which is the default). There is a tag 'vfio-v6' marking the latest
release of VFIO.

In addition, I am open-sourcing my user level code which uses VFIO.
It is a simple UDP/IP/Ethernet stack supporting 3 different VFIO based
hardware drivers. This code is available at:

git://github.com/pugs/vfio-user-level-drivers.git
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vfio: fix config virtualization, esp command byte

2010-11-16 Thread Tom Lyon

Applied.

On Tuesday, November 16, 2010 10:57:27 am Alex Williamson wrote:
 On Tue, 2010-11-16 at 10:54 -0700, Alex Williamson wrote:
  On Tue, 2010-11-09 at 17:09 -0800, Tom Lyon wrote:
   Cleans up config space virtualization, especialy handling of bytes
   which have some virtual and some real bits, like PCI_COMMAND.
   
   Alex, I hope you can test this with your setups.
  
  Sorry for the delay.  FWIW, I'm not having much luck with this, I'll try
  to debug the problem.  Thanks,
 
 This seems to be the bug.  Thanks,
 
 Alex
 
 vfio: Don't write random bits on read
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
 diff --git a/drivers/vfio/vfio_pci_config.c
 b/drivers/vfio/vfio_pci_config.c index 7132ac4..422d7b1 100644
 --- a/drivers/vfio/vfio_pci_config.c
 +++ b/drivers/vfio/vfio_pci_config.c
 @@ -964,11 +964,6 @@ static int vfio_config_rwbyte(int write,
   return 0;
   }
 
 - if (write) {
 - if (copy_from_user(newval, buf, 1))
 - return -EFAULT;
 - }
 -
   if (~virt) {/* mix of real and virt bits */
   /* update vconfig with latest hw bits */
   ret = vfio_read_config_byte(vdev, pos, realbits);
 @@ -978,9 +973,14 @@ static int vfio_config_rwbyte(int write,
   (vdev-vconfig[pos]  virt) | (realbits  ~virt);
   }
 
 - /* update vconfig with writeable bits */
 - vdev-vconfig[pos] =
 - (vdev-vconfig[pos]  ~wr) | (newval  wr);
 + if (write) {
 + if (copy_from_user(newval, buf, 1))
 + return -EFAULT;
 +
 + /* update vconfig with writeable bits */
 + vdev-vconfig[pos] =
 + (vdev-vconfig[pos]  ~wr) | (newval  wr);
 +}
 
   /*
* Now massage virtual fields
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] vfio: virtualize INTX_DISABLE

2010-11-09 Thread Tom Lyon

Alex - I am rejecting these 2 patches.

For patch 1/2, I started with yours and found a couple of problems, but then I 
got into the spirit and did a buinch more cleaning up. My patch to follow.

For patch 2/2, the INTX stuff, I don't really see the problem.  If the user 
turns on the bit, it'll result in at most one more interrupt, right?  If he 
turns off the bit, then he doesn't want interrupts.

On Friday, November 05, 2010 10:30:15 am Alex Williamson wrote:
 Tom,
 
 Since we use INTX_DISABLE internally for PCI 2.3 devices, it's probably
 not a good idea to allow vfio users direct access to it.  In trying to
 virtualize it, I stumbled on some config space virtualization issues.
 I think we want vconfig and the value written to hardware to be different
 when there are virtualized bits, but we lump them together.  I think this
 is why I've never been able to rely on the memory enable bit of config
 space through vfio.  With this first patch, all virtualized bits that
 are writable are sticky in vconfig.  Non-writable bits come from either
 the old value or the physical hardware depending on whether it's
 virtualized.  This means we can get rid of a lot of duplicated setting
 and reading of vconfig, and only add code for more complicated
 behaviors.  This is tricky code, to please double check that I'm not
 doing something stupid.  Thanks,
 
 Alex
 
 ---
 
 Alex Williamson (2):
   vfio: Virtualize PCI_COMMAND_INTX_DISABLE
   vfio: Fix config space virtualization
 
 
  drivers/vfio/vfio_intrs.c  |   38 
  drivers/vfio/vfio_main.c   |   14 
  drivers/vfio/vfio_pci_config.c |  131
  include/linux/vfio.h   | 
   4 +
  4 files changed, 84 insertions(+), 103 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] vfio: fix config virtualization, esp command byte

2010-11-09 Thread Tom Lyon

Cleans up config space virtualization, especialy handling of bytes
which have some virtual and some real bits, like PCI_COMMAND.

Alex, I hope you can test this with your setups.

Signed-off-by: Tom Lyon p...@cisco.com
---
 drivers/vfio/vfio_pci_config.c |  166 +---
 1 files changed, 53 insertions(+), 113 deletions(-)

diff --git a/drivers/vfio/vfio_pci_config.c b/drivers/vfio/vfio_pci_config.c
index 8304316..7132ac4 100644
--- a/drivers/vfio/vfio_pci_config.c
+++ b/drivers/vfio/vfio_pci_config.c
@@ -745,6 +745,8 @@ static int vfio_virt_init(struct vfio_dev *vdev)
  */
 static void vfio_bar_restore(struct vfio_dev *vdev)
 {
+   if (vdev-pdev-is_virtfn)
+   return;
printk(KERN_WARNING %s: reset recovery - restoring bars\n, __func__);
 
 #define do_bar(off, which) \
@@ -815,26 +817,15 @@ static inline int vfio_read_config_byte(struct vfio_dev 
*vdev,
 static inline int vfio_write_config_byte(struct vfio_dev *vdev,
int pos, u8 val)
 {
-   vdev-vconfig[pos] = val;
return pci_user_write_config_byte(vdev-pdev, pos, val);
 }
 
 /* handle virtualized fields in the basic config space */
-static u8 vfio_virt_basic(struct vfio_dev *vdev, int write,
-   u16 pos, u16 off, u8 val, u8 newval)
+static void vfio_virt_basic(struct vfio_dev *vdev, int write, u16 pos, u8 *rbp)
 {
-   switch (off) {
-   /*
-* vendor and device are virt because they don't
-* show up otherwise for sr-iov vfs
-*/
-   case PCI_VENDOR_ID:
-   case PCI_VENDOR_ID + 1:
-   case PCI_DEVICE_ID:
-   case PCI_DEVICE_ID + 1:
-   /* read only */
-   val = vdev-vconfig[pos];
-   break;
+   u8 val;
+
+   switch (pos) {
case PCI_COMMAND:
/*
 * If the real mem or IO enable bits are zero
@@ -842,100 +833,58 @@ static u8 vfio_virt_basic(struct vfio_dev *vdev, int 
write,
 * Restore the real BARs before allowing those
 * bits to re-enable
 */
+   val = vdev-vconfig[pos];
if (vdev-pdev-is_virtfn)
val |= PCI_COMMAND_MEMORY;
if (write) {
-   int upd = 0;
-
-   upd = (newval  PCI_COMMAND_MEMORY) 
- (val  PCI_COMMAND_MEMORY);
-   upd += (newval  PCI_COMMAND_IO) 
-  (val  PCI_COMMAND_IO);
-   if (upd)
-   vfio_bar_restore(vdev);
-   vfio_write_config_byte(vdev, pos, newval);
-   }
-   break;
-   case PCI_BASE_ADDRESS_0:
-   case PCI_BASE_ADDRESS_0+1:
-   case PCI_BASE_ADDRESS_0+2:
-   case PCI_BASE_ADDRESS_0+3:
-   case PCI_BASE_ADDRESS_1:
-   case PCI_BASE_ADDRESS_1+1:
-   case PCI_BASE_ADDRESS_1+2:
-   case PCI_BASE_ADDRESS_1+3:
-   case PCI_BASE_ADDRESS_2:
-   case PCI_BASE_ADDRESS_2+1:
-   case PCI_BASE_ADDRESS_2+2:
-   case PCI_BASE_ADDRESS_2+3:
-   case PCI_BASE_ADDRESS_3:
-   case PCI_BASE_ADDRESS_3+1:
-   case PCI_BASE_ADDRESS_3+2:
-   case PCI_BASE_ADDRESS_3+3:
-   case PCI_BASE_ADDRESS_4:
-   case PCI_BASE_ADDRESS_4+1:
-   case PCI_BASE_ADDRESS_4+2:
-   case PCI_BASE_ADDRESS_4+3:
-   case PCI_BASE_ADDRESS_5:
-   case PCI_BASE_ADDRESS_5+1:
-   case PCI_BASE_ADDRESS_5+2:
-   case PCI_BASE_ADDRESS_5+3:
-   case PCI_ROM_ADDRESS:
-   case PCI_ROM_ADDRESS+1:
-   case PCI_ROM_ADDRESS+2:
-   case PCI_ROM_ADDRESS+3:
-   if (write) {
-   vdev-vconfig[pos] = newval;
-   vdev-bardirty = 1;
-   } else {
-   if (vdev-bardirty)
-   vfio_bar_fixup(vdev);
-   val = vdev-vconfig[pos];
+
+   if (((val  PCI_COMMAND_MEMORY) 
+   (*rbp  PCI_COMMAND_MEMORY)) ||
+   ((val  PCI_COMMAND_IO) 
+   (*rbp  PCI_COMMAND_IO)))
+   vfio_bar_restore(vdev);
+   *rbp = ~(PCI_COMMAND_MEMORY + PCI_COMMAND_IO);
+   *rbp |= val  (PCI_COMMAND_MEMORY + PCI_COMMAND_IO);
}
+   vdev-vconfig[pos] = val;
break;
-   default:
+   case PCI_BASE_ADDRESS_0 ... PCI_BASE_ADDRESS_5 + 3:
+   case PCI_ROM_ADDRESS ... PCI_ROM_ADDRESS + 3:
if (write)
-   vdev-vconfig[pos] = newval;
-   else
-   val = vdev-vconfig[pos];
+   vdev-bardirty = 1;
+   else if (vdev-bardirty)
+   vfio_bar_fixup(vdev);
break;
}
-   return val

Re: [PATCH] vfio: Fix PCI 2.3 shared interrupt

2010-11-03 Thread Tom Lyon

Applied.

On Wednesday, November 03, 2010 01:17:33 pm Alex Williamson wrote:
 Trying to be too clever with setting the irq_disabled flag.  PCI 2.3
 disabled devices can still share IRQs, which can lead to clearing
 the irq_disabled flag, preventing the EOI from registering, and leaving
 the device without interrupts.  Interrupt handler should only ever
 set irq_disabled and we can exit earlier to avoid the config space
 access if we know we're disabled.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
  drivers/vfio/vfio_intrs.c |9 -
  1 files changed, 8 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/vfio/vfio_intrs.c b/drivers/vfio/vfio_intrs.c
 index 604082c..73e3deb 100644
 --- a/drivers/vfio/vfio_intrs.c
 +++ b/drivers/vfio/vfio_intrs.c
 @@ -57,6 +57,12 @@ irqreturn_t vfio_interrupt(int irq, void *dev_id)
 
   spin_lock_irq(vdev-irqlock);
 
 + /* INTX disabled interrupts can still be shared */
 + if (vdev-irq_disabled) {
 + spin_unlock_irq(vdev-irqlock);
 + return ret;
 + }
 +
   if (vdev-pci_2_3) {
   pci_block_user_cfg_access(pdev);
 
 @@ -87,7 +93,8 @@ done:
   ret = IRQ_HANDLED;
   }
 
 - vdev-irq_disabled = (ret == IRQ_HANDLED);
 + if (ret == IRQ_HANDLED)
 + vdev-irq_disabled = true;
 
   spin_unlock_irq(vdev-irqlock);
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] vfio: Extended capability fixes

2010-11-02 Thread Tom Lyon

Applied. Thanks!

On Monday, November 01, 2010 10:08:35 pm Alex Williamson wrote:
 - Virtual channel position gets truncated as a u8
  - Print the ecap that's unknown, not the last cap we saw
  - Print actual config offset, which provides enough info to make
some sense of the error.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
  drivers/vfio/vfio_pci_config.c |8 
  1 files changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/vfio/vfio_pci_config.c
 b/drivers/vfio/vfio_pci_config.c index 8af995d..8304316 100644
 --- a/drivers/vfio/vfio_pci_config.c
 +++ b/drivers/vfio/vfio_pci_config.c
 @@ -410,7 +410,7 @@ static int vfio_msi_cap_len(struct vfio_dev *vdev, u8
 pos) * Determine extended capability length for VC (2  9) and
   * MFVC capabilities
   */
 -static int vfio_vc_cap_len(struct vfio_dev *vdev, u8 pos)
 +static int vfio_vc_cap_len(struct vfio_dev *vdev, u16 pos)
  {
   struct pci_dev *pdev = vdev-pdev;
   u32 dw;
 @@ -580,7 +580,7 @@ int vfio_build_config_map(struct vfio_dev *vdev)
   printk(KERN_WARNING
   %s: pci config conflict at %x, 
   caps %x %x\n,
 - __func__, i, map[pos+i], cap);
 + __func__, pos+i, map[pos+i], cap);
   map[pos+i] = cap;
   }
   ret = pci_read_config_byte(pdev, pos + PCI_CAP_LIST_NEXT, pos);
 @@ -683,7 +683,7 @@ int vfio_build_config_map(struct vfio_dev *vdev)
   if (len == 0 || len == 0xFF) {
   printk(KERN_WARNING
   %s: unknown length for pci ext cap %x\n,
 - __func__, cap);
 + __func__, ecap);
   len = PCI_CAP_SIZEOF;
   }
   for (i = 0; i  len; i++) {
 @@ -691,7 +691,7 @@ int vfio_build_config_map(struct vfio_dev *vdev)
   printk(KERN_WARNING
   %s: pci config conflict at %x, 
   caps %x %x\n,
 - __func__, i, map[epos+i], ecap);
 + __func__, epos+i, map[epos+i], ecap);
   map[epos+i] = ecap;
   }
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/5] Fixes, non-PCI-2.3 support, EOI enhancements

2010-11-01 Thread Tom Lyon

I've applied all your patches. Thanks!

On Saturday, October 30, 2010 09:58:55 am Alex Williamson wrote:
 Hi Tom,
 
 I've updated some patches I've been working on to v5 and wanted to
 see what you think.  I also found a couple minor bugs, fixed in this
 series.
 
 The main idea is that since the VFIO interface defines that the INTx
 interrupt is disabled when it fires, we should provide an interface
 to re-enable it.  We currently have to do a read-modify-write to PCI
 config space, requring two ioctls.  I introduce an ioctl to do this
 for us.  An important aspect of this is that now we can support
 non-PCI-2.3 devices since re-enabling the interrupt is abstracted
 (with the same caveat as current KVM device assignment, that they
 must get an exclusive interrupt).  The real trick though is that we
 can then also create an irqfd-like mechanism to trigger re-enabling
 interrupts from and eventfd.  This allows qemu to do all the setup
 for connecting interrupts from VFIO into KVM and EOIs from KVM to
 VFIO, then lets userspace get completely out of the way.
 
 Hope you like it.  Thanks,
 
 Alex
 ---
 
 Alex Williamson (5):
   vfio: Add a new ioctl to support EOI via eventfd
   vfio: Add support for non-PCI 2.3 compliant devices
   vfio: Add ioctl to re-enable interrupts
   vfio: Fix requested regions
   vfio: Fix the ROM mask
 
 
  drivers/vfio/vfio_intrs.c  |  254
 +--- drivers/vfio/vfio_main.c   | 
  37 +-
  drivers/vfio/vfio_pci_config.c |2
  drivers/vfio/vfio_rdwr.c   |4 -
  include/linux/vfio.h   |   14 ++
  5 files changed, 279 insertions(+), 32 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/4] VFIO V5: Non-privileged user level PCI drivers

2010-10-28 Thread Tom Lyon

[Rebased to 2.6.36]
Just in time for Halloween - be afraid!

This version adds support for PCIe extended capabilities, including Advanced
Error Reporting.  All of the config table initialization has been rewritten to
be much more readable. All config accesses are byte-at-a-time and endian issues
have been resolved. Reading of PCI ROMs is now handled properly. Problems with
usage if pci_iomap and pci_request_regions have now been resolved.  Devices are
now reset upon first open. Races in rmMemlock rlimit accounting have been
cleaned up. Lots of other tweaks and comments.

Blurb from version 4:

After a long summer break, it's tanned, it's rested, and it's ready to rumble!

In this version:*** REBASE to 2.6.35 ***

There's new code using generic netlink messages which allows the kernel
to notify the user level of weird events and allows the user level to 
respond. This is currently used to handle device removal (whether software
or hardware driven), PCI error events, and system suspend  hibernate.

The driver now supports devices which use multiple MSI interrupts, reflecting
the actual number of interrupts allocated by the system to the user level.

PCI config accesses are now done through the pci_user_{read,write)_config
routines from drivers/pci/access.c.

Numerous other tweaks and cleanups.

Blurb from version 3:

There are lots of bug fixes and cleanups in this version, but the main
change is to check to make sure that the IOMMU has interrupt remapping
enabled, which is necessary to prevent user level code from triggering
spurious interrupts for other devices.  Since most platforms today
do not have the necessary hardware and/or software for this, a module
option can override this check, thus making vfio useful (but not safe)
on many more platforms.

In the next version I plan to add kernel to user messaging using the
generic netlink mechanism to allow the user driver to react to hot add
and remove, and power management requests.

Blurb from version 2:

This version now requires an IOMMU domain to be set before any access to
device registers is granted (except that config space may be read).  In
addition, the VFIO_DMA_MAP_ANYWHERE is dropped - it used the dma_map_sg API
which does not have sufficient controls around IOMMU usage.  The IOMMU domain
is obtained from the 'uiommu' driver which is included in this patch.

Various locking, security, and documentation issues have also been fixed.

Please commit - it or me!
But seriously, who gets to commit this? Avi for KVM? or GregKH for drivers?

Blurb from version 1:

This patch is the evolution of code which was first proposed as a patch to
uio/uio_pci_generic, then as a more generic uio patch. Now it is taken entirely
out of the uio framework, and things seem much cleaner. Of course, there is
a lot of functional overlap with uio, but the previous version just seemed
like a giant mode switch in the uio code that did not lead to clarity for
either the new or old code.

[a pony for avi...]
The major new functionality in this version is the ability to deal with
PCI config space accesses (through read  write calls) - but includes table
driven code to determine whats safe to write and what is not. Also, some
virtualization of the config space to allow drivers to think they're writing
some registers when they're not.  Also, IO space accesses are also allowed.
Drivers for devices which use MSI-X are now prevented from directly writing
the MSI-X vector area.

All interrupts are now handled using eventfds, which makes things very simple.

The name VFIO refers to the Virtual Function capabilities of SR-IOV devices
but the driver does support many more types of devices.  I was none too sure
what driver directory this should live in, so for now I made up my own under
drivers/vfio. As a new driver/new directory, who makes the commit decision?

I currently have user level drivers working for 3 different network adapters
- the Cisco Palo enic, the Intel 82599 VF, and the Intel 82576 VF (but the
whole user level framework is a long ways from release).  This driver could
also clearly replace a number of other drivers written just to give user
access to certain devices - but that will take time.

Tom Lyon (4):
  VFIO V5: export pci_user_{read,write}_config
  VFIO V5: additions to include/linux/pci_regs.h
  VFIO V5: uiommu driver - allow user progs to manipulate iommu domains
  VFIO V5: Non-privileged user level PCI drivers

 Documentation/ioctl/ioctl-number.txt |1 +
 Documentation/vfio.txt   |  182 ++
 MAINTAINERS  |8 +
 drivers/Kconfig  |2 +
 drivers/Makefile |2 +
 drivers/pci/access.c |6 +-
 drivers/pci/pci.h|7 -
 drivers/vfio/Kconfig |   18 +
 drivers/vfio/Makefile|   11 +
 drivers/vfio/uiommu.c|  126 
 drivers/vfio/vfio_dma.c  |  400 
 drivers/vfio

[PATCH 1/4] VFIO V5: export pci_user_{read,write}_config

2010-10-28 Thread Tom Lyon


Acked-by:   Jesse Barnes jbar...@virtuousgeek.org
Signed-off-by: Tom Lyon p...@cisco.com
---
 drivers/pci/access.c |6 --
 drivers/pci/pci.h|7 ---
 include/linux/pci.h  |8 
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 531bc69..96ed449 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -157,7 +157,8 @@ int pci_user_read_config_##size 
\
raw_spin_unlock_irq(pci_lock); \
*val = (type)data;  \
return ret; \
-}
+}  \
+EXPORT_SYMBOL_GPL(pci_user_read_config_##size);
 
 #define PCI_USER_WRITE_CONFIG(size,type)   \
 int pci_user_write_config_##size   \
@@ -171,7 +172,8 @@ int pci_user_write_config_##size
\
pos, sizeof(type), val);\
raw_spin_unlock_irq(pci_lock); \
return ret; \
-}
+}  \
+EXPORT_SYMBOL_GPL(pci_user_write_config_##size);
 
 PCI_USER_READ_CONFIG(byte, u8)
 PCI_USER_READ_CONFIG(word, u16)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6beb11b..e1db481 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -77,13 +77,6 @@ static inline bool pci_is_bridge(struct pci_dev *pci_dev)
return !!(pci_dev-subordinate);
 }
 
-extern int pci_user_read_config_byte(struct pci_dev *dev, int where, u8 *val);
-extern int pci_user_read_config_word(struct pci_dev *dev, int where, u16 *val);
-extern int pci_user_read_config_dword(struct pci_dev *dev, int where, u32 
*val);
-extern int pci_user_write_config_byte(struct pci_dev *dev, int where, u8 val);
-extern int pci_user_write_config_word(struct pci_dev *dev, int where, u16 val);
-extern int pci_user_write_config_dword(struct pci_dev *dev, int where, u32 
val);
-
 struct pci_vpd_ops {
ssize_t (*read)(struct pci_dev *dev, loff_t pos, size_t count, void 
*buf);
ssize_t (*write)(struct pci_dev *dev, loff_t pos, size_t count, const 
void *buf);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c8d95e3..7f22c8a 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -756,6 +756,14 @@ static inline int pci_write_config_dword(struct pci_dev 
*dev, int where,
return pci_bus_write_config_dword(dev-bus, dev-devfn, where, val);
 }
 
+/* user-space driven config access */
+extern int pci_user_read_config_byte(struct pci_dev *dev, int where, u8 *val);
+extern int pci_user_read_config_word(struct pci_dev *dev, int where, u16 *val);
+extern int pci_user_read_config_dword(struct pci_dev *dev, int where, u32 
*val);
+extern int pci_user_write_config_byte(struct pci_dev *dev, int where, u8 val);
+extern int pci_user_write_config_word(struct pci_dev *dev, int where, u16 val);
+extern int pci_user_write_config_dword(struct pci_dev *dev, int where, u32 
val);
+
 int __must_check pci_enable_device(struct pci_dev *dev);
 int __must_check pci_enable_device_io(struct pci_dev *dev);
 int __must_check pci_enable_device_mem(struct pci_dev *dev);
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/4] VFIO V5: uiommu driver - allow user progs to manipulate iommu domains

2010-10-28 Thread Tom Lyon


Signed-off-by: Tom Lyon p...@cisco.com
---
 drivers/Kconfig|2 +
 drivers/Makefile   |1 +
 drivers/vfio/Kconfig   |8 +++
 drivers/vfio/Makefile  |1 +
 drivers/vfio/uiommu.c  |  126 
 include/linux/uiommu.h |   76 +
 6 files changed, 214 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vfio/Kconfig
 create mode 100644 drivers/vfio/Makefile
 create mode 100644 drivers/vfio/uiommu.c
 create mode 100644 include/linux/uiommu.h

diff --git a/drivers/Kconfig b/drivers/Kconfig
index a2b902f..711c1cb 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -111,4 +111,6 @@ source drivers/xen/Kconfig
 source drivers/staging/Kconfig
 
 source drivers/platform/Kconfig
+
+source drivers/vfio/Kconfig
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index a2aea53..c445440 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -53,6 +53,7 @@ obj-$(CONFIG_FUSION)  += message/
 obj-y  += firewire/
 obj-y  += ieee1394/
 obj-$(CONFIG_UIO)  += uio/
+obj-$(CONFIG_UIOMMU)   += vfio/
 obj-y  += cdrom/
 obj-y  += auxdisplay/
 obj-$(CONFIG_PCCARD)   += pcmcia/
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
new file mode 100644
index 000..3ab9af3
--- /dev/null
+++ b/drivers/vfio/Kconfig
@@ -0,0 +1,8 @@
+menuconfig UIOMMU
+   tristate User level manipulation of IOMMU
+   help
+ Device driver to allow user level programs to
+ manipulate IOMMU domains.
+
+ If you don't know what to do here, say N.
+
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
new file mode 100644
index 000..556f3c1
--- /dev/null
+++ b/drivers/vfio/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_UIOMMU) += uiommu.o
diff --git a/drivers/vfio/uiommu.c b/drivers/vfio/uiommu.c
new file mode 100644
index 000..5c17c5a
--- /dev/null
+++ b/drivers/vfio/uiommu.c
@@ -0,0 +1,126 @@
+/*
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, p...@cisco.com
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+*/
+
+/*
+ * uiommu driver - issue fd handles for IOMMU domains
+ * so they may be passed to vfio (and others?)
+ */
+#include linux/fs.h
+#include linux/mm.h
+#include linux/init.h
+#include linux/kernel.h
+#include linux/miscdevice.h
+#include linux/module.h
+#include linux/slab.h
+#include linux/file.h
+#include linux/iommu.h
+#include linux/uiommu.h
+
+MODULE_LICENSE(GPL);
+MODULE_AUTHOR(Tom Lyon p...@cisco.com);
+MODULE_DESCRIPTION(User IOMMU driver);
+
+static struct uiommu_domain *uiommu_domain_alloc(void)
+{
+   struct iommu_domain *domain;
+   struct uiommu_domain *udomain;
+
+   domain = iommu_domain_alloc();
+   if (!domain)
+   return NULL;
+   udomain = kzalloc(sizeof *udomain, GFP_KERNEL);
+   if (!udomain) {
+   iommu_domain_free(domain);
+   return NULL;
+   }
+   udomain-domain = domain;
+   atomic_inc(udomain-refcnt);
+   return udomain;
+}
+
+static int uiommu_open(struct inode *inode, struct file *file)
+{
+   struct uiommu_domain *udomain;
+
+   udomain = uiommu_domain_alloc();
+   if (!udomain)
+   return -ENOMEM;
+   file-private_data = udomain;
+   return 0;
+}
+
+static int uiommu_release(struct inode *inode, struct file *file)
+{
+   struct uiommu_domain *udomain;
+
+   udomain = file-private_data;
+   uiommu_put(udomain);
+   return 0;
+}
+
+static const struct file_operations uiommu_fops = {
+   .owner  = THIS_MODULE,
+   .open   = uiommu_open,
+   .release= uiommu_release,
+};
+
+static struct miscdevice uiommu_dev = {
+   .name   = uiommu,
+   .minor  = MISC_DYNAMIC_MINOR,
+   .fops   = uiommu_fops,
+};
+
+struct uiommu_domain *uiommu_fdget(int fd)
+{
+   struct file *file;
+   struct uiommu_domain *udomain;
+
+   file = fget(fd);
+   if (!file)
+   return ERR_PTR(-EBADF);
+   if (file-f_op != uiommu_fops) {
+   fput(file);
+   return ERR_PTR(-EINVAL);
+   }
+   udomain = file-private_data;
+   atomic_inc(udomain-refcnt);
+   return

[PATCH 2/4] VFIO V5: additions to include/linux/pci_regs.h

2010-10-28 Thread Tom Lyon


Signed-off-by: Tom Lyon p...@cisco.com
---
 include/linux/pci_regs.h |  107 ++
 1 files changed, 98 insertions(+), 9 deletions(-)

diff --git a/include/linux/pci_regs.h b/include/linux/pci_regs.h
index 455b9cc..70addc9 100644
--- a/include/linux/pci_regs.h
+++ b/include/linux/pci_regs.h
@@ -26,6 +26,7 @@
  * Under PCI, each device has 256 bytes of configuration address space,
  * of which the first 64 bytes are standardized as follows:
  */
+#definePCI_STD_HEADER_SIZEOF   64
 #define PCI_VENDOR_ID  0x00/* 16 bits */
 #define PCI_DEVICE_ID  0x02/* 16 bits */
 #define PCI_COMMAND0x04/* 16 bits */
@@ -209,9 +210,12 @@
 #define  PCI_CAP_ID_SHPC   0x0C/* PCI Standard Hot-Plug Controller */
 #define  PCI_CAP_ID_SSVID  0x0D/* Bridge subsystem vendor/device ID */
 #define  PCI_CAP_ID_AGP3   0x0E/* AGP Target PCI-PCI bridge */
+#define  PCI_CAP_ID_SECDEV 0x0F/* Secure Device */
 #define  PCI_CAP_ID_EXP0x10/* PCI Express */
 #define  PCI_CAP_ID_MSIX   0x11/* MSI-X */
+#define  PCI_CAP_ID_SATA   0x12/* SATA Data/Index Conf. */
 #define  PCI_CAP_ID_AF 0x13/* PCI Advanced Features */
+#define  PCI_CAP_ID_MAXPCI_CAP_ID_AF
 #define PCI_CAP_LIST_NEXT  1   /* Next capability in the list */
 #define PCI_CAP_FLAGS  2   /* Capability defined flags (16 bits) */
 #define PCI_CAP_SIZEOF 4
@@ -276,6 +280,7 @@
 #define  PCI_VPD_ADDR_MASK 0x7fff  /* Address mask */
 #define  PCI_VPD_ADDR_F0x8000  /* Write 0, 1 indicates 
completion */
 #define PCI_VPD_DATA   4   /* 32-bits of data returned here */
+#definePCI_CAP_VPD_SIZEOF  8
 
 /* Slot Identification */
 
@@ -297,8 +302,10 @@
 #define PCI_MSI_ADDRESS_HI 8   /* Upper 32 bits (if 
PCI_MSI_FLAGS_64BIT set) */
 #define PCI_MSI_DATA_328   /* 16 bits of data for 32-bit 
devices */
 #define PCI_MSI_MASK_3212  /* Mask bits register for 
32-bit devices */
+#define PCI_MSI_PENDING_32 16  /* Pending intrs for 32-bit devices */
 #define PCI_MSI_DATA_6412  /* 16 bits of data for 64-bit 
devices */
 #define PCI_MSI_MASK_6416  /* Mask bits register for 
64-bit devices */
+#define PCI_MSI_PENDING_64 20  /* Pending intrs for 64-bit devices */
 
 /* MSI-X registers (these are at offset PCI_MSIX_FLAGS) */
 #define PCI_MSIX_FLAGS 2
@@ -306,6 +313,7 @@
 #define  PCI_MSIX_FLAGS_ENABLE (1  15)
 #define  PCI_MSIX_FLAGS_MASKALL(1  14)
 #define PCI_MSIX_FLAGS_BIRMASK (7  0)
+#definePCI_CAP_MSIX_SIZEOF 12  /* size of MSIX registers */
 
 /* CompactPCI Hotswap Register */
 
@@ -328,6 +336,7 @@
 #define  PCI_AF_CTRL_FLR   0x01
 #define PCI_AF_STATUS  5
 #define  PCI_AF_STATUS_TP  0x01
+#definePCI_CAP_AF_SIZEOF   6   /* size of AF registers */
 
 /* PCI-X registers */
 
@@ -364,6 +373,9 @@
 #define  PCI_X_STATUS_SPL_ERR  0x2000  /* Rcvd Split Completion Error 
Msg */
 #define  PCI_X_STATUS_266MHZ   0x4000  /* 266 MHz capable */
 #define  PCI_X_STATUS_533MHZ   0x8000  /* 533 MHz capable */
+#definePCI_X_ECC_CSR   8   /* ECC control and status */
+#define PCI_CAP_PCIX_SIZEOF_V0 8   /* size of registers for Version 0 */
+#define PCI_CAP_PCIX_SIZEOF_V1224  /* size for Version 1  2 */
 
 /* PCI Bridge Subsystem ID registers */
 
@@ -451,6 +463,7 @@
 #define  PCI_EXP_LNKSTA_DLLLA  0x2000  /* Data Link Layer Link Active */
 #define  PCI_EXP_LNKSTA_LBMS   0x4000  /* Link Bandwidth Management Status */
 #define  PCI_EXP_LNKSTA_LABS   0x8000  /* Link Autonomous Bandwidth Status */
+#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V1 20  /* v1 endpoints end here */
 #define PCI_EXP_SLTCAP 20  /* Slot Capabilities */
 #define  PCI_EXP_SLTCAP_ABP0x0001 /* Attention Button Present */
 #define  PCI_EXP_SLTCAP_PCP0x0002 /* Power Controller Present */
@@ -498,6 +511,7 @@
 #define  PCI_EXP_DEVCAP2_ARI   0x20/* Alternative Routing-ID */
 #define PCI_EXP_DEVCTL240  /* Device Control 2 */
 #define  PCI_EXP_DEVCTL2_ARI   0x20/* Alternative Routing-ID */
+#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 44  /* v2 endpoints end here */
 #define PCI_EXP_LNKCTL248  /* Link Control 2 */
 #define PCI_EXP_SLTCTL256  /* Slot Control 2 */
 
@@ -506,20 +520,42 @@
 #define PCI_EXT_CAP_VER(header)((header  16)  0xf)
 #define PCI_EXT_CAP_NEXT(header)   ((header  20)  0xffc)
 
-#define PCI_EXT_CAP_ID_ERR 1
-#define PCI_EXT_CAP_ID_VC  2
-#define PCI_EXT_CAP_ID_DSN 3
-#define PCI_EXT_CAP_ID_PWR 4
-#define PCI_EXT_CAP_ID_VNDR11
-#define PCI_EXT_CAP_ID_ACS 13
-#define PCI_EXT_CAP_ID_ARI 14
-#define PCI_EXT_CAP_ID_ATS 15
-#define PCI_EXT_CAP_ID_SRIOV

Re: [PATCH 3/3] VFIO V4: VFIO driver: Non-privileged user level PCI drivers

2010-10-15 Thread Tom Lyon

Michael  Alex et al -
Sorry for going quiet; I've been digesting the comments and researching 
a 
lot more stuff.  I plan to release V5 shortly after 2.6.36 is out, highlights 
will be:

1. Re-written pci config tables - using approach suggested by MST to clean 
things up. Looking much better. Also fixed endian issues.
2. Clean up ROM bar handling. Now only read() system call can access rom bar; 
it is always disabled afterwards.
3. Clean up pci_iomap and pci_request_regions handling.  Allocates on demand 
but frees all on close.
4. Resets device on open. Disables, but does not do full reset on close due to 
lock problem with remove() callers.
5. Fixed up memlock rlimit accounting.
6. Lots of small cleanups.

Stay tuned.
I'm also looking at adding PCIe extended capabilities - got a request from a 
cisco project.



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/3] VFIO V4: VFIO driver: Non-privileged user level PCI drivers

2010-09-30 Thread Tom Lyon

On Monday, September 27, 2010 04:54:21 am Michael S. Tsirkin wrote:
 On Wed, Sep 22, 2010 at 02:18:24PM -0700, Tom Lyon wrote:
  Signed-off-by: Tom Lyon p...@cisco.com
 
 Some comments on the pci bits:
 
 After going over them for the Nth time - something needs to be done
 with the rvirt/write tables. I doubt anyone besides me and you
 has gone over them: one has to pull
 up the spec just to understand which bits are set here. Why can't we
 have a module init routine and use the macros in pci_regs.h for this
 purpose, as all other drivers do? We will also get to use the nice
 endian-ness macros linux has for portability ...

I tried a couple of approaches before settling on this one. Yes, its dense, 
perhaps its ugly, but a table approach beats open coding of rules.
And I absolutely don't expect it to verifiable to someone without a PCI spec;
I had to have the PCI spec to write it! This is an example of where 
correctness and readability pull to different places. And the defines in 
pci_regs are incomplete and not real self-consistent anyways.

Perhaps its time for someone to code up a different approach? Code talks!

 
  +static struct perm_bits pci_cap_basic_perm[] = {
 
 What endian-ness is this? Native?
 You pass this to user as is but pci config is little endian...
 Also -are all these readonly? So const? read mostly?
Yes, I need some consts.
Endianness - I haven't really thought much and I don't have  any big-endian 
machines to test with.  However, for now everything is byte-at-a-time which 
makes things easier. I'll take a pass at cleaning it up.
 
  +   { 0x,   0, },   /* 0x00 vendor  device id - RO */
  +   { 0x0003,   0x, },  /* 0x04 cmd - mem  io bits virt */
 
 Interrupt disable bit is under host control.
Yes?

 
  +   { 0,0, },   /* 0x08 class code  revision id */
  +   { 0,0xFF00, },  /* 0x0c bist, htype, lat, cache */
  +   { 0x,   0x, },  /* 0x10 bar */
  +   { 0x,   0x, },  /* 0x14 bar */
  +   { 0x,   0x, },  /* 0x18 bar */
  +   { 0x,   0x, },  /* 0x1c bar */
  +   { 0x,   0x, },  /* 0x20 bar */
  +   { 0x,   0x, },  /* 0x24 bar */
  +   { 0,0, },   /* 0x28 cardbus - not yet */
  +   { 0,0, },   /* 0x2c subsys vendor  dev */
  +   { 0x,   0x, },  /* 0x30 rom bar */
  +   { 0,0, },   /* 0x34 capability ptr  resv */
  +   { 0,0, },   /* 0x38 resv */
  +   { 0x00FF,   0x00FF, },  /* 0x3c max_lat ... irq */
  +};
  +
  +static struct perm_bits pci_cap_pm_perm[] = {
  +   { 0,0, },   /* 0x00 PM capabilities */
  +   { 0,0x, },  /* 0x04 PM control/status */
  +};
  +
  +static struct perm_bits pci_cap_vpd_perm[] = {
  +   { 0,0x, },  /* 0x00 address */
  +   { 0,0x, },  /* 0x04 data */
  +};
  +
  +static struct perm_bits pci_cap_slotid_perm[] = {
  +   { 0,0, },   /* 0x00 all read only */
  +};
  +
  +/* 4 different possible layouts of MSI capability */
  +static struct perm_bits pci_cap_msi_10_perm[] = {
  +   { 0x00FF,   0x00FF, },  /* 0x00 MSI message control */
  +   { 0x,   0x, },  /* 0x04 MSI message address */
  +   { 0x,   0x, },  /* 0x08 MSI message data */
  +};
  +static struct perm_bits pci_cap_msi_14_perm[] = {
  +   { 0x00FF,   0x00FF, },  /* 0x00 MSI message control */
  +   { 0x,   0x, },  /* 0x04 MSI message address */
  +   { 0x,   0x, },  /* 0x08 MSI message upper addr */
  +   { 0x,   0x, },  /* 0x0c MSI message data */
  +};
  +static struct perm_bits pci_cap_msi_20_perm[] = {
  +   { 0x00FF,   0x00FF, },  /* 0x00 MSI message control */
  +   { 0x,   0x, },  /* 0x04 MSI message address */
  +   { 0x,   0x, },  /* 0x08 MSI message data */
  +   { 0,0x, },  /* 0x0c MSI mask bits */
  +   { 0,0x, },  /* 0x10 MSI pending bits */
  +};
  +static struct perm_bits pci_cap_msi_24_perm[] = {
  +   { 0x00FF,   0x00FF, },  /* 0x00 MSI message control */
  +   { 0x,   0x, },  /* 0x04 MSI message address */
  +   { 0x,   0x, },  /* 0x08 MSI message upper addr */
  +   { 0x,   0x, },  /* 0x0c MSI message data */
  +   { 0,0x, },  /* 0x10 MSI mask bits */
  +   { 0,0x, },  /* 0x14 MSI pending bits */
  +};
 
 You let guest control MSI mask and do not virtualize,
 but the host expects to control this register - it is
 used for masking during interrupt rebalancing.
 This will break the device, or even the host with
 'unsafe interrupts' enabled.
 
 Note that if you virtualize it, you will have to virtualize
 the pending bits too.
I can't find any interaction

[PATCH 0/3] VFIO V4: VFIO driver: Non-privileged user level PCI drivers

2010-09-22 Thread Tom Lyon

After a long summer break, it's tanned, it's rested, and it's ready to rumble!

In this version:*** REBASE to 2.6.35 ***

There's new code using generic netlink messages which allows the kernel
to notify the user level of weird events and allows the user level to 
respond. This is currently used to handle device removal (whether software
or hardware driven), PCI error events, and system suspend  hibernate.

The driver now supports devices which use multiple MSI interrupts, reflecting
the actual number of interrupts allocated by the system to the user level.

PCI config accesses are now done through the pci_user_{read,write)_config
routines from drivers/pci/access.c.

Numerous other tweaks and cleanups.

Blurb from version 3:

There are lots of bug fixes and cleanups in this version, but the main
change is to check to make sure that the IOMMU has interrupt remapping
enabled, which is necessary to prevent user level code from triggering
spurious interrupts for other devices.  Since most platforms today
do not have the necessary hardware and/or software for this, a module
option can override this check, thus making vfio useful (but not safe)
on many more platforms.

In the next version I plan to add kernel to user messaging using the
generic netlink mechanism to allow the user driver to react to hot add
and remove, and power management requests.

Blurb from version 2:

This version now requires an IOMMU domain to be set before any access to
device registers is granted (except that config space may be read).  In
addition, the VFIO_DMA_MAP_ANYWHERE is dropped - it used the dma_map_sg API
which does not have sufficient controls around IOMMU usage.  The IOMMU domain
is obtained from the 'uiommu' driver which is included in this patch.

Various locking, security, and documentation issues have also been fixed.

Please commit - it or me!
But seriously, who gets to commit this? Avi for KVM? or GregKH for drivers?

Blurb from version 1:

This patch is the evolution of code which was first proposed as a patch to
uio/uio_pci_generic, then as a more generic uio patch. Now it is taken entirely
out of the uio framework, and things seem much cleaner. Of course, there is
a lot of functional overlap with uio, but the previous version just seemed
like a giant mode switch in the uio code that did not lead to clarity for
either the new or old code.

[a pony for avi...]
The major new functionality in this version is the ability to deal with
PCI config space accesses (through read  write calls) - but includes table
driven code to determine whats safe to write and what is not. Also, some
virtualization of the config space to allow drivers to think they're writing
some registers when they're not.  Also, IO space accesses are also allowed.
Drivers for devices which use MSI-X are now prevented from directly writing
the MSI-X vector area.

All interrupts are now handled using eventfds, which makes things very simple.

The name VFIO refers to the Virtual Function capabilities of SR-IOV devices
but the driver does support many more types of devices.  I was none too sure
what driver directory this should live in, so for now I made up my own under
drivers/vfio. As a new driver/new directory, who makes the commit decision?

I currently have user level drivers working for 3 different network adapters
- the Cisco Palo enic, the Intel 82599 VF, and the Intel 82576 VF (but the
whole user level framework is a long ways from release).  This driver could
also clearly replace a number of other drivers written just to give user
access to certain devices - but that will take time.

Tom Lyon (3):
  VFIO V4: export pci_user_{read,write}_config
  VFIO V4: uiommu driver - allow user progs to manipulate iommu domains
  VFIO V4: VFIO driver: Non-privileged user level PCI drivers

 Documentation/ioctl/ioctl-number.txt |1 +
 Documentation/vfio.txt   |  183 
 MAINTAINERS  |8 +
 drivers/Kconfig  |2 +
 drivers/Makefile |2 +
 drivers/pci/access.c |6 +-
 drivers/pci/pci.h|7 -
 drivers/vfio/Kconfig |   18 +
 drivers/vfio/Makefile|   11 +
 drivers/vfio/uiommu.c|  126 ++
 drivers/vfio/vfio_dma.c  |  346 +++
 drivers/vfio/vfio_intrs.c|  257 
 drivers/vfio/vfio_main.c |  768 ++
 drivers/vfio/vfio_netlink.c  |  459 
 drivers/vfio/vfio_pci_config.c   |  698 ++
 drivers/vfio/vfio_rdwr.c |  158 +++
 drivers/vfio/vfio_sysfs.c|  118 ++
 include/linux/Kbuild |1 +
 include/linux/pci.h  |8 +
 include/linux/uiommu.h   |   76 
 include/linux/vfio.h |  267 
 21 files changed, 3511 insertions(+), 9 deletions

[PATCH 1/3] VFIO V4: export pci_user_{read,write}_config

2010-09-22 Thread Tom Lyon


Signed-off-by: Tom Lyon p...@cisco.com
---
 drivers/pci/access.c |6 --
 drivers/pci/pci.h|7 ---
 include/linux/pci.h  |8 
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 531bc69..96ed449 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -157,7 +157,8 @@ int pci_user_read_config_##size 
\
raw_spin_unlock_irq(pci_lock); \
*val = (type)data;  \
return ret; \
-}
+}  \
+EXPORT_SYMBOL_GPL(pci_user_read_config_##size);
 
 #define PCI_USER_WRITE_CONFIG(size,type)   \
 int pci_user_write_config_##size   \
@@ -171,7 +172,8 @@ int pci_user_write_config_##size
\
pos, sizeof(type), val);\
raw_spin_unlock_irq(pci_lock); \
return ret; \
-}
+}  \
+EXPORT_SYMBOL_GPL(pci_user_write_config_##size);
 
 PCI_USER_READ_CONFIG(byte, u8)
 PCI_USER_READ_CONFIG(word, u16)
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index f8077b3..76c3ed3 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -67,13 +67,6 @@ static inline bool pci_is_bridge(struct pci_dev *pci_dev)
return !!(pci_dev-subordinate);
 }
 
-extern int pci_user_read_config_byte(struct pci_dev *dev, int where, u8 *val);
-extern int pci_user_read_config_word(struct pci_dev *dev, int where, u16 *val);
-extern int pci_user_read_config_dword(struct pci_dev *dev, int where, u32 
*val);
-extern int pci_user_write_config_byte(struct pci_dev *dev, int where, u8 val);
-extern int pci_user_write_config_word(struct pci_dev *dev, int where, u16 val);
-extern int pci_user_write_config_dword(struct pci_dev *dev, int where, u32 
val);
-
 struct pci_vpd_ops {
ssize_t (*read)(struct pci_dev *dev, loff_t pos, size_t count, void 
*buf);
ssize_t (*write)(struct pci_dev *dev, loff_t pos, size_t count, const 
void *buf);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index f26fda7..36eacbf 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -754,6 +754,14 @@ static inline int pci_write_config_dword(struct pci_dev 
*dev, int where,
return pci_bus_write_config_dword(dev-bus, dev-devfn, where, val);
 }
 
+/* user-space driven config access */
+extern int pci_user_read_config_byte(struct pci_dev *dev, int where, u8 *val);
+extern int pci_user_read_config_word(struct pci_dev *dev, int where, u16 *val);
+extern int pci_user_read_config_dword(struct pci_dev *dev, int where, u32 
*val);
+extern int pci_user_write_config_byte(struct pci_dev *dev, int where, u8 val);
+extern int pci_user_write_config_word(struct pci_dev *dev, int where, u16 val);
+extern int pci_user_write_config_dword(struct pci_dev *dev, int where, u32 
val);
+
 int __must_check pci_enable_device(struct pci_dev *dev);
 int __must_check pci_enable_device_io(struct pci_dev *dev);
 int __must_check pci_enable_device_mem(struct pci_dev *dev);
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] VFIO V4: uiommu driver - allow user progs to manipulate iommu domains

2010-09-22 Thread Tom Lyon


Signed-off-by: Tom Lyon p...@cisco.com
---
 drivers/Kconfig|2 +
 drivers/Makefile   |1 +
 drivers/vfio/Kconfig   |8 +++
 drivers/vfio/Makefile  |1 +
 drivers/vfio/uiommu.c  |  126 
 include/linux/uiommu.h |   76 +
 6 files changed, 214 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vfio/Kconfig
 create mode 100644 drivers/vfio/Makefile
 create mode 100644 drivers/vfio/uiommu.c
 create mode 100644 include/linux/uiommu.h

diff --git a/drivers/Kconfig b/drivers/Kconfig
index a2b902f..711c1cb 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -111,4 +111,6 @@ source drivers/xen/Kconfig
 source drivers/staging/Kconfig
 
 source drivers/platform/Kconfig
+
+source drivers/vfio/Kconfig
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 91874e0..e496fc3 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -53,6 +53,7 @@ obj-$(CONFIG_FUSION)  += message/
 obj-$(CONFIG_FIREWIRE) += firewire/
 obj-y  += ieee1394/
 obj-$(CONFIG_UIO)  += uio/
+obj-$(CONFIG_UIOMMU)   += vfio/
 obj-y  += cdrom/
 obj-y  += auxdisplay/
 obj-$(CONFIG_PCCARD)   += pcmcia/
diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
new file mode 100644
index 000..3ab9af3
--- /dev/null
+++ b/drivers/vfio/Kconfig
@@ -0,0 +1,8 @@
+menuconfig UIOMMU
+   tristate User level manipulation of IOMMU
+   help
+ Device driver to allow user level programs to
+ manipulate IOMMU domains.
+
+ If you don't know what to do here, say N.
+
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
new file mode 100644
index 000..556f3c1
--- /dev/null
+++ b/drivers/vfio/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_UIOMMU) += uiommu.o
diff --git a/drivers/vfio/uiommu.c b/drivers/vfio/uiommu.c
new file mode 100644
index 000..eec1759
--- /dev/null
+++ b/drivers/vfio/uiommu.c
@@ -0,0 +1,126 @@
+/*
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, p...@cisco.com
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+*/
+
+/*
+ * uiommu driver - issue fd handles for IOMMU domains
+ * so they may be passed to vfio (and others?)
+ */
+#include linux/fs.h
+#include linux/mm.h
+#include linux/init.h
+#include linux/kernel.h
+#include linux/miscdevice.h
+#include linux/module.h
+#include linux/slab.h
+#include linux/file.h
+#include linux/iommu.h
+#include linux/uiommu.h
+
+MODULE_LICENSE(GPL);
+MODULE_AUTHOR(Tom Lyon p...@cisco.com);
+MODULE_DESCRIPTION(User IOMMU driver);
+
+static struct uiommu_domain *uiommu_domain_alloc(void)
+{
+   struct iommu_domain *domain;
+   struct uiommu_domain *udomain;
+
+   domain = iommu_domain_alloc();
+   if (domain == NULL)
+   return NULL;
+   udomain = kzalloc(sizeof *udomain, GFP_KERNEL);
+   if (udomain == NULL) {
+   iommu_domain_free(domain);
+   return NULL;
+   }
+   udomain-domain = domain;
+   atomic_inc(udomain-refcnt);
+   return udomain;
+}
+
+static int uiommu_open(struct inode *inode, struct file *file)
+{
+   struct uiommu_domain *udomain;
+
+   udomain = uiommu_domain_alloc();
+   if (udomain == NULL)
+   return -ENOMEM;
+   file-private_data = udomain;
+   return 0;
+}
+
+static int uiommu_release(struct inode *inode, struct file *file)
+{
+   struct uiommu_domain *udomain;
+
+   udomain = file-private_data;
+   uiommu_put(udomain);
+   return 0;
+}
+
+static const struct file_operations uiommu_fops = {
+   .owner  = THIS_MODULE,
+   .open   = uiommu_open,
+   .release= uiommu_release,
+};
+
+static struct miscdevice uiommu_dev = {
+   .name   = uiommu,
+   .minor  = MISC_DYNAMIC_MINOR,
+   .fops   = uiommu_fops,
+};
+
+struct uiommu_domain *uiommu_fdget(int fd)
+{
+   struct file *file;
+   struct uiommu_domain *udomain;
+
+   file = fget(fd);
+   if (!file)
+   return ERR_PTR(-EBADF);
+   if (file-f_op != uiommu_fops) {
+   fput(file);
+   return ERR_PTR(-EINVAL);
+   }
+   udomain = file-private_data;
+   atomic_inc(udomain-refcnt

Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-28 Thread Tom Lyon

On Tuesday, July 27, 2010 04:53:22 pm Michael S. Tsirkin wrote:
 On Tue, Jul 27, 2010 at 03:13:14PM -0700, Tom Lyon wrote:
  [ Sorry for the long hiatus, I've been wrapped up in other issues.]
  
  I think the fundamental issue to resolve is to decide on the model which
  the VFIO driver presents to its users.
  
  Fundamentally, VFIO as part of the OS must protect the system from its
  users and also protect the users from each other.  No disagreement here.
  
  But another fundamental purpose of an OS to to present an abstract model
  of the underlying hardware to its users, so that the users don't have to
  deal with the full complexity of the hardware.
  
  So I think VFIO should present a 'virtual', abstracted PCI device to its
  users whereas Michael has argued for a simpler model of presenting the
  real PCI device config registers while preventing writes only to the
  registers which would clearly disrupt the system.
 
 In fact, there is no contradiction. I am all for an abstracted
 API *and* I think the virtualization concept is a bad way
 to build this API.
 
 The 'virtual' interface you present is very complex and hardware specific:
 you do not hide literally *anything*. Deciding which functionality
 userspace needs, and exposing it to userspace as a set of APIs would be
 abstract. Instead you ask people to go read the PCI spec, the device spec,
 and bang on PCI registers, little-endian-ness and all, then try to
 interpret what do the virtual values mean.

Exactly! The PCI bus is far better *specified*, *documented*, and widely 
implemented than a Linux driver could ever hope to be.  And there are lots of 
current Linux drivers which bang around in pci config space simply because the 
authors were not aware of some api call buried deep in linux which would do 
the work for them - or - got tired of using OS-specific APIs when porting a 
driver and decided to just ask the hardware.

 Example:
 
 How do I find # of MSI-X vectors? Sure, scan the capability list,
 find the capability, read the value, convert from little endian
 at each step.
 A page or two of code, and let's hope I have a ppc to test on.
 And note no driver has this code - they all use OS routines.
 
 So why wouldn't
   ioctl(dev, VFIO_GET_MSIX_VECTORS, n);
 better serve the declared goal of presenting an abstracted PCI device to
 users?

By and large, the user drivers just know how many because the hardware is 
constant.

And inventing 20 or 30 ioctls to do a bunch of random stuff is gross when you 
can instead use normal read and write calls to a well defined structure.
 
  Now, the virtual model *could* look little like the real hardware, and
  use bunches of ioctls for everything it needs,
 
 Or reads/writes at special offsets, or sysfs attributes.
 
  or it could look a lot like PCI and
  use reads and writes of the virtual PCI config registers to trigger its
  actions.  The latter makes things more amenable to those porting drivers
  from other environments.
 
 I really doubt this helps at all. Drivers typically use OS-specific
 APIs. It is very uncommon for them to touch standard registers,
 which is 100% of what your patch seem to be dealing with.
 
 And again, how about a small userspace library that would wrap vfio and
 add the abstractions for drivers that do need them?

Yes, there will be userspace libraries - I already have a vfio backend for 
libpci.
 
  I realize that to date the VFIO driver has been a  bit of a mish-mash
  between the ioctl and config based techniques; I intend to clean that
  up.  And, yes, the abstract model presented by VFIO will need plenty of
  documentation.
 
 And, it will need to be maintained forever, bugs and all.
 For example, if you change some register you emulated
 to fix a bug, to the driver this looks like a hardware change,
 and it will crash.

The changes will come only to allow for a more-perfect emulation, so I doubt 
that  will cause driver problems.  No different than discovering and fixing 
bugs in the ioctls needed in you scenario.

 
 The PCI spec has some weak versioning support, but it
 is mostly not a problem in that space: a specific driver needs to
 only deal with a specific device.  We have a generic driver so PCI
 configuration space is a bad interface to use.

PCI has great versioning. Damn near every change made in 16+ years has been 
upwards compatible.  BIOS and OS writers don't have trouble with generic PCI, 
why should vfio?

 
  Since KVM/qemu already has its own notion of a virtual PCI device which
  it presents to the guest OS, we either need to reconcile VFIO and qemu,
  or provide a bypass of the VFIO virtual model.  This could be direct
  access through sysfs, or else an ioctl to VFIO.  Since I have no
  internals knowledge of qemu, I look to others to choose.
 
 Ah, so there will be 2 APIs, one for qemu, one for userspace drivers?

I hope not, but I also hope not to become the qemu expert to find out.  Alex 
W. seemed to be making progress in this area

Re: [PATCH V3] VFIO driver: Non-privileged user level PCI drivers

2010-07-27 Thread Tom Lyon

[ Sorry for the long hiatus, I've been wrapped up in other issues.]

I think the fundamental issue to resolve is to decide on the model which the 
VFIO driver presents to its users.

Fundamentally, VFIO as part of the OS must protect the system from its users 
and also protect the users from each other.  No disagreement here.

But another fundamental purpose of an OS to to present an abstract model of 
the underlying hardware to its users, so that the users don't have to deal 
with the full complexity of the hardware.

So I think VFIO should present a 'virtual', abstracted PCI device to its users 
whereas Michael has argued for a simpler model of presenting the real PCI
device config registers while preventing writes only to the registers which 
would clearly disrupt the system.

Now, the virtual model *could* look little like the real hardware, and use 
bunches of ioctls for everything it needs, or it could look a lot like PCI and 
use reads and writes of the virtual PCI config registers to trigger its 
actions.  The latter makes things more amenable to those porting drivers from 
other environments.

I realize that to date the VFIO driver has been a  bit of a mish-mash between 
the ioctl and config based techniques; I intend to clean that up.  And, yes, 
the abstract model presented by VFIO will need plenty of documentation.

Since KVM/qemu already has its own notion of a virtual PCI device which it 
presents to the guest OS, we either need to reconcile VFIO and qemu, or 
provide a bypass of the VFIO virtual model.  This could be direct access 
through sysfs, or else an ioctl to VFIO.  Since I have no internals knowledge 
of qemu, I look to others to choose.

Other little things:
1. Yes, I can share some code with sysfs if I can get the right EXPORTs there.
2. I'll add multiple MSI support, but I wish to point out that even though the 
PCI MSI API supports it, none of the architectures do.
3. FLR needs work.  I was foolish enough to assume that FLR wouldn't reset 
BARs; now I know better.
4. I'll get rid of the vfio config_map in sysfs; it was there for debugging.
5. I'm still looking to support hotplug/unplug and power management stuff via 
generic netlink notifications.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V2] 2.6.34: simple IOMMU API extension to check safe interrupt remapping

2010-07-02 Thread Tom Lyon

On Friday 02 July 2010 02:26:46 am Roedel, Joerg wrote:
 On Thu, Jul 01, 2010 at 05:24:32PM -0400, Tom Lyon wrote:
  This patch allows IOMMU users to determine whether the hardware and software
  support safe, isolated interrupt remapping.  Not all Intel IOMMUs have the
  hardware, and the software for AMD is not there yet.
  Signed-off-by: Tom Lyon p...@cisco.com
 
 It does not make a lot of sense to check for this feature in the
 IOMMU-API currently because there is no support for intr-remapping in
 there. But it will be there when intr-remapping support for AMD IOMMU is
 implemented. So this change can be considered as a first step in that
 direction. Please repost with the change requested below an I'll add
 this one to my tree.
OK, but I'm not sure what you mean by the first sentence.  The Intel stuff
today does intr remapping as a side effect of X2APIC support.

 
  Version 2: previous ifdefs not needed.
  
  MST has convinced me that any user level driver for PCI master devices 
  can't 
  be safe unless there is an IOMMU protecting the APIC MSI/MSI-X interrupt 
  addresses from device writes.  This interrupt remapping is not present in 
  all
  Intel IOMMUs and the code for the interrupt mapping in the AMD IOMMUs is not
  implemented yet.
  
  Needed by not-yet-accepted VFIO driver.
  
  diff -uprN linux-2.6.34/drivers/pci/intel-iommu.c 
  iommuapi-linux-2.6.34/drivers/pci/intel-iommu.c
  --- linux-2.6.34/drivers/pci/intel-iommu.c  2010-05-16 14:17:36.0 
  -0700
  +++ iommuapi-linux-2.6.34/drivers/pci/intel-iommu.c 2010-06-30 
  15:47:10.0 -0700
  @@ -3705,6 +3705,10 @@ static int intel_iommu_domain_has_cap(st
   
  if (cap == IOMMU_CAP_CACHE_COHERENCY)
  return dmar_domain-iommu_snooping;
  +   if (cap == IOMMU_CAP_SAFE_INTR_REMAP)
  +   return intr_remapping_enabled;
   
  return 0;
   }
  diff -uprN linux-2.6.34/include/linux/iommu.h 
  iommuapi-linux-2.6.34/include/linux/iommu.h
  --- linux-2.6.34/include/linux/iommu.h  2010-05-16 14:17:36.0 
  -0700
  +++ iommuapi-linux-2.6.34/include/linux/iommu.h 2010-06-30 
  15:47:34.0 -0700
  @@ -30,6 +30,7 @@ struct iommu_domain {
   };
   
   #define IOMMU_CAP_CACHE_COHERENCY  0x1
  +#define IOMMU_CAP_SAFE_INTR_REMAP  0x2 /* isolates device intrs */
 
 I think the SAFE_ is not necessary in the name. It is misleading because
 it indicates that there is an unsafe variant of intr-remapping available
 when this capability is not set. Just call it IOMMU_CAP_INTR_REMAPPING.
 
 
   Joerg
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH V3] 2.6.34: simple IOMMU API extension to check safe interrupt remapping

2010-07-02 Thread Tom Lyon

This patch allows IOMMU users to determine whether the hardware and software
support safe, isolated interrupt remapping.  Not all Intel IOMMUs have the
hardware, and the software for AMD is not there yet.
Signed-off-by: Tom Lyon p...@cisco.com
---
Version 3: shorter name requested by Joerg.
Version 2: previous ifdefs not needed.

MST has convinced me that any user level driver for PCI master devices can't 
be safe unless there is an IOMMU protecting the APIC MSI/MSI-X interrupt 
addresses from device writes.  This interrupt remapping is not present in all
Intel IOMMUs and the code for the interrupt mapping in the AMD IOMMUs is not
implemented yet.

Needed by not-yet-accepted VFIO driver.

diff -uprN linux-2.6.34/drivers/pci/intel-iommu.c 
iommuapi-linux-2.6.34/drivers/pci/intel-iommu.c
--- linux-2.6.34/drivers/pci/intel-iommu.c  2010-05-16 14:17:36.0 
-0700
+++ iommuapi-linux-2.6.34/drivers/pci/intel-iommu.c 2010-07-02 
12:39:19.0 -0700
@@ -3705,6 +3705,8 @@ static int intel_iommu_domain_has_cap(st
 
if (cap == IOMMU_CAP_CACHE_COHERENCY)
return dmar_domain-iommu_snooping;
+   if (cap == IOMMU_CAP_INTR_REMAP)
+   return intr_remapping_enabled;
 
return 0;
 }
diff -uprN linux-2.6.34/include/linux/iommu.h 
iommuapi-linux-2.6.34/include/linux/iommu.h
--- linux-2.6.34/include/linux/iommu.h  2010-05-16 14:17:36.0 -0700
+++ iommuapi-linux-2.6.34/include/linux/iommu.h 2010-07-02 12:38:45.0 
-0700
@@ -30,6 +30,7 @@ struct iommu_domain {
 };
 
 #define IOMMU_CAP_CACHE_COHERENCY  0x1
+#define IOMMU_CAP_INTR_REMAP   0x2 /* isolates device intrs */
 
 struct iommu_ops {
int (*domain_init)(struct iommu_domain *domain);
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V2] VFIO driver: Non-privileged user level PCI drivers

2010-07-01 Thread Tom Lyon

On Thursday 01 July 2010 08:48:41 am Alex Williamson wrote:
 On Thu, 2010-07-01 at 18:31 +0300, Michael S. Tsirkin wrote:
  On Thu, Jul 01, 2010 at 09:29:04AM -0600, Alex Williamson wrote:
   On Tue, 2010-06-08 at 14:21 -0700, Tom Lyon wrote:
+The VFIO_DMA_MASK ioctl is used to set the maximum permissible DMA 
address
+(device dependent). It takes a single unsigned 64 bit integer as an 
argument.
+This call also has the side effect of enabling PCI bus mastership.
   
   Hi Tom,
   
   This interface doesn't make sense for the MAP_IOVA user.  Especially in
   qemu, we have no idea what the DMA mask is for the device we're
   assigning.  It doesn't really matter though because the guest will use
   bounce buffers internally once it loads the device specific drivers and
   discovers the DMA mask.  This only seems relevant if we're using a
   DMA_MAP call that gets to pick the dmaaddr, so I'd propose we only make
   this a required call for that interface, and create a separate ioctl for
   actually enabling bus master.  Thanks,
   
   Alex
  
  I expect there's no need for a separate ioctl to do this:
  you can do this by write to the control register.
 
 Nope, vfio only allows direct writes to the memory and io space bits of
 the command register, all other bits are virtualized.  I wonder if
 that's necessary though since we require the device to be attached to an
 iommu domain before we allow config space access.
 
I had already gotten rid of the mask setting  master mode ioctl. It was a 
remnant
of using the dma_map_sg API which is no longer in there.  And I tweaked the
config stuff to allow writing the master bit.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH V2] 2.6.34: simple IOMMU API extension to check safe interrupt remapping

2010-07-01 Thread Tom Lyon

This patch allows IOMMU users to determine whether the hardware and software
support safe, isolated interrupt remapping.  Not all Intel IOMMUs have the
hardware, and the software for AMD is not there yet.
Signed-off-by: Tom Lyon p...@cisco.com
---

Version 2: previous ifdefs not needed.

MST has convinced me that any user level driver for PCI master devices can't 
be safe unless there is an IOMMU protecting the APIC MSI/MSI-X interrupt 
addresses from device writes.  This interrupt remapping is not present in all
Intel IOMMUs and the code for the interrupt mapping in the AMD IOMMUs is not
implemented yet.

Needed by not-yet-accepted VFIO driver.

diff -uprN linux-2.6.34/drivers/pci/intel-iommu.c 
iommuapi-linux-2.6.34/drivers/pci/intel-iommu.c
--- linux-2.6.34/drivers/pci/intel-iommu.c  2010-05-16 14:17:36.0 
-0700
+++ iommuapi-linux-2.6.34/drivers/pci/intel-iommu.c 2010-06-30 
15:47:10.0 -0700
@@ -3705,6 +3705,10 @@ static int intel_iommu_domain_has_cap(st
 
if (cap == IOMMU_CAP_CACHE_COHERENCY)
return dmar_domain-iommu_snooping;
+   if (cap == IOMMU_CAP_SAFE_INTR_REMAP)
+   return intr_remapping_enabled;
 
return 0;
 }
diff -uprN linux-2.6.34/include/linux/iommu.h 
iommuapi-linux-2.6.34/include/linux/iommu.h
--- linux-2.6.34/include/linux/iommu.h  2010-05-16 14:17:36.0 -0700
+++ iommuapi-linux-2.6.34/include/linux/iommu.h 2010-06-30 15:47:34.0 
-0700
@@ -30,6 +30,7 @@ struct iommu_domain {
 };
 
 #define IOMMU_CAP_CACHE_COHERENCY  0x1
+#define IOMMU_CAP_SAFE_INTR_REMAP  0x2 /* isolates device intrs */
 
 struct iommu_ops {
int (*domain_init)(struct iommu_domain *domain);
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V2] VFIO driver: Non-privileged user level PCI drivers

2010-06-30 Thread Tom Lyon

Thanks, Alex!
Am incorporating...

On Tuesday 29 June 2010 11:14:12 pm Alex Williamson wrote:
 On Tue, 2010-06-08 at 14:21 -0700, Tom Lyon wrote:
  The VFIO driver is used to allow privileged AND non-privileged processes 
  to 
  implement user-level device drivers for any well-behaved PCI, PCI-X, and 
  PCIe
  devices.
 
 Hi Tom,
 
 I found a few bugs.  Patch below.  The first chunk clears the
 pci_config_map on close, otherwise we end up passing virtualized state
 from one user to the next.  The second is an off by one in the basic
 perms.  Finally, vfio_bar_fixup() needs an overhaul.  It wasn't setting
 the lower bits right and is allowing virtual writes of bits that aren't
 aligned to the size.  This section probably needs another pass or two of
 refinement.  Thanks,
 
 Alex
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
 diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
 index 96639e5..a0e8227 100644
 --- a/drivers/vfio/vfio_main.c
 +++ b/drivers/vfio/vfio_main.c
 @@ -129,6 +129,10 @@ static int vfio_release(struct inode *inode, struct file 
 *filep)
   eventfd_ctx_put(vdev-ev_msi);
   vdev-ev_irq = NULL;
   }
 + if (vdev-pci_config_map) {
 + kfree(vdev-pci_config_map);
 + vdev-pci_config_map = NULL;
 + }
   vfio_domain_unset(vdev);
   /* reset to known state if we can */
   (void) pci_reset_function(vdev-pdev);
 diff --git a/drivers/vfio/vfio_pci_config.c b/drivers/vfio/vfio_pci_config.c
 index c821b5d..f6e26b1 100644
 --- a/drivers/vfio/vfio_pci_config.c
 +++ b/drivers/vfio/vfio_pci_config.c
 @@ -79,18 +79,18 @@ struct perm_bits {
  static struct perm_bits pci_cap_basic_perm[] = {
   { 0x,   0, },   /* 0x00 vendor  device id - RO */
   { 0,0xFFFC, },  /* 0x04 cmd  status except mem/io */
 - { 0,0xFF00, },  /* 0x08 bist, htype, lat, cache */
 - { 0x,   0x, },  /* 0x0c bar */
 + { 0,0, },   /* 0x08 class code  revision id */
 + { 0,0xFF00, },  /* 0x0c bist, htype, lat, cache */
   { 0x,   0x, },  /* 0x10 bar */
   { 0x,   0x, },  /* 0x14 bar */
   { 0x,   0x, },  /* 0x18 bar */
   { 0x,   0x, },  /* 0x1c bar */
   { 0x,   0x, },  /* 0x20 bar */
 - { 0,0, },   /* 0x24 cardbus - not yet */
 - { 0,0, },   /* 0x28 subsys vendor  dev */
 - { 0x,   0x, },  /* 0x2c rom bar */
 - { 0,0, },   /* 0x30 capability ptr  resv */
 - { 0,0, },   /* 0x34 resv */
 + { 0x,   0x, },  /* 0x24 bar */
 + { 0,0, },   /* 0x28 cardbus - not yet */
 + { 0,0, },   /* 0x2c subsys vendor  dev */
 + { 0x,   0x, },  /* 0x30 rom bar */
 + { 0,0, },   /* 0x34 capability ptr  resv */
   { 0,0, },   /* 0x38 resv */
   { 0x00FF,   0x00FF, },  /* 0x3c max_lat ... irq */
  };
 @@ -318,30 +318,55 @@ static void vfio_virt_init(struct vfio_dev *vdev)
  static void vfio_bar_fixup(struct vfio_dev *vdev)
  {
   struct pci_dev *pdev = vdev-pdev;
 - int bar;
 - u32 *lp;
 - u32 len;
 + int bar, mem64 = 0;
 + u32 *lp = NULL;
 + u64 len = 0;
  
   for (bar = 0; bar = 5; bar++) {
 - len = pci_resource_len(pdev, bar);
 - lp = (u32 *)vdev-vinfo.bar[bar * 4];
 - if (len == 0) {
 - *lp = 0;
 - } else if (pci_resource_flags(pdev, bar)  IORESOURCE_MEM) {
 - *lp = ~0x1;
 - *lp = (*lp  ~(len-1)) |
 - (*lp  ~PCI_BASE_ADDRESS_MEM_MASK);
 - if (*lp  PCI_BASE_ADDRESS_MEM_TYPE_64)
 - bar++;
 - } else if (pci_resource_flags(pdev, bar)  IORESOURCE_IO) {
 + if (!mem64) {
 + len = pci_resource_len(pdev, bar);
 + lp = (u32 *)vdev-vinfo.bar[bar * 4];
 + if (len == 0) {
 + *lp = 0;
 + continue;
 + }
 +
 + len = ~(len - 1);
 + } else
 + len = 32;
 +
 + if (*lp == ~0U)
 + *lp = (u32)len;
 + else
 + *lp = (u32)len;
 + 
 + if (mem64) {
 + mem64 = 0;
 + continue;
 + }
 +
 + if (pci_resource_flags(pdev, bar)  IORESOURCE_IO)
   *lp |= PCI_BASE_ADDRESS_SPACE_IO;
 - *lp = (*lp  ~(len-1

Re: [PATCH V2] VFIO driver: Non-privileged user level PCI drivers

2010-06-30 Thread Tom Lyon

On Wednesday 30 June 2010 03:32:56 pm Michael S. Tsirkin wrote:
 On Wed, Jun 30, 2010 at 03:17:55PM -0700, Tom Lyon wrote:
  Thanks, Alex!
  Am incorporating...
 
 I get it there's no chance you'll drop the virtualization
 from the driver then?
 
I think it'll get a whole lot simpler by depending on iommu interrupt remapping,
but  I thinke some of it is still useful.

I am now stuck trying to get access to a system with interrupt remapping.
Turns out my Intel IOMMU doesn't have it.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] 2.6.34: simple IOMMU API extension to check safe interrupt remapping

2010-06-30 Thread Tom Lyon

This patch allows IOMMU users to determine whether the hardware and software
support safe, isolated interrupt remapping.  Not all Intel IOMMUs have the
hardware, and the software for AMD is not there yet.
Signed-off-by: Tom Lyon p...@cisco.com
---

MST has convinced me that any user level driver for PCI master devices can't 
be safe unless there is an IOMMU protecting the APIC MSI/MSI-X interrupt 
addresses from device writes.  This interrupt remapping is not present in all
Intel IOMMUs and the code for the interrupt mapping in the AMD IOMMUs is not
implemented yet.

Needed by not-yet-accepted VFIO driver.

diff -uprN linux-2.6.34/drivers/pci/intel-iommu.c 
iommuapi-linux-2.6.34/drivers/pci/intel-iommu.c
--- linux-2.6.34/drivers/pci/intel-iommu.c  2010-05-16 14:17:36.0 
-0700
+++ iommuapi-linux-2.6.34/drivers/pci/intel-iommu.c 2010-06-30 
15:47:10.0 -0700
@@ -3705,6 +3705,10 @@ static int intel_iommu_domain_has_cap(st
 
if (cap == IOMMU_CAP_CACHE_COHERENCY)
return dmar_domain-iommu_snooping;
+#ifdef CONFIG_INTR_REMAP
+   if (cap == IOMMU_CAP_SAFE_INTR_REMAP)
+   return intr_remapping_enabled;
+#endif
 
return 0;
 }
diff -uprN linux-2.6.34/include/linux/iommu.h 
iommuapi-linux-2.6.34/include/linux/iommu.h
--- linux-2.6.34/include/linux/iommu.h  2010-05-16 14:17:36.0 -0700
+++ iommuapi-linux-2.6.34/include/linux/iommu.h 2010-06-30 15:47:34.0 
-0700
@@ -30,6 +30,7 @@ struct iommu_domain {
 };
 
 #define IOMMU_CAP_CACHE_COHERENCY  0x1
+#define IOMMU_CAP_SAFE_INTR_REMAP  0x2 /* isolates device intrs */
 
 struct iommu_ops {
int (*domain_init)(struct iommu_domain *domain);
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V2] VFIO driver: Non-privileged user level PCI drivers

2010-06-30 Thread Tom Lyon

On Wednesday 30 June 2010 09:16:23 pm Alex Williamson wrote:
 On Tue, 2010-06-08 at 14:21 -0700, Tom Lyon wrote:
  +int vfio_dma_unmap_dm(struct vfio_listener *listener, struct vfio_dma_map 
  *dmp)
  +{
  +   unsigned long start, npage;
  +   struct dma_map_page *mlp;
  +   struct list_head *pos, *pos2;
  +   int ret;
  +
  +   start = dmp-vaddr  ~PAGE_SIZE;
  +   npage = dmp-size  PAGE_SHIFT;
  +
  +   ret = -ENXIO;
  +   mutex_lock(listener-vdev-dgate);
  +   list_for_each_safe(pos, pos2, listener-dm_list) {
  +   mlp = list_entry(pos, struct dma_map_page, list);
  +   if (dmp-vaddr != mlp-vaddr || mlp-npage != npage)
  +   continue;
  +   ret = 0;
  +   vfio_dma_unmap(listener, mlp);
  +   break;
  +   }
 
 Hi Tom,
 
 Shouldn't we be matching the mlp based on daddr instead of vaddr?  We
 can have multiple dma address pointing at the same virtual address, so
 dma address is the unique element.  I'm also nervous about this dm_list.
 For qemu device assignment, we're potentially statically mapping many GB
 of iova space.  It seems like this could get incredibly bloated and
 slow.  Thanks,
 
 Alex

In weird circumstances, differing user vaddrs could reolve to the same physical 
address,
so the uniqueness of any mapping is the vaddr,len.

Yes, a linear list is slow, but does qemu need a lot of mappings, or just big 
ones?
 
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V2] VFIO driver: Non-privileged user level PCI drivers

2010-06-17 Thread Tom Lyon

On Sunday 13 June 2010 03:23:39 am Michael S. Tsirkin wrote:
 On Fri, Jun 11, 2010 at 03:15:53PM -0700, Tom Lyon wrote:
  [ bunch of stuff about MSI-X checking and IOMMUs and config registers...]
  
  OK, here's the thing.  The IOMMU API today does not do squat about
  dealing with interrupts. Interrupts are special because the APIC
  addresses are not each in their own page.  Yes, the IOMMU hardware
  supports it (at least Intel), and there's some Intel intr remapping
  code (not AMD), but it doesn't look like it is enough.
 
 The iommu book from AMD seems to say that interrupt remapping table
 address is taken from the device table entry.  So hardware support seems
 to be there, and to me it looks like it should be enough.
 Need to look at the iommu/msi code some more to figure out
 whether what linux does is handling this correctly -
 if it doesn't we need to fix that.
 
  Therefore, we must not allow the user level driver to diddle the MSI
  or MSI-X areas - either in config space or in the device memory space.
 
 It won't help.
 Consider that you want to let a userspace driver control
 the device with DMA capabilities.
 
 So if there is a range of addresses that device
 can write into that can break host, these writes
 can be triggered by userspace. Limiting
 userspace access to MSI registers won't help:
 you need a way to protect host from the device.

OK, after more investigation, I realize you are right.
We definitely need the IOMMU protection for interrupts, and
if we have it, a lot of the code for config space protection is pointless.
It does seem that the Intel  intr_remapping code does what we want
(accidentally) but that the AMD iommu code does not yet do any
interrupt remapping.  Joerg - can you comment? On the roadmap?

I should have an AMD system w IOMMU in a couple of days, so I
can check this out.

 
   If the device doesn't have its MSI-X registers in nice page aligned
   areas, then it is not well-behaved and it is S.O.L. The SR-IOV spec
   recommends that devices be designed the well-behaved way.
  
  When the code in vfio_pci_config speaks of virtualization it means
  that there are fake registers which the user driver can read or write,
  but do not affect the real registers. BARs are one case, MSI regs
  another. The PCI vendor and device ID are virtual because SR-IOV
  doesn't supply them but I wanted the user driver to find them in the
  same old place.
 
 Sorry, I still don't understand why do we bother.  All this is already
 implemented in userspace.  Why can't we just use this existing userspace
 implementation?  It seems that all kernel needs to do is prevent
 userspace from writing BARs.

I assume the userspace of which you speak is qemu?  This is not what I'm
doing with vfio - I'm interested in the HPC networking model of direct 
user space access to the network. 

 Why can't we replace all this complexity with basically:
 
 if (addr = PCI_BASE_ADDRESS_5  addr + len = PCI_BASE_ADDRESS_0)
   return -ENOPERM;
 
 And maybe another register or two. Most registers should be fine.
 
  [ Re: Hotplug and Suspend/Resume]
  There are *plenty* of real drivers - brand new ones - which don't
  bother with these today.  Yeah, I can see adding them to the framework
  someday - but if there's no urgent need then it is way down the
  priority list.
 
 Well, for kernel drivers everything mostly works out of the box, it is
 handled by the PCI subsystem.  So some kind of framework will need to be
 added for userspace drivers as well.  And I suspect this issue won't be
 fixable later without breaking applications.

Whatever works out of the box for the kernel drivers which don't implement
suspend/resume will work for the user level drivers which don't.
 
  Meanwhile, the other uses beckon.
 
 Which other uses? I thought the whole point was fixing
 what's broken with current kvm implementation.
 So it seems to be we should not rush it ignoring existing issues such as
 hotplug.
Non-kvm cases.  That don't care about suspend/resume.

 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V2] VFIO driver: Non-privileged user level PCI drivers

2010-06-11 Thread Tom Lyon

The inline comments are getting pretty hard to wade through, so I'm deleting 
some
of the lesser stuff - but I am incorporating into the code.

On Tuesday 08 June 2010 10:45:57 pm Michael S. Tsirkin wrote:
 On Tue, Jun 08, 2010 at 04:54:43PM -0700, Tom Lyon wrote:
  On Tuesday 08 June 2010 03:38:44 pm Michael S. Tsirkin wrote:
   On Tue, Jun 08, 2010 at 02:21:52PM -0700, Tom Lyon wrote:
...
+   /* reset to known state if we can */
+   (void) pci_reset_function(vdev-pdev);
   
   We are opening the device - how can it not be in a known state?
  If an alternate driver left it in a weird state.
 
 Don't we care if it fails then? I think we do ...
   And we should make sure (at open time) we *can* reset on close, fail 
   binding if we can't.
  How do you propose?
 
 Fail open if reset fails on open?
OK, it'll now fail open if reset fails.

[ bunch of stuff about MSI-X checking and IOMMUs and config registers...]

OK, here's the thing.  The IOMMU API today does not do squat about dealing with
interrupts. Interrupts are special because the APIC  addresses are not each in 
their own page.  Yes, the IOMMU hardware supports it (at least Intel), and 
there's
some Intel intr remapping code (not AMD), but it doesn't look like it is enough.

Therefore, we must not allow the user level driver to diddle the MSI or MSI-X
areas - either in config space or in the device memory space.  If the device
doesn't have its MSI-X registers in nice page aligned areas, then it is not
well-behaved and it is S.O.L. The SR-IOV spec recommends that devices be
designed the well-behaved way.

When the code in vfio_pci_config speaks of virtualization it means that there
are fake registers which the user driver can read or write, but do not affect 
the
real registers. BARs are one case, MSI regs another. The PCI vendor and device
ID are virtual because SR-IOV doesn't supply them but I wanted the user driver
to find them in the same old place.

+   case VFIO_DMA_MASK: /* set master mode and DMA mask */
+   if (copy_from_user(mask, uarg, sizeof mask))
+   return -EFAULT;
+   pci_set_master(pdev);
   
   This better be done by userspace when it sees fit.
   Otherwise device might corrupt userspace memory.
This ioctl is gone now - it was vestigial from the dma_sg_map interface.

[ Re: Hotplug and Suspend/Resume]
There are *plenty* of real drivers - brand new ones - which don't bother
with these today.  Yeah, I can see adding them to the framework someday -
but if there's no urgent need then it is way down the priority list. Meanwhile,
the other uses beckon.  And I never heard the Infiniband users complaining
about not having these things.


+   pages = kzalloc(npage * sizeof(struct page *), GFP_KERNEL);
   
   If you map a 4G region, this will try to allocate 8Mbytes?
  Yes. Ce la vie.
 
 First of all, this will fail - and the request is quite real with
 decent sized guests. Second, with appropriately sized allocs before failing
 it will stress the system pretty hard. Split it in chunks of 4K or something.
Changed to use vmalloc/vfree - don't need physical contiguity.
 



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V2] VFIO driver: Non-privileged user level PCI drivers

2010-06-10 Thread Tom Lyon

On Thursday 10 June 2010 10:27:36 am Konrad Rzeszutek Wilk wrote:
  +EXPORT_SYMBOL(uiommu_fdget);
 
 EXPORT_SYMBOL_GPL
 .. snip
  +EXPORT_SYMBOL(uiommu_put);
 
 ditto.
 

Is there a definitive explanation somewhere of when to use each?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V2] VFIO driver: Non-privileged user level PCI drivers

2010-06-08 Thread Tom Lyon

On Tuesday 08 June 2010 03:38:44 pm Michael S. Tsirkin wrote:
 On Tue, Jun 08, 2010 at 02:21:52PM -0700, Tom Lyon wrote:
  The VFIO driver is used to allow privileged AND non-privileged processes 
  to 
  implement user-level device drivers for any well-behaved PCI, PCI-X, and 
  PCIe
  devices.
  Signed-off-by: Tom Lyon p...@cisco.com
 
 Some general comments:
 - Please pass this through ./scripts/checkpatch.pl to fix some formatting.
I did. What did you find?

 - Lots of hard-coded constants. Please try using pci_regs.h much more,
   where not possible please add named enums.
This is mostly for lengths not specified in pci_regs, but given in standards 
docs.

 - There are places where you get parameters from userspace and pass them
   on to kmalloc etc. Everything you get from userspace needs to be
   validated.
I  thought I had. Thats what more eyeballs are for.

 - You play non-standard tricks with minor numbers.
   Won't it be easier to just make udev create a node
   for the device in the way everyone does it? The name could be
   descriptive including e.g. bus/dev/fn so userspace can find
   your device.
I just copied what uio was doing. What is the way everyone does it?
 
 - I note that if we exclude the iommu mapping, the rest conceptually could 
 belong
   in pci_generic driver in uio. So if we move these ioctls to the iommu 
 driver,
   as Avi suggested, then vfio can be part of the uio framework.
But the interrupt handling is different in uio; uio doesn't support read or 
write calls
to read and write registers or memory, and it doesn't support ioctls at all for 
other
misc stuff.  If we could blow off backwards compatibility with uio, then, sure 
we
could have a nice unified solution.

  ---
  This version now requires an IOMMU domain to be set before any access to
  device registers is granted (except that config space may be read).  In
  addition, the VFIO_DMA_MAP_ANYWHERE is dropped - it used the dma_map_sg API
  which does not have sufficient controls around IOMMU usage.  The IOMMU 
  domain
  is obtained from the 'uiommu' driver which is included in this patch.
  
  Various locking, security, and documentation issues have also been fixed.
  
  Please commit - it or me!
  But seriously, who gets to commit this? Avi for KVM? or GregKH for drivers?
  
  Blurb from previous patch version:
  
  This patch is the evolution of code which was first proposed as a patch to
  uio/uio_pci_generic, then as a more generic uio patch. Now it is taken 
  entirely
  out of the uio framework, and things seem much cleaner. Of course, there is
  a lot of functional overlap with uio, but the previous version just seemed
  like a giant mode switch in the uio code that did not lead to clarity for
  either the new or old code.
  
  [a pony for avi...]
  The major new functionality in this version is the ability to deal with
  PCI config space accesses (through read  write calls) - but includes table
  driven code to determine whats safe to write and what is not. Also, some
  virtualization of the config space to allow drivers to think they're writing
  some registers when they're not.  Also, IO space accesses are also allowed.
  Drivers for devices which use MSI-X are now prevented from directly writing
  the MSI-X vector area.
 
 This table adds a lot of complexity to the code,
 and I don't really understand why we need this code in
 kernel: isn't the point of iommu that it protects us
 from buggy devices? If yes, we should be able to
 just ask userspace to be careful and avoid doing silly things
 like overwriting MSI-X vectors, and if it's not careful,
 no one gets hurt :)
 
 If some registers absolutely must be protected,
 I think we should document this carefully in code.
 
  +/*
  + * MSI and MSI-X Interrupt handler.
  + * Just signal an event
  + */
  +static irqreturn_t msihandler(int irq, void *arg)
  +{
  +   struct eventfd_ctx *ctx = arg;
  +
  +   eventfd_signal(ctx, 1);
  +   return IRQ_HANDLED;
  +}
  +
  +void vfio_disable_msi(struct vfio_dev *vdev)
  +{
  +   struct pci_dev *pdev = vdev-pdev;
  +
  +   if (vdev-ev_msi) {
  +   eventfd_ctx_put(vdev-ev_msi);
  +   free_irq(pdev-irq, vdev-ev_msi);
  +   vdev-ev_msi = NULL;
  +   }
  +   pci_disable_msi(pdev);
  +}
  +
  +int vfio_enable_msi(struct vfio_dev *vdev, int fd)
  +{
  +   struct pci_dev *pdev = vdev-pdev;
  +   struct eventfd_ctx *ctx;
  +   int ret;
  +
  +   ctx = eventfd_ctx_fdget(fd);
  +   if (IS_ERR(ctx))
  +   return PTR_ERR(ctx);
  +   vdev-ev_msi = ctx;
  +   pci_enable_msi(pdev);
  +   ret = request_irq(pdev-irq, msihandler, 0,
  +   vdev-name, ctx);
  +   if (ret) {
  +   eventfd_ctx_put(ctx);
  +   pci_disable_msi(pdev);
  +   vdev-ev_msi = NULL;
  +   }
  +   return ret;
  +}
  +
  +void vfio_disable_msix(struct vfio_dev *vdev)
  +{
  +   struct pci_dev *pdev = vdev-pdev;
  +   int i;
  +
  +   if (vdev-ev_msix  vdev-msix) {
  +   for (i = 0; i

Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers

2010-06-07 Thread Tom Lyon

On Sunday 06 June 2010 02:54:51 am Michael S. Tsirkin wrote:
 On Thu, Jun 03, 2010 at 02:41:38PM -0700, Tom Lyon wrote:
  OK, in the interest of making progress, I am about to embark on the 
  following:
  
  1. Create a user-iommu-domain driver - opening it will give a new empty 
  domain.
  Ultimately this can also populate sysfs with the state of its world, 
  which would
  also be a good addition to the base iommu stuff.
  If someone closes the fd while in use, the domain stays valid anyway 
  until users
  drop off.
  
  2. Add DOMAIN_SET and DOMAIN_UNSET ioctls to the vfio driver.  Require that
 a domain be set before using the VFIO_DMA_MAP_IOVA ioctl
 
 Require domain to be set before you allow any access to the device:
 mmap, write, read.  IMO this is the only safe way to make sure userspace
 does not corrupt memory, and this removes the need to special-case
 MSI memory, play with bus master enable and hope it can be cleared without
 reset, etc.

Michael - the light bulb finally lit for me and I now understand what you've 
been
saying the past few weeks.  Of course you're right - we need iommu set before 
any
register access.  I had thought that was done by default but now I see that the 
dma_map_sg routine only attaches to the iommu on demand.

So I will torpedo the MAP_ANYWHERE stuff. I'd like to keep the MAP_IOVA ioctl
with the vfio fd so that the user can still do everything with one fd. I'm 
thinking the
fd opens and iommu bindings could be done in a program before spinning out the
program with the user driver.
 
  (this is the one
 that KVM wants).
 
 Not sure I understand. I think that MAP should be done on the domain,
 not the device, this handles pinning pages correctly and
 this way you don't need any special checks.
 
 However, the VFIO_DMA_MAP_ANYWHERE ioctl is the one
 which uses the dma_sg interface which has no expicit control of domains. 
  I
 intend to keep it the way it is, but expect only non-hypervisor programs 
  would
 want to use it.
 
 If we support MAP_IOVA, why is MAP_ANYWHERE useful? Can't
 non-hypervisors just pick an address?
 
  3. Clean up the docs and other nits that folks have found.
  
  Comments? 
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers

2010-06-03 Thread Tom Lyon

OK, in the interest of making progress, I am about to embark on the following:

1. Create a user-iommu-domain driver - opening it will give a new empty domain.
Ultimately this can also populate sysfs with the state of its world, which 
would
also be a good addition to the base iommu stuff.
If someone closes the fd while in use, the domain stays valid anyway until 
users
drop off.

2. Add DOMAIN_SET and DOMAIN_UNSET ioctls to the vfio driver.  Require that
   a domain be set before using the VFIO_DMA_MAP_IOVA ioctl (this is the one
   that KVM wants).  However, the VFIO_DMA_MAP_ANYWHERE ioctl is the one
   which uses the dma_sg interface which has no expicit control of domains. I
   intend to keep it the way it is, but expect only non-hypervisor programs 
would
   want to use it.

3. Clean up the docs and other nits that folks have found.

Comments? 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers

2010-06-02 Thread Tom Lyon

On Wednesday 02 June 2010 10:46:15 am Chris Wright wrote:
 * Joerg Roedel (j...@8bytes.org) wrote:
  On Wed, Jun 02, 2010 at 02:21:00PM +0300, Michael S. Tsirkin wrote:
   On Wed, Jun 02, 2010 at 01:12:25PM +0200, Joerg Roedel wrote:
  
Even if it is bound to a domain the userspace driver could program the
device to do dma to unmapped regions causing io-page-faults. The kernel
can't do anything about it.
   
   It can always corrupt its own memory directly as well :)
   But that is not a reason not to detect errors if we can,
   and not to make APIs hard to misuse.
  
  Changing the domain of a device while dma can happen is the same type of
  bug as unmapping potential dma target addresses. We can't catch this
  kind of misuse.
  
 With 10 devices you have 10 extra ioctls.

And this works implicitly with your proposal?
   
   Yes.  so you do:
   iommu = open
   ioctl(dev1, BIND, iommu)
   ioctl(dev2, BIND, iommu)
   ioctl(dev3, BIND, iommu)
   ioctl(dev4, BIND, iommu)
   
   No need to add a SHARE ioctl.
  
  In my proposal this looks like:
  
  
  dev1 = open();
  ioctl(dev2, SHARE, dev1);
  ioctl(dev3, SHARE, dev1);
  ioctl(dev4, SHARE, dev1);
  
  So we actually save an ioctl.
 
 This is not any hot path, so saving an ioctl shouldn't be a consideration.
 Only important consideration is a good API.  I may have lost context here,
 but the SHARE API is limited to the vfio fd.  The BIND API expects a new
 iommu object.  Are there other uses for this object?  Tom's current vfio
 driver exposes a dma mapping interface, would the iommu object expose
 one as well?  Current interface is device specific DMA interface for
 host device drivers typically mapping in-flight dma buffers, and IOMMU
 specific interface for assigned devices typically mapping entire virtual
 address space.

Actually, it a domain object - which may be usable among iommus (Joerg?).
However, you can't really do the dma mapping with just the domain because
every device supports a different size address space as a master, i.e.,
the dma_mask.

And I don't know how kvm would deal with devices with varying dma mask support,
or why they'd be in the same domain.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers

2010-06-01 Thread Tom Lyon

On Tuesday 01 June 2010 03:46:51 am Michael S. Tsirkin wrote:
 On Tue, Jun 01, 2010 at 01:28:48PM +0300, Avi Kivity wrote:
  On 06/01/2010 12:55 PM, Michael S. Tsirkin wrote:
 
It can't program the iommu.
  What
  the patch proposes is that userspace tells vfio about the needed
  mappings, and vfio programs the iommu.
   
  There seems to be some misunderstanding.  The userspace interface
  proposed forces a separate domain per device and forces userspace to
  repeat iommu programming for each device.  We are better off sharing a
  domain between devices and programming the iommu once.
 
 
iommufd = open(/dev/iommu);
ioctl(iommufd, IOMMUFD_ASSIGN_RANGE, ...)
ioctl(vfiofd, VFIO_SET_IOMMU, iommufd)
 
  ?
 
 Yes.
 
  If so, I agree.
 
 Good.

I'm not really opposed to multiple devices per domain, but let me point out how 
I
ended up here.  First, the driver has two ways of mapping pages, one based on 
the
iommu api and one based on the dma_map_sg api.  With the latter, the system
already allocates a domain per device and there's no way to control it. This was
presumably done to help isolation between drivers.  If there are multiple 
drivers
in the user level, do we not want the same isoation to apply to them?  
Also, domains are not a very scarce resource - my little core i5 has 256, 
and the intel architecture goes to 64K.
And then there's the fact that it is possible to have multiple disjoint iommus 
on a system,
so it may not even be possible to bring 2 devices under one domain. 

Given all that, I am inclined to leave it alone until someone has a real 
problem.
Note that not sharing iommu domains doesn't mean you can't share device memory,
just that you have to do multiple mappings
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers

2010-06-01 Thread Tom Lyon

On Monday 31 May 2010 10:17:35 am Alan Cox wrote:
 
 Does look like it needs a locking audit, some memory and error checks
 reviewing and some further review of the ioctl security and
 overflows/trusted values.
Yes. Thanks for the detailed look.
 
 Rather a nice way of attacking the user space PCI problem.
And thanks for that!

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] VFIO driver: Non-privileged user level PCI drivers

2010-06-01 Thread Tom Lyon

On Tuesday 01 June 2010 09:29:47 pm Alex Williamson wrote:
 On Tue, 2010-06-01 at 13:28 +0300, Avi Kivity wrote:
  On 06/01/2010 12:55 PM, Michael S. Tsirkin wrote:
  
 It can't program the iommu.
   What
   the patch proposes is that userspace tells vfio about the needed
   mappings, and vfio programs the iommu.

   There seems to be some misunderstanding.  The userspace interface
   proposed forces a separate domain per device and forces userspace to
   repeat iommu programming for each device.  We are better off sharing a
   domain between devices and programming the iommu once.
  
  
 iommufd = open(/dev/iommu);
 ioctl(iommufd, IOMMUFD_ASSIGN_RANGE, ...)
 ioctl(vfiofd, VFIO_SET_IOMMU, iommufd)
 
 It seems part of the annoyance of the current KVM device assignment is
 that we have multiple files open, we mmap here, read there, write over
 there, maybe, if it's not emulated.  I quite like Tom's approach that we
 have one stop shopping with /dev/vfion, including config space
 emulation so each driver doesn't have to try to write their own.  So
 continuing with that, shouldn't we be able to add a GET_IOMMU/SET_IOMMU
 ioctl to vfio so that after we setup one device we can bind the next to
 the same domain?

This is just what I was thinking.  But rather than a get/set, just use two fds.

ioctl(vfio_fd1, VFIO_SET_DOMAIN, vfio_fd2);

This may fail if there are really 2 different IOMMUs, so user code must be
prepared for failure,  In addition, this is strictlyupwards compatible with 
what is there now, so maybe we can add it later.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V3] drivers/uio/uio_pci_generic.c: allow access for non-privileged processes

2010-04-30 Thread Tom Lyon

Michael, et al - sorry for the delay, but I've been digesting the comments and 
researching new approaches.

I think the plan for V4 will be to take things entirely out of the UIO 
framework, and instead have a driver which supports user mode use of 
well-behaved PCI devices. I would like to use read and write to support 
access to memory regions, IO regions,  or PCI config space. Config space is a 
bitch because not everything is safe to read or write, but I've come up with a 
table driven approach which can be run-time extended for non-compliant devices 
(under root control) which could then enable non-privileged users. For 
instance, OHCI 1394 devices use a dword in config space which is not formatted 
as a PCI capability, root can use sysfs to enable access:
echo offset readbits writebits  
/sys/dev/pci/devices/:xx:xx.x/yyy/config_permit


A well-behaved PCI device must have memory BARs = 4K for mmaping, must have 
separate memory space for MSI-X that does not need mmaping
by the user driver, must support the PCI 2.3 interrupt masking, and must not go 
totally crazy with PCI config space (tg3 is real ugly, e1000 is fine).

Again, my primary usage model is for direct user-level access to network 
devices, not for virtualization, but I think both will work.

So, I will go outside UIO because:
1 - it doesn't allow reads and writes to sub-drivers, just irqcontrol
2 - it doesn't have ioctls
3 - it has its own interrupt model which doesn't use eventfds
4 - it's ugly doing the new stuff and maintaining backwards compat.

I hereby solicit comments on the name and location for the new driver.

Michael - some of your comments below imply you didn't look at the companion 
changes to uio.c, which had the eventfd interrupts and effectively the same 
iommu handling - but see my inline comments below.

On Wednesday 21 April 2010 02:38:49 am Michael S. Tsirkin wrote:
 On Mon, Apr 19, 2010 at 03:05:35PM -0700, Tom Lyon wrote:
  
  These are changes to uio_pci_generic.c to allow better use of the driver by
  non-privileged processes.
  1. Add back old code which allowed interrupt re-enablement through uio fd.
  2. Translate PCI bards to uio mmap regions, to allow mmap through uio fd.
 
 Since it's common for drivers to need configuration cycles
 for device control, the above 2 are not sufficient for generic devices.
 And if you fix the above, you won't need irqcontrol,
 which IMO we are better off saving for stuff like eventfd mapping.
I will handle config access for well-behaved devices.

 
 Also - this modifies a kernel/userspace interface in a way
 that makes an operation that was always safe previously
 potentially unsafe.
Not sure what you meant, but probably irrelevant in new scheme.
 
 Also, BAR regions could be less than 1 page in size,
 mapping these to unpriveledged process is a security problem.
Agreed, no mmaping, just r/w.

 Also, for a generic driver, we likely need write combining
 support in the interface.
Given that many system platforms don't have it, doesn't seem like a big deal.
But I'll look into it.

 Also, io space often can not be mmaped. We need read/write
 for that.
Agreed.

 
  3. Allow devices which support MSI or MSI-X, but not IRQ.
 
 If the device supports MSI or MSI-X, it can perform
 PCI writes upstream, and MSI-X vectors are controlled
 through memory. So with MSI-X + mmap to an unpriveledged
 process you can easily cause the device to modify any memory.
Yes, will virtualize this in the driver. User level will not be allowed to
mmap real MSI-X region (if MSI-X in use); MSI config writes will be intercepted.

 With MSI it's hard to be sure, maybe some devices might make guarantees
 not to do writes except for MSI, but there's no generic way to declare
 that: bus master needs to be enabled for MSI to work, and once bus
 master is enabled, nothing seems to prevent the device from corrupting
 host memory.
The code already requires iommu protection for masters, I will make sure this
includes MSI and MSI-X devices.

As an aside, the IOMMU is supposed to be able to do interrupt translation also,
but the format for vectors changes, so it doesn't really help with 
virtualization.

 So the patch doesn't look like generic enough or safe enough
 for users I have in mind (virtualization). What users/devices
 do you have in mind?
Non-virt, just new user level drivers for special cases.

 For virtualization, I've been thinking about unpriviledged access and
 msi as well, and here's a plan I thought might work:
 
 - add a uio_iommu character device that controls an iommu domain
 - uio_iommu would make sure iommu is programmed in a safe way
 - use irqcontrol to bind pci device to iommu domain
 - allow config cycles through uio fd, but
   force bus master to 0 unless device is bound to a domain
 - for sub-page regions, and io, we can't allow mmap to an unpriveledged
   process. extend irqcontrol to allow read/write and range-check the
   operations
 - for msi/msix, drivers use multiple

[PATCH V3] drivers/uio/uio_pci_generic.c: allow access for non-privileged processes

2010-04-19 Thread Tom Lyon


These are changes to uio_pci_generic.c to allow better use of the driver by
non-privileged processes.
1. Add back old code which allowed interrupt re-enablement through uio fd.
2. Translate PCI bards to uio mmap regions, to allow mmap through uio fd.
3. Allow devices which support MSI or MSI-X, but not IRQ.
Signed-off-by: Tom Lyon p...@cisco.com
---
checkpatch.pl is happy with this one.

--- linux-2.6.33/drivers/uio/uio_pci_generic.c  2010-02-24 10:52:17.0 
-0800
+++ mylinux-2.6.33/drivers/uio/uio_pci_generic.c2010-04-19 
14:57:21.0 -0700
@@ -14,9 +14,10 @@
  * # ls -l /sys/bus/pci/devices/:00:19.0/driver
  * .../:00:19.0/driver - ../../../bus/pci/drivers/uio_pci_generic
  *
- * Driver won't bind to devices which do not support the Interrupt Disable Bit
- * in the command register. All devices compliant to PCI 2.3 (circa 2002) and
- * all compliant PCI Express devices should support this bit.
+ * Driver won't bind to devices which do not support MSI, MSI-x, or the
+ * Interrupt Disable Bit in the command register. All devices compliant
+ * to PCI 2.3 (circa 2002) and all compliant PCI Express devices should
+ * support one of these.
  */
 
 #include linux/device.h
@@ -27,7 +28,7 @@
 
 #define DRIVER_VERSION 0.01.0
 #define DRIVER_AUTHOR  Michael S. Tsirkin m...@redhat.com
-#define DRIVER_DESCGeneric UIO driver for PCI 2.3 devices
+#define DRIVER_DESCGeneric UIO driver for PCIe  PCI 2.3 devices
 
 struct uio_pci_generic_dev {
struct uio_info info;
@@ -41,6 +42,39 @@ to_uio_pci_generic_dev(struct uio_info *
return container_of(info, struct uio_pci_generic_dev, info);
 }
 
+/* Read/modify/write command register to disable interrupts.
+ * Note: we could cache the value and optimize the read if there was a way to
+ * get notified of user changes to command register through sysfs.
+ * */
+static void irqtoggle(struct uio_pci_generic_dev *gdev, int irq_on)
+{
+   struct pci_dev *pdev = gdev-pdev;
+   unsigned long flags;
+   u16 orig, new;
+
+   spin_lock_irqsave(gdev-lock, flags);
+   pci_block_user_cfg_access(pdev);
+   pci_read_config_word(pdev, PCI_COMMAND, orig);
+   new = irq_on ? (orig  ~PCI_COMMAND_INTX_DISABLE)
+: (orig | PCI_COMMAND_INTX_DISABLE);
+   if (new != orig)
+   pci_write_config_word(gdev-pdev, PCI_COMMAND, new);
+   pci_unblock_user_cfg_access(pdev);
+   spin_unlock_irqrestore(gdev-lock, flags);
+}
+
+/* irqcontrol is use by userspace to enable/disable interrupts. */
+/* A privileged app can write the PCI_COMMAND register directly,
+ * but we need this for normal apps
+ */
+static int irqcontrol(struct uio_info *info, s32 irq_on)
+{
+   struct uio_pci_generic_dev *gdev = to_uio_pci_generic_dev(info);
+
+   irqtoggle(gdev, irq_on);
+   return 0;
+}
+
 /* Interrupt handler. Read/modify/write the command register to disable
  * the interrupt. */
 static irqreturn_t irqhandler(int irq, struct uio_info *info)
@@ -89,7 +123,7 @@ done:
 /* Verify that the device supports Interrupt Disable bit in command register,
  * per PCI 2.3, by flipping this bit and reading it back: this bit was readonly
  * in PCI 2.2. */
-static int __devinit verify_pci_2_3(struct pci_dev *pdev)
+static int verify_pci_2_3(struct pci_dev *pdev)
 {
u16 orig, new;
int err = 0;
@@ -121,17 +155,52 @@ err:
return err;
 }
 
-static int __devinit probe(struct pci_dev *pdev,
+/* we could've used the generic pci sysfs stuff for mmap,
+ * but this way we can allow non-privileged users as long
+ * as /dev/uio* has the right permissions
+ */
+static void uio_do_maps(struct uio_pci_generic_dev *gdev)
+{
+   struct pci_dev *pdev = gdev-pdev;
+   struct uio_info *info = gdev-info;
+   int i, j;
+   char *name;
+
+   for (i = 0, j = 0; i  PCI_STD_RESOURCE_END  j  MAX_UIO_MAPS; i++) {
+   if (pci_resource_flags(pdev, i)  IORESOURCE_MEM) {
+   name = kmalloc(8, GFP_KERNEL);
+   if (name == NULL)
+   break;
+   sprintf(name, membar%d, i);
+   info-mem[j].name = name;
+   info-mem[j].addr = pci_resource_start(pdev, i);
+   info-mem[j].size = pci_resource_len(pdev, i);
+   info-mem[j].memtype = UIO_MEM_PHYS;
+   j++;
+   }
+   }
+   for (i = 0, j = 0; i  PCI_STD_RESOURCE_END 
+  j  MAX_UIO_PORT_REGIONS; i++) {
+   if (pci_resource_flags(pdev, i)  IORESOURCE_IO) {
+   name = kmalloc(8, GFP_KERNEL);
+   if (name == NULL)
+   break;
+   sprintf(name, iobar%d, i);
+   info-port[j].name = name;
+   info-port[j].start = pci_resource_start(pdev, i);
+   info-port[j

Re: [PATCH V2] drivers/uio/uio.c: DMA mapping, interrupt extensions, etc.

2010-04-17 Thread Tom Lyon

The current uio and uio_pci_generic allow multiple opens; I was just preserving 
that behavior.

On Saturday 17 April 2010 03:43:09 am Joerg Roedel wrote:
 On Thu, Apr 15, 2010 at 01:55:29PM -0700, Tom Lyon wrote:
 
  +   down(idev-gate);
  +   if (idev-listeners == 0) { /* first open */
  +   if (idev-pmaster  !iommu_found()  !capable(CAP_SYS_RAWIO)) 
  {
  +   up(idev-gate);
  +   return -EPERM;
  +   }
  +   /* reset to known state if we can */
  +   if (idev-pdev)
  +   (void) pci_reset_function(idev-pdev);
  +   }
  +   idev-listeners++;
  +   up(idev-gate);
 
 Why do you want to allow multiple opens for a device? Is it not
 sufficient to allow only one?
 
 
  +   if (idev-domain == NULL) {
  +   if (!list_empty(listener-dm_list))/* no mix with anywhere 
  */
  +   return -EINVAL;
  +   if (!iommu_found())
  +   return -EINVAL;
  +   idev-domain = iommu_domain_alloc();
  +   if (idev-domain == NULL)
  +   return -ENXIO;
  +   idev-cachec = iommu_domain_has_cap(idev-domain,
  +   IOMMU_CAP_CACHE_COHERENCY);
  +   ret = iommu_attach_device(idev-domain, idev-dev-parent);
  +   if (ret) {
  +   iommu_domain_free(idev-domain);
  +   idev-domain = NULL;
  +   printk(KERN_ERR %s: device_attach failed %d\n, 
  __func__, ret);
  +   return ret;
  +   }
  +   }
 
 If userspace calls this path this will make all the addresses mapped
 with DMA-API paths unusable by the device. This doesn't look like a sane
 userspace interface.
 
 For better and more in-depth review I suggest that you split up this
 large patch into a series of smaler which implement specific aspects of
 your work.
 
   Joerg
 
 P.S.: I got these warning when applying your patches ...
 
 Applying: drivers/uio/uio_pci_generic.c: allow access for non-privileged 
 processes
 /home/joro/src/linux.trees.git/.git/rebase-apply/patch:86: trailing 
 whitespace.
 info-mem[j].name = name; 
 /home/joro/src/linux.trees.git/.git/rebase-apply/patch:87: trailing 
 whitespace.
 info-mem[j].addr = pci_resource_start(pdev, i); 
 /home/joro/src/linux.trees.git/.git/rebase-apply/patch:89: trailing 
 whitespace.
 info-mem[j].memtype = UIO_MEM_PHYS; 
 warning: 3 lines add whitespace errors.
 Applying: drivers/uio/uio.c: DMA mapping, interrupt extensions, etc.
 /home/joro/src/linux.trees.git/.git/rebase-apply/patch:315: trailing 
 whitespace.
 ret = iommu_map_range(idev-domain, iova, 
 /home/joro/src/linux.trees.git/.git/rebase-apply/patch:325: trailing 
 whitespace.
 } 
 /home/joro/src/linux.trees.git/.git/rebase-apply/patch:366: trailing 
 whitespace.
  * adjacent pages, but noone seems to really do that. So we squash 
 /home/joro/src/linux.trees.git/.git/rebase-apply/patch:368: trailing 
 whitespace.
  * This works if (a) there is an iommu, or (b) the user allocates 
 /home/joro/src/linux.trees.git/.git/rebase-apply/patch:578: trailing 
 whitespace.
 unsigned int cmd, unsigned long arg) 
 warning: squelched 1 whitespace error
 warning: 6 lines add whitespace errors.
 
 And there are also some coding-style issues.
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH V2] drivers/uio/uio_pci_generic.c: allow access for non-privileged processes

2010-04-15 Thread Tom Lyon

This is the firt of 2 related, but independent, patches. This is for 
uio_pci_generic.c, the next is for uio.c.

The 2 patches were previously one large patch. Changes for this version:
- uio_pci_generic.c just gets extensions so that a single fd can be used
  by non-privileged processes for interrupt control and mmaps
- All of the DMA and IOMMU related stuff move to uio.c; no longer a need
  to pass ioctls to individual uio drivers. It turns out that the code 
  is not PCI specific anyways.
- A new ioctl to pin DMA buffers to certain IO virtual addresses for KVM.
- New eventfd based interrupt notifications, including support for PCI
  specific MSI and MSI-X interrupts.
- PCI specific code to reset PCI functions before and after use

--- linux-2.6.33/drivers/uio/uio_pci_generic.c  2010-02-24 10:52:17.0 
-0800
+++ uio-2.6.33/drivers/uio/uio_pci_generic.c2010-04-15 13:44:25.0 
-0700
@@ -14,9 +14,9 @@
  * # ls -l /sys/bus/pci/devices/:00:19.0/driver
  * .../:00:19.0/driver - ../../../bus/pci/drivers/uio_pci_generic
  *
- * Driver won't bind to devices which do not support the Interrupt Disable Bit
+ * Driver won't bind to devices which do not support MSI, MSI-x, or the 
Interrupt Disable Bit
  * in the command register. All devices compliant to PCI 2.3 (circa 2002) and
- * all compliant PCI Express devices should support this bit.
+ * all compliant PCI Express devices should support one of these.
  */
 
 #include linux/device.h
@@ -41,6 +41,39 @@
return container_of(info, struct uio_pci_generic_dev, info);
 }
 
+/* Read/modify/write command register to disable interrupts.
+ * Note: we could cache the value and optimize the read if there was a way to
+ * get notified of user changes to command register through sysfs.
+ * */
+static void irqtoggle(struct uio_pci_generic_dev *gdev, int irq_on)
+{
+   struct pci_dev *pdev = gdev-pdev;
+   unsigned long flags;
+   u16 orig, new;
+
+   spin_lock_irqsave(gdev-lock, flags);
+   pci_block_user_cfg_access(pdev);
+   pci_read_config_word(pdev, PCI_COMMAND, orig);
+   new = irq_on ? (orig  ~PCI_COMMAND_INTX_DISABLE)
+   : (orig | PCI_COMMAND_INTX_DISABLE);
+   if (new != orig)
+   pci_write_config_word(gdev-pdev, PCI_COMMAND, new);
+   pci_unblock_user_cfg_access(pdev);
+   spin_unlock_irqrestore(gdev-lock, flags);
+}
+
+/* irqcontrol is use by userspace to enable/disable interrupts. */
+/* A privileged app can write the PCI_COMMAND register directly,
+ * but we need this for normal apps
+ */
+static int irqcontrol(struct uio_info *info, s32 irq_on)
+{
+   struct uio_pci_generic_dev *gdev = to_uio_pci_generic_dev(info);
+
+   irqtoggle(gdev, irq_on);
+   return 0;
+}
+
 /* Interrupt handler. Read/modify/write the command register to disable
  * the interrupt. */
 static irqreturn_t irqhandler(int irq, struct uio_info *info)
@@ -89,7 +122,7 @@
 /* Verify that the device supports Interrupt Disable bit in command register,
  * per PCI 2.3, by flipping this bit and reading it back: this bit was readonly
  * in PCI 2.2. */
-static int __devinit verify_pci_2_3(struct pci_dev *pdev)
+static int verify_pci_2_3(struct pci_dev *pdev)
 {
u16 orig, new;
int err = 0;
@@ -121,17 +154,51 @@
return err;
 }
 
-static int __devinit probe(struct pci_dev *pdev,
+/* we could've used the generic pci sysfs stuff for mmap,
+ * but this way we can allow non-privileged users as long
+ * as /dev/uio* has the right permissions
+ */
+static void uio_do_maps(struct uio_pci_generic_dev *gdev)
+{
+   struct pci_dev *pdev = gdev-pdev;
+   struct uio_info *info = gdev-info;
+   int i, j;
+   char *name;
+
+   for (i=0, j=0; iPCI_STD_RESOURCE_END  jMAX_UIO_MAPS; i++) {
+   if (pci_resource_flags(pdev, i)  IORESOURCE_MEM) {
+   name = kmalloc(8, GFP_KERNEL);
+   if (name == NULL)
+   break;
+   sprintf(name, membar%d, i);
+   info-mem[j].name = name; 
+   info-mem[j].addr = pci_resource_start(pdev, i); 
+   info-mem[j].size = pci_resource_len(pdev, i);
+   info-mem[j].memtype = UIO_MEM_PHYS; 
+   j++;
+   }
+   }
+   for (i=0, j=0; iPCI_STD_RESOURCE_END  jMAX_UIO_PORT_REGIONS; i++) {
+   if (pci_resource_flags(pdev, i)  IORESOURCE_IO) {
+   name = kmalloc(8, GFP_KERNEL);
+   if (name == NULL)
+   break;
+   sprintf(name, iobar%d, i);
+   info-port[j].name = name;
+   info-port[j].start = pci_resource_start(pdev, i);
+   info-port[j].size = pci_resource_len(pdev, i);
+   info-port[j].porttype = UIO_PORT_X86;
+   j++;
+   }
+

[PATCH V2] drivers/uio/uio.c: DMA mapping, interrupt extensions, etc.

2010-04-15 Thread Tom Lyon

This is the second of 2 related, but independent, patches. This is for 
uio.c, the previous is for uio_pci_generic.c.

The 2 patches were previously one large patch. Changes for this version:
- uio_pci_generic.c just gets extensions so that a single fd can be used
  by non-privileged processes for interrupt control and mmaps
- All of the DMA and IOMMU related stuff move to uio.c; no longer a need
  to pass ioctls to individual uio drivers. It turns out that the code 
  is not PCI specific anyways.
- A new ioctl to pin DMA buffers to certain IO virtual addresses for KVM.
- New eventfd based interrupt notifications, including support for PCI
  specific MSI and MSI-X interrupts.
- PCI specific code to reset PCI functions before and after use

diff -ruNP linux-2.6.33/drivers/uio/uio.c uio-2.6.33/drivers/uio/uio.c
--- linux-2.6.33/drivers/uio/uio.c  2010-02-24 10:52:17.0 -0800
+++ uio-2.6.33/drivers/uio/uio.c2010-04-15 12:39:02.0 -0700
@@ -23,6 +23,11 @@
 #include linux/string.h
 #include linux/kobject.h
 #include linux/uio_driver.h
+#include linux/mmu_notifier.h
+#include linux/pci.h
+#include linux/iommu.h
+#include linux/eventfd.h
+#include linux/semaphore.h
 
 #define UIO_MAX_DEVICES 255
 
@@ -37,8 +42,37 @@
struct uio_info *info;
struct kobject  *map_dir;
struct kobject  *portio_dir;
+   struct pci_dev  *pdev;
+   int pmaster;
+   struct semaphoregate;
+   int listeners;
+   atomic_tmapcount;
+   struct msix_entry   *msix;
+   int nvec;
+   struct iommu_domain *domain;
+   int cachec;
+   struct eventfd_ctx  *ev_irq;
+   struct eventfd_ctx  *ev_msi;
+   struct eventfd_ctx  **ev_msix;
 };
 
+/*
+ * Structure for keeping track of memory nailed down by the
+ * user for DMA
+ */
+struct dma_map_page {
+struct list_head list;
+struct page **pages;
+   struct scatterlist *sg;
+dma_addr_t  daddr;
+   unsigned long   vaddr;
+   int npage;
+   int rdwr;
+};
+
+static void uio_disable_msi(struct uio_device *);
+static void uio_disable_msix(struct uio_device *);
+
 static int uio_major;
 static DEFINE_IDR(uio_idr);
 static const struct file_operations uio_fops;
@@ -440,17 +474,38 @@
struct uio_device *idev = (struct uio_device *)dev_id;
irqreturn_t ret = idev-info-handler(irq, idev-info);
 
-   if (ret == IRQ_HANDLED)
+   if (ret != IRQ_HANDLED)
+   return ret;
+   if (idev-ev_irq)
+   eventfd_signal(idev-ev_irq, 1);
+   else
uio_event_notify(idev-info);
 
return ret;
 }
 
+/*
+ * MSI and MSI-X Interrupt handler.
+ * Just record an event
+ */
+static irqreturn_t msihandler(int irq, void *arg)
+{
+   struct eventfd_ctx *ctx = arg;
+
+   eventfd_signal(ctx, 1);
+   return IRQ_HANDLED;
+}
+
 struct uio_listener {
struct uio_device *dev;
s32 event_count;
+   struct mm_struct*mm;
+   struct mmu_notifier mmu_notifier;
+   struct list_headdm_list;
 };
 
+static void uio_dma_unmapall(struct uio_listener *);
+
 static int uio_open(struct inode *inode, struct file *filep)
 {
struct uio_device *idev;
@@ -470,7 +525,7 @@
goto out;
}
 
-   listener = kmalloc(sizeof(*listener), GFP_KERNEL);
+   listener = kzalloc(sizeof(*listener), GFP_KERNEL);
if (!listener) {
ret = -ENOMEM;
goto err_alloc_listener;
@@ -478,8 +533,22 @@
 
listener-dev = idev;
listener-event_count = atomic_read(idev-event);
+   INIT_LIST_HEAD(listener-dm_list);
filep-private_data = listener;
 
+   down(idev-gate);
+   if (idev-listeners == 0) { /* first open */
+   if (idev-pmaster  !iommu_found()  !capable(CAP_SYS_RAWIO)) 
{
+   up(idev-gate);
+   return -EPERM;
+   }
+   /* reset to known state if we can */
+   if (idev-pdev)
+   (void) pci_reset_function(idev-pdev);
+   }
+   idev-listeners++;
+   up(idev-gate);
+
if (idev-info-open) {
ret = idev-info-open(idev-info, inode);
if (ret)
@@ -514,6 +583,34 @@
if (idev-info-release)
ret = idev-info-release(idev-info, inode);
 
+   uio_dma_unmapall(listener);
+   if (listener-mm) {
+   mmu_notifier_unregister(listener-mmu_notifier, listener-mm);
+   listener-mm = NULL;
+   }
+
+   down(idev-gate);
+   if (--idev-listeners = 0) {
+   if (idev-msix) {
+   uio_disable_msix(idev);
+   }
+   if (idev-ev_msi) {
+   uio_disable_msi(idev);
+

Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

2010-04-09 Thread Tom Lyon

On Friday 09 April 2010 02:58:19 am Avi Kivity wrote:
 On 04/02/2010 08:05 PM, Greg KH wrote:
 
  Currently kvm does device assignment with its own code, I'd like to unify
  it with uio, not split it off.
 
  Separate notifications for msi-x interrupts are just as useful for uio as
  they are for kvm.
   
  I agree, there should not be a difference here for KVM vs. the normal
  version.
 
 
 Just so you know what you got into, here are the kvm requirements:
 
 - msi interrupts delivered via eventfd (these allow us to inject 
 interrupts from uio to a guest without going through userspace)
Check.
 - nonlinear iommu mapping (i.e. map discontiguous ranges of the device 
 address space into ranges of the virtual address space)
Check.
 - dynamic iommu mapping (support guest memory hotplug)
Check.
 - unprivileged operation once an admin has assigned a device (my 
 preferred implementation is to have all operations go through an fd, 
 which can be passed via SCM_RIGHTS from a privileged application that 
 opens the file)
Check.
 - access to all config space, but BARs must be translated so userspace 
 cannot attack the host
Please elaborate. All of PCI config? All of PCIe config? Seems like a huge mess.
 - some mechanism which allows us to affine device interrupts with their 
 target vcpus (eventually, this is vague)
Do-able.
 - anything mst might add
mst?
 - a pony
Rainbow or glitter?

The 'check' items are already done, not fully tested; probably available next 
week.
Can we leave the others for future patches? Please? And I definitely need help 
with 
the PCI config stuff.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1] uio_pci_generic: extensions to allow access for non-privileged processes

2010-04-09 Thread Tom Lyon

Mea culpa. 

On Friday 09 April 2010 02:08:55 am Joerg Roedel wrote:
 Btw. This patch posting is broken. It suffers from line-wraps which make
 it impossible to apply as-is. I was able to fix it but please consider
 this in your next posting.
 
 On Wed, Mar 31, 2010 at 05:12:35PM -0700, Tom Lyon wrote:
 
  --- linux-2.6.33/drivers/uio/uio_pci_generic.c  2010-02-24 
  10:52:17.0 -0800
 ^
 Unexpected line-wrap.
 
 I also got some whitespace warnings when trying to apply it. Please make
 sure you fix this in the next version too.
 
 Thanks,
 
   Joerg
 
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

2010-04-01 Thread Tom Lyon

On Thursday 01 April 2010 02:09:09 am Avi Kivity wrote:
 On 04/01/2010 03:08 AM, Tom Lyon wrote:
  uio_pci_generic has previously been discussed on the KVM list, but this
  patch has nothing to do with KVM, so it is also going to LKML.

 (needs to go to lkml even if it was for kvm)

  The point of this patch is to beef up the uio_pci_generic driver so that
  a non-privileged user process can run a user level driver for most PCIe
  devices. This can only be safe if there is an IOMMU in the system with
  per-device domains.  Privileged users (CAP_SYS_RAWIO) are allowed if
  there is no IOMMU.
 
  Specifically, I seek to allow low-latency user level network drivers (non
  tcp/ip) which directly access SR-IOV style virtual network adapters, for
  use with packages such as OpenMPI.
 
  Key areas of change:
  - ioctl extensions to allow registration and dma mapping of memory
  regions, with lock accounting
  - support for mmu notifier driven de-mapping

 Note that current iommus/devices don't support restart-on-fault dma, so
 userspace drivers will have to lock memory so that it is not swapped
 out.  I don't think this prevents page migration, though.
The driver provides a way to lock memory for DMA; the mmu notifier support is 
to catch things when the user accidentally frees locked pages.

  - support for MSI and MSI-X interrupts (the intel 82599 VFs support only
  MSI-X)

 How does a userspace program receive those interrupts?
Same as other UIO drivers - by read()ing an event counter.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

2010-04-01 Thread Tom Lyon

On Thursday 01 April 2010 05:52:18 am Joerg Roedel wrote:
 On Wed, Mar 31, 2010 at 05:08:38PM -0700, Tom Lyon wrote:
  uio_pci_generic has previously been discussed on the KVM list, but this
  patch has nothing to do with KVM, so it is also going to LKML.

 But since you send it to the KVM list it should be suitable for KVM too,
 no?
I know not.


  The point of this patch is to beef up the uio_pci_generic driver so that
  a non-privileged user process can run a user level driver for most PCIe
  devices. This can only be safe if there is an IOMMU in the system with
  per-device domains.  Privileged users (CAP_SYS_RAWIO) are allowed if
  there is no IOMMU.

 If you rely on an IOMMU you can use the IOMMU-API instead of the DMA-API
 for dma mappings. This change makes this driver suitable for KVM use
 too. If the interface is designed clever enough we can even use it for
 IOMMU emulation for pass-through devices.
The use with privileged processes and no IOMMUs is still quite useful, so I'd 
rather stick with the DMA interface.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

2010-04-01 Thread Tom Lyon

On Thursday 01 April 2010 07:25:04 am Michael S. Tsirkin wrote:
 On Wed, Mar 31, 2010 at 05:08:38PM -0700, Tom Lyon wrote:
  uio_pci_generic has previously been discussed on the KVM list, but this
  patch has nothing to do with KVM, so it is also going to LKML.
 
  The point of this patch is to beef up the uio_pci_generic driver so that
  a non-privileged user process can run a user level driver for most PCIe
  devices. This can only be safe if there is an IOMMU in the system with
  per-device domains.

 Why? Per-guest domain should be safe enough.
I'm not sure what 'per-guest' means in an ordinary process context.

   Privileged users (CAP_SYS_RAWIO) are allowed if there is
  no IOMMU.

 qemu does not support it, I doubt this last option is worth having.
This is extremely useful in non IOMMU systems - again, we're talking ordinary 
processes, nothing to do with VMs. As long as the program can be trusted, 
e.g., in embedded apps.


  Specifically, I seek to allow low-latency user level network drivers (non
  tcp/ip) which directly access SR-IOV style virtual network adapters, for
  use with packages such as OpenMPI.
 
  Key areas of change:
  - ioctl extensions to allow registration and dma mapping of memory
  regions, with lock accounting
  - support for mmu notifier driven de-mapping
  - support for MSI and MSI-X interrupts (the intel 82599 VFs support only
  MSI-X)
  - allowing interrupt enabling and device register mapping all
  through /dev/uio* so that  permissions may be granted just by chmod
  on /dev/uio*

 For non-priveledged users, we need a way to enforce that
 device is bound to an iommu.
Right now I just use iommu_found - assuming that if we have one, it is in use. 
Something better would be nice.

 Further, locking really needs to be scoped with iommu domain existance
 and with iommu mappings: as long as a page is mapped in iommu,
 it must be locked. This patch does not seem to enforce that.
Sure it does. The DMA API - get_user_pages and dma_map_sg lock pages into the 
MMU and the IOMMU. The MMU notifier unlocks if the user forgets to do it 
explicitly.

 Also note that what we really want is a single iommu domain per guest,
 not per device.
For my networking applications, I will need the ability to talk to multiple 
devices on potentially separate IOMMUs. What would per-guest mean then?


 For this reason, I think we should address the problem somwwhat
 differently:
 - Create a character device to represent the iommu
 - This device will handle memory locking etc
 - Allow binding this device to iommu
 - Allow other operations only after iommu is bound
There are still per-device issues with locking - in particular the size of the 
device's DMA address space. The DMA API already handles this - why not use 
it? It would be nice to have a way to test whether a device is truly covered 
by an IOMMU, but today it appears that if an IOMMU exists, then it covers all 
devices (at least as far as I can see for x86).




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

2010-04-01 Thread Tom Lyon

On Thursday 01 April 2010 08:54:14 am Avi Kivity wrote:
 On 04/01/2010 06:39 PM, Tom Lyon wrote:
  - support for MSI and MSI-X interrupts (the intel 82599 VFs support
  only MSI-X)
 
  How does a userspace program receive those interrupts?
 
  Same as other UIO drivers - by read()ing an event counter.

 IIRC the usual event counter is /dev/uioX, what's your event counter now?
Exact same mechanism.


 kvm really wants the event counter to be an eventfd, that allows hooking
 it directly to kvm (which can inject an interrupt on an eventfd_signal),
 can you adapt your patch to do this?

My patch does not currently go anywhere near the read/fd logic of /dev/uioX.
I think a separate patch would be appropriate.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

2010-04-01 Thread Tom Lyon

On Thursday 01 April 2010 09:07:47 am Joerg Roedel wrote:
 On Thu, Apr 01, 2010 at 08:40:34AM -0700, Tom Lyon wrote:
  On Thursday 01 April 2010 05:52:18 am Joerg Roedel wrote:
The point of this patch is to beef up the uio_pci_generic driver so
that a non-privileged user process can run a user level driver for
most PCIe devices. This can only be safe if there is an IOMMU in the
system with per-device domains.  Privileged users (CAP_SYS_RAWIO) are
allowed if there is no IOMMU.
  
   If you rely on an IOMMU you can use the IOMMU-API instead of the
   DMA-API for dma mappings. This change makes this driver suitable for
   KVM use too. If the interface is designed clever enough we can even use
   it for IOMMU emulation for pass-through devices.
 
  The use with privileged processes and no IOMMUs is still quite useful, so
  I'd rather stick with the DMA interface.

 For the KVM use-case we need to be able to specify the io virtual
 address for a given process virtual address. This is not possible with
 the dma-api interface. So if we want to have uio-dma without an hardware
 iommu we need two distinct interfaces for userspace to cover all
 use-cases. I don't think its worth it to have two interfaces.

   Joerg

I started to add that capability but then realized that the IOMMU API also 
doesn't allow it. The map function allows a range of physically contiguous 
pages, not virtual.

My preferred approach would be to add a DMA_ATTR that would request allocation 
of DMA at a specific device/iommu address.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

2010-04-01 Thread Tom Lyon

On Thursday 01 April 2010 09:10:57 am Avi Kivity wrote:
 On 04/01/2010 07:06 PM, Tom Lyon wrote:
  On Thursday 01 April 2010 08:54:14 am Avi Kivity wrote:
  On 04/01/2010 06:39 PM, Tom Lyon wrote:
  - support for MSI and MSI-X interrupts (the intel 82599 VFs support
  only MSI-X)
 
  How does a userspace program receive those interrupts?
 
  Same as other UIO drivers - by read()ing an event counter.
 
  IIRC the usual event counter is /dev/uioX, what's your event counter
  now?
 
  Exact same mechanism.

 But there are multiple msi-x interrupts, how do you know which one
 triggered?

You don't. This would suck for KVM, I guess, but we'd need major rework of the 
generic UIO stuff to have a separate event channel for each MSI-X.

For my purposes, collapsing all the MSI-Xs into one MSI-look-alike is fine, 
because I'd be using MSI anyways if I could. The weird Intel 82599 VF only 
supports MSI-X.

So one big question is - do we expand the whole UIO framework for KVM 
requirements, or do we split off either KVM or non-VM into a separate driver?
Hans or Greg - care to opine?




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/1] uio_pci_generic: extensions to allow access for non-privileged processes

2010-04-01 Thread Tom Lyon

On Thursday 01 April 2010 08:54:14 am Avi Kivity wrote:
 On 04/01/2010 06:39 PM, Tom Lyon wrote:
  - support for MSI and MSI-X interrupts (the intel 82599 VFs support
  only MSI-X)
 
  How does a userspace program receive those interrupts?
 
  Same as other UIO drivers - by read()ing an event counter.

 IIRC the usual event counter is /dev/uioX, what's your event counter now?

 kvm really wants the event counter to be an eventfd, that allows hooking
 it directly to kvm (which can inject an interrupt on an eventfd_signal),
 can you adapt your patch to do this?

I looked further into eventfds - they seem the perfect solution for the 
MSI/MSI-X interrupts. Will include in V2.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/1] uio_pci_generic: extensions to allow access for non-privileged processes

2010-03-31 Thread Tom Lyon


diff -rupN linux-2.6.33/drivers/uio/uio.c uio-2.6.33/drivers/uio/uio.c
--- linux-2.6.33/drivers/uio/uio.c  2010-02-24 10:52:17.0 -0800
+++ uio-2.6.33/drivers/uio/uio.c2010-03-31 12:26:24.0 -0700
@@ -730,12 +730,24 @@ static int uio_mmap(struct file *filep,
}
 }
 
+static int uio_ioctl(struct inode *inode, struct file *filep,
+   unsigned int cmd, unsigned long arg) 
+{
+   struct uio_listener *listener = filep-private_data;
+   struct uio_device *idev = listener-dev;
+
+   if (idev == NULL || idev-info == NULL || idev-info-ioctl == NULL)
+   return -EINVAL;
+   return idev-info-ioctl(idev-info, cmd, arg);
+}
+
 static const struct file_operations uio_fops = {
.owner  = THIS_MODULE,
.open   = uio_open,
.release= uio_release,
.read   = uio_read,
.write  = uio_write,
+   .ioctl  = uio_ioctl,
.mmap   = uio_mmap,
.poll   = uio_poll,
.fasync = uio_fasync,
diff -rupN linux-2.6.33/drivers/uio/uio_pci_generic.c 
uio-2.6.33/drivers/uio/uio_pci_generic.c
--- linux-2.6.33/drivers/uio/uio_pci_generic.c  2010-02-24 
10:52:17.0 -0800
+++ uio-2.6.33/drivers/uio/uio_pci_generic.c2010-03-31 
16:28:33.0 -0700
@@ -1,4 +1,7 @@
-/* uio_pci_generic - generic UIO driver for PCI 2.3 devices
+/* uio_pci_generic - generic UIO driver for PCI 2.3 and PCIe devices 
+ *
+ * Copyright (C) 2010 Cisco Systems, Inc.
+ * Extensions by Tom Lyon p...@cisco.com
  *
  * Copyright (C) 2009 Red Hat, Inc.
  * Author: Michael S. Tsirkin m...@redhat.com
@@ -14,25 +17,35 @@
  * # ls -l /sys/bus/pci/devices/:00:19.0/driver
  * .../:00:19.0/driver - ../../../bus/pci/drivers/uio_pci_generic
  *
- * Driver won't bind to devices which do not support the Interrupt Disable 
Bit
+ * Driver won't bind to devices which do not support MSI, MSI-x, or the 
Interrupt Disable Bit
  * in the command register. All devices compliant to PCI 2.3 (circa 2002) and
- * all compliant PCI Express devices should support this bit.
+ * all compliant PCI Express devices should support one of these.
  */
 
 #include linux/device.h
 #include linux/module.h
 #include linux/pci.h
+#include linux/list.h
+#include linux/mm.h
+#include linux/mmu_notifier.h
 #include linux/uio_driver.h
 #include linux/spinlock.h
+#include linux/iommu.h
 
-#define DRIVER_VERSION 0.01.0
+#define DRIVER_VERSION 0.02.0
 #define DRIVER_AUTHOR  Michael S. Tsirkin m...@redhat.com
-#define DRIVER_DESCGeneric UIO driver for PCI 2.3 devices
+#define DRIVER_DESCGeneric UIO driver for PCI devices
 
 struct uio_pci_generic_dev {
struct uio_info info;
struct pci_dev *pdev;
spinlock_t lock; /* guards command register accesses */
+   int msi;
+   struct msix_entry *msix;
+   int nvec;
+   struct mm_struct *mm;
+   struct mmu_notifier mmu_notifier;
+   struct list_head dm_list;
 };
 
 static inline struct uio_pci_generic_dev *
@@ -41,6 +54,51 @@ to_uio_pci_generic_dev(struct uio_info *
return container_of(info, struct uio_pci_generic_dev, info);
 }
 
+/* Read/modify/write command register to disable interrupts.
+ * Note: we could cache the value and optimize the read if there was a way to
+ * get notified of user changes to command register through sysfs.
+ * */
+static void irqtoggle(struct uio_pci_generic_dev *gdev, int irq_on)
+{
+   struct pci_dev *pdev = gdev-pdev;
+   unsigned long flags;
+   u16 orig, new;
+
+   spin_lock_irqsave(gdev-lock, flags);
+   pci_block_user_cfg_access(pdev);
+   pci_read_config_word(pdev, PCI_COMMAND, orig);
+   new = irq_on ? (orig  ~PCI_COMMAND_INTX_DISABLE)
+   : (orig | PCI_COMMAND_INTX_DISABLE);
+   if (new != orig)
+   pci_write_config_word(gdev-pdev, PCI_COMMAND, new);
+   pci_unblock_user_cfg_access(pdev);
+   spin_unlock_irqrestore(gdev-lock, flags);
+}
+
+/* irqcontrol is use by userspace to enable/disable interrupts. */
+/* A privileged app can write the PCI_COMMAND register directly,
+ * but we need this for normal apps
+ */
+static int irqcontrol(struct uio_info *info, s32 irq_on)
+{
+   struct uio_pci_generic_dev *gdev = to_uio_pci_generic_dev(info);
+
+   irqtoggle(gdev, irq_on);
+   return 0;
+}
+
+/* MSI and MSI-X Interrupt handler.
+ * For bad devices, we may get an interrupt per event/packet, but most
+ * devices will self-throttle until user driver wants more
+ */
+static irqreturn_t msihandler(int irq, void *arg)
+{
+   struct uio_info *info = arg;
+
+   uio_event_notify(info);
+   return IRQ_HANDLED;
+}
+
 /* Interrupt handler. Read/modify/write the command register to disable
  * the interrupt. */
 static irqreturn_t irqhandler(int irq, struct uio_info *info)
@@ -89,7 +147,7 @@ done:
 /* Verify that the device supports Interrupt Disable bit in command

Re: [BAWUG] T-Mobile (the cellular company)

2002-07-22 Thread Tom Lyon


T-Mobile aka VoiceStream is sharing the Cingular network, so coverage
will be the same.

At 07:29 PM 7/22/2002 -0700, Matt Peterson wrote:
T-Mobile (Deutsche Telekom) has made its west coast attack on Market St
in San Francisco recently.  I briefly poked into the shop (previously a
computer shop that sold Ricochet modems) today, to find some attractive
pricing plans.  1000 whenever wherever minutes along with unlimited
weekend minutes for $39.99 (which includes voicemail, caller id and the
usual digital features).  What scared me was $2/Megabyte for data, ack!
They also don't appear to sell the sexy Sony Ericsson T68i, but they
will SIM them up (for unlocked versions bought on eBay, etc).  Also the
coverage map looks a bit spotty.  Has anyone used their service yet?
How's the Bay Area signal?

P.S. My secret plot is to get a trimode phone, thus worldwide phone
number, only to charge those outrageous international roaming rates to
my employer.  Woo, more Europe and Asia vacations here I come!

P.P.S. Oops, that backfired, need to temporally un-subscribe boss from
the list first.

--
Matt Peterson  Bay Area Wireless Users Group
Founder  http://www.bawug.org/
   --
--
general wireless list, a bawug thing http://www.bawug.org/
[un]subscribe: http://lists.bawug.org/mailman/listinfo/wireless

--
general wireless list, a bawug thing http://www.bawug.org/
[un]subscribe: http://lists.bawug.org/mailman/listinfo/wireless

[e-smith-devinfo] 3ware naked boot: BUG

2001-08-14 Thread Tom Lyon


Hi, I've just been through the process of booting e-smith on
a small DELL server with a 3ware 6200 controller with mirrored
drives. It has been painful. 

The server was totally clean, no OS, no IDE drives. e-smith
seems to really want to see /dev/hda since there were several
timeout/error messages. But the biggest problem/BUG was that
the lilo.conf file and the MBR were created to point at
/dev/hda instead of /dev/sda. Once I fixed that (after learning
some Linux), everything seems fine.

p.s. booted from CDROM

__
Do You Yahoo!?
Make international calls for as low as $.04/minute with Yahoo! Messenger
http://phonecard.yahoo.com/

--
Please report bugs to [EMAIL PROTECTED]
Please mail [EMAIL PROTECTED] (only) to discuss security issues
Support for registered customers and partners to [EMAIL PROTECTED]
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Archives by mail and http://www.mail-archive.com/devinfo%40lists.e-smith.org

64 matches

Mail list logo