Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation

2010-10-29 Thread Michael S. Tsirkin
On Thu, Oct 28, 2010 at 12:32:35PM -0700, Shirley Ma wrote:
 On Thu, 2010-10-28 at 07:20 +0200, Michael S. Tsirkin wrote:
  My concern is this can delay signalling for unlimited time.
  Could you pls test this with guests that do not have
  2b5bbe3b8bee8b38bdc27dd9c0270829b6eb7eeb
  b0c39dbdc204006ef3558a66716ff09797619778
  that is 2.6.31 and older? 
 
 The patch only delays signaling for an unlimited time when there is no
 TX packet to transmit. I thought TX signaling only notifies the guest to
 release the used buffers; is there anything else besides this?

Right, that's it I think. For newer kernels we orphan the skb
on xmit so we don't care that much about completing them.
This does rely on an undocumented assumption about guest
behaviour though.

 I tested a rhel5u5 guest (2.6.18 kernel) and it works fine. I checked the
 logs of the two commits, and I don't think this patch could cause any issue
 on guests without these two patches.
 
 Also, I found a big TX regression between the old guest and the new guest.
 For the old guest, I am able to get almost 11Gb/s for 2K message size, but
 for the new guest kernel, I can only get 3.5 Gb/s with the patch and the
 same host. I will dig into why.
 
 thanks
 Shirley
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation

2010-10-29 Thread Michael S. Tsirkin
On Thu, Oct 28, 2010 at 10:14:22AM -0700, Shirley Ma wrote:
 
  Two ideas:
  1. How about writing out used, just delaying the signal?
 This way we don't have to queue separately.
 
 This improves performance somewhat, but not as much as delaying
 both used and signal, since delaying used buffers combines
 multiple small copies into one large copy, which saves more CPU
 and increases BW a bit.

Hmm. I don't yet understand. We are still doing copies into the per-vq
buffer, and the data copied is really small.  Is it about cache line
bounces?  Could you try figuring it out?

  2. How about flushing out queued stuff before we exit
 the handle_tx loop? That would address most of
 the spec issue. 
 
 The performance is almost the same as with the previous patch. I will resubmit
 the modified one, adding vhost_add_used_and_signal_n after the handle_tx
 loop to process the pending queue.
 
 This patch was a part of modified macvtap zero copy which I haven't
 submitted yet. I found this helped vhost TX in general. This pending
 queue will be used by DMA done later, so I put it in vq instead of a
 local variable in handle_tx.
 
 Thanks
 Shirley

BTW why do we need another array? Isn't heads field exactly what we need
here?
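
For readers following the thread, the batching idea discussed above (queue
used heads, signal the guest once per batch, and flush whatever is still
pending before leaving the handle_tx loop) can be sketched roughly as below.
This is a standalone model, not the vhost code: tx_batch, batch_add,
batch_flush, handle_tx_model and the threshold BATCH_MAX are made-up names
for illustration, and the counters stand in for what
vhost_add_used_and_signal_n would do.

```c
#include <stddef.h>

#define BATCH_MAX 4  /* hypothetical flush threshold, not a vhost constant */

/* Toy model of a per-virtqueue list of used buffer heads. */
struct tx_batch {
    unsigned pending[BATCH_MAX]; /* heads queued but not yet reported */
    size_t   count;              /* number of valid entries in pending[] */
    size_t   used_total;         /* heads reported to the "guest" so far */
    size_t   signals;            /* how many times the guest was signalled */
};

/* Report all pending heads with a single signal. Does nothing when the
 * list is empty, so calling it on loop exit is always safe. */
static void batch_flush(struct tx_batch *b)
{
    if (b->count == 0)
        return;
    b->used_total += b->count;
    b->signals++;
    b->count = 0;
}

/* Queue one used head and flush as soon as the batch is full. */
static void batch_add(struct tx_batch *b, unsigned head)
{
    b->pending[b->count++] = head;
    if (b->count == BATCH_MAX)
        batch_flush(b);
}

/* Model of the TX loop: queue n heads, then flush the leftovers on exit
 * so signalling is never delayed past the end of the loop. */
static void handle_tx_model(struct tx_batch *b, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        batch_add(b, i);
    batch_flush(b);
}
```

With BATCH_MAX at 4, pushing 10 heads through handle_tx_model() reports all
10 but signals only three times (4, 4, then the 2 left over at loop exit).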



Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation

2010-10-29 Thread Michael S. Tsirkin
On Thu, Oct 28, 2010 at 02:40:50PM -0700, Shirley Ma wrote:
 On Thu, 2010-10-28 at 14:04 -0700, Sridhar Samudrala wrote:
  It would be some change in the virtio-net driver that may have improved
  the latency of small messages, which in turn would have reduced the
  bandwidth, as TCP could not accumulate and send large packets.
 
 I will check for any latency improvement patch in virtio_net. If that's
 the case, would it be good to have a tunable parameter to benefit
 both BW and latency workloads?
 
 Shirley 

No, we need it to work well automatically somehow.

-- 
MST


Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation

2010-10-29 Thread Michael S. Tsirkin
On Thu, Oct 28, 2010 at 01:13:55PM -0700, Shirley Ma wrote:
 On Thu, 2010-10-28 at 12:32 -0700, Shirley Ma wrote:
  Also, I found a big TX regression between the old guest and the new guest.
  For the old guest, I am able to get almost 11Gb/s for 2K message size, but
  for the new guest kernel, I can only get 3.5 Gb/s with the patch and the
  same host. I will dig into why.
 
 The regression is from the guest kernel, not from this patch. I tested the
 2.6.31 kernel; its performance is already less than 2Gb/s for 2K message size.
 I will resubmit the patch for review.
 
 I will start testing from the 2.6.30 kernel to figure out when the TX
 regression was introduced in virtio_net. Any suggestion on which guest
 kernel I should test to track down this regression?
 
 Thanks
 Shirley

git bisect start v2.6.31 v2.6.30
and go from there.


Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-29 Thread Michael S. Tsirkin
On Thu, Oct 28, 2010 at 12:48:57PM +0530, Krishna Kumar2 wrote:
  Krishna Kumar2/India/IBM wrote on 10/28/2010 10:44:14 AM:
 
  Results for UDP BW tests (unidirectional, sum across
  3 iterations, each iteration of 45 seconds, default
  netperf, vhosts bound to cpus 0-3; no other tuning):

 Is binding vhost threads to CPUs really required?
 What happens if we let the scheduler do its job?
   
Nothing drastic, I remember BW% and SD% both improved a
bit as a result of binding.
  
   If there's a significant improvement this would mean that
   we need to rethink the vhost-net interaction with the scheduler.
 
  I will get a test run with and without binding and post the
  results later today.
 
 Correction: The result with binding is much better for
 SD/CPU compared to without-binding:

Can you pls try finding out why that is?  Is some thread bouncing between
CPUs?  Does a wrong numa node get picked up?
In practice users are very unlikely to pin threads to CPUs.

-- 
MST


[PATCH] kvm: Create an eventfd mechanism for EOIs to get to userspace

2010-10-29 Thread Alex Williamson
To support VFIO based device assignment, we need to be able to get
an EOI out of the KVM irqchip.  This introduces a mechanism to do
that by registering an eventfd to be signaled when the IRQ is ACKed.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
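
As a userspace-side sanity check of the proposed ABI, the sketch below
mirrors struct kvm_eoi and the two flag bits from the diff and builds a
zero-padded request. It assumes the flags are bit 0 (DEASSIGN) and bit 1
(DEASSERT), as the macro names suggest. The _sketch / _SKETCH names and
make_eoi_request() are local stand-ins for illustration, not part of the
real header.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed flag bits (assumed: bit 0 and bit 1). */
#define EOI_FLAG_DEASSIGN_SKETCH (1u << 0)
#define EOI_FLAG_DEASSERT_SKETCH (1u << 1)

/* Local mirror of the proposed struct kvm_eoi: three 32-bit fields
 * followed by 20 bytes of padding reserved for future use. */
struct kvm_eoi_sketch {
    uint32_t fd;
    uint32_t gsi;
    uint32_t flags;
    uint8_t  pad[20];
};

/* Build a request for the given eventfd and GSI with the pad zeroed. */
static struct kvm_eoi_sketch make_eoi_request(int fd, uint32_t gsi,
                                              uint32_t flags)
{
    struct kvm_eoi_sketch req;

    memset(&req, 0, sizeof(req));  /* keep the reserved bytes at zero */
    req.fd = (uint32_t)fd;
    req.gsi = gsi;
    req.flags = flags;
    return req;
}
```

The structure comes out at 32 bytes (12 bytes of fields plus 20 of pad),
which leaves room to grow without changing the ioctl size.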

 include/linux/kvm.h  |   13 ++
 include/linux/kvm_host.h |6 +++
 virt/kvm/eventfd.c   |   95 ++
 virt/kvm/kvm_main.c  |8 
 4 files changed, 122 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..92d5b27 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -541,6 +541,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
 #define KVM_CAP_ASYNC_PF 59
+#define KVM_CAP_EOI_EVENTFD 60
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -620,6 +621,16 @@ struct kvm_clock_data {
__u32 pad[9];
 };
 
+#define KVM_EOI_EVENTFD_FLAG_DEASSIGN (1 << 0)
+#define KVM_EOI_EVENTFD_FLAG_DEASSERT (1 << 1)
+
+struct kvm_eoi {
+   __u32 fd;
+   __u32 gsi;
+   __u32 flags;
+   __u8  pad[20];
+};
+
 /*
  * ioctls for VM fds
  */
@@ -677,6 +688,8 @@ struct kvm_clock_data {
 #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
 /* Available with KVM_CAP_PPC_GET_PVINFO */
 #define KVM_PPC_GET_PVINFO   _IOW(KVMIO,  0xa1, struct kvm_ppc_pvinfo)
+/* Available with KVM_CAP_EOI_EVENTFD */
+#define KVM_EOI_EVENTFD   _IOW(KVMIO,  0xa2, struct kvm_eoi)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ee4314e..5d50a7e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -227,6 +227,7 @@ struct kvm {
struct list_head  items;
} irqfds;
struct list_head ioeventfds;
+   struct list_head eoi_eventfds;
 #endif
struct kvm_vm_stat stat;
struct kvm_arch arch;
@@ -643,6 +644,7 @@ void kvm_eventfd_init(struct kvm *kvm);
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
 void kvm_irqfd_release(struct kvm *kvm);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+int kvm_eoi_eventfd(struct kvm *kvm, struct kvm_eoi *eoi);
 
 #else
 
@@ -658,6 +660,10 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
return -ENOSYS;
 }
 
+static inline int kvm_eoi_eventfd(struct kvm *kvm, struct kvm_eoi *eoi)
+{
+   return -ENOSYS;
+}
 #endif /* CONFIG_HAVE_KVM_EVENTFD */
 
 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c1f1e3c..3dbfb21 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -253,6 +253,7 @@ kvm_eventfd_init(struct kvm *kvm)
    spin_lock_init(&kvm->irqfds.lock);
    INIT_LIST_HEAD(&kvm->irqfds.items);
    INIT_LIST_HEAD(&kvm->ioeventfds);
+   INIT_LIST_HEAD(&kvm->eoi_eventfds);
 }
 
 /*
@@ -586,3 +587,97 @@ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 
return kvm_assign_ioeventfd(kvm, args);
 }
+
+/*
+ * 
+ *  eoi_eventfd: Translate KVM APIC/IOAPIC EOI into eventfd signal.
+ *
+ *  userspace can register GSIs with an eventfd for receiving notification
+ *  when an EOI occurs.
+ * 
+ */
+
+struct _eoi_eventfd {
+   struct list_head            list;
+   struct kvm                  *kvm;
+   struct eventfd_ctx          *eventfd;
+   bool                        deassert;
+   struct kvm_irq_ack_notifier notifier;
+};
+
+static void kvm_eoi_eventfd_acked(struct kvm_irq_ack_notifier *notifier)
+{
+   struct _eoi_eventfd *p;
+
+   p = container_of(notifier, struct _eoi_eventfd, notifier);
+
+   if (p->deassert)
+       kvm_set_irq(p->kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
+                   notifier->gsi, 0);
+
+   eventfd_signal(p->eventfd, 1);
+}
+
+static int kvm_assign_eoi_eventfd(struct kvm *kvm, struct kvm_eoi *eoi)
+{
+   struct eventfd_ctx *eventfd;
+   struct _eoi_eventfd *p;
+
+   eventfd = eventfd_ctx_fdget(eoi->fd);
+   if (IS_ERR(eventfd))
+       return PTR_ERR(eventfd);
+
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   if (!p) {
+       eventfd_ctx_put(eventfd);
+       return -ENOMEM;
+   }
+
+   INIT_LIST_HEAD(&p->list);
+   p->kvm = kvm;
+   p->eventfd = eventfd;
+   p->deassert = !!(eoi->flags & KVM_EOI_EVENTFD_FLAG_DEASSERT);
+
+   p->notifier.gsi = eoi->gsi;
+   p->notifier.irq_acked = kvm_eoi_eventfd_acked;
+
+   list_add_tail(&p->list, &kvm->eoi_eventfds);
+   kvm_register_irq_ack_notifier(kvm, &p->notifier);
+
+   return 0;
+}
+
+static int kvm_deassign_eoi_eventfd(struct kvm *kvm, struct kvm_eoi *eoi)
+{
+   struct eventfd_ctx *eventfd;
+   struct _eoi_eventfd *p, *tmp;
+   int ret = -ENOENT;
+
+   eventfd = eventfd_ctx_fdget(eoi->fd);
+   

Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-29 Thread linux_kvm
On Fri, 29 Oct 2010 13:26 +0200, Michael S. Tsirkin m...@redhat.com
wrote:
 On Thu, Oct 28, 2010 at 12:48:57PM +0530, Krishna Kumar2 wrote:
   Krishna Kumar2/India/IBM wrote on 10/28/2010 10:44:14 AM:
 In practice users are very unlikely to pin threads to CPUs.

I may be misunderstanding what you're referring to. It caught my
attention since I'm working on a configuration to do what you say is
unlikely, so I'll chime in for what it's worth.

An option in Vyatta allows assigning CPU affinity to network adapters,
since apparently separate L2 caches can have a significant impact on
throughput.

Although much of their focus seems to be on commercial virtualization
platforms, I do see quite a few forum posts with regard to KVM.
Maybe this still qualifies as an edge case, but as for virtualized
routing theirs seems to offer the most functionality.

http://www.vyatta.org/forum/viewtopic.php?t=2697

-cb


[PATCH] pci: Add callbacks to support retrieving and updating interrupts

2010-10-29 Thread Alex Williamson
For device assignment, we need to be able to retrieve the IRQ that
the device interrupt pin is assigned to and be notified if that
mapping changes (this happens via ACPI interrupt link updates on
x86).  We can then make use of the IRQ for EOI notification.
Current qemu-kvm device assignment code invades common code with
some hard coded hacks to achieve this.  This attempts to architect
the solution.

Chipset components responsible for interrupt mapping can call
pci_bridge_update_irqs() to signal when interrupt mapping may have
changed.  The bridge for those chipsets should then implement a
get_irq callback.  This is stubbed out for everybody as I only
know how PIIX3 works.  Devices wishing to be notified about IRQ
updates can register via pci_register_update_irqs(), where they
can then check mappings with pci_get_irq().

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
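
To make the lookup described above concrete, here is a small standalone
model of how pci_get_irq() walks from a device up through its parent
bridges, applying each bus's map_irq, until it reaches a bus that
implements get_irq. The types and helpers (dev_model, bus_model, swizzle,
table_get_irq, lookup_irq) are simplified stand-ins invented for this
sketch, and the swizzle formula is only a placeholder for whatever a real
map_irq does.

```c
#include <errno.h>
#include <stddef.h>

struct bus_model;

/* A device sits on a bus; a bridge device is also the parent of a bus. */
struct dev_model {
    struct bus_model *bus;  /* bus this device is plugged into */
    int slot;               /* only used by the toy swizzle below */
};

struct bus_model {
    struct dev_model *parent_dev;            /* bridge, NULL at the root */
    int (*map_irq)(struct dev_model *dev, int pin);
    int (*get_irq)(void *opaque, int pin);   /* NULL if not implemented */
    void *opaque;
};

/* Placeholder pin mapping: rotate the pin by the slot number. */
static int swizzle(struct dev_model *dev, int pin)
{
    return (pin + dev->slot) % 4;
}

/* Chipset-level lookup: opaque points at a 4-entry pin-to-IRQ table. */
static int table_get_irq(void *opaque, int pin)
{
    const int *table = opaque;
    return table[pin];
}

/* Walk up the hierarchy until some bus can answer get_irq. Returns
 * -ENOSYS if the top is reached without finding such a bus. */
static int lookup_irq(struct dev_model *dev, int pin)
{
    struct bus_model *bus;

    for (;;) {
        if (!dev || !dev->bus)
            return -ENOSYS;
        bus = dev->bus;
        pin = bus->map_irq(dev, pin);
        if (bus->get_irq)
            break;
        dev = bus->parent_dev;
    }
    return bus->get_irq(bus->opaque, pin);
}
```

A device behind a bridge therefore has its pin remapped once per bus level
before the root's table is consulted, which is why a chipset only needs to
provide get_irq on the bus it owns.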

 hw/apb_pci.c   |4 ++--
 hw/bonito.c|2 +-
 hw/grackle_pci.c   |2 +-
 hw/gt64xxx.c   |5 +++--
 hw/pci.c   |   52 +++-
 hw/pci.h   |   16 
 hw/piix_pci.c  |   23 ++-
 hw/ppc4xx_pci.c|2 +-
 hw/ppce500_pci.c   |2 +-
 hw/prep_pci.c  |2 +-
 hw/sh_pci.c|2 +-
 hw/unin_pci.c  |   10 ++
 hw/versatile_pci.c |2 +-
 13 files changed, 99 insertions(+), 25 deletions(-)

diff --git a/hw/apb_pci.c b/hw/apb_pci.c
index 0ecac55..47ff0d9 100644
--- a/hw/apb_pci.c
+++ b/hw/apb_pci.c
@@ -336,8 +336,8 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base,
     d = FROM_SYSBUS(APBState, s);
 
     d->bus = pci_register_bus(&d->busdev.qdev, "pci",
-                              pci_apb_set_irq, pci_pbm_map_irq, d,
-                              0, 32);
+                              pci_apb_set_irq, NULL,
+                              pci_pbm_map_irq, d, 0, 32);
     pci_bus_set_mem_base(d->bus, mem_base);
 
     for (i = 0; i < 32; i++) {
diff --git a/hw/bonito.c b/hw/bonito.c
index dcf0311..d2869bb 100644
--- a/hw/bonito.c
+++ b/hw/bonito.c
@@ -772,7 +772,7 @@ PCIBus *bonito_init(qemu_irq *pic)
     dev = qdev_create(NULL, "Bonito-pcihost");
     pcihost = FROM_SYSBUS(BonitoState, sysbus_from_qdev(dev));
     b = pci_register_bus(&pcihost->busdev.qdev, "pci", pci_bonito_set_irq,
-                         pci_bonito_map_irq, pic, 0x28, 32);
+                         NULL, pci_bonito_map_irq, pic, 0x28, 32);
     pcihost->bus = b;
     qdev_init_nofail(dev);
 
diff --git a/hw/grackle_pci.c b/hw/grackle_pci.c
index 91c755f..e747d7e 100644
--- a/hw/grackle_pci.c
+++ b/hw/grackle_pci.c
@@ -89,7 +89,7 @@ PCIBus *pci_grackle_init(uint32_t base, qemu_irq *pic)
     s = sysbus_from_qdev(dev);
     d = FROM_SYSBUS(GrackleState, s);
     d->host_state.bus = pci_register_bus(&d->busdev.qdev, "pci",
-                                         pci_grackle_set_irq,
+                                         pci_grackle_set_irq, NULL,
                                          pci_grackle_map_irq,
                                          pic, 0, 4);
 
diff --git a/hw/gt64xxx.c b/hw/gt64xxx.c
index cabf7ea..2a0fc4a 100644
--- a/hw/gt64xxx.c
+++ b/hw/gt64xxx.c
@@ -1114,8 +1114,9 @@ PCIBus *pci_gt64120_init(qemu_irq *pic)
     s->pci = qemu_mallocz(sizeof(GT64120PCIState));
 
     s->pci->bus = pci_register_bus(NULL, "pci",
-                                   pci_gt64120_set_irq, pci_gt64120_map_irq,
-                                   pic, PCI_DEVFN(18, 0), 4);
+                                   pci_gt64120_set_irq, NULL,
+                                   pci_gt64120_map_irq, pic,
+                                   PCI_DEVFN(18, 0), 4);
     s->ISD_handle = cpu_register_io_memory(gt64120_read, gt64120_write, s);
     d = pci_register_device(s->pci->bus, "GT64120 PCI Bus", sizeof(PCIDevice),
                             0, NULL, NULL);
diff --git a/hw/pci.c b/hw/pci.c
index 1280d4d..645b119 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -41,6 +41,7 @@ struct PCIBus {
 BusState qbus;
 int devfn_min;
 pci_set_irq_fn set_irq;
+pci_get_irq_fn get_irq;
 pci_map_irq_fn map_irq;
 pci_hotplug_fn hotplug;
 DeviceState *hotplug_qdev;
@@ -139,6 +140,23 @@ static void pci_change_irq_level(PCIDevice *pci_dev, int irq_num, int change)
     bus->set_irq(bus->irq_opaque, irq_num, bus->irq_count[irq_num] != 0);
 }
 
+int pci_get_irq(PCIDevice *pci_dev, int pin)
+{
+    PCIBus *bus;
+    for (;;) {
+        if (!pci_dev)
+            return -ENOSYS;
+        bus = pci_dev->bus;
+        if (!bus)
+            return -ENOSYS;
+        pin = bus->map_irq(pci_dev, pin);
+        if (bus->get_irq)
+            break;
+        pci_dev = bus->parent_dev;
+    }
+    return bus->get_irq(bus->irq_opaque, pin);
+}
+
 /* Update interrupt status bit in config space on interrupt
  * state change. */
 static void pci_update_irq_status(PCIDevice *dev)
@@ -260,10 +278,11 @@ PCIBus 

Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation

2010-10-29 Thread Shirley Ma
On Fri, 2010-10-29 at 10:10 +0200, Michael S. Tsirkin wrote:
 Hmm. I don't yet understand. We are still doing copies into the per-vq
 buffer, and the data copied is really small.  Is it about cache line
 bounces?  Could you try figuring it out?

The per-vq buffer is much less expensive than 3 put_copy() calls. I will
collect the profiling data to show that.

   2. How about flushing out queued stuff before we exit
  the handle_tx loop? That would address most of
  the spec issue. 
  
  The performance is almost the same as with the previous patch. I will resubmit
  the modified one, adding vhost_add_used_and_signal_n after the handle_tx
  loop to process the pending queue.
  
  This patch was a part of modified macvtap zero copy which I haven't
  submitted yet. I found this helped vhost TX in general. This pending
  queue will be used by DMA done later, so I put it in vq instead of a
  local variable in handle_tx.
  
  Thanks
  Shirley
 
 BTW why do we need another array? Isn't heads field exactly what we need
 here?

The heads field only holds up to 32 entries, and the test results show that
the more used buffers are accumulated per add-and-signal, the better the
performance. That was one of the reasons I didn't use heads. The other reason
was that I use this buffer for pending DMA completions in the macvtap zero
copy patch. It could hold up to vq->num entries in the worst case.

Thanks
Shirley



[PATCH 0/2] Minimal RAM API support

2010-10-29 Thread Alex Williamson
For VFIO based device assignment, we need to know what guest memory
areas are actual RAM.  RAMBlocks have long since become a grab bag
of misc allocations, so aren't effective for this.  Anthony has had
a RAM API in mind for a while now that addresses this problem.  This
implements just enough of it so that we have an interface to get
actual guest memory physical addresses to setup the host IOMMU.  We
can continue building a full RAM API on top of this stub.

Anthony, feel free to add copyright to memory.c as it's based on
your initial implementation.  I had to add something since the file
in your branch just copies a header with Fabrice's copyright.
Thanks,

Alex

---

Alex Williamson (2):
  RAM API: Make use of it for x86 PC
  Minimal RAM API support


 Makefile.target |1 +
 cpu-common.h|2 +
 hw/pc.c |   12 
 memory.c|   82 +++
 memory.h|   23 +++
 5 files changed, 114 insertions(+), 6 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h


[PATCH 1/2] Minimal RAM API support

2010-10-29 Thread Alex Williamson
This adds a minimum chunk of Anthony's RAM API support so that we
can identify actual VM RAM versus all the other things that make
use of qemu_ram_alloc.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
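
The bookkeeping this patch introduces boils down to a registry of
non-overlapping guest-physical ranges that are known to be real RAM. The
standalone model below shows that idea under some simplifying assumptions:
it uses a fixed-size table instead of a QLIST, returns -1 where the patch
would abort() or assert(), and all names (ram_slot_model, overlaps,
slot_register, slot_unregister, is_ram, MAX_SLOTS) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLOTS 8  /* arbitrary capacity for the sketch */

struct ram_slot_model {
    uint64_t start;
    uint64_t size;
    bool     in_use;
};

static struct ram_slot_model slots[MAX_SLOTS];

/* True if [a, a+alen) and [b, b+blen) share at least one byte.
 * Both lengths must be non-zero. */
static bool overlaps(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
    uint64_t a_last = a + alen - 1;
    uint64_t b_last = b + blen - 1;

    return !(b_last < a || a_last < b);
}

/* Register a RAM range. Returns 0 on success, -1 if size is zero, the
 * range collides with an existing slot, or the table is full. */
static int slot_register(uint64_t start, uint64_t size)
{
    int free_idx = -1;

    if (size == 0)
        return -1;
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (!slots[i].in_use) {
            if (free_idx < 0)
                free_idx = i;
            continue;
        }
        if (overlaps(start, size, slots[i].start, slots[i].size))
            return -1;
    }
    if (free_idx < 0)
        return -1;
    slots[free_idx] = (struct ram_slot_model){ start, size, true };
    return 0;
}

/* Remove a slot by exact (start, size) match. Returns 0 or -1. */
static int slot_unregister(uint64_t start, uint64_t size)
{
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (slots[i].in_use && slots[i].start == start &&
            slots[i].size == size) {
            slots[i].in_use = false;
            return 0;
        }
    }
    return -1;
}

/* The query a device-assignment user would care about: is addr RAM? */
static bool is_ram(uint64_t addr)
{
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (slots[i].in_use &&
            overlaps(addr, 1, slots[i].start, slots[i].size))
            return true;
    }
    return false;
}
```

Keeping the slots disjoint is what lets a consumer such as an IOMMU mapper
iterate the registry and map each range exactly once.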

 Makefile.target |1 +
 cpu-common.h|2 +
 memory.c|   82 +++
 memory.h|   23 +++
 4 files changed, 108 insertions(+), 0 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h

diff --git a/Makefile.target b/Makefile.target
index c48cbcc..e4e2eb4 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -175,6 +175,7 @@ obj-$(CONFIG_VIRTFS) += virtio-9p.o
 obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
+obj-y += memory.o
 LIBS+=-lz
 
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
diff --git a/cpu-common.h b/cpu-common.h
index a543b5d..6aa2738 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -23,6 +23,8 @@
 /* address in the RAM (different from a physical address) */
 typedef unsigned long ram_addr_t;
 
+#include "memory.h"
+
 /* memory API */
 
 typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, uint32_t value);
diff --git a/memory.c b/memory.c
new file mode 100644
index 000..86947fb
--- /dev/null
+++ b/memory.c
@@ -0,0 +1,82 @@
+/*
+ * RAM API
+ *
+ *  Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamson alex.william...@redhat.com
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/.
+ */
+#include "memory.h"
+#include "range.h"
+
+QemuRamSlots ram_slots = { .slots = QLIST_HEAD_INITIALIZER(ram_slots) };
+
+static QemuRamSlot *qemu_ram_find_slot(target_phys_addr_t start_addr,
+                                       ram_addr_t size)
+{
+    QemuRamSlot *slot;
+
+    QLIST_FOREACH(slot, &ram_slots.slots, next) {
+        if (slot->start_addr == start_addr && slot->size == size) {
+            return slot;
+        }
+
+        if (ranges_overlap(start_addr, size, slot->start_addr, slot->size)) {
+            abort();
+        }
+    }
+
+    return NULL;
+}
+
+void qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+                       ram_addr_t phys_offset)
+{
+    QemuRamSlot *slot;
+
+    if (!size) {
+        return;
+    }
+
+    assert(!qemu_ram_find_slot(start_addr, size));
+
+    slot = qemu_mallocz(sizeof(QemuRamSlot));
+
+    slot->start_addr = start_addr;
+    slot->size = size;
+    slot->offset = phys_offset;
+
+    QLIST_INSERT_HEAD(&ram_slots.slots, slot, next);
+
+    cpu_register_physical_memory(slot->start_addr, slot->size, slot->offset);
+}
+
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
+{
+    QemuRamSlot *slot;
+
+    if (!size) {
+        return;
+    }
+
+    slot = qemu_ram_find_slot(start_addr, size);
+    assert(slot != NULL);
+
+    QLIST_REMOVE(slot, next);
+    cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
+
+    return;
+}
diff --git a/memory.h b/memory.h
new file mode 100644
index 000..91e552e
--- /dev/null
+++ b/memory.h
@@ -0,0 +1,23 @@
+#ifndef QEMU_MEMORY_H
+#define QEMU_MEMORY_H
+
+#include "qemu-common.h"
+#include "cpu-common.h"
+
+typedef struct QemuRamSlot {
+    target_phys_addr_t start_addr;
+    ram_addr_t size;
+    ram_addr_t offset;
+    void *host;
+    QLIST_ENTRY(QemuRamSlot) next;
+} QemuRamSlot;
+
+typedef struct QemuRamSlots {
+    QLIST_HEAD(slots, QemuRamSlot) slots;
+} QemuRamSlots;
+extern QemuRamSlots ram_slots;
+
+void qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+                       ram_addr_t phys_offset);
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size);
+#endif



[PATCH 2/2] RAM API: Make use of it for x86 PC

2010-10-29 Thread Alex Williamson
Register the actual VM RAM using the new API

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/pc.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 69b13bf..0ea6d10 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -912,14 +912,14 @@ void pc_memory_init(ram_addr_t ram_size,
 /* allocate RAM */
 ram_addr = qemu_ram_alloc(NULL, pc.ram,
   below_4g_mem_size + above_4g_mem_size);
-cpu_register_physical_memory(0, 0xa, ram_addr);
-cpu_register_physical_memory(0x10,
- below_4g_mem_size - 0x10,
- ram_addr + 0x10);
+
+qemu_ram_register(0, 0xa, ram_addr);
+qemu_ram_register(0x10, below_4g_mem_size - 0x10,
+  ram_addr + 0x10);
 #if TARGET_PHYS_ADDR_BITS  32
 if (above_4g_mem_size  0) {
-cpu_register_physical_memory(0x1ULL, above_4g_mem_size,
- ram_addr + below_4g_mem_size);
+qemu_ram_register(0x1ULL, above_4g_mem_size,
+  ram_addr + below_4g_mem_size);
 }
 #endif
 



[PATCH] exec: Implement qemu_ram_free_from_ptr()

2010-10-29 Thread Alex Williamson
Required for regions mapped via qemu_ram_alloc_from_ptr().  VFIO
will make use of this to remove mappings when devices are hot
unplugged.  (Current callers of qemu_ram_alloc_from_ptr() probably
need this too.)

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
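
The point of a separate "free from ptr" path is that the backing memory was
supplied by the caller, so only the bookkeeping entry may be released. A
minimal standalone model of that ownership rule follows; it uses a plain
singly linked list rather than QLIST, and block_model, block_add and
block_free_from_ptr are illustrative names, not QEMU functions.

```c
#include <stdint.h>
#include <stdlib.h>

struct block_model {
    uint64_t offset;            /* key used for lookup */
    void *host;                 /* caller-owned memory, never freed here */
    struct block_model *next;
};

static struct block_model *blocks;  /* head of the list */

/* Track a caller-supplied region. Returns the new entry or NULL. */
static struct block_model *block_add(uint64_t offset, void *host)
{
    struct block_model *b = calloc(1, sizeof(*b));

    if (!b)
        return NULL;
    b->offset = offset;
    b->host = host;
    b->next = blocks;
    blocks = b;
    return b;
}

/* Drop the entry whose offset matches. Returns 1 if one was removed,
 * 0 if nothing matched. The host pointer is deliberately left alone. */
static int block_free_from_ptr(uint64_t offset)
{
    for (struct block_model **pp = &blocks; *pp; pp = &(*pp)->next) {
        if ((*pp)->offset == offset) {
            struct block_model *victim = *pp;

            *pp = victim->next;
            free(victim);  /* bookkeeping only; victim->host untouched */
            return 1;
        }
    }
    return 0;
}
```

After block_free_from_ptr() the caller's buffer is still valid and can be
unmapped or reused by whoever handed it in.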

 cpu-common.h |1 +
 exec.c   |   13 +
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index a543b5d..8a3d1da 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -43,6 +43,7 @@ ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
 ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
 ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size);
+void qemu_ram_free_from_ptr(ram_addr_t addr);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
diff --git a/exec.c b/exec.c
index 631d8c5..2b3b9ba 100644
--- a/exec.c
+++ b/exec.c
@@ -2882,6 +2882,19 @@ ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size)
     return qemu_ram_alloc_from_ptr(dev, name, size, NULL);
 }
 
+void qemu_ram_free_from_ptr(ram_addr_t addr)
+{
+    RAMBlock *block;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        if (addr == block->offset) {
+            QLIST_REMOVE(block, next);
+            qemu_free(block);
+            return;
+        }
+    }
+}
+
 void qemu_ram_free(ram_addr_t addr)
 {
 RAMBlock *block;



[PATCH] kvm: Add support for KVM_EOI_EVENTFD

2010-10-29 Thread Alex Williamson
This allows us to register an eventfd to be triggered on EOI for the
given IRQ.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Userspace side of:
[PATCH] kvm: Create an eventfd mechanism for EOIs to get to userspace

 kvm-all.c   |   19 +++
 kvm.h   |   10 ++
 kvm/include/linux/kvm.h |   13 +
 3 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 0e60748..75dbe76 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1349,5 +1349,24 @@ int kvm_set_irqfd(int gsi, int fd, bool assigned)
 }
 #endif
 
+#if defined(KVM_EOI_EVENTFD)
+int kvm_eoi_eventfd(int gsi, int fd, uint32_t flags)
+{
+    struct kvm_eoi eoi = {
+        .fd = fd,
+        .gsi = gsi,
+        .flags = flags,
+    };
+    int r;
+
+    if (!kvm_enabled() || !kvm_irqchip_in_kernel())
+        return -ENOSYS;
+
+    r = kvm_vm_ioctl(kvm_state, KVM_EOI_EVENTFD, &eoi);
+    if (r < 0)
+        return r;
+    return 0;
+}
+#endif
 #undef PAGE_SIZE
 #include "qemu-kvm.c"
diff --git a/kvm.h b/kvm.h
index 02280a6..777904a 100644
--- a/kvm.h
+++ b/kvm.h
@@ -203,6 +203,16 @@ int kvm_set_irqfd(int gsi, int fd, bool assigned)
 }
 #endif
 
+#if defined(KVM_EOI_EVENTFD) && defined(CONFIG_KVM)
+int kvm_eoi_eventfd(int gsi, int fd, uint32_t flags);
+#else
+static inline
+int kvm_eoi_eventfd(int gsi, int fd, uint32_t flags)
+{
+    return -ENOSYS;
+}
+#endif
+
 int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val, bool assign);
 
 int kvm_has_gsi_routing(void);
diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index e46729e..5490f62 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -530,6 +530,7 @@ struct kvm_enable_cap {
 #ifdef __KVM_HAVE_XCRS
 #define KVM_CAP_XCRS 56
 #endif
+#define KVM_CAP_EOI_EVENTFD 60
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -609,6 +610,16 @@ struct kvm_clock_data {
__u32 pad[9];
 };
 
+#define KVM_EOI_EVENTFD_FLAG_DEASSIGN (1 << 0)
+#define KVM_EOI_EVENTFD_FLAG_DEASSERT (1 << 1)
+
+struct kvm_eoi {
+   __u32 fd;
+   __u32 gsi;
+   __u32 flags;
+   __u8  pad[20];
+};
+
 /*
  * ioctls for VM fds
  */
@@ -663,6 +674,8 @@ struct kvm_clock_data {
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2  _IOR(KVMIO,  0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2  _IOW(KVMIO,  0xa0, struct kvm_pit_state2)
+/* Available with KVM_CAP_EOI_EVENTFD */
+#define KVM_EOI_EVENTFD   _IOW(KVMIO,  0xa2, struct kvm_eoi)
 
 /*
  * ioctls for vcpu fds



[PATCH] APIC/IOAPIC EOI callback

2010-10-29 Thread Alex Williamson
For device assignment, we need to know when the VM writes an end
of interrupt to the APIC, which allows us to re-enable interrupts
on the physical device.  Add a new wrapper for ioapic generated
interrupts with a callback on eoi and create an interface for
drivers to be notified on eoi.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

Note that the notifier and notifier_enabled eoi_client fields aren't
used here yet.  I'll send an RFC patch showing how we make use of
these with the proposed KVM_EOI_EVENTFD patches.
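
For anyone trying to picture the flow, the core of this patch is a
vector-to-GSI table: the IOAPIC records which GSI produced a vector when it
delivers the interrupt, and the APIC consults and clears that entry on EOI.
The standalone model below shows that round trip. All names (NO_GSI,
vector_to_gsi, eoi_count, map_init, deliver_ioapic_irq, apic_eoi_model) are
stand-ins for this sketch, and eoi_count replaces the real client callbacks.
One deliberate choice: the table is filled with memset(), because in C an
initializer such as { 0xff } sets only the first element and zero-fills the
rest.

```c
#include <stdint.h>
#include <string.h>

#define NO_GSI 0xff  /* sentinel: vector did not come from the IOAPIC */

static uint8_t vector_to_gsi[256];
static int eoi_count[256];  /* per-GSI count of EOI notifications */

/* Mark every vector as unmapped. */
static void map_init(void)
{
    memset(vector_to_gsi, NO_GSI, sizeof(vector_to_gsi));
}

/* IOAPIC side: remember which GSI is behind this vector. */
static void deliver_ioapic_irq(uint8_t vector, uint8_t gsi)
{
    vector_to_gsi[vector] = gsi;
}

/* APIC side: on EOI, notify for the recorded GSI (if any) exactly once. */
static void apic_eoi_model(uint8_t vector)
{
    uint8_t gsi = vector_to_gsi[vector];

    if (gsi != NO_GSI) {
        eoi_count[gsi]++;
        vector_to_gsi[vector] = NO_GSI;
    }
}
```

An EOI for a vector that was never delivered through the IOAPIC path, or a
second EOI for the same delivery, produces no notification in this model.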

 hw/apic.c   |   18 --
 hw/apic.h   |4 
 hw/ioapic.c |   38 --
 hw/pc.h |   16 +++-
 4 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 63d62c7..a24117b 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -22,6 +22,7 @@
 #include "host-utils.h"
 #include "sysbus.h"
 #include "trace.h"
+#include "pc.h"
 
 /* APIC Local Vector Table */
 #define APIC_LVT_TIMER   0
@@ -103,6 +104,7 @@ struct APICState {
 int wait_for_sipi;
 };
 
+static uint8_t vector_to_gsi_map[256] = { 0xff };
 static APICState *local_apics[MAX_APICS + 1];
 static int apic_irq_delivered;
 
@@ -292,6 +294,15 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
  trigger_mode);
 }
 
+void apic_deliver_ioapic_irq(uint8_t dest, uint8_t dest_mode,
+                             uint8_t delivery_mode, uint8_t vector_num,
+                             uint8_t polarity, uint8_t trigger_mode, int gsi)
+{
+    vector_to_gsi_map[vector_num] = gsi;
+    apic_deliver_irq(dest, dest_mode, delivery_mode,
+                     vector_num, polarity, trigger_mode);
+}
+
 void cpu_set_apic_base(DeviceState *d, uint64_t val)
 {
 APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
@@ -420,8 +431,11 @@ static void apic_eoi(APICState *s)
     if (isrv < 0)
         return;
     reset_bit(s->isr, isrv);
-    /* XXX: send the EOI packet to the APIC bus to allow the I/O APIC to
-            set the remote IRR bit for level triggered interrupts. */
+
+    if (vector_to_gsi_map[isrv] != 0xff) {
+        ioapic_eoi(vector_to_gsi_map[isrv]);
+        vector_to_gsi_map[isrv] = 0xff;
+    }
     apic_update_irq(s);
 }
 
diff --git a/hw/apic.h b/hw/apic.h
index 8a0c9d0..59d0e37 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -8,6 +8,10 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
  uint8_t delivery_mode,
  uint8_t vector_num, uint8_t polarity,
  uint8_t trigger_mode);
+void apic_deliver_ioapic_irq(uint8_t dest, uint8_t dest_mode,
+ uint8_t delivery_mode,
+ uint8_t vector_num, uint8_t polarity,
+ uint8_t trigger_mode, int gsi);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
 int apic_get_interrupt(DeviceState *s);
diff --git a/hw/ioapic.c b/hw/ioapic.c
index 5ae21e9..ffd1c92 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -26,6 +26,7 @@
 #include "qemu-timer.h"
 #include "host-utils.h"
 #include "sysbus.h"
+#include "qlist.h"
 
 //#define DEBUG_IOAPIC
 
@@ -61,6 +62,39 @@ struct IOAPICState {
 uint64_t ioredtbl[IOAPIC_NUM_PINS];
 };
 
+static QLIST_HEAD(ioapic_eoi_client_list,
+                  ioapic_eoi_client) ioapic_eoi_client_list =
+                  QLIST_HEAD_INITIALIZER(ioapic_eoi_client_list);
+
+int ioapic_register_eoi_client(ioapic_eoi_client *client)
+{
+    QLIST_INSERT_HEAD(&ioapic_eoi_client_list, client, list);
+    return 0;
+}
+
+void ioapic_unregister_eoi_client(ioapic_eoi_client *client)
+{
+    QLIST_REMOVE(client, list);
+}
+
+int ioapic_eoi_client_get_fd(ioapic_eoi_client *client)
+{
+if (!client->notifier_enabled) {
+return -ENODEV;
+}
+return event_notifier_get_fd(&client->notifier);
+}
+
+void ioapic_eoi(int gsi)
+{
+ioapic_eoi_client *client;
+QLIST_FOREACH(client, &ioapic_eoi_client_list, list) {
+if (client->irq == gsi) {
+client->eoi(client);
+}
+}
+}
+
 static void ioapic_service(IOAPICState *s)
 {
 uint8_t i;
@@ -90,8 +124,8 @@ static void ioapic_service(IOAPICState *s)
 else
 vector = entry & 0xff;
 
-apic_deliver_irq(dest, dest_mode, delivery_mode,
- vector, polarity, trig_mode);
+apic_deliver_ioapic_irq(dest, dest_mode, delivery_mode,
+vector, polarity, trig_mode, i);
 }
 }
 }
diff --git a/hw/pc.h b/hw/pc.h
index 63b0249..5945bff 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -5,6 +5,7 @@
 #include "ioport.h"
 #include "isa.h"
 #include "fdc.h"
+#include "event_notifier.h"
 
 /* PC-style peripherals (also used by other machines).  */
 
@@ -48,8 +49,21 @@ typedef struct isa_irq_state {
 
 void isa_irq_handler(void *opaque, int n, int level);
 
-/* i8254.c */
+struct 

[RFC PATCH] kvm: KVM_EOI_EVENTFD support for eoi_client

2010-10-29 Thread Alex Williamson
With the KVM irqchip, we need to get the EOI via an eventfd.  This
adds support for that, abstracting the details to the caller.

The get_fd function allows drivers to make further optimizations
in handling the EOI.  For instance with VFIO, we can make use of
an irqfd-like mechanism to have the VFIO kernel module consume
the EOI directly, bypassing qemu userspace.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/ioapic.c |   52 
 1 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index c43be3a..707f2a2 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -72,14 +72,66 @@ static QLIST_HEAD(ioapic_eoi_client_list,
   ioapic_eoi_client) ioapic_eoi_client_list =
   QLIST_HEAD_INITIALIZER(ioapic_eoi_client_list);
 
+#ifdef KVM_EOI_EVENTFD
+static void ioapic_eoi_callback(void *opaque)
+{
+ioapic_eoi_client *client = opaque;
+
+if (!event_notifier_test_and_clear(&client->notifier)) {
+return;
+}
+
+client->eoi(client);
+}
+#endif
+
 int ioapic_register_eoi_client(ioapic_eoi_client *client)
 {
 QLIST_INSERT_HEAD(&ioapic_eoi_client_list, client, list);
+
+#ifdef KVM_EOI_EVENTFD
+if (kvm_enabled() && kvm_irqchip_in_kernel()) {
+int ret, fd;
+
+ret = event_notifier_init(&client->notifier, 0);
+if (ret) {
+fprintf(stderr, "%s notifier init failed %d\n", __FUNCTION__, ret);
+return ret;
+}
+
+fd = event_notifier_get_fd(&client->notifier);
+qemu_set_fd_handler(fd, ioapic_eoi_callback, NULL, client);
+
+ret = kvm_eoi_eventfd(client->irq, fd, KVM_EOI_EVENTFD_FLAG_DEASSERT);
+if (ret) {
+fprintf(stderr, "%s eoi eventfd failed %d\n", __FUNCTION__, ret);
+return ret;
+}
+client->notifier_enabled = true;
+}
+#endif
 return 0;
 }
 
 void ioapic_unregister_eoi_client(ioapic_eoi_client *client)
 {
+#ifdef KVM_EOI_EVENTFD
+if (kvm_enabled() && kvm_irqchip_in_kernel()) {
+int ret, fd;
+
+fd = event_notifier_get_fd(&client->notifier);
+
+ret = kvm_eoi_eventfd(client->irq, fd, KVM_EOI_EVENTFD_FLAG_DEASSIGN);
+if (ret) {
+fprintf(stderr, "%s eoi eventfd failed %d\n", __FUNCTION__, ret);
+}
+
+qemu_set_fd_handler(fd, NULL, NULL, NULL);
+
+event_notifier_cleanup(&client->notifier);
+client->notifier_enabled = false;
+}
+#endif
 QLIST_REMOVE(client, list);
 }
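
For readers unfamiliar with the mechanism, the following is a minimal
user-space sketch of the eventfd pattern the commit message describes:
one side signals an EOI by writing to an eventfd, and a handler on the
other side does a test-and-clear before invoking the client's eoi
callback.  It uses only Linux eventfd(2).  All toy_* names are
hypothetical stand-ins for ioapic_eoi_client and the event_notifier
helpers, and the KVM side (kvm_eoi_eventfd) is not modelled at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical stand-in for the patch's ioapic_eoi_client. */
typedef struct toy_eoi_client {
    int irq;                                   /* GSI the client cares about */
    int fd;                                    /* eventfd signalled on EOI */
    int eoi_calls;                             /* how often eoi() has run */
    void (*eoi)(struct toy_eoi_client *client);
} toy_eoi_client;

/* Create the eventfd; returns 0 on success or a negative errno value. */
static int toy_eoi_client_init(toy_eoi_client *client, int irq,
                               void (*eoi)(toy_eoi_client *client))
{
    assert(eoi != NULL);
    client->irq = irq;
    client->eoi = eoi;
    client->eoi_calls = 0;
    client->fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    return client->fd < 0 ? -errno : 0;
}

/* Producer side: adds 1 to the eventfd counter, as a signaller would. */
static int toy_eoi_signal(toy_eoi_client *client)
{
    uint64_t one = 1;
    ssize_t n = write(client->fd, &one, sizeof(one));
    return n == (ssize_t)sizeof(one) ? 0 : -errno;
}

/* Returns 1 and resets the counter if an EOI is pending, otherwise 0. */
static int toy_eoi_test_and_clear(toy_eoi_client *client)
{
    uint64_t count;
    return read(client->fd, &count, sizeof(count)) == (ssize_t)sizeof(count);
}

/* Consumer side: what a main loop would call when the fd is readable. */
static void toy_eoi_fd_handler(void *opaque)
{
    toy_eoi_client *client = opaque;

    if (!toy_eoi_test_and_clear(client)) {
        return;                                /* nothing pending */
    }
    client->eoi(client);
}

/* Example callback: just count invocations. */
static void toy_count_eoi(toy_eoi_client *client)
{
    client->eoi_calls++;
}
```

Note that several signals arriving before the handler runs collapse
into a single callback, because the read resets the counter to zero.
That seems acceptable for a level-triggered re-enable, but it is an
observation about this sketch, not a statement about the patch.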
 



Re: [PATCH] APIC/IOAPIC EOI callback

2010-10-29 Thread Anthony Liguori

On 10/29/2010 12:56 PM, Alex Williamson wrote:

For device assignment, we need to know when the VM writes an end
of interrupt to the APIC, which allows us to re-enable interrupts
on the physical device.  Add a new wrapper for ioapic generated
interrupts with a callback on eoi and create an interface for
drivers to be notified on eoi.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

Note that the notifier and notifier_enabled eoi_client fields aren't
used here yet.  I'll send an RFC patch showing how we make use of
these with the proposed KVM_EOI_EVENTFD patches.

  hw/apic.c   |   18 --
  hw/apic.h   |4 
  hw/ioapic.c |   38 --
  hw/pc.h |   16 +++-
  4 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 63d62c7..a24117b 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -22,6 +22,7 @@
  #include "host-utils.h"
  #include "sysbus.h"
  #include "trace.h"
+#include "pc.h"

  /* APIC Local Vector Table */
  #define APIC_LVT_TIMER   0
@@ -103,6 +104,7 @@ struct APICState {
  int wait_for_sipi;
  };

+static uint8_t vector_to_gsi_map[256] = { 0xff };
  static APICState *local_apics[MAX_APICS + 1];
  static int apic_irq_delivered;

@@ -292,6 +294,15 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
   trigger_mode);
  }

+void apic_deliver_ioapic_irq(uint8_t dest, uint8_t dest_mode,
+ uint8_t delivery_mode, uint8_t vector_num,
+ uint8_t polarity, uint8_t trigger_mode, int gsi)
+{
+vector_to_gsi_map[vector_num] = gsi;
+apic_deliver_irq(dest, dest_mode, delivery_mode,
+ vector_num, polarity, trigger_mode);
+}
+
  void cpu_set_apic_base(DeviceState *d, uint64_t val)
  {
  APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
@@ -420,8 +431,11 @@ static void apic_eoi(APICState *s)
  if (isrv < 0)
  return;
  reset_bit(s->isr, isrv);
-/* XXX: send the EOI packet to the APIC bus to allow the I/O APIC to
-set the remote IRR bit for level triggered interrupts. */
+
+if (vector_to_gsi_map[isrv] != 0xff) {
+ioapic_eoi(vector_to_gsi_map[isrv]);
+vector_to_gsi_map[isrv] = 0xff;
+}
  apic_update_irq(s);
  }

diff --git a/hw/apic.h b/hw/apic.h
index 8a0c9d0..59d0e37 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -8,6 +8,10 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
   uint8_t delivery_mode,
   uint8_t vector_num, uint8_t polarity,
   uint8_t trigger_mode);
+void apic_deliver_ioapic_irq(uint8_t dest, uint8_t dest_mode,
+ uint8_t delivery_mode,
+ uint8_t vector_num, uint8_t polarity,
+ uint8_t trigger_mode, int gsi);
  int apic_accept_pic_intr(DeviceState *s);
  void apic_deliver_pic_intr(DeviceState *s, int level);
  int apic_get_interrupt(DeviceState *s);
diff --git a/hw/ioapic.c b/hw/ioapic.c
index 5ae21e9..ffd1c92 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -26,6 +26,7 @@
  #include "qemu-timer.h"
  #include "host-utils.h"
  #include "sysbus.h"
+#include "qlist.h"

  //#define DEBUG_IOAPIC

@@ -61,6 +62,39 @@ struct IOAPICState {
  uint64_t ioredtbl[IOAPIC_NUM_PINS];
  };

+static QLIST_HEAD(ioapic_eoi_client_list,
+  ioapic_eoi_client) ioapic_eoi_client_list =
+  QLIST_HEAD_INITIALIZER(ioapic_eoi_client_list);
+
+int ioapic_register_eoi_client(ioapic_eoi_client *client)
+{
+QLIST_INSERT_HEAD(&ioapic_eoi_client_list, client, list);
+return 0;
+}
+
+void ioapic_unregister_eoi_client(ioapic_eoi_client *client)
+{
+QLIST_REMOVE(client, list);
+}
+
+int ioapic_eoi_client_get_fd(ioapic_eoi_client *client)
+{
+if (!client->notifier_enabled) {
+return -ENODEV;
+}
+return event_notifier_get_fd(&client->notifier);
+}
+
+void ioapic_eoi(int gsi)
+{
+ioapic_eoi_client *client;
+QLIST_FOREACH(client, &ioapic_eoi_client_list, list) {
+if (client->irq == gsi) {
+client->eoi(client);
+}
+}
+}
   


I think this all goes away with a NotifierList.

Regards,

Anthony Liguori
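
To make the NotifierList suggestion concrete, here is a rough
standalone sketch of the idea: clients embed a notifier node, a list
owner calls every registered notifier, and each client filters on its
own GSI.  This is not QEMU's notify.h; the eoi_notifier* names, the
void *data argument, and the hand-rolled singly linked list are all
assumptions made so the example compiles on its own.

```c
#include <stddef.h>

/* Hypothetical notifier node; a client embeds one of these. */
typedef struct eoi_notifier eoi_notifier;
struct eoi_notifier {
    void (*notify)(eoi_notifier *notifier, void *data);
    eoi_notifier *next;
};

typedef struct eoi_notifier_list {
    eoi_notifier *head;
} eoi_notifier_list;

/* Register a notifier at the head of the list. */
static void eoi_notifier_list_add(eoi_notifier_list *list, eoi_notifier *n)
{
    n->next = list->head;
    list->head = n;
}

/* Unlink a notifier; does nothing if it is not on the list. */
static void eoi_notifier_list_remove(eoi_notifier_list *list, eoi_notifier *n)
{
    eoi_notifier **p;

    for (p = &list->head; *p; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            n->next = NULL;
            return;
        }
    }
}

/* Call every registered notifier; 'next' is saved first so a callback
 * may remove its own node while the list is being walked. */
static void eoi_notifier_list_notify(eoi_notifier_list *list, void *data)
{
    eoi_notifier *n = list->head;

    while (n) {
        eoi_notifier *next = n->next;
        n->notify(n, data);
        n = next;
    }
}

/* Example client: counts EOIs for one GSI. */
typedef struct eoi_client {
    eoi_notifier notifier;
    int irq;
    int eoi_seen;
} eoi_client;

static void eoi_client_notify(eoi_notifier *n, void *data)
{
    /* Recover the enclosing client from the embedded node. */
    eoi_client *c = (eoi_client *)((char *)n - offsetof(eoi_client, notifier));
    int gsi = *(int *)data;

    if (c->irq == gsi) {
        c->eoi_seen++;
    }
}
```

With something along these lines, register and unregister reduce to
list add and remove, and ioapic_eoi() becomes a single notify call
that passes the GSI, which is presumably what "goes away" refers to.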


+
  static void ioapic_service(IOAPICState *s)
  {
  uint8_t i;
@@ -90,8 +124,8 @@ static void ioapic_service(IOAPICState *s)
  else
  vector = entry & 0xff;

-apic_deliver_irq(dest, dest_mode, delivery_mode,
- vector, polarity, trig_mode);
+apic_deliver_ioapic_irq(dest, dest_mode, delivery_mode,
+vector, polarity, trig_mode, i);
  }
  }
  }
diff --git a/hw/pc.h b/hw/pc.h
index 63b0249..5945bff 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -5,6 +5,7 @@
  #include "ioport.h"
  #include "isa.h"
  #include "fdc.h"
+#include "event_notifier.h"

  /* PC-style peripherals 

Re: [Qemu-devel] [PATCH 1/2] Minimal RAM API support

2010-10-29 Thread Blue Swirl
On Fri, Oct 29, 2010 at 4:39 PM, Alex Williamson
alex.william...@redhat.com wrote:
 This adds a minimum chunk of Anthony's RAM API support so that we
 can identify actual VM RAM versus all the other things that make
 use of qemu_ram_alloc.

 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---

  Makefile.target |    1 +
  cpu-common.h    |    2 +
  memory.c        |   82 
 +++
  memory.h        |   23 +++
  4 files changed, 108 insertions(+), 0 deletions(-)
  create mode 100644 memory.c
  create mode 100644 memory.h

 diff --git a/Makefile.target b/Makefile.target
 index c48cbcc..e4e2eb4 100644
 --- a/Makefile.target
 +++ b/Makefile.target
 @@ -175,6 +175,7 @@ obj-$(CONFIG_VIRTFS) += virtio-9p.o
  obj-y += rwhandler.o
  obj-$(CONFIG_KVM) += kvm.o kvm-all.o
  obj-$(CONFIG_NO_KVM) += kvm-stub.o
 +obj-y += memory.o

Please move this to Makefile.objs to compile the object in hwlib.
There are no target dependencies.

  LIBS+=-lz

  QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
 diff --git a/cpu-common.h b/cpu-common.h
 index a543b5d..6aa2738 100644
 --- a/cpu-common.h
 +++ b/cpu-common.h
 @@ -23,6 +23,8 @@
  /* address in the RAM (different from a physical address) */
  typedef unsigned long ram_addr_t;

 +#include "memory.h"
 +
  /* memory API */

  typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, 
 uint32_t value);
 diff --git a/memory.c b/memory.c
 new file mode 100644
 index 000..86947fb
 --- /dev/null
 +++ b/memory.c
 @@ -0,0 +1,82 @@
 +/*
 + * RAM API
 + *
 + *  Copyright Red Hat, Inc. 2010
 + *
 + * Authors:
 + *  Alex Williamson alex.william...@redhat.com
 + *
 + * This library is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU Lesser General Public
 + * License as published by the Free Software Foundation; either
 + * version 2 of the License, or (at your option) any later version.
 + *
 + * This library is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 + * Lesser General Public License for more details.
 + *
 + * You should have received a copy of the GNU Lesser General Public
 + * License along with this library; if not, see 
 http://www.gnu.org/licenses/.
 + */
 +#include "memory.h"
 +#include "range.h"
 +
 +QemuRamSlots ram_slots = { .slots = QLIST_HEAD_INITIALIZER(ram_slots) };

Please avoid global state. This is not used elsewhere, so it could be
static. But instead the API should take a state parameter
(RAMSlotState *) so that no static state is needed.

 +
 +static QemuRamSlot *qemu_ram_find_slot(target_phys_addr_t start_addr,
 +                                       ram_addr_t size)
 +{
 +    QemuRamSlot *slot;
 +
 +    QLIST_FOREACH(slot, &ram_slots.slots, next) {
 +        if (slot->start_addr == start_addr && slot->size == size) {
 +            return slot;
 +        }
 +
 +        if (ranges_overlap(start_addr, size, slot->start_addr, slot->size)) {
 +            abort();
 +        }
 +    }
 +
 +    return NULL;
 +}
 +
 +void qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
 +                       ram_addr_t phys_offset)
 +{
 +    QemuRamSlot *slot;
 +
 +    if (!size) {
 +        return;
 +    }
 +
 +    assert(!qemu_ram_find_slot(start_addr, size));
 +
 +    slot = qemu_mallocz(sizeof(QemuRamSlot));
 +
 +    slot->start_addr = start_addr;
 +    slot->size = size;
 +    slot->offset = phys_offset;
 +
 +    QLIST_INSERT_HEAD(&ram_slots.slots, slot, next);
 +
 +    cpu_register_physical_memory(slot->start_addr, slot->size, slot->offset);
 +}
 +
 +void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
 +{
 +    QemuRamSlot *slot;
 +
 +    if (!size) {
 +        return;
 +    }
 +
 +    slot = qemu_ram_find_slot(start_addr, size);
 +    assert(slot != NULL);
 +
 +    QLIST_REMOVE(slot, next);
 +    cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
 +
 +    return;
 +}
 diff --git a/memory.h b/memory.h
 new file mode 100644
 index 000..91e552e
 --- /dev/null
 +++ b/memory.h
 @@ -0,0 +1,23 @@
 +#ifndef QEMU_MEMORY_H
 +#define QEMU_MEMORY_H
 +
 +#include "qemu-common.h"
 +#include "cpu-common.h"
 +
 +typedef struct QemuRamSlot {
 +    target_phys_addr_t start_addr;
 +    ram_addr_t size;
 +    ram_addr_t offset;
 +    void *host;
 +    QLIST_ENTRY(QemuRamSlot) next;
 +} QemuRamSlot;
 +
 +typedef struct QemuRamSlots {
 +    QLIST_HEAD(slots, QemuRamSlot) slots;
 +} QemuRamSlots;

This definition should be in memory.c.
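
As an illustration of the state-parameter suggestion, here is a small
standalone sketch in which every call takes an explicit state pointer,
so neither a global nor a file-static list is needed.  All names
(toy_ram_slot, toy_ram_state and the functions) are hypothetical, plain
pointers replace QLIST, and registration returns -1 on overlap rather
than calling abort() as the patch does, purely so the sketch can be
exercised.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical slot record, loosely mirroring QemuRamSlot. */
typedef struct toy_ram_slot {
    uint64_t start_addr;
    uint64_t size;
    uint64_t offset;
    struct toy_ram_slot *next;
} toy_ram_slot;

/* All bookkeeping lives here; the caller owns the instance. */
typedef struct toy_ram_state {
    toy_ram_slot *slots;
} toy_ram_state;

static void toy_ram_state_init(toy_ram_state *s)
{
    s->slots = NULL;
}

/* Half-open interval overlap test; address wraparound is ignored. */
static int toy_ranges_overlap(uint64_t a, uint64_t alen,
                              uint64_t b, uint64_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Returns 0 on success, -1 on overlap or allocation failure. */
static int toy_ram_register(toy_ram_state *s, uint64_t start,
                            uint64_t size, uint64_t offset)
{
    toy_ram_slot *slot;

    assert(s != NULL);
    if (!size) {
        return 0;
    }
    for (slot = s->slots; slot; slot = slot->next) {
        if (toy_ranges_overlap(start, size, slot->start_addr, slot->size)) {
            return -1;
        }
    }
    slot = calloc(1, sizeof(*slot));
    if (!slot) {
        return -1;
    }
    slot->start_addr = start;
    slot->size = size;
    slot->offset = offset;
    slot->next = s->slots;
    s->slots = slot;
    return 0;
}

/* Returns 0 if an exactly matching slot was removed, -1 otherwise. */
static int toy_ram_unregister(toy_ram_state *s, uint64_t start, uint64_t size)
{
    toy_ram_slot **p;

    for (p = &s->slots; *p; p = &(*p)->next) {
        if ((*p)->start_addr == start && (*p)->size == size) {
            toy_ram_slot *dead = *p;
            *p = dead->next;
            free(dead);
            return 0;
        }
    }
    return -1;
}
```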


Re: [Qemu-devel] [PATCH 1/2] Minimal RAM API support

2010-10-29 Thread Alex Williamson
On Fri, 2010-10-29 at 19:57 +, Blue Swirl wrote:
 On Fri, Oct 29, 2010 at 4:39 PM, Alex Williamson
 alex.william...@redhat.com wrote:
  This adds a minimum chunk of Anthony's RAM API support so that we
  can identify actual VM RAM versus all the other things that make
  use of qemu_ram_alloc.
 
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
 
   Makefile.target |1 +
   cpu-common.h|2 +
   memory.c|   82 
  +++
   memory.h|   23 +++
   4 files changed, 108 insertions(+), 0 deletions(-)
   create mode 100644 memory.c
   create mode 100644 memory.h
 
  diff --git a/Makefile.target b/Makefile.target
  index c48cbcc..e4e2eb4 100644
  --- a/Makefile.target
  +++ b/Makefile.target
  @@ -175,6 +175,7 @@ obj-$(CONFIG_VIRTFS) += virtio-9p.o
   obj-y += rwhandler.o
   obj-$(CONFIG_KVM) += kvm.o kvm-all.o
   obj-$(CONFIG_NO_KVM) += kvm-stub.o
  +obj-y += memory.o
 
 Please move this to Makefile.objs to compile the object in hwlib.
 There are no target dependencies.

Ok, will do.

   LIBS+=-lz
 
   QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
  diff --git a/cpu-common.h b/cpu-common.h
  index a543b5d..6aa2738 100644
  --- a/cpu-common.h
  +++ b/cpu-common.h
  @@ -23,6 +23,8 @@
   /* address in the RAM (different from a physical address) */
   typedef unsigned long ram_addr_t;
 
  +#include "memory.h"
  +
   /* memory API */
 
   typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, 
  uint32_t value);
  diff --git a/memory.c b/memory.c
  new file mode 100644
  index 000..86947fb
  --- /dev/null
  +++ b/memory.c
  @@ -0,0 +1,82 @@
  +/*
  + * RAM API
  + *
  + *  Copyright Red Hat, Inc. 2010
  + *
  + * Authors:
  + *  Alex Williamson alex.william...@redhat.com
  + *
  + * This library is free software; you can redistribute it and/or
  + * modify it under the terms of the GNU Lesser General Public
  + * License as published by the Free Software Foundation; either
  + * version 2 of the License, or (at your option) any later version.
  + *
  + * This library is distributed in the hope that it will be useful,
  + * but WITHOUT ANY WARRANTY; without even the implied warranty of
  + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  + * Lesser General Public License for more details.
  + *
  + * You should have received a copy of the GNU Lesser General Public
  + * License along with this library; if not, see 
  http://www.gnu.org/licenses/.
  + */
  +#include "memory.h"
  +#include "range.h"
  +
  +QemuRamSlots ram_slots = { .slots = QLIST_HEAD_INITIALIZER(ram_slots) };
 
 Please avoid global state. This is not used elsewhere, so it could be
 static. But instead the API should take a state parameter
 (RAMSlotState *) so that no static state is needed.

The reason for this not being static is that the vfio driver I'm working
on walks it.  Also the reason for the definition being in memory.h
instead of memory.c as you've noted below.  Probably better to solve
that usage by creating an interface that calls a function pointer for
each entry... I'll work on that.  Thanks,

Alex
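
A rough sketch of the "function pointer for each entry" interface
mentioned above might look like the following.  It is standalone and
hypothetical: walk_slot, ram_walk_fn and ram_walk_slots are invented
names, and a bare linked list stands in for the QLIST of QemuRamSlot
entries.  The point is only that a caller such as a vfio driver could
visit each slot without seeing the list head or the struct layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot record; in the real code this would stay private. */
typedef struct walk_slot {
    uint64_t start_addr;
    uint64_t size;
    uint64_t offset;
    struct walk_slot *next;
} walk_slot;

/* Visitor callback: a non-zero return value stops the walk. */
typedef int (*ram_walk_fn)(uint64_t start_addr, uint64_t size,
                           uint64_t offset, void *opaque);

/* Call fn once per slot; returns the first non-zero result, or 0. */
static int ram_walk_slots(const walk_slot *head, ram_walk_fn fn, void *opaque)
{
    const walk_slot *slot;

    for (slot = head; slot; slot = slot->next) {
        int ret = fn(slot->start_addr, slot->size, slot->offset, opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example visitor: accumulate the total size of all slots. */
static int sum_slot_size(uint64_t start_addr, uint64_t size,
                         uint64_t offset, void *opaque)
{
    (void)start_addr;
    (void)offset;
    *(uint64_t *)opaque += size;
    return 0;
}

/* Example visitor: stop at the first slot larger than the given limit. */
static int find_large_slot(uint64_t start_addr, uint64_t size,
                           uint64_t offset, void *opaque)
{
    (void)start_addr;
    (void)offset;
    return size > *(uint64_t *)opaque;
}
```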

  +
  +static QemuRamSlot *qemu_ram_find_slot(target_phys_addr_t start_addr,
  +   ram_addr_t size)
  +{
  +QemuRamSlot *slot;
  +
  +QLIST_FOREACH(slot, &ram_slots.slots, next) {
  +if (slot->start_addr == start_addr && slot->size == size) {
  +return slot;
  +}
  +
  +if (ranges_overlap(start_addr, size, slot->start_addr, 
  slot->size)) {
  +abort();
  +}
  +}
  +
  +return NULL;
  +}
  +
  +void qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
  +   ram_addr_t phys_offset)
  +{
  +QemuRamSlot *slot;
  +
  +if (!size) {
  +return;
  +}
  +
  +assert(!qemu_ram_find_slot(start_addr, size));
  +
  +slot = qemu_mallocz(sizeof(QemuRamSlot));
  +
  +slot->start_addr = start_addr;
  +slot->size = size;
  +slot->offset = phys_offset;
  +
  +QLIST_INSERT_HEAD(&ram_slots.slots, slot, next);
  +
  +cpu_register_physical_memory(slot->start_addr, slot->size, 
  slot->offset);
  +}
  +
  +void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
  +{
  +QemuRamSlot *slot;
  +
  +if (!size) {
  +return;
  +}
  +
  +slot = qemu_ram_find_slot(start_addr, size);
  +assert(slot != NULL);
  +
  +QLIST_REMOVE(slot, next);
  +cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
  +
  +return;
  +}
  diff --git a/memory.h b/memory.h
  new file mode 100644
  index 000..91e552e
  --- /dev/null
  +++ b/memory.h
  @@ -0,0 +1,23 @@
  +#ifndef QEMU_MEMORY_H
  +#define QEMU_MEMORY_H
  +
  +#include "qemu-common.h"
  +#include "cpu-common.h"
  +
  +typedef struct QemuRamSlot {
  +target_phys_addr_t start_addr;
  +ram_addr_t size;
  +ram_addr_t offset;
  +void *host;
  +

Re: [PATCH v13 10/16] Add a hook to intercept external buffers from NIC driver.

2010-10-29 Thread David Miller
From: Xin, Xiaohui xiaohui@intel.com
Date: Wed, 27 Oct 2010 09:33:12 +0800

 Somehow, it seems not a trivial work to support it now. Can we support it
 later and as a todo with our current work?

I would prefer the feature work properly, rather than only in specific
cases, before being integrated.


qemu-kvm-0.13.0 compile error

2010-10-29 Thread Teck Choon Giam
Hi,

I have a problem compiling qemu-kvm 0.13.0 with gcc version 4.1.2
20080704 (Red Hat 4.1.2-48) on CentOS 5.  The errors are below:

/usr/src/qemu-kvm-0.13.0/hw/ide/core.c: In function ‘ide_drive_pio_post_load’:
/usr/src/qemu-kvm-0.13.0/hw/ide/core.c:2782: warning: comparison is
always false due to limited range of data type
ui/vnc-enc-tight.c: In function ‘tight_detect_smooth_image’:
ui/vnc-enc-tight.c:284: warning: comparison is always true due to
limited range of data type
ui/vnc-enc-tight.c:297: warning: comparison is always true due to
limited range of data type
ui/vnc-enc-tight.c: In function ‘tight_encode_indexed_rect16’:
ui/vnc-enc-tight.c:456: warning: comparison is always false due to
limited range of data type
ui/vnc-enc-tight.c: In function ‘tight_encode_indexed_rect32’:
ui/vnc-enc-tight.c:457: warning: comparison is always false due to
limited range of data type
ui/vnc-enc-tight.c: In function ‘send_sub_rect’:
ui/vnc-enc-tight.c:1458: warning: ‘ret’ may be used uninitialized in
this function
In file included from /usr/src/qemu-kvm-0.13.0/kvm-all.c:1347:
/usr/src/qemu-kvm-0.13.0/qemu-kvm.c: In function ‘kvm_run’:
/usr/src/qemu-kvm-0.13.0/qemu-kvm.c:675: warning: implicit declaration
of function ‘kvm_handle_internal_error’
kvm-all.o: In function `kvm_run':
/usr/src/qemu-kvm-0.13.0/qemu-kvm.c:675: undefined reference to
`kvm_handle_internal_error'
collect2: ld returned 1 exit status
make[1]: *** [qemu-system-x86_64] Error 1
make: *** [subdir-x86_64-softmmu] Error 2

A search for the related function gives this output:

# grep kvm_handle_internal_error ./*
./kvm-all.c:static void kvm_handle_internal_error(CPUState *env,
struct kvm_run *run)
./kvm-all.c:kvm_handle_internal_error(env, run);
./qemu-kvm.c:kvm_handle_internal_error(env, run);


Thanks.

Kindest regards,
Giam Teck Choon


Re: [PATCH iproute2] Support 'mode' parameter when creating macvtap device

2010-10-29 Thread Arnd Bergmann
On Friday 29 October 2010, Sridhar Samudrala wrote:
 Add support for 'mode' parameter when creating a macvtap device.
 This allows a macvtap device to be created in bridge, private or
 the default vepa modes.
 
 Signed-off-by: Sridhar Samudrala s...@us.ibm.com

Acked-by: Arnd Bergmann a...@arndb.de



Re: [PATCH] macvlan: Introduce 'passthru' mode to takeover the underlying device

2010-10-29 Thread Arnd Bergmann
On Friday 29 October 2010, Sridhar Samudrala wrote:
 With the current default 'vepa' mode, a KVM guest using virtio with 
 macvtap backend has the following limitations.
 - cannot change/add a mac address on the guest virtio-net

I believe this could be changed if there is a need, but I actually
consider it one of the design points of macvlan that the guest
is not able to change the mac address. With 802.1Qbg you rely on
the switch being able to identify the guest by its MAC address,
which the host kernel must ensure.

 - cannot create a vlan device on the guest virtio-net

Why not? If this doesn't work, it's probably a bug!
Why does the passthru mode enable it if it doesn't work
already?

 - cannot enable promiscuous mode on guest virtio-net

Could you elaborate why such a setup would be useful?

Arnd



Re: [ovs-dev] Flow Control and Port Mirroring

2010-10-29 Thread Simon Horman
[ CCed VHOST contacts ]

On Thu, Oct 28, 2010 at 01:22:02PM -0700, Jesse Gross wrote:
 On Thu, Oct 28, 2010 at 4:54 AM, Simon Horman ho...@verge.net.au wrote:
  My reasoning is that in the non-mirroring case the guest is
  limited by the external interface through which the packets
  eventually flow - that is 1Gbit/s. But in the mirrored case either
  there is no flow control or the flow control is acting on the
  rate of dummy0, which is essentially infinite.
 
  Before investigating this any further I wanted to ask if
  this behaviour is intentional.
 
 It's not intentional but I can take a guess at what is happening.
 
 When we send the packet to a mirror, the skb is cloned but only the
 original skb is charged to the sender.  If the original packet is
 delivered to localhost then it will be freed quickly and no longer
 accounted for, despite the fact that the real packet is still
 sitting in the transmit queue on the NIC.  The UDP stack will then
 send the next packet, limited only by the speed of the CPU.

That would explain what I have observed.

 Normally, this would be tracked by accounting for the memory charged
 to the socket.  However, I know that Xen tracks whether the actual
 pages of memory have been freed, which should avoid this problem since
 the memory won't be released until the last packet has been sent.  I
 don't know what KVM virtio does but I'm guessing that it is similar to
 the former, since this problem is occurring.

I am also familiar with how Xen tracks pages but less sure of the
virtio side of things.

 While it would be easy to charge the socket for all clones, I also
 want to be careful about over accounting of the same data, leading to
 a very small effective socket buffer.

Agreed, we don't want to see over-charging.
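
The trade-off discussed above can be illustrated with a deliberately
simplified model.  This is not kernel code: toy_sock, its fields and
the functions below are invented for illustration, and the model
assumes the original packet is freed immediately after local delivery
while its clone stays queued on a slow device that never drains during
the burst.  With charge_clone == 0 only the original is charged to the
sender; with charge_clone == 1 the clone is charged as well.

```c
#include <stddef.h>

/* Toy sender socket: a send buffer limit and the bytes charged to it. */
typedef struct toy_sock {
    size_t sndbuf;      /* maximum bytes that may be charged at once */
    size_t charged;     /* bytes currently charged to the sender */
    size_t nic_queue;   /* bytes of clones waiting on the slow device */
} toy_sock;

/* Try to send one mirrored packet of 'len' bytes.
 * Returns 1 if accepted, 0 if the sender would have to wait. */
static int toy_send_mirrored(toy_sock *sk, size_t len, int charge_clone)
{
    size_t need = charge_clone ? 2 * len : len;

    if (sk->charged + need > sk->sndbuf) {
        return 0;
    }
    sk->charged += need;
    sk->charged -= len;     /* original freed right after local delivery */
    sk->nic_queue += len;   /* clone is still queued on the device */
    return 1;
}

/* Attempt a burst of sends; returns how many were accepted. */
static size_t toy_burst(toy_sock *sk, size_t len, size_t attempts,
                        int charge_clone)
{
    size_t sent = 0;
    size_t i;

    for (i = 0; i < attempts; i++) {
        sent += (size_t)toy_send_mirrored(sk, len, charge_clone);
    }
    return sent;
}
```

In this model, with a 4000-byte buffer and 1000-byte packets, the
uncharged-clone case accepts every send in the burst, so the device
queue grows without bound.  Charging the clone does apply back
pressure, but only three packets fit rather than the four the buffer
could nominally hold, which is a small-scale version of the
over-accounting concern.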
