Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
On Thu, Oct 28, 2010 at 12:32:35PM -0700, Shirley Ma wrote:
> On Thu, 2010-10-28 at 07:20 +0200, Michael S. Tsirkin wrote:
> > My concern is that this can delay signalling for an unlimited time.
> > Could you please test this with guests that do not have
> > 2b5bbe3b8bee8b38bdc27dd9c0270829b6eb7eeb and
> > b0c39dbdc204006ef3558a66716ff09797619778, that is, 2.6.31 and older?
>
> The patch only delays signaling for an unlimited time when there are no
> TX packets to transmit. I thought TX signaling only notifies the guest
> to release the used buffers; is there anything else besides this?

Right, that's it, I think. For newer kernels we orphan the skb on xmit, so we don't care that much about completing them. This does rely on an undocumented assumption about guest behaviour, though.

I tested a rhel5u5 guest (2.6.18 kernel); it works fine. I checked the logs of the two commits, and I don't think this patch could cause any issue without those two patches.

Also, I found a big TX regression between the old guest and the new guest. For the old guest, I am able to get almost 11Gb/s for a 2K message size, but for the new guest kernel I can only get 3.5Gb/s with the patch and the same host. I will dig into why.

Thanks
Shirley

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
On Thu, Oct 28, 2010 at 10:14:22AM -0700, Shirley Ma wrote:
> > Two ideas:
> > 1. How about writing out used, just delaying the signal? This way we
> > don't have to queue separately.
>
> This improves performance somewhat, but not as much as delaying both
> used and signal, since delaying used buffers combines multiple small
> copies into one large copy, which saves more CPU utilization and
> increases some BW.

Hmm. I don't yet understand. We are still doing copies into the per-vq buffer, and the data copied is really small. Is it about cache line bounces? Could you try figuring it out?

> > 2. How about flushing out queued stuff before we exit the handle_tx
> > loop? That would address most of the spec issue.
>
> The performance is almost the same as with the previous patch. I will
> resubmit the modified one, adding vhost_add_used_and_signal_n after the
> handle_tx loop for processing the pending queue.
>
> This patch was part of a modified macvtap zero copy series which I
> haven't submitted yet. I found this helped vhost TX in general. This
> pending queue will be used by DMA-done handling later, so I put it in
> vq instead of a local variable in handle_tx.
>
> Thanks
> Shirley

BTW, why do we need another array? Isn't the heads field exactly what we need here?
Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
On Thu, Oct 28, 2010 at 02:40:50PM -0700, Shirley Ma wrote:
> On Thu, 2010-10-28 at 14:04 -0700, Sridhar Samudrala wrote:
> > It would be some change in the virtio-net driver that may have
> > improved the latency of small messages, which in turn would have
> > reduced the bandwidth as TCP could not accumulate and send large
> > packets.
>
> I will check for any latency improvement patch in virtio_net. If that's
> the case, would it be good to have a tunable parameter to benefit both
> BW and latency workloads?
>
> Shirley

No, we need it to work well automatically somehow.

--
MST
Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
On Thu, Oct 28, 2010 at 01:13:55PM -0700, Shirley Ma wrote:
> On Thu, 2010-10-28 at 12:32 -0700, Shirley Ma wrote:
> > Also, I found a big TX regression between the old guest and the new
> > guest. For the old guest, I am able to get almost 11Gb/s for a 2K
> > message size, but for the new guest kernel I can only get 3.5Gb/s
> > with the patch and the same host. I will dig into why.
>
> The regression is from the guest kernel, not from this patch. I tested
> the 2.6.31 kernel; its performance is already less than 2Gb/s for a 2K
> message size. I will resubmit the patch for review.
>
> I will start testing from the 2.6.30 kernel to figure out when the TX
> regression was introduced in virtio_net. Any suggestion which guest
> kernels I should test to figure out this regression?
>
> Thanks
> Shirley

git bisect between 2.6.31 and 2.6.30 and go from there.
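The suggested bisect is the standard kernel workflow. A self-contained sketch of the mechanics on a throwaway repository (the repo and the failing condition are stand-ins; on the real tree you would run `git bisect start v2.6.31 v2.6.30` and rerun the netperf TX test at each step):

```shell
# Throwaway demo of the git-bisect workflow. In the real case this is
# the kernel tree and the test is the 2K-message netperf TX run.
set -e
rm -rf /tmp/bisect-demo
mkdir /tmp/bisect-demo
cd /tmp/bisect-demo
git init -q
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3 4 5; do
    echo "$i" > file
    git add file
    git commit -q -m "commit $i"
done
# Pretend the regression appeared in commit 4 (file contents >= 4).
git bisect start HEAD HEAD~4 >/dev/null
# Automate the good/bad decision: exit 0 means good, non-zero means bad.
git bisect run sh -c 'test "$(cat file)" -lt 4' > /tmp/bisect-demo/out 2>&1
grep "is the first bad commit" /tmp/bisect-demo/out
```

`git bisect run` halves the candidate range on each iteration, so even the 2.6.30 to 2.6.31 window (thousands of commits) needs only a dozen or so test builds.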
Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
On Thu, Oct 28, 2010 at 12:48:57PM +0530, Krishna Kumar2 wrote:
> Krishna Kumar2/India/IBM wrote on 10/28/2010 10:44:14 AM:
> > > Results for UDP BW tests (unidirectional, sum across 3 iterations,
> > > each iteration of 45 seconds, default netperf, vhosts bound to cpus
> > > 0-3; no other tuning):
> >
> > Is binding vhost threads to CPUs really required? What happens if we
> > let the scheduler do its job?
>
> Nothing drastic; I remember BW% and SD% both improved a bit as a result
> of binding.

If there's a significant improvement, this would mean that we need to rethink the vhost-net interaction with the scheduler.

> I will get a test run with and without binding and post the results
> later today.
>
> Correction: the result with binding is much better for SD/CPU compared
> to without binding.

Can you please try finding out why that is? Is some thread bouncing between CPUs? Does a wrong NUMA node get picked up? In practice users are very unlikely to pin threads to CPUs.

--
MST
[PATCH] kvm: Create an eventfd mechanism for EOIs to get to userspace
To support VFIO based device assignment, we need to be able to get an EOI out of the KVM irqchip. This introduces a mechanism to do that by registering an eventfd to be signaled when the IRQ is ACKed.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 include/linux/kvm.h      |   13 ++
 include/linux/kvm_host.h |    6 +++
 virt/kvm/eventfd.c       |   95 ++
 virt/kvm/kvm_main.c      |    8
 4 files changed, 122 insertions(+), 0 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index ea2dc1a..92d5b27 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -541,6 +541,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
 #define KVM_CAP_ASYNC_PF 59
+#define KVM_CAP_EOI_EVENTFD 60

 #ifdef KVM_CAP_IRQ_ROUTING
@@ -620,6 +621,16 @@ struct kvm_clock_data {
 	__u32 pad[9];
 };

+#define KVM_EOI_EVENTFD_FLAG_DEASSIGN	(1 << 0)
+#define KVM_EOI_EVENTFD_FLAG_DEASSERT	(1 << 1)
+
+struct kvm_eoi {
+	__u32 fd;
+	__u32 gsi;
+	__u32 flags;
+	__u8  pad[20];
+};
+
 /*
  * ioctls for VM fds
  */
@@ -677,6 +688,8 @@ struct kvm_clock_data {
 #define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2)
 /* Available with KVM_CAP_PPC_GET_PVINFO */
 #define KVM_PPC_GET_PVINFO _IOW(KVMIO, 0xa1, struct kvm_ppc_pvinfo)
+/* Available with KVM_CAP_EOI_EVENTFD */
+#define KVM_EOI_EVENTFD _IOW(KVMIO, 0xa2, struct kvm_eoi)

 /*
  * ioctls for vcpu fds

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ee4314e..5d50a7e 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -227,6 +227,7 @@ struct kvm {
 		struct list_head items;
 	} irqfds;
 	struct list_head ioeventfds;
+	struct list_head eoi_eventfds;
 #endif
 	struct kvm_vm_stat stat;
 	struct kvm_arch arch;
@@ -643,6 +644,7 @@ void kvm_eventfd_init(struct kvm *kvm);
 int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
 void kvm_irqfd_release(struct kvm *kvm);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+int kvm_eoi_eventfd(struct kvm *kvm, struct kvm_eoi *eoi);

 #else

@@ -658,6 +660,10 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 	return -ENOSYS;
 }

+static inline int kvm_eoi_eventfd(struct kvm *kvm, struct kvm_eoi *eoi)
+{
+	return -ENOSYS;
+}
 #endif /* CONFIG_HAVE_KVM_EVENTFD */

 #ifdef CONFIG_KVM_APIC_ARCHITECTURE

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c1f1e3c..3dbfb21 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -253,6 +253,7 @@ kvm_eventfd_init(struct kvm *kvm)
 	spin_lock_init(&kvm->irqfds.lock);
 	INIT_LIST_HEAD(&kvm->irqfds.items);
 	INIT_LIST_HEAD(&kvm->ioeventfds);
+	INIT_LIST_HEAD(&kvm->eoi_eventfds);
 }

 /*
@@ -586,3 +587,97 @@ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)

 	return kvm_assign_ioeventfd(kvm, args);
 }
+
+/*
+ * eoi_eventfd: Translate KVM APIC/IOAPIC EOI into eventfd signal.
+ *
+ * userspace can register GSIs with an eventfd for receiving notification
+ * when an EOI occurs.
+ */
+
+struct _eoi_eventfd {
+	struct list_head            list;
+	struct kvm                 *kvm;
+	struct eventfd_ctx         *eventfd;
+	bool                        deassert;
+	struct kvm_irq_ack_notifier notifier;
+};
+
+static void kvm_eoi_eventfd_acked(struct kvm_irq_ack_notifier *notifier)
+{
+	struct _eoi_eventfd *p;
+
+	p = container_of(notifier, struct _eoi_eventfd, notifier);
+
+	if (p->deassert)
+		kvm_set_irq(p->kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
+			    notifier->gsi, 0);
+
+	eventfd_signal(p->eventfd, 1);
+}
+
+static int kvm_assign_eoi_eventfd(struct kvm *kvm, struct kvm_eoi *eoi)
+{
+	struct eventfd_ctx *eventfd;
+	struct _eoi_eventfd *p;
+
+	eventfd = eventfd_ctx_fdget(eoi->fd);
+	if (IS_ERR(eventfd))
+		return PTR_ERR(eventfd);
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p) {
+		eventfd_ctx_put(eventfd);
+		return -ENOMEM;
+	}
+
+	INIT_LIST_HEAD(&p->list);
+	p->kvm = kvm;
+	p->eventfd = eventfd;
+	p->deassert = !!(eoi->flags & KVM_EOI_EVENTFD_FLAG_DEASSERT);
+
+	p->notifier.gsi = eoi->gsi;
+	p->notifier.irq_acked = kvm_eoi_eventfd_acked;
+
+	list_add_tail(&p->list, &kvm->eoi_eventfds);
+	kvm_register_irq_ack_notifier(kvm, &p->notifier);
+
+	return 0;
+}
+
+static int kvm_deassign_eoi_eventfd(struct kvm *kvm, struct kvm_eoi *eoi)
+{
+	struct eventfd_ctx *eventfd;
+	struct _eoi_eventfd *p, *tmp;
+	int ret = -ENOENT;
+
+	eventfd = eventfd_ctx_fdget(eoi->fd);
Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
On Fri, 29 Oct 2010 13:26 +0200, Michael S. Tsirkin m...@redhat.com wrote:
> On Thu, Oct 28, 2010 at 12:48:57PM +0530, Krishna Kumar2 wrote:
> > Krishna Kumar2/India/IBM wrote on 10/28/2010 10:44:14 AM:
>
> In practice users are very unlikely to pin threads to CPUs.

I may be misunderstanding what you're referring to. It caught my attention since I'm working on a configuration that does what you say is unlikely, so I'll chime in for what it's worth. An option in Vyatta allows assigning CPU affinity to network adapters, since apparently separate L2 caches can have a significant impact on throughput. Although much of their focus seems to be on commercial virtualization platforms, I do see quite a few forum posts with regard to KVM. Maybe this still qualifies as an edge case, but for virtualized routing theirs seems to offer the most functionality.

http://www.vyatta.org/forum/viewtopic.php?t=2697

-cb
[PATCH] pci: Add callbacks to support retrieving and updating interrupts
For device assignment, we need to be able to retreive the IRQ that the device interrupt pin is assigned to and be notified if that mapping changes (this happens via ACPI interrupt link updates on x86). We can then make use of the IRQ for EOI notification. Current qemu-kvm device assignment code invades common code with some hard coded hacks to achieve this. This attempts to architect the solution. Chipset components responsible for interrupt mapping can call pci_bridge_update_irqs() to signal when interrupt mapping may have changed. The bridge for those chipsets should then implement a get_irq callback. This is stubbed out for everybody as I only know how PIIX3 works. Devices wishing to be notified about IRQ updates can register via pci_register_update_irqs(), where they can then check mappings with pci_get_irq(). Signed-off-by: Alex Williamson alex.william...@redhat.com --- hw/apb_pci.c |4 ++-- hw/bonito.c|2 +- hw/grackle_pci.c |2 +- hw/gt64xxx.c |5 +++-- hw/pci.c | 52 +++- hw/pci.h | 16 hw/piix_pci.c | 23 ++- hw/ppc4xx_pci.c|2 +- hw/ppce500_pci.c |2 +- hw/prep_pci.c |2 +- hw/sh_pci.c|2 +- hw/unin_pci.c | 10 ++ hw/versatile_pci.c |2 +- 13 files changed, 99 insertions(+), 25 deletions(-) diff --git a/hw/apb_pci.c b/hw/apb_pci.c index 0ecac55..47ff0d9 100644 --- a/hw/apb_pci.c +++ b/hw/apb_pci.c @@ -336,8 +336,8 @@ PCIBus *pci_apb_init(target_phys_addr_t special_base, d = FROM_SYSBUS(APBState, s); d-bus = pci_register_bus(d-busdev.qdev, pci, - pci_apb_set_irq, pci_pbm_map_irq, d, - 0, 32); + pci_apb_set_irq, NULL, + pci_pbm_map_irq, d, 0, 32); pci_bus_set_mem_base(d-bus, mem_base); for (i = 0; i 32; i++) { diff --git a/hw/bonito.c b/hw/bonito.c index dcf0311..d2869bb 100644 --- a/hw/bonito.c +++ b/hw/bonito.c @@ -772,7 +772,7 @@ PCIBus *bonito_init(qemu_irq *pic) dev = qdev_create(NULL, Bonito-pcihost); pcihost = FROM_SYSBUS(BonitoState, sysbus_from_qdev(dev)); b = pci_register_bus(pcihost-busdev.qdev, pci, pci_bonito_set_irq, - pci_bonito_map_irq, pic, 0x28, 32); + 
NULL, pci_bonito_map_irq, pic, 0x28, 32); pcihost-bus = b; qdev_init_nofail(dev); diff --git a/hw/grackle_pci.c b/hw/grackle_pci.c index 91c755f..e747d7e 100644 --- a/hw/grackle_pci.c +++ b/hw/grackle_pci.c @@ -89,7 +89,7 @@ PCIBus *pci_grackle_init(uint32_t base, qemu_irq *pic) s = sysbus_from_qdev(dev); d = FROM_SYSBUS(GrackleState, s); d-host_state.bus = pci_register_bus(d-busdev.qdev, pci, - pci_grackle_set_irq, + pci_grackle_set_irq, NULL, pci_grackle_map_irq, pic, 0, 4); diff --git a/hw/gt64xxx.c b/hw/gt64xxx.c index cabf7ea..2a0fc4a 100644 --- a/hw/gt64xxx.c +++ b/hw/gt64xxx.c @@ -1114,8 +1114,9 @@ PCIBus *pci_gt64120_init(qemu_irq *pic) s-pci = qemu_mallocz(sizeof(GT64120PCIState)); s-pci-bus = pci_register_bus(NULL, pci, - pci_gt64120_set_irq, pci_gt64120_map_irq, - pic, PCI_DEVFN(18, 0), 4); + pci_gt64120_set_irq, NULL, + pci_gt64120_map_irq, pic, + PCI_DEVFN(18, 0), 4); s-ISD_handle = cpu_register_io_memory(gt64120_read, gt64120_write, s); d = pci_register_device(s-pci-bus, GT64120 PCI Bus, sizeof(PCIDevice), 0, NULL, NULL); diff --git a/hw/pci.c b/hw/pci.c index 1280d4d..645b119 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -41,6 +41,7 @@ struct PCIBus { BusState qbus; int devfn_min; pci_set_irq_fn set_irq; +pci_get_irq_fn get_irq; pci_map_irq_fn map_irq; pci_hotplug_fn hotplug; DeviceState *hotplug_qdev; @@ -139,6 +140,23 @@ static void pci_change_irq_level(PCIDevice *pci_dev, int irq_num, int change) bus-set_irq(bus-irq_opaque, irq_num, bus-irq_count[irq_num] != 0); } +int pci_get_irq(PCIDevice *pci_dev, int pin) +{ +PCIBus *bus; +for (;;) { +if (!pci_dev) +return -ENOSYS; +bus = pci_dev-bus; +if (!bus) +return -ENOSYS; +pin = bus-map_irq(pci_dev, pin); +if (bus-get_irq) +break; +pci_dev = bus-parent_dev; +} +return bus-get_irq(bus-irq_opaque, pin); +} + /* Update interrupt status bit in config space on interrupt * state change. */ static void pci_update_irq_status(PCIDevice *dev) @@ -260,10 +278,11 @@ PCIBus
Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
On Fri, 2010-10-29 at 10:10 +0200, Michael S. Tsirkin wrote:
> Hmm. I don't yet understand. We are still doing copies into the per-vq
> buffer, and the data copied is really small. Is it about cache line
> bounces? Could you try figuring it out?

The per-vq buffer is much less expensive than three put_copy() calls. I will collect profiling data to show that.

> > 2. How about flushing out queued stuff before we exit the handle_tx
> > loop? That would address most of the spec issue.
>
> The performance is almost the same as with the previous patch. I will
> resubmit the modified one, adding vhost_add_used_and_signal_n after the
> handle_tx loop for processing the pending queue. This patch was part of
> a modified macvtap zero copy series which I haven't submitted yet. I
> found this helped vhost TX in general. This pending queue will be used
> by DMA-done handling later, so I put it in vq instead of a local
> variable in handle_tx.
>
> BTW, why do we need another array? Isn't the heads field exactly what
> we need here?

The heads field only holds up to 32 entries, and the test results show that the more used-buffer adds and signals are accumulated, the better the performance. That was one of the reasons I didn't use heads. The other reason was that I use these buffers for pending DMA-done handling in the macvtap zero copy patch, where the queue could grow up to vq->num in the worst case.

Thanks
Shirley
[PATCH 0/2] Minimal RAM API support
For VFIO based device assignment, we need to know what guest memory areas are actual RAM. RAMBlocks have long since become a grab bag of misc allocations, so they aren't effective for this. Anthony has had a RAM API in mind for a while now that addresses this problem. This implements just enough of it so that we have an interface to get actual guest memory physical addresses to set up the host IOMMU. We can continue building a full RAM API on top of this stub.

Anthony, feel free to add copyright to memory.c as it's based on your initial implementation. I had to add something since the file in your branch just copies a header with Fabrice's copyright.

Thanks,
Alex

---

Alex Williamson (2):
      RAM API: Make use of it for x86 PC
      Minimal RAM API support

 Makefile.target |    1 +
 cpu-common.h    |    2 +
 hw/pc.c         |   12 ++++
 memory.c        |   82 +++++++++
 memory.h        |   23 +++
 5 files changed, 114 insertions(+), 6 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h
[PATCH 1/2] Minimal RAM API support
This adds a minimum chunk of Anthony's RAM API support so that we can identify actual VM RAM versus all the other things that make use of qemu_ram_alloc.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 Makefile.target |    1 +
 cpu-common.h    |    2 +
 memory.c        |   82 +++++++++
 memory.h        |   23 +++
 4 files changed, 108 insertions(+), 0 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h

diff --git a/Makefile.target b/Makefile.target
index c48cbcc..e4e2eb4 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -175,6 +175,7 @@ obj-$(CONFIG_VIRTFS) += virtio-9p.o
 obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
+obj-y += memory.o
 LIBS+=-lz

 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)

diff --git a/cpu-common.h b/cpu-common.h
index a543b5d..6aa2738 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -23,6 +23,8 @@
 /* address in the RAM (different from a physical address) */
 typedef unsigned long ram_addr_t;

+#include "memory.h"
+
 /* memory API */
 typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, uint32_t value);

diff --git a/memory.c b/memory.c
new file mode 100644
index 0000000..86947fb
--- /dev/null
+++ b/memory.c
@@ -0,0 +1,82 @@
+/*
+ * RAM API
+ *
+ * Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamson <alex.william...@redhat.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include "memory.h"
+#include "range.h"
+
+QemuRamSlots ram_slots = { .slots = QLIST_HEAD_INITIALIZER(ram_slots) };
+
+static QemuRamSlot *qemu_ram_find_slot(target_phys_addr_t start_addr,
+                                       ram_addr_t size)
+{
+    QemuRamSlot *slot;
+
+    QLIST_FOREACH(slot, &ram_slots.slots, next) {
+        if (slot->start_addr == start_addr && slot->size == size) {
+            return slot;
+        }
+
+        if (ranges_overlap(start_addr, size, slot->start_addr, slot->size)) {
+            abort();
+        }
+    }
+
+    return NULL;
+}
+
+void qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+                       ram_addr_t phys_offset)
+{
+    QemuRamSlot *slot;
+
+    if (!size) {
+        return;
+    }
+
+    assert(!qemu_ram_find_slot(start_addr, size));
+
+    slot = qemu_mallocz(sizeof(QemuRamSlot));
+
+    slot->start_addr = start_addr;
+    slot->size = size;
+    slot->offset = phys_offset;
+
+    QLIST_INSERT_HEAD(&ram_slots.slots, slot, next);
+
+    cpu_register_physical_memory(slot->start_addr, slot->size, slot->offset);
+}
+
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
+{
+    QemuRamSlot *slot;
+
+    if (!size) {
+        return;
+    }
+
+    slot = qemu_ram_find_slot(start_addr, size);
+    assert(slot != NULL);
+
+    QLIST_REMOVE(slot, next);
+    cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
+
+    return;
+}

diff --git a/memory.h b/memory.h
new file mode 100644
index 0000000..91e552e
--- /dev/null
+++ b/memory.h
@@ -0,0 +1,23 @@
+#ifndef QEMU_MEMORY_H
+#define QEMU_MEMORY_H
+
+#include "qemu-common.h"
+#include "cpu-common.h"
+
+typedef struct QemuRamSlot {
+    target_phys_addr_t start_addr;
+    ram_addr_t size;
+    ram_addr_t offset;
+    void *host;
+    QLIST_ENTRY(QemuRamSlot) next;
+} QemuRamSlot;
+
+typedef struct QemuRamSlots {
+    QLIST_HEAD(slots, QemuRamSlot) slots;
+} QemuRamSlots;
+extern QemuRamSlots ram_slots;
+
+void qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+                       ram_addr_t phys_offset);
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size);
+#endif
[PATCH 2/2] RAM API: Make use of it for x86 PC
Register the actual VM RAM using the new API

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/pc.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 69b13bf..0ea6d10 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -912,14 +912,14 @@ void pc_memory_init(ram_addr_t ram_size,
     /* allocate RAM */
     ram_addr = qemu_ram_alloc(NULL, "pc.ram",
                               below_4g_mem_size + above_4g_mem_size);
-    cpu_register_physical_memory(0, 0xa0000, ram_addr);
-    cpu_register_physical_memory(0x100000,
-                                 below_4g_mem_size - 0x100000,
-                                 ram_addr + 0x100000);
+
+    qemu_ram_register(0, 0xa0000, ram_addr);
+    qemu_ram_register(0x100000, below_4g_mem_size - 0x100000,
+                      ram_addr + 0x100000);
 #if TARGET_PHYS_ADDR_BITS > 32
     if (above_4g_mem_size > 0) {
-        cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
-                                     ram_addr + below_4g_mem_size);
+        qemu_ram_register(0x100000000ULL, above_4g_mem_size,
+                          ram_addr + below_4g_mem_size);
     }
 #endif
[PATCH] exec: Implement qemu_ram_free_from_ptr()
Required for regions mapped via qemu_ram_alloc_from_ptr(). VFIO will make use of this to remove mappings when devices are hot unplugged. (Current callers of qemu_ram_alloc_from_ptr() probably need this too.)

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 cpu-common.h |    1 +
 exec.c       |   13 +++++++++++++
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index a543b5d..8a3d1da 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -43,6 +43,7 @@ ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr);
 ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
                                    ram_addr_t size, void *host);
 ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size);
+void qemu_ram_free_from_ptr(ram_addr_t addr);
 void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device. */
 void *qemu_get_ram_ptr(ram_addr_t addr);

diff --git a/exec.c b/exec.c
index 631d8c5..2b3b9ba 100644
--- a/exec.c
+++ b/exec.c
@@ -2882,6 +2882,19 @@ ram_addr_t qemu_ram_alloc(DeviceState *dev, const char *name, ram_addr_t size)
     return qemu_ram_alloc_from_ptr(dev, name, size, NULL);
 }

+void qemu_ram_free_from_ptr(ram_addr_t addr)
+{
+    RAMBlock *block;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        if (addr == block->offset) {
+            QLIST_REMOVE(block, next);
+            qemu_free(block);
+            return;
+        }
+    }
+}
+
 void qemu_ram_free(ram_addr_t addr)
 {
     RAMBlock *block;
[PATCH] kvm: Add support for KVM_EOI_EVENTFD
This allows us to register an eventfd to be triggered on EOI for the given IRQ.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
Userspace side of:
  [PATCH] kvm: Create an eventfd mechanism for EOIs to get to userspace

 kvm-all.c               |   19 +++++++++++++++++++
 kvm.h                   |   10 ++++++++++
 kvm/include/linux/kvm.h |   13 +++++++++++++
 3 files changed, 42 insertions(+), 0 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 0e60748..75dbe76 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1349,5 +1349,24 @@ int kvm_set_irqfd(int gsi, int fd, bool assigned)
 }
 #endif

+#if defined(KVM_EOI_EVENTFD)
+int kvm_eoi_eventfd(int gsi, int fd, uint32_t flags)
+{
+    struct kvm_eoi eoi = {
+        .fd = fd,
+        .gsi = gsi,
+        .flags = flags,
+    };
+    int r;
+
+    if (!kvm_enabled() || !kvm_irqchip_in_kernel())
+        return -ENOSYS;
+
+    r = kvm_vm_ioctl(kvm_state, KVM_EOI_EVENTFD, &eoi);
+    if (r < 0)
+        return r;
+    return 0;
+}
+#endif

 #undef PAGE_SIZE
 #include "qemu-kvm.c"

diff --git a/kvm.h b/kvm.h
index 02280a6..777904a 100644
--- a/kvm.h
+++ b/kvm.h
@@ -203,6 +203,16 @@ int kvm_set_irqfd(int gsi, int fd, bool assigned)
 }
 #endif

+#if defined(KVM_EOI_EVENTFD) && defined(CONFIG_KVM)
+int kvm_eoi_eventfd(int gsi, int fd, uint32_t flags);
+#else
+static inline
+int kvm_eoi_eventfd(int gsi, int fd, uint32_t flags)
+{
+    return -ENOSYS;
+}
+#endif
+
 int kvm_set_ioeventfd_pio_word(int fd, uint16_t adr, uint16_t val,
                                bool assign);

 int kvm_has_gsi_routing(void);

diff --git a/kvm/include/linux/kvm.h b/kvm/include/linux/kvm.h
index e46729e..5490f62 100644
--- a/kvm/include/linux/kvm.h
+++ b/kvm/include/linux/kvm.h
@@ -530,6 +530,7 @@ struct kvm_enable_cap {
 #ifdef __KVM_HAVE_XCRS
 #define KVM_CAP_XCRS 56
 #endif
+#define KVM_CAP_EOI_EVENTFD 60

 #ifdef KVM_CAP_IRQ_ROUTING
@@ -609,6 +610,16 @@ struct kvm_clock_data {
 	__u32 pad[9];
 };

+#define KVM_EOI_EVENTFD_FLAG_DEASSIGN	(1 << 0)
+#define KVM_EOI_EVENTFD_FLAG_DEASSERT	(1 << 1)
+
+struct kvm_eoi {
+	__u32 fd;
+	__u32 gsi;
+	__u32 flags;
+	__u8  pad[20];
+};
+
 /*
  * ioctls for VM fds
 */
@@ -663,6 +674,8 @@ struct kvm_clock_data {
 /* Available with KVM_CAP_PIT_STATE2 */
 #define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct kvm_pit_state2)
 #define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2)
+/* Available with KVM_CAP_EOI_EVENTFD */
+#define KVM_EOI_EVENTFD _IOW(KVMIO, 0xa2, struct kvm_eoi)

 /*
  * ioctls for vcpu fds
[PATCH] APIC/IOAPIC EOI callback
For device assignment, we need to know when the VM writes an end of interrupt to the APIC, which allows us to re-enable interrupts on the physical device. Add a new wrapper for ioapic generated interrupts with a callback on eoi and create an interface for drivers to be notified on eoi. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Note that the notifier and notifier_enabled eoi_client fields aren't used here yet. I'll send an RFC patch showing how we make use of these with the proposed KVM_EOI_EVENTFD patches. hw/apic.c | 18 -- hw/apic.h |4 hw/ioapic.c | 38 -- hw/pc.h | 16 +++- 4 files changed, 71 insertions(+), 5 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 63d62c7..a24117b 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -22,6 +22,7 @@ #include host-utils.h #include sysbus.h #include trace.h +#include pc.h /* APIC Local Vector Table */ #define APIC_LVT_TIMER 0 @@ -103,6 +104,7 @@ struct APICState { int wait_for_sipi; }; +static uint8_t vector_to_gsi_map[256] = { 0xff }; static APICState *local_apics[MAX_APICS + 1]; static int apic_irq_delivered; @@ -292,6 +294,15 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, trigger_mode); } +void apic_deliver_ioapic_irq(uint8_t dest, uint8_t dest_mode, + uint8_t delivery_mode, uint8_t vector_num, + uint8_t polarity, uint8_t trigger_mode, int gsi) +{ +vector_to_gsi_map[vector_num] = gsi; +apic_deliver_irq(dest, dest_mode, delivery_mode, + vector_num, polarity, trigger_mode); +} + void cpu_set_apic_base(DeviceState *d, uint64_t val) { APICState *s = DO_UPCAST(APICState, busdev.qdev, d); @@ -420,8 +431,11 @@ static void apic_eoi(APICState *s) if (isrv 0) return; reset_bit(s-isr, isrv); -/* XXX: send the EOI packet to the APIC bus to allow the I/O APIC to -set the remote IRR bit for level triggered interrupts. 
*/ + +if (vector_to_gsi_map[isrv] != 0xff) { +ioapic_eoi(vector_to_gsi_map[isrv]); +vector_to_gsi_map[isrv] = 0xff; +} apic_update_irq(s); } diff --git a/hw/apic.h b/hw/apic.h index 8a0c9d0..59d0e37 100644 --- a/hw/apic.h +++ b/hw/apic.h @@ -8,6 +8,10 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode, uint8_t vector_num, uint8_t polarity, uint8_t trigger_mode); +void apic_deliver_ioapic_irq(uint8_t dest, uint8_t dest_mode, + uint8_t delivery_mode, + uint8_t vector_num, uint8_t polarity, + uint8_t trigger_mode, int gsi); int apic_accept_pic_intr(DeviceState *s); void apic_deliver_pic_intr(DeviceState *s, int level); int apic_get_interrupt(DeviceState *s); diff --git a/hw/ioapic.c b/hw/ioapic.c index 5ae21e9..ffd1c92 100644 --- a/hw/ioapic.c +++ b/hw/ioapic.c @@ -26,6 +26,7 @@ #include qemu-timer.h #include host-utils.h #include sysbus.h +#include qlist.h //#define DEBUG_IOAPIC @@ -61,6 +62,39 @@ struct IOAPICState { uint64_t ioredtbl[IOAPIC_NUM_PINS]; }; +static QLIST_HEAD(ioapic_eoi_client_list, + ioapic_eoi_client) ioapic_eoi_client_list = + QLIST_HEAD_INITIALIZER(ioapic_eoi_client_list); + +int ioapic_register_eoi_client(ioapic_eoi_client *client) +{ +QLIST_INSERT_HEAD(ioapic_eoi_client_list, client, list); +return 0; +} + +void ioapic_unregister_eoi_client(ioapic_eoi_client *client) +{ +QLIST_REMOVE(client, list); +} + +int ioapic_eoi_client_get_fd(ioapic_eoi_client *client) +{ +if (!client-notifier_enabled) { +return -ENODEV; +} +return event_notifier_get_fd(client-notifier); +} + +void ioapic_eoi(int gsi) +{ +ioapic_eoi_client *client; +QLIST_FOREACH(client, ioapic_eoi_client_list, list) { +if (client-irq == gsi) { +client-eoi(client); +} +} +} + static void ioapic_service(IOAPICState *s) { uint8_t i; @@ -90,8 +124,8 @@ static void ioapic_service(IOAPICState *s) else vector = entry 0xff; -apic_deliver_irq(dest, dest_mode, delivery_mode, - vector, polarity, trig_mode); +apic_deliver_ioapic_irq(dest, dest_mode, delivery_mode, 
+vector, polarity, trig_mode, i); } } } diff --git a/hw/pc.h b/hw/pc.h index 63b0249..5945bff 100644 --- a/hw/pc.h +++ b/hw/pc.h @@ -5,6 +5,7 @@ #include ioport.h #include isa.h #include fdc.h +#include event_notifier.h /* PC-style peripherals (also used by other machines). */ @@ -48,8 +49,21 @@ typedef struct isa_irq_state { void isa_irq_handler(void *opaque, int n, int level); -/* i8254.c */ +struct
[RFC PATCH] kvm: KVM_EOI_EVENTFD support for eoi_client
With the KVM irqchip, we need to get the EOI via an eventfd. This adds support for that, abstracting the details from the caller. The get_fd function allows drivers to make further optimizations in handling the EOI. For instance, with VFIO we can make use of an irqfd-like mechanism to have the VFIO kernel module consume the EOI directly, bypassing qemu userspace.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 hw/ioapic.c |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index c43be3a..707f2a2 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -72,14 +72,66 @@ static QLIST_HEAD(ioapic_eoi_client_list,
                   ioapic_eoi_client) ioapic_eoi_client_list =
     QLIST_HEAD_INITIALIZER(ioapic_eoi_client_list);

+#ifdef KVM_EOI_EVENTFD
+static void ioapic_eoi_callback(void *opaque)
+{
+    ioapic_eoi_client *client = opaque;
+
+    if (!event_notifier_test_and_clear(&client->notifier)) {
+        return;
+    }
+
+    client->eoi(client);
+}
+#endif
+
 int ioapic_register_eoi_client(ioapic_eoi_client *client)
 {
     QLIST_INSERT_HEAD(&ioapic_eoi_client_list, client, list);
+
+#ifdef KVM_EOI_EVENTFD
+    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
+        int ret, fd;
+
+        ret = event_notifier_init(&client->notifier, 0);
+        if (ret) {
+            fprintf(stderr, "%s notifier init failed %d\n", __FUNCTION__, ret);
+            return ret;
+        }
+
+        fd = event_notifier_get_fd(&client->notifier);
+        qemu_set_fd_handler(fd, ioapic_eoi_callback, NULL, client);
+
+        ret = kvm_eoi_eventfd(client->irq, fd, KVM_EOI_EVENTFD_FLAG_DEASSERT);
+        if (ret) {
+            fprintf(stderr, "%s eoi eventfd failed %d\n", __FUNCTION__, ret);
+            return ret;
+        }
+        client->notifier_enabled = true;
+    }
+#endif
     return 0;
 }

 void ioapic_unregister_eoi_client(ioapic_eoi_client *client)
 {
+#ifdef KVM_EOI_EVENTFD
+    if (kvm_enabled() && kvm_irqchip_in_kernel()) {
+        int ret, fd;
+
+        fd = event_notifier_get_fd(&client->notifier);
+
+        ret = kvm_eoi_eventfd(client->irq, fd, KVM_EOI_EVENTFD_FLAG_DEASSIGN);
+        if (ret) {
+            fprintf(stderr, "%s eoi eventfd failed %d\n", __FUNCTION__, ret);
+        }
+
+        qemu_set_fd_handler(fd, NULL, NULL, NULL);
+
+        event_notifier_cleanup(&client->notifier);
+        client->notifier_enabled = false;
+    }
+#endif
     QLIST_REMOVE(client, list);
 }
Re: [PATCH] APIC/IOAPIC EOI callback
On 10/29/2010 12:56 PM, Alex Williamson wrote:

For device assignment, we need to know when the VM writes an end of interrupt to the APIC, which allows us to re-enable interrupts on the physical device. Add a new wrapper for ioapic generated interrupts with a callback on eoi and create an interface for drivers to be notified on eoi.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

Note that the notifier and notifier_enabled eoi_client fields aren't used here yet. I'll send an RFC patch showing how we make use of these with the proposed KVM_EOI_EVENTFD patches.

 hw/apic.c   |   18 ++++++++++++++--
 hw/apic.h   |    4 ++++
 hw/ioapic.c |   38 ++++++++++++++++++++++++++++++++++--
 hw/pc.h     |   16 +++++++++++++-
 4 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 63d62c7..a24117b 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -22,6 +22,7 @@
 #include "host-utils.h"
 #include "sysbus.h"
 #include "trace.h"
+#include "pc.h"

 /* APIC Local Vector Table */
 #define APIC_LVT_TIMER   0
@@ -103,6 +104,7 @@ struct APICState {
     int wait_for_sipi;
 };

+static uint8_t vector_to_gsi_map[256] = { 0xff };
 static APICState *local_apics[MAX_APICS + 1];
 static int apic_irq_delivered;

@@ -292,6 +294,15 @@ void apic_deliver_irq(uint8_t dest, uint8_t dest_mode,
                      trigger_mode);
 }

+void apic_deliver_ioapic_irq(uint8_t dest, uint8_t dest_mode,
+                             uint8_t delivery_mode, uint8_t vector_num,
+                             uint8_t polarity, uint8_t trigger_mode, int gsi)
+{
+    vector_to_gsi_map[vector_num] = gsi;
+    apic_deliver_irq(dest, dest_mode, delivery_mode,
+                     vector_num, polarity, trigger_mode);
+}
+
 void cpu_set_apic_base(DeviceState *d, uint64_t val)
 {
     APICState *s = DO_UPCAST(APICState, busdev.qdev, d);
@@ -420,8 +431,11 @@ static void apic_eoi(APICState *s)
     if (isrv < 0)
         return;
     reset_bit(s->isr, isrv);
-    /* XXX: send the EOI packet to the APIC bus to allow the I/O APIC to
-       set the remote IRR bit for level triggered interrupts. */
+
+    if (vector_to_gsi_map[isrv] != 0xff) {
+        ioapic_eoi(vector_to_gsi_map[isrv]);
+        vector_to_gsi_map[isrv] = 0xff;
+    }
     apic_update_irq(s);
 }

diff --git a/hw/apic.h b/hw/apic.h
index 8a0c9d0..59d0e37 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -8,6 +8,10 @@
 void apic_deliver_irq(uint8_t dest, uint8_t dest_mode, uint8_t delivery_mode,
                       uint8_t vector_num, uint8_t polarity,
                       uint8_t trigger_mode);
+void apic_deliver_ioapic_irq(uint8_t dest, uint8_t dest_mode,
+                             uint8_t delivery_mode,
+                             uint8_t vector_num, uint8_t polarity,
+                             uint8_t trigger_mode, int gsi);
 int apic_accept_pic_intr(DeviceState *s);
 void apic_deliver_pic_intr(DeviceState *s, int level);
 int apic_get_interrupt(DeviceState *s);

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 5ae21e9..ffd1c92 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -26,6 +26,7 @@
 #include "qemu-timer.h"
 #include "host-utils.h"
 #include "sysbus.h"
+#include "qlist.h"

 //#define DEBUG_IOAPIC

@@ -61,6 +62,39 @@ struct IOAPICState {
     uint64_t ioredtbl[IOAPIC_NUM_PINS];
 };

+static QLIST_HEAD(ioapic_eoi_client_list,
+                  ioapic_eoi_client) ioapic_eoi_client_list =
+    QLIST_HEAD_INITIALIZER(ioapic_eoi_client_list);
+
+int ioapic_register_eoi_client(ioapic_eoi_client *client)
+{
+    QLIST_INSERT_HEAD(&ioapic_eoi_client_list, client, list);
+    return 0;
+}
+
+void ioapic_unregister_eoi_client(ioapic_eoi_client *client)
+{
+    QLIST_REMOVE(client, list);
+}
+
+int ioapic_eoi_client_get_fd(ioapic_eoi_client *client)
+{
+    if (!client->notifier_enabled) {
+        return -ENODEV;
+    }
+    return event_notifier_get_fd(&client->notifier);
+}
+
+void ioapic_eoi(int gsi)
+{
+    ioapic_eoi_client *client;
+
+    QLIST_FOREACH(client, &ioapic_eoi_client_list, list) {
+        if (client->irq == gsi) {
+            client->eoi(client);
+        }
+    }
+}

I think this all goes away with a NotifierList.
Regards,

Anthony Liguori

+
 static void ioapic_service(IOAPICState *s)
 {
     uint8_t i;

@@ -90,8 +124,8 @@ static void ioapic_service(IOAPICState *s)
             else
                 vector = entry & 0xff;

-            apic_deliver_irq(dest, dest_mode, delivery_mode,
-                             vector, polarity, trig_mode);
+            apic_deliver_ioapic_irq(dest, dest_mode, delivery_mode,
+                                    vector, polarity, trig_mode, i);
         }
     }
 }

diff --git a/hw/pc.h b/hw/pc.h
index 63b0249..5945bff 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -5,6 +5,7 @@
 #include "ioport.h"
 #include "isa.h"
 #include "fdc.h"
+#include "event_notifier.h"

 /* PC-style peripherals
Re: [Qemu-devel] [PATCH 1/2] Minimal RAM API support
On Fri, Oct 29, 2010 at 4:39 PM, Alex Williamson alex.william...@redhat.com wrote:

This adds a minimum chunk of Anthony's RAM API support so that we can identify actual VM RAM versus all the other things that make use of qemu_ram_alloc.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Makefile.target |    1 +
 cpu-common.h    |    2 +
 memory.c        |   82 +++++++++++++++++++++++++++++++++++++++++++++++++++
 memory.h        |   23 +++++++++++++++
 4 files changed, 108 insertions(+), 0 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h

diff --git a/Makefile.target b/Makefile.target
index c48cbcc..e4e2eb4 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -175,6 +175,7 @@ obj-$(CONFIG_VIRTFS) += virtio-9p.o
 obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
+obj-y += memory.o

Please move this to Makefile.objs to compile the object in hwlib. There are no target dependencies.

 LIBS+=-lz
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)

diff --git a/cpu-common.h b/cpu-common.h
index a543b5d..6aa2738 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -23,6 +23,8 @@
 /* address in the RAM (different from a physical address) */
 typedef unsigned long ram_addr_t;

+#include "memory.h"
+
 /* memory API */
 typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr,
                                 uint32_t value);

diff --git a/memory.c b/memory.c
new file mode 100644
index 000..86947fb
--- /dev/null
+++ b/memory.c
@@ -0,0 +1,82 @@
+/*
+ * RAM API
+ *
+ * Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamson alex.william...@redhat.com
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/.
+ */
+#include "memory.h"
+#include "range.h"
+
+QemuRamSlots ram_slots = { .slots = QLIST_HEAD_INITIALIZER(ram_slots) };

Please avoid global state. This is not used elsewhere, so it could be static. But instead the API should take a state parameter (RAMSlotState *) so that no static state is needed.

+
+static QemuRamSlot *qemu_ram_find_slot(target_phys_addr_t start_addr,
+                                       ram_addr_t size)
+{
+    QemuRamSlot *slot;
+
+    QLIST_FOREACH(slot, &ram_slots.slots, next) {
+        if (slot->start_addr == start_addr && slot->size == size) {
+            return slot;
+        }
+
+        if (ranges_overlap(start_addr, size, slot->start_addr, slot->size)) {
+            abort();
+        }
+    }
+
+    return NULL;
+}
+
+void qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+                       ram_addr_t phys_offset)
+{
+    QemuRamSlot *slot;
+
+    if (!size) {
+        return;
+    }
+
+    assert(!qemu_ram_find_slot(start_addr, size));
+
+    slot = qemu_mallocz(sizeof(QemuRamSlot));
+
+    slot->start_addr = start_addr;
+    slot->size = size;
+    slot->offset = phys_offset;
+
+    QLIST_INSERT_HEAD(&ram_slots.slots, slot, next);
+
+    cpu_register_physical_memory(slot->start_addr, slot->size, slot->offset);
+}
+
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
+{
+    QemuRamSlot *slot;
+
+    if (!size) {
+        return;
+    }
+
+    slot = qemu_ram_find_slot(start_addr, size);
+    assert(slot != NULL);
+
+    QLIST_REMOVE(slot, next);
+    cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
+
+    return;
+}

diff --git a/memory.h b/memory.h
new file mode 100644
index 000..91e552e
--- /dev/null
+++ b/memory.h
@@ -0,0 +1,23 @@
+#ifndef QEMU_MEMORY_H
+#define QEMU_MEMORY_H
+
+#include "qemu-common.h"
+#include "cpu-common.h"
+
+typedef struct QemuRamSlot {
+    target_phys_addr_t start_addr;
+    ram_addr_t size;
+    ram_addr_t offset;
+    void *host;
+
+    QLIST_ENTRY(QemuRamSlot) next;
+} QemuRamSlot;
+
+typedef struct QemuRamSlots {
+    QLIST_HEAD(slots, QemuRamSlot) slots;
+} QemuRamSlots;

This definition should be in memory.c.
Re: [Qemu-devel] [PATCH 1/2] Minimal RAM API support
On Fri, 2010-10-29 at 19:57 +0000, Blue Swirl wrote:

On Fri, Oct 29, 2010 at 4:39 PM, Alex Williamson alex.william...@redhat.com wrote:

This adds a minimum chunk of Anthony's RAM API support so that we can identify actual VM RAM versus all the other things that make use of qemu_ram_alloc.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Makefile.target |    1 +
 cpu-common.h    |    2 +
 memory.c        |   82 +++++++++++++++++++++++++++++++++++++++++++++++++++
 memory.h        |   23 +++++++++++++++
 4 files changed, 108 insertions(+), 0 deletions(-)
 create mode 100644 memory.c
 create mode 100644 memory.h

diff --git a/Makefile.target b/Makefile.target
index c48cbcc..e4e2eb4 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -175,6 +175,7 @@ obj-$(CONFIG_VIRTFS) += virtio-9p.o
 obj-y += rwhandler.o
 obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
+obj-y += memory.o

Please move this to Makefile.objs to compile the object in hwlib. There are no target dependencies.

Ok, will do.

 LIBS+=-lz
 QEMU_CFLAGS += $(VNC_TLS_CFLAGS)

diff --git a/cpu-common.h b/cpu-common.h
index a543b5d..6aa2738 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -23,6 +23,8 @@
 /* address in the RAM (different from a physical address) */
 typedef unsigned long ram_addr_t;

+#include "memory.h"
+
 /* memory API */
 typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr,
                                 uint32_t value);

diff --git a/memory.c b/memory.c
new file mode 100644
index 000..86947fb
--- /dev/null
+++ b/memory.c
@@ -0,0 +1,82 @@
+/*
+ * RAM API
+ *
+ * Copyright Red Hat, Inc. 2010
+ *
+ * Authors:
+ *  Alex Williamson alex.william...@redhat.com
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see http://www.gnu.org/licenses/.
+ */
+#include "memory.h"
+#include "range.h"
+
+QemuRamSlots ram_slots = { .slots = QLIST_HEAD_INITIALIZER(ram_slots) };

Please avoid global state. This is not used elsewhere, so it could be static. But instead the API should take a state parameter (RAMSlotState *) so that no static state is needed.

The reason for this not being static is that the vfio driver I'm working on walks it. It is also the reason for the definition being in memory.h instead of memory.c, as you've noted below. Probably better to solve that usage by creating an interface that calls a function pointer for each entry... I'll work on that.

Thanks,

Alex

+
+static QemuRamSlot *qemu_ram_find_slot(target_phys_addr_t start_addr,
+                                       ram_addr_t size)
+{
+    QemuRamSlot *slot;
+
+    QLIST_FOREACH(slot, &ram_slots.slots, next) {
+        if (slot->start_addr == start_addr && slot->size == size) {
+            return slot;
+        }
+
+        if (ranges_overlap(start_addr, size, slot->start_addr, slot->size)) {
+            abort();
+        }
+    }
+
+    return NULL;
+}
+
+void qemu_ram_register(target_phys_addr_t start_addr, ram_addr_t size,
+                       ram_addr_t phys_offset)
+{
+    QemuRamSlot *slot;
+
+    if (!size) {
+        return;
+    }
+
+    assert(!qemu_ram_find_slot(start_addr, size));
+
+    slot = qemu_mallocz(sizeof(QemuRamSlot));
+
+    slot->start_addr = start_addr;
+    slot->size = size;
+    slot->offset = phys_offset;
+
+    QLIST_INSERT_HEAD(&ram_slots.slots, slot, next);
+
+    cpu_register_physical_memory(slot->start_addr, slot->size, slot->offset);
+}
+
+void qemu_ram_unregister(target_phys_addr_t start_addr, ram_addr_t size)
+{
+    QemuRamSlot *slot;
+
+    if (!size) {
+        return;
+    }
+
+    slot = qemu_ram_find_slot(start_addr, size);
+    assert(slot != NULL);
+
+    QLIST_REMOVE(slot, next);
+    cpu_register_physical_memory(start_addr, size, IO_MEM_UNASSIGNED);
+
+    return;
+}

diff --git a/memory.h b/memory.h
new file mode 100644
index 000..91e552e
--- /dev/null
+++ b/memory.h
@@ -0,0 +1,23 @@
+#ifndef QEMU_MEMORY_H
+#define QEMU_MEMORY_H
+
+#include "qemu-common.h"
+#include "cpu-common.h"
+
+typedef struct QemuRamSlot {
+    target_phys_addr_t start_addr;
+    ram_addr_t size;
+    ram_addr_t offset;
+    void *host;
+
Re: [PATCH v13 10/16] Add a hook to intercept external buffers from NIC driver.
From: Xin, Xiaohui xiaohui@intel.com
Date: Wed, 27 Oct 2010 09:33:12 +0800

Somehow, it seems not a trivial work to support it now. Can we support it later and as a todo with our current work?

I would prefer the feature work properly, rather than only in specific cases, before being integrated.
qemu-kvm-0.13.0 compile error
Hi,

I have a problem compiling qemu-kvm 0.13.0 with gcc version 4.1.2 20080704 (Red Hat 4.1.2-48) in CentOS 5. Errors below:

/usr/src/qemu-kvm-0.13.0/hw/ide/core.c: In function ‘ide_drive_pio_post_load’:
/usr/src/qemu-kvm-0.13.0/hw/ide/core.c:2782: warning: comparison is always false due to limited range of data type
ui/vnc-enc-tight.c: In function ‘tight_detect_smooth_image’:
ui/vnc-enc-tight.c:284: warning: comparison is always true due to limited range of data type
ui/vnc-enc-tight.c:297: warning: comparison is always true due to limited range of data type
ui/vnc-enc-tight.c: In function ‘tight_encode_indexed_rect16’:
ui/vnc-enc-tight.c:456: warning: comparison is always false due to limited range of data type
ui/vnc-enc-tight.c: In function ‘tight_encode_indexed_rect32’:
ui/vnc-enc-tight.c:457: warning: comparison is always false due to limited range of data type
ui/vnc-enc-tight.c: In function ‘send_sub_rect’:
ui/vnc-enc-tight.c:1458: warning: ‘ret’ may be used uninitialized in this function
In file included from /usr/src/qemu-kvm-0.13.0/kvm-all.c:1347:
/usr/src/qemu-kvm-0.13.0/qemu-kvm.c: In function ‘kvm_run’:
/usr/src/qemu-kvm-0.13.0/qemu-kvm.c:675: warning: implicit declaration of function ‘kvm_handle_internal_error’
kvm-all.o: In function `kvm_run':
/usr/src/qemu-kvm-0.13.0/qemu-kvm.c:675: undefined reference to `kvm_handle_internal_error'
collect2: ld returned 1 exit status
make[1]: *** [qemu-system-x86_64] Error 1
make: *** [subdir-x86_64-softmmu] Error 2

A search for the related function shows:

# grep kvm_handle_internal_error ./*
./kvm-all.c:static void kvm_handle_internal_error(CPUState *env, struct kvm_run *run)
./kvm-all.c:        kvm_handle_internal_error(env, run);
./qemu-kvm.c:        kvm_handle_internal_error(env, run);

Thanks.

Kindest regards,
Giam Teck Choon
Re: [PATCH iproute2] Support 'mode' parameter when creating macvtap device
On Friday 29 October 2010, Sridhar Samudrala wrote:

Add support for 'mode' parameter when creating a macvtap device. This allows a macvtap device to be created in bridge, private or the default vepa modes.

Signed-off-by: Sridhar Samudrala s...@us.ibm.com

Acked-by: Arnd Bergmann a...@arndb.de
Re: [PATCH] macvlan: Introduce 'passthru' mode to takeover the underlying device
On Friday 29 October 2010, Sridhar Samudrala wrote:

With the current default 'vepa' mode, a KVM guest using virtio with a macvtap backend has the following limitations.

- cannot change/add a mac address on the guest virtio-net

I believe this could be changed if there is a need, but I actually consider it one of the design points of macvlan that the guest is not able to change the mac address. With 802.1Qbg you rely on the switch being able to identify the guest by its MAC address, which the host kernel must ensure.

- cannot create a vlan device on the guest virtio-net

Why not? If this doesn't work, it's probably a bug! Why does the passthru mode enable it if it doesn't work already?

- cannot enable promiscuous mode on guest virtio-net

Could you elaborate why such a setup would be useful?

	Arnd
Re: [ovs-dev] Flow Control and Port Mirroring
[ CCed VHOST contacts ]

On Thu, Oct 28, 2010 at 01:22:02PM -0700, Jesse Gross wrote:

On Thu, Oct 28, 2010 at 4:54 AM, Simon Horman ho...@verge.net.au wrote:

My reasoning is that in the non-mirroring case the guest is limited by the external interface through which the packets eventually flow - that is 1Gbit/s. But in the mirrored case either there is no flow control, or the flow control is acting on the rate of dummy0, which is essentially infinite. Before investigating this any further I wanted to ask if this behaviour is intentional.

It's not intentional, but I can take a guess at what is happening. When we send the packet to a mirror, the skb is cloned but only the original skb is charged to the sender. If the original packet is delivered to localhost then it will be freed quickly and no longer accounted for, despite the fact that the real packet is still sitting in the transmit queue on the NIC. The UDP stack will then send the next packet, limited only by the speed of the CPU.

That would explain what I have observed.

Normally, this would be tracked by accounting for the memory charged to the socket. However, I know that Xen tracks whether the actual pages of memory have been freed, which should avoid this problem since the memory won't be released until the last packet has been sent. I don't know what KVM virtio does, but I'm guessing that it is similar to the former, since this problem is occurring.

I am also familiar with how Xen tracks pages, but less sure of the virtio side of things.

While it would be easy to charge the socket for all clones, I also want to be careful about over-accounting of the same data, leading to a very small effective socket buffer.

Agreed, we don't want to see over-charging.