Re: [PATCH V4 3/7] kvm tools: Add SPAPR PPC64 hcall rtascall structure
On Tue, Jan 31, 2012 at 8:34 AM, Matt Evans m...@ozlabs.org wrote: +#define DEBUG_SPAPR_HCALLS I suppose this shouldn't be defined by default? +#ifdef DEBUG_SPAPR_HCALLS +#define hcall_dprintf(fmt, ...) \ + do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0) +#else +#define hcall_dprintf(fmt, ...) \ + do { } while (0) +#endif I don't expect you to fix this but this sort of thing really cries out for userspace tracepoints. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm tools: Repair running on non ioeventfd-capable platforms
On Tue, Jan 31, 2012 at 8:30 AM, Matt Evans m...@ozlabs.org wrote: Commit d3923126a24212f1e746a84a575dadbd9f259418 added a bunch of nice error checking around ioevent__init() but the init may gracefully fail if ioevents simply aren't supported (PPC64 KVM). This commit adds a new return code for the init -- positive, but identifiable as 'not success 0'. Signed-off-by: Matt Evans m...@ozlabs.org Applied, thanks! -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V4 0/7] Add initial SPAPR PPC64 architecture support
On Tue, Jan 31, 2012 at 8:34 AM, Matt Evans m...@ozlabs.org wrote: This series adds support for a PPC64 platform, SPAPR, on top of the previous more general PPC64 CPU support. This platform is paravirtualised, with most of the MMU hypercalls being dealt with in the kernel. Userland needs to describe the environment to the machine with a device tree, cope with some hypercalls (and RTAS calls) for hvconsole, PCI config and IRQs, and emulate the interrupt controller (XICS) and PCI Host Bridge of the SPAPR-esque machine. I think the patches are good to go once we've sorted out the minor nits I pointed out. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4 v3] KVM: Introduce kvm_memory_slot::arch and move lpage_info into it
On 01/31/2012 03:17 AM, Takuya Yoshikawa wrote: Added s390 and ppc developers to Cc, (2012/01/30 14:35), Takuya Yoshikawa wrote: Some members of kvm_memory_slot are not used by every architecture. This patch is the first step to make this difference clear by introducing kvm_memory_slot::arch; lpage_info is moved into it. I am planning to move rmap stuff into arch next if this patch is accepted. Please let me know if you have some opinion about which members should be moved into this. Is there anything else? Everything else seems to be generic. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4 v3] KVM: Introduce kvm_memory_slot::arch and move lpage_info into it
(2012/01/31 18:18), Avi Kivity wrote: On 01/31/2012 03:17 AM, Takuya Yoshikawa wrote: Added s390 and ppc developers to Cc, (2012/01/30 14:35), Takuya Yoshikawa wrote: Some members of kvm_memory_slot are not used by every architecture. This patch is the first step to make this difference clear by introducing kvm_memory_slot::arch; lpage_info is moved into it. I am planning to move rmap stuff into arch next if this patch is accepted. Please let me know if you have some opinion about which members should be moved into this. Is there anything else? Everything else seems to be generic. About members, I agree. But dirty_bitmap allocation/destruction should be implemented by kvm_arch_create_*_dirty_bitmap(). Of course I may want to add another x86 specific member in the future. Takuya -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch
On Mon, Jan 30, 2012 at 04:26:06PM +0100, Kevin Wolf wrote: Would that be enough or would we have to avoid clearing it in all other places as well? Where would it be initialised if it's not enough? Maybe vmx_vcpu_reset(). Do all CPL changes go through set_cr0/segment/rflags/cpl? I guess yes, so initialising on reset and keeping it valid all the time should be possible indeed. CPL can be changed while guest is running. SVM saves it for us in cpl field. VMX does not, so we either will have to update cpl on each exit (cpl = cs 3) or somehow mark it not up-to-date and recalculate on access. Can VMX exit while cpl != cs 3 or can this happen only during emulation? If it can we cannot know real cpl after exit. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 42563] rhel5u5 guest panic when booting up
https://bugzilla.kernel.org/show_bug.cgi?id=42563 --- Comment #8 from Gleb g...@redhat.com 2012-01-31 09:39:50 --- (In reply to comment #7) (In reply to comment #6) (In reply to comment #5) (In reply to comment #4) Yes. With 'ignore_msrs=y' added when loading kvm module, rhel5.x guest can boot up now. As it will effect all the RHEL5.x (e.g.5.3, 5.3) guest, this issue still need to be fixed. Agree. Gleb, I think we need to restore the perf msrs behaviour to what it was before, when the PMU is not enabled. Yes, they were special cased for a reason. I found the bug got fixed in kvm upstream. Gleb, you fixed it by the following commit? Yes. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 42563] rhel5u5 guest panic when booting up
https://bugzilla.kernel.org/show_bug.cgi?id=42563 Jay Ren yongjie@intel.com changed: What|Removed |Added Status|NEW |RESOLVED Resolution||CODE_FIX -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 42563] rhel5u5 guest panic when booting up
https://bugzilla.kernel.org/show_bug.cgi?id=42563 Jay Ren yongjie@intel.com changed: What|Removed |Added Status|RESOLVED|VERIFIED -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 42563] rhel5u5 guest panic when booting up
https://bugzilla.kernel.org/show_bug.cgi?id=42563 Jay Ren yongjie@intel.com changed: What|Removed |Added Status|VERIFIED|CLOSED --- Comment #9 from Jay Ren yongjie@intel.com 2012-01-31 09:44:11 --- verified and closed it. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V4 2/7] kvm tools: Generate SPAPR PPC64 guest device tree
On 31 Jan 2012, at 18:59, Pekka Enberg wrote: On Tue, Jan 31, 2012 at 8:34 AM, Matt Evans m...@ozlabs.org wrote: +static struct cpu_info cpu_power7_info = { + POWER7, + power7_page_sizes_prop, sizeof(power7_page_sizes_prop), + power7_segment_sizes_prop, sizeof(power7_segment_sizes_prop), + 32, /* SLB size */ + 51200, /* TB frequency */ + 128,/* d-cache block size */ + 128,/* i-cache block size */ + CPUINFO_FLAG_DFP | CPUINFO_FLAG_VSX | CPUINFO_FLAG_VMX +}; Why don't you use C99 initializers here? ...just out of bad bad habit :) Initialisers'd be cleaner, I'll convert them. Thanks! Matt -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V4 3/7] kvm tools: Add SPAPR PPC64 hcall rtascall structure
On 31 Jan 2012, at 19:11, Pekka Enberg wrote: On Tue, Jan 31, 2012 at 8:34 AM, Matt Evans m...@ozlabs.org wrote: +#define DEBUG_SPAPR_HCALLS I suppose this shouldn't be defined by default? Well, I had a bit of a debate about it. I left it on as it is actually interesting whilst experimenting with different SPAPR guests. For instance (and this is really out there, there are other reasons why this won't work) if someone tries an AIX guest it'd be good to see lots of Unsupported! spew. (The 'unsupported' message could be a warning but that might also get annoying, OTOH.) I would prefer to leave it on for the moment. +#ifdef DEBUG_SPAPR_HCALLS +#define hcall_dprintf(fmt, ...) \ +do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0) +#else +#define hcall_dprintf(fmt, ...) \ +do { } while (0) +#endif I don't expect you to fix this but this sort of thing really cries out for userspace tracepoints. Ha! Yes... Cheers, Matt-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/3] KVM: x86 emulator: Allow PM/VM86 switch during task switch
On 01/31/2012 11:37 AM, Gleb Natapov wrote: On Mon, Jan 30, 2012 at 04:26:06PM +0100, Kevin Wolf wrote: Would that be enough or would we have to avoid clearing it in all other places as well? Where would it be initialised if it's not enough? Maybe vmx_vcpu_reset(). Do all CPL changes go through set_cr0/segment/rflags/cpl? I guess yes, so initialising on reset and keeping it valid all the time should be possible indeed. CPL can be changed while guest is running. SVM saves it for us in cpl field. VMX does not, so we either will have to update cpl on each exit (cpl = cs 3) or somehow mark it not up-to-date and recalculate on access. Can VMX exit while cpl != cs 3 or can this happen only during emulation? If it can we cannot know real cpl after exit. Perhaps it can, with unrestricted guests, but I think we don't allow those conditions (we trap cr0 writes). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Increase KVMSlots size
Hi All, In my development I came across the state where I need more than 32 kvmslots QEMU/kvm-all.c: cut struct KVMState { KVMSlot slots[32]; cut Followings are the solution I thought of: Solution 1: Increase the slot to 64 in both KVM and QEMU. I think this will also require to add a new ioctl call so that qemu can come to know on which KVM it is running (whether it is running on 32 slots KVM or 64 slots KVM). Other solutions: a) Decouple the qemu and kvm data structure. QEMU and KVM use some variable size data structure or lists etc. QEMU can keep on adding as many as it wants without knowing how much slots KVM supports. b) There are linear search in the KVM slots array in critical path of code of KVM. Does this make some sense using RB-tree, heapsort, balanced binary tree type of solutions? Looking forward for suggestions. Thanks -Bharat -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[VT-d reboot problems] Re: [PATCH] x86 / reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot
(added KVM folks to the Cc:) * Bastien ROUCARIES roucaries.bast...@gmail.com wrote: Ping^2 Bastien On Mon, Jan 23, 2012 at 11:28 AM, Bastien ROUCARIES roucaries.bast...@gmail.com wrote: On Mon, Jan 16, 2012 at 8:21 PM, H. Peter Anvin h...@zytor.com wrote: On 01/16/2012 03:27 AM, Bastien ROUCARIES wrote: Does it work if you disable VT-d in the firmware? If so, then adding it to the reboot method blacklist is the wrong fix - we need to figure out why VT-d interferes with Dell's reboot code. Yes it work This is particularly so since we are very close to having a full Dell model catalogue in the kernel... Ping ? Do you need some dump ? testing ? So disabling VT-d in the BIOS fixes the reboot problem and Matthew Garrett suggests we should figure out why and how VT-d on this Dell box interferes with the reboot method. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: VMXON region vs VMCS region?
On 01/31/2012 05:35 AM, Zhi Yong Wu wrote: HI, Can anyone let me know know the difference between VMXON region and VMCS region? relationship? There is no relationship between them: VMXON region is created per logical processor and used by it for VMX ops. VMCS region is created for each guest vcpu and used both by the hypervisor and the processor. It will be appreciated if you can make some comments. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
On 01/28/2012 04:21 PM, Alex Williamson wrote: To date we've only exposed BARs as 32bit even if the device physically supports 64bit BARs. Enable 64bit BARs to be exposed as such in the guest, which may free up MMIO below 4G should the guest choose to use it. This adds a new mem64= option to pci-assign, with the default being off for testing and enablement. The goal is to eventually make this enabled by default. Seems fine, but do we really need the option? If it doesn't work we should treat it as an ordinary but and fix it. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4 v3] KVM: Introduce kvm_memory_slot::arch and move lpage_info into it
Christian Borntraeger borntrae...@de.ibm.com wrote: Some members of kvm_memory_slot are not used by every architecture. This patch is the first step to make this difference clear by introducing kvm_memory_slot::arch; lpage_info is moved into it. Patch series seems to work on s390. Christian Thanks! Takuya -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
On 2012-01-31 13:40, Avi Kivity wrote: On 01/28/2012 04:21 PM, Alex Williamson wrote: To date we've only exposed BARs as 32bit even if the device physically supports 64bit BARs. Enable 64bit BARs to be exposed as such in the guest, which may free up MMIO below 4G should the guest choose to use it. This adds a new mem64= option to pci-assign, with the default being off for testing and enablement. The goal is to eventually make this enabled by default. Seems fine, but do we really need the option? If it doesn't work we should treat it as an ordinary but and fix it. So far it's against the architecture of the emulated system: our current chipset predates 64 bit PCI. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/9] pci-assign: Update MSI-X MMIO to Memory API
On 01/28/2012 04:21 PM, Alex Williamson wrote: Stop using compatibility mode and at the same time fix available access sizes. The PCI spec indicates that the MSI-X table may only be accessed as DWORD or QWORD. static const MemoryRegionOps msix_mmio_ops = { -.old_mmio = { -.read = { msix_mmio_readb, msix_mmio_readw, msix_mmio_readl, }, -.write = { msix_mmio_writeb, msix_mmio_writew, msix_mmio_writel, }, -}, +.read = msix_mmio_read, +.write = msix_mmio_write, .endianness = DEVICE_NATIVE_ENDIAN, +.impl = { +.min_access_size = 4, +.max_access_size = 8, +}, }; .impl.min_access_size = 4 means the core will convert 1-byte I/O to 4-byte I/O (using rmw if needed). That's not what we want, I think you can leave it at 1 and explicitly ignore small accesses in the callbacks. Have you tested 8-byte I/O? This is the first user. Don't you need to set .valid.max_access_size? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 9/9] pci-assign: Update MSI-X config based on table writes
On 01/28/2012 04:22 PM, Alex Williamson wrote: We currently only update MSI-X configuration with the enable bit in PCI config space is toggled. This is pretty sketchy and part of the reason for the odd checks for vector data is to guess whether the guest is going to use the vector so we can pre-enable it. Two key things missed by doing this is that we don't catch vector changes after enabling (ex. setting smp_affinity on an irq) and we don't support guests that don't touch the vector table prior to enabling the MSI-X capability (ex. freebsd). This patch fixes both of these problems. I'm not able to get good behavior attempting to disable masked vectors, so we don't actually do anything on mask yet. +static inline bool msix_masked(MSIXTableEntry *entry) +{ +return !!(entry-ctrl 0x1); +} !! !needed (nor 'inline'). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
On 01/31/2012 02:45 PM, Jan Kiszka wrote: On 2012-01-31 13:40, Avi Kivity wrote: On 01/28/2012 04:21 PM, Alex Williamson wrote: To date we've only exposed BARs as 32bit even if the device physically supports 64bit BARs. Enable 64bit BARs to be exposed as such in the guest, which may free up MMIO below 4G should the guest choose to use it. This adds a new mem64= option to pci-assign, with the default being off for testing and enablement. The goal is to eventually make this enabled by default. Seems fine, but do we really need the option? If it doesn't work we should treat it as an ordinary but and fix it. So far it's against the architecture of the emulated system: our current chipset predates 64 bit PCI. Then it should be enabled/disabled at the chipset level. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support
On 01/30/2012 03:44 PM, Alex Williamson wrote: On Mon, 2012-01-30 at 11:11 +0100, Jan Kiszka wrote: On 2012-01-28 15:21, Alex Williamson wrote: Patch 1 2 here are independent of the rest, but I include them here to avoid conflicts. The first patch enables exposing MMIO BARs as their native width to the guest. I added a config option for this with the default to use the existing behavior as I suspect we may have some latent issues there. Patch 2 is just some trivial debug build warning fixes. The rest of the patches work on improving MSI-X table support. Particularly, vectors can now be updated by the guest after MSI-X is enabled to support things like irqbalance for SMP affinity tuning. We also now update MSI-X configuration as new vectors are unmasked, which enables assignment of MSI-X devices on FreeBSD. I was able to assign and use an 82576 (PF VF) on a FreeBSD 9.0 guest with this series. Hopefully Shashidhar can report whether this improves the behavior he as seeing with an 82599. I wasn't able to get masking to work reliably, so I left that as is for now. Perhaps someone has suggestions on getting that to work. Thanks, Unless it's urging, let's focus on getting this implemented via the MSI/MSI-X core, not widely duplicated in device-assignment. I disagree. This isn't making the code duplication worse and it solves at least two use cases that are currently broken. This won't make it any more difficult to eventually move to msix.c, if it does, the core needs more work. Thanks, I agree (with Alex), but maybe I missed something? Patch 9 does call kvm directly instead of going through msi services, but I don't think this should hold the patches. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] pci-assign: 64bit MMIO + better MSI-X table support
On 2012-01-31 13:52, Avi Kivity wrote: On 01/30/2012 03:44 PM, Alex Williamson wrote: On Mon, 2012-01-30 at 11:11 +0100, Jan Kiszka wrote: On 2012-01-28 15:21, Alex Williamson wrote: Patch 1 2 here are independent of the rest, but I include them here to avoid conflicts. The first patch enables exposing MMIO BARs as their native width to the guest. I added a config option for this with the default to use the existing behavior as I suspect we may have some latent issues there. Patch 2 is just some trivial debug build warning fixes. The rest of the patches work on improving MSI-X table support. Particularly, vectors can now be updated by the guest after MSI-X is enabled to support things like irqbalance for SMP affinity tuning. We also now update MSI-X configuration as new vectors are unmasked, which enables assignment of MSI-X devices on FreeBSD. I was able to assign and use an 82576 (PF VF) on a FreeBSD 9.0 guest with this series. Hopefully Shashidhar can report whether this improves the behavior he as seeing with an 82599. I wasn't able to get masking to work reliably, so I left that as is for now. Perhaps someone has suggestions on getting that to work. Thanks, Unless it's urging, let's focus on getting this implemented via the MSI/MSI-X core, not widely duplicated in device-assignment. I disagree. This isn't making the code duplication worse and it solves at least two use cases that are currently broken. This won't make it any more difficult to eventually move to msix.c, if it does, the core needs more work. Thanks, I agree (with Alex), but maybe I missed something? Patch 9 does call kvm directly instead of going through msi services, but I don't think this should hold the patches. If this solves the issues, ok. That all needs quite some refactoring anyway (including the kvm core services). Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
On 2012-01-31 13:51, Avi Kivity wrote: On 01/31/2012 02:45 PM, Jan Kiszka wrote: On 2012-01-31 13:40, Avi Kivity wrote: On 01/28/2012 04:21 PM, Alex Williamson wrote: To date we've only exposed BARs as 32bit even if the device physically supports 64bit BARs. Enable 64bit BARs to be exposed as such in the guest, which may free up MMIO below 4G should the guest choose to use it. This adds a new mem64= option to pci-assign, with the default being off for testing and enablement. The goal is to eventually make this enabled by default. Seems fine, but do we really need the option? If it doesn't work we should treat it as an ordinary but and fix it. So far it's against the architecture of the emulated system: our current chipset predates 64 bit PCI. Then it should be enabled/disabled at the chipset level. Makes me wonder if we already do some filtering if the device supports 64 bit but the next bridge does not. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
On 01/31/2012 02:57 PM, Jan Kiszka wrote: Seems fine, but do we really need the option? If it doesn't work we should treat it as an ordinary but and fix it. So far it's against the architecture of the emulated system: our current chipset predates 64 bit PCI. Then it should be enabled/disabled at the chipset level. Makes me wonder if we already do some filtering if the device supports 64 bit but the next bridge does not. Our 440fx does support 64-bit bars, so the question doesn't arise for x86. Instead we violate the spec. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for tuesday 31
Am 31.01.2012 00:53, schrieb Anthony Liguori: On 01/30/2012 05:41 PM, Andreas Färber wrote: Am 30.01.2012 19:55, schrieb Juan Quintela: Please send in any agenda items you are interested in covering. VMState: Anthony specifically said that VMState were not affected by QOM and that patches should not be deferred until the merge. Yet there's no review and/or decision-making for a month now. Ping^2 for AHCI+SDHC. Do you have pointers (to pending VMState patches)? http://patchwork.ozlabs.org/patch/137732/ (PATCH v4) It's basically about how to deal with variable-sized arrays. (Alex mentioned it on one call around November.) I found ways to deal with subsets of arrays embedded within the struct and variable-sized list of pointers to structs but no solution for a malloc()'ed array of structs. Maybe I'm just too stupid to see. Anyway, no one commented since Xmas. Igor posted (and refined for v2) a patch with a callback-based approach that I find promising. From my view, unofficially Juan is the VMState guy, he's been cc'ed. Are we lacking an official maintainer that cares? Or is Juan the official, undocumented maintainer but simply busy? SUSE's interest is making AHCI migratable, and my VMState workaround for that is simply ugly: http://patchwork.ozlabs.org/patch/133066/ (RFC) Therefore I'm waiting for some resolution. Regards, Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
On 2012-01-31 14:10, Avi Kivity wrote: On 01/31/2012 02:57 PM, Jan Kiszka wrote: Seems fine, but do we really need the option? If it doesn't work we should treat it as an ordinary but and fix it. So far it's against the architecture of the emulated system: our current chipset predates 64 bit PCI. Then it should be enabled/disabled at the chipset level. Makes me wonder if we already do some filtering if the device supports 64 bit but the next bridge does not. Our 440fx does support 64-bit bars, so the question doesn't arise for x86. Instead we violate the spec. If you mean by our the 440fx-qemu, not the real 440fx. That one does not even support 1GB RAM. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
On 01/31/2012 03:21 PM, Jan Kiszka wrote: On 2012-01-31 14:10, Avi Kivity wrote: On 01/31/2012 02:57 PM, Jan Kiszka wrote: Seems fine, but do we really need the option? If it doesn't work we should treat it as an ordinary but and fix it. So far it's against the architecture of the emulated system: our current chipset predates 64 bit PCI. Then it should be enabled/disabled at the chipset level. Makes me wonder if we already do some filtering if the device supports 64 bit but the next bridge does not. Our 440fx does support 64-bit bars, so the question doesn't arise for x86. Instead we violate the spec. If you mean by our the 440fx-qemu, not the real 440fx. That one does not even support 1GB RAM. Yes, that's what I meant. It also supports pci hotplug, more slots, cpu hotplug, etc. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm tools: Repair running on non ioeventfd-capable platforms
On Tue, 2012-01-31 at 17:30 +1100, Matt Evans wrote: Commit d3923126a24212f1e746a84a575dadbd9f259418 added a bunch of nice error checking around ioevent__init() but the init may gracefully fail if ioevents simply aren't supported (PPC64 KVM). This commit adds a new return code for the init -- positive, but identifiable as 'not success 0'. Signed-off-by: Matt Evans m...@ozlabs.org --- Sasha, you screwed PPC64! And it took me a month to notice hahaha! I only broke it two days ago :) -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] srcu: Implement call_srcu()
On 01/31/2012 03:32 PM, Peter Zijlstra wrote: Subject: srcu: Implement call_srcu() From: Peter Zijlstra a.p.zijls...@chello.nl Date: Mon Jan 30 23:20:49 CET 2012 Implement call_srcu() by using a state machine driven by call_rcu_sched() and timer callbacks. The state machine is a direct derivation of the existing synchronize_srcu() code and replaces synchronize_sched() calls with a call_rcu_sched() callback and the schedule_timeout() calls with simple timer callbacks. It then re-implements synchronize_srcu() using a completion where we send the complete through call_srcu(). It completely wrecks synchronize_srcu_expedited() which is only used by KVM. 3 of the 5 2 of the 5 use cases look like they really want to use call_srcu() instead, the remaining 2 I don't know but hope they can, which would let us remove it. They really need to return quickly to userspace, and they really need to perform some operation between rcu_assign_pointer() and returning, so no. Compile tested only!! :-) How much did synchronize_srcu_expedited() regress? Presumably your compiler didn't tell you that. Can we get it back to speed by scheduling a work function on all cpus? wouldn't that force a quiescent state and allow call_srcu() to fire? In kvm's use case synchronize_srcu_expedited() is usually called when no thread is in a critical section, so we don't have to wait for anything except the srcu machinery. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][PATCH] srcu: Implement call_srcu()
On Tue, 2012-01-31 at 15:47 +0200, Avi Kivity wrote: They really need to return quickly to userspace, and they really need to perform some operation between rcu_assign_pointer() and returning, so no. Bugger :/ Compile tested only!! :-) How much did synchronize_srcu_expedited() regress? Presumably your compiler didn't tell you that. Nope, quite a lot I figure. Can we get it back to speed by scheduling a work function on all cpus? wouldn't that force a quiescent state and allow call_srcu() to fire? In kvm's use case synchronize_srcu_expedited() is usually called when no thread is in a critical section, so we don't have to wait for anything except the srcu machinery. OK, I'll try and come up with means of making it go fast again ;-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 01/30/2012 05:41 PM, Andreas Färber wrote: Am 30.01.2012 19:55, schrieb Juan Quintela: Please send in any agenda items you are interested in covering. QOM roadmap update: * Series 3/4 is on the list. - Please officially designate a merge date (Friday?). - To make review sensible, I ask for a hard device freeze until merged. I.e., no new devices and no conflicting changes to DeviceInfo. I put together a few slides to help this discussion: http://www.codemonkey.ws/files/qom-overview.pdf Regards, Anthony Liguori * Further roadmap: i440FX, CPU, MemoryRegion (when, who, where, how) VMState: Anthony specifically said that VMState were not affected by QOM and that patches should not be deferred until the merge. Yet there's no review and/or decision-making for a month now. Ping^2 for AHCI+SDHC. Mascot contest: Any update? Were there too few entries? Has it been forgotten? If the contest is still open then that should be stated on the list. Andreas -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 01/31/2012 05:15 PM, Andreas Färber wrote: Am 31.01.2012 00:53, schrieb Anthony Liguori: On 01/30/2012 05:41 PM, Andreas Färber wrote: Am 30.01.2012 19:55, schrieb Juan Quintela: Please send in any agenda items you are interested in covering. VMState: Anthony specifically said that VMState were not affected by QOM and that patches should not be deferred until the merge. Yet there's no review and/or decision-making for a month now. Ping^2 for AHCI+SDHC. Do you have pointers (to pending VMState patches)? http://patchwork.ozlabs.org/patch/137732/ (PATCH v4) It's basically about how to deal with variable-sized arrays. (Alex mentioned it on one call around November.) I found ways to deal with subsets of arrays embedded within the struct and variable-sized list of pointers to structs but no solution for a malloc()'ed array of structs. Maybe I'm just too stupid to see. Anyway, no one commented since Xmas. Igor posted (and refined for v2) a patch with a callback-based approach that I find promising. From my view, unofficially Juan is the VMState guy, he's been cc'ed. Are we lacking an official maintainer that cares? Or is Juan the official, undocumented maintainer but simply busy? SUSE's interest is making AHCI migratable, and my VMState workaround for that is simply ugly: http://patchwork.ozlabs.org/patch/133066/ (RFC) If I'm not mistaken, if you change AHCIState's .ports type to uint32_t you can use existing VMSTATE_BUFFER_MULTIPLY macro like this: VMSTATE_BUFFER_MULTIPLY(dev, AHCIState, 0, NULL, 0, ports, sizeof(AHCIDevice)) VMSTATE_BUFFER_MULTIPLY currently lacks VMS_POINTER flag and therefore doesn't make any use of _start field (you don't need it anyway) Nevertheless, VMSTATE_BUFFER_MULTIPLY is just a partial solution to a bigger set of possible problems. SD card's vmstate implementation requires shift operation, SDHC gets size from switch {} statement, something else later may require division or addition e.t.c., get_bufsize callback will cover all possible cases. -- Mitsyanko Igor ASWG, Moscow RD center, Samsung Electronics email: i.mitsya...@samsung.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Kernel bug?
Hi, We are setting up a ubuntu / openstack / KVM based private cloud. Of the 3 servers involved in this configuration 1 has been crashing frequently over the last month. Since it is the only server responsible for the virtualization (so the only one running kvm virtual machines) my guess was to find some help with you guys. We are running the same install on 2 other machines (not running virtual machines!) who have been completely stable over the last three months, so a hardware issue might also be at play here? I have collected some stack traces over the last month and was hoping one of you might point us in the right direction as to what is going wrong on our system. We are running a default ubuntu 11.10 server install. The kernel has been upgraded from the 3.0.0-12 release to the 3.0.0-14 release over the last month, but we keep getting these crashes. We are now running the 3.2.2-030202-generic hoping that this is a kernel problem, and it is fixed in this latest version. If any additional information is required or I should file an official bug with this information, please let me know... kind regards Bram Jan 27 13:41:28 cmggcn01 kernel: [871350.761867] general protection fault: [#2] SMP Jan 27 13:41:28 cmggcn01 kernel: [871350.790117] CPU 14 Jan 27 13:41:28 cmggcn01 kernel: [871350.790387] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ebt_arp ebt_ip 8021q garp ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm nbd vesafb ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi psmouse dcdbas dm_multipath serio_raw joydev ghes hed acpi_power_meter bonding lp parport i7core_edac edac_core ses enclosure usbhid hid megaraid_sas bnx2 Jan 27 13:41:28 cmggcn01 kernel: [871351.005151] Jan 27 13:41:28 cmggcn01 kernel: [871351.036187] Pid: 90, comm: kswapd0 Tainted: G D 3.0.0-14-server #23-Ubuntu Dell Inc. PowerEdge R710/0MD99X Jan 27 13:41:28 cmggcn01 kernel: [871351.072809] RIP: 0010:[a01a5890] [a01a5890] kvm_unmap_rmapp+0x20/0x60 [kvm] Jan 27 13:41:28 cmggcn01 kernel: [871351.105190] RSP: 0018:8817f3e27a60 EFLAGS: 00010202 Jan 27 13:41:28 cmggcn01 kernel: [871351.141329] RAX: 8817f5d067f8 RBX: c9001fd41ff8 RCX: a01a58d0 Jan 27 13:41:28 cmggcn01 kernel: [871351.179076] RDX: RSI: RDI: 8817f5d067f8 Jan 27 13:41:28 cmggcn01 kernel: [871351.212086] RBP: 8817f3e27a80 R08: 8817f315b3e0 R09: 0100 Jan 27 13:41:28 cmggcn01 kernel: [871351.245788] R10: 000e R11: 0002 R12: 8817f2f0c000 Jan 27 13:41:28 cmggcn01 kernel: [871351.277514] R13: R14: 880be235e000 R15: 000d3cff Jan 27 13:41:28 cmggcn01 kernel: [871351.308421] FS: () GS:88183fce() knlGS: Jan 27 13:41:28 cmggcn01 kernel: [871351.339685] CS: 0010 DS: ES: CR0: 8005003b Jan 27 13:41:28 cmggcn01 kernel: [871351.370089] CR2: 7f8836442000 CR3: 01c03000 CR4: 26e0 Jan 27 13:41:28 cmggcn01 kernel: [871351.399771] DR0: DR1: DR2: Jan 27 13:41:28 cmggcn01 kernel: [871351.428208] DR3: DR6: 0ff0 DR7: 0400 Jan 27 13:41:28 cmggcn01 kernel: [871351.456153] Process kswapd0 (pid: 90, threadinfo 8817f3e26000, task 8817f63ac560) Jan 27 13:41:28 cmggcn01 kernel: [871351.484943] Stack: Jan 27 13:41:28 cmggcn01 kernel: [871351.512984] c9001fd41ff8 0001 7f834a87e000 Jan 27 13:41:28 cmggcn01 kernel: [871351.542025] 8817f3e27aa0 a01a5945 880be235e060 0001 Jan 27 13:41:28 cmggcn01 kernel: [871351.571050] 8817f3e27b10 a01a1dd9 8817f3e27ae0 a01a58d0 Jan 27 13:41:28 cmggcn01 kernel: [871351.600455] Call Trace: Jan 27 13:41:28 cmggcn01 kernel: [871351.628903] [a01a5945] kvm_age_rmapp+0x75/0x90 [kvm] Jan 27 13:41:28 cmggcn01 kernel: [871351.659242] [a01a1dd9] kvm_handle_hva+0x99/0x180 [kvm] Jan 27 13:41:28 cmggcn01 kernel: [871351.687386] [a01a58d0] ? kvm_unmap_rmapp+0x60/0x60 [kvm] Jan 27 13:41:28 cmggcn01 kernel: [871351.716053] [a01a8af7] kvm_age_hva+0x17/0x20 [kvm] Jan 27 13:41:28 cmggcn01 kernel: [871351.746652] [a018a4dd] kvm_mmu_notifier_clear_flush_young+0x4d/0x90 [kvm] Jan 27 13:41:28 cmggcn01 kernel: [871351.774490] [8114d7b8] __mmu_notifier_clear_flush_young+0x48/0x60 Jan 27 13:41:28 cmggcn01 kernel: [871351.801948] [81138f1b] page_referenced_one+0x18b/0x1f0 Jan 27 13:41:28 cmggcn01 kernel: [871351.827654]
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 01/31/2012 03:59 PM, Anthony Liguori wrote: On 01/30/2012 05:41 PM, Andreas Färber wrote: Am 30.01.2012 19:55, schrieb Juan Quintela: Please send in any agenda items you are interested in covering. QOM roadmap update: * Series 3/4 is on the list. - Please officially designate a merge date (Friday?). - To make review sensible, I ask for a hard device freeze until merged. I.e., no new devices and no conflicting changes to DeviceInfo. I put together a few slides to help this discussion: http://www.codemonkey.ws/files/qom-overview.pdf That was helpful, thanks. Can you clarify - Types and their properties will be ABI compatible - Types and properties will not be backwards compatible – We can re-examine this as the device model matures and stabilizes the first two seem very similar, except for the not. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 01/31/2012 07:15 AM, Andreas Färber wrote: Am 31.01.2012 00:53, schrieb Anthony Liguori: On 01/30/2012 05:41 PM, Andreas Färber wrote: Am 30.01.2012 19:55, schrieb Juan Quintela: Please send in any agenda items you are interested in covering. VMState: Anthony specifically said that VMState were not affected by QOM and that patches should not be deferred until the merge. Yet there's no review and/or decision-making for a month now. Ping^2 for AHCI+SDHC. Do you have pointers (to pending VMState patches)? http://patchwork.ozlabs.org/patch/137732/ (PATCH v4) It's basically about how to deal with variable-sized arrays. (Alex mentioned it on one call around November.) I found ways to deal with subsets of arrays embedded within the struct and variable-sized list of pointers to structs but no solution for a malloc()'ed array of structs. Maybe I'm just too stupid to see. Anyway, no one commented since Xmas. /me puts on his flame proof suit Don't use VMState. Just open code a save/restore function. VMState is too limited in how it handles complex data structures. I really believe the only long term solution we're going to get to here is something that uses a builder interface (like Visitors). Regards, Anthony Liguori Igor posted (and refined for v2) a patch with a callback-based approach that I find promising. From my view, unofficially Juan is the VMState guy, he's been cc'ed. Are we lacking an official maintainer that cares? Or is Juan the official, undocumented maintainer but simply busy? SUSE's interest is making AHCI migratable, and my VMState workaround for that is simply ugly: http://patchwork.ozlabs.org/patch/133066/ (RFC) Therefore I'm waiting for some resolution. Regards, Andreas -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 01/31/2012 08:09 AM, Avi Kivity wrote: On 01/31/2012 03:59 PM, Anthony Liguori wrote: On 01/30/2012 05:41 PM, Andreas Färber wrote: Am 30.01.2012 19:55, schrieb Juan Quintela: Please send in any agenda items you are interested in covering. QOM roadmap update: * Series 3/4 is on the list. - Please officially designate a merge date (Friday?). - To make review sensible, I ask for a hard device freeze until merged. I.e., no new devices and no conflicting changes to DeviceInfo. I put together a few slides to help this discussion: http://www.codemonkey.ws/files/qom-overview.pdf That was helpful, thanks. Can you clarify - Types and their properties will be ABI compatible - Types and properties will not be backwards compatible – We can re-examine this as the device model matures and stabilizes the first two seem very similar, except for the not. I guess I really mean QMP compatible. The expectation is that well written management code does: if (check_for_command(qom-list)) { run_command(qom-list, /); } ABI compatible means that if qom-list is there, it's semantics never change. Backwards compatibility would mean that once qom-list is introduced, it never goes away. In terms of QOM types, we won't guarantee that a Type will stick around forever, but we will guarantee if it's there, it'll behave in a certain way. When the device model stabilizes over time, we can revisit this, but in the 1.x series, I'm sure we're going to go through a few significant refactorings that change types significantly. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Kernel bug?
On 01/31/2012 04:06 PM, Bram De Wilde wrote: Hi, We are setting up a ubuntu / openstack / KVM based private cloud. Of the 3 servers involved in this configuration 1 has been crashing frequently over the last month. Since it is the only server responsible for the virtualization (so the only one running kvm virtual machines) my guess was to find some help with you guys. We are running the same install on 2 other machines (not running virtual machines!) who have been completely stable over the last three months, so a hardware issue might also be at play here? I have collected some stack traces over the last month and was hoping one of you might point us in the right direction as to what is going wrong on our system. We are running a default ubuntu 11.10 server install. The kernel has been upgraded from the 3.0.0-12 release to the 3.0.0-14 release over the last month, but we keep getting these crashes. We are now running the 3.2.2-030202-generic hoping that this is a kernel problem, and it is fixed in this latest version. If any additional information is required or I should file an official bug with this information, please let me know... File a bug please. Meanwhile: Jan 27 13:41:28 cmggcn01 kernel: [871350.761867] general protection fault: [#2] SMP Jan 27 13:41:28 cmggcn01 kernel: [871350.790117] CPU 14 Jan 27 13:41:28 cmggcn01 kernel: [871350.790387] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ebt_arp ebt_ip 8021q garp ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm nbd vesafb ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi psmouse dcdbas dm_multipath serio_raw joydev ghes hed acpi_power_meter bonding lp parport i7core_edac edac_core ses enclosure usbhid hid megaraid_sas bnx2 Jan 27 13:41:28 cmggcn01 kernel: [871351.005151] Jan 27 13:41:28 cmggcn01 kernel: [871351.036187] Pid: 90, comm: kswapd0 Tainted: G D 3.0.0-14-server #23-Ubuntu Dell Inc. PowerEdge R710/0MD99X Jan 27 13:41:28 cmggcn01 kernel: [871351.072809] RIP: 0010:[a01a5890] [a01a5890] kvm_unmap_rmapp+0x20/0x60 [kvm] Jan 27 13:41:28 cmggcn01 kernel: [871351.105190] RSP: 0018:8817f3e27a60 EFLAGS: 00010202 Jan 27 13:41:28 cmggcn01 kernel: [871351.141329] RAX: 8817f5d067f8 RBX: c9001fd41ff8 RCX: a01a58d0 Jan 27 13:41:28 cmggcn01 kernel: [871351.179076] RDX: RSI: RDI: 8817f5d067f8 Jan 27 13:41:28 cmggcn01 kernel: [871351.212086] RBP: 8817f3e27a80 R08: 8817f315b3e0 R09: 0100 Jan 27 13:41:28 cmggcn01 kernel: [871351.245788] R10: 000e R11: 0002 R12: 8817f2f0c000 Jan 27 13:41:28 cmggcn01 kernel: [871351.277514] R13: R14: 880be235e000 R15: 000d3cff Jan 27 13:41:28 cmggcn01 kernel: [871351.308421] FS: () GS:88183fce() knlGS: Jan 27 13:41:28 cmggcn01 kernel: [871351.339685] CS: 0010 DS: ES: CR0: 8005003b Jan 27 13:41:28 cmggcn01 kernel: [871351.370089] CR2: 7f8836442000 CR3: 01c03000 CR4: 26e0 Jan 27 13:41:28 cmggcn01 kernel: [871351.399771] DR0: DR1: DR2: Jan 27 13:41:28 cmggcn01 kernel: [871351.428208] DR3: DR6: 0ff0 DR7: 0400 Jan 27 13:41:28 cmggcn01 kernel: [871351.456153] Process kswapd0 (pid: 90, threadinfo 8817f3e26000, task 8817f63ac560) Jan 27 13:41:28 cmggcn01 kernel: [871351.484943] Stack: Jan 27 13:41:28 cmggcn01 kernel: [871351.512984] c9001fd41ff8 0001 7f834a87e000 Jan 27 13:41:28 cmggcn01 kernel: [871351.542025] 8817f3e27aa0 a01a5945 880be235e060 0001 Jan 27 13:41:28 cmggcn01 kernel: [871351.571050] 8817f3e27b10 a01a1dd9 8817f3e27ae0 a01a58d0 snip Jan 27 13:41:28 cmggcn01 kernel: [871352.080582] Code: e7 d0 e8 e0 66 90 e9 a2 fe ff ff 55 48 89 e5 41 55 41 54 53 48 83 ec 08 66 66 66 66 90 45 31 ed 49 89 fc 48 89 f3 eb 20 0f 1f 00 f6 00 01 74 35 48 8b 15 74 7a 02 00 48 89 c6 4c 89 e7 41 bd 01 0:e8 e0 66 90 e9 callq 0xe99066e5 5:a2 fe ff ff 55 48 89 mov%al,0x41e5894855fe c:e5 41 e:55 push %rbp f:41 54push %r12 11:53 push %rbx 12:48 83 ec 08 sub$0x8,%rsp 16:66 66 66 66 90 data32 data32 data32 xchg %ax,%ax 1b:45 31 ed xor%r13d,%r13d 1e:49 89 fc
Re: [Android-virt] [PATCH v5 00/13] KVM/ARM Implementation
On 01/30/2012 11:46 PM, Peter Maydell wrote: On 20 January 2012 02:59, Christoffer Dall c.d...@virtualopensystems.com wrote: There's a new list of issues available at: https://github.com/virtualopensystems/linux-kvm-arm/issues Thanks for putting this up. Here's a couple more for you :-) * Support guest kernels configured for LPAE At the moment (well, if you have Marc's 3.3rc1 tree with the A15 L2 cache control register bodge in it) you can boot an A15 kernel configured without LPAE as a KVM guest, but an LPAE kernel with LPAE enabled will not boot. This probably mostly requires supporting the 64 bit wide cp15 registers that LPAE implies. * handle QEMU being ^C'd Currently if you ^C the qemu process then instead of a nice clean exit things go a bit pear shaped, with apparently part of the qemu/kvm combo having quit and the rest not and lots of error messages being emitted. (I admit to not having looked at this one enough to be certain tht it's a kernel side thing rather than a QEMU one; I'm just guessing.) -- PMM I took the initiative to add them to that list, thanks. -Antonios -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-Autotest][PATCH 1/5] virt: Check illegal instruction code
Add a check on the base VM class for illegal instruction code executed by the VM. That check is performed on the serial console output. Signed-off-by: Jiří Župka jzu...@redhat.com --- client/virt/virt_vm.py | 27 +++ 1 files changed, 27 insertions(+), 0 deletions(-) diff --git a/client/virt/virt_vm.py b/client/virt/virt_vm.py index 32593c1..d4fb47d 100644 --- a/client/virt/virt_vm.py +++ b/client/virt/virt_vm.py @@ -133,6 +133,18 @@ class VMDeadKernelCrashError(VMError): return (VM is dead due to a kernel crash:\n%s % self.kernel_crash) +class VMInvalidInstructionCode(VMError): +def __init__(self, invalid_code): +VMError.__init__(self, invalid_code) +self.invalid_code = invalid_code + +def __str__(self): +error = +for invalid_code in self.invalid_code: +error += %s % (invalid_code) +return (Invalid instruction was executed on VM:\n%s % error) + + class VMAddressError(VMError): pass @@ -656,6 +668,21 @@ class BaseVM(object): raise VMDeadKernelCrashError(match.group(0)) +def verify_illegal_instructonn(self): + +Find illegal instruction code on VM serial console output. + +@raise: VMInvalidInstructionCode, in case a wrong instruction code. + +if self.serial_console is not None: +data = self.serial_console.get_output() +match = re.findall(r.*trap invalid opcode.*\n, data, + re.MULTILINE) + +if match: +raise VMInvalidInstructionCode(match) + + def get_params(self): Return the VM's params dict. Most modified params take effect only -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-Autotest][PATCH 2/5] virt: Add aliases to class Flag.
The patch solve problem with doubled name of cpuflags sse4_1, sse4.1 etc. in cpuflag test. Signed-off-by: Jiří Župka jzu...@redhat.com --- client/virt/virt_utils.py | 14 -- 1 files changed, 12 insertions(+), 2 deletions(-) diff --git a/client/virt/virt_utils.py b/client/virt/virt_utils.py index a367ffe..20ed4ba 100644 --- a/client/virt/virt_utils.py +++ b/client/virt/virt_utils.py @@ -1294,8 +1294,12 @@ class Flag(str): Class for easy merge cpuflags. -def __init__(self, *args, **kwargs): -super(Flag, self).__init__( *args, **kwargs) +aliases = {} + +def __new__(cls, flag): +if flag in Flag.aliases: +flag = Flag.aliases[flag] +return str.__new__(cls, flag) def __eq__(self, other): s = set(self.split(|)) @@ -1324,6 +1328,12 @@ kvm_map_flags_to_test = { } +kvm_map_flags_aliases = { +'sse4.1' :'sse4_1', +'sse4.2' :'sse4_2', +} + + def kvm_flags_to_stresstests(flags): Covert [cpu flags] to [tests] -- 1.7.7.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[KVM-Autotest][PATCH 3/5] kvm test: Introduce multi_host.srv
We had migration_control.srv which worked with old version of kvm config files (tests.cfg, subtest.cfg, etc..). Because config files were changed this control code stopped working. This patch repairs this problem and renames the file to the more correct and generic name multi-host.srv. Also, added new configuration file tests-shared.cfg which contains shared data among single host and multi host tests. The changes are useful for multihost tests, since we can use a lot of configuration from regular single host tests. Signed-off-by: Jiří Župka jzu...@redhat.com --- client/tests/kvm/migration_control.srv | 123 -- client/tests/kvm/multi-host-tests.cfg.sample | 43 + client/tests/kvm/multi_host.srv | 106 ++ client/tests/kvm/tests-shared.cfg.sample | 48 ++ client/tests/kvm/tests.cfg.sample| 43 +- 5 files changed, 198 insertions(+), 165 deletions(-) delete mode 100644 client/tests/kvm/migration_control.srv create mode 100644 client/tests/kvm/multi-host-tests.cfg.sample create mode 100644 client/tests/kvm/multi_host.srv create mode 100644 client/tests/kvm/tests-shared.cfg.sample diff --git a/client/tests/kvm/migration_control.srv b/client/tests/kvm/migration_control.srv deleted file mode 100644 index c669ccd..000 --- a/client/tests/kvm/migration_control.srv +++ /dev/null @@ -1,123 +0,0 @@ -AUTHOR = Yolkfull Chow yz...@redhat.com -TIME = SHORT -NAME = KVM Test (migration across multiple hosts) -TEST_CATEGORY = Functional -TEST_CLASS = Virtualization -TEST_TYPE = Server -DOC = -Migrate KVM guest between two hosts. It parses the base config file, restricts -it with appropriate parameters, generates the test dicts, modify the test_dicts -so there's a distinction between the migration roles ('dest' or 'source'). - - -import sys, os, commands, glob, shutil, logging, random -from autotest_lib.server import utils -from autotest_lib.client.common_lib import cartesian_config - -# Specify the directory of autotest before you start this test -AUTOTEST_DIR = '/usr/local/autotest' - -# Specify the root directory that on client machines -rootdir = '/tmp/kvm_autotest_root' - -KVM_DIR = os.path.join(AUTOTEST_DIR, 'client/tests/kvm') - - -def generate_mac_address(): -r = random.SystemRandom() -mac = 9a:%02x:%02x:%02x:%02x:%02x % (r.randint(0x00, 0xff), - r.randint(0x00, 0xff), - r.randint(0x00, 0xff), - r.randint(0x00, 0xff), - r.randint(0x00, 0xff)) -return mac - - -def run(pair): -logging.info(KVM migration running on source host [%s] and destination - host [%s]\n, pair[0], pair[1]) - -source = hosts.create_host(pair[0]) -dest = hosts.create_host(pair[1]) -source_at = autotest.Autotest(source) -dest_at = autotest.Autotest(dest) - -cfg_file = os.path.join(KVM_DIR, tests_base.cfg) - -if not os.path.exists(cfg_file): -raise error.JobError(Config file %s was not found, cfg_file) - -# Get test set (dictionary list) from the configuration file -parser = cartesian_config.Parser() -test_variants = -image_name(_.*)? ?= /tmp/kvm_autotest_root/images/ -cdrom(_.*)? ?= /tmp/kvm_autotest_root/ -floppy ?= /tmp/kvm_autotest_root/ -Linux: -unattended_install: -kernel ?= /tmp/kvm_autotest_root/ -initrd ?= /tmp/kvm_autotest_root/ -qemu_binary = /usr/libexec/qemu-kvm -qemu_img_binary = /usr/bin/qemu-img -only qcow2 -only virtio_net -only virtio_blk -only smp2 -only no_pci_assignable -only smallpages -only Fedora.14.64 -only migrate_multi_host -nic_mode = tap -nic_mac_nic1 = %s - % (generate_mac_address()) -parser.parse_file(cfg_file) -parser.parse_string(test_variants) -test_dicts = parser.get_dicts() - -source_control_file = dest_control_file = -kvm_test_dir = os.path.join(os.environ['AUTODIR'],'tests/kvm') -sys.path.append(kvm_test_dir)\n - -for params in test_dicts: -params['srchost'] = source.ip -params['dsthost'] = dest.ip -params['rootdir'] = rootdir - -source_params = params.copy() -source_params['role'] = source - -dest_params = params.copy() -dest_params['role'] = destination -dest_params['migration_mode'] = tcp - -# Report the parameters we've received -print Test parameters: -keys = params.keys() -keys.sort() -for key in keys: -logging.debug(%s = %s, key, params[key]) - -source_control_file += (job.run_test('kvm', tag='%s', params=%s) % -(source_params['shortname'], source_params)) -dest_control_file += (job.run_test('kvm', tag='%s', params=%s) % - (dest_params['shortname'], dest_params)) - -logging.info('Source control
[KVM-Autotest][PATCH 5/5] kvm test: Add multihost migration support to cpuflag test
Signed-off-by: Jiří Župka jzu...@redhat.com --- client/tests/kvm/tests/cpuflags.py | 284 +++- client/virt/subtests.cfg.sample| 17 ++- 2 files changed, 263 insertions(+), 38 deletions(-) diff --git a/client/tests/kvm/tests/cpuflags.py b/client/tests/kvm/tests/cpuflags.py index 6f281d0..c2111d4 100644 --- a/client/tests/kvm/tests/cpuflags.py +++ b/client/tests/kvm/tests/cpuflags.py @@ -1,4 +1,4 @@ -import logging, re, random, os, time +import logging, re, random, os, time, socket, pickle from autotest_lib.client.common_lib import error, utils from autotest_lib.client.virt import kvm_vm from autotest_lib.client.virt import virt_utils, aexpect @@ -14,10 +14,9 @@ def run_cpuflags(test, params, env): @param params: Dictionary with the test parameters. @param env: Dictionary with test environment. +virt_utils.Flag.aliases = virt_utils.kvm_map_flags_aliases qemu_binary = virt_utils.get_path('.', params.get(qemu_binary, qemu)) -cpuflags_path = os.path.join(test.virtdir, deps) -cpuflags_tar = cpuflags-test.tar.bz2 cpuflags_src = os.path.join(test.virtdir, deps, test_cpu_flags) smp = int(params.get(smp, 1)) @@ -25,8 +24,9 @@ def run_cpuflags(test, params, env): mig_timeout = float(params.get(mig_timeout, 3600)) mig_protocol = params.get(migration_protocol, tcp) -mig_speed = params.get(mig_speed, 1G) +mig_speed = params.get(mig_speed, 100M) +multi_host_migration = params.get(multi_host_migration, no) class HgFlags(object): def __init__(self, cpu_model, extra_flags=set([])): @@ -62,7 +62,8 @@ def run_cpuflags(test, params, env): virtual_flags) -def start_guest_with_cpuflags(cpuflags, smp=None): +def start_guest_with_cpuflags(cpuflags, smp=None, migration=False, + wait=True): Try to boot guest with special cpu flags and try login in to them. @@ -74,10 +75,15 @@ def run_cpuflags(test, params, env): vm_name = vm1-cpuflags vm = kvm_vm.VM(vm_name, params_b, test.bindir, env['address_cache']) env.register_vm(vm_name, vm) -vm.create() +if (migration is True): +vm.create(migration_mode=mig_protocol) +else: +vm.create() vm.verify_alive() -session = vm.wait_for_login() +session = None +if wait: +session = vm.wait_for_login() return (vm, session) @@ -228,8 +234,10 @@ def run_cpuflags(test, params, env): session = vm.wait_for_login() vm.copy_files_to(cpuflags_src, dst_dir) +session.cmd(sync) session.cmd(cd %s; make EXTRA_FLAGS=''; % os.path.join(dst_dir, test_cpu_flags)) +session.cmd(sync) session.close() @@ -293,7 +301,30 @@ def run_cpuflags(test, params, env): return cpu_model -def test_qemu_interface(): +def parse_cpu_model(): + +Parse cpu_models from config file. + +@return: [(cpumodel, extra_flags)] + +cpu_models = params.get(cpu_models,).split() +if not cpu_models: +cpu_models = get_cpu_models() + +logging.debug(CPU models found: %s, str(cpu_models)) +models = [] +for cpu_model in cpu_models: +try: +(cpu_model, extra_flags) = cpu_model.split(:) +extra_flags = set(map(virt_utils.Flag, extra_flags.split(,))) +except ValueError: +cpu_model = cpu_model +extra_flags = set([]) +models.append((cpu_model,extra_flags)) +return models + + +def test_qemu_interface_group(): 1) qemu-kvm-cmd -cpu ?model 2) qemu-kvm-cmd -cpu ?dump @@ -349,23 +380,20 @@ def run_cpuflags(test, params, env): test_qemu_dump() test_qemu_cpuid() + class test_temp(Subtest): def clean(self): logging.info(cleanup) if (hasattr(self, vm)): self.vm.destroy(gracefully=False) -def test_boot_guest(): + +def test_boot_guest_group(): 1) boot with cpu_model 2) migrate with flags 3) qemu-kvm-cmd -cpu model_name,+Flag -cpu_models = params.get(cpu_models,).split() -if not cpu_models: -cpu_models = get_cpu_models() -logging.debug(CPU models found: %s, str(cpu_models)) - # 1) boot with cpu_model class test_boot_cpu_model(test_temp): def test(self, cpu_model): @@ -460,29 +488,18 @@ def run_cpuflags(test, params, env): if fwarn_flags: raise error.TestFail(Qemu did not warn the use of flags %s % str(fwarn_flags)) -for cpu_model in cpu_models: -try: -
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 01/31/2012 04:17 PM, Anthony Liguori wrote: On 01/31/2012 08:09 AM, Avi Kivity wrote: On 01/31/2012 03:59 PM, Anthony Liguori wrote: On 01/30/2012 05:41 PM, Andreas Färber wrote: Am 30.01.2012 19:55, schrieb Juan Quintela: Please send in any agenda items you are interested in covering. QOM roadmap update: * Series 3/4 is on the list. - Please officially designate a merge date (Friday?). - To make review sensible, I ask for a hard device freeze until merged. I.e., no new devices and no conflicting changes to DeviceInfo. I put together a few slides to help this discussion: http://www.codemonkey.ws/files/qom-overview.pdf That was helpful, thanks. Can you clarify - Types and their properties will be ABI compatible - Types and properties will not be backwards compatible – We can re-examine this as the device model matures and stabilizes the first two seem very similar, except for the not. I guess I really mean QMP compatible. The expectation is that well written management code does: if (check_for_command(qom-list)) { run_command(qom-list, /); } ABI compatible means that if qom-list is there, it's semantics never change. Backwards compatibility would mean that once qom-list is introduced, it never goes away. In terms of QOM types, we won't guarantee that a Type will stick around forever, but we will guarantee if it's there, it'll behave in a certain way. When the device model stabilizes over time, we can revisit this, but in the 1.x series, I'm sure we're going to go through a few significant refactorings that change types significantly. Ok. We have to communicate this very carefully. If a management tool uses something that is ABI compatible, but not backwards compatible, and lacks a known alternative at the time of writing the management tool, then the ABI-compatible-but-not-backwards-compatible property becomes not backwards compatible to the management tool's users. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for tuesday 31
On Tue, Jan 31, 2012 at 08:12:29AM -0600, Anthony Liguori wrote: On 01/31/2012 07:15 AM, Andreas Färber wrote: Am 31.01.2012 00:53, schrieb Anthony Liguori: On 01/30/2012 05:41 PM, Andreas Färber wrote: Am 30.01.2012 19:55, schrieb Juan Quintela: Please send in any agenda items you are interested in covering. VMState: Anthony specifically said that VMState were not affected by QOM and that patches should not be deferred until the merge. Yet there's no review and/or decision-making for a month now. Ping^2 for AHCI+SDHC. Do you have pointers (to pending VMState patches)? http://patchwork.ozlabs.org/patch/137732/ (PATCH v4) It's basically about how to deal with variable-sized arrays. (Alex mentioned it on one call around November.) I found ways to deal with subsets of arrays embedded within the struct and variable-sized list of pointers to structs but no solution for a malloc()'ed array of structs. Maybe I'm just too stupid to see. Anyway, no one commented since Xmas. /me puts on his flame proof suit Don't use VMState. Just open code a save/restore function. VMState is too limited in how it handles complex data structures. I really believe the only long term solution we're going to get to here is something that uses a builder interface (like Visitors). Regards, Anthony Liguori So the TPM patches started implementing a visitor interface for BER, they are still blocked, right? Igor posted (and refined for v2) a patch with a callback-based approach that I find promising. From my view, unofficially Juan is the VMState guy, he's been cc'ed. Are we lacking an official maintainer that cares? Or is Juan the official, undocumented maintainer but simply busy? SUSE's interest is making AHCI migratable, and my VMState workaround for that is simply ugly: http://patchwork.ozlabs.org/patch/133066/ (RFC) Therefore I'm waiting for some resolution. Regards, Andreas -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for tuesday 31
On Tue, Jan 31, 2012 at 05:04:48PM +0200, Michael S. Tsirkin wrote: On Tue, Jan 31, 2012 at 08:12:29AM -0600, Anthony Liguori wrote: On 01/31/2012 07:15 AM, Andreas Färber wrote: Am 31.01.2012 00:53, schrieb Anthony Liguori: On 01/30/2012 05:41 PM, Andreas Färber wrote: Am 30.01.2012 19:55, schrieb Juan Quintela: Please send in any agenda items you are interested in covering. VMState: Anthony specifically said that VMState were not affected by QOM and that patches should not be deferred until the merge. Yet there's no review and/or decision-making for a month now. Ping^2 for AHCI+SDHC. Do you have pointers (to pending VMState patches)? http://patchwork.ozlabs.org/patch/137732/ (PATCH v4) It's basically about how to deal with variable-sized arrays. (Alex mentioned it on one call around November.) I found ways to deal with subsets of arrays embedded within the struct and variable-sized list of pointers to structs but no solution for a malloc()'ed array of structs. Maybe I'm just too stupid to see. Anyway, no one commented since Xmas. /me puts on his flame proof suit Don't use VMState. Just open code a save/restore function. VMState is too limited in how it handles complex data structures. I really believe the only long term solution we're going to get to here is something that uses a builder interface (like Visitors). Regards, Anthony Liguori So the TPM patches started implementing a visitor interface for BER, they are still blocked, right? And by the way, I believe we really should switch to something like BER and just drop the legacy migration format - being non self delimiting makes it just too painful for words to abstract, and attempts to hide that mess behind a visitor interface will just cause more cruft to accumulate, and cause inefficiencies such as extra data copies. Igor posted (and refined for v2) a patch with a callback-based approach that I find promising. From my view, unofficially Juan is the VMState guy, he's been cc'ed. Are we lacking an official maintainer that cares? Or is Juan the official, undocumented maintainer but simply busy? SUSE's interest is making AHCI migratable, and my VMState workaround for that is simply ugly: http://patchwork.ozlabs.org/patch/133066/ (RFC) Therefore I'm waiting for some resolution. Regards, Andreas -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for tuesday 31
On 01/31/2012 03:12 PM, Anthony Liguori wrote: Don't use VMState. Just open code a save/restore function. VMState is too limited in how it handles complex data structures. I really believe the only long term solution we're going to get to here is something that uses a builder interface (like Visitors). Visitors for complex data structures still require you separate get/set functions. VMState was created to avoid having to write things twice and avoid getting the two out of sync. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4 V11] Add ioctl for KVMCLOCK_GUEST_STOPPED
Now that we have a flag that will tell the guest it was suspended, create an interface for that communication using a KVM ioctl. Signed-off-by: Eric B Munson emun...@mgebm.net Cc: mi...@redhat.com Cc: h...@zytor.com Cc: ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc: mtosa...@redhat.com Cc: kvm@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org --- Changes from V10: Return the ioctl to per vcpu instead of per vm Changes from V9: Use kvm_for_each_vcpu to iterate online vcpu's Changes from V8: Make KVM_GUEST_PAUSED a per vm ioctl instead of per vcpu Changes from V7: Define KVM_CAP_GUEST_PAUSED and support check Call mark_page_dirty () after setting PVCLOCK_GUEST_STOPPED Changes from V4: Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED Add new ioctl description to api.txt Documentation/virtual/kvm/api.txt | 13 + arch/x86/kvm/x86.c| 21 + include/linux/kvm.h |3 +++ 3 files changed, 37 insertions(+), 0 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index e1d94bf..1931e5c 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1491,6 +1491,19 @@ following algorithm: Some guests configure the LINT1 NMI input to cause a panic, aiding in debugging. +4.65 KVMCLOCK_GUEST_PAUSED + +Capability: KVM_CAP_GUEST_PAUSED +Architechtures: Any that implement pvclocks (currently x86 only) +Type: vcpu ioctl +Parameters: None +Returns: 0 on success, -1 on error + +This signals to the host kernel that the specified guest is being paused by +userspace. The host will set a flag in the pvclock structure that is checked +from the soft lockup watchdog. This ioctl can be called during pause or +unpause. + 5. The kvm_run structure Application code obtains a pointer to the kvm_run structure by diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 14d6cad..c9cabba 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2056,6 +2056,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_X86_ROBUST_SINGLESTEP: case KVM_CAP_XSAVE: case KVM_CAP_ASYNC_PF: + case KVM_CAP_GUEST_PAUSED: case KVM_CAP_GET_TSC_KHZ: r = 1; break; @@ -2503,6 +2504,22 @@ static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu, return r; } +/* + * kvm_set_guest_paused() indicates to the guest kernel that it has been + * stopped by the hypervisor. This function will be called from the host only. + * EINVAL is returned when the host attempts to set the flag for a guest that + * does not support pv clocks. + */ +static int kvm_set_guest_paused(struct kvm_vcpu *vcpu) +{ + struct pvclock_vcpu_time_info *src = vcpu-arch.hv_clock; + if (!vcpu-arch.time_page) + return -EINVAL; + src-flags |= PVCLOCK_GUEST_STOPPED; + mark_page_dirty(vcpu-kvm, vcpu-arch.time PAGE_SHIFT); + return 0; +} + long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { @@ -2784,6 +2801,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp, goto out; } + case KVMCLOCK_GUEST_PAUSED: { + r = kvm_set_guest_paused(vcpu); + break; + } default: r = -EINVAL; } diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 68e67e5..4ffe0df 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo { #define KVM_CAP_PPC_PAPR 68 #define KVM_CAP_S390_GMAP 71 #define KVM_CAP_TSC_DEADLINE_TIMER 72 +#define KVM_CAP_GUEST_PAUSED 73 #ifdef KVM_CAP_IRQ_ROUTING @@ -763,6 +764,8 @@ struct kvm_clock_data { #define KVM_CREATE_SPAPR_TCE _IOW(KVMIO, 0xa8, struct kvm_create_spapr_tce) /* Available with KVM_CAP_RMA */ #define KVM_ALLOCATE_RMA _IOR(KVMIO, 0xa9, struct kvm_allocate_rma) +/* VM is being stopped by host */ +#define KVMCLOCK_GUEST_PAUSED_IO(KVMIO, 0xaa) #define KVM_DEV_ASSIGN_ENABLE_IOMMU(1 0) -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4 V11] Add flag to indicate that a vm was stopped by the host
This flag will be used to check if the vm was stopped by the host when a soft lockup was detected. The host will set the flag when it stops the guest. On resume, the guest will check this flag if a soft lockup is detected and skip issuing the warning. Signed-off-by: Eric B Munson emun...@mgebm.net Cc: mi...@redhat.com Cc: h...@zytor.com Cc: ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc: mtosa...@redhat.com Cc: kvm@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org --- arch/x86/include/asm/pvclock-abi.h |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/asm/pvclock-abi.h index 35f2d19..6167fd7 100644 --- a/arch/x86/include/asm/pvclock-abi.h +++ b/arch/x86/include/asm/pvclock-abi.h @@ -40,5 +40,6 @@ struct pvclock_wall_clock { } __attribute__((__packed__)); #define PVCLOCK_TSC_STABLE_BIT (1 0) +#define PVCLOCK_GUEST_STOPPED (1 1) #endif /* __ASSEMBLY__ */ #endif /* _ASM_X86_PVCLOCK_ABI_H */ -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4 V11] Add check for suspended vm in softlockup detector
A suspended VM can cause spurious soft lockup warnings. To avoid these, the watchdog now checks if the kernel knows it was stopped by the host and skips the warning if so. When the watchdog is reset successfully, clear the guest paused flag. Signed-off-by: Eric B Munson emun...@mgebm.net Cc: mi...@redhat.com Cc: h...@zytor.com Cc: ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc: mtosa...@redhat.com Cc: kvm@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org --- Changes from V3: Clear the PAUSED flag when the watchdog is reset kernel/watchdog.c | 12 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 1d7bca7..91485e5 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -25,6 +25,7 @@ #include linux/sysctl.h #include asm/irq_regs.h +#include linux/kvm_para.h #include linux/perf_event.h int watchdog_enabled = 1; @@ -280,6 +281,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) __this_cpu_write(softlockup_touch_sync, false); sched_clock_tick(); } + + /* Clear the guest paused flag on watchdog reset */ + kvm_check_and_clear_guest_paused(); __touch_watchdog(); return HRTIMER_RESTART; } @@ -292,6 +296,14 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer) */ duration = is_softlockup(touch_ts); if (unlikely(duration)) { + /* +* If a virtual machine is stopped by the host it can look to +* the watchdog like a soft lockup, check to see if the host +* stopped the vm before we issue the warning +*/ + if (kvm_check_and_clear_guest_paused()) + return HRTIMER_RESTART; + /* only warn once */ if (__this_cpu_read(soft_watchdog_warn) == true) return HRTIMER_RESTART; -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4 V11] Add functions to check if the host has stopped the vm
When a host stops or suspends a VM it will set a flag to show this. The watchdog will use these functions to determine if a softlockup is real, or the result of a suspended VM. Signed-off-by: Eric B Munson emun...@mgebm.net asm-generic changes Acked-by: Arnd Bergmann a...@arndb.de Cc: mi...@redhat.com Cc: h...@zytor.com Cc: ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc: mtosa...@redhat.com Cc: kvm@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org --- Changes from V6: Use __this_cpu_and when clearing the PVCLOCK_GUEST_STOPPED flag Changes from V5: Collapse generic stubs into this patch check_and_clear_guest_stopped() takes no args and uses __get_cpu_var() Include individual definitions in ia64, s390, and powerpc arch/ia64/include/asm/kvm_para.h|5 + arch/powerpc/include/asm/kvm_para.h |5 + arch/s390/include/asm/kvm_para.h|5 + arch/x86/include/asm/kvm_para.h |8 arch/x86/kernel/kvmclock.c | 21 + 5 files changed, 44 insertions(+), 0 deletions(-) diff --git a/arch/ia64/include/asm/kvm_para.h b/arch/ia64/include/asm/kvm_para.h index 1588aee..2019cb9 100644 --- a/arch/ia64/include/asm/kvm_para.h +++ b/arch/ia64/include/asm/kvm_para.h @@ -26,6 +26,11 @@ static inline unsigned int kvm_arch_para_features(void) return 0; } +static inline bool kvm_check_and_clear_guest_paused(void) +{ + return false; +} + #endif #endif diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 50533f9..1f80293 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -169,6 +169,11 @@ static inline unsigned int kvm_arch_para_features(void) return r; } +static inline bool kvm_check_and_clear_guest_paused(void) +{ + return false; +} + #endif /* __KERNEL__ */ #endif /* __POWERPC_KVM_PARA_H__ */ diff --git a/arch/s390/include/asm/kvm_para.h b/arch/s390/include/asm/kvm_para.h index 6964db2..a988329 100644 --- a/arch/s390/include/asm/kvm_para.h +++ b/arch/s390/include/asm/kvm_para.h @@ -149,6 +149,11 @@ static inline unsigned int kvm_arch_para_features(void) return 0; } +static inline bool kvm_check_and_clear_guest_paused(void) +{ + return false; +} + #endif #endif /* __S390_KVM_PARA_H */ diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 734c376..99c4bbe 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -95,6 +95,14 @@ struct kvm_vcpu_pv_apf_data { extern void kvmclock_init(void); extern int kvm_register_clock(char *txt); +#ifdef CONFIG_KVM_CLOCK +bool kvm_check_and_clear_guest_paused(void); +#else +static inline bool kvm_check_and_clear_guest_paused(void) +{ + return false; +} +#endif /* CONFIG_KVMCLOCK */ /* This instruction is vmcall. On non-VT architectures, it will generate a * trap that we will then rewrite to the appropriate instruction. diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 44842d7..bdf6423 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -22,6 +22,7 @@ #include asm/msr.h #include asm/apic.h #include linux/percpu.h +#include linux/hardirq.h #include asm/x86_init.h #include asm/reboot.h @@ -114,6 +115,26 @@ static void kvm_get_preset_lpj(void) preset_lpj = lpj; } +bool kvm_check_and_clear_guest_paused(void) +{ + bool ret = false; + struct pvclock_vcpu_time_info *src; + + /* +* per_cpu() is safe here because this function is only called from +* timer functions where preemption is already disabled. +*/ + WARN_ON(!in_atomic()); + src = __get_cpu_var(hv_clock); + if ((src-flags PVCLOCK_GUEST_STOPPED) != 0) { + __this_cpu_and(hv_clock.flags, ~PVCLOCK_GUEST_STOPPED); + ret = true; + } + + return ret; +} +EXPORT_SYMBOL_GPL(kvm_check_and_clear_guest_paused); + static struct clocksource kvm_clock = { .name = kvm-clock, .read = kvm_clock_get_cycles, -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4 V11] Avoid soft lockup message when KVM is stopped by host
Changes from V10: Return ioctl to per vcpu instead of per vm Changes from V9: Use kvm_for_each_vcpu to iterate online vcpu's Changes from V8: Make KVM_GUEST_PAUSED a per vm ioctl instead of per vcpu Changes from V7: Define KVM_CAP_GUEST_PAUSED and support check Call mark_page_dirty () after setting PVCLOCK_GUEST_STOPPED Changes from V6: Use __this_cpu_and when clearing the PVCLOCK_GUEST_STOPPED flag Changes from V5: Collapse generic check_and_clear_guest_stopped into patch 2 Include check_and_clear_guest_stopped defintion to ia64, s390, and powerpc Change check_and_clear_guest_stopped to use __get_cpu_var instead of taking the cpuid arg. Protect check_and_clear_guest_stopped declaration with CONFIG_KVM_CLOCK check Changes from V4: Rename KVM_GUEST_PAUSED to KVMCLOCK_GUEST_PAUSED Add description of KVMCLOCK_GUEST_PAUSED ioctl to api.txt Changes from V3: Include CC's on patch 3 Drop clear flag ioctl and have the watchdog clear the flag when it is reset Changes from V2: A new kvm functions defined in kvm_para.h, the only change to pvclock is the initial flag definition Changes from V1: (Thanks Marcelo) Host code has all been moved to arch/x86/kvm/x86.c KVM_PAUSE_GUEST was renamed to KVM_GUEST_PAUSED Cc: mi...@redhat.com Cc: h...@zytor.com Cc: ry...@linux.vnet.ibm.com Cc: aligu...@us.ibm.com Cc: mtosa...@redhat.com Cc: kvm@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: x...@kernel.org Cc: linux-ker...@vger.kernel.org Eric B Munson (4): Add flag to indicate that a vm was stopped by the host Add functions to check if the host has stopped the vm Add ioctl for KVMCLOCK_GUEST_STOPPED Add check for suspended vm in softlockup detector Documentation/virtual/kvm/api.txt | 13 + arch/ia64/include/asm/kvm_para.h|5 + arch/powerpc/include/asm/kvm_para.h |5 + arch/s390/include/asm/kvm_para.h|5 + arch/x86/include/asm/kvm_para.h |8 arch/x86/include/asm/pvclock-abi.h |1 + arch/x86/kernel/kvmclock.c | 21 + arch/x86/kvm/x86.c | 21 + include/linux/kvm.h |3 +++ kernel/watchdog.c | 12 10 files changed, 94 insertions(+), 0 deletions(-) -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] reset edge sense circuit of i8259 on init
On Tue, Jan 24, 2012 at 03:06:05PM +0200, Gleb Natapov wrote: The spec says that during initialization The edge sense circuit is reset which means that following initialization an interrupt request (IR) input must make a low-to-high transition to generate an interrupt, but currently if edge triggered interrupt is in IRR it is delivered after i8259 initialization. ping Signed-off-by: Gleb Natapov g...@redhat.com diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index b6a7353..81cf4fa 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -307,6 +307,7 @@ static void pic_ioport_write(void *opaque, u32 addr, u32 val) if (val 0x10) { s-init4 = val 1; s-last_irr = 0; + s-irr = s-elcr; s-imr = 0; s-priority_add = 0; s-special_mask = 0; -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call agenda for tuesday 31
Am 31.01.2012 14:59, schrieb Anthony Liguori: On 01/30/2012 05:41 PM, Andreas Färber wrote: Am 30.01.2012 19:55, schrieb Juan Quintela: Please send in any agenda items you are interested in covering. QOM roadmap update: * Series 3/4 is on the list. - Please officially designate a merge date (Friday?). - To make review sensible, I ask for a hard device freeze until merged. I.e., no new devices and no conflicting changes to DeviceInfo. I put together a few slides to help this discussion: http://www.codemonkey.ws/files/qom-overview.pdf So basically we barely managed to walk through the slides without getting into any of the requested agenda points. ;) Re slide 5 I could use a more detailed explanation. From what I understand, ... ...types are registered through device_init() and module_call_init(). ...classes are lazily initialized, eventually calling class_init. ...objects are instantiated through object_new[_with_type](), eventually calling instance_init, by convention named initfn. Up to here would be step 1. How does the realize step 2 fit in technically? What types does this apply to? All? Who calls realize? Where should class authors draw the line between the two? (DOs and DON'Ts) Then some conceptual questions: How is inheritance (virtual methods in C++) supposed to work? (in my case a reset function overwritten by each subclass) I've seen code in series 3/4 calling methods the parent class defines in its header, replacing the parent class' function pointer? Or should derived classes store their parent's function pointer in a new ObjectClass and invoke that? (i.e., are such parent functions supposed to be public or static) When you talk about setting properties, does that refer to using the property accessor functions moved to Object in series 3/4? Or should qdev devices that currently expose only arbitrary setup functions instead move their state to a header file so that it can be written into by the parent? If so, should such headers reside in hw/ or include/qemu/? Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[Bug 42703] New: random hangs on virtualization host
https://bugzilla.kernel.org/show_bug.cgi?id=42703 Summary: random hangs on virtualization host Product: Virtualization Version: unspecified Kernel Version: 3.0.0-16 Platform: All OS/Version: Linux Tree: Mainline Status: NEW Severity: normal Priority: P1 Component: kvm AssignedTo: virtualization_...@kernel-bugs.osdl.org ReportedBy: gbramdewi...@gmail.com Regression: No Hi, We are experiencing random system crashes on our ubuntu 11.10 server running an openstack / KVM cloud. Unfortunately we did not observe any specific behavior that trigers the problem, but I have collected a couple of stack traces. Running on a dell 710 dual Intel Xeon X5650 CPU's Please let me know if I can provide more information. Kind regards bram Jan 13 17:06:46 cmggcn01 kernel: [876590.142455] general protection fault: [#1] SMP Jan 13 17:06:46 cmggcn01 kernel: [876590.165966] CPU 4 Jan 13 17:06:46 cmggcn01 kernel: [876590.166135] Modules linked in: ebt_arp ebt_ip 8021q garp ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm nbd vesafb ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dcdbas dm_multipath psmouse serio_raw ghes hed acpi_power_meter bonding i7core_edac lp joydev parport edac_core ses enclosure usbhid hid megaraid_sas bnx2 Jan 13 17:06:46 cmggcn01 kernel: [876590.313703] Jan 13 17:06:46 cmggcn01 kernel: [876590.337109] Pid: 93, comm: ksmd Not tainted 3.0.0-14-server #23-Ubuntu Dell Inc. PowerEdge R710/0MD99X Jan 13 17:06:46 cmggcn01 kernel: [876590.362347] RIP: 0010:[a01b3411] [a01b3411] kvm_set_pte_rmapp+0x51/0x130 [kvm] Jan 13 17:06:46 cmggcn01 kernel: [876590.386424] RSP: 0018:8817f4129bc0 EFLAGS: 00010202 Jan 13 17:06:46 cmggcn01 kernel: [876590.411497] RAX: 88050d943ff8 RBX: 88050d943ff8 RCX: a01b33c0 Jan 13 17:06:46 cmggcn01 kernel: [876590.435831] RDX: 8817f4129c88 RSI: RDI: 88050d943ff8 Jan 13 17:06:46 cmggcn01 kernel: [876590.460571] RBP: 8817f4129c00 R08: 880bf6012960 R09: 0100 Jan 13 17:06:46 cmggcn01 kernel: [876590.484258] R10: 00ab R11: 0002 R12: c900288c3ff8 Jan 13 17:06:46 cmggcn01 kernel: [876590.507470] R13: 8817f4129c88 R14: 880b917d8000 R15: 002f7279 Jan 13 17:06:46 cmggcn01 kernel: [876590.531508] FS: () GS:88183fc4() knlGS: Jan 13 17:06:46 cmggcn01 kernel: [876590.554747] CS: 0010 DS: ES: CR0: 8005003b Jan 13 17:06:46 cmggcn01 kernel: [876590.577706] CR2: 7f3ac168a000 CR3: 01c03000 CR4: 26e0 Jan 13 17:06:46 cmggcn01 kernel: [876590.602479] DR0: DR1: DR2: Jan 13 17:06:46 cmggcn01 kernel: [876590.624973] DR3: DR6: 0ff0 DR7: 0400 Jan 13 17:06:46 cmggcn01 kernel: [876590.647214] Process ksmd (pid: 93, threadinfo 8817f4128000, task 8817f63fdc80) Jan 13 17:06:46 cmggcn01 kernel: [876590.669444] Stack: Jan 13 17:06:46 cmggcn01 kernel: [876590.690742] 8817f4129c10 a01b3409 008f 880b6bea70b0 Jan 13 17:06:46 cmggcn01 kernel: [876590.713167] 0002 7f54c405 880b6bea7000 0010d9ff Jan 13 17:06:46 cmggcn01 kernel: [876590.735595] 8817f4129c70 a01b0dd9 8817f4129c80 a01b33c0 Jan 13 17:06:46 cmggcn01 kernel: [876590.758191] Call Trace: Jan 13 17:06:46 cmggcn01 kernel: [876590.780877] [a01b3409] ? kvm_set_pte_rmapp+0x49/0x130 [kvm] Jan 13 17:06:46 cmggcn01 kernel: [876590.804088] [a01b0dd9] kvm_handle_hva+0x99/0x180 [kvm] Jan 13 17:06:46 cmggcn01 kernel: [876590.829628] [a01b33c0] ? rmap_write_protect+0x150/0x150 [kvm] Jan 13 17:06:46 cmggcn01 kernel: [876590.853510] [a01b7ad1] kvm_set_spte_hva+0x21/0x30 [kvm] Jan 13 17:06:46 cmggcn01 kernel: [876590.876966] [a019591d] kvm_mmu_notifier_change_pte+0x5d/0x90 [kvm] Jan 13 17:06:46 cmggcn01 kernel: [876590.901395] [8114d87e] __mmu_notifier_change_pte+0x3e/0x80 Jan 13 17:06:46 cmggcn01 kernel: [876590.925191] [8114e07f] write_protect_page+0x10f/0x170 Jan 13 17:06:46 cmggcn01 kernel: [876590.950156] [8114e2df] ? replace_page+0x1ff/0x280 Jan 13 17:06:46 cmggcn01 kernel: [876590.974177] [8114e3e0] try_to_merge_one_page+0x80/0x220 Jan 13 17:06:46 cmggcn01 kernel: [876590.999032] [8114e5f7] try_to_merge_with_ksm_page+0x77/0xc0 Jan 13 17:06:46 cmggcn01 kernel: [876591.022944] [8114f616] cmp_and_merge_page+0xe6/0x260 Jan 13 17:06:46 cmggcn01 kernel: [876591.046993] [8114f83f]
[Bug 42703] random hangs on virtualization host
https://bugzilla.kernel.org/show_bug.cgi?id=42703 Avi Kivity a...@redhat.com changed: What|Removed |Added CC||a...@redhat.com AssignedTo|virtualization_kvm@kernel-b |a...@redhat.com |ugs.osdl.org| --- Comment #1 from Avi Kivity a...@redhat.com 2012-01-31 17:15:48 --- Jan 27 13:41:28 cmggcn01 kernel: [871350.761867] general protection fault: [#2] SMP Jan 27 13:41:28 cmggcn01 kernel: [871350.790117] CPU 14 Jan 27 13:41:28 cmggcn01 kernel: [871350.790387] Modules linked in: btrfs zlib_deflate libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs xfs reiserfs ebt_arp ebt_ip 8021q garp ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm nbd vesafb ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi psmouse dcdbas dm_multipath serio_raw joydev ghes hed acpi_power_meter bonding lp parport i7core_edac edac_core ses enclosure usbhid hid megaraid_sas bnx2 Jan 27 13:41:28 cmggcn01 kernel: [871351.005151] Jan 27 13:41:28 cmggcn01 kernel: [871351.036187] Pid: 90, comm: kswapd0 Tainted: G D 3.0.0-14-server #23-Ubuntu Dell Inc. PowerEdge R710/0MD99X Jan 27 13:41:28 cmggcn01 kernel: [871351.072809] RIP: 0010:[a01a5890] [a01a5890] kvm_unmap_rmapp+0x20/0x60 [kvm] Jan 27 13:41:28 cmggcn01 kernel: [871351.105190] RSP: 0018:8817f3e27a60 EFLAGS: 00010202 Jan 27 13:41:28 cmggcn01 kernel: [871351.141329] RAX: 8817f5d067f8 RBX: c9001fd41ff8 RCX: a01a58d0 Jan 27 13:41:28 cmggcn01 kernel: [871351.179076] RDX: RSI: RDI: 8817f5d067f8 Jan 27 13:41:28 cmggcn01 kernel: [871351.212086] RBP: 8817f3e27a80 R08: 8817f315b3e0 R09: 0100 Jan 27 13:41:28 cmggcn01 kernel: [871351.245788] R10: 000e R11: 0002 R12: 8817f2f0c000 Jan 27 13:41:28 cmggcn01 kernel: [871351.277514] R13: R14: 880be235e000 R15: 000d3cff Jan 27 13:41:28 cmggcn01 kernel: [871351.308421] FS: () GS:88183fce() knlGS: Jan 27 13:41:28 cmggcn01 kernel: [871351.339685] CS: 0010 DS: ES: CR0: 8005003b Jan 27 13:41:28 cmggcn01 kernel: [871351.370089] CR2: 7f8836442000 CR3: 01c03000 CR4: 26e0 Jan 27 13:41:28 cmggcn01 kernel: [871351.399771] DR0: DR1: DR2: Jan 27 13:41:28 cmggcn01 kernel: [871351.428208] DR3: DR6: 0ff0 DR7: 0400 Jan 27 13:41:28 cmggcn01 kernel: [871351.456153] Process kswapd0 (pid: 90, threadinfo 8817f3e26000, task 8817f63ac560) Jan 27 13:41:28 cmggcn01 kernel: [871351.484943] Stack: Jan 27 13:41:28 cmggcn01 kernel: [871351.512984] c9001fd41ff8 0001 7f834a87e000 Jan 27 13:41:28 cmggcn01 kernel: [871351.542025] 8817f3e27aa0 a01a5945 880be235e060 0001 Jan 27 13:41:28 cmggcn01 kernel: [871351.571050] 8817f3e27b10 a01a1dd9 8817f3e27ae0 a01a58d0 snip Jan 27 13:41:28 cmggcn01 kernel: [871352.080582] Code: e7 d0 e8 e0 66 90 e9 a2 fe ff ff 55 48 89 e5 41 55 41 54 53 48 83 ec 08 66 66 66 66 90 45 31 ed 49 89 fc 48 89 f3 eb 20 0f 1f 00 f6 00 01 74 35 48 8b 15 74 7a 02 00 48 89 c6 4c 89 e7 41 bd 01 0:e8 e0 66 90 e9 callq 0xe99066e5 5:a2 fe ff ff 55 48 89 mov%al,0x41e5894855fe c:e5 41 e:55 push %rbp f:41 54push %r12 11:53 push %rbx 12:48 83 ec 08 sub$0x8,%rsp 16:66 66 66 66 90 data32 data32 data32 xchg %ax,%ax 1b:45 31 ed xor%r13d,%r13d 1e:49 89 fc mov%rdi,%r12 21:48 89 f3 mov%rsi,%rbx 24:eb 20jmp0x46 26:0f 1f 00 nopl (%rax) 29:f6 00 01 testb $0x1,(%rax) ^ dies here, %rax is non-canonical. 2c:74 35je 0x63 2e:48 8b 15 74 7a 02 00 mov0x27a74(%rip),%rdx# 0x27aa9 35:48 89 c6 mov%rax,%rsi 38:4c 89 e7 mov%r12,%rdi static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, unsigned long data) { u64 *spte; int need_tlb_flush = 0; while ((spte = rmap_next(kvm, rmapp, NULL))) { BUG_ON(!(*spte PT_PRESENT_MASK)); ^ here,
Re: performance trouble
Le Mon, Jan 30, 2012 at 07:21:20PM +0200, Avi Kivity ecrivait : Start with top and vmstat to see if the cpu or I/O are the bottleneck with top at the kvm layer, the VM take 50% of CPU (the physical box have 16 real cores-not HT- : AMD Opteron 6134). one result of vmstat : procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 5 0 0 9904544 19612 19544000 01110 6 5 88 0 and I run it 1H and the result is always the same (just free result change time to time). (also helpful to run the Windows performance monitor in the guest). If between the start of the function and the response, the are no performance pic or something like that in Windows. What we can see : o we start the function, o we see network communication with the database server during ~3/4s o after nothing during 10s before display the result David. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance trouble
What's up with the cc list? On 01/31/2012 07:15 PM, David Cure wrote: Le Mon, Jan 30, 2012 at 07:21:20PM +0200, Avi Kivity ecrivait : Start with top and vmstat to see if the cpu or I/O are the bottleneck with top at the kvm layer, the VM take 50% of CPU (the physical box have 16 real cores-not HT- : AMD Opteron 6134). one result of vmstat : procs ---memory-- ---swap-- -io -system-- cpu r b swpd free buff cache si sobibo in cs us sy id wa 5 0 0 9904544 19612 19544000 01110 6 5 88 0 and I run it 1H and the result is always the same (just free result change time to time). Running vmstat with no arguments gives the average for all uptime. It's meaningless. Run 'vmstat 1'. How many vcpus are there in total (over all guests)? What does top show? (also helpful to run the Windows performance monitor in the guest). If between the start of the function and the response, the are no performance pic or something like that in Windows. What we can see : o we start the function, o we see network communication with the database server during ~3/4s o after nothing during 10s before display the result That's really strange. No network traffic/cpu/block I/O? What do 'vmstat 1' and 'top -d 1' say during this? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/7] i8254: Do not raise IRQ level on reset
Avoid changing the IRQ level to high on reset as it may trigger spurious events. Instead, open-code the effects of pit_load_count(0) in the reset handler. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/i8254.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/hw/i8254.c b/hw/i8254.c index add1fab..d4a08d7 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -481,7 +481,13 @@ static void pit_reset(DeviceState *dev) s = pit-channels[i]; s-mode = 3; s-gate = (i != 2); -pit_load_count(s, 0); +s-count_load_time = qemu_get_clock_ns(vm_clock); +s-count = 0x1; +if (i == 0) { +s-next_transition_time = +pit_get_next_transition_time(s, s-count_load_time); +qemu_mod_timer(s-irq_timer, s-next_transition_time); +} } } -- 1.7.3.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/7] hpet: Save/restore cached RTC IRQ level
In legacy mode, the HPET suppresses the RTC interrupt delivery via IRQ 8 but keeps track of the RTC output level and applies it when legacy mode is turned off again. This value has to be preserved across save/ restore as it cannot be reconstructed otherwise. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/hpet.c | 26 ++ 1 files changed, 26 insertions(+), 0 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index aba9ea9..39b291f 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -240,6 +240,24 @@ static int hpet_post_load(void *opaque, int version_id) return 0; } +static bool hpet_rtc_irq_level_needed(void *opaque) +{ +HPETState *s = opaque; + +return s-rtc_irq_level != 0; +} + +static const VMStateDescription vmstate_hpet_rtc_irq_level = { +.name = hpet/rtc_irq_level, +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField[]) { +VMSTATE_UINT8(rtc_irq_level, HPETState), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_hpet_timer = { .name = hpet_timer, .version_id = 1, @@ -273,6 +291,14 @@ static const VMStateDescription vmstate_hpet = { VMSTATE_STRUCT_VARRAY_UINT8(timer, HPETState, num_timers, 0, vmstate_hpet_timer, HPETTimer), VMSTATE_END_OF_LIST() +}, +.subsections = (VMStateSubsection[]) { +{ +.vmsd = vmstate_hpet_rtc_irq_level, +.needed = hpet_rtc_irq_level_needed, +}, { +/* empty */ +} } }; -- 1.7.3.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 6/7] pcspk: Convert to qdev
Convert the PC speaker device to a qdev ISA model. Move the public interface to a dedicated header file at this chance. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- arch_init.c|1 + hw/i82378.c|3 +- hw/mips_jazz.c |3 +- hw/pc.c|3 +- hw/pc.h|4 --- hw/pcspk.c | 62 ++- hw/pcspk.h | 45 7 files changed, 104 insertions(+), 17 deletions(-) create mode 100644 hw/pcspk.h diff --git a/arch_init.c b/arch_init.c index 2366511..a45485b 100644 --- a/arch_init.c +++ b/arch_init.c @@ -42,6 +42,7 @@ #include gdbstub.h #include hw/smbios.h #include exec-memory.h +#include hw/pcspk.h #ifdef TARGET_SPARC int graphic_width = 1024; diff --git a/hw/i82378.c b/hw/i82378.c index 127cadf..79fa8b7 100644 --- a/hw/i82378.c +++ b/hw/i82378.c @@ -20,6 +20,7 @@ #include pci.h #include pc.h #include i8254.h +#include pcspk.h //#define DEBUG_I82378 @@ -195,7 +196,7 @@ static void i82378_init(DeviceState *dev, I82378State *s) pit = pit_init(isabus, 0x40, 0, NULL); /* speaker */ -pcspk_init(pit); +pcspk_init(isabus, pit); /* 2 82C37 (dma) */ DMA_init(1, s-out[1]); diff --git a/hw/mips_jazz.c b/hw/mips_jazz.c index b61b218..65608dc 100644 --- a/hw/mips_jazz.c +++ b/hw/mips_jazz.c @@ -37,6 +37,7 @@ #include loader.h #include mc146818rtc.h #include i8254.h +#include pcspk.h #include blockdev.h #include sysbus.h #include exec-memory.h @@ -193,7 +194,7 @@ static void mips_jazz_init(MemoryRegion *address_space, cpu_exit_irq = qemu_allocate_irqs(cpu_request_exit, NULL, 1); DMA_init(0, cpu_exit_irq); pit = pit_init(isa_bus, 0x40, 0, NULL); -pcspk_init(pit); +pcspk_init(isa_bus, pit); /* ISA IO space at 0x9000 */ isa_mmio_init(0x9000, 0x0100); diff --git a/hw/pc.c b/hw/pc.c index 6fb1de7..f6a91a9 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -37,6 +37,7 @@ #include multiboot.h #include mc146818rtc.h #include i8254.h +#include pcspk.h #include msi.h #include sysbus.h #include sysemu.h @@ -1169,7 +1170,7 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi, /* connect PIT to output control line of the HPET */ qdev_connect_gpio_out(hpet, 0, qdev_get_gpio_in(pit-qdev, 0)); } -pcspk_init(pit); +pcspk_init(isa_bus, pit); for(i = 0; i MAX_SERIAL_PORTS; i++) { if (serial_hds[i]) { diff --git a/hw/pc.h b/hw/pc.h index b08708d..1b47bbd 100644 --- a/hw/pc.h +++ b/hw/pc.h @@ -149,10 +149,6 @@ void piix4_smbus_register_device(SMBusDevice *dev, uint8_t addr); /* hpet.c */ extern int no_hpet; -/* pcspk.c */ -void pcspk_init(ISADevice *pit); -int pcspk_audio_init(ISABus *bus); - /* piix_pci.c */ struct PCII440FXState; typedef struct PCII440FXState PCII440FXState; diff --git a/hw/pcspk.c b/hw/pcspk.c index 43df818..5bd9e32 100644 --- a/hw/pcspk.c +++ b/hw/pcspk.c @@ -28,6 +28,7 @@ #include audio/audio.h #include qemu-timer.h #include i8254.h +#include pcspk.h #define PCSPK_BUF_LEN 1792 #define PCSPK_SAMPLE_RATE 32000 @@ -35,10 +36,13 @@ #define PCSPK_MIN_COUNT ((PIT_FREQ + PCSPK_MAX_FREQ - 1) / PCSPK_MAX_FREQ) typedef struct { +ISADevice dev; +MemoryRegion ioport; +uint32_t iobase; uint8_t sample_buf[PCSPK_BUF_LEN]; QEMUSoundCard card; SWVoiceOut *voice; -ISADevice *pit; +void *pit; unsigned int pit_count; unsigned int samples; unsigned int play_pos; @@ -47,7 +51,7 @@ typedef struct { } PCSpkState; static const char *s_spk = pcspk; -static PCSpkState pcspk_state; +static PCSpkState *pcspk_state; static inline void generate_samples(PCSpkState *s) { @@ -99,7 +103,7 @@ static void pcspk_callback(void *opaque, int free) int pcspk_audio_init(ISABus *bus) { -PCSpkState *s = pcspk_state; +PCSpkState *s = pcspk_state; struct audsettings as = {PCSPK_SAMPLE_RATE, 1, AUD_FMT_U8, 0}; AUD_register_card(s_spk, s-card); @@ -113,7 +117,8 @@ int pcspk_audio_init(ISABus *bus) return 0; } -static uint32_t pcspk_ioport_read(void *opaque, uint32_t addr) +static uint64_t pcspk_io_read(void *opaque, target_phys_addr_t addr, + unsigned size) { PCSpkState *s = opaque; int out; @@ -124,7 +129,8 @@ static uint32_t pcspk_ioport_read(void *opaque, uint32_t addr) return pit_get_gate(s-pit, 2) | (s-data_on 1) | s-dummy_refresh_clock | out; } -static void pcspk_ioport_write(void *opaque, uint32_t addr, uint32_t val) +static void pcspk_io_write(void *opaque, target_phys_addr_t addr, uint64_t val, + unsigned size) { PCSpkState *s = opaque; const int gate = val 1; @@ -138,11 +144,47 @@ static void pcspk_ioport_write(void *opaque, uint32_t addr, uint32_t val) } } -void pcspk_init(ISADevice *pit) +static const MemoryRegionOps pcspk_io_ops = { +.read = pcspk_io_read, +.write =
[PATCH v3 4/7] i8254: Pass alternative IRQ output object on initialization
HPET legacy emulation will require control over the PIT IRQ output. To enable this, add support for an alternative IRQ output object to the PIT factory function. If the isa_irq number is 0, this object will be used. This also removes the IRQ number property from the PIT class as we now use a generic GPIO output pin that is connected by the factory function. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/alpha_dp264.c |2 +- hw/i82378.c|2 +- hw/i8254.c |4 +--- hw/i8254.h |6 -- hw/mips_fulong2e.c |2 +- hw/mips_jazz.c |2 +- hw/mips_malta.c|2 +- hw/mips_r4k.c |2 +- hw/pc.c|2 +- 9 files changed, 12 insertions(+), 12 deletions(-) diff --git a/hw/alpha_dp264.c b/hw/alpha_dp264.c index 4c0efd3..ea0fd95 100644 --- a/hw/alpha_dp264.c +++ b/hw/alpha_dp264.c @@ -73,7 +73,7 @@ static void clipper_init(ram_addr_t ram_size, clipper_pci_map_irq); rtc_init(isa_bus, 1980, rtc_irq); -pit_init(isa_bus, 0x40, 0); +pit_init(isa_bus, 0x40, 0, NULL); isa_create_simple(isa_bus, i8042); /* VGA setup. Don't bother loading the bios. */ diff --git a/hw/i82378.c b/hw/i82378.c index 652970e..127cadf 100644 --- a/hw/i82378.c +++ b/hw/i82378.c @@ -192,7 +192,7 @@ static void i82378_init(DeviceState *dev, I82378State *s) isa_bus_irqs(isabus, s-i8259); /* 1 82C54 (pit) */ -pit = pit_init(isabus, 0x40, 0); +pit = pit_init(isabus, 0x40, 0, NULL); /* speaker */ pcspk_init(pit); diff --git a/hw/i8254.c b/hw/i8254.c index df83880..326211a 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -57,7 +57,6 @@ typedef struct PITChannelState { typedef struct PITState { ISADevice dev; MemoryRegion ioports; -uint32_t irq; uint32_t iobase; PITChannelState channels[3]; } PITState; @@ -532,7 +531,7 @@ static int pit_initfn(ISADevice *dev) s = pit-channels[0]; /* the timer 0 is connected to an IRQ */ s-irq_timer = qemu_new_timer_ns(vm_clock, pit_irq_timer, s); -s-irq = isa_get_irq(dev, pit-irq); +qdev_init_gpio_out(dev-qdev, s-irq, 1); memory_region_init_io(pit-ioports, pit_ioport_ops, pit, pit, 4); isa_register_ioport(dev, pit-ioports, pit-iobase); @@ -556,7 +555,6 @@ static DeviceInfo pit_info = { .no_user = 1, .class_init = pit_class_initfn, .props = (Property[]) { -DEFINE_PROP_UINT32(irq, PITState, irq, -1), DEFINE_PROP_HEX32(iobase, PITState, iobase, -1), DEFINE_PROP_END_OF_LIST(), }, diff --git a/hw/i8254.h b/hw/i8254.h index cd3111c..fc64a63 100644 --- a/hw/i8254.h +++ b/hw/i8254.h @@ -30,14 +30,16 @@ #define PIT_FREQ 1193182 -static inline ISADevice *pit_init(ISABus *bus, int base, int irq) +static inline ISADevice *pit_init(ISABus *bus, int base, int isa_irq, + qemu_irq alt_irq) { ISADevice *dev; dev = isa_create(bus, isa-pit); qdev_prop_set_uint32(dev-qdev, iobase, base); -qdev_prop_set_uint32(dev-qdev, irq, irq); qdev_init_nofail(dev-qdev); +qdev_connect_gpio_out(dev-qdev, 0, + isa_irq = 0 ? isa_get_irq(dev, isa_irq) : alt_irq); return dev; } diff --git a/hw/mips_fulong2e.c b/hw/mips_fulong2e.c index ead72ae..e3ba9dd 100644 --- a/hw/mips_fulong2e.c +++ b/hw/mips_fulong2e.c @@ -364,7 +364,7 @@ static void mips_fulong2e_init(ram_addr_t ram_size, const char *boot_device, smbus_eeprom_init(smbus, 1, eeprom_spd, sizeof(eeprom_spd)); /* init other devices */ -pit = pit_init(isa_bus, 0x40, 0); +pit = pit_init(isa_bus, 0x40, 0, NULL); cpu_exit_irq = qemu_allocate_irqs(cpu_request_exit, NULL, 1); DMA_init(0, cpu_exit_irq); diff --git a/hw/mips_jazz.c b/hw/mips_jazz.c index 61dee4d..b61b218 100644 --- a/hw/mips_jazz.c +++ b/hw/mips_jazz.c @@ -192,7 +192,7 @@ static void mips_jazz_init(MemoryRegion *address_space, isa_bus_irqs(isa_bus, i8259); cpu_exit_irq = qemu_allocate_irqs(cpu_request_exit, NULL, 1); DMA_init(0, cpu_exit_irq); -pit = pit_init(isa_bus, 0x40, 0); +pit = pit_init(isa_bus, 0x40, 0, NULL); pcspk_init(pit); /* ISA IO space at 0x9000 */ diff --git a/hw/mips_malta.c b/hw/mips_malta.c index 32f9f65..f611701 100644 --- a/hw/mips_malta.c +++ b/hw/mips_malta.c @@ -970,7 +970,7 @@ void mips_malta_init (ram_addr_t ram_size, isa_get_irq(NULL, 9), NULL, NULL, 0); /* TODO: Populate SPD eeprom data. */ smbus_eeprom_init(smbus, 8, NULL, 0); -pit = pit_init(isa_bus, 0x40, 0); +pit = pit_init(isa_bus, 0x40, 0, NULL); cpu_exit_irq = qemu_allocate_irqs(cpu_request_exit, NULL, 1); DMA_init(0, cpu_exit_irq); diff --git a/hw/mips_r4k.c b/hw/mips_r4k.c index 1b3ec2d..83401f0 100644 --- a/hw/mips_r4k.c +++ b/hw/mips_r4k.c @@ -270,7 +270,7 @@ void mips_r4k_init (ram_addr_t ram_size, isa_mmio_init(0x1400, 0x0001);
[PATCH v3 7/7] i8254: Factor out pit_get_channel_info
Instead of providing 4 individual query functions for mode, gate, output and initial counter state, introduce a service that queries all information at once. This comes with tiny additional costs for pcspk_callback but with a much cleaner interface. Also, it will simplify the implementation of the KVM in-kernel PIT model. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/i8254.c | 35 ++- hw/i8254.h | 12 hw/pcspk.c | 16 +++- 3 files changed, 29 insertions(+), 34 deletions(-) diff --git a/hw/i8254.c b/hw/i8254.c index 6412277..ff253bb 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -90,7 +90,7 @@ static int pit_get_count(PITChannelState *s) } /* get pit output bit */ -static int pit_get_out1(PITChannelState *s, int64_t current_time) +static int pit_get_out(PITChannelState *s, int64_t current_time) { uint64_t d; int out; @@ -122,13 +122,6 @@ static int pit_get_out1(PITChannelState *s, int64_t current_time) return out; } -int pit_get_out(ISADevice *dev, int channel, int64_t current_time) -{ -PITState *pit = DO_UPCAST(PITState, dev, dev); -PITChannelState *s = pit-channels[channel]; -return pit_get_out1(s, current_time); -} - /* return -1 if no transition will occur. */ static int64_t pit_get_next_transition_time(PITChannelState *s, int64_t current_time) @@ -215,25 +208,15 @@ void pit_set_gate(ISADevice *dev, int channel, int val) s-gate = val; } -int pit_get_gate(ISADevice *dev, int channel) -{ -PITState *pit = DO_UPCAST(PITState, dev, dev); -PITChannelState *s = pit-channels[channel]; -return s-gate; -} - -int pit_get_initial_count(ISADevice *dev, int channel) +void pit_get_channel_info(ISADevice *dev, int channel, PITChannelInfo *info) { PITState *pit = DO_UPCAST(PITState, dev, dev); PITChannelState *s = pit-channels[channel]; -return s-count; -} -int pit_get_mode(ISADevice *dev, int channel) -{ -PITState *pit = DO_UPCAST(PITState, dev, dev); -PITChannelState *s = pit-channels[channel]; -return s-mode; +info-gate = s-gate; +info-mode = s-mode; +info-initial_count = s-count; +info-out = pit_get_out(s, qemu_get_clock_ns(vm_clock)); } static inline void pit_load_count(PITChannelState *s, int val) @@ -274,7 +257,9 @@ static void pit_ioport_write(void *opaque, uint32_t addr, uint32_t val) if (!(val 0x10) !s-status_latched) { /* status latch */ /* XXX: add BCD and null count */ -s-status = (pit_get_out1(s, qemu_get_clock_ns(vm_clock)) 7) | +s-status = +(pit_get_out(s, + qemu_get_clock_ns(vm_clock)) 7) | (s-rw_mode 4) | (s-mode 1) | s-bcd; @@ -381,7 +366,7 @@ static void pit_irq_timer_update(PITChannelState *s, int64_t current_time) return; } expire_time = pit_get_next_transition_time(s, current_time); -irq_level = pit_get_out1(s, current_time); +irq_level = pit_get_out(s, current_time); qemu_set_irq(s-irq, irq_level); #ifdef DEBUG_PIT printf(irq_level=%d next_delay=%f\n, diff --git a/hw/i8254.h b/hw/i8254.h index 8ad8e07..a1d2e98 100644 --- a/hw/i8254.h +++ b/hw/i8254.h @@ -30,6 +30,13 @@ #define PIT_FREQ 1193182 +typedef struct PITChannelInfo { +int gate; +int mode; +int initial_count; +int out; +} PITChannelInfo; + static inline ISADevice *pit_init(ISABus *bus, int base, int isa_irq, qemu_irq alt_irq) { @@ -45,9 +52,6 @@ static inline ISADevice *pit_init(ISABus *bus, int base, int isa_irq, } void pit_set_gate(ISADevice *dev, int channel, int val); -int pit_get_gate(ISADevice *dev, int channel); -int pit_get_initial_count(ISADevice *dev, int channel); -int pit_get_mode(ISADevice *dev, int channel); -int pit_get_out(ISADevice *dev, int channel, int64_t current_time); +void pit_get_channel_info(ISADevice *dev, int channel, PITChannelInfo *info); #endif /* !HW_I8254_H */ diff --git a/hw/pcspk.c b/hw/pcspk.c index 5bd9e32..834fb0c 100644 --- a/hw/pcspk.c +++ b/hw/pcspk.c @@ -75,12 +75,16 @@ static inline void generate_samples(PCSpkState *s) static void pcspk_callback(void *opaque, int free) { PCSpkState *s = opaque; +PITChannelInfo ch; unsigned int n; -if (pit_get_mode(s-pit, 2) != 3) +pit_get_channel_info(s-pit, 2, ch); + +if (ch.mode != 3) { return; +} -n = pit_get_initial_count(s-pit, 2); +n = ch.initial_count; /* avoid frequencies that are not reproducible with sample rate */ if (n PCSPK_MIN_COUNT) n = 0; @@ -121,12 +125,14 @@ static uint64_t pcspk_io_read(void *opaque, target_phys_addr_t addr, unsigned size)
[PATCH v3 0/7] pit, hpet, pcspk: fixes preparation for KVM
This is a preparatory series to allow the introduction of the KVM in-kernel PIT. It also fixes various bugs in the PIT and HPET code, see patches for details. Changes in V3: - rebased over master - tuned pic_init interface to avoid isa_get_irq(NULL, ...) Jan Kiszka (7): i8254: Do not raise IRQ level on reset hpet: Save/restore cached RTC IRQ level i8254: Factor out interface header i8254: Pass alternative IRQ output object on initialization i8254: Rework fix interaction with HPET in legacy mode pcspk: Convert to qdev i8254: Factor out pit_get_channel_info arch_init.c|1 + hw/alpha_dp264.c |3 +- hw/hpet.c | 65 ++-- hw/hpet_emul.h |3 ++ hw/i82378.c|6 ++- hw/i8254.c | 92 ++- hw/i8254.h | 57 hw/mips_fulong2e.c |3 +- hw/mips_jazz.c |6 ++- hw/mips_malta.c|3 +- hw/mips_r4k.c |3 +- hw/pc.c| 19 -- hw/pc.h| 29 hw/pcspk.c | 79 hw/pcspk.h | 45 + 15 files changed, 288 insertions(+), 126 deletions(-) create mode 100644 hw/i8254.h create mode 100644 hw/pcspk.h -- 1.7.3.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 5/7] i8254: Rework fix interaction with HPET in legacy mode
When the HPET enters legacy mode, the IRQ output of the PIT is suppressed and replaced by the HPET timer 0. But the current code to emulate this was broken in many ways. It reset the PIT state after re-enabling, it worked against a stale static PIT structure, and it did not properly saved/restored the IRQ output mask in the PIT vmstate. This patch solves the PIT IRQ control in a different way. On x86, it both redirects the PIT IRQ to the HPET, just like the RTC. But it also keeps the control line from the HPET to the PIT. This allows to disable the PIT QEMU timer when it is not needed. The PIT's view on the control line state is now saved in the same format that qemu-kvm is already using. Note that, in contrast to the suppressed RTC IRQ line, we do not need to save/restore the PIT line state in the HPET. As we trigger a PIT IRQ update via the control line, the line state is reconstructed on mode switch. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/hpet.c | 38 -- hw/hpet_emul.h |3 +++ hw/i8254.c | 46 ++ hw/i8254.h |3 --- hw/pc.c| 15 --- 5 files changed, 57 insertions(+), 48 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index c921ec9..9fbbd47 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -65,6 +65,7 @@ typedef struct HPETState { qemu_irq irqs[HPET_NUM_IRQ_ROUTES]; uint32_t flags; uint8_t rtc_irq_level; +qemu_irq pit_enabled; uint8_t num_timers; HPETTimer timer[HPET_MAX_TIMERS]; @@ -573,12 +574,15 @@ static void hpet_ram_write(void *opaque, target_phys_addr_t addr, hpet_del_timer(s-timer[i]); } } -/* i8254 and RTC are disabled when HPET is in legacy mode */ +/* i8254 and RTC output pins are disabled + * when HPET is in legacy mode */ if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) { -hpet_pit_disable(); +qemu_set_irq(s-pit_enabled, 0); +qemu_irq_lower(s-irqs[0]); qemu_irq_lower(s-irqs[RTC_ISA_IRQ]); } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) { -hpet_pit_enable(); +qemu_irq_lower(s-irqs[0]); +qemu_set_irq(s-pit_enabled, 1); qemu_set_irq(s-irqs[RTC_ISA_IRQ], s-rtc_irq_level); } break; @@ -632,7 +636,6 @@ static void hpet_reset(DeviceState *d) { HPETState *s = FROM_SYSBUS(HPETState, sysbus_from_qdev(d)); int i; -static int count = 0; for (i = 0; i s-num_timers; i++) { HPETTimer *timer = s-timer[i]; @@ -649,29 +652,27 @@ static void hpet_reset(DeviceState *d) timer-wrap_flag = 0; } +qemu_set_irq(s-pit_enabled, 1); s-hpet_counter = 0ULL; s-hpet_offset = 0ULL; s-config = 0ULL; -if (count 0) { -/* we don't enable pit when hpet_reset is first called (by hpet_init) - * because hpet is taking over for pit here. On subsequent invocations, - * hpet_reset is called due to system reset. At this point control must - * be returned to pit until SW reenables hpet. - */ -hpet_pit_enable(); -} hpet_cfg.hpet[s-hpet_id].event_timer_block_id = (uint32_t)s-capability; hpet_cfg.hpet[s-hpet_id].address = sysbus_from_qdev(d)-mmio[0].addr; -count = 1; } -static void hpet_handle_rtc_irq(void *opaque, int n, int level) +static void hpet_handle_legacy_irq(void *opaque, int n, int level) { HPETState *s = FROM_SYSBUS(HPETState, opaque); -s-rtc_irq_level = level; -if (!hpet_in_legacy_mode(s)) { -qemu_set_irq(s-irqs[RTC_ISA_IRQ], level); +if (n == HPET_LEGACY_PIT_INT) { +if (!hpet_in_legacy_mode(s)) { +qemu_set_irq(s-irqs[0], level); +} +} else { +s-rtc_irq_level = level; +if (!hpet_in_legacy_mode(s)) { +qemu_set_irq(s-irqs[RTC_ISA_IRQ], level); +} } } @@ -714,7 +715,8 @@ static int hpet_init(SysBusDevice *dev) s-capability |= (s-num_timers - 1) HPET_ID_NUM_TIM_SHIFT; s-capability |= ((HPET_CLK_PERIOD) 32); -qdev_init_gpio_in(dev-qdev, hpet_handle_rtc_irq, 1); +qdev_init_gpio_in(dev-qdev, hpet_handle_legacy_irq, 2); +qdev_init_gpio_out(dev-qdev, s-pit_enabled, 1); /* HPET Area */ memory_region_init_io(s-iomem, hpet_ram_ops, s, hpet, 0x400); diff --git a/hw/hpet_emul.h b/hw/hpet_emul.h index 6128702..757f79f 100644 --- a/hw/hpet_emul.h +++ b/hw/hpet_emul.h @@ -22,6 +22,9 @@ #define HPET_NUM_IRQ_ROUTES 32 +#define HPET_LEGACY_PIT_INT 0 +#define HPET_LEGACY_RTC_INT 1 + #define HPET_CFG_ENABLE 0x001 #define HPET_CFG_LEGACY 0x002 diff --git a/hw/i8254.c b/hw/i8254.c index 326211a..6412277 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -52,6 +52,7 @@ typedef struct PITChannelState {
[PATCH v3 3/7] i8254: Factor out interface header
Move the public interface of the PIT into its own header file and update all users. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/alpha_dp264.c |1 + hw/hpet.c |1 + hw/i82378.c|1 + hw/i8254.c |1 + hw/i8254.h | 54 hw/mips_fulong2e.c |1 + hw/mips_jazz.c |1 + hw/mips_malta.c|1 + hw/mips_r4k.c |1 + hw/pc.c|1 + hw/pc.h| 25 hw/pcspk.c |1 + 12 files changed, 64 insertions(+), 25 deletions(-) create mode 100644 hw/i8254.h diff --git a/hw/alpha_dp264.c b/hw/alpha_dp264.c index 876335a..4c0efd3 100644 --- a/hw/alpha_dp264.c +++ b/hw/alpha_dp264.c @@ -14,6 +14,7 @@ #include sysemu.h #include mc146818rtc.h #include ide.h +#include i8254.h #define MAX_IDE_BUS 2 diff --git a/hw/hpet.c b/hw/hpet.c index 39b291f..c921ec9 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -31,6 +31,7 @@ #include hpet_emul.h #include sysbus.h #include mc146818rtc.h +#include i8254.h //#define HPET_DEBUG #ifdef HPET_DEBUG diff --git a/hw/i82378.c b/hw/i82378.c index 99b453a..652970e 100644 --- a/hw/i82378.c +++ b/hw/i82378.c @@ -19,6 +19,7 @@ #include pci.h #include pc.h +#include i8254.h //#define DEBUG_I82378 diff --git a/hw/i8254.c b/hw/i8254.c index d4a08d7..df83880 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -25,6 +25,7 @@ #include pc.h #include isa.h #include qemu-timer.h +#include i8254.h //#define DEBUG_PIT diff --git a/hw/i8254.h b/hw/i8254.h new file mode 100644 index 000..cd3111c --- /dev/null +++ b/hw/i8254.h @@ -0,0 +1,54 @@ +/* + * QEMU 8253/8254 interval timer emulation + * + * Copyright (c) 2003-2004 Fabrice Bellard + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#ifndef HW_I8254_H +#define HW_I8254_H + +#include hw.h +#include isa.h + +#define PIT_FREQ 1193182 + +static inline ISADevice *pit_init(ISABus *bus, int base, int irq) +{ +ISADevice *dev; + +dev = isa_create(bus, isa-pit); +qdev_prop_set_uint32(dev-qdev, iobase, base); +qdev_prop_set_uint32(dev-qdev, irq, irq); +qdev_init_nofail(dev-qdev); + +return dev; +} + +void pit_set_gate(ISADevice *dev, int channel, int val); +int pit_get_gate(ISADevice *dev, int channel); +int pit_get_initial_count(ISADevice *dev, int channel); +int pit_get_mode(ISADevice *dev, int channel); +int pit_get_out(ISADevice *dev, int channel, int64_t current_time); + +void hpet_pit_disable(void); +void hpet_pit_enable(void); + +#endif /* !HW_I8254_H */ diff --git a/hw/mips_fulong2e.c b/hw/mips_fulong2e.c index 163a668..ead72ae 100644 --- a/hw/mips_fulong2e.c +++ b/hw/mips_fulong2e.c @@ -40,6 +40,7 @@ #include elf.h #include vt82c686.h #include mc146818rtc.h +#include i8254.h #include blockdev.h #include exec-memory.h diff --git a/hw/mips_jazz.c b/hw/mips_jazz.c index 63165b9..61dee4d 100644 --- a/hw/mips_jazz.c +++ b/hw/mips_jazz.c @@ -36,6 +36,7 @@ #include mips-bios.h #include loader.h #include mc146818rtc.h +#include i8254.h #include blockdev.h #include sysbus.h #include exec-memory.h diff --git a/hw/mips_malta.c b/hw/mips_malta.c index 64603ce..32f9f65 100644 --- a/hw/mips_malta.c +++ b/hw/mips_malta.c @@ -45,6 +45,7 @@ #include loader.h #include elf.h #include mc146818rtc.h +#include i8254.h #include blockdev.h #include exec-memory.h #include sysbus.h /* SysBusDevice */ diff --git a/hw/mips_r4k.c b/hw/mips_r4k.c index 1c0615c..1b3ec2d 100644 --- a/hw/mips_r4k.c +++ b/hw/mips_r4k.c @@ -22,6 +22,7 @@ #include loader.h #include elf.h #include mc146818rtc.h +#include i8254.h #include blockdev.h #include exec-memory.h diff --git a/hw/pc.c b/hw/pc.c index 31608d3..080a731 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -36,6 +36,7 @@ #include elf.h #include multiboot.h #include mc146818rtc.h +#include i8254.h #include msi.h #include sysbus.h #include sysemu.h
[PATCH uq/master] kvm: Implement kvm_irqchip_in_kernel like kvm_enabled
To both avoid that kvm_irqchip_in_kernel always has to be paired with kvm_enabled and that the former ends up in a function call, implement it like the latter. This means keeping the state in a global variable and defining kvm_irqchip_in_kernel as a preprocessor macro. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/pc.c |4 ++-- hw/pc_piix.c |6 +++--- kvm-all.c | 13 - kvm-stub.c|5 - kvm.h |8 +--- target-i386/kvm.c |4 ++-- 6 files changed, 16 insertions(+), 24 deletions(-) diff --git a/hw/pc.c b/hw/pc.c index f6a91a9..d298dc2 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -889,7 +889,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id) DeviceState *dev; static int apic_mapped; -if (kvm_enabled() kvm_irqchip_in_kernel()) { +if (kvm_irqchip_in_kernel()) { dev = qdev_create(NULL, kvm-apic); } else { dev = qdev_create(NULL, apic); @@ -908,7 +908,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id) } /* KVM does not support MSI yet. */ -if (!kvm_enabled() || !kvm_irqchip_in_kernel()) { +if (!kvm_irqchip_in_kernel()) { msi_supported = true; } diff --git a/hw/pc_piix.c b/hw/pc_piix.c index a285ad2..70b40bc 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -104,7 +104,7 @@ static void ioapic_init(GSIState *gsi_state) SysBusDevice *d; unsigned int i; -if (kvm_enabled() kvm_irqchip_in_kernel()) { +if (kvm_irqchip_in_kernel()) { dev = qdev_create(NULL, kvm-ioapic); } else { dev = qdev_create(NULL, ioapic); @@ -183,7 +183,7 @@ static void pc_init1(MemoryRegion *system_memory, } gsi_state = g_malloc0(sizeof(*gsi_state)); -if (kvm_enabled() kvm_irqchip_in_kernel()) { +if (kvm_irqchip_in_kernel()) { kvm_piix3_setup_irq_routing(pci_enabled); gsi = qemu_allocate_irqs(kvm_piix3_gsi_handler, gsi_state, GSI_NUM_PINS); @@ -209,7 +209,7 @@ static void pc_init1(MemoryRegion *system_memory, } isa_bus_irqs(isa_bus, gsi); -if (kvm_enabled() kvm_irqchip_in_kernel()) { +if (kvm_irqchip_in_kernel()) { i8259 = kvm_i8259_init(isa_bus); } else if (xen_enabled()) { i8259 = xen_interrupt_controller_init(); diff --git a/kvm-all.c b/kvm-all.c index 7d4e544..3f2460f 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -74,7 +74,6 @@ struct KVMState #ifdef KVM_CAP_SET_GUEST_DEBUG struct kvm_sw_breakpoint_head kvm_sw_breakpoints; #endif -int irqchip_in_kernel; int pit_in_kernel; int xsave, xcrs; int many_ioeventfds; @@ -88,6 +87,7 @@ struct KVMState }; KVMState *kvm_state; +bool kvm_kernel_irqchip; static const KVMCapabilityInfo kvm_required_capabilites[] = { KVM_CAP_INFO(USER_MEMORY), @@ -193,11 +193,6 @@ static void kvm_reset_vcpu(void *opaque) kvm_arch_reset_vcpu(env); } -int kvm_irqchip_in_kernel(void) -{ -return kvm_state-irqchip_in_kernel; -} - int kvm_pit_in_kernel(void) { return kvm_state-pit_in_kernel; @@ -742,7 +737,7 @@ int kvm_irqchip_set_irq(KVMState *s, int irq, int level) struct kvm_irq_level event; int ret; -assert(s-irqchip_in_kernel); +assert(kvm_irqchip_in_kernel()); event.level = level; event.irq = irq; @@ -862,7 +857,7 @@ static int kvm_irqchip_create(KVMState *s) if (kvm_check_extension(s, KVM_CAP_IRQ_INJECT_STATUS)) { s-irqchip_inject_ioctl = KVM_IRQ_LINE_STATUS; } -s-irqchip_in_kernel = 1; +kvm_kernel_irqchip = true; kvm_init_irq_routing(s); @@ -1311,7 +1306,7 @@ int kvm_has_gsi_routing(void) int kvm_allows_irq0_override(void) { -return !kvm_enabled() || !kvm_irqchip_in_kernel() || kvm_has_gsi_routing(); +return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing(); } void kvm_setup_guest_memory(void *start, size_t size) diff --git a/kvm-stub.c b/kvm-stub.c index 6c2b06b..f63a0d2 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -16,11 +16,6 @@ #include gdbstub.h #include kvm.h -int kvm_irqchip_in_kernel(void) -{ -return 0; -} - int kvm_pit_in_kernel(void) { return 0; diff --git a/kvm.h b/kvm.h index 40b5ffc..f9f1dc8 100644 --- a/kvm.h +++ b/kvm.h @@ -23,11 +23,14 @@ #endif extern int kvm_allowed; +extern bool kvm_kernel_irqchip; #if defined CONFIG_KVM || !defined NEED_CPU_H -#define kvm_enabled() (kvm_allowed) +#define kvm_enabled() (kvm_allowed) +#define kvm_irqchip_in_kernel() (kvm_kernel_irqchip) #else -#define kvm_enabled() (0) +#define kvm_enabled() (0) +#define kvm_irqchip_in_kernel() (false) #endif struct kvm_run; @@ -80,7 +83,6 @@ int kvm_set_signal_mask(CPUState *env, const sigset_t *sigset); #endif int kvm_pit_in_kernel(void); -int kvm_irqchip_in_kernel(void); int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr); int kvm_on_sigbus(int code, void *addr); diff --git a/target-i386/kvm.c
[PATCH 4/4] kvm: x86: Add user space part for in-kernel i8254
This provides the required user space stubs to enable the in-kernel i8254 emulation of KVM. The in-kernel model supports lost tick compensation according to the delay policy. This is enabled by default and can be switched off via a device property. Depending on the feature set of the host kernel (before 2.6.32), we may have to disable the HPET or lack sound output from the PC speaker. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Makefile.target |2 +- hw/i8254.h | 11 +++ hw/kvm/i8254.c | 237 +++ hw/pc.c | 14 +++- 4 files changed, 261 insertions(+), 3 deletions(-) create mode 100644 hw/kvm/i8254.c diff --git a/Makefile.target b/Makefile.target index 68481a3..2af5511 100644 --- a/Makefile.target +++ b/Makefile.target @@ -235,7 +235,7 @@ obj-i386-y += vmport.o obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o obj-i386-y += debugcon.o multiboot.o obj-i386-y += pc_piix.o -obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o +obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o kvm/i8254.o obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o # shared objects diff --git a/hw/i8254.h b/hw/i8254.h index a1d2e98..ba6b598 100644 --- a/hw/i8254.h +++ b/hw/i8254.h @@ -51,6 +51,17 @@ static inline ISADevice *pit_init(ISABus *bus, int base, int isa_irq, return dev; } +static inline ISADevice *kvm_pit_init(ISABus *bus, int base) +{ +ISADevice *dev; + +dev = isa_create(bus, kvm-pit); +qdev_prop_set_uint32(dev-qdev, iobase, base); +qdev_init_nofail(dev-qdev); + +return dev; +} + void pit_set_gate(ISADevice *dev, int channel, int val); void pit_get_channel_info(ISADevice *dev, int channel, PITChannelInfo *info); diff --git a/hw/kvm/i8254.c b/hw/kvm/i8254.c new file mode 100644 index 000..0d7196a --- /dev/null +++ b/hw/kvm/i8254.c @@ -0,0 +1,237 @@ +/* + * KVM in-kernel PIT (i8254) support + * + * Copyright (c) 2012 Siemens AG + * + * Authors: + * Jan Kiszka jan.kis...@siemens.com + * + * This work is licensed under the terms of the GNU GPL version 2. + * See the COPYING file in the top-level directory. + */ +#include qemu-timer.h +#include hw/i8254.h +#include hw/i8254_internal.h +#include kvm.h + +#define KVM_PIT_REINJECT_BIT 0 + +typedef struct KVMPITState { +PITCommonState pit; +LostTickPolicy lost_tick_policy; +} KVMPITState; + +static void kvm_pit_get(PITCommonState *s) +{ +struct kvm_pit_state2 kpit; +struct kvm_pit_channel_state *kchan; +struct PITChannelState *sc; +int i, ret; + +if (kvm_has_pit_state2()) { +ret = kvm_vm_ioctl(kvm_state, KVM_GET_PIT2, kpit); +if (ret 0) { +fprintf(stderr, KVM_GET_PIT2 failed: %s\n, strerror(ret)); +abort(); +} +s-channels[0].irq_disabled = kpit.flags KVM_PIT_FLAGS_HPET_LEGACY; +} else { +/* + * kvm_pit_state2 is superset of kvm_pit_state struct, + * so we can use it for KVM_GET_PIT as well. + */ +ret = kvm_vm_ioctl(kvm_state, KVM_GET_PIT, kpit); +if (ret 0) { +fprintf(stderr, KVM_GET_PIT failed: %s\n, strerror(ret)); +abort(); +} +} +for (i = 0; i 3; i++) { +kchan = kpit.channels[i]; +sc = s-channels[i]; +sc-count = kchan-count; +sc-latched_count = kchan-latched_count; +sc-count_latched = kchan-count_latched; +sc-status_latched = kchan-status_latched; +sc-status = kchan-status; +sc-read_state = kchan-read_state; +sc-write_state = kchan-write_state; +sc-write_latch = kchan-write_latch; +sc-rw_mode = kchan-rw_mode; +sc-mode = kchan-mode; +sc-bcd = kchan-bcd; +sc-gate = kchan-gate; +sc-count_load_time = kchan-count_load_time; +} + +sc = s-channels[0]; +sc-next_transition_time = +pit_get_next_transition_time(sc, sc-count_load_time); +} + +static void kvm_pit_put(PITCommonState *s) +{ +struct kvm_pit_state2 kpit; +struct kvm_pit_channel_state *kchan; +struct PITChannelState *sc; +int i, ret; + +kpit.flags = s-channels[0].irq_disabled ? KVM_PIT_FLAGS_HPET_LEGACY : 0; +for (i = 0; i 3; i++) { +kchan = kpit.channels[i]; +sc = s-channels[i]; +kchan-count = sc-count; +kchan-latched_count = sc-latched_count; +kchan-count_latched = sc-count_latched; +kchan-status_latched = sc-status_latched; +kchan-status = sc-status; +kchan-read_state = sc-read_state; +kchan-write_state = sc-write_state; +kchan-write_latch = sc-write_latch; +kchan-rw_mode = sc-rw_mode; +kchan-mode = sc-mode; +kchan-bcd = sc-bcd; +kchan-gate = sc-gate; +kchan-count_load_time = sc-count_load_time; +} + +ret = kvm_vm_ioctl(kvm_state, +
[PATCH 1/4] i8254: Factor out base class for KVM reuse
Applying the concept used for the *PICs once again: establish a base class for the i8254 that can be used both by the current user space emulation and the upcoming KVM in-kernel version. We share most of the public interface of the i8254, specifically to the pcspk, vmstate, reset and certain init parts. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Makefile.objs |2 +- hw/i8254.c | 277 ++--- hw/i8254_common.c | 311 +++ hw/i8254_internal.h | 87 ++ 4 files changed, 434 insertions(+), 243 deletions(-) create mode 100644 hw/i8254_common.c create mode 100644 hw/i8254_internal.h diff --git a/Makefile.objs b/Makefile.objs index b942625..6a733fa 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -210,7 +210,7 @@ hw-obj-$(CONFIG_EMPTY_SLOT) += empty_slot.o hw-obj-$(CONFIG_SERIAL) += serial.o hw-obj-$(CONFIG_PARALLEL) += parallel.o -hw-obj-$(CONFIG_I8254) += i8254.o +hw-obj-$(CONFIG_I8254) += i8254_common.o i8254.o hw-obj-$(CONFIG_PCSPK) += pcspk.o hw-obj-$(CONFIG_PCKBD) += pckbd.o hw-obj-$(CONFIG_USB_UHCI) += usb-uhci.o diff --git a/hw/i8254.c b/hw/i8254.c index ff253bb..90e9a87 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -26,6 +26,7 @@ #include isa.h #include qemu-timer.h #include i8254.h +#include i8254_internal.h //#define DEBUG_PIT @@ -34,34 +35,6 @@ #define RW_STATE_WORD0 3 #define RW_STATE_WORD1 4 -typedef struct PITChannelState { -int count; /* can be 65536 */ -uint16_t latched_count; -uint8_t count_latched; -uint8_t status_latched; -uint8_t status; -uint8_t read_state; -uint8_t write_state; -uint8_t write_latch; -uint8_t rw_mode; -uint8_t mode; -uint8_t bcd; /* not supported */ -uint8_t gate; /* timer start */ -int64_t count_load_time; -/* irq handling */ -int64_t next_transition_time; -QEMUTimer *irq_timer; -qemu_irq irq; -uint32_t irq_disabled; -} PITChannelState; - -typedef struct PITState { -ISADevice dev; -MemoryRegion ioports; -uint32_t iobase; -PITChannelState channels[3]; -} PITState; - static void pit_irq_timer_update(PITChannelState *s, int64_t current_time); static int pit_get_count(PITChannelState *s) @@ -89,99 +62,11 @@ static int pit_get_count(PITChannelState *s) return counter; } -/* get pit output bit */ -static int pit_get_out(PITChannelState *s, int64_t current_time) -{ -uint64_t d; -int out; - -d = muldiv64(current_time - s-count_load_time, PIT_FREQ, - get_ticks_per_sec()); -switch(s-mode) { -default: -case 0: -out = (d = s-count); -break; -case 1: -out = (d s-count); -break; -case 2: -if ((d % s-count) == 0 d != 0) -out = 1; -else -out = 0; -break; -case 3: -out = (d % s-count) ((s-count + 1) 1); -break; -case 4: -case 5: -out = (d == s-count); -break; -} -return out; -} - -/* return -1 if no transition will occur. */ -static int64_t pit_get_next_transition_time(PITChannelState *s, -int64_t current_time) -{ -uint64_t d, next_time, base; -int period2; - -d = muldiv64(current_time - s-count_load_time, PIT_FREQ, - get_ticks_per_sec()); -switch(s-mode) { -default: -case 0: -case 1: -if (d s-count) -next_time = s-count; -else -return -1; -break; -case 2: -base = (d / s-count) * s-count; -if ((d - base) == 0 d != 0) -next_time = base + s-count; -else -next_time = base + s-count + 1; -break; -case 3: -base = (d / s-count) * s-count; -period2 = ((s-count + 1) 1); -if ((d - base) period2) -next_time = base + period2; -else -next_time = base + s-count; -break; -case 4: -case 5: -if (d s-count) -next_time = s-count; -else if (d == s-count) -next_time = s-count + 1; -else -return -1; -break; -} -/* convert to timer units */ -next_time = s-count_load_time + muldiv64(next_time, get_ticks_per_sec(), - PIT_FREQ); -/* fix potential rounding problems */ -/* XXX: better solution: use a clock at PIT_FREQ Hz */ -if (next_time = current_time) -next_time = current_time + 1; -return next_time; -} - /* val must be 0 or 1 */ -void pit_set_gate(ISADevice *dev, int channel, int val) +static void pit_set_channel_gate(PITCommonState *s, PITChannelState *sc, + int val) { -PITState *pit = DO_UPCAST(PITState, dev, dev); -PITChannelState *s = pit-channels[channel]; - -switch(s-mode) { +switch (sc-mode) { default:
[PATCH 3/4] kvm: Add kvm_has_pit_state2 helper
To be used for in-kernel PIT emulation. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- kvm-all.c | 10 ++ kvm-stub.c |5 + kvm.h |1 + 3 files changed, 16 insertions(+), 0 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 3f2460f..8df5235 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -75,6 +75,7 @@ struct KVMState struct kvm_sw_breakpoint_head kvm_sw_breakpoints; #endif int pit_in_kernel; +int pit_state2; int xsave, xcrs; int many_ioeventfds; int irqchip_inject_ioctl; @@ -954,6 +955,10 @@ int kvm_init(void) s-xcrs = kvm_check_extension(s, KVM_CAP_XCRS); #endif +#ifdef KVM_CAP_PIT_STATE2 +s-pit_state2 = kvm_check_extension(s, KVM_CAP_PIT_STATE2); +#endif + ret = kvm_arch_init(s); if (ret 0) { goto err; @@ -1291,6 +1296,11 @@ int kvm_has_xcrs(void) return kvm_state-xcrs; } +int kvm_has_pit_state2(void) +{ +return kvm_state-pit_state2; +} + int kvm_has_many_ioeventfds(void) { if (!kvm_enabled()) { diff --git a/kvm-stub.c b/kvm-stub.c index f63a0d2..1f1c686 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -78,6 +78,11 @@ int kvm_allows_irq0_override(void) return 1; } +int kvm_has_pit_state2(void) +{ +return 0; +} + void kvm_setup_guest_memory(void *start, size_t size) { } diff --git a/kvm.h b/kvm.h index f9f1dc8..8ef4476 100644 --- a/kvm.h +++ b/kvm.h @@ -54,6 +54,7 @@ int kvm_has_robust_singlestep(void); int kvm_has_debugregs(void); int kvm_has_xsave(void); int kvm_has_xcrs(void); +int kvm_has_pit_state2(void); int kvm_has_many_ioeventfds(void); int kvm_has_gsi_routing(void); -- 1.7.3.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] uq/master: Introduce KVM PIT support
This adds another piece of qemu-kvm to upstream: The accelerated in-kernel model of the i8254. It does this in the same fashion as the interrupt controllers were already introduced. And it even has one bug less than qemu-kvm: PC speaker output still works with KVM acceleration enabled. The patches depend on the follow upstream or uq/master patches: - pit, hpet, pcspk: fixes preparation for KVM (v7, 7 patches) http://thread.gmane.org/gmane.comp.emulators.qemu/134393 - Generic tick reinjection control (2 patches) http://thread.gmane.org/gmane.comp.emulators.qemu/133306 - kvm: Implement kvm_irqchip_in_kernel http://thread.gmane.org/gmane.comp.emulators.qemu/134399 The series plus dependencies is also available at git://git.kiszka.org/qemu-kvm.git queues/kvm-pit Please review / merge. Jan Kiszka (4): i8254: Factor out base class for KVM reuse i8254: Open-code timer restore kvm: Add kvm_has_pit_state2 helper kvm: x86: Add user space part for in-kernel i8254 Makefile.objs |2 +- Makefile.target |2 +- hw/i8254.c | 289 --- hw/i8254.h | 11 ++ hw/i8254_common.c | 311 +++ hw/i8254_internal.h | 87 ++ hw/kvm/i8254.c | 237 +++ hw/pc.c | 14 ++- kvm-all.c | 10 ++ kvm-stub.c |5 + kvm.h |1 + 11 files changed, 723 insertions(+), 246 deletions(-) create mode 100644 hw/i8254_common.c create mode 100644 hw/i8254_internal.h create mode 100644 hw/kvm/i8254.c -- 1.7.3.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] i8254: Open-code timer restore
Same as for the APIC: To enable migration between accelerated and non-accelerated models, we need to arm the channel 0 timer only inside the emulated PIT model. The common code just saves/restores that timer to the the next_transition_time field. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/i8254.c| 12 hw/i8254_common.c |6 +++--- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/hw/i8254.c b/hw/i8254.c index 90e9a87..9020845 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -300,6 +300,17 @@ static const MemoryRegionOps pit_ioport_ops = { .old_portio = pit_portio }; +static void pit_post_load(PITCommonState *s) +{ +PITChannelState *sc = s-channels[0]; + +if (sc-next_transition_time != -1) { +qemu_mod_timer(sc-irq_timer, sc-next_transition_time); +} else { +qemu_del_timer(sc-irq_timer); +} +} + static int pit_initfn(PITCommonState *pit) { PITChannelState *s; @@ -323,6 +334,7 @@ static void pit_class_init(ObjectClass *klass, void *data) k-init = pit_initfn; k-set_channel_gate = pit_set_channel_gate; k-get_channel_info = pit_get_channel_info_common; +k-post_load= pit_post_load; } static DeviceInfo pit_info = { diff --git a/hw/i8254_common.c b/hw/i8254_common.c index 6ff91fb..371c3ed 100644 --- a/hw/i8254_common.c +++ b/hw/i8254_common.c @@ -234,9 +234,8 @@ static int pit_load_old(QEMUFile *f, void *opaque, int version_id) qemu_get_8s(f, s-gate); s-count_load_time = qemu_get_be64(f); s-irq_disabled = 0; -if (s-irq_timer) { +if (i == 0) { s-next_transition_time = qemu_get_be64(f); -qemu_get_timer(f, s-irq_timer); } } return 0; @@ -275,7 +274,8 @@ static const VMStateDescription vmstate_pit_common = { VMSTATE_UINT32_V(channels[0].irq_disabled, PITCommonState, 3), VMSTATE_STRUCT_ARRAY(channels, PITCommonState, 3, 2, vmstate_pit_channel, PITChannelState), -VMSTATE_TIMER(channels[0].irq_timer, PITCommonState), +VMSTATE_INT64(channels[0].next_transition_time, + PITCommonState), /* formerly irq_timer */ VMSTATE_END_OF_LIST() } }; -- 1.7.3.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote: On Sat, Jan 28, 2012 at 07:22:09AM -0700, Alex Williamson wrote: Per the PCI spec, all vectors should be masked at handoff. Signed-off-by: Alex Williamson alex.william...@redhat.com --- hw/device-assignment.c | 20 +++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index af614d3..6efa219 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = { }, }; +static void msix_reset(AssignedDevice *dev) +{ +MSIXTableEntry *entry; +int i; + +if (!dev-msix_table) { +return; +} + +memset(dev-msix_table, 0, 0x1000); + +for (i = 0, entry = dev-msix_table; i dev-msix_max; i++, entry++) { +entry-ctrl = 0x1; /* Masked */ This is broken for BE hosts. Show me a BE host that even remotely works with this device assignment implementation. Thanks, Alex +} +} + static int assigned_dev_register_msix_mmio(AssignedDevice *dev) { dev-msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, @@ -1470,7 +1486,9 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev) fprintf(stderr, fail allocate msix_table! %s\n, strerror(errno)); return -EFAULT; } -memset(dev-msix_table, 0, 0x1000); + +msix_reset(dev); + memory_region_init_io(dev-mmio, msix_mmio_ops, dev, assigned-dev-msix, MSIX_PAGE_SIZE); return 0; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
On Sat, Jan 28, 2012 at 07:22:09AM -0700, Alex Williamson wrote: Per the PCI spec, all vectors should be masked at handoff. Signed-off-by: Alex Williamson alex.william...@redhat.com --- hw/device-assignment.c | 20 +++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index af614d3..6efa219 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = { }, }; +static void msix_reset(AssignedDevice *dev) +{ +MSIXTableEntry *entry; +int i; + +if (!dev-msix_table) { +return; +} + +memset(dev-msix_table, 0, 0x1000); + +for (i = 0, entry = dev-msix_table; i dev-msix_max; i++, entry++) { +entry-ctrl = 0x1; /* Masked */ This is broken for BE hosts. +} +} + static int assigned_dev_register_msix_mmio(AssignedDevice *dev) { dev-msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, @@ -1470,7 +1486,9 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev) fprintf(stderr, fail allocate msix_table! %s\n, strerror(errno)); return -EFAULT; } -memset(dev-msix_table, 0, 0x1000); + +msix_reset(dev); + memory_region_init_io(dev-mmio, msix_mmio_ops, dev, assigned-dev-msix, MSIX_PAGE_SIZE); return 0; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/9] pci-assign: Use struct for MSI-X table
On Sat, Jan 28, 2012 at 07:21:58AM -0700, Alex Williamson wrote: This makes access much easier. Signed-off-by: Alex Williamson alex.william...@redhat.com Yes but this also makes it easier to forget to handle endian-ness. How about using pci_get/pci_set instead? --- hw/device-assignment.c | 55 ++-- hw/device-assignment.h |9 +++- 2 files changed, 33 insertions(+), 31 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 1fc7ffa..422ee00 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev); uint16_t entries_nr = 0, entries_max_nr; int pos = 0, i, r = 0; -uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl; struct kvm_assigned_msix_nr msix_nr; struct kvm_assigned_msix_entry msix_entry; -void *va = adev-msix_table_page; +MSIXTableEntry *entry = adev-msix_table; pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); @@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) entries_max_nr += 1; /* Get the usable entry number for allocating */ -for (i = 0; i entries_max_nr; i++) { -memcpy(msg_ctrl, va + i * 16 + 12, 4); -memcpy(msg_data, va + i * 16 + 8, 4); +for (i = 0; i entries_max_nr; i++, entry++) { /* Ignore unused entry even it's unmasked */ -if (msg_data == 0) +if (entry-data == 0) { continue; +} entries_nr ++; } @@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) msix_entry.assigned_dev_id = msix_nr.assigned_dev_id; entries_nr = 0; -for (i = 0; i entries_max_nr; i++) { +entry = adev-msix_table; +for (i = 0; i entries_max_nr; i++, entry++) { if (entries_nr = msix_nr.entry_nr) break; -memcpy(msg_ctrl, va + i * 16 + 12, 4); -memcpy(msg_data, va + i * 16 + 8, 4); -if (msg_data == 0) +if (entry-data == 0) { continue; - -memcpy(msg_addr, va + i * 16, 4); -memcpy(msg_upper_addr, va + i * 16 + 4, 4); +} r = kvm_get_irq_route_gsi(); if (r 0) @@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) adev-entry[entries_nr].gsi = r; adev-entry[entries_nr].type = KVM_IRQ_ROUTING_MSI; adev-entry[entries_nr].flags = 0; -adev-entry[entries_nr].u.msi.address_lo = msg_addr; -adev-entry[entries_nr].u.msi.address_hi = msg_upper_addr; -adev-entry[entries_nr].u.msi.data = msg_data; -DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, msg_data, msg_addr); +adev-entry[entries_nr].u.msi.address_lo = entry-addr_lo; +adev-entry[entries_nr].u.msi.address_hi = entry-addr_hi; +adev-entry[entries_nr].u.msi.data = entry-data; +DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, + entry-data, entry-addr_lo); kvm_add_routing_entry(adev-entry[entries_nr]); msix_entry.gsi = adev-entry[entries_nr].gsi; @@ -1443,7 +1439,7 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr, AssignedDevice *adev = opaque; uint64_t val; -memcpy(val, (void *)((uint8_t *)adev-msix_table_page + addr), size); +memcpy(val, (void *)((uint8_t *)adev-msix_table + addr), size); return val; } @@ -1456,7 +1452,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr, DEBUG(write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n, addr, val); -memcpy((void *)((uint8_t *)adev-msix_table_page + addr), val, size); +memcpy((void *)((uint8_t *)adev-msix_table + addr), val, size); } static const MemoryRegionOps msix_mmio_ops = { @@ -1471,15 +1467,13 @@ static const MemoryRegionOps msix_mmio_ops = { static int assigned_dev_register_msix_mmio(AssignedDevice *dev) { -dev-msix_table_page = mmap(NULL, 0x1000, -PROT_READ|PROT_WRITE, -MAP_ANONYMOUS|MAP_PRIVATE, 0, 0); -if (dev-msix_table_page == MAP_FAILED) { -fprintf(stderr, fail allocate msix_table_page! %s\n, -strerror(errno)); +dev-msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, + MAP_ANONYMOUS|MAP_PRIVATE, 0, 0); +if (dev-msix_table == MAP_FAILED) { +fprintf(stderr, fail allocate msix_table! %s\n, strerror(errno)); return -EFAULT; } -memset(dev-msix_table_page, 0, 0x1000); +memset(dev-msix_table, 0, 0x1000); memory_region_init_io(dev-mmio, msix_mmio_ops, dev, assigned-dev-msix, MSIX_PAGE_SIZE);
Re: [PATCH v2] KVM: Fix assigned device MSI-X entry setting leak
On Mon, Jan 30, 2012 at 02:05:54PM -0700, Alex Williamson wrote: We need to prioritize our matching when setting MSI-X vector entries. Unused entries should only be used if we don't find an exact match or else we risk duplicating entries. This was causing an ENOSPC return when trying to mask and unmask MSI-X vectors based on guest MSI-X table updates. The new KVM_CAP_DEVICE_MSIX_MASK extension indicates the presence of this fix. Signed-off-by: Alex Williamson alex.william...@redhat.com --- v2: Add capability indicating MSIX_MASKing now works. The faulting sequence went something like: Start: [0] entry 0, vector A [1] entry 1, vector B [2] entry 2, vector C Set entry 1 to 0: [0] entry 0, vector A [1] entry 1-1, vector B-0 [2] entry 2, vector C Set entry 2 to 0: [0] entry 0, vector A [1] entry 1-2, vector 0-0 - incorrectly matched [2] entry 2, vector C Set entry 2 to C: [0] entry 0, vector A [1] entry 2-2, vector 0-C - incorrectly matched again [2] entry 2, vector C Set entry 1 to B: [0] entry 0, vector A [1] entry 2, vector C [2] entry 2, vector C -ENOSPC Documentation/virtual/kvm/api.txt |5 +++-- arch/x86/kvm/x86.c|1 + include/linux/kvm.h |1 + virt/kvm/assigned-dev.c | 21 ++--- 4 files changed, 19 insertions(+), 9 deletions(-) I don't object to fixing this bug. But I don't think we need a capability for this. Here's why: setting an entry to zero is not really matching what devices need, since that would lose interrupts. What devices need is mask entries to disable them, then unmask and get an interrupt if it was delayed. So, if we are going to add a new capability, how about sticking a mask bit in the padding instead? As was shown in the past this can even improve performance since we can propagate the mask bit back to the host device. diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index e1d94bf..dd0b554 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1304,8 +1304,9 @@ Type: vm ioctl Parameters: struct kvm_assigned_msix_entry (in) Returns: 0 on success, -1 on error -Specifies the routing of an MSI-X assigned device interrupt to a GSI. Setting -the GSI vector to zero means disabling the interrupt. +Specifies the routing of an MSI-X assigned device interrupt to a GSI. If +KVM_CAP_DEVICE_MSIX_MASK is available, setting the GSI vector to zero means +disabling the interrupt. struct kvm_assigned_msix_entry { __u32 assigned_dev_id; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 14d6cad..a35255e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2057,6 +2057,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_XSAVE: case KVM_CAP_ASYNC_PF: case KVM_CAP_GET_TSC_KHZ: + case KVM_CAP_DEVICE_MSIX_MASK: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 68e67e5..64438e6 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo { #define KVM_CAP_PPC_PAPR 68 #define KVM_CAP_S390_GMAP 71 #define KVM_CAP_TSC_DEADLINE_TIMER 72 +#define KVM_CAP_DEVICE_MSIX_MASK 73 #ifdef KVM_CAP_IRQ_ROUTING diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index 758e3b3..58f3f6b 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -728,7 +728,7 @@ msix_nr_out: static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm, struct kvm_assigned_msix_entry *entry) { - int r = 0, i; + int r = 0, i, unused = -1; struct kvm_assigned_dev_kernel *adev; mutex_lock(kvm-lock); @@ -741,17 +741,24 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm, goto msix_entry_out; } - for (i = 0; i adev-entries_nr; i++) - if (adev-guest_msix_entries[i].vector == 0 || - adev-guest_msix_entries[i].entry == entry-entry) { + for (i = 0; i adev-entries_nr; i++) { + if (adev-guest_msix_entries[i].entry == entry-entry) { adev-guest_msix_entries[i].entry = entry-entry; adev-guest_msix_entries[i].vector = entry-gsi; adev-host_msix_entries[i].entry = entry-entry; break; - } + } else if (unused 0 !adev-guest_msix_entries[i].vector) + unused = i; + } + if (i == adev-entries_nr) { - r = -ENOSPC; - goto msix_entry_out; + if (unused 0) { + r = -ENOSPC; + goto msix_entry_out; + } + adev-guest_msix_entries[unused].entry = entry-entry; + adev-guest_msix_entries[unused].vector =
Re: [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
On Tue, Jan 31, 2012 at 12:07:39PM -0700, Alex Williamson wrote: On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote: On Sat, Jan 28, 2012 at 07:22:09AM -0700, Alex Williamson wrote: Per the PCI spec, all vectors should be masked at handoff. Signed-off-by: Alex Williamson alex.william...@redhat.com --- hw/device-assignment.c | 20 +++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index af614d3..6efa219 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = { }, }; +static void msix_reset(AssignedDevice *dev) +{ +MSIXTableEntry *entry; +int i; + +if (!dev-msix_table) { +return; +} + +memset(dev-msix_table, 0, 0x1000); + +for (i = 0, entry = dev-msix_table; i dev-msix_max; i++, entry++) { +entry-ctrl = 0x1; /* Masked */ This is broken for BE hosts. Show me a BE host that even remotely works with this device assignment implementation. Thanks, Alex I don't get it. Yes lots of cleanup is needed but why add more broken code? +} +} + static int assigned_dev_register_msix_mmio(AssignedDevice *dev) { dev-msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, @@ -1470,7 +1486,9 @@ static int assigned_dev_register_msix_mmio(AssignedDevice *dev) fprintf(stderr, fail allocate msix_table! %s\n, strerror(errno)); return -EFAULT; } -memset(dev-msix_table, 0, 0x1000); + +msix_reset(dev); + memory_region_init_io(dev-mmio, msix_mmio_ops, dev, assigned-dev-msix, MSIX_PAGE_SIZE); return 0; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
On 2012-01-31 20:12, Michael S. Tsirkin wrote: On Tue, Jan 31, 2012 at 12:07:39PM -0700, Alex Williamson wrote: On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote: On Sat, Jan 28, 2012 at 07:22:09AM -0700, Alex Williamson wrote: Per the PCI spec, all vectors should be masked at handoff. Signed-off-by: Alex Williamson alex.william...@redhat.com --- hw/device-assignment.c | 20 +++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index af614d3..6efa219 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = { }, }; +static void msix_reset(AssignedDevice *dev) +{ +MSIXTableEntry *entry; +int i; + +if (!dev-msix_table) { +return; +} + +memset(dev-msix_table, 0, 0x1000); + +for (i = 0, entry = dev-msix_table; i dev-msix_max; i++, entry++) { +entry-ctrl = 0x1; /* Masked */ This is broken for BE hosts. Show me a BE host that even remotely works with this device assignment implementation. Thanks, Alex I don't get it. Yes lots of cleanup is needed but why add more broken code? At some point, we will just do this via the PCI core, and that will have to do it correctly anyway. This code here is supposed to be removed again. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/9] pci-assign: Use struct for MSI-X table
On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote: On Sat, Jan 28, 2012 at 07:21:58AM -0700, Alex Williamson wrote: This makes access much easier. Signed-off-by: Alex Williamson alex.william...@redhat.com Yes but this also makes it easier to forget to handle endian-ness. How about using pci_get/pci_set instead? Do we really think existing device assignment is ever going to work on anything but x86/x86? Alex --- hw/device-assignment.c | 55 ++-- hw/device-assignment.h |9 +++- 2 files changed, 33 insertions(+), 31 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 1fc7ffa..422ee00 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev); uint16_t entries_nr = 0, entries_max_nr; int pos = 0, i, r = 0; -uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl; struct kvm_assigned_msix_nr msix_nr; struct kvm_assigned_msix_entry msix_entry; -void *va = adev-msix_table_page; +MSIXTableEntry *entry = adev-msix_table; pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); @@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) entries_max_nr += 1; /* Get the usable entry number for allocating */ -for (i = 0; i entries_max_nr; i++) { -memcpy(msg_ctrl, va + i * 16 + 12, 4); -memcpy(msg_data, va + i * 16 + 8, 4); +for (i = 0; i entries_max_nr; i++, entry++) { /* Ignore unused entry even it's unmasked */ -if (msg_data == 0) +if (entry-data == 0) { continue; +} entries_nr ++; } @@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) msix_entry.assigned_dev_id = msix_nr.assigned_dev_id; entries_nr = 0; -for (i = 0; i entries_max_nr; i++) { +entry = adev-msix_table; +for (i = 0; i entries_max_nr; i++, entry++) { if (entries_nr = msix_nr.entry_nr) break; -memcpy(msg_ctrl, va + i * 16 + 12, 4); -memcpy(msg_data, va + i * 16 + 8, 4); -if (msg_data == 0) +if (entry-data == 0) { continue; - -memcpy(msg_addr, va + i * 16, 4); -memcpy(msg_upper_addr, va + i * 16 + 4, 4); +} r = kvm_get_irq_route_gsi(); if (r 0) @@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) adev-entry[entries_nr].gsi = r; adev-entry[entries_nr].type = KVM_IRQ_ROUTING_MSI; adev-entry[entries_nr].flags = 0; -adev-entry[entries_nr].u.msi.address_lo = msg_addr; -adev-entry[entries_nr].u.msi.address_hi = msg_upper_addr; -adev-entry[entries_nr].u.msi.data = msg_data; -DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, msg_data, msg_addr); +adev-entry[entries_nr].u.msi.address_lo = entry-addr_lo; +adev-entry[entries_nr].u.msi.address_hi = entry-addr_hi; +adev-entry[entries_nr].u.msi.data = entry-data; +DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, + entry-data, entry-addr_lo); kvm_add_routing_entry(adev-entry[entries_nr]); msix_entry.gsi = adev-entry[entries_nr].gsi; @@ -1443,7 +1439,7 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr, AssignedDevice *adev = opaque; uint64_t val; -memcpy(val, (void *)((uint8_t *)adev-msix_table_page + addr), size); +memcpy(val, (void *)((uint8_t *)adev-msix_table + addr), size); return val; } @@ -1456,7 +1452,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr, DEBUG(write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n, addr, val); -memcpy((void *)((uint8_t *)adev-msix_table_page + addr), val, size); +memcpy((void *)((uint8_t *)adev-msix_table + addr), val, size); } static const MemoryRegionOps msix_mmio_ops = { @@ -1471,15 +1467,13 @@ static const MemoryRegionOps msix_mmio_ops = { static int assigned_dev_register_msix_mmio(AssignedDevice *dev) { -dev-msix_table_page = mmap(NULL, 0x1000, -PROT_READ|PROT_WRITE, -MAP_ANONYMOUS|MAP_PRIVATE, 0, 0); -if (dev-msix_table_page == MAP_FAILED) { -fprintf(stderr, fail allocate msix_table_page! %s\n, -strerror(errno)); +dev-msix_table = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, + MAP_ANONYMOUS|MAP_PRIVATE, 0, 0); +if (dev-msix_table == MAP_FAILED) { +fprintf(stderr, fail allocate
Re: [PATCH 4/9] pci-assign: Use struct for MSI-X table
On Tue, Jan 31, 2012 at 12:05:49PM -0700, Alex Williamson wrote: On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote: On Sat, Jan 28, 2012 at 07:21:58AM -0700, Alex Williamson wrote: This makes access much easier. Signed-off-by: Alex Williamson alex.william...@redhat.com Yes but this also makes it easier to forget to handle endian-ness. How about using pci_get/pci_set instead? Do we really think existing device assignment is ever going to work on anything but x86/x86? Alex It's a question of whether anyone bothers to fix all portability bugs. Besides endian-ness, and msix vector decoding, there's not much that is x86 specific there. --- hw/device-assignment.c | 55 ++-- hw/device-assignment.h |9 +++- 2 files changed, 33 insertions(+), 31 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 1fc7ffa..422ee00 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev); uint16_t entries_nr = 0, entries_max_nr; int pos = 0, i, r = 0; -uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl; struct kvm_assigned_msix_nr msix_nr; struct kvm_assigned_msix_entry msix_entry; -void *va = adev-msix_table_page; +MSIXTableEntry *entry = adev-msix_table; pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); @@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) entries_max_nr += 1; /* Get the usable entry number for allocating */ -for (i = 0; i entries_max_nr; i++) { -memcpy(msg_ctrl, va + i * 16 + 12, 4); -memcpy(msg_data, va + i * 16 + 8, 4); +for (i = 0; i entries_max_nr; i++, entry++) { /* Ignore unused entry even it's unmasked */ -if (msg_data == 0) +if (entry-data == 0) { continue; +} entries_nr ++; } @@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) msix_entry.assigned_dev_id = msix_nr.assigned_dev_id; entries_nr = 0; -for (i = 0; i entries_max_nr; i++) { +entry = adev-msix_table; +for (i = 0; i entries_max_nr; i++, entry++) { if (entries_nr = msix_nr.entry_nr) break; -memcpy(msg_ctrl, va + i * 16 + 12, 4); -memcpy(msg_data, va + i * 16 + 8, 4); -if (msg_data == 0) +if (entry-data == 0) { continue; - -memcpy(msg_addr, va + i * 16, 4); -memcpy(msg_upper_addr, va + i * 16 + 4, 4); +} r = kvm_get_irq_route_gsi(); if (r 0) @@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) adev-entry[entries_nr].gsi = r; adev-entry[entries_nr].type = KVM_IRQ_ROUTING_MSI; adev-entry[entries_nr].flags = 0; -adev-entry[entries_nr].u.msi.address_lo = msg_addr; -adev-entry[entries_nr].u.msi.address_hi = msg_upper_addr; -adev-entry[entries_nr].u.msi.data = msg_data; -DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, msg_data, msg_addr); +adev-entry[entries_nr].u.msi.address_lo = entry-addr_lo; +adev-entry[entries_nr].u.msi.address_hi = entry-addr_hi; +adev-entry[entries_nr].u.msi.data = entry-data; +DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, + entry-data, entry-addr_lo); kvm_add_routing_entry(adev-entry[entries_nr]); msix_entry.gsi = adev-entry[entries_nr].gsi; @@ -1443,7 +1439,7 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr, AssignedDevice *adev = opaque; uint64_t val; -memcpy(val, (void *)((uint8_t *)adev-msix_table_page + addr), size); +memcpy(val, (void *)((uint8_t *)adev-msix_table + addr), size); return val; } @@ -1456,7 +1452,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr, DEBUG(write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n, addr, val); -memcpy((void *)((uint8_t *)adev-msix_table_page + addr), val, size); +memcpy((void *)((uint8_t *)adev-msix_table + addr), val, size); } static const MemoryRegionOps msix_mmio_ops = { @@ -1471,15 +1467,13 @@ static const MemoryRegionOps msix_mmio_ops = { static int assigned_dev_register_msix_mmio(AssignedDevice *dev) { -dev-msix_table_page = mmap(NULL, 0x1000, -PROT_READ|PROT_WRITE, -MAP_ANONYMOUS|MAP_PRIVATE, 0, 0); -
Re: Increase KVMSlots size
On Tue, Jan 31, 2012 at 11:47:32AM +, Bhushan Bharat-R65777 wrote: Hi All, In my development I came across the state where I need more than 32 kvmslots QEMU/kvm-all.c: cut struct KVMState { KVMSlot slots[32]; cut Followings are the solution I thought of: Solution 1: Increase the slot to 64 in both KVM and QEMU. I think this will also require to add a new ioctl call so that qemu can come to know on which KVM it is running (whether it is running on 32 slots KVM or 64 slots KVM). It already exists, its called KVM_CAP_NR_MEMSLOTS (implemented by x86). Other solutions: a) Decouple the qemu and kvm data structure. QEMU and KVM use some variable size data structure or lists etc. QEMU can keep on adding as many as it wants without knowing how much slots KVM supports. b) There are linear search in the KVM slots array in critical path of code of KVM. Does this make some sense using RB-tree, heapsort, balanced binary tree type of solutions? See discussion at http://lwn.net/Articles/429308/. x86 optimizes the case of a miss (GFN not mapped by any memslot) by caching the result into a specially invalid pagetable entry (so an invalid fault optimizes away the need to walk all memslots). It would be good to know how much of a problem an increased memslot array is for PPC. Looking forward for suggestions. Thanks -Bharat -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/9] pci-assign: Only calculate maximum MSI-X vector entries once
On Sat, Jan 28, 2012 at 07:22:04AM -0700, Alex Williamson wrote: Signed-off-by: Alex Williamson alex.william...@redhat.com --- Why? Optimization? hw/device-assignment.c | 17 +++-- hw/device-assignment.h |1 + 2 files changed, 8 insertions(+), 10 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 422ee00..af614d3 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -971,20 +971,14 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev) static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) { AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev); -uint16_t entries_nr = 0, entries_max_nr; -int pos = 0, i, r = 0; +uint16_t entries_nr = 0; +int i, r = 0; struct kvm_assigned_msix_nr msix_nr; struct kvm_assigned_msix_entry msix_entry; MSIXTableEntry *entry = adev-msix_table; -pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); - -entries_max_nr = *(uint16_t *)(pci_dev-config + pos + 2); -entries_max_nr = PCI_MSIX_FLAGS_QSIZE; -entries_max_nr += 1; - /* Get the usable entry number for allocating */ -for (i = 0; i entries_max_nr; i++, entry++) { +for (i = 0; i adev-msix_max; i++, entry++) { /* Ignore unused entry even it's unmasked */ if (entry-data == 0) { continue; @@ -1012,7 +1006,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) msix_entry.assigned_dev_id = msix_nr.assigned_dev_id; entries_nr = 0; entry = adev-msix_table; -for (i = 0; i entries_max_nr; i++, entry++) { +for (i = 0; i adev-msix_max; i++, entry++) { if (entries_nr = msix_nr.entry_nr) break; if (entry-data == 0) { @@ -1225,6 +1219,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) bar_nr = msix_table_entry PCI_MSIX_FLAGS_BIRMASK; msix_table_entry = ~PCI_MSIX_FLAGS_BIRMASK; dev-msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry; +dev-msix_max = pci_get_word(pci_dev-config + pos + PCI_MSIX_FLAGS); +dev-msix_max = PCI_MSIX_FLAGS_QSIZE; +dev-msix_max += 1; } /* Minimal PM support, nothing writable, device appears to NAK changes */ diff --git a/hw/device-assignment.h b/hw/device-assignment.h index e229303..a909fb2 100644 --- a/hw/device-assignment.h +++ b/hw/device-assignment.h @@ -119,6 +119,7 @@ typedef struct AssignedDevice { struct kvm_irq_routing_entry *entry; MSIXTableEntry *msix_table; target_phys_addr_t msix_table_addr; +uint16_t msix_max; MemoryRegion mmio; char *configfd_name; int32_t bootindex; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
On Tue, Jan 31, 2012 at 08:16:31PM +0100, Jan Kiszka wrote: On 2012-01-31 20:12, Michael S. Tsirkin wrote: On Tue, Jan 31, 2012 at 12:07:39PM -0700, Alex Williamson wrote: On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote: On Sat, Jan 28, 2012 at 07:22:09AM -0700, Alex Williamson wrote: Per the PCI spec, all vectors should be masked at handoff. Signed-off-by: Alex Williamson alex.william...@redhat.com --- hw/device-assignment.c | 20 +++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index af614d3..6efa219 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = { }, }; +static void msix_reset(AssignedDevice *dev) +{ +MSIXTableEntry *entry; +int i; + +if (!dev-msix_table) { +return; +} + +memset(dev-msix_table, 0, 0x1000); + +for (i = 0, entry = dev-msix_table; i dev-msix_max; i++, entry++) { +entry-ctrl = 0x1; /* Masked */ This is broken for BE hosts. Show me a BE host that even remotely works with this device assignment implementation. Thanks, Alex I don't get it. Yes lots of cleanup is needed but why add more broken code? At some point, we will just do this via the PCI core, and that will have to do it correctly anyway. This code here is supposed to be removed again. Jan Either we say this is all dead code, then let's just fix bugs until it is rewritten. To fix the bug, all we need is patch 9, maybe 6 and 7. Or there is intent to maintain it going forward then, methinks, it needs to be brought up to some minimal standards. -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/9] pci-assign: Only calculate maximum MSI-X vector entries once
On Tue, 2012-01-31 at 22:18 +0200, Michael S. Tsirkin wrote: On Sat, Jan 28, 2012 at 07:22:04AM -0700, Alex Williamson wrote: Signed-off-by: Alex Williamson alex.william...@redhat.com --- Why? Optimization? Because in 6/9 we'd have to calculate it again for resetting the msix table and in 9/9 we use it to drop writes outside of the valid vector range. I suspect we might actually want to call msix_reset on device reset, so that means we'd be recalculating it every time we reset the device, write to the msix table, and again if we're actually updating msix entries. Redundancy exceeded my cost of two bytes metric. Thanks, Alex hw/device-assignment.c | 17 +++-- hw/device-assignment.h |1 + 2 files changed, 8 insertions(+), 10 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 422ee00..af614d3 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -971,20 +971,14 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev) static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) { AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev); -uint16_t entries_nr = 0, entries_max_nr; -int pos = 0, i, r = 0; +uint16_t entries_nr = 0; +int i, r = 0; struct kvm_assigned_msix_nr msix_nr; struct kvm_assigned_msix_entry msix_entry; MSIXTableEntry *entry = adev-msix_table; -pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); - -entries_max_nr = *(uint16_t *)(pci_dev-config + pos + 2); -entries_max_nr = PCI_MSIX_FLAGS_QSIZE; -entries_max_nr += 1; - /* Get the usable entry number for allocating */ -for (i = 0; i entries_max_nr; i++, entry++) { +for (i = 0; i adev-msix_max; i++, entry++) { /* Ignore unused entry even it's unmasked */ if (entry-data == 0) { continue; @@ -1012,7 +1006,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) msix_entry.assigned_dev_id = msix_nr.assigned_dev_id; entries_nr = 0; entry = adev-msix_table; -for (i = 0; i entries_max_nr; i++, entry++) { +for (i = 0; i adev-msix_max; i++, entry++) { if (entries_nr = msix_nr.entry_nr) break; if (entry-data == 0) { @@ -1225,6 +1219,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) bar_nr = msix_table_entry PCI_MSIX_FLAGS_BIRMASK; msix_table_entry = ~PCI_MSIX_FLAGS_BIRMASK; dev-msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry; +dev-msix_max = pci_get_word(pci_dev-config + pos + PCI_MSIX_FLAGS); +dev-msix_max = PCI_MSIX_FLAGS_QSIZE; +dev-msix_max += 1; } /* Minimal PM support, nothing writable, device appears to NAK changes */ diff --git a/hw/device-assignment.h b/hw/device-assignment.h index e229303..a909fb2 100644 --- a/hw/device-assignment.h +++ b/hw/device-assignment.h @@ -119,6 +119,7 @@ typedef struct AssignedDevice { struct kvm_irq_routing_entry *entry; MSIXTableEntry *msix_table; target_phys_addr_t msix_table_addr; +uint16_t msix_max; MemoryRegion mmio; char *configfd_name; int32_t bootindex; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v3 6/7] pcspk: Convert to qdev
On 01/31/2012 11:41 AM, Jan Kiszka wrote: Convert the PC speaker device to a qdev ISA model. Move the public interface to a dedicated header file at this chance. Signed-off-by: Jan Kiszkajan.kis...@siemens.com Heh, I did this too more or less the same way. Some comments below: --- arch_init.c|1 + hw/i82378.c|3 +- hw/mips_jazz.c |3 +- hw/pc.c|3 +- hw/pc.h|4 --- hw/pcspk.c | 62 ++- hw/pcspk.h | 45 7 files changed, 104 insertions(+), 17 deletions(-) create mode 100644 hw/pcspk.h diff --git a/arch_init.c b/arch_init.c index 2366511..a45485b 100644 --- a/arch_init.c +++ b/arch_init.c @@ -42,6 +42,7 @@ #include gdbstub.h #include hw/smbios.h #include exec-memory.h +#include hw/pcspk.h #ifdef TARGET_SPARC int graphic_width = 1024; diff --git a/hw/i82378.c b/hw/i82378.c index 127cadf..79fa8b7 100644 --- a/hw/i82378.c +++ b/hw/i82378.c @@ -20,6 +20,7 @@ #include pci.h #include pc.h #include i8254.h +#include pcspk.h //#define DEBUG_I82378 @@ -195,7 +196,7 @@ static void i82378_init(DeviceState *dev, I82378State *s) pit = pit_init(isabus, 0x40, 0, NULL); /* speaker */ -pcspk_init(pit); +pcspk_init(isabus, pit); /* 2 82C37 (dma) */ DMA_init(1,s-out[1]); diff --git a/hw/mips_jazz.c b/hw/mips_jazz.c index b61b218..65608dc 100644 --- a/hw/mips_jazz.c +++ b/hw/mips_jazz.c @@ -37,6 +37,7 @@ #include loader.h #include mc146818rtc.h #include i8254.h +#include pcspk.h #include blockdev.h #include sysbus.h #include exec-memory.h @@ -193,7 +194,7 @@ static void mips_jazz_init(MemoryRegion *address_space, cpu_exit_irq = qemu_allocate_irqs(cpu_request_exit, NULL, 1); DMA_init(0, cpu_exit_irq); pit = pit_init(isa_bus, 0x40, 0, NULL); -pcspk_init(pit); +pcspk_init(isa_bus, pit); /* ISA IO space at 0x9000 */ isa_mmio_init(0x9000, 0x0100); diff --git a/hw/pc.c b/hw/pc.c index 6fb1de7..f6a91a9 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -37,6 +37,7 @@ #include multiboot.h #include mc146818rtc.h #include i8254.h +#include pcspk.h #include msi.h #include sysbus.h #include sysemu.h @@ -1169,7 +1170,7 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi, /* connect PIT to output control line of the HPET */ qdev_connect_gpio_out(hpet, 0, qdev_get_gpio_in(pit-qdev, 0)); } -pcspk_init(pit); +pcspk_init(isa_bus, pit); for(i = 0; i MAX_SERIAL_PORTS; i++) { if (serial_hds[i]) { diff --git a/hw/pc.h b/hw/pc.h index b08708d..1b47bbd 100644 --- a/hw/pc.h +++ b/hw/pc.h @@ -149,10 +149,6 @@ void piix4_smbus_register_device(SMBusDevice *dev, uint8_t addr); /* hpet.c */ extern int no_hpet; -/* pcspk.c */ -void pcspk_init(ISADevice *pit); -int pcspk_audio_init(ISABus *bus); - /* piix_pci.c */ struct PCII440FXState; typedef struct PCII440FXState PCII440FXState; diff --git a/hw/pcspk.c b/hw/pcspk.c index 43df818..5bd9e32 100644 --- a/hw/pcspk.c +++ b/hw/pcspk.c @@ -28,6 +28,7 @@ #include audio/audio.h #include qemu-timer.h #include i8254.h +#include pcspk.h #define PCSPK_BUF_LEN 1792 #define PCSPK_SAMPLE_RATE 32000 @@ -35,10 +36,13 @@ #define PCSPK_MIN_COUNT ((PIT_FREQ + PCSPK_MAX_FREQ - 1) / PCSPK_MAX_FREQ) typedef struct { +ISADevice dev; +MemoryRegion ioport; +uint32_t iobase; uint8_t sample_buf[PCSPK_BUF_LEN]; QEMUSoundCard card; SWVoiceOut *voice; -ISADevice *pit; +void *pit; unsigned int pit_count; unsigned int samples; unsigned int play_pos; @@ -47,7 +51,7 @@ typedef struct { } PCSpkState; static const char *s_spk = pcspk; -static PCSpkState pcspk_state; +static PCSpkState *pcspk_state; static inline void generate_samples(PCSpkState *s) { @@ -99,7 +103,7 @@ static void pcspk_callback(void *opaque, int free) int pcspk_audio_init(ISABus *bus) { -PCSpkState *s =pcspk_state; +PCSpkState *s = pcspk_state; struct audsettings as = {PCSPK_SAMPLE_RATE, 1, AUD_FMT_U8, 0}; This can be a follow up, but what this is really doing is enabling audio for this device. ISABus is unused as a parameter. I think it would be better to make this a qdev bool property (audio_enabled) and then modify the code that calls this function to either set the property as a global, or use the QOM path to specifically set it on this device. Either way, I think setting an audio_enabled flag via qdev makes a whole lot more sense conceptionally than using the -soundhw option. AUD_register_card(s_spk,s-card); @@ -113,7 +117,8 @@ int pcspk_audio_init(ISABus *bus) return 0; } -static uint32_t pcspk_ioport_read(void *opaque, uint32_t addr) +static uint64_t pcspk_io_read(void *opaque, target_phys_addr_t addr, + unsigned size) { PCSpkState *s = opaque;
Re: [PATCH 5/9] pci-assign: Only calculate maximum MSI-X vector entries once
On Tue, Jan 31, 2012 at 01:31:59PM -0700, Alex Williamson wrote: On Tue, 2012-01-31 at 22:18 +0200, Michael S. Tsirkin wrote: On Sat, Jan 28, 2012 at 07:22:04AM -0700, Alex Williamson wrote: Signed-off-by: Alex Williamson alex.william...@redhat.com --- Why? Optimization? Because in 6/9 we'd have to calculate it again for resetting the msix table and in 9/9 we use it to drop writes outside of the valid vector range. I suspect we might actually want to call msix_reset on device reset, so that means we'd be recalculating it every time we reset the device, write to the msix table, and again if we're actually updating msix entries. Redundancy exceeded my cost of two bytes metric. Thanks, Alex No problem but motivation should probably go into commit log. hw/device-assignment.c | 17 +++-- hw/device-assignment.h |1 + 2 files changed, 8 insertions(+), 10 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 422ee00..af614d3 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -971,20 +971,14 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev) static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) { AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev); -uint16_t entries_nr = 0, entries_max_nr; -int pos = 0, i, r = 0; +uint16_t entries_nr = 0; +int i, r = 0; struct kvm_assigned_msix_nr msix_nr; struct kvm_assigned_msix_entry msix_entry; MSIXTableEntry *entry = adev-msix_table; -pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); - -entries_max_nr = *(uint16_t *)(pci_dev-config + pos + 2); -entries_max_nr = PCI_MSIX_FLAGS_QSIZE; -entries_max_nr += 1; - /* Get the usable entry number for allocating */ -for (i = 0; i entries_max_nr; i++, entry++) { +for (i = 0; i adev-msix_max; i++, entry++) { /* Ignore unused entry even it's unmasked */ if (entry-data == 0) { continue; @@ -1012,7 +1006,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) msix_entry.assigned_dev_id = msix_nr.assigned_dev_id; entries_nr = 0; entry = adev-msix_table; -for (i = 0; i entries_max_nr; i++, entry++) { +for (i = 0; i adev-msix_max; i++, entry++) { if (entries_nr = msix_nr.entry_nr) break; if (entry-data == 0) { @@ -1225,6 +1219,9 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) bar_nr = msix_table_entry PCI_MSIX_FLAGS_BIRMASK; msix_table_entry = ~PCI_MSIX_FLAGS_BIRMASK; dev-msix_table_addr = pci_region[bar_nr].base_addr + msix_table_entry; +dev-msix_max = pci_get_word(pci_dev-config + pos + PCI_MSIX_FLAGS); +dev-msix_max = PCI_MSIX_FLAGS_QSIZE; +dev-msix_max += 1; } /* Minimal PM support, nothing writable, device appears to NAK changes */ diff --git a/hw/device-assignment.h b/hw/device-assignment.h index e229303..a909fb2 100644 --- a/hw/device-assignment.h +++ b/hw/device-assignment.h @@ -119,6 +119,7 @@ typedef struct AssignedDevice { struct kvm_irq_routing_entry *entry; MSIXTableEntry *msix_table; target_phys_addr_t msix_table_addr; +uint16_t msix_max; MemoryRegion mmio; char *configfd_name; int32_t bootindex; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/9] pci-assign: Proper initialization for MSI-X table
On Tue, 2012-01-31 at 22:19 +0200, Michael S. Tsirkin wrote: On Tue, Jan 31, 2012 at 08:16:31PM +0100, Jan Kiszka wrote: On 2012-01-31 20:12, Michael S. Tsirkin wrote: On Tue, Jan 31, 2012 at 12:07:39PM -0700, Alex Williamson wrote: On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote: On Sat, Jan 28, 2012 at 07:22:09AM -0700, Alex Williamson wrote: Per the PCI spec, all vectors should be masked at handoff. Signed-off-by: Alex Williamson alex.william...@redhat.com --- hw/device-assignment.c | 20 +++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index af614d3..6efa219 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -1462,6 +1462,22 @@ static const MemoryRegionOps msix_mmio_ops = { }, }; +static void msix_reset(AssignedDevice *dev) +{ +MSIXTableEntry *entry; +int i; + +if (!dev-msix_table) { +return; +} + +memset(dev-msix_table, 0, 0x1000); + +for (i = 0, entry = dev-msix_table; i dev-msix_max; i++, entry++) { +entry-ctrl = 0x1; /* Masked */ This is broken for BE hosts. Show me a BE host that even remotely works with this device assignment implementation. Thanks, Alex I don't get it. Yes lots of cleanup is needed but why add more broken code? At some point, we will just do this via the PCI core, and that will have to do it correctly anyway. This code here is supposed to be removed again. Jan Either we say this is all dead code, then let's just fix bugs until it is rewritten. To fix the bug, all we need is patch 9, maybe 6 and 7. Or there is intent to maintain it going forward then, methinks, it needs to be brought up to some minimal standards. There's minimal standards and then there's endian neutrality and non-x86 support for a piece of code that, I think, is probably never going to see qemu.git. There's a big gap between those. I'd love for us to move to vfio, but for now we're stuck with this and I'll try to update it and make it a little better each time. Looking out for big endian hosts isn't high on my priority list for this code though. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
On Tue, 2012-01-31 at 15:33 +0200, Avi Kivity wrote: On 01/31/2012 03:21 PM, Jan Kiszka wrote: On 2012-01-31 14:10, Avi Kivity wrote: On 01/31/2012 02:57 PM, Jan Kiszka wrote: Seems fine, but do we really need the option? If it doesn't work we should treat it as an ordinary but and fix it. So far it's against the architecture of the emulated system: our current chipset predates 64 bit PCI. Then it should be enabled/disabled at the chipset level. Makes me wonder if we already do some filtering if the device supports 64 bit but the next bridge does not. Our 440fx does support 64-bit bars, so the question doesn't arise for x86. Instead we violate the spec. If you mean by our the 440fx-qemu, not the real 440fx. That one does not even support 1GB RAM. Yes, that's what I meant. It also supports pci hotplug, more slots, cpu hotplug, etc. I'll drop this patch for now, it was just something I enabled based on a query from MST and didn't want to lose it. Maybe we need the option in PCI core, but I'm just turn it on and hope for the best w/o giving users a way to disable it. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/9] pci-assign: Update MSI-X MMIO to Memory API
On Tue, 2012-01-31 at 14:45 +0200, Avi Kivity wrote: On 01/28/2012 04:21 PM, Alex Williamson wrote: Stop using compatibility mode and at the same time fix available access sizes. The PCI spec indicates that the MSI-X table may only be accessed as DWORD or QWORD. static const MemoryRegionOps msix_mmio_ops = { -.old_mmio = { -.read = { msix_mmio_readb, msix_mmio_readw, msix_mmio_readl, }, -.write = { msix_mmio_writeb, msix_mmio_writew, msix_mmio_writel, }, -}, +.read = msix_mmio_read, +.write = msix_mmio_write, .endianness = DEVICE_NATIVE_ENDIAN, +.impl = { +.min_access_size = 4, +.max_access_size = 8, +}, }; .impl.min_access_size = 4 means the core will convert 1-byte I/O to 4-byte I/O (using rmw if needed). That's not what we want, I think you can leave it at 1 and explicitly ignore small accesses in the callbacks. Have you tested 8-byte I/O? This is the first user. Don't you need to set .valid.max_access_size? I have not explicitly tested 8-byte I/O, figured it might just work. Hmm, I wonder if we really need to be strict enough to reject byte and word access. It doesn't follow the spec, but I don't know that it buys us anything to be strict about it. Anyway, I'll look at .valid and respin. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/9] pci-assign: Optionally enable 64bit BARs in guest
On Tue, Jan 31, 2012 at 02:08:38PM -0700, Alex Williamson wrote: On Tue, 2012-01-31 at 15:33 +0200, Avi Kivity wrote: On 01/31/2012 03:21 PM, Jan Kiszka wrote: On 2012-01-31 14:10, Avi Kivity wrote: On 01/31/2012 02:57 PM, Jan Kiszka wrote: Seems fine, but do we really need the option? If it doesn't work we should treat it as an ordinary but and fix it. So far it's against the architecture of the emulated system: our current chipset predates 64 bit PCI. Then it should be enabled/disabled at the chipset level. Makes me wonder if we already do some filtering if the device supports 64 bit but the next bridge does not. Our 440fx does support 64-bit bars, so the question doesn't arise for x86. Instead we violate the spec. If you mean by our the 440fx-qemu, not the real 440fx. That one does not even support 1GB RAM. Yes, that's what I meant. It also supports pci hotplug, more slots, cpu hotplug, etc. I'll drop this patch for now, it was just something I enabled based on a query from MST and didn't want to lose it. Maybe we need the option in PCI core, but I'm just turn it on and hope for the best w/o giving users a way to disable it. Thanks, Alex The patch itself makes sense to me. Avi found some bugs in 64 bit BAR handling, let's wait till that is fixed then merge this patch. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/7] hpet: Save/restore cached RTC IRQ level
On 01/31/2012 11:41 AM, Jan Kiszka wrote: In legacy mode, the HPET suppresses the RTC interrupt delivery via IRQ 8 but keeps track of the RTC output level and applies it when legacy mode is turned off again. This value has to be preserved across save/ restore as it cannot be reconstructed otherwise. Signed-off-by: Jan Kiszkajan.kis...@siemens.com --- hw/hpet.c | 26 ++ 1 files changed, 26 insertions(+), 0 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index aba9ea9..39b291f 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -240,6 +240,24 @@ static int hpet_post_load(void *opaque, int version_id) return 0; } +static bool hpet_rtc_irq_level_needed(void *opaque) +{ +HPETState *s = opaque; + +return s-rtc_irq_level != 0; +} + +static const VMStateDescription vmstate_hpet_rtc_irq_level = { +.name = hpet/rtc_irq_level, +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField[]) { +VMSTATE_UINT8(rtc_irq_level, HPETState), +VMSTATE_END_OF_LIST() +} +}; + This won't work. We don't clear rtc_irq_level on reset so rtc_irq_level may be high or low after reset. As such, we can't use a subsection here. We need to bump the savevm state. Regards, Anthony Liguori static const VMStateDescription vmstate_hpet_timer = { .name = hpet_timer, .version_id = 1, @@ -273,6 +291,14 @@ static const VMStateDescription vmstate_hpet = { VMSTATE_STRUCT_VARRAY_UINT8(timer, HPETState, num_timers, 0, vmstate_hpet_timer, HPETTimer), VMSTATE_END_OF_LIST() +}, +.subsections = (VMStateSubsection[]) { +{ +.vmsd =vmstate_hpet_rtc_irq_level, +.needed = hpet_rtc_irq_level_needed, +}, { +/* empty */ +} } }; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/9] pci-assign: Use struct for MSI-X table
On Tue, 2012-01-31 at 22:00 +0200, Michael S. Tsirkin wrote: On Tue, Jan 31, 2012 at 12:05:49PM -0700, Alex Williamson wrote: On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote: On Sat, Jan 28, 2012 at 07:21:58AM -0700, Alex Williamson wrote: This makes access much easier. Signed-off-by: Alex Williamson alex.william...@redhat.com Yes but this also makes it easier to forget to handle endian-ness. How about using pci_get/pci_set instead? Do we really think existing device assignment is ever going to work on anything but x86/x86? Alex It's a question of whether anyone bothers to fix all portability bugs. Besides endian-ness, and msix vector decoding, there's not much that is x86 specific there. The way interrupt routing is updated from the chipset is disgustingly x86 specific. The entire kernel side implementation is only enabled for x86 (well, not counting ia64). I expect the kernel side interrupt injection is very x86, or at least ioapic, specific. Beyond that, yeah, it may be trivial ;) Thanks, Alex --- hw/device-assignment.c | 55 ++-- hw/device-assignment.h |9 +++- 2 files changed, 33 insertions(+), 31 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 1fc7ffa..422ee00 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev); uint16_t entries_nr = 0, entries_max_nr; int pos = 0, i, r = 0; -uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl; struct kvm_assigned_msix_nr msix_nr; struct kvm_assigned_msix_entry msix_entry; -void *va = adev-msix_table_page; +MSIXTableEntry *entry = adev-msix_table; pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); @@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) entries_max_nr += 1; /* Get the usable entry number for allocating */ -for (i = 0; i entries_max_nr; i++) { -memcpy(msg_ctrl, va + i * 16 + 12, 4); -memcpy(msg_data, va + i * 16 + 8, 4); +for (i = 0; i entries_max_nr; i++, entry++) { /* Ignore unused entry even it's unmasked */ -if (msg_data == 0) +if (entry-data == 0) { continue; +} entries_nr ++; } @@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) msix_entry.assigned_dev_id = msix_nr.assigned_dev_id; entries_nr = 0; -for (i = 0; i entries_max_nr; i++) { +entry = adev-msix_table; +for (i = 0; i entries_max_nr; i++, entry++) { if (entries_nr = msix_nr.entry_nr) break; -memcpy(msg_ctrl, va + i * 16 + 12, 4); -memcpy(msg_data, va + i * 16 + 8, 4); -if (msg_data == 0) +if (entry-data == 0) { continue; - -memcpy(msg_addr, va + i * 16, 4); -memcpy(msg_upper_addr, va + i * 16 + 4, 4); +} r = kvm_get_irq_route_gsi(); if (r 0) @@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) adev-entry[entries_nr].gsi = r; adev-entry[entries_nr].type = KVM_IRQ_ROUTING_MSI; adev-entry[entries_nr].flags = 0; -adev-entry[entries_nr].u.msi.address_lo = msg_addr; -adev-entry[entries_nr].u.msi.address_hi = msg_upper_addr; -adev-entry[entries_nr].u.msi.data = msg_data; -DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, msg_data, msg_addr); +adev-entry[entries_nr].u.msi.address_lo = entry-addr_lo; +adev-entry[entries_nr].u.msi.address_hi = entry-addr_hi; +adev-entry[entries_nr].u.msi.data = entry-data; +DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, + entry-data, entry-addr_lo); kvm_add_routing_entry(adev-entry[entries_nr]); msix_entry.gsi = adev-entry[entries_nr].gsi; @@ -1443,7 +1439,7 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr, AssignedDevice *adev = opaque; uint64_t val; -memcpy(val, (void *)((uint8_t *)adev-msix_table_page + addr), size); +memcpy(val, (void *)((uint8_t *)adev-msix_table + addr), size); return val; } @@ -1456,7 +1452,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr, DEBUG(write to MSI-X entry table mmio offset 0x%lx, val 0x%lx\n, addr, val); -memcpy((void *)((uint8_t
Re: [PATCH 4/9] pci-assign: Use struct for MSI-X table
On Tue, Jan 31, 2012 at 02:17:30PM -0700, Alex Williamson wrote: On Tue, 2012-01-31 at 22:00 +0200, Michael S. Tsirkin wrote: On Tue, Jan 31, 2012 at 12:05:49PM -0700, Alex Williamson wrote: On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote: On Sat, Jan 28, 2012 at 07:21:58AM -0700, Alex Williamson wrote: This makes access much easier. Signed-off-by: Alex Williamson alex.william...@redhat.com Yes but this also makes it easier to forget to handle endian-ness. How about using pci_get/pci_set instead? Do we really think existing device assignment is ever going to work on anything but x86/x86? Alex It's a question of whether anyone bothers to fix all portability bugs. Besides endian-ness, and msix vector decoding, there's not much that is x86 specific there. The way interrupt routing is updated from the chipset is disgustingly x86 specific. Right, that's what I mean - the decoding of msix vector to destination. The entire kernel side implementation is only enabled for x86 (well, not counting ia64). I expect the kernel side interrupt injection is very x86, or at least ioapic, specific. Beyond that, yeah, it may be trivial ;) Thanks, Alex --- hw/device-assignment.c | 55 ++-- hw/device-assignment.h |9 +++- 2 files changed, 33 insertions(+), 31 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 1fc7ffa..422ee00 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev); uint16_t entries_nr = 0, entries_max_nr; int pos = 0, i, r = 0; -uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl; struct kvm_assigned_msix_nr msix_nr; struct kvm_assigned_msix_entry msix_entry; -void *va = adev-msix_table_page; +MSIXTableEntry *entry = adev-msix_table; pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); @@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) entries_max_nr += 1; /* Get the usable entry number for allocating */ -for (i = 0; i entries_max_nr; i++) { -memcpy(msg_ctrl, va + i * 16 + 12, 4); -memcpy(msg_data, va + i * 16 + 8, 4); +for (i = 0; i entries_max_nr; i++, entry++) { /* Ignore unused entry even it's unmasked */ -if (msg_data == 0) +if (entry-data == 0) { continue; +} entries_nr ++; } @@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) msix_entry.assigned_dev_id = msix_nr.assigned_dev_id; entries_nr = 0; -for (i = 0; i entries_max_nr; i++) { +entry = adev-msix_table; +for (i = 0; i entries_max_nr; i++, entry++) { if (entries_nr = msix_nr.entry_nr) break; -memcpy(msg_ctrl, va + i * 16 + 12, 4); -memcpy(msg_data, va + i * 16 + 8, 4); -if (msg_data == 0) +if (entry-data == 0) { continue; - -memcpy(msg_addr, va + i * 16, 4); -memcpy(msg_upper_addr, va + i * 16 + 4, 4); +} r = kvm_get_irq_route_gsi(); if (r 0) @@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) adev-entry[entries_nr].gsi = r; adev-entry[entries_nr].type = KVM_IRQ_ROUTING_MSI; adev-entry[entries_nr].flags = 0; -adev-entry[entries_nr].u.msi.address_lo = msg_addr; -adev-entry[entries_nr].u.msi.address_hi = msg_upper_addr; -adev-entry[entries_nr].u.msi.data = msg_data; -DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, msg_data, msg_addr); +adev-entry[entries_nr].u.msi.address_lo = entry-addr_lo; +adev-entry[entries_nr].u.msi.address_hi = entry-addr_hi; +adev-entry[entries_nr].u.msi.data = entry-data; +DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, + entry-data, entry-addr_lo); kvm_add_routing_entry(adev-entry[entries_nr]); msix_entry.gsi = adev-entry[entries_nr].gsi; @@ -1443,7 +1439,7 @@ static uint64_t msix_mmio_read(void *opaque, target_phys_addr_t addr, AssignedDevice *adev = opaque; uint64_t val; -memcpy(val, (void *)((uint8_t *)adev-msix_table_page + addr), size); +memcpy(val, (void *)((uint8_t *)adev-msix_table + addr), size); return val;
Re: [PATCH 4/9] pci-assign: Use struct for MSI-X table
On Tue, 2012-01-31 at 23:24 +0200, Michael S. Tsirkin wrote: On Tue, Jan 31, 2012 at 02:17:30PM -0700, Alex Williamson wrote: On Tue, 2012-01-31 at 22:00 +0200, Michael S. Tsirkin wrote: On Tue, Jan 31, 2012 at 12:05:49PM -0700, Alex Williamson wrote: On Tue, 2012-01-31 at 19:40 +0200, Michael S. Tsirkin wrote: On Sat, Jan 28, 2012 at 07:21:58AM -0700, Alex Williamson wrote: This makes access much easier. Signed-off-by: Alex Williamson alex.william...@redhat.com Yes but this also makes it easier to forget to handle endian-ness. How about using pci_get/pci_set instead? Do we really think existing device assignment is ever going to work on anything but x86/x86? Alex It's a question of whether anyone bothers to fix all portability bugs. Besides endian-ness, and msix vector decoding, there's not much that is x86 specific there. The way interrupt routing is updated from the chipset is disgustingly x86 specific. Right, that's what I mean - the decoding of msix vector to destination. Yes, that, but I was referring more to INTx and the ugly hard coded hook to get notified when the 440fx changes PCI interrupt link setup. Thanks, Alex The entire kernel side implementation is only enabled for x86 (well, not counting ia64). I expect the kernel side interrupt injection is very x86, or at least ioapic, specific. Beyond that, yeah, it may be trivial ;) Thanks, Alex --- hw/device-assignment.c | 55 ++-- hw/device-assignment.h |9 +++- 2 files changed, 33 insertions(+), 31 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 1fc7ffa..422ee00 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -973,10 +973,9 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev); uint16_t entries_nr = 0, entries_max_nr; int pos = 0, i, r = 0; -uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl; struct kvm_assigned_msix_nr msix_nr; struct kvm_assigned_msix_entry msix_entry; -void *va = adev-msix_table_page; +MSIXTableEntry *entry = adev-msix_table; pos = pci_find_capability(pci_dev, PCI_CAP_ID_MSIX); @@ -985,12 +984,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) entries_max_nr += 1; /* Get the usable entry number for allocating */ -for (i = 0; i entries_max_nr; i++) { -memcpy(msg_ctrl, va + i * 16 + 12, 4); -memcpy(msg_data, va + i * 16 + 8, 4); +for (i = 0; i entries_max_nr; i++, entry++) { /* Ignore unused entry even it's unmasked */ -if (msg_data == 0) +if (entry-data == 0) { continue; +} entries_nr ++; } @@ -1013,16 +1011,13 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) msix_entry.assigned_dev_id = msix_nr.assigned_dev_id; entries_nr = 0; -for (i = 0; i entries_max_nr; i++) { +entry = adev-msix_table; +for (i = 0; i entries_max_nr; i++, entry++) { if (entries_nr = msix_nr.entry_nr) break; -memcpy(msg_ctrl, va + i * 16 + 12, 4); -memcpy(msg_data, va + i * 16 + 8, 4); -if (msg_data == 0) +if (entry-data == 0) { continue; - -memcpy(msg_addr, va + i * 16, 4); -memcpy(msg_upper_addr, va + i * 16 + 4, 4); +} r = kvm_get_irq_route_gsi(); if (r 0) @@ -1031,10 +1026,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) adev-entry[entries_nr].gsi = r; adev-entry[entries_nr].type = KVM_IRQ_ROUTING_MSI; adev-entry[entries_nr].flags = 0; -adev-entry[entries_nr].u.msi.address_lo = msg_addr; -adev-entry[entries_nr].u.msi.address_hi = msg_upper_addr; -adev-entry[entries_nr].u.msi.data = msg_data; -DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, msg_data, msg_addr); +adev-entry[entries_nr].u.msi.address_lo = entry-addr_lo; +adev-entry[entries_nr].u.msi.address_hi = entry-addr_hi; +adev-entry[entries_nr].u.msi.data = entry-data; +DEBUG(MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!, + entry-data, entry-addr_lo); kvm_add_routing_entry(adev-entry[entries_nr]); msix_entry.gsi = adev-entry[entries_nr].gsi; @@ -1443,7 +1439,7 @@ static
Re: [Qemu-devel] [PATCH 1/4] i8254: Factor out base class for KVM reuse
On 01/31/2012 12:46 PM, Jan Kiszka wrote: Applying the concept used for the *PICs once again: establish a base class for the i8254 that can be used both by the current user space emulation and the upcoming KVM in-kernel version. We share most of the public interface of the i8254, specifically to the pcspk, vmstate, reset and certain init parts. Signed-off-by: Jan Kiszkajan.kis...@siemens.com Now that we have QOM bits, there's no need to factor out a common base class. Just make the methods that you want to override virtual with the default implementation and then make a KVMPIT that inherits from the PIT and then overrides whatever virtual functions it needs to. Regards, Anthony Liguori --- Makefile.objs |2 +- hw/i8254.c | 277 ++--- hw/i8254_common.c | 311 +++ hw/i8254_internal.h | 87 ++ 4 files changed, 434 insertions(+), 243 deletions(-) create mode 100644 hw/i8254_common.c create mode 100644 hw/i8254_internal.h diff --git a/Makefile.objs b/Makefile.objs index b942625..6a733fa 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -210,7 +210,7 @@ hw-obj-$(CONFIG_EMPTY_SLOT) += empty_slot.o hw-obj-$(CONFIG_SERIAL) += serial.o hw-obj-$(CONFIG_PARALLEL) += parallel.o -hw-obj-$(CONFIG_I8254) += i8254.o +hw-obj-$(CONFIG_I8254) += i8254_common.o i8254.o hw-obj-$(CONFIG_PCSPK) += pcspk.o hw-obj-$(CONFIG_PCKBD) += pckbd.o hw-obj-$(CONFIG_USB_UHCI) += usb-uhci.o diff --git a/hw/i8254.c b/hw/i8254.c index ff253bb..90e9a87 100644 --- a/hw/i8254.c +++ b/hw/i8254.c @@ -26,6 +26,7 @@ #include isa.h #include qemu-timer.h #include i8254.h +#include i8254_internal.h //#define DEBUG_PIT @@ -34,34 +35,6 @@ #define RW_STATE_WORD0 3 #define RW_STATE_WORD1 4 -typedef struct PITChannelState { -int count; /* can be 65536 */ -uint16_t latched_count; -uint8_t count_latched; -uint8_t status_latched; -uint8_t status; -uint8_t read_state; -uint8_t write_state; -uint8_t write_latch; -uint8_t rw_mode; -uint8_t mode; -uint8_t bcd; /* not supported */ -uint8_t gate; /* timer start */ -int64_t count_load_time; -/* irq handling */ -int64_t next_transition_time; -QEMUTimer *irq_timer; -qemu_irq irq; -uint32_t irq_disabled; -} PITChannelState; - -typedef struct PITState { -ISADevice dev; -MemoryRegion ioports; -uint32_t iobase; -PITChannelState channels[3]; -} PITState; - static void pit_irq_timer_update(PITChannelState *s, int64_t current_time); static int pit_get_count(PITChannelState *s) @@ -89,99 +62,11 @@ static int pit_get_count(PITChannelState *s) return counter; } -/* get pit output bit */ -static int pit_get_out(PITChannelState *s, int64_t current_time) -{ -uint64_t d; -int out; - -d = muldiv64(current_time - s-count_load_time, PIT_FREQ, - get_ticks_per_sec()); -switch(s-mode) { -default: -case 0: -out = (d= s-count); -break; -case 1: -out = (d s-count); -break; -case 2: -if ((d % s-count) == 0 d != 0) -out = 1; -else -out = 0; -break; -case 3: -out = (d % s-count) ((s-count + 1) 1); -break; -case 4: -case 5: -out = (d == s-count); -break; -} -return out; -} - -/* return -1 if no transition will occur. */ -static int64_t pit_get_next_transition_time(PITChannelState *s, -int64_t current_time) -{ -uint64_t d, next_time, base; -int period2; - -d = muldiv64(current_time - s-count_load_time, PIT_FREQ, - get_ticks_per_sec()); -switch(s-mode) { -default: -case 0: -case 1: -if (d s-count) -next_time = s-count; -else -return -1; -break; -case 2: -base = (d / s-count) * s-count; -if ((d - base) == 0 d != 0) -next_time = base + s-count; -else -next_time = base + s-count + 1; -break; -case 3: -base = (d / s-count) * s-count; -period2 = ((s-count + 1) 1); -if ((d - base) period2) -next_time = base + period2; -else -next_time = base + s-count; -break; -case 4: -case 5: -if (d s-count) -next_time = s-count; -else if (d == s-count) -next_time = s-count + 1; -else -return -1; -break; -} -/* convert to timer units */ -next_time = s-count_load_time + muldiv64(next_time, get_ticks_per_sec(), - PIT_FREQ); -/* fix potential rounding problems */ -/* XXX: better solution: use a clock at PIT_FREQ Hz */ -if (next_time= current_time) -next_time = current_time + 1; -return next_time; -} -
Re: [Qemu-devel] [PATCH 1/4] i8254: Factor out base class for KVM reuse
On 2012-01-31 22:40, Anthony Liguori wrote: On 01/31/2012 12:46 PM, Jan Kiszka wrote: Applying the concept used for the *PICs once again: establish a base class for the i8254 that can be used both by the current user space emulation and the upcoming KVM in-kernel version. We share most of the public interface of the i8254, specifically to the pcspk, vmstate, reset and certain init parts. Signed-off-by: Jan Kiszkajan.kis...@siemens.com Now that we have QOM bits, there's no need to factor out a common base class. Just make the methods that you want to override virtual with the default implementation and then make a KVMPIT that inherits from the PIT and then overrides whatever virtual functions it needs to. That doesn't sound like the proper design for this purpose. Rather, we have an abstract base class that both implementations are derived from. If I'm not using QOM properly to achieve this, please tell me. Thanks, Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 1/4] i8254: Factor out base class for KVM reuse
On 01/31/2012 03:49 PM, Jan Kiszka wrote: On 2012-01-31 22:40, Anthony Liguori wrote: On 01/31/2012 12:46 PM, Jan Kiszka wrote: Applying the concept used for the *PICs once again: establish a base class for the i8254 that can be used both by the current user space emulation and the upcoming KVM in-kernel version. We share most of the public interface of the i8254, specifically to the pcspk, vmstate, reset and certain init parts. Signed-off-by: Jan Kiszkajan.kis...@siemens.com Now that we have QOM bits, there's no need to factor out a common base class. Just make the methods that you want to override virtual with the default implementation and then make a KVMPIT that inherits from the PIT and then overrides whatever virtual functions it needs to. That doesn't sound like the proper design for this purpose. It's hard to say really. There's a lot more in the common class that I expected (like initialization of the ISA regions. I would expect the base class to look a lot more like an interface such that the KVM PIT implementation was trivialized more than it is. But I can argue it both ways so if you feel strongly here, I won't object. Regards, Anthony Liguori Rather, we have an abstract base class that both implementations are derived from. If I'm not using QOM properly to achieve this, please tell me. Thanks, Jan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v3 6/7] pcspk: Convert to qdev
On 2012-01-31 21:49, Anthony Liguori wrote: On 01/31/2012 11:41 AM, Jan Kiszka wrote: Convert the PC speaker device to a qdev ISA model. Move the public interface to a dedicated header file at this chance. Signed-off-by: Jan Kiszkajan.kis...@siemens.com Heh, I did this too more or less the same way. Some comments below: --- arch_init.c|1 + hw/i82378.c|3 +- hw/mips_jazz.c |3 +- hw/pc.c|3 +- hw/pc.h|4 --- hw/pcspk.c | 62 ++- hw/pcspk.h | 45 7 files changed, 104 insertions(+), 17 deletions(-) create mode 100644 hw/pcspk.h diff --git a/arch_init.c b/arch_init.c index 2366511..a45485b 100644 --- a/arch_init.c +++ b/arch_init.c @@ -42,6 +42,7 @@ #include gdbstub.h #include hw/smbios.h #include exec-memory.h +#include hw/pcspk.h #ifdef TARGET_SPARC int graphic_width = 1024; diff --git a/hw/i82378.c b/hw/i82378.c index 127cadf..79fa8b7 100644 --- a/hw/i82378.c +++ b/hw/i82378.c @@ -20,6 +20,7 @@ #include pci.h #include pc.h #include i8254.h +#include pcspk.h //#define DEBUG_I82378 @@ -195,7 +196,7 @@ static void i82378_init(DeviceState *dev, I82378State *s) pit = pit_init(isabus, 0x40, 0, NULL); /* speaker */ -pcspk_init(pit); +pcspk_init(isabus, pit); /* 2 82C37 (dma) */ DMA_init(1,s-out[1]); diff --git a/hw/mips_jazz.c b/hw/mips_jazz.c index b61b218..65608dc 100644 --- a/hw/mips_jazz.c +++ b/hw/mips_jazz.c @@ -37,6 +37,7 @@ #include loader.h #include mc146818rtc.h #include i8254.h +#include pcspk.h #include blockdev.h #include sysbus.h #include exec-memory.h @@ -193,7 +194,7 @@ static void mips_jazz_init(MemoryRegion *address_space, cpu_exit_irq = qemu_allocate_irqs(cpu_request_exit, NULL, 1); DMA_init(0, cpu_exit_irq); pit = pit_init(isa_bus, 0x40, 0, NULL); -pcspk_init(pit); +pcspk_init(isa_bus, pit); /* ISA IO space at 0x9000 */ isa_mmio_init(0x9000, 0x0100); diff --git a/hw/pc.c b/hw/pc.c index 6fb1de7..f6a91a9 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -37,6 +37,7 @@ #include multiboot.h #include mc146818rtc.h #include i8254.h +#include pcspk.h #include msi.h #include sysbus.h #include sysemu.h @@ -1169,7 +1170,7 @@ void pc_basic_device_init(ISABus *isa_bus, qemu_irq *gsi, /* connect PIT to output control line of the HPET */ qdev_connect_gpio_out(hpet, 0, qdev_get_gpio_in(pit-qdev, 0)); } -pcspk_init(pit); +pcspk_init(isa_bus, pit); for(i = 0; i MAX_SERIAL_PORTS; i++) { if (serial_hds[i]) { diff --git a/hw/pc.h b/hw/pc.h index b08708d..1b47bbd 100644 --- a/hw/pc.h +++ b/hw/pc.h @@ -149,10 +149,6 @@ void piix4_smbus_register_device(SMBusDevice *dev, uint8_t addr); /* hpet.c */ extern int no_hpet; -/* pcspk.c */ -void pcspk_init(ISADevice *pit); -int pcspk_audio_init(ISABus *bus); - /* piix_pci.c */ struct PCII440FXState; typedef struct PCII440FXState PCII440FXState; diff --git a/hw/pcspk.c b/hw/pcspk.c index 43df818..5bd9e32 100644 --- a/hw/pcspk.c +++ b/hw/pcspk.c @@ -28,6 +28,7 @@ #include audio/audio.h #include qemu-timer.h #include i8254.h +#include pcspk.h #define PCSPK_BUF_LEN 1792 #define PCSPK_SAMPLE_RATE 32000 @@ -35,10 +36,13 @@ #define PCSPK_MIN_COUNT ((PIT_FREQ + PCSPK_MAX_FREQ - 1) / PCSPK_MAX_FREQ) typedef struct { +ISADevice dev; +MemoryRegion ioport; +uint32_t iobase; uint8_t sample_buf[PCSPK_BUF_LEN]; QEMUSoundCard card; SWVoiceOut *voice; -ISADevice *pit; +void *pit; unsigned int pit_count; unsigned int samples; unsigned int play_pos; @@ -47,7 +51,7 @@ typedef struct { } PCSpkState; static const char *s_spk = pcspk; -static PCSpkState pcspk_state; +static PCSpkState *pcspk_state; static inline void generate_samples(PCSpkState *s) { @@ -99,7 +103,7 @@ static void pcspk_callback(void *opaque, int free) int pcspk_audio_init(ISABus *bus) { -PCSpkState *s =pcspk_state; +PCSpkState *s = pcspk_state; struct audsettings as = {PCSPK_SAMPLE_RATE, 1, AUD_FMT_U8, 0}; This can be a follow up, but what this is really doing is enabling audio for this device. ISABus is unused as a parameter. Yes, but this is a callback for the audio subsystem, look at arch_init.c. Nothing we can do here, only if we change that callback prototype. I think it would be better to make this a qdev bool property (audio_enabled) and then modify the code that calls this function to either set the property as a global, or use the QOM path to specifically set it on this device. Either way, I think setting an audio_enabled flag via qdev makes a whole lot more sense conceptionally than using the -soundhw option. Yep,
Re: [PATCH v3 2/7] hpet: Save/restore cached RTC IRQ level
On 2012-01-31 22:02, Anthony Liguori wrote: On 01/31/2012 11:41 AM, Jan Kiszka wrote: In legacy mode, the HPET suppresses the RTC interrupt delivery via IRQ 8 but keeps track of the RTC output level and applies it when legacy mode is turned off again. This value has to be preserved across save/ restore as it cannot be reconstructed otherwise. Signed-off-by: Jan Kiszkajan.kis...@siemens.com --- hw/hpet.c | 26 ++ 1 files changed, 26 insertions(+), 0 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index aba9ea9..39b291f 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -240,6 +240,24 @@ static int hpet_post_load(void *opaque, int version_id) return 0; } +static bool hpet_rtc_irq_level_needed(void *opaque) +{ +HPETState *s = opaque; + +return s-rtc_irq_level != 0; +} + +static const VMStateDescription vmstate_hpet_rtc_irq_level = { +.name = hpet/rtc_irq_level, +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField[]) { +VMSTATE_UINT8(rtc_irq_level, HPETState), +VMSTATE_END_OF_LIST() +} +}; + This won't work. We don't clear rtc_irq_level on reset so rtc_irq_level may be high or low after reset. As such, we can't use a subsection here. We need to bump the savevm state. rtc_irq_level is updated during reset by the connected RTC device. It clears its IRQ line. Jan signature.asc Description: OpenPGP digital signature