Re: New version of my kvmctl script
FinnTux N/A wrote: But here is a new version. Now with a version number! :) Changes: - includes init script /etc/init.d/kvm, which starts and stops virtual machines when booting and shutting down the host. Shutting down a guest is done by sending system_powerdown to it; if the guest doesn't power down within 120 seconds (adjustable) it will be killed. - kvmctl can set a VM to start or not start during boot (kvmctl vmname onboot yes|no). Without yes/no, the script shows whether the VM is started on boot or not. Great! I had planned to make my own start/stop scripts. Now I will use yours and perhaps help to improve them. Regards Thomas -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: New version of my kvmctl script
Freddie Cash wrote: On June 2, 2008 11:20 am FinnTux N/A wrote: 2008/6/2 Freddie Cash [EMAIL PROTECTED]: Considering this is a test harness and not a control program, perhaps it should be renamed in the kvm sources/builds to something more appropriate like kvmtest. Anyway, I was afraid of these naming issues. kvmctl is a pretty generic name. Any suggestions? I'd prefer to keep the management/control scripts/apps named kvmctl, as that's easier to remember and fits with things like sysctl, apachectl, brctl, and other similar apps. Full ACK from me! I would also prefer that 'kvmctl' really be reserved for controlling kvm virtual machines. Should we file a bug report against the 'kvm' package? Regards Thomas
Re: [PATCH] Handle vma regions with no backing page
Ben-Ami Yassour wrote: Anthony Liguori [EMAIL PROTECTED] wrote on 04/29/2008 05:32:09 PM: Subject [PATCH] Handle vma regions with no backing page This patch allows VMAs that contain no backing page to be used for guest memory. This is a drop-in replacement for the first patch in Ben-Ami's direct mmio series. Here, we continue to allow mmio pages to be represented in the rmap.

 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
 {
-	return pfn_to_page(gfn_to_pfn(kvm, gfn));
+	pfn_t pfn;
+
+	pfn = gfn_to_pfn(kvm, gfn);
+	if (pfn_valid(pfn))
+		return pfn_to_page(pfn);
+
+	return NULL;
 }

We noticed that pfn_valid does not always work as this patch expects, i.e. as an indication that a pfn has a backing page. We have seen a case where CONFIG_NUMA was not set and pfn_valid returned 1 for an mmio pfn. We then changed the config file to set CONFIG_NUMA and it worked as expected (since a different implementation of pfn_valid was used). How should we overcome this issue? Looks like we need to reintroduce a refcount bit in the pte, and check the page using the VMA. Nick Piggin's lockless pagecache patches, which have the same issue, also introduce a pte_special bit. We could follow a similar route. http://www.mail-archive.com/[EMAIL PROTECTED]/msg04789.html -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
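The patched gfn_to_page logic above can be modelled in userspace. This is only a sketch of the control flow, not KVM code: kvm_pfn_valid, mem_map and MAX_RAM_PFN are stand-ins invented here, and the real pfn_valid is exactly what the thread says is unreliable on some configs.

```c
/* Userspace model of the patched gfn_to_page() control flow.
 * kvm_pfn_valid() is a stand-in for the kernel's pfn_valid(); the thread's
 * point is that the real pfn_valid() is not a reliable "has a struct page"
 * test on every config (e.g. !CONFIG_NUMA). All names here are hypothetical. */
#include <stddef.h>

typedef unsigned long pfn_t;
struct page { int dummy; };

#define MAX_RAM_PFN 1024UL            /* pretend RAM covers pfns [0, 1024) */
static struct page mem_map[MAX_RAM_PFN];

static int kvm_pfn_valid(pfn_t pfn)   /* stand-in for pfn_valid() */
{
    return pfn < MAX_RAM_PFN;
}

static struct page *pfn_to_page_model(pfn_t pfn)
{
    return &mem_map[pfn];
}

/* Patched behaviour: return NULL instead of a bogus struct page for pfns
 * with no backing page (e.g. a direct-mapped MMIO region). */
static struct page *gfn_to_page_model(pfn_t pfn)
{
    if (kvm_pfn_valid(pfn))
        return pfn_to_page_model(pfn);
    return NULL;
}
```

The model makes the failure mode concrete: if the validity test wrongly returns 1 for an MMIO pfn, the caller gets a pointer into mem_map that describes no real page, which is the bug being discussed.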
Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
2008/6/4 Avi Kivity [EMAIL PROTECTED]: Jiri Kosina wrote: [ re-introduced LKML to CC, and also added KVM CCs] On Tue, 3 Jun 2008, Zdenek Kabelac wrote: 2008/6/3 Jiri Kosina [EMAIL PROTECTED]: On Tue, 3 Jun 2008, Zdenek Kabelac wrote: Another backtrace from the suspend code path: (T61, 2GB, C2D, no SD card) kernel from git 20080603, commit 1beee8dc8cf58e3f605bd7b34d7a39939be7d8d2

agpgart-intel :00:00.0: LATE suspend
platform bay.0: LATE suspend
platform dock.0: LATE suspend
Extended CMOS year: 2000
hwsleep-0324 [00] enter_sleep_state : Entering sleep state [S3]
Back to C!
BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
caller is do_machine_check+0xa9/0x500
Pid: 17334, comm: pm-suspend Not tainted 2.6.26-rc4 #31
Call Trace:
 [8118347c] debug_smp_processor_id+0xcc/0xd0
 [810184d9] do_machine_check+0xa9/0x500
 [81010e7b] ? init_8259A+0x1b/0x120
 [810189d6] mce_init+0x56/0xf0
 [81018a7b] mce_resume+0xb/0x10
 [81204fd0] __sysdev_resume+0x20/0x60
 [81205068] sysdev_resume+0x58/0x90
 [8120aac9] device_power_up+0x9/0x10
 [8106f4f7] suspend_devices_and_enter+0x147/0x1a0
 [8106f6c6] enter_state+0x146/0x1d0
 [8106f80a] state_store+0xba/0x100
 [81177ae7] kobj_attr_store+0x17/0x20
 [81110fea] sysfs_write_file+0xca/0x140
 [810ba00b] vfs_write+0xcb/0x190
 [810ba1c0] sys_write+0x50/0x90
 [8100c4fb] system_call_after_swapgs+0x7b/0x80

This looks very much like the oops you reported here: http://lkml.org/lkml/2008/4/7/130 Is this also a virtual machine run under KVM, as it was in the aforementioned thread? Ah yes - you are right, I'd completely forgotten about that old post - I thought my posts usually get fixed sooner :) So yes - this is actually the same bug, still not fixed in the latest kernel - the machine is running a qemu guest (which now also seems somehow slower to me). OK, so it looks like KVM could be wrongly enabling IRQs/preemption on the resume path.
The original bug report is at http://lkml.org/lkml/2008/4/7/130 Wait, is this in a virtual machine, or on a host that's also running a virtual machine (or has the kvm modules loaded)? I looked at the kvm host resume path, and it doesn't touch interrupts. The oops is from my real-hardware T61 (host) running kvm-qemu (guest) and was noticed after suspend. But everything seemed to work just fine - host and guest continued to operate normally after resume. Zdenek
Re: [PATCH] QEMU/KVM: set cpu_single_env before flushing work
Jan Kiszka wrote: As Jerone pointed out, current kvm_invoke_guest_debug may segfault. The reason is lacking re-initialization of cpu_single_env before flush_queued_work is called. Here is the fix. Applied, thanks. I also removed the same assignment a few lines later.
Re: [PATCH] qemu: document boot option for drive flag
Carlo Marcelo Arenas Belon wrote: complement 982e9b725e32f58158b6b9968f04e3377f52e63 with some basic documentation, extracted from the commit message. Applied, thanks.
Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
Jiri Kosina wrote: [ re-introduced LKML to CC, and also added KVM CCs] On Tue, 3 Jun 2008, Zdenek Kabelac wrote: 2008/6/3 Jiri Kosina [EMAIL PROTECTED]: On Tue, 3 Jun 2008, Zdenek Kabelac wrote: Another backtrace from the suspend code path: (T61, 2GB, C2D, no SD card) kernel from git 20080603, commit 1beee8dc8cf58e3f605bd7b34d7a39939be7d8d2

agpgart-intel :00:00.0: LATE suspend
platform bay.0: LATE suspend
platform dock.0: LATE suspend
Extended CMOS year: 2000
hwsleep-0324 [00] enter_sleep_state : Entering sleep state [S3]
Back to C!
BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
caller is do_machine_check+0xa9/0x500
Pid: 17334, comm: pm-suspend Not tainted 2.6.26-rc4 #31
Call Trace:
 [8118347c] debug_smp_processor_id+0xcc/0xd0
 [810184d9] do_machine_check+0xa9/0x500
 [81010e7b] ? init_8259A+0x1b/0x120
 [810189d6] mce_init+0x56/0xf0
 [81018a7b] mce_resume+0xb/0x10
 [81204fd0] __sysdev_resume+0x20/0x60
 [81205068] sysdev_resume+0x58/0x90
 [8120aac9] device_power_up+0x9/0x10
 [8106f4f7] suspend_devices_and_enter+0x147/0x1a0
 [8106f6c6] enter_state+0x146/0x1d0
 [8106f80a] state_store+0xba/0x100
 [81177ae7] kobj_attr_store+0x17/0x20
 [81110fea] sysfs_write_file+0xca/0x140
 [810ba00b] vfs_write+0xcb/0x190
 [810ba1c0] sys_write+0x50/0x90
 [8100c4fb] system_call_after_swapgs+0x7b/0x80

This looks very much like the oops you reported here: http://lkml.org/lkml/2008/4/7/130 Is this also a virtual machine run under KVM, as it was in the aforementioned thread? Ah yes - you are right, I'd completely forgotten about that old post - I thought my posts usually get fixed sooner :) So yes - this is actually the same bug, still not fixed in the latest kernel - the machine is running a qemu guest (which now also seems somehow slower to me). OK, so it looks like KVM could be wrongly enabling IRQs/preemption on the resume path.
The original bug report is at http://lkml.org/lkml/2008/4/7/130 Wait, is this in a virtual machine, or on a host that's also running a virtual machine (or has the kvm modules loaded)? I looked at the kvm host resume path, and it doesn't touch interrupts.
Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)
On Wed, 4 Jun 2008, Avi Kivity wrote: Anthony Liguori wrote: Thomas Gleixner wrote: Can we please keep that code inside of drivers/clocksource/acpi_pm.c without creating a new disconnected file in drivers/char? Btw, depending on the use case we might as well have a sysfs entry for that. I think sysfs would actually make a lot of sense for this. It's read many thousands of times per second. You don't want a read()/sprintf()/atoi() sequence every time. Eek, according to Andrea it's only used for migration purposes. Thanks, tglx
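Avi's objection above is that a sysfs attribute forces a text round trip on every sample: the kernel formats the counter with sprintf, userspace parses it back with atoi, thousands of times per second. A minimal userspace sketch of just that conversion overhead, with a hypothetical pmtimer_count variable standing in for the real counter (no real sysfs involved):

```c
/* Sketch of the per-sample overhead of a sysfs-based pmtimer read: the value
 * is formatted as text by the kernel and parsed back by userspace on every
 * sample. Illustration only, not a benchmark; pmtimer_count is hypothetical. */
#include <stdio.h>
#include <stdlib.h>

static unsigned int pmtimer_count = 0x00ab12cd; /* pretend counter value */

/* What a sysfs show() + userspace parse round trip boils down to: */
static unsigned int sample_via_text(void)
{
    char buf[32];
    snprintf(buf, sizeof(buf), "%u", pmtimer_count); /* kernel side: sprintf */
    return (unsigned int)strtoul(buf, NULL, 10);     /* user side: atoi */
}

/* What a direct read (mmap'ed page or ioport) boils down to: */
static unsigned int sample_direct(void)
{
    return pmtimer_count;
}
```

Both paths return the same value; the difference is purely the per-sample formatting and parsing work, which is why a file read per sample looks unattractive for a hot path (and why Thomas's reply narrows the question to how often it is actually read).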
Re: [PATCH] Clear CR4.VMXE in hardware_disable
Eli Collins wrote: Clear CR4.VMXE in hardware_disable. There's no reason to leave it set after doing a VMXOFF. VMware Workstation 6.5 checks CR4.VMXE as a proxy for whether the CPU is in VMX mode, so leaving VMXE set means we'll refuse to power on. With this change the user can power on after unloading the kvm-intel module. I tested on kvm-67 and kvm-69. Applied, thanks. The patch was whitespace-mangled though. Did you cut'n'paste into your email client?
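The fix described above is a one-bit change: after VMXOFF, also clear the VMXE bit (bit 13) of CR4 so other hypervisors no longer see the CPU as being in VMX operation. A userspace model on a plain variable, since real control-register access needs ring 0 (hardware_disable_model is a name invented here):

```c
/* Model of the hardware_disable fix: clear CR4.VMXE (bit 13) after VMXOFF.
 * This operates on a plain unsigned long, not a real control register. */
#define X86_CR4_VMXE (1UL << 13)

static unsigned long hardware_disable_model(unsigned long cr4)
{
    /* ... VMXOFF would be executed here ... */
    return cr4 & ~X86_CR4_VMXE;   /* drop only the VMXE bit */
}
```

Anything that, like VMware Workstation, uses CR4.VMXE as a proxy for "CPU is in VMX mode" then sees a clean register once kvm-intel is unloaded.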
Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
Zdenek Kabelac wrote: The oops is from my real-hardware T61 (host) running kvm-qemu (guest) and was noticed after suspend. But everything seemed to work just fine - host and guest continued to operate normally after resume. Can you reproduce this, with and without the kvm modules loaded? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [patch 01/12] expose ACPI pmtimer to userspace (/dev/pmtimer)
Thomas Gleixner wrote: On Wed, 4 Jun 2008, Avi Kivity wrote: Anthony Liguori wrote: Thomas Gleixner wrote: Can we please keep that code inside of drivers/clocksource/acpi_pm.c without creating a new disconnected file in drivers/char? Btw, depending on the use case we might as well have a sysfs entry for that. I think sysfs would actually make a lot of sense for this. It's read many thousands of times per second. You don't want a read()/sprintf()/atoi() sequence every time. Eek, according to Andrea it's only used for migration purposes. Oh, right. We also emulate the pmtimer in qemu, but it shouldn't need to read the host pmtimer.
Re: New version of my kvmctl script
Freddie Cash wrote: Considering this is a test harness and not a control program, perhaps it should be renamed in the kvm sources/builds, to something more appropriate like kvmtest. Sure, patches welcome.
Re: kvm: unable to handle kernel NULL pointer dereference
Tobias Diedrich wrote: Hi, I get the following Oops when trying to start qemu-kvm (Debian/unstable kvm package version 60+dfsg-1) on my system: BUG: unable to handle kernel NULL pointer dereference at 0008 IP: [8021d44f] svm_vcpu_run+0x34/0x351 kvm-60 is quite old. Can you try kvm-69?
Re: [patch 00/12] fake ACPI C2 emulation v2
Marcelo Tosatti wrote: On Sun, Jun 01, 2008 at 12:21:29PM +0300, Avi Kivity wrote: Marcelo Tosatti wrote: Addressing comments on the previous patchset, follows: - Same fake C2 emulation - /dev/pmtimer - Support for multiple IO bitmap pages + userspace interface - In-kernel ACPI pmtimer emulation Tested with Linux and WinXP guests. Also tested migration. Do you have any performance numbers, comparing qemu/kernel/passthrough? The test is 1 million gettimeofday calls, on a Xeon 1.60GHz with 4MB L2:

guest (qemu emulation):      cycles: 1189759332
guest (in-kernel emulation): cycles:  628046412
guest (direct pmtimer):      cycles:  230372934
host (TSC):                  cycles:   14862774

The ratio is 1:15:80. Looks like the direct pmtimer is still quite slow. Are there any exits with the direct pmtimer, or is it all due to the ioport latency?
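The quoted "1:15:80" can be re-derived from the cycle counts in the message (host TSC : direct pmtimer : qemu emulation, each as a rounded multiple of the host TSC baseline; the in-kernel emulation point lands around 42x). A small check:

```c
/* Re-derive the ratios from Marcelo's numbers: 1 million gettimeofday calls,
 * Xeon 1.60GHz with 4MB L2. Constants copied from the message above. */
static const unsigned long long host_tsc  =   14862774ULL;
static const unsigned long long direct_pm =  230372934ULL;
static const unsigned long long in_kernel =  628046412ULL;
static const unsigned long long qemu_emul = 1189759332ULL;

/* rounded multiple of the host TSC baseline */
static unsigned long ratio(unsigned long long cycles)
{
    return (unsigned long)((cycles + host_tsc / 2) / host_tsc);
}
```

So direct pmtimer access is still roughly 15x the host TSC cost, which is what prompts Avi's follow-up question about whether the remaining overhead is vmexits or raw ioport latency.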
Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
On Tue, 3 Jun 2008, Jiri Kosina wrote: Ah yes - you are right, I'd completely forgotten about that old post - I thought my posts usually get fixed sooner :) So yes - this is actually the same bug, still not fixed in the latest kernel - the machine is running a qemu guest (which now also seems somehow slower to me). OK, so it looks like KVM could be wrongly enabling IRQs/preemption on the resume path. The original bug report is at http://lkml.org/lkml/2008/4/7/130 sysdev_resume() is supposed to run with interrupts disabled, at least it was that way when the timekeeping_resume code was written. Thanks, tglx
Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
On Wed, 4 Jun 2008, Thomas Gleixner wrote: OK, so it looks like KVM could be wrongly enabling IRQs/preemption on the resume path. The original bug-report is on http://lkml.org/lkml/2008/4/7/130 sysdev_resume() is supposed to run with interrupts disabled, at least it was that way when the timekeeping_resume code was written. True. However in Zdenek's case, interrupts/preemption gets enabled somewhere during the resume, which correctly triggers the warning. Thanks, -- Jiri Kosina SUSE Labs
Re: BUG: using smp_processor_id() in preemptible [00000000] code: pm-suspend/17334
Jiri Kosina wrote: On Wed, 4 Jun 2008, Thomas Gleixner wrote: OK, so it looks like KVM could be wrongly enabling IRQs/preemption on the resume path. The original bug-report is on http://lkml.org/lkml/2008/4/7/130 sysdev_resume() is supposed to run with interrupts disabled, at least it was that way when the timekeeping_resume code was written. True. However in Zdenek's case, interrupts/preemption gets enabled somewhere during the resume, which correctly triggers the warning. We might add a check to the generic resume controller, to run after each ->resume() method and warn in case interrupts are enabled. That would pinpoint the offending driver immediately.
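Avi's suggestion above - check after each ->resume() callback whether the driver left interrupts enabled, and name the offender - can be sketched as a userspace model. irqs_off stands in for the kernel's irqs_disabled() test, and all names here are hypothetical, not the real driver-core code:

```c
/* Userspace sketch of a resume loop that checks, after each ->resume()
 * callback, whether the driver re-enabled interrupts, and records the first
 * offender. irqs_off is a stand-in for !irqs_disabled(). */
#include <stddef.h>

static int irqs_off = 1;                 /* resume is entered with IRQs off */
static const char *offender = NULL;      /* first driver that broke the rule */

struct sysdev_model { const char *name; void (*resume)(void); };

static void resume_all(struct sysdev_model *devs, int n)
{
    for (int i = 0; i < n; i++) {
        devs[i].resume();
        if (!irqs_off && offender == NULL) {
            offender = devs[i].name;     /* pinpoint the offending driver */
            irqs_off = 1;                /* re-disable and continue resume */
        }
    }
}

static void good_resume(void) { /* behaves: leaves IRQs alone */ }
static void bad_resume(void)  { irqs_off = 0; /* buggy: enables IRQs */ }
```

Run over a device list with one buggy callback, the loop attributes the warning to exactly that driver instead of whichever later code first calls smp_processor_id(), which is the diagnostic gap in the oops being discussed.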
[ kvm-Bugs-1984384 ] soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966]
Bugs item #1984384, was opened at 2008-06-04 13:49 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1984384&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Rafal Wijata (ravpl) Assigned to: Nobody/Anonymous (nobody) Summary: soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966] Initial Comment: I'm using kvm-69 running on Linux 2.6.24.7-92.fc8 #1 SMP Wed May 7 16:26:02 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux, with the kvm modules loaded from kvm-69 rather than the kernel-provided ones. My system almost froze after I killed the qemu process. I saw many, many tasks in 'D' state, along with [reiserfs/?] tasks. Normally I would consider it a reiserfs bug (and maybe it is), but for two things: - it happened after the qemu process was killed (running with 6 cpus, 6G memory, 16G hdd placed on reiserfs on a 200M/s hdd) - dmesg showed the following messages (2 total), which suggest it got stuck in kvm:

BUG: soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966]
CPU 5:
Modules linked in: ipt_REJECT nf_conntrack_ipv4 iptable_filter ip_tables kvm_intel(U) kvm(U) tun nfs lockd nfs_acl autofs4 coretemp hwmon fuse sunrpc bridge xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq reiserfs ext2 dm_mirror dm_multipath dm_mod i5000_edac iTCO_wdt serio_raw pcspkr iTCO_vendor_support e1000 button edac_core i2c_i801 ata_piix i2c_core pata_acpi ata_generic sg usb_storage ahci libata shpchp 3w_9xxx sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Pid: 4966, comm: qemu-system-x86 Not tainted 2.6.24.7-92.fc8 #1
RIP: 0010:[8834b29e] [8834b29e] :kvm:rmap_remove+0x170/0x198
RSP: 0018:8101f4df5bd8 EFLAGS: 0246
RAX: 0002 RBX: 81004294af60 RCX: RDX: RSI: 0106 RDI: 8101770448c0
RBP: 8101ce0454d0 R08: c20001b86030 R09: 8101d3587118 R10: 0019e7ea R11: 8101394dd9c0 R12: 8100240cece0 R13: R14: 0019e7ea R15: 0018
FS: () GS:81021f049580() knlGS: CS: 0010 DS: 002b ES: 002b
CR0: 8005003b CR2: f7ff6000 CR3: 00021b5e5000 CR4: 26e0
DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400
Call Trace:
 [8834b1dd] :kvm:rmap_remove+0xaf/0x198
 [8834b372] :kvm:kvm_mmu_zap_page+0x8a/0x25e
 [8834b9f3] :kvm:free_mmu_pages+0x12/0x34
 [8834bac9] :kvm:kvm_mmu_destroy+0x1d/0x5e
 [88346979] :kvm:kvm_arch_vcpu_uninit+0x1d/0x38
 [8834555b] :kvm:kvm_vcpu_uninit+0x9/0x15
 [88163aa8] :kvm_intel:vmx_free_vcpu+0x74/0x84
 [8834657b] :kvm:kvm_arch_destroy_vm+0x69/0xb4
 [88345538] :kvm:kvm_vcpu_release+0x13/0x18
 [810a35d4] __fput+0xc2/0x18f
 [810a0de7] filp_close+0x5d/0x65
 [8103b3df] put_files_struct+0x66/0xc4
 [8103c6f7] do_exit+0x28c/0x76b
 [8103cc55] sys_exit_group+0x0/0xe
 [81044163] get_signal_to_deliver+0x3aa/0x3d8
 [8100b359] do_notify_resume+0xa8/0x732
 [8126b7f6] unlock_kernel+0x32/0x33
 [881c01db] :reiserfs:reiserfs_setattr+0x26e/0x27d
 [810a1866] do_truncate+0x70/0x79
 [8100bf17] sysret_signal+0x1c/0x27
 [8100c1a7] ptregscall_common+0x67/0xb0

-- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1984384&group_id=180599
[ kvm-Bugs-1984384 ] soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966]
Bugs item #1984384, was opened at 2008-06-04 14:49 Message generated for change (Comment added) made by avik You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1984384&group_id=180599 Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Rafal Wijata (ravpl) Assigned to: Nobody/Anonymous (nobody) Summary: soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966] -- Comment By: Avi Kivity (avik) Date: 2008-06-04 15:19 Message: Logged In: YES user_id=539971 Originator: NO It's a kvm bug; kvm is spending too much time tearing down the page tables. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1984384&group_id=180599
Re: ata exception messages
On Tue, 2008-06-03 at 10:49 -0400, Beth Kon wrote: I'm running an Ubuntu 7.10 guest on a kvm git build (commit 3125ffd6edb9384b3e418fc08fea99e7e1548a96) and am seeing repeated messages like: [3393.124685] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen [3393.127599] ata1.00: cmd ca/00:30:af:c1:48/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out I see that they're coming from ata_eh_link_report in drivers/ata/libata-eh.c but am not familiar enough with this code to understand what the problem is. Does anyone have any idea what might be causing this? I discovered that these messages were associated with my disk image being NFS mounted. -- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: [EMAIL PROTECTED]
Re: kvm causing memory corruption? now 2.6.26-rc4
Dave Hansen wrote: On Thu, 2008-03-27 at 16:59 +0200, Avi Kivity wrote: Dave Hansen wrote: On Thu, 2008-03-27 at 12:10 +0200, Avi Kivity wrote: btw, is this with >= 4GB RAM on the host? Well, are you asking whether I have PAE on or not? :) No, I'm asking whether there is a possibility of address truncation :) PAE by itself doesn't affect kvm much, as it always runs the guest in pae mode. Can you try running with mem=2000M or something? I have a few more data points on this. Sorry for the massive delay from the last report -- I'm being a crappy bug reporter. But, this is on my one and only laptop, which makes it a serious pain to diagnose. I also didn't have a hardware serial console on it before, which I do now. This is all on 2.6.26-rc4-01549-g1beee8d. Adding the mem= does not help at all. But, it is all a bit more diagnosable now than a month or two ago. I turned on all of the kernel debugging that I could get my grubby little hands on. It now oopses quite consistently when kvm runs instead of after. Here's a collection of oopses that I captured after setting up a serial line: http://sr71.net/~dave/kvm-oops1.txt After collecting all those, I turned on CONFIG_DEBUG_HIGHMEM and the oopses miraculously stopped. But, the guest hung (for at least 5 minutes or so) during windows bootup, pegging my host CPU. Most of the CPU was going to klogd, so I checked dmesg. Can you check with mem=900 (and CONFIG_DEBUG_HIGHMEM=n)? That will confirm that the problems are highmem related, but not physical-address-truncation related. I was seeing messages like this [ 428.918108] kvm_handle_exit: unexpected, valid vectoring info and exit reason is 0x9 And quite a few of them, like 100,000/sec. That's why klogd was pegging the CPU. Any idea on a next debugging step? That's a task switch. Newer kvms handle them.
Re: ata exception messages
On Wed, 2008-06-04 at 15:38 +0300, Avi Kivity wrote: Beth Kon wrote: On Tue, 2008-06-03 at 10:49 -0400, Beth Kon wrote: I'm running an Ubuntu 7.10 guest on a kvm git build (commit 3125ffd6edb9384b3e418fc08fea99e7e1548a96) and am seeing repeated messages like: [3393.124685] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen [3393.127599] ata1.00: cmd ca/00:30:af:c1:48/00:00:00:00:00/e0 tag 0 cdb 0x0 data 4096 out I see that they're coming from ata_eh_link_report in drivers/ata/libata-eh.c but am not familiar enough with this code to understand what the problem is. Does anyone have any idea what might be causing this? I discovered that these messages were associated with my disk image being NFS mounted. Yes, the network has been misbehaving lately, so could be causing timeouts. Interesting. Is it an exceptionally slow server (or perhaps, on a lossy network)? I can see how timeouts can annoy the ide driver, but I've never seen this myself.
Re: [PATCH 0/5] paravirt clock source patches, #4
Gerd Hoffmann wrote: Jeremy Fitzhardinge wrote: Gerd Hoffmann wrote: paravirt clock source patches, next round. There is now a pvclock-abi.h file with the structs and some longish comments in it, and everybody is switched over to use the stuff in there. This all looks pretty good. How do you want this to get into the kernel? [ note: fixed up kvm list address: s/-owner// ] Good question. The kvm patches have dependencies on not-yet-merged bits, so they have to go through the kvm queue. The first two can also go through Ingo's x86 tree, I guess. Alternatively, if Ingo acks, I'll send all five through kvm.git. Note the kvm-specific patches need to be backported, as we want them for 2.6.26. I can do that. (Ingo: said patches are in http://thread.gmane.org/gmane.comp.emulators.kvm.devel/18149)
Re: KVM: PCIPT: direct mmio
Ben-Ami Yassour wrote: Amit, Below is the patch for the PCI passthrough tree; it enables a guest to access a device's memory-mapped I/O regions directly, without requiring the host to trap and emulate every MMIO access. This patch requires only userspace changes, and it is relying on the kernel patch by Anthony: Handle vma regions with no backing page. Note that this patch requires CONFIG_NUMA to be set. It does require a change to the VT-d patch that Allen sent a while ago, to avoid mapping of memory slots with no backing page.

diff --git a/libkvm/libkvm.c b/libkvm/libkvm.c
index d1e95a4..ce062cb 100644
--- a/libkvm/libkvm.c
+++ b/libkvm/libkvm.c
@@ -400,7 +400,7 @@ void *kvm_create_userspace_phys_mem(kvm_context_t kvm, unsigned long phys_start,
 {
 	int r;
 	int prot = PROT_READ;
-	void *ptr;
+	void *ptr = NULL;
 	struct kvm_userspace_memory_region memory = {
 		.memory_size = len,
 		.guest_phys_addr = phys_start,
@@ -410,16 +410,24 @@ void *kvm_create_userspace_phys_mem(kvm_context_t kvm, unsigned long phys_start,
 	if (writable)
 		prot |= PROT_WRITE;
 
-	ptr = mmap(NULL, len, prot, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
-	if (ptr == MAP_FAILED) {
-		fprintf(stderr, "create_userspace_phys_mem: %s", strerror(errno));
-		return 0;
-	}
+	if (len > 0) {
+		ptr = mmap(NULL, len, prot, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
+		if (ptr == MAP_FAILED) {
+			fprintf(stderr, "create_userspace_phys_mem: %s",
+				strerror(errno));
+			return 0;
+		}
+	}

You're using 'len == 0' here to change the semantics of the function. It would be better to have two different APIs (perhaps sharing some of the implementation by calling a helper).
Re: [PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guest
Amit Shah wrote: The first option can be enforced by calls to pci_enable_device() and pci_request_regions(). This can solve the problem of assigning multiple devices to the same guest as well. Yes, that sounds good. Dynamically unbinding devices is prone to a lot of errors and assumptions, and such a policy shouldn't be enforced. We should either fail the assignment and let the administrator take care of doing the right thing and start the guest, or just not launch the guest at all. Agreed. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
[ kvm-Bugs-1984384 ] soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966]
Bugs item #1984384, was opened at 2008-06-04 05:49 Message generated for change (Comment added) made by dsahern You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1984384&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Rafal Wijata (ravpl) Assigned to: Nobody/Anonymous (nobody) Summary: soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966] Initial Comment: I'm using kvm-69 running on Linux 2.6.24.7-92.fc8 #1 SMP Wed May 7 16:26:02 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux kvm modules loaded from kvm-69 rather than kernel provided My system almost froze after I killed the qemu process. I saw many, many tasks in 'D' state, along with [reiserfs/?] tasks. Normally I would consider it a reiserfs bug (and maybe it is), but two things: - it happened after the qemu process was killed (running with 6 cpus, 6G memory, 16G hdd placed on reiserfs placed on a 200M/s hdd) - dmesg showed the following messages (2 total), which suggest it got stuck in kvm BUG: soft lockup - CPU#5 stuck for 11s! 
[qemu-system-x86:4966] CPU 5: Modules linked in: ipt_REJECT nf_conntrack_ipv4 iptable_filter ip_tables kvm_intel(U) kvm(U) tun nfs lockd nfs_acl autofs4 coretemp hwmon fuse sunrpc bridge xt_tcpudp nf_conntrack_ipv6 xt_state nf_conntrack ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand acpi_cpufreq reiserfs ext2 dm_mirror dm_multipath dm_mod i5000_edac iTCO_wdt serio_raw pcspkr iTCO_vendor_support e1000 button edac_core i2c_i801 ata_piix i2c_core pata_acpi ata_generic sg usb_storage ahci libata shpchp 3w_9xxx sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd Pid: 4966, comm: qemu-system-x86 Not tainted 2.6.24.7-92.fc8 #1 RIP: 0010:[8834b29e] [8834b29e] :kvm:rmap_remove+0x170/0x198 RSP: 0018:8101f4df5bd8 EFLAGS: 0246 RAX: 0002 RBX: 81004294af60 RCX: RDX: RSI: 0106 RDI: 8101770448c0 RBP: 8101ce0454d0 R08: c20001b86030 R09: 8101d3587118 R10: 0019e7ea R11: 8101394dd9c0 R12: 8100240cece0 R13: R14: 0019e7ea R15: 0018 FS: () GS:81021f049580() knlGS: CS: 0010 DS: 002b ES: 002b CR0: 8005003b CR2: f7ff6000 CR3: 00021b5e5000 CR4: 26e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Call Trace: [8834b1dd] :kvm:rmap_remove+0xaf/0x198 [8834b372] :kvm:kvm_mmu_zap_page+0x8a/0x25e [8834b9f3] :kvm:free_mmu_pages+0x12/0x34 [8834bac9] :kvm:kvm_mmu_destroy+0x1d/0x5e [88346979] :kvm:kvm_arch_vcpu_uninit+0x1d/0x38 [8834555b] :kvm:kvm_vcpu_uninit+0x9/0x15 [88163aa8] :kvm_intel:vmx_free_vcpu+0x74/0x84 [8834657b] :kvm:kvm_arch_destroy_vm+0x69/0xb4 [88345538] :kvm:kvm_vcpu_release+0x13/0x18 [810a35d4] __fput+0xc2/0x18f [810a0de7] filp_close+0x5d/0x65 [8103b3df] put_files_struct+0x66/0xc4 [8103c6f7] do_exit+0x28c/0x76b [8103cc55] sys_exit_group+0x0/0xe [81044163] get_signal_to_deliver+0x3aa/0x3d8 [8100b359] do_notify_resume+0xa8/0x732 [8126b7f6] unlock_kernel+0x32/0x33 [881c01db] :reiserfs:reiserfs_setattr+0x26e/0x27d [810a1866] do_truncate+0x70/0x79 [8100bf17] sysret_signal+0x1c/0x27 [8100c1a7] ptregscall_common+0x67/0xb0 -- Comment By: david ahern (dsahern) Date: 
2008-06-04 08:20 Message: Logged In: YES user_id=1755596 Originator: NO I hit this issue yesterday as well. Host is running 2.6.26-rc3 from kvm.git, per-page-pte-tracking branch. At the time a VM had been up and running for ~24 hours, and I was installing another VM. The guest that had been running for ~24 hours terminated abruptly. [4776654.043860] BUG: soft lockup - CPU#0 stuck for 94s! [ksoftirqd/0:4] [4776654.043860] CPU 0: [4776654.043860] Modules linked in: tun bridge llc iptable_filter ip_tables x_tables kvm_intel kvm usbhid ahci ata_piix libata ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd usbcore [4776654.043860] Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.26-rc3-00969-g7cce43a #1 [4776654.043860] RIP: 0010:[812cf205] [812cf205] _spin_unlock_irq+0xc/0x2a [4776654.043860] RSP: 0018:81510f38 EFLAGS: 0202 [4776654.043860] RAX: 81510f48 RBX: 81510f38 RCX: 811d8fad [4776654.043860] RDX: 81510f48 RSI: 561a RDI: 0001 [4776654.043860] RBP: 81510eb0 R08: 8101a61c3d58 R09: [4776654.043860] R10: 8147fe40 R11:
[ kvm-Bugs-1984384 ] soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966]
Bugs item #1984384, was opened at 2008-06-04 14:49 Message generated for change (Comment added) made by avik You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1984384&group_id=180599 Summary: soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966] [snip: duplicated full copy of the initial comment and call trace] -- Comment By: Avi Kivity (avik) Date: 2008-06-04 17:33 Message: Logged In: YES user_id=539971 Originator: NO Did the system recover later? David, how much memory did you assign to the guest?
Re: [PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guest
On Wednesday 04 June 2008 19:49:21 Avi Kivity wrote: Amit Shah wrote: Dynamically unbinding devices is prone to a lot of errors and assumptions, and such a policy shouldn't be enforced. We should either fail the assignment and let the administrator take care of doing the right thing and start the guest, or just not launch the guest at all. Agreed. I was thinking of putting the device in the suspend state, but I checked a few drivers and not all release resources during suspend. Even if this were possible, it becomes an enforced policy that a user may not like. Muli, I changed my views on this a bit: what do you think?
Re: [PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guest
Amit Shah wrote: I was thinking of putting the device in suspend state. However, I checked a few drivers and not all release resources during suspend. However, even if this is possible, it becomes an enforced policy that a user may not like. What happens if there is a real suspend? The right thing is for kvm to claim the device. It's conceptually the right thing; kvm _is_ the device driver for that device, through the guest it is running. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
[ kvm-Bugs-1984384 ] soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966]
Bugs item #1984384, was opened at 2008-06-04 13:49 Message generated for change (Comment added) made by ravpl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1984384&group_id=180599 Summary: soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966] [snip: duplicated full copy of the initial comment and call trace] -- Comment By: Rafal Wijata (ravpl) Date: 2008-06-04 16:58 Message: Logged In: YES user_id=996150 Originator: YES In my case it recovered after a while, ~2-3 minutes
[ kvm-Bugs-1984384 ] soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966]
Bugs item #1984384, was opened at 2008-06-04 05:49 Message generated for change (Comment added) made by dsahern You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1984384&group_id=180599 Summary: soft lockup - CPU#5 stuck for 11s! [qemu-system-x86:4966] [snip: duplicated full copy of the initial comment and call trace] -- Comment By: david ahern (dsahern) Date: 2008-06-04 09:08 Message: Logged In: YES user_id=1755596 Originator: NO My host did not crash, only the guest. I actually was not aware it had gone down until I went to log in. At that point I went digging through syslog to find out when it died (my control scripts log startup and shutdown). The host has not been rebooted, and I have not seen any problems starting guests. The guest that terminated has 2 cpus and 2GB of RAM and runs RHEL3 as the OS. The host has 6 GB of RAM.
Re: [PATCH] Handle vma regions with no backing page
On Tue, 2008-06-03 at 13:39 +0200, Andrea Arcangeli wrote: On Tue, Jun 03, 2008 at 02:17:55PM +0300, Ben-Ami Yassour wrote: Anthony Liguori [EMAIL PROTECTED] wrote on 04/29/2008 05:32:09 PM: Subject [PATCH] Handle vma regions with no backing page This patch allows VMA's that contain no backing page to be used for guest memory. This is a drop-in replacement for Ben-Ami's first page in his direct mmio series. Here, we continue to allow mmio pages to be represented in the rmap.

struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
{
-	return pfn_to_page(gfn_to_pfn(kvm, gfn));
+	pfn_t pfn;
+
+	pfn = gfn_to_pfn(kvm, gfn);
+	if (pfn_valid(pfn))
+		return pfn_to_page(pfn);
+
+	return NULL;
}

We noticed that pfn_valid does not always work as expected by this patch to indicate that a pfn has a backing page. We have seen a case where CONFIG_NUMA was not set and where pfn_valid returned 1 for an mmio pfn. We then changed the config file with CONFIG_NUMA set and it worked fine as expected (since a different implementation of pfn_valid was used). How should we overcome this issue? There's a page_is_ram() too, but that's the e820 map check and it means it's RAM, not that there's a page backing store. Certainly if it's not ram we should go ahead with just the pfn, but it'd be a workaround. I really think it'd be better off to fix pfn_valid to work for NUMA. It does work for NUMA, it does not work without the NUMA option. I can't see how pfn_valid can be ok to return true when there's no backing page... Probably pfn_valid was used for debugging to date, but if you check vm_normal_page you'll see that it is not used just for debugging, and it seems VM_MIXEDMAP will break as much as KVM. I can't see how VM_MIXEDMAP can be sane doing pfn_to_page(pfn) and pretending this is a normal page, when there's no 'struct page' backing the pfn. 
Re: [PATCH 0/5] kvm: Batch writes to MMIO
Laurent Vivier wrote: When the kernel has to send MMIO writes to userspace, it stores them in memory until it has to return to userspace for another reason. This avoids having too many context switches on operations that can wait. These patches introduce an ioctl() to define the MMIO zones allowed to be coalesced. This is the kernel part of the coalesced MMIO functionality. Applied all, thanks. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [PATCH 1/1] KVM/userspace: Support for assigning PCI devices to guest
On Wednesday 04 June 2008 20:23:53 Avi Kivity wrote: Amit Shah wrote: I was thinking of putting the device in suspend state. However, I checked a few drivers and not all release resources during suspend. However, even if this is possible, it becomes an enforced policy that a user may not like. What happens if there is a real suspend? The right thing is for kvm to claim the device. It's conceptually the right thing; kvm _is_ the device driver for that device, through the guest it is running. Agreed.
Re: [PATCH] Handle vma regions with no backing page
Avi Kivity wrote: Looks like we need to reintroduce a refcount bit in the pte, and check the page using the VMA. I don't think mucking with the VMA is going to help. We're already using the VMA to determine that the region is MMIO. What we need to be able to do is figure out, given a PFN, whether that PFN is an MMIO page or not. Really what we're looking for is whether we have to release a reference to the page. I think it would be sufficient to change kvm_release_pfn_clean() to something like:

void kvm_release_pfn_clean(pfn_t pfn)
{
	struct page *page;

	if (!pfn_valid(pfn))
		return;

	page = pfn_to_page(pfn);
	if (page_count(page))
		put_page(page);
}

A couple other places need updating (like kvm_set_pfn_dirty()), but I think the general idea would work. Regards, Anthony Liguori
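As a sanity check, the guarded release above can be modeled in plain userspace C. This is only a model of the control flow, not kernel code: the model_* names stand in for pfn_valid()/pfn_to_page()/page_count()/put_page(), and the sizes are invented.

```c
/* Userspace model of the guarded release: only pfns that have a backing
 * struct page (and a non-zero refcount) get the put_page() treatment.
 * Everything here is a hypothetical stand-in for the kernel API. */

struct model_page { int count; };

#define MODEL_NPAGES 16
static struct model_page model_mem_map[MODEL_NPAGES];

static int model_pfn_valid(unsigned long pfn)
{
	return pfn < MODEL_NPAGES;	/* FLATMEM-style range check */
}

static int put_calls;			/* how often put_page() would run */

void model_release_pfn_clean(unsigned long pfn)
{
	struct model_page *page;

	if (!model_pfn_valid(pfn))
		return;			/* mmio pfn: no struct page to drop */

	page = &model_mem_map[pfn];
	if (page->count) {		/* zero count: reserved page, skip */
		page->count--;
		put_calls++;
	}
}
```

The model shows why this works: an mmio pfn either fails the validity check or (when a struct page exists but was never refcounted) fails the count check, so neither path underflows a reference count.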
Re: [PATCH] Clear CR4.VMXE in hardware_disable
On Wed, 4 Jun 2008, Avi Kivity wrote: Eli Collins wrote: Clear CR4.VMXE in hardware_disable. There's no reason to leave it set after doing a VMXOFF. VMware Workstation 6.5 checks CR4.VMXE as a proxy for whether the CPU is in VMX mode, so leaving VMXE set means we'll refuse to power on. With this change the user can power on after unloading the kvm-intel module. I tested on kvm-67 and kvm-69. Applied, thanks. The patch was whitespace-mangled though. Did you cut'n'paste into your email client? Thanks for the heads up. It's probably pine; I need to change some config options. I redirected the output of git diff to a file and then used pine's Read File to include it. Btw, maybe of interest: I also tested that Workstation and kvm-69 play nicely on SVM; you can run both simultaneously w/o issue (which was expected). Thanks, Eli
Re: [PATCH] Handle vma regions with no backing page
On Wed, Jun 04, 2008 at 06:09:24PM +0300, Ben-Ami Yassour1 wrote: We noticed that pfn_valid does not always work as expected by this patch to indicate that a pfn has a backing page. We have seen a case where CONFIG_NUMA was not set and where pfn_valid returned 1 for an mmio pfn. We then changed the config file with CONFIG_NUMA set and it worked fine as expected (since a different implementation of pfn_valid was used). How should we overcome this issue? There's a page_is_ram() too, but that's the e820 map check and it means it's RAM, not that there's a page backing store. Certainly if it's not ram we should go ahead with just the pfn, but it'd be a workaround. I really think it'd be better off to fix pfn_valid to work for NUMA. It does work for NUMA, it does not work without the NUMA option. Andrea, how would you suggest to fix pfn_valid for the CONFIG_NUMA disabled case? Cheers, Muli
Re: [PATCH] Handle vma regions with no backing page
On Wed, Jun 04, 2008 at 07:17:55PM +0300, Muli Ben-Yehuda wrote: On Wed, Jun 04, 2008 at 06:09:24PM +0300, Ben-Ami Yassour1 wrote: We noticed that pfn_valid does not always work as expected by this patch to indicate that a pfn has a backing page. We have seen a case where CONFIG_NUMA was not set and where pfn_valid returned 1 for an mmio pfn. We then changed the config file with CONFIG_NUMA set and it worked fine as expected (since a different implementation of pfn_valid was used). How should we overcome this issue? There's a page_is_ram() too, but that's the e820 map check and it means it's RAM, not that there's a page backing store. Certainly if it's not ram we should go ahead with just the pfn, but it'd be a workaround. I really think it'd be better off to fix pfn_valid to work for NUMA. It does work for NUMA, it does not work without the NUMA option. Andrea, how would you suggest to fix pfn_valid for the CONFIG_NUMA disabled case? I'm very surprised that this is broken for non-numa. To be sure I understand, what exactly does it mean that a pfn has no backing page? I think we must fix pfn_valid, so that when it returns true, pfn_to_page returns an entry that is contained inside some mem_map array allocated by mm/page_alloc.c. pfn_valid is totally right to return true on mmio regions, as long as a 'struct page' exists. Like for example the 640k-1M area. It'd be a waste to pay the price of a discontiguous array to save a few struct pages (or perhaps these days they're inefficiently using 4k tlb entries to map struct page, dunno; I surely prefer to waste a few struct pages and stay with 2M tlb). As long as a 'struct page' exists and pfn_to_page returns a 'struct page', checking PageReserved() should be enough to know if it's pageable RAM owned by the VM or an mmio region. I don't know why, but when I did the reserved-ram patch, the PageReserved check wasn't returning true on the reserved-ram. Perhaps it's because it was ram, dunno, so I had to use the page_count() hack. 
But for mmio, !pfn_valid || PageReserved should work. That said, this mem_map code tends to change all the time for whatever reason, so I wouldn't be shocked if PageReserved doesn't actually work either. But mixing pfn_valid and PageReserved sounds like the right objective.
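The "!pfn_valid || PageReserved" test Andrea proposes can be illustrated with a small userspace model. The constants and model_* helpers are invented for the example; they only mimic the shape of the kernel checks.

```c
#include <stdbool.h>

#define MODEL_END_PFN	0x1000UL	/* pretend RAM ends here */
#define LEGACY_HOLE_PFN	0x00a0UL	/* pretend 640k-1M style hole */

static bool model_pfn_valid(unsigned long pfn)
{
	return pfn < MODEL_END_PFN;	/* a struct page exists for these */
}

static bool model_page_reserved(unsigned long pfn)
{
	return pfn == LEGACY_HOLE_PFN;	/* PG_reserved set on this one */
}

/* mmio when there is no struct page at all, or the page is reserved */
bool model_is_mmio_pfn(unsigned long pfn)
{
	return !model_pfn_valid(pfn) || model_page_reserved(pfn);
}
```

The two-part test covers both cases from the thread: device BARs above end_pfn (no struct page at all) and legacy mmio holes that do have a struct page but are marked reserved.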
Re: [PATCH] Handle vma regions with no backing page
On Wed, Jun 04, 2008 at 02:41:20PM -0500, Anthony Liguori wrote: The pfn does have a backing page. When using CONFIG_FLATMEM, pfn_valid() is simply:

#ifdef CONFIG_FLATMEM
#define pfn_valid(pfn)	((pfn) < end_pfn)
#endif

And this is true, pfn_valid() just indicates whether there is a space in mem_map[], and there certainly is. Note this only happens when there is a valid PFN that's greater than the PCI memory (using 4GB+ of memory). So everything is fine with pfn_valid. The check against end_pfn with flatmem is also the one I looked at while doing the reserved-ram patch. pfn_valid must only signal whether pfn_to_page(pfn) returns garbage or a struct page (you can't call pfn_to_page if pfn_valid is 0). That's all. Dave mentioned that SetPageReserved() doesn't necessarily get called for zones with bad alignment. What does 'bad alignment' mean? Buddy used to require each zone to start at a 1<<MAX_ORDER naturally aligned physical address (any ram before the alignment was wasted). In any case, all 'struct page' where pfn_valid would return true should start with PG_reserved set; if it's not the case it should be fixed, I guess.
Re: [PATCH] Handle vma regions with no backing page
On Wed, 2008-06-04 at 21:51 +0200, Andrea Arcangeli wrote: Dave mentioned that SetPageReserved() doesn't necessarily get called for zones with bad alignment. What does 'bad alignment' mean? Buddy used to require each zone to start at a 1<<MAX_ORDER naturally aligned physical address (any ram before the alignment was wasted). In any case, all 'struct page' where pfn_valid would return true should start with PG_reserved set; if it's not the case it should be fixed, I guess. I was thinking of the case where you have a large sparsemem section size, and a zone which is smaller. Say you have a 1GB SECTION_SIZE and a single 512MB zone. You'll get a true pfn_valid() for the 512MB that isn't in the zone, since the mem_map[] is actually larger than the zone. But, since memmap_init_zone() is the one responsible for setting PageReserved(), I don't think anybody will set PageReserved() on that top 512MB. -- Dave
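Dave's section/zone mismatch can be made concrete with invented numbers (1GB section, 512MB zone, 4k pages). The helper names below are hypothetical; they just model which pfns have a mem_map[] entry versus which ones memmap_init_zone() actually initialized.

```c
/* With 4k pages: a 1GB sparsemem section covers 2^18 pfns, while a
 * 512MB zone covers only 2^17.  pfn_valid() is true for the whole
 * section because mem_map[] spans it, but only pfns inside the zone
 * were initialized (PageReserved() etc. set) by memmap_init_zone(). */

#define SECTION_PFNS	(1UL << 18)	/* 1GB / 4k */
#define ZONE_PFNS	(1UL << 17)	/* 512MB / 4k */

static int section_pfn_valid(unsigned long pfn)
{
	return pfn < SECTION_PFNS;	/* mem_map[] entry exists */
}

static int pfn_initialized(unsigned long pfn)
{
	return pfn < ZONE_PFNS;		/* flags only trustworthy here */
}
```

The problematic range is exactly the pfns where section_pfn_valid() holds but pfn_initialized() does not: the struct page exists, yet its flags were never set, so a PageReserved() check there reads uninitialized state.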
Colour me impressed: 2.6.25, kvm-69, virtio_net
We've been using Xen 3.0 for the past 18-ish months on a couple of boxes here and were quite impressed with it, especially when using the xen-tools package on Debian. But it couldn't run FreeBSD or Windows guests on our existing hardware, so we bought shiny new Opteron 2000-series systems with SVM support. And then the problems began. Xen 3.1/3.2 broke too many things for us (especially in the network setup), and we could not get a stable, working, reliable dom0, let alone any domUs, mainly due to our hardware requiring a newer Linux kernel than 2.6.18 (the RAID controller and NIC weren't supported until 2.6.20). So we started testing KVM, originally with kvm-60 and kernel 2.6.22 on Debian Lenny. Since then, we've moved to kernels 2.6.24 and 2.6.25, along with kvm-69. We've been running Windows XP, FreeBSD, and Debian guests for the past few months with very few problems. Today, I managed to get a couple of Linux guests to load using the virtio drivers in kernel 2.6.25. Colour me impressed! I thought the emulated e1000 interface had good performance, but the network throughput of virtio_net (as tested using iperf) is wire-speed. I was able to saturate a gigabit link using iperf from a guest running with virtio_net! Average throughput was 860+ Mbps, with highs around 980 Mbps. That, combined with how easy it is to manage kvm (I wrote my own management scripts and config file format that is a lot easier to read than the Xen ones), configure networking in the host (done using the distro tools, not some arcane python scripts), and get hardware driver support in the host (standard distro kernels, not ancient xen-specific ones), makes it very hard to find reasons to run Xen. The only reason I can find is if you have hardware that doesn't support VMX/SVM but is supported by kernel 2.6.18, in which case Xen 3.0 works quite nicely (not 3.1 or later). Kudos to the kvm devs, the kernel devs, the qemu devs, and the rest who are involved in making KVM work so well! 
For those interested, I've put some iperf results into the KVM wiki (http://kvm.qumranet.com/kvmwiki/Using_VirtIO_NIC), along with info on the management scripts/config file format I use (http://kvm.qumranet.com/kvmwiki/HowToConfigScript).

Now, if only the FreeBSD port of KVM could be completed, so we could use ZFS in the host instead of LVM. ;)

-- 
Freddie Cash
[EMAIL PROTECTED]
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
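[Editorial sketch] The posts above don't show the exact iperf invocation used for the wire-speed numbers. The classic iperf (v2) pattern for a host/guest throughput test looks like the following; the host address 192.168.0.1, the 60-second duration, and the 10-second report interval are assumptions for illustration, not taken from the report. The commands are echoed rather than executed so the sketch stands alone even where iperf isn't installed:

```shell
# Sketch of an iperf (v2) throughput test between a KVM host and a
# virtio_net guest. The address and timings are hypothetical placeholders.

# On the KVM host, start an iperf server (runs until interrupted):
host_cmd="iperf -s"

# In the guest, run a 60-second client test, reporting every 10 seconds;
# 192.168.0.1 stands in for the host's address on the bridged network:
guest_cmd="iperf -c 192.168.0.1 -t 60 -i 10"

# Print the two commands instead of running them:
echo "host:  $host_cmd"
echo "guest: $guest_cmd"
```

With virtio_net, the bandwidth lines of the client report are what the poster averaged to get the 860+ Mbps figure.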
Re: Colour me impressed: 2.6.25, kvm-69, virtio_net
On Wednesday 04 June 2008 03:51:12 pm you wrote:
> [snip]
> Today, I managed to get a couple Linux guests to load using the virtio
> drivers in kernel 2.6.25. Colour me impressed!
> [snip]
> That, combined with how easy it is to manage kvm (I wrote my own
> management scripts and config file format that is a lot easier to read
> than the Xen ones), configure networking in the host (done using the
> distro tools, not some arcane python scripts), and get hardware driver
> support in the host (standard distro kernels, not ancient xen-specific
> ones), makes it very hard to find reasons to run Xen. The only reason I
> can find is if you have hardware that doesn't support VMX/SVM but is
> supported by kernel 2.6.18, in which case Xen 3.0 works quite nicely
> (not 3.1 or later).
>
> Kudos to the kvm devs, the kernel devs, the qemu devs, and the rest who
> are involved in making KVM work so well!

I agree. I've been really impressed with KVM-69. It has worked very reliably. My story is very similar, except that I was using the free VMware Server (I couldn't justify the price tag for ESX). In short, KVM came to save the day, and I get much better performance than I did with VMware. My setup is a two-node cluster with DRBD and OCFS2. The ability to migrate VMs so quickly is wonderful. I too wrote my own scripts, which I will share in a few months once I'm done fixing bugs.

Yes, KVM is very easy to install, manage and use. It is even better when you write your own scripts. It's wonderful to be able to manage things in a way that best makes sense based on your experience and infrastructure.

Thank you to the KVM team for all of your great work!

-- 
Alberto Treviño
[EMAIL PROTECTED]
[patch] kvm with mmu notifier v18
Hello,

this is an update of the patch to test kvm on mmu notifier v18. I'll post mmu notifier v18 tomorrow after some more review, but I can post the kvm side in the meantime (it works with the previous v17 as well, if anyone wants to test).

This has a relevant fix for kvm_unmap_rmapp: rmap_remove, while deleting the current spte from the desc array, can overwrite the deleted current spte with the last spte in the desc array, in turn reordering it. So if we restart rmap_next from the sptes after the deleted current spte, we may miss the later sptes that have been moved into the slot of the current spte. We have to tear down the whole desc array, so the fix was to simply pick from the first entry and wait for the others to come down.

I also wonder if the update_pte done outside the mmu_lock is safe without mmu notifiers, or if the below changes are required regardless (I think they are). I cleaned up the fix but I probably need to extract it from this patch.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 8d45fab..ce3251c 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -21,6 +21,7 @@ config KVM
 	tristate "Kernel-based Virtual Machine (KVM) support"
 	depends on HAVE_KVM
 	select PREEMPT_NOTIFIERS
+	select MMU_NOTIFIER
 	select ANON_INODES
 	---help---
 	  Support hosting fully virtualized guest machines using hardware
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 491f645..4dee9e5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -651,6 +651,98 @@ static void rmap_write_protect(struct kvm *kvm, u64 gfn)
 		account_shadowed(kvm, gfn);
 }
 
+static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp)
+{
+	u64 *spte;
+	int need_tlb_flush = 0;
+
+	while ((spte = rmap_next(kvm, rmapp, NULL))) {
+		BUG_ON(!(*spte & PT_PRESENT_MASK));
+		rmap_printk("kvm_rmap_unmap_hva: spte %p %llx\n", spte, *spte);
+		rmap_remove(kvm, spte);
+		set_shadow_pte(spte, shadow_trap_nonpresent_pte);
+		need_tlb_flush = 1;
+	}
+	return need_tlb_flush;
+}
+
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+	int i;
+	int need_tlb_flush = 0;
+
+	/*
+	 * If mmap_sem isn't taken, we can look at the memslots with only
+	 * the mmu_lock by skipping over the slots with userspace_addr == 0.
+	 */
+	for (i = 0; i < kvm->nmemslots; i++) {
+		struct kvm_memory_slot *memslot = &kvm->memslots[i];
+		unsigned long start = memslot->userspace_addr;
+		unsigned long end;
+
+		/* mmu_lock protects userspace_addr */
+		if (!start)
+			continue;
+
+		end = start + (memslot->npages << PAGE_SHIFT);
+		if (hva >= start && hva < end) {
+			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+			need_tlb_flush |= kvm_unmap_rmapp(kvm,
+						&memslot->rmap[gfn_offset]);
+		}
+	}
+
+	return need_tlb_flush;
+}
+
+static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp)
+{
+	u64 *spte;
+	int young = 0;
+
+	spte = rmap_next(kvm, rmapp, NULL);
+	while (spte) {
+		int _young;
+		u64 _spte = *spte;
+		BUG_ON(!(_spte & PT_PRESENT_MASK));
+		_young = _spte & PT_ACCESSED_MASK;
+		if (_young) {
+			young = !!_young;
+			set_shadow_pte(spte, _spte & ~PT_ACCESSED_MASK);
+		}
+		spte = rmap_next(kvm, rmapp, spte);
+	}
+	return young;
+}
+
+int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	int i;
+	int young = 0;
+
+	/*
+	 * If mmap_sem isn't taken, we can look at the memslots with only
+	 * the mmu_lock by skipping over the slots with userspace_addr == 0.
+	 */
+	for (i = 0; i < kvm->nmemslots; i++) {
+		struct kvm_memory_slot *memslot = &kvm->memslots[i];
+		unsigned long start = memslot->userspace_addr;
+		unsigned long end;
+
+		/* mmu_lock protects userspace_addr */
+		if (!start)
+			continue;
+
+		end = start + (memslot->npages << PAGE_SHIFT);
+		if (hva >= start && hva < end) {
+			gfn_t gfn_offset = (hva - start) >> PAGE_SHIFT;
+			young |= kvm_age_rmapp(kvm, &memslot->rmap[gfn_offset]);
+		}
+	}
+
+	return young;
+}
+
 #ifdef MMU_DEBUG
 static int is_empty_shadow_page(u64 *spt)
 {
@@ -1203,6 +1295,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, int write, gfn_t gfn)
 	int r;
 	int largepage = 0;
 	pfn_t pfn;
+	int mmu_seq;
 
 	down_read(&current->mm->mmap_sem);
KVM: IOAPIC: don't clear remote_irr if IRQ is reinjected from EOI
There's a bug in the IOAPIC code for level-triggered interrupts. It's relatively easy to trigger by sharing an interrupt line (virtio-blk + USB tablet was the test case, initially reported by Gerd von Egidy).

The remote_irr variable is used to indicate accepted but not yet acked interrupts, and it's cleared from the EOI handler. The problem is that the EOI handler clears remote_irr unconditionally, even if it reinjected another pending interrupt. In that case, kvm_ioapic_set_irq() proceeds to ioapic_service(), which sets remote_irr even if it failed to inject (since the IRR was high due to EOI reinjection). Since the TMR bit has been cleared by the first EOI, the second one fails to clear remote_irr. The end result is a dead interrupt line.

Fix it by setting remote_irr only if a new pending interrupt has been generated (and the TMR bit for the vector in question set).

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 99a1736..92ed191 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -45,7 +45,7 @@
 #else
 #define ioapic_debug(fmt, arg...)
 #endif
-static void ioapic_deliver(struct kvm_ioapic *vioapic, int irq);
+static int ioapic_deliver(struct kvm_ioapic *vioapic, int irq);
 
 static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
 					  unsigned long addr,
@@ -89,8 +89,8 @@ static void ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
 	pent = &ioapic->redirtbl[idx];
 
 	if (!pent->fields.mask) {
-		ioapic_deliver(ioapic, idx);
-		if (pent->fields.trig_mode == IOAPIC_LEVEL_TRIG)
+		int injected = ioapic_deliver(ioapic, idx);
+		if (injected && pent->fields.trig_mode == IOAPIC_LEVEL_TRIG)
 			pent->fields.remote_irr = 1;
 	}
 	if (!pent->fields.trig_mode)
@@ -133,7 +133,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 	}
 }
 
-static void ioapic_inj_irq(struct kvm_ioapic *ioapic,
+static int ioapic_inj_irq(struct kvm_ioapic *ioapic,
 			   struct kvm_vcpu *vcpu,
 			   u8 vector, u8 trig_mode, u8 delivery_mode)
 {
@@ -143,7 +143,7 @@ static void ioapic_inj_irq(struct kvm_ioapic *ioapic,
 	ASSERT((delivery_mode == IOAPIC_FIXED) ||
 	       (delivery_mode == IOAPIC_LOWEST_PRIORITY));
 
-	kvm_apic_set_irq(vcpu, vector, trig_mode);
+	return kvm_apic_set_irq(vcpu, vector, trig_mode);
 }
 
 static void ioapic_inj_nmi(struct kvm_vcpu *vcpu)
@@ -191,7 +191,7 @@ static u32 ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
 	return mask;
 }
 
-static void ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
+static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 {
 	u8 dest = ioapic->redirtbl[irq].fields.dest_id;
 	u8 dest_mode = ioapic->redirtbl[irq].fields.dest_mode;
@@ -200,7 +200,7 @@ static void ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 	u8 trig_mode = ioapic->redirtbl[irq].fields.trig_mode;
 	u32 deliver_bitmask;
 	struct kvm_vcpu *vcpu;
-	int vcpu_id;
+	int vcpu_id, r = 0;
 
 	ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
 		     "vector=%x trig_mode=%x\n",
@@ -209,7 +209,7 @@ static void ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 	deliver_bitmask = ioapic_get_delivery_bitmask(ioapic, dest, dest_mode);
 	if (!deliver_bitmask) {
 		ioapic_debug("no target on destination\n");
-		return;
+		return 0;
 	}
 
 	switch (delivery_mode) {
@@ -221,7 +221,7 @@ static void ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 		vcpu = ioapic->kvm->vcpus[0];
 #endif
 		if (vcpu != NULL)
-			ioapic_inj_irq(ioapic, vcpu, vector,
+			r = ioapic_inj_irq(ioapic, vcpu, vector,
 				       trig_mode, delivery_mode);
 		else
 			ioapic_debug("null lowest prio vcpu: "
@@ -239,7 +239,7 @@ static void ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 			deliver_bitmask &= ~(1 << vcpu_id);
 			vcpu = ioapic->kvm->vcpus[vcpu_id];
 			if (vcpu) {
-				ioapic_inj_irq(ioapic, vcpu, vector,
+				r = ioapic_inj_irq(ioapic, vcpu, vector,
 					       trig_mode, delivery_mode);
 			}
 		}
@@ -262,6 +262,7 @@ static void ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 				 delivery_mode);
 		break;
 	}
+	return r;
 }
 
 void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level)
Re: kvm: unable to handle kernel NULL pointer dereference
* Tobias Diedrich ([EMAIL PROTECTED]) wrote:
> BUG: unable to handle kernel NULL pointer dereference at 0008
> IP: [8021d44f] svm_vcpu_run+0x34/0x351
> PGD 7e01b067 PUD 7bc86067 PMD 0
> Oops: [1] PREEMPT
> CPU 0
> Modules linked in: zaurus cdc_ether usbnet snd_hda_intel k8temp radeon drm
>  snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul
>  snd_emu10k1 snd_seq_midi snd_rawmidi snd_ac97_codec ac97_bus snd_util_mem
>  forcedeth emu10k1_gp gameport snd_hwdep pata_amd [last unloaded: snd_hda_intel]
> Pid: 3, comm: kvm Tainted: GW 2.6.26-rc4 #29
> RIP: 0010:[8021d44f] [8021d44f] svm_vcpu_run+0x34/0x351
> RSP: 0018:81007866fc38 EFLAGS: 00010046
> RAX: 810076d42040 RBX: fffc RCX:
> RDX: 810076d42040 RSI: 810079b41000 RDI: 810076d42040
> RBP: 81007866fc88 R08: 0002 R09: 0001
> R10: 804237e5 R11: 81007866fc88 R12: 810076d42040
> R13: R14: 810079b41000 R15: ae80
> FS: 419b1950(0063) GS:808bc000() knlGS:f712b6c0
> CS: 0010 DS: ES: CR0: 8005003b
> CR2: 0008 CR3: 79b8d000 CR4: 06e0
> DR0: DR1: DR2: DR3:
> DR6: 0ff0 DR7: 0400
> Process kvm (pid: 3, threadinfo 81007866e000, task 810019db8300)
> Stack: 81007866fc68 810076d42040 810076d42040 81007bc600a8
>  810076d42040 fffc 810076d42040 810079b41000
>  ae80 81007866fcc8 8020fa41
> Call Trace:
>  [8020fa41] kvm_arch_vcpu_ioctl_run+0x46a/0x6df
>  [8020ab98] kvm_vcpu_ioctl+0xfd/0x3d0
>  [80293df1] ? kmem_cache_free+0x6e/0x81
>  [8024cf89] ? __dequeue_signal+0x1c/0x167
>  [802a322e] vfs_ioctl+0x2a/0x77
>  [802a34d6] do_vfs_ioctl+0x25b/0x270
>  [802a352d] sys_ioctl+0x42/0x65
>  [8021fffb] system_call_after_swapgs+0x7b/0x80
> Code: 55 41 54 53 48 83 ec 28 48 89 7d b8 48 8b 87 50 15 00 00 48 8b 0d ba
>  9c 6f 00 c6 40 5c 00 48 8b 45 b8 83 b8 a0 00 00 00 00 75 0d 48 8b 51 08
>  48 39 90 68 15 00 00 74 4f 8b 41 14 3b 41 10 76 1a
> RIP [8021d44f] svm_vcpu_run+0x34/0x351

Odd, svm_data is NULL, so svm_data->asid_generation is oopsing.
static void pre_svm_run(struct vcpu_svm *svm)
{
	int cpu = raw_smp_processor_id();

	struct svm_cpu_data *svm_data = per_cpu(svm_data, cpu);

	svm->vmcb->control.tlb_ctl = TLB_CONTROL_DO_NOTHING;
	if (svm->vcpu.cpu != cpu ||
	    svm->asid_generation != svm_data->asid_generation)   <--- here ---
		new_asid(svm, svm_data);
}

Doesn't really make any sense to find svm_data == NULL, since it's allocated during module init (or boot, in this case). If that allocation had failed, you shouldn't ever get as far as vcpu_run. I'm assuming that:

  gdb -q vmlinux
  (gdb) p/x 0x8021d456 + 0x6f9cba

is the same as:

  (gdb) p/x &per_cpu__svm_data

Otherwise, it seems a bit like memory corruption (doesn't happen here w/ your .config).

thanks,
-chris
[patch 0/2] virtio-blk async IO (v3)
Resending the virtio-blk async patches, now that the reason for Gerd's hangs is known.

The results below are on host hot-cached data; cold-cache workloads are the real winners, since the current code waits until each read request is finished before submitting the next one.

ide:

Version  1.03      ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.locald 2G 36618  88 107274  31 80361  49 44311  98 292436  94 + +++

virtio-blk:

Version  1.03      ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.locald 2G 39738  89 108389  31 82313  48 43943  98 290500  94 + +++

virtio-blk-async:

Version  1.03      ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
localhost.locald 2G 40781  92 102806  34 86887  36 44339  97 347461  78 + +++
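[Editorial sketch] The figures in this post are bonnie++ 1.03 output with a 2 GB working set. The post doesn't show the invocation, so the following is a hedged reconstruction: the target directory and user are placeholders, not from the original report. The command is echoed rather than run so the sketch is self-contained:

```shell
# Sketch of a bonnie++ run matching the 2G working-set results above.
# /mnt/guestdisk and the "nobody" user are hypothetical placeholders.
#
#   -d : directory on the virtio-blk (or IDE) backed filesystem to test
#   -s : working-set size; commonly chosen >= 2x guest RAM to defeat caching
#   -u : user to run as when bonnie++ is invoked as root
bonnie_cmd="bonnie++ -d /mnt/guestdisk -s 2G -u nobody"

echo "$bonnie_cmd"
```

Running the same invocation in a guest backed by each of the three disk configurations (ide, virtio-blk, virtio-blk-async) would reproduce the comparison; the "+ +++" entries in the seeks column mean the run finished too quickly for bonnie++ to report a meaningful figure.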
[patch 1/2] QEMU/KVM: provide a reset method for virtio
So drivers can do whatever is necessary on reset.

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

Index: kvm-userspace.vblk/qemu/hw/virtio.c
===================================================================
--- kvm-userspace.vblk.orig/qemu/hw/virtio.c
+++ kvm-userspace.vblk/qemu/hw/virtio.c
@@ -207,6 +207,9 @@ void virtio_reset(void *opaque)
     VirtIODevice *vdev = opaque;
     int i;
 
+    if (vdev->reset)
+        vdev->reset(vdev);
+
     vdev->features = 0;
     vdev->queue_sel = 0;
     vdev->status = 0;
Index: kvm-userspace.vblk/qemu/hw/virtio.h
===================================================================
--- kvm-userspace.vblk.orig/qemu/hw/virtio.h
+++ kvm-userspace.vblk/qemu/hw/virtio.h
@@ -116,6 +116,7 @@ struct VirtIODevice
     uint32_t (*get_features)(VirtIODevice *vdev);
     void (*set_features)(VirtIODevice *vdev, uint32_t val);
     void (*update_config)(VirtIODevice *vdev, uint8_t *config);
+    void (*reset)(VirtIODevice *vdev);
     VirtQueue vq[VIRTIO_PCI_QUEUE_MAX];
 };