Re: [kvm-devel] [Qemu-devel] [PATCH 1/5] PCI DMA API (v3)
On 4/19/08, Anthony Liguori [EMAIL PROTECTED] wrote: Blue Swirl wrote: On 4/17/08, Anthony Liguori [EMAIL PROTECTED] wrote: Yes, the vector version of packet receive is tough. I'll take a look at your patch. Basically, you need to associate a set of RX vectors with each VLANClientState and then when it comes time to deliver a packet to the VLAN, before calling fd_read, see if there is an RX vector available for the client. In the case of tap, I want to optimize further and do the initial readv() to one of the client's RX buffers and then copy that RX buffer to the rest of the clients if necessary. The vector versions should also help SLIRP to add IP and Ethernet headers to the incoming packets. Yeah, I'm hoping that with my posted linux-aio interface, I can add vector support since linux-aio has a proper asynchronous vector function. Are we happy with the DMA API? If so, we should commit it now so we can start adding proper vector interfaces for net/block. Well, the IOVector part and bdrv_readv look OK, except for the heavy mallocing involved. I'm not so sure about the DMA side and how everything fits together for zero-copy IO. For example, do we still need explicit translation at some point? - This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
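The tap receive path described in the message above (one readv() into the first client that has posted RX buffers, then a copy into the remaining clients) could look roughly like the sketch below. It is only an illustration under stated assumptions: `struct rx_client`, `vec_copy()` and `deliver_packet()` are hypothetical names, not QEMU's VLANClientState or any function from the patch series; only readv() and struct iovec are real POSIX interfaces.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical per-client receive state (not QEMU's VLANClientState). */
struct rx_client {
    struct iovec *rx_vec;   /* RX buffers posted by the client, if any */
    int rx_cnt;             /* number of entries in rx_vec, 0 if none */
};

/* Copy up to len bytes from one scatter list into another.
 * Returns the number of bytes actually copied. */
static size_t vec_copy(const struct iovec *dst, int dst_cnt,
                       const struct iovec *src, int src_cnt, size_t len)
{
    size_t copied = 0, doff = 0, soff = 0;
    int d = 0, s = 0;

    while (copied < len && d < dst_cnt && s < src_cnt) {
        size_t dleft = dst[d].iov_len - doff;
        size_t sleft = src[s].iov_len - soff;
        size_t n = len - copied;

        if (n > dleft)
            n = dleft;
        if (n > sleft)
            n = sleft;
        memcpy((char *)dst[d].iov_base + doff,
               (const char *)src[s].iov_base + soff, n);
        copied += n;
        doff += n;
        soff += n;
        if (doff == dst[d].iov_len) {   /* destination segment is full */
            d++;
            doff = 0;
        }
        if (soff == src[s].iov_len) {   /* source segment is drained */
            s++;
            soff = 0;
        }
    }
    return copied;
}

/* Read one packet from fd straight into the first client with an RX
 * vector, then copy it to every later client that also has one.
 * Returns the packet length, or -1 if no client has posted buffers
 * (the caller would then fall back to a flat-buffer path). */
static ssize_t deliver_packet(int fd, struct rx_client *clients, int nclients)
{
    int first = -1;
    ssize_t n;

    for (int i = 0; i < nclients; i++) {
        if (clients[i].rx_cnt > 0) {
            first = i;
            break;
        }
    }
    if (first < 0)
        return -1;

    n = readv(fd, clients[first].rx_vec, clients[first].rx_cnt);
    if (n <= 0)
        return n;

    for (int i = first + 1; i < nclients; i++) {
        if (clients[i].rx_cnt > 0)
            vec_copy(clients[i].rx_vec, clients[i].rx_cnt,
                     clients[first].rx_vec, clients[first].rx_cnt, (size_t)n);
    }
    return n;
}
```

The point of the design is that the common single-client case costs exactly one readv() and no extra copy; the copy loop only runs when more than one client on the VLAN has buffers available.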
Re: [kvm-devel] [PATCH] [POWERPC KVM] Kconfig fixes
Hollis Blanchard wrote: 1 file changed, 5 insertions(+), 6 deletions(-) arch/powerpc/kvm/Kconfig | 11 +-- Don't allow building as a module (asm-offsets dependencies). Also, automatically select KVM_BOOKE_HOST until we better separate the guest and host layers. Applied, thanks. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [PATCH] [QEMU POWERPC] FPRs no longer live in kvm_vcpu
Hollis Blanchard wrote: Applied, thanks. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] QEMU/KVM: clear HF_HALTED mask at vcpu startup time
Marcelo Tosatti wrote: Now that threads are spun up before machine-init(), clearing of HF_HALTED_MASK for the irqchip-in-kernel case needs to be moved to actual vcpu startup. Applied, thanks. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2
Anthony Liguori wrote: I'd prefer you not do an emulate_instruction loop at all. Just emulate one instruction on vmentry failure and let VT tell you what instructions you need to emulate. It's only four instructions so I don't think the performance is going to matter. Take a look at the patch I posted previously. Once we remove the other VT realmode hacks, we may need more instructions emulated. Consider for example changing to real mode without reloading fs and gs; this will cause all real mode code to be emulated. However, there's no need to do everything at once; the loop can certainly be added later when we have a proven need for it. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] KVM: MMU: kvm_pv_mmu_op should not take mmap_sem
Marcelo Tosatti wrote: kvm_pv_mmu_op should not take mmap_sem. All gfn_to_page() callers down in the MMU processing will take it if necessary, so as it is it can deadlock. Apparently a leftover from the days before slots_lock. Applied, thanks. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [PATCH 1/1] Enble a guest to access a device's memory mapped I/O regions directly.
Muli Ben-Yehuda wrote: Why avoid rmap on mmio pages? Sure it's unnecessary work, but having fewer cases improves overall reliability. The rmap functions already have a check to bail out if the pte is not an rmap pte, so in that sense, we aren't adding a new case for the code to handle, just adding direct MMIO ptes to the existing list of non-rmap ptes. I'm worried about the huge chain of direct_mmio parameters passed to functions, the impact on the audit code (at the end of mmu.c), and the poor souls who debug the mmu. You can use pfn_valid() in gfn_to_pfn() and kvm_release_pfn_*() to conditionally update the page refcounts. Since rmap isn't useful for direct MMIO ptes, doesn't it make more sense to bail out early rather than in the bowels of the rmap code? It does, from a purist point of view (which also favors explicit parameters a la direct_mmio rather than indirect parameters like pfn_valid()), but I'm looking from the practical point of view now. With mmu notifiers, we don't need to hold the refcount at all. So presuming we drop the refcounting code completely, are any changes actually necessary here? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] Ubuntu Gutsy host / XP guest / -smp 2
David Abrahams wrote: Versions of kvm producing this sort of output are common in archaeological digs. Please try a more recent release. Well, I'll try Hardy Heron soon enough, I suppose. It's due out in 2 weeks. I'm sure you understand that most people can't afford to rebuild all their important software so that it stays on the bleeding edge. Have you considered getting more recent versions of kvm into the updates or backports repositories of major distros? I'm not really sure how much influence you can have over such things; I'm just asking. That's up to the distro maintainers, or concerned users (who may either volunteer work or apply pressure). What HAL do you see in device manager? Standard PC This HAL does not support SMP. You need the ACPI Multiprocessor PC HAL or some such. And how would I get that HAL set up? Follow http://kvm.qumranet.com/kvmwiki/Windows_ACPI_Workaround, substituting your desired HAL for Standard PC. Unless you have a recent Intel processor, the combination of SMP and Windows XP will give noticeably lower performance. I recommend sticking with uniprocessor in such cases. I have a Core Duo; isn't that recent enough? No, this feature is present only on some of the Core 2s, IIRC. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: [kvm-devel] [patch 00/13] RFC: split the global mutex
Marcelo Tosatti wrote: Introduce QEMUDevice, making the ioport/iomem-device relationship visible. At the moment it only contains a lock, but could be extended. With it the following is possible: - vcpus can read/write via ioports/iomem while the iothread is working on some unrelated device, or just copying data from the kernel. - vcpus can read/write via ioports/iomem to different devices simultaneously. This patchset is only a proof of concept kind of thing, so only serial+raw image are supported. Tried two benchmarks, iperf and tiobench. With tiobench the reported latency is significantly lower (20%+), but throughput with IDE is only slightly higher. Expect to see larger improvements with a higher performing IO scheme (SCSI still buggy, looking at it). The iperf numbers are pretty good. Performance of UP guests increases slightly but SMP is quite significant. I expect you're seeing contention induced by memcpy()s and inefficient emulation. With the dma api, I expect the benefit will drop. Note that workloads with multiple busy devices (such as databases, web servers) should be the real winners. What is the feeling on this? It's not _that_ intrusive and can be easily NOP'ed out for QEMU. I think many parts are missing (or maybe, I missed them). You need to lock the qemu internals (there are many read-mostly qemu caches scattered around the code), lock against hotplug, etc. For pure cpu emulation, there is a ton of work to be done: protecting the translator as well as making the translated code smp safe. I think that QemuDevice makes sense, and that we want this long term, but that we first need to improve efficiency (which reduces cpu utilization _and_ improves scalability) rather than look at scalability alone (which is much harder in addition to the drawback of not reducing cpu utilization). -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
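To make the per-device locking idea from the message above concrete, here is a minimal sketch of a device that carries its own lock, so that accesses to two different devices never serialize on one global mutex. This is not the code from the patchset: `struct qemu_device_sketch`, its register array and the three helper functions are hypothetical, and only the pthread mutex calls are real APIs.

```c
#include <pthread.h>
#include <stdint.h>

#define SKETCH_NREGS 8

/* Hypothetical stand-in for QEMUDevice: for now it carries only a lock
 * plus some register state to protect. */
struct qemu_device_sketch {
    pthread_mutex_t lock;
    uint8_t regs[SKETCH_NREGS];
};

static int device_init(struct qemu_device_sketch *dev)
{
    for (int i = 0; i < SKETCH_NREGS; i++)
        dev->regs[i] = 0;
    return pthread_mutex_init(&dev->lock, NULL);
}

/* A vcpu thread writing an ioport takes only this device's lock, so the
 * iothread (or another vcpu) can work on an unrelated device meanwhile. */
static void device_ioport_write(struct qemu_device_sketch *dev,
                                unsigned reg, uint8_t val)
{
    pthread_mutex_lock(&dev->lock);
    dev->regs[reg % SKETCH_NREGS] = val;
    pthread_mutex_unlock(&dev->lock);
}

static uint8_t device_ioport_read(struct qemu_device_sketch *dev, unsigned reg)
{
    uint8_t val;

    pthread_mutex_lock(&dev->lock);
    val = dev->regs[reg % SKETCH_NREGS];
    pthread_mutex_unlock(&dev->lock);
    return val;
}
```

As the reply in the thread points out, a lock per device is only part of the picture: shared qemu internals, hotplug and the translator would each need their own protection.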
Re: [kvm-devel] [PATCH 3/6] KVM: MMU: Add EPT support
On Friday 18 April 2008 23:54:04 Anthony Liguori wrote: Yang, Sheng wrote: On Friday 18 April 2008 21:30:14 Anthony Liguori wrote: Yang, Sheng wrote:
@@ -1048,17 +1071,18 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
  * whether the guest actually used the pte (in order to detect
  * demand paging).
  */
- spte = PT_PRESENT_MASK | PT_DIRTY_MASK;
+ spte = shadow_base_present_pte | shadow_dirty_mask;
  if (!speculative)
  pte_access |= PT_ACCESSED_MASK;
  if (!dirty)
  pte_access &= ~ACC_WRITE_MASK;
- if (!(pte_access & ACC_EXEC_MASK))
- spte |= PT64_NX_MASK;
-
- spte |= PT_PRESENT_MASK;
+ if (pte_access & ACC_EXEC_MASK) {
+ if (shadow_x_mask)
+ spte |= shadow_x_mask;
+ } else if (shadow_nx_mask)
+ spte |= shadow_nx_mask;
This looks like it may be a bug. The old behavior sets NX if (pte_access & ACC_EXEC_MASK). The new behavior unconditionally sets NX and never sets PRESENT. Also, the if (shadow_x_mask) checks are unnecessary. spte |= 0 is a nop.
Thanks for the comment! I realize the two checks of shadow_nx/x_mask are unnecessary... In fact, the correct behavior is to set either shadow_x_mask or shadow_nx_mask; maybe there is a better approach for this. Logic that is assured by the program itself is always safer. But I will remove the redundant code first. But I don't think it's a bug. The old behavior set NX if (!(pte_access & ACC_EXEC_MASK)), the same as the new one.
The new behavior sets NX regardless of whether (pte_access & ACC_EXEC_MASK). Is the desired change to unconditionally set NX?
Oh, I may see the point... shadow_x_mask != shadow_nx_mask. The old behavior was:
if (!(pte_access & ACC_EXEC_MASK)) spte |= PT64_NX_MASK;
The new behavior is:
if (pte_access & ACC_EXEC_MASK) { spte |= shadow_x_mask; } else spte |= shadow_nx_mask;
For the current behavior, kvm_arch_init() has:
kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK, PT_DIRTY_MASK, PT64_NX_MASK, 0);
which means shadow_nx_mask = PT64_NX_MASK, and shadow_x_mask = 0 (NX means not executable, and X means executable). In patch 5/6, EPT has:
kvm_mmu_set_mask_ptes(0ull, VMX_EPT_FAKE_ACCESSED_MASK, VMX_EPT_FAKE_DIRTY_MASK, 0ull, VMX_EPT_EXECUTABLE_MASK);
which means shadow_nx_mask = 0, and shadow_x_mask = VMX_EPT_EXECUTABLE_MASK. So, when shadow paging is enabled and (!(pte_access & ACC_EXEC_MASK)), then spte |= shadow_nx_mask = PT64_NX_MASK (no change happens when the condition is not satisfied). When EPT is enabled and (pte_access & ACC_EXEC_MASK), then spte |= shadow_x_mask = VMX_EPT_EXECUTABLE_MASK (no change happens when the condition is not satisfied). They are two different bits, and mutually exclusive ones. Maybe there is a better way to make their meaning clearer... I am also curious about the PRESENT bit. You see, the PRESENT bit was set at the beginning of the code, and I really don't know why the duplicate one exists there...
Looking at the code, you appear to be right. In the future, I think you should separate any cleanups (like removing the redundant setting of PRESENT) into a separate patch and stick to just programmatic changes of PT_USER_MASK = shadow_user_mask, etc. in this patch. That makes it a lot easier to review correctness.
Thanks for the advice, it's important to separate the cleanups. I will get it done more properly next time. -- Thanks Yang, Sheng
Regards, Anthony Liguori
  if (pte_access & ACC_USER_MASK)
- spte |= PT_USER_MASK;
+ spte |= shadow_user_mask;
  if (largepage)
  spte |= PT_PAGE_SIZE_MASK;
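The point the thread converges on, that the same branch yields "set NX when not executable" under shadow paging and "set X when executable" under EPT, can be checked with a small standalone sketch. The names below (`struct shadow_masks`, `apply_exec_bits`) are illustrative and not from the patch, and the bit positions are stated as assumptions: NX as bit 63 of an x86 long-mode PTE, the EPT execute permission as bit 2, and an access-flag value of 1 for ACC_EXEC_MASK.

```c
#include <stdint.h>

/* Assumed illustrative values, not copied from the kernel headers. */
#define SKETCH_ACC_EXEC_MASK   1u            /* "guest may execute" access flag */
#define SKETCH_PT64_NX_MASK    (1ULL << 63)  /* NX bit of a 64-bit x86 PTE */
#define SKETCH_EPT_EXEC_MASK   0x4ULL        /* execute permission in an EPT entry */

/* The pair of masks configured once per paging mode. */
struct shadow_masks {
    uint64_t x_mask;   /* OR-ed in when the page is executable */
    uint64_t nx_mask;  /* OR-ed in when the page is not executable */
};

/* Shadow paging: "executable" is the absence of NX, so x_mask is 0. */
static const struct shadow_masks shadow_paging_masks = {
    0, SKETCH_PT64_NX_MASK
};

/* EPT: "executable" is an explicit bit, so nx_mask is 0. */
static const struct shadow_masks ept_masks = {
    SKETCH_EPT_EXEC_MASK, 0
};

/* Same branch as the new code in the hunk, without the redundant
 * non-zero checks: OR-ing in a zero mask is a no-op. */
static uint64_t apply_exec_bits(uint64_t spte, unsigned pte_access,
                                const struct shadow_masks *m)
{
    if (pte_access & SKETCH_ACC_EXEC_MASK)
        spte |= m->x_mask;
    else
        spte |= m->nx_mask;
    return spte;
}
```

With the shadow-paging masks only the non-executable case changes the spte, and with the EPT masks only the executable case does, which matches the explanation given in the reply.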
Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/3] Refactor AIO interface to allow other AIO implementations
Avi Kivity wrote: For the majority of deployments posix aio should be sufficient. The few that need something else can use Linux aio. Does that mean that for the majority of deployments, the slow version is sufficient, and the few that care about performance can use Linux AIO? I'm under the impression that the entire and only point of Linux AIO is that it's faster than POSIX AIO on Linux. Of course, a managed environment can use Linux aio unconditionally if it knows the kernel has all the needed goodies. Does that mean a managed environment can have some code which checks the host kernel version + filesystem type holding the VM image, to conditionally enable Linux AIO? (Since if you care about performance, which is the sole reason for using Linux AIO, you wouldn't want to enable Linux AIO on any host in your cluster where it will trash performance.) Just wondering. Thanks, -- Jamie
Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/3] Refactor AIO interface to allow other AIO implementations
Jamie Lokier wrote: Avi Kivity wrote: For the majority of deployments posix aio should be sufficient. The few that need something else can use Linux aio. Does that mean that for the majority of deployments, the slow version is sufficient, and the few that care about performance can use Linux AIO? In essence, yes. s/slow/slower/ and s/performance/ultimate block device performance/. Many deployments don't care at all about block device performance; they care mostly about networking performance. I'm under the impression that the entire and only point of Linux AIO is that it's faster than POSIX AIO on Linux. It is. I estimate posix aio adds a few microseconds above linux aio per I/O request, when using O_DIRECT. Assuming 10 microseconds, you will need 10,000 I/O requests per second per vcpu to have a 10% performance difference. That's definitely rare. Of course, a managed environment can use Linux aio unconditionally if it knows the kernel has all the needed goodies. Does that mean a managed environment can have some code which checks the host kernel version + filesystem type holding the VM image, to conditionally enable Linux AIO? (Since if you care about performance, which is the sole reason for using Linux AIO, you wouldn't want to enable Linux AIO on any host in your cluster where it will trash performance.) Either that, or mandate that all hosts use a filesystem and kernel which provide the necessary performance. Take ovirt for example, which provides the entire hypervisor environment, and so can guarantee this. Also, I'd presume that those that need 10K IOPS and above will not place their high throughput images on a filesystem; rather on a separate SAN LUN. Just wondering. Hope this clarifies. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
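The 10% figure in the reply above follows from multiplying the assumed extra cost per request by the request rate. Written out, with the symbols $t_{\text{extra}}$ (extra latency per request) and $r$ (requests per second per vcpu) introduced here only for illustration, and the 10 microsecond value taken as the assumption stated in the message:

```latex
\[
  \text{overhead fraction}
  = t_{\text{extra}} \cdot r
  = 10\,\mu\text{s} \times 10{,}000\,\text{s}^{-1}
  = 100{,}000\,\mu\text{s per second}
  = 0.1
  = 10\%
\]
```

At 1,000 requests per second the same assumed cost gives 1%, which is why the difference only matters for the heaviest block I/O workloads.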
Re: [kvm-devel] [patch 00/13] RFC: split the global mutex
On Sun, Apr 20, 2008 at 02:16:52PM +0300, Avi Kivity wrote: The iperf numbers are pretty good. Performance of UP guests increases slightly but SMP is quite significant. I expect you're seeing contention induced by memcpy()s and inefficient emulation. With the dma api, I expect the benefit will drop. You still have to memcpy() with the dma api. Even with vringfd the kernel-user copy has to be performed under the global mutex protection, the difference being that several packets can be copied per syscall instead of only one. Note that workloads with multiple busy devices (such as databases, web servers) should be the real winners. What is the feeling on this? It's not _that_ intrusive and can be easily NOP'ed out for QEMU. I think many parts are missing (or maybe, I missed them). You need to lock the qemu internals (there are many read-mostly qemu caches scattered around the code), lock against hotplug, etc. Yes, there are some parts missing, such as the bh list and hotplug as you mention. For pure cpu emulation, there is a ton of work to be done: protecting the translator as well as making the translated code smp safe. I now believe there is a lot of work (which was not clear before). Not particularly interested in getting real emulation to be multithreaded. Anyway, the lack of multithreading in qemu emulation should not be a blocker for these patches to get in, since these are infrastructural changes. I think that QemuDevice makes sense, and that we want this long term, but that we first need to improve efficiency (which reduces cpu utilization _and_ improves scalability) rather than look at scalability alone (which is much harder in addition to the drawback of not reducing cpu utilization). Will complete the QEMUDevice+splitlock patchset, keep it up to date, and test it under a wider variety of workloads. Thanks.
Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/3] Refactor AIO interface to allow other AIO implementations
On Sunday 20 April 2008, Avi Kivity wrote: Also, I'd presume that those that need 10K IOPS and above will not place their high throughput images on a filesystem; rather on a separate SAN LUN. I think that too; but still that LUN would be accessed by the VMs via one of these IO emulation layers, right? Or maybe you're advocating using the SAN initiator in the VM instead of the host? -- Javier
[kvm-devel] [RFC] linuxboot Option ROM for Linux kernel booting
Hi,

This should be submitted to upstream (but not to kvm-devel list), but this is only the test code that I want to quickly send out for comments. In case it looks OK, I will send it to upstream later.

Inspired by extboot and conversations with Anthony and HPA, this linuxboot option ROM is a simple option ROM that intercepts int19 in order to execute linux setup code. This approach eliminates the need to manipulate the boot sector for this purpose.

To test it, just load a linux kernel with your KVM/QEMU image using the -kernel option in the normal way. I successfully compiled and tested it with kvm-66 on Ubuntu 7.10, guest Ubuntu 8.04.

Thanks,
Quynh

# diffstat linuxboot1.diff
 Makefile             |  13 -
 linuxboot/Makefile   |  40 +++
 linuxboot/boot.S     |  54 +
 linuxboot/farvar.h   | 130 +++
 linuxboot/rom.c      | 104
 linuxboot/signrom    | binary
 linuxboot/signrom.c  | 128 ++
 linuxboot/util.h     |  69 +++
 qemu/Makefile        |   3 -
 qemu/Makefile.target |   2
 qemu/hw/linuxboot.c  |  39 +++
 qemu/hw/pc.c         |  22 +++-
 qemu/hw/pc.h         |   5 +
 13 files changed, 600 insertions(+), 9 deletions(-)

commit f4f1178898c8a4bbbc0a432354dbcc56353099c3
Author: Nguyen Anh Quynh [EMAIL PROTECTED]
Date: Mon Apr 21 12:27:47 2008 +0900

    Linuxboot Option ROM support.

    Signed-off-by: Nguyen Anh Quynh [EMAIL PROTECTED]

diff --git a/Makefile b/Makefile
index 76c149a..fdd9388 100644
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,7 @@ DESTDIR=
 rpmrelease = devel
-.PHONY: kernel user libkvm qemu bios vgabios extboot clean libfdt
+.PHONY: kernel user libkvm qemu bios vgabios extboot linuxboot clean libfdt
 all: libkvm qemu
 ifneq '$(filter $(ARCH), x86_64 i386 ia64)' ''
@@ -19,7 +19,7 @@ qemu kernel user libkvm:
 qemu: libkvm
 ifneq '$(filter $(ARCH), i386 x86_64)' ''
-qemu: extboot
+qemu: extboot linuxboot
 endif
 ifneq '$(filter $(ARCH), powerpc)' ''
 qemu: libfdt
@@ -41,6 +41,14 @@ extboot:
     || ! cmp -s qemu/pc-bios/extboot.bin extboot/extboot.bin; then \
     cp extboot/extboot.bin qemu/pc-bios/extboot.bin; \
     fi
+
+linuxboot:
+    $(MAKE) -C $@
+    if ! [ -f qemu/pc-bios/linuxboot.bin ] \
+    || ! cmp -s qemu/pc-bios/linuxboot.bin linuxboot/linuxboot.bin; then \
+    cp linuxboot/linuxboot.bin qemu/pc-bios/linuxboot.bin; \
+    fi
+
 libfdt:
     $(MAKE) -C $@
@@ -88,6 +96,7 @@ srpm:
     tar czf $(RPMTOPDIR)/SOURCES/kernel.tar.gz kernel
     tar czf $(RPMTOPDIR)/SOURCES/scripts.tar.gz scripts
     tar czf $(RPMTOPDIR)/SOURCES/extboot.tar.gz extboot
+    tar czf $(RPMTOPDIR)/SOURCES/linuxboot.tar.gz linuxboot
     cp Makefile configure kvm_stat $(RPMTOPDIR)/SOURCES
     rpmbuild --define=_topdir $(RPMTOPDIR) -bs $(tmpspec)
     $(RM) $(tmpspec)
diff --git a/linuxboot/Makefile b/linuxboot/Makefile
new file mode 100644
index 000..3bc88a6
--- /dev/null
+++ b/linuxboot/Makefile
@@ -0,0 +1,40 @@
+# Makefile for linuxboot Option ROM
+# Nguyen Anh Quynh [EMAIL PROTECTED]
+
+CC = gcc
+CCFLAGS = -g -Wall -Werror -nostdlib -fno-builtin -fomit-frame-pointer -Os
+
+cc-option = $(shell if test -z `$(1) $(2) -S -o /dev/null -xc \
+    /dev/null 2>&1`; then echo $(2); else echo $(3); fi ;)
+CCFLAGS += $(call cc-option,$(CC),-nopie,)
+CCFLAGS += $(call cc-option,$(CC),-fno-stack-protector,)
+CCFLAGS += $(call cc-option,$(CC),-fno-stack-protector-all,)
+
+INSTALLDIR = /usr/share/qemu
+
+.PHONY: all
+all: clean linuxboot.bin
+
+.PHONY: install
+install: linuxboot.bin
+    cp linuxboot.bin $(INSTALLDIR)
+
+.PHONY: clean
+clean:
+    $(RM) *.o *.img *.bin signrom *~
+
+linuxboot.img: boot.o rom.o
+    $(LD) --oformat binary -Ttext 0 $^ -o $@
+
+linuxboot.bin: linuxboot.img signrom
+    ./signrom linuxboot.img linuxboot.bin
+
+signrom: signrom.c
+    $(CC) -o $@ -g -Wall $^
+
+%.o: %.c
+    $(CC) $(CCFLAGS) -c $<
+
+%.o: %.S
+    $(CC) $(CCFLAGS) -c $<
+
diff --git a/linuxboot/boot.S b/linuxboot/boot.S
new file mode 100644
index 000..a9461d6
--- /dev/null
+++ b/linuxboot/boot.S
@@ -0,0 +1,54 @@
+/*
+ * boot.S
+ * Linux Boot Option ROM for QEMU.
+
+ * Copyright (C) by Nguyen Anh Quynh [EMAIL PROTECTED], 2008.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth
Re: [kvm-devel] [RFC] linuxboot Option ROM for Linux kernel booting
Forgot to say that this patch is against kvm-66.

Thanks,
Q

On Mon, Apr 21, 2008 at 12:32 PM, Nguyen Anh Quynh [EMAIL PROTECTED] wrote:

Hi,

This should be submitted to upstream (but not to kvm-devel list), but this is only the test code that I want to quickly send out for comments. In case it looks OK, I will send it to upstream later.

Inspired by extboot and conversations with Anthony and HPA, this linuxboot option ROM is a simple option ROM that intercepts int19 in order to execute linux setup code. This approach eliminates the need to manipulate the boot sector for this purpose.

To test it, just load a linux kernel with your KVM/QEMU image using the -kernel option in the normal way. I successfully compiled and tested it with kvm-66 on Ubuntu 7.10, guest Ubuntu 8.04.

Thanks,
Quynh

# diffstat linuxboot1.diff
 Makefile             |  13 -
 linuxboot/Makefile   |  40 +++
 linuxboot/boot.S     |  54 +
 linuxboot/farvar.h   | 130 +++
 linuxboot/rom.c      | 104
 linuxboot/signrom    | binary
 linuxboot/signrom.c  | 128 ++
 linuxboot/util.h     |  69 +++
 qemu/Makefile        |   3 -
 qemu/Makefile.target |   2
 qemu/hw/linuxboot.c  |  39 +++
 qemu/hw/pc.c         |  22 +++-
 qemu/hw/pc.h         |   5 +
 13 files changed, 600 insertions(+), 9 deletions(-)
Re: [kvm-devel] [RFC] linuxboot Option ROM for Linux kernel booting
Hmm, the last patch includes a binary. So please take this patch instead.

Thanks,
Q

# diffstat linuxboot1.diff
 Makefile             |  13 -
 linuxboot/Makefile   |  40 +++
 linuxboot/boot.S     |  54 +
 linuxboot/farvar.h   | 130 +++
 linuxboot/rom.c      | 104
 linuxboot/signrom.c  | 128 ++
 linuxboot/util.h     |  69 +++
 qemu/Makefile        |   3 -
 qemu/Makefile.target |   2
 qemu/hw/linuxboot.c  |  39 +++
 qemu/hw/pc.c         |  22 +++-
 qemu/hw/pc.h         |   5 +
 12 files changed, 600 insertions(+), 9 deletions(-)

On Mon, Apr 21, 2008 at 12:33 PM, Nguyen Anh Quynh [EMAIL PROTECTED] wrote:

Forgot to say that this patch is against kvm-66.

Thanks,
Q

On Mon, Apr 21, 2008 at 12:32 PM, Nguyen Anh Quynh [EMAIL PROTECTED] wrote:

Hi,

This should be submitted to upstream (but not to kvm-devel list), but this is only the test code that I want to quickly send out for comments. In case it looks OK, I will send it to upstream later.

Inspired by extboot and conversations with Anthony and HPA, this linuxboot option ROM is a simple option ROM that intercepts int19 in order to execute linux setup code. This approach eliminates the need to manipulate the boot sector for this purpose.

To test it, just load a linux kernel with your KVM/QEMU image using the -kernel option in the normal way. I successfully compiled and tested it with kvm-66 on Ubuntu 7.10, guest Ubuntu 8.04.

Thanks,
Quynh

# diffstat linuxboot1.diff
 Makefile             |  13 -
 linuxboot/Makefile   |  40 +++
 linuxboot/boot.S     |  54 +
 linuxboot/farvar.h   | 130 +++
 linuxboot/rom.c      | 104
 linuxboot/signrom    | binary
 linuxboot/signrom.c  | 128 ++
 linuxboot/util.h     |  69 +++
 qemu/Makefile        |   3 -
 qemu/Makefile.target |   2
 qemu/hw/linuxboot.c  |  39 +++
 qemu/hw/pc.c         |  22 +++-
 qemu/hw/pc.h         |   5 +
 13 files changed, 600 insertions(+), 9 deletions(-)

commit f4f1178898c8a4bbbc0a432354dbcc56353099c3
Author: Nguyen Anh Quynh [EMAIL PROTECTED]
Date: Mon Apr 21 12:27:47 2008 +0900

    Linuxboot Option ROM support.

    Signed-off-by: Nguyen Anh Quynh [EMAIL PROTECTED]

diff --git a/Makefile b/Makefile
index 76c149a..fdd9388 100644
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,7 @@ DESTDIR=
 rpmrelease = devel
-.PHONY: kernel user libkvm qemu bios vgabios extboot clean libfdt
+.PHONY: kernel user libkvm qemu bios vgabios extboot linuxboot clean libfdt
 all: libkvm qemu
 ifneq '$(filter $(ARCH), x86_64 i386 ia64)' ''
@@ -19,7 +19,7 @@ qemu kernel user libkvm:
 qemu: libkvm
 ifneq '$(filter $(ARCH), i386 x86_64)' ''
-qemu: extboot
+qemu: extboot linuxboot
 endif
 ifneq '$(filter $(ARCH), powerpc)' ''
 qemu: libfdt
@@ -41,6 +41,14 @@ extboot:
     || ! cmp -s qemu/pc-bios/extboot.bin extboot/extboot.bin; then \
     cp extboot/extboot.bin qemu/pc-bios/extboot.bin; \
     fi
+
+linuxboot:
+    $(MAKE) -C $@
+    if ! [ -f qemu/pc-bios/linuxboot.bin ] \
+    || ! cmp -s qemu/pc-bios/linuxboot.bin linuxboot/linuxboot.bin; then \
+    cp linuxboot/linuxboot.bin qemu/pc-bios/linuxboot.bin; \
+    fi
+
 libfdt:
     $(MAKE) -C $@
@@ -88,6 +96,7 @@ srpm:
     tar czf $(RPMTOPDIR)/SOURCES/kernel.tar.gz kernel
     tar czf $(RPMTOPDIR)/SOURCES/scripts.tar.gz scripts
     tar czf $(RPMTOPDIR)/SOURCES/extboot.tar.gz extboot
+    tar czf $(RPMTOPDIR)/SOURCES/linuxboot.tar.gz linuxboot
     cp Makefile configure kvm_stat $(RPMTOPDIR)/SOURCES
     rpmbuild --define=_topdir $(RPMTOPDIR) -bs $(tmpspec)
     $(RM) $(tmpspec)
diff --git a/linuxboot/Makefile b/linuxboot/Makefile
new file mode 100644
index 000..3bc88a6
--- /dev/null
+++ b/linuxboot/Makefile
@@ -0,0 +1,40 @@
+# Makefile for linuxboot Option ROM
+# Nguyen Anh Quynh [EMAIL PROTECTED]
+
+CC = gcc
+CCFLAGS = -g -Wall -Werror -nostdlib -fno-builtin -fomit-frame-pointer -Os
+
+cc-option = $(shell if test -z `$(1) $(2) -S -o /dev/null -xc \
+    /dev/null 2>&1`; then echo $(2); else echo $(3); fi ;)
+CCFLAGS += $(call cc-option,$(CC),-nopie,)
+CCFLAGS += $(call cc-option,$(CC),-fno-stack-protector,)
+CCFLAGS += $(call cc-option,$(CC),-fno-stack-protector-all,)
+
+INSTALLDIR = /usr/share/qemu
+
+.PHONY: all
+all: clean linuxboot.bin
+
+.PHONY: install
+install: linuxboot.bin
+    cp linuxboot.bin $(INSTALLDIR)
+
+.PHONY: clean
+clean:
+    $(RM) *.o *.img *.bin signrom *~
+
+linuxboot.img: boot.o rom.o
+    $(LD) --oformat binary -Ttext 0 $^ -o $@
+
+linuxboot.bin: linuxboot.img signrom
Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)
I added the traces and captured data over another apparent lockup of the guest. This seems to be representative of the sequence (pid/vcpu removed).

(+4776) VMEXIT [ exitcode = 0x, rip = 0x c016127c ]
(+ 0) PAGE_FAULT [ errorcode = 0x0003, virt = 0x c0009db4 ]
(+3632) VMENTRY
(+4552) VMEXIT [ exitcode = 0x, rip = 0x c016104a ]
(+ 0) PAGE_FAULT [ errorcode = 0x000b, virt = 0x fffb61c8 ]
(+ 54928) VMENTRY
(+4568) VMEXIT [ exitcode = 0x, rip = 0x c01610e7 ]
(+ 0) PAGE_FAULT [ errorcode = 0x0003, virt = 0x c0009db4 ]
(+ 0) PTE_WRITE [ gpa = 0x 9db4 gpte = 0x 41c5d363 ]
(+8432) VMENTRY
(+3936) VMEXIT [ exitcode = 0x, rip = 0x c01610ee ]
(+ 0) PAGE_FAULT [ errorcode = 0x0003, virt = 0x c0009db0 ]
(+ 0) PTE_WRITE [ gpa = 0x 9db0 gpte = 0x ]
(+ 13832) VMENTRY
(+5768) VMEXIT [ exitcode = 0x, rip = 0x c016127c ]
(+ 0) PAGE_FAULT [ errorcode = 0x0003, virt = 0x c0009db4 ]
(+3712) VMENTRY
(+4576) VMEXIT [ exitcode = 0x, rip = 0x c016104a ]
(+ 0) PAGE_FAULT [ errorcode = 0x000b, virt = 0x fffb61d0 ]
(+ 0) PTE_WRITE [ gpa = 0x 3d5981d0 gpte = 0x 3d55d047 ]
(+ 65216) VMENTRY
(+4232) VMEXIT [ exitcode = 0x, rip = 0x c01610e7 ]
(+ 0) PAGE_FAULT [ errorcode = 0x0003, virt = 0x c0009db4 ]
(+ 0) PTE_WRITE [ gpa = 0x 9db4 gpte = 0x 3d598363 ]
(+8640) VMENTRY
(+3936) VMEXIT [ exitcode = 0x, rip = 0x c01610ee ]
(+ 0) PAGE_FAULT [ errorcode = 0x0003, virt = 0x c0009db0 ]
(+ 0) PTE_WRITE [ gpa = 0x 9db0 gpte = 0x ]
(+ 14160) VMENTRY

I can forward a more complete time snippet if you'd like. vcpu0 + corresponding vcpu1 files have 85000 total lines and compressed the files total ~500k.

I did not see the FLOODED trace come out during this sample though I did bump the count from 3 to 4 as you suggested.

Correlating rip addresses to the 2.4 kernel:

c0160d00-c0161290 = page_referenced

It looks like the event is kscand running through the pages. I suspected this some time ago, and tried tweaking the kscand_work_percent sysctl variable. It appeared to lower the peak of the spikes, but maybe I imagined it. I believe lowering that value makes kscand wake up more often but do less work (page scanning) each time it is awakened.

david

Avi Kivity wrote: Can you add a trace at mmu_guess_page_from_pte_write(), right before if (is_present_pte(gpte))? I'm interested in gpa and gpte. Also a trace at kvm_mmu_pte_write(), where it sets flooded = 1 (hmm, try to increase the 3 to 4 in the line right above that, maybe the fork detector is misfiring).