Re: [kvm-devel] [Qemu-devel] [PATCH 1/5] PCI DMA API (v3)

2008-04-20 Thread Blue Swirl
On 4/19/08, Anthony Liguori [EMAIL PROTECTED] wrote:
 Blue Swirl wrote:

  On 4/17/08, Anthony Liguori [EMAIL PROTECTED] wrote:
 
 
   Yes, the vector version of packet receive is tough.  I'll take a look at
   your patch.  Basically, you need to associate a set of RX vectors with each
   VLANClientState and then when it comes time to deliver a packet to the VLAN,
   before calling fd_read, see if there is an RX vector available for the
   client.

   In the case of tap, I want to optimize further and do the initial readv()
   to one of the clients' RX buffers and then copy that RX buffer to the rest
   of the clients if necessary.
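
The "copy that RX buffer to the rest of the clients" step could look roughly like the sketch below. It is only an illustration under assumptions: rx_vector_fill is a hypothetical helper name, not something from the patch, and a client's RX vector is modelled as a plain POSIX struct iovec array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not from the patch): copy one received packet
 * into a client's RX vector, filling each segment in order.
 * Returns the number of bytes copied, which is less than len when the
 * vector is too small to hold the whole packet.
 */
size_t rx_vector_fill(const struct iovec *iov, int iovcnt,
                      const void *pkt, size_t len)
{
    const unsigned char *src = pkt;
    size_t copied = 0;

    assert(iov != NULL && iovcnt >= 0);

    for (int i = 0; i < iovcnt && copied < len; i++) {
        size_t chunk = len - copied;

        /* Never write past the end of the current segment. */
        if (chunk > iov[i].iov_len)
            chunk = iov[i].iov_len;
        memcpy(iov[i].iov_base, src + copied, chunk);
        copied += chunk;
    }
    return copied;
}
```

A short return value tells the caller that the client's vector was too small, so it can fall back to the existing fd_read path or drop the packet.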
  
  
 
  The vector versions should also help SLIRP to add IP and Ethernet
  headers to the incoming packets.
 
 

  Yeah, I'm hoping that with my posted linux-aio interface, I can add vector
 support since linux-aio has a proper asynchronous vector function.

  Are we happy with the DMA API?  If so, we should commit it now so we can
 start adding proper vector interfaces for net/block.

Well, the IOVector part and bdrv_readv look OK, except for the heavy
mallocing involved.

I'm not so sure about the DMA side and how everything fits together
for zero-copy IO. For example, do we still need explicit translation
at some point?

-
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH] [POWERPC KVM] Kconfig fixes

2008-04-20 Thread Avi Kivity
Hollis Blanchard wrote:
 1 file changed, 5 insertions(+), 6 deletions(-)
 arch/powerpc/kvm/Kconfig |   11 +--


 Don't allow building as a module (asm-offsets dependencies).

 Also, automatically select KVM_BOOKE_HOST until we better separate the guest
 and host layers.

   

Applied, thanks.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH] [QEMU POWERPC] FPRs no longer live in kvm_vcpu

2008-04-20 Thread Avi Kivity
Hollis Blanchard wrote:

Applied, thanks.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] QEMU/KVM: clear HF_HALTED mask at vcpu startup time

2008-04-20 Thread Avi Kivity
Marcelo Tosatti wrote:
 Now that threads are spun up before machine-init(), clearing
 of HF_HALTED_MASK for irqchip in kernel case needs to be moved
 to actual vcpu startup.
   

Applied, thanks.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH] gfxboot VMX workaround v2

2008-04-20 Thread Avi Kivity
Anthony Liguori wrote:

 I'd prefer you not do an emulate_instruction loop at all.  Just 
 emulate one instruction on vmentry failure and let VT tell you what 
 instructions you need to emulate.

 It's only four instructions so I don't think the performance is going 
 to matter.  Take a look at the patch I posted previously.

Once we remove the other VT realmode hacks, we may need more 
instructions emulated.  Consider for example changing to real mode 
without reloading fs and gs; this will cause all real mode code to be 
emulated.

However, there's no need to do everything at once; the loop can 
certainly be added later when we have a proven need for it.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] KVM: MMU: kvm_pv_mmu_op should not take mmap_sem

2008-04-20 Thread Avi Kivity
Marcelo Tosatti wrote:
 kvm_pv_mmu_op should not take mmap_sem. All gfn_to_page() callers down
 in the MMU processing will take it if necessary, so as it is it can
 deadlock.

 Apparently a leftover from the days before slots_lock.
   

Applied, thanks.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 1/1] Enble a guest to access a device's memory mapped I/O regions directly.

2008-04-20 Thread Avi Kivity
Muli Ben-Yehuda wrote:
   
   
 Why avoid rmap on mmio pages?  Sure it's unnecessary work, but
 having fewer cases improves overall reliability.
 

 The rmap functions already have a check to bail out if the pte is not
 an rmap pte, so in that sense, we aren't adding a new case for the
 code to handle, just adding direct MMIO ptes to the existing list of
 non-rmap ptes.

   

I'm worried about the huge chain of direct_mmio parameters passed to 
functions, the impact on the audit code (at the end of mmu.c), and the poor 
souls who debug the mmu.

 You can use pfn_valid() in gfn_to_pfn() and kvm_release_pfn_*() to
 conditionally update the page refcounts.
 

 Since rmap isn't useful for direct MMIO ptes, doesn't it make more
 sense to bail out early rather than in the bowels of the rmap code?
   

It does, from a purist point of view (which also favors explicit 
parameters a la direct_mmio rather than indirect parameters like 
pfn_valid()), but I'm looking from the practical point of view now.

With mmu notifiers, we don't need to hold the refcount at all.  So 
presuming we drop the refcounting code completely, are any changes 
actually necessary here?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] Ubuntu Gutsy host / XP guest / -smp 2

2008-04-20 Thread Avi Kivity
David Abrahams wrote:
   
   
 Versions of kvm producing this sort of output are common in
 archaeological digs.  Please try a more recent release.
 

 Well, I'll try Hardy Heron soon enough, I suppose.  It's due out in 2
 weeks.

 I'm sure you understand that most people can't afford to rebuild all
 their important software so that it stays on the bleeding edge.  Have
 you considered getting more recent versions of kvm into the updates or
 backports repositories of major distros?  I'm not really sure how much
 influence you can have over such things; I'm just asking.

   

That's up to the distro maintainers, or concerned users (who may either 
volunteer work or apply pressure).

 What HAL do you see in device manager?
 
 
 Standard PC

   
   
 This HAL does not support SMP.  You need the ACPI Multiprocessor PC
 HAL or some such.
 

 And how would I get that HAL set up?

   

Follow http://kvm.qumranet.com/kvmwiki/Windows_ACPI_Workaround, 
substituting your desired HAL for Standard PC.

 Unless you have a recent Intel processor, the combination of SMP and
 Windows XP will give noticeably lower performance.  I recommend sticking
 with uniprocessor in such cases.
 

 I have a Core Duo; isn't that recent enough?
   

No, this feature is present only on some of the Core 2s, IIRC.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [patch 00/13] RFC: split the global mutex

2008-04-20 Thread Avi Kivity
Marcelo Tosatti wrote:
 Introduce QEMUDevice, making the ioport/iomem-device relationship visible.

 At the moment it only contains a lock, but could be extended.

 With it the following is possible:
 - vcpu's to read/write via ioports/iomem while the iothread is working on
   some unrelated device, or just copying data from the kernel.
 - vcpu's to read/write via ioports/iomem to different devices
   simultaneously.

 This patchset is only a proof of concept kind of thing, so only serial+raw
 image are supported.

 Tried two benchmarks, iperf and tiobench. With tiobench the reported latency
 is significantly lower (20%+), but throughput with IDE is only slightly
 higher.

 Expect to see larger improvements with a higher performing IO scheme (SCSI
 still buggy, looking at it).

 The iperf numbers are pretty good. Performance of UP guests increase slightly
 but SMP is quite significant.

   


I expect you're seeing contention induced by memcpy()s and inefficient 
emulation.  With the dma api, I expect the benefit will drop.


 Note that workloads with multiple busy devices (such as databases, web
 servers) should be the real winners.

 What is the feeling on this? It's not _that_ intrusive and can be easily
 NOP'ed out for QEMU.

   

I think many parts are missing (or maybe, I missed them).  You need to 
lock the qemu internals (there are many read-mostly qemu caches 
scattered around the code), lock against hotplug, etc.  For pure cpu 
emulation, there is a ton of work to be done: protecting the translator 
as well as making the translated code smp safe.

I think that QemuDevice makes sense, and that we want this long term, 
but that we first need to improve efficiency (which reduces cpu 
utilization _and_ improves scalability) rather than look at scalability 
alone (which is much harder in addition to the drawback of not reducing 
cpu utilization).


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [PATCH 3/6] KVM: MMU: Add EPT support

2008-04-20 Thread Yang, Sheng
On Friday 18 April 2008 23:54:04 Anthony Liguori wrote:
 Yang, Sheng wrote:
  On Friday 18 April 2008 21:30:14 Anthony Liguori wrote:
  Yang, Sheng wrote:
  @@ -1048,17 +1071,18 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
        * whether the guest actually used the pte (in order to detect
        * demand paging).
        */
  -    spte = PT_PRESENT_MASK | PT_DIRTY_MASK;
  +    spte = shadow_base_present_pte | shadow_dirty_mask;
       if (!speculative)
           pte_access |= PT_ACCESSED_MASK;
       if (!dirty)
           pte_access &= ~ACC_WRITE_MASK;
  -    if (!(pte_access & ACC_EXEC_MASK))
  -        spte |= PT64_NX_MASK;
  -
  -    spte |= PT_PRESENT_MASK;
  +    if (pte_access & ACC_EXEC_MASK) {
  +        if (shadow_x_mask)
  +            spte |= shadow_x_mask;
  +    } else if (shadow_nx_mask)
  +        spte |= shadow_nx_mask;
 
  This looks like it may be a bug.  The old behavior sets NX if
  (pte_access & ACC_EXEC_MASK).  The new behavior unconditionally sets NX
  and never sets PRESENT.  Also, the if (shadow_x_mask) checks are
  unnecessary.  spte |= 0 is a nop.
 
  Thanks for the comment! I realized the two checks of shadow_nx/x_mask are
  unnecessary... In fact, the correct behavior is to set either shadow_x_mask
  or shadow_nx_mask; maybe there is a better approach for this. Logic
  enforced by the program itself is always safer. But I will remove the
  redundant code first.
 
  But I don't think it's a bug. The old behavior set NX if
  (!(pte_access & ACC_EXEC_MASK)), the same as the new one.

 The new behavior sets NX regardless of whether (pte_access &
 ACC_EXEC_MASK).  Is the desired change to unconditionally set NX?

Oh, I may see the point... shadow_x_mask != shadow_nx_mask.

The old behavior was:

if (!(pte_access & ACC_EXEC_MASK))
    spte |= PT64_NX_MASK;

The new behavior is:

if (pte_access & ACC_EXEC_MASK) {
    spte |= shadow_x_mask;
} else
    spte |= shadow_nx_mask;

For the current behavior, kvm_arch_init() has:
   kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
   PT_DIRTY_MASK, PT64_NX_MASK, 0);
which means shadow_nx_mask = PT64_NX_MASK and shadow_x_mask = 0 (NX means not 
executable, and X means executable). 

In patch 5/6, EPT has:
   kvm_mmu_set_mask_ptes(0ull, VMX_EPT_FAKE_ACCESSED_MASK,
   VMX_EPT_FAKE_DIRTY_MASK, 0ull,
   VMX_EPT_EXECUTABLE_MASK);
which means shadow_nx_mask = 0 and shadow_x_mask = VMX_EPT_EXECUTABLE_MASK.

So, when shadow paging is enabled and (!(pte_access & ACC_EXEC_MASK)), then 
spte |= shadow_nx_mask = PT64_NX_MASK (nothing changes when the condition is 
not satisfied). 

When EPT is enabled and (pte_access & ACC_EXEC_MASK), then spte |= 
shadow_x_mask = VMX_EPT_EXECUTABLE_MASK (nothing changes when the condition is 
not satisfied).

They are two different bits, and mutually exclusive. Maybe there is a 
better way to make their meaning clearer...
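
To make the two configurations concrete, here is a small standalone sketch of the selection logic. It is not the kernel code: the bit values are stand-ins chosen for illustration, and struct exec_masks and apply_exec_bits are hypothetical names. Only the if/else shape follows the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real definitions live in the KVM sources. */
#define ACC_EXEC_MASK   (1u << 0)
#define DEMO_NX_BIT     (1ull << 63)  /* stands in for PT64_NX_MASK */
#define DEMO_EPT_X_BIT  (1ull << 2)   /* stands in for VMX_EPT_EXECUTABLE_MASK */

/* Hypothetical container for the pair of masks set by kvm_mmu_set_mask_ptes(). */
struct exec_masks {
    uint64_t x_mask;   /* ORed in when the page is executable */
    uint64_t nx_mask;  /* ORed in when the page is not executable */
};

/* Return spte with the executable or non-executable bit applied. */
uint64_t apply_exec_bits(uint64_t spte, unsigned pte_access,
                         struct exec_masks m)
{
    /* Exactly one of the two masks is expected to be non-zero. */
    assert((m.x_mask == 0) != (m.nx_mask == 0));

    if (pte_access & ACC_EXEC_MASK)
        spte |= m.x_mask;   /* no-op for shadow paging (x_mask == 0) */
    else
        spte |= m.nx_mask;  /* no-op for EPT (nx_mask == 0) */
    return spte;
}
```

With the shadow paging masks {0, NX} only non-executable pages get a bit set; with the EPT masks {X, 0} only executable pages do. That is why the inner if (shadow_x_mask) checks are redundant.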


   And I am also curious about the
  PRESENT bit. You see, the PRESENT bit was set at the beginning of the
  code, and I really don't know why the duplicate one exists there...

 Looking at the code, you appear to be right.  In the future, I think you
 should separate any cleanups (like removing the redundant setting of
 PRESENT) into a separate patch and stick to just programmatic changes of
 PT_USER_MASK = shadow_user_mask, etc. in this patch.  That makes it a
 lot easier to review correctness.

Thanks for the advice, it's important to separate the cleanups. I will get it 
done more properly next time. 

-- 
Thanks
Yang, Sheng

 Regards,

 Anthony Liguori

       if (pte_access & ACC_USER_MASK)
  -        spte |= PT_USER_MASK;
  +        spte |= shadow_user_mask;
       if (largepage)
           spte |= PT_PAGE_SIZE_MASK;





Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/3] Refactor AIO interface to allow other AIO implementations

2008-04-20 Thread Jamie Lokier
Avi Kivity wrote:
 For the majority of deployments posix aio should be sufficient.  The few 
 that need something else can use Linux aio.

Does that mean that for the majority of deployments the slow version is
sufficient, and the few that care about performance can use Linux AIO?

I'm under the impression that the entire and only point of Linux AIO
is that it's faster than POSIX AIO on Linux.

 Of course, a managed environment can use Linux aio unconditionally if 
 it knows the kernel has all the needed goodies.

Does that mean a managed environment can have some code which checks
the host kernel version + filesystem type holding the VM image, to
conditionally enable Linux AIO?  (Since if you care about
performance, which is the sole reason for using Linux AIO, you
wouldn't want to enable Linux AIO on any host in your cluster where it
will trash performance.)

Just wondering.

Thanks,
-- Jamie



Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/3] Refactor AIO interface to allow other AIO implementations

2008-04-20 Thread Avi Kivity
Jamie Lokier wrote:
 Avi Kivity wrote:
   
 For the majority of deployments posix aio should be sufficient.  The few 
 that need something else can use Linux aio.
 

 Does that mean that for the majority of deployments the slow version is
 sufficient, and the few that care about performance can use Linux AIO?

   

In essence, yes. s/slow/slower/ and s/performance/ultimate block device 
performance/.

Many deployments don't care at all about block device performance; they 
care mostly about networking performance.

 I'm under the impression that the entire and only point of Linux AIO
 is that it's faster than POSIX AIO on Linux.
   

It is.  I estimate posix aio adds a few microseconds above linux aio per 
I/O request, when using O_DIRECT.  Assuming 10 microseconds, you will 
need 10,000 I/O requests per second per vcpu to have a 10% performance 
difference.  That's definitely rare.
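
The arithmetic behind that estimate: 10 microseconds of extra cost per request times 10,000 requests per second is 100,000 microseconds, i.e. 0.1 seconds of every second, or 10% of one vcpu. A small sketch, with aio_overhead_fraction as a made-up helper name and the 10 microsecond figure taken as the assumption stated above:

```c
#include <assert.h>

/*
 * Fraction of one vcpu's time spent on extra per-request overhead.
 * extra_us_per_request: added cost per I/O request, in microseconds.
 * requests_per_sec:     I/O requests issued per second by that vcpu.
 */
double aio_overhead_fraction(double extra_us_per_request,
                             double requests_per_sec)
{
    assert(extra_us_per_request >= 0.0 && requests_per_sec >= 0.0);

    /* One second is 1e6 microseconds. */
    return extra_us_per_request * requests_per_sec / 1e6;
}
```

At 1,000 requests per second the same 10 microseconds would cost about 1%, which is why the difference only matters for very I/O-heavy guests.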

 Of course, a managed environment can use Linux aio unconditionally if 
 it knows the kernel has all the needed goodies.
 

 Does that mean a managed environment can have some code which checks
 the host kernel version + filesystem type holding the VM image, to
 conditionally enable Linux AIO?  (Since if you care about
 performance, which is the sole reason for using Linux AIO, you
 wouldn't want to enable Linux AIO on any host in your cluster where it
 will trash performance.)
   

Either that, or mandate that all hosts use a filesystem and kernel which 
provide the necessary performance.  Take ovirt for example, which 
provides the entire hypervisor environment, and so can guarantee this.

Also, I'd presume that those that need 10K IOPS and above will not place 
their high throughput images on a filesystem; rather on a separate SAN LUN.

 Just wondering.
   

Hope this clarifies.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.




Re: [kvm-devel] [patch 00/13] RFC: split the global mutex

2008-04-20 Thread Marcelo Tosatti
On Sun, Apr 20, 2008 at 02:16:52PM +0300, Avi Kivity wrote:
 The iperf numbers are pretty good. Performance of UP guests increase
 slightly but SMP is quite significant.

 I expect you're seeing contention induced by memcpy()s and inefficient 
 emulation.  With the dma api, I expect the benefit will drop.

You still have to memcpy() with the dma api. Even with vringfd the
kernel-user copy has to be performed under the global mutex protection,
difference being that several packets can be copied per-syscall instead
of only one.

 Note that workloads with multiple busy devices (such as databases, web
 servers) should be the real winners.
 
 What is the feeling on this? It's not _that_ intrusive and can be easily
 NOP'ed out for QEMU.
 
   
 
 I think many parts are missing (or maybe, I missed them).  You need to 
 lock the qemu internals (there are many read-mostly qemu caches 
 scattered around the code), lock against hotplug, etc.  

Yes, there are some parts missing, such as the bh list and hotplug as
you mention.

 For pure cpu emulation, there is a ton of work to be done: protecting
 the translator as well as making the translated code smp safe.

I now believe there is a lot of work (which was not clear before).
Not particularly interested in getting real emulation to be
multithreaded.

Anyways, the lack of multithreading in qemu emulation should not be a
blocker for these patches to get in, since these are infrastructural
changes.

 I think that QemuDevice makes sense, and that we want this long term, 
 but that we first need to improve efficiency (which reduces cpu 
 utilization _and_ improves scalability) rather than look at scalability 
 alone (which is much harder in addition to the drawback of not reducing 
 cpu utilization).

Will complete the QEMUDevice+splitlock patchset, keep it up to date, and
test it under a wider variety of workloads.

Thanks.




Re: [kvm-devel] [Qemu-devel] Re: [PATCH 1/3] Refactor AIO interface to allow other AIO implementations

2008-04-20 Thread Javier Guerra Giraldez
On Sunday 20 April 2008, Avi Kivity wrote:
 Also, I'd presume that those that need 10K IOPS and above will not place
 their high throughput images on a filesystem; rather on a separate SAN LUN.

i think that too; but still that LUN would be accessed by the VM's via one of 
these IO emulation layers, right?

or maybe you're advocating using the SAN initiator in the VM instead of the 
host?


-- 
Javier




[kvm-devel] [RFC] linuxboot Option ROM for Linux kernel booting

2008-04-20 Thread Nguyen Anh Quynh
Hi,

This should be submitted to upstream (but not to kvm-devel list), but
this is only the test code that I want to quickly send out for
comments. In case it looks OK, I will send it to upstream later.

Inspired by extboot and conversations with Anthony and HPA, this
linuxboot option ROM is a simple option ROM that intercepts int19 in
order to execute linux setup code. This approach eliminates the need
to manipulate the boot sector for this purpose.
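
For context, an option ROM image needs a valid header and checksum before the BIOS will run it, which is presumably what the signrom tool in this patch is for (its source is not shown in this excerpt). Below is a minimal sketch of that step. It assumes the conventional layout: a 0x55 0xAA signature, the size in 512-byte blocks in byte 2, and all bytes summing to zero modulo 256. The function names are made up for illustration, and the last byte of the image is assumed to be padding that may be overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes modulo 256; a valid option ROM image sums to zero. */
uint8_t rom_checksum(const uint8_t *rom, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + rom[i]);
    return sum;
}

/*
 * Fill in the size byte and patch the last byte so the image sums to zero.
 * Returns 0 on success, -1 if the image is not a plausible option ROM.
 * Assumption: the last byte is padding and may be overwritten.
 */
int sign_rom(uint8_t *rom, size_t len)
{
    assert(rom != NULL);

    if (len == 0 || len % 512 != 0 || len / 512 > 255)
        return -1;
    if (rom[0] != 0x55 || rom[1] != 0xAA)
        return -1;

    rom[2] = (uint8_t)(len / 512);   /* size in 512-byte blocks */
    rom[len - 1] = 0;                /* clear before summing */
    rom[len - 1] = (uint8_t)(0u - rom_checksum(rom, len));
    return 0;
}
```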

To test it, just load a Linux kernel with your KVM/QEMU image using the
-kernel option in the normal way.

I successfully compiled and tested it with kvm-66 on Ubuntu 7.10, with an
Ubuntu 8.04 guest.

Thanks,
Quynh


# diffstat linuxboot1.diff
 Makefile             |   13 -
 linuxboot/Makefile   |   40 +++
 linuxboot/boot.S     |   54 +
 linuxboot/farvar.h   |  130 +++
 linuxboot/rom.c      |  104 
 linuxboot/signrom    |binary
 linuxboot/signrom.c  |  128 ++
 linuxboot/util.h     |   69 +++
 qemu/Makefile        |    3 -
 qemu/Makefile.target |    2
 qemu/hw/linuxboot.c  |   39 +++
 qemu/hw/pc.c         |   22 +++-
 qemu/hw/pc.h         |    5 +
 13 files changed, 600 insertions(+), 9 deletions(-)

commit f4f1178898c8a4bbbc0a432354dbcc56353099c3
Author: Nguyen Anh Quynh [EMAIL PROTECTED]
Date:   Mon Apr 21 12:27:47 2008 +0900

Linuxboot Option ROM support.

Signed-off-by: Nguyen Anh Quynh [EMAIL PROTECTED]

diff --git a/Makefile b/Makefile
index 76c149a..fdd9388 100644
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,7 @@ DESTDIR=
 
 rpmrelease = devel
 
-.PHONY: kernel user libkvm qemu bios vgabios extboot clean libfdt
+.PHONY: kernel user libkvm qemu bios vgabios extboot linuxboot clean libfdt
 
 all: libkvm qemu
 ifneq '$(filter $(ARCH), x86_64 i386 ia64)' ''
@@ -19,7 +19,7 @@ qemu kernel user libkvm:
 
 qemu: libkvm
 ifneq '$(filter $(ARCH), i386 x86_64)' ''
-qemu: extboot
+qemu: extboot linuxboot
 endif
 ifneq '$(filter $(ARCH), powerpc)' ''
 qemu: libfdt
@@ -41,6 +41,14 @@ extboot:
|| ! cmp -s qemu/pc-bios/extboot.bin extboot/extboot.bin; then \
 		cp extboot/extboot.bin qemu/pc-bios/extboot.bin; \
 	fi
+
+linuxboot:
+	$(MAKE) -C $@
+	if ! [ -f qemu/pc-bios/linuxboot.bin ] \
+   || ! cmp -s qemu/pc-bios/linuxboot.bin linuxboot/linuxboot.bin; then \
+		cp linuxboot/linuxboot.bin qemu/pc-bios/linuxboot.bin; \
+	fi
+
 libfdt:
 	$(MAKE) -C $@
 
@@ -88,6 +96,7 @@ srpm:
 	tar czf $(RPMTOPDIR)/SOURCES/kernel.tar.gz kernel
 	tar czf $(RPMTOPDIR)/SOURCES/scripts.tar.gz scripts
 	tar czf $(RPMTOPDIR)/SOURCES/extboot.tar.gz extboot
+	tar czf $(RPMTOPDIR)/SOURCES/linuxboot.tar.gz linuxboot
 	cp Makefile configure kvm_stat $(RPMTOPDIR)/SOURCES
 	rpmbuild  --define=_topdir $(RPMTOPDIR) -bs $(tmpspec)
 	$(RM) $(tmpspec)
diff --git a/linuxboot/Makefile b/linuxboot/Makefile
new file mode 100644
index 000..3bc88a6
--- /dev/null
+++ b/linuxboot/Makefile
@@ -0,0 +1,40 @@
+# Makefile for linuxboot Option ROM
+# Nguyen Anh Quynh [EMAIL PROTECTED]
+
+CC = gcc
+CCFLAGS = -g -Wall -Werror -nostdlib -fno-builtin -fomit-frame-pointer -Os
+
+cc-option = $(shell if test -z `$(1) $(2) -S -o /dev/null -xc \
+  /dev/null 2>&1`; then echo $(2); else echo $(3); fi ;)
+CCFLAGS += $(call cc-option,$(CC),-nopie,)
+CCFLAGS += $(call cc-option,$(CC),-fno-stack-protector,)
+CCFLAGS += $(call cc-option,$(CC),-fno-stack-protector-all,)
+
+INSTALLDIR = /usr/share/qemu
+
+.PHONY: all
+all: clean linuxboot.bin
+
+.PHONY: install
+install: linuxboot.bin
+	cp linuxboot.bin $(INSTALLDIR)
+
+.PHONY: clean
+clean:
+	$(RM) *.o *.img *.bin signrom *~
+
+linuxboot.img: boot.o rom.o
+	$(LD) --oformat binary -Ttext 0 $^ -o $@ 
+
+linuxboot.bin: linuxboot.img signrom
+	./signrom linuxboot.img linuxboot.bin
+
+signrom: signrom.c
+	$(CC) -o $@ -g -Wall $^
+
+%.o: %.c
+	$(CC) $(CCFLAGS) -c $<
+
+%.o: %.S
+	$(CC) $(CCFLAGS) -c $<
+
diff --git a/linuxboot/boot.S b/linuxboot/boot.S
new file mode 100644
index 000..a9461d6
--- /dev/null
+++ b/linuxboot/boot.S
@@ -0,0 +1,54 @@
+/*
+ * boot.S
+ * Linux Boot Option ROM for QEMU.
+
+ * Copyright (C) by Nguyen Anh Quynh [EMAIL PROTECTED], 2008. 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth 

Re: [kvm-devel] [RFC] linuxboot Option ROM for Linux kernel booting

2008-04-20 Thread Nguyen Anh Quynh
Forgot to say that this patch is against kvm-66.

Thanks,
Q

On Mon, Apr 21, 2008 at 12:32 PM, Nguyen Anh Quynh [EMAIL PROTECTED] wrote:
 Hi,

  This should be submitted to upstream (but not to kvm-devel list), but
  this is only the test code that I want to quickly send out for
  comments. In case it looks OK, I will send it to upstream later.

  Inspired by extboot and conversations with Anthony and HPA, this
  linuxboot option ROM is a simple option ROM that intercepts int19 in
  order to execute linux setup code. This approach eliminates the need
  to manipulate the boot sector for this purpose.

  To test it, just load a Linux kernel with your KVM/QEMU image using the
  -kernel option in the normal way.

  I successfully compiled and tested it with kvm-66 on Ubuntu 7.10, with an
  Ubuntu 8.04 guest.

  Thanks,
  Quynh


  # diffstat linuxboot1.diff
   Makefile             |   13 -
   linuxboot/Makefile   |   40 +++
   linuxboot/boot.S     |   54 +
   linuxboot/farvar.h   |  130 +++
   linuxboot/rom.c      |  104 
   linuxboot/signrom    |binary
   linuxboot/signrom.c  |  128 ++
   linuxboot/util.h     |   69 +++
   qemu/Makefile        |    3 -
   qemu/Makefile.target |    2
   qemu/hw/linuxboot.c  |   39 +++
   qemu/hw/pc.c         |   22 +++-
   qemu/hw/pc.h         |    5 +
   13 files changed, 600 insertions(+), 9 deletions(-)




Re: [kvm-devel] [RFC] linuxboot Option ROM for Linux kernel booting

2008-04-20 Thread Nguyen Anh Quynh
Hmm, the last patch includes a binary. So please take this patch instead.

Thanks,
Q

# diffstat linuxboot1.diff
 Makefile             |   13 -
 linuxboot/Makefile   |   40 +++
 linuxboot/boot.S     |   54 +
 linuxboot/farvar.h   |  130 +++
 linuxboot/rom.c      |  104 
 linuxboot/signrom.c  |  128 ++
 linuxboot/util.h     |   69 +++
 qemu/Makefile        |    3 -
 qemu/Makefile.target |    2
 qemu/hw/linuxboot.c  |   39 +++
 qemu/hw/pc.c         |   22 +++-
 qemu/hw/pc.h         |    5 +
 12 files changed, 600 insertions(+), 9 deletions(-)





On Mon, Apr 21, 2008 at 12:33 PM, Nguyen Anh Quynh [EMAIL PROTECTED] wrote:
 Forgot to say that this patch is against kvm-66.

  Thanks,
  Q



  On Mon, Apr 21, 2008 at 12:32 PM, Nguyen Anh Quynh [EMAIL PROTECTED] wrote:
   Hi,
  
This should be submitted to upstream (but not to kvm-devel list), but
this is only the test code that I want to quickly send out for
comments. In case it looks OK, I will send it to upstream later.
  
Inspired by extboot and conversations with Anthony and HPA, this
linuxboot option ROM is a simple option ROM that intercepts int19 in
order to execute linux setup code. This approach eliminates the need
to manipulate the boot sector for this purpose.
  
To test it, just load a Linux kernel with your KVM/QEMU image using the
-kernel option in the normal way.
  
I successfully compiled and tested it with kvm-66 on Ubuntu 7.10, with an
Ubuntu 8.04 guest.
  
Thanks,
Quynh
  
  
# diffstat linuxboot1.diff
 Makefile             |   13 -
 linuxboot/Makefile   |   40 +++
 linuxboot/boot.S     |   54 +
 linuxboot/farvar.h   |  130 +++
 linuxboot/rom.c      |  104 
 linuxboot/signrom    |binary
 linuxboot/signrom.c  |  128 ++
 linuxboot/util.h     |   69 +++
 qemu/Makefile        |    3 -
 qemu/Makefile.target |    2
 qemu/hw/linuxboot.c  |   39 +++
 qemu/hw/pc.c         |   22 +++-
 qemu/hw/pc.h         |    5 +
 13 files changed, 600 insertions(+), 9 deletions(-)
  

commit f4f1178898c8a4bbbc0a432354dbcc56353099c3
Author: Nguyen Anh Quynh [EMAIL PROTECTED]
Date:   Mon Apr 21 12:27:47 2008 +0900

Linuxboot Option ROM support.

Signed-off-by: Nguyen Anh Quynh [EMAIL PROTECTED]

diff --git a/Makefile b/Makefile
index 76c149a..fdd9388 100644
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,7 @@ DESTDIR=
 
 rpmrelease = devel
 
-.PHONY: kernel user libkvm qemu bios vgabios extboot clean libfdt
+.PHONY: kernel user libkvm qemu bios vgabios extboot linuxboot clean libfdt
 
 all: libkvm qemu
 ifneq '$(filter $(ARCH), x86_64 i386 ia64)' ''
@@ -19,7 +19,7 @@ qemu kernel user libkvm:
 
 qemu: libkvm
 ifneq '$(filter $(ARCH), i386 x86_64)' ''
-qemu: extboot
+qemu: extboot linuxboot
 endif
 ifneq '$(filter $(ARCH), powerpc)' ''
 qemu: libfdt
@@ -41,6 +41,14 @@ extboot:
|| ! cmp -s qemu/pc-bios/extboot.bin extboot/extboot.bin; then \
 		cp extboot/extboot.bin qemu/pc-bios/extboot.bin; \
 	fi
+
+linuxboot:
+	$(MAKE) -C $@
+	if ! [ -f qemu/pc-bios/linuxboot.bin ] \
+   || ! cmp -s qemu/pc-bios/linuxboot.bin linuxboot/linuxboot.bin; then \
+		cp linuxboot/linuxboot.bin qemu/pc-bios/linuxboot.bin; \
+	fi
+
 libfdt:
 	$(MAKE) -C $@
 
@@ -88,6 +96,7 @@ srpm:
 	tar czf $(RPMTOPDIR)/SOURCES/kernel.tar.gz kernel
 	tar czf $(RPMTOPDIR)/SOURCES/scripts.tar.gz scripts
 	tar czf $(RPMTOPDIR)/SOURCES/extboot.tar.gz extboot
+	tar czf $(RPMTOPDIR)/SOURCES/linuxboot.tar.gz linuxboot
 	cp Makefile configure kvm_stat $(RPMTOPDIR)/SOURCES
 	rpmbuild  --define=_topdir $(RPMTOPDIR) -bs $(tmpspec)
 	$(RM) $(tmpspec)
diff --git a/linuxboot/Makefile b/linuxboot/Makefile
new file mode 100644
index 0000000..3bc88a6
--- /dev/null
+++ b/linuxboot/Makefile
@@ -0,0 +1,40 @@
+# Makefile for linuxboot Option ROM
+# Nguyen Anh Quynh [EMAIL PROTECTED]
+
+CC = gcc
+CCFLAGS = -g -Wall -Werror -nostdlib -fno-builtin -fomit-frame-pointer -Os
+
+cc-option = $(shell if test -z `$(1) $(2) -S -o /dev/null -xc \
+  /dev/null 2>&1`; then echo $(2); else echo $(3); fi ;)
+CCFLAGS += $(call cc-option,$(CC),-nopie,)
+CCFLAGS += $(call cc-option,$(CC),-fno-stack-protector,)
+CCFLAGS += $(call cc-option,$(CC),-fno-stack-protector-all,)
+
+INSTALLDIR = /usr/share/qemu
+
+.PHONY: all
+all: clean linuxboot.bin
+
+.PHONY: install
+install: linuxboot.bin
+	cp linuxboot.bin $(INSTALLDIR)
+
+.PHONY: clean
+clean:
+	$(RM) *.o *.img *.bin signrom *~
+
+linuxboot.img: boot.o rom.o
+	$(LD) --oformat binary -Ttext 0 $^ -o $@ 
+
+linuxboot.bin: linuxboot.img signrom

Re: [kvm-devel] performance with guests running 2.4 kernels (specifically RHEL3)

2008-04-20 Thread David S. Ahern
I added the traces and captured data over another apparent lockup of the guest.
This seems to be representative of the sequence (pid/vcpu removed).

(+4776)  VMEXIT [ exitcode = 0x, rip = 0x c016127c ]
(+   0)  PAGE_FAULT [ errorcode = 0x0003, virt = 0x c0009db4 ]
(+3632)  VMENTRY
(+4552)  VMEXIT [ exitcode = 0x, rip = 0x c016104a ]
(+   0)  PAGE_FAULT [ errorcode = 0x000b, virt = 0x fffb61c8 ]
(+   54928)  VMENTRY
(+4568)  VMEXIT [ exitcode = 0x, rip = 0x c01610e7 ]
(+   0)  PAGE_FAULT [ errorcode = 0x0003, virt = 0x c0009db4 ]
(+   0)  PTE_WRITE  [ gpa = 0x 9db4 gpte = 0x 41c5d363 ]
(+8432)  VMENTRY
(+3936)  VMEXIT [ exitcode = 0x, rip = 0x c01610ee ]
(+   0)  PAGE_FAULT [ errorcode = 0x0003, virt = 0x c0009db0 ]
(+   0)  PTE_WRITE  [ gpa = 0x 9db0 gpte = 0x  ]
(+   13832)  VMENTRY


(+5768)  VMEXIT [ exitcode = 0x, rip = 0x c016127c ]
(+   0)  PAGE_FAULT [ errorcode = 0x0003, virt = 0x c0009db4 ]
(+3712)  VMENTRY
(+4576)  VMEXIT [ exitcode = 0x, rip = 0x c016104a ]
(+   0)  PAGE_FAULT [ errorcode = 0x000b, virt = 0x fffb61d0 ]
(+   0)  PTE_WRITE  [ gpa = 0x 3d5981d0 gpte = 0x 3d55d047 ]
(+   65216)  VMENTRY
(+4232)  VMEXIT [ exitcode = 0x, rip = 0x c01610e7 ]
(+   0)  PAGE_FAULT [ errorcode = 0x0003, virt = 0x c0009db4 ]
(+   0)  PTE_WRITE  [ gpa = 0x 9db4 gpte = 0x 3d598363 ]
(+8640)  VMENTRY
(+3936)  VMEXIT [ exitcode = 0x, rip = 0x c01610ee ]
(+   0)  PAGE_FAULT [ errorcode = 0x0003, virt = 0x c0009db0 ]
(+   0)  PTE_WRITE  [ gpa = 0x 9db0 gpte = 0x  ]
(+   14160)  VMENTRY
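
As an aid to reading the errorcode column, here is a small sketch that
decodes the two values seen above (0x0003 and 0x000b). It assumes the
trace field is the raw x86 page-fault error code, whose low bits are
P (bit 0), W/R (bit 1), U/S (bit 2), RSVD (bit 3) and I/D (bit 4). The
macro, struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits. */
#define PF_PRESENT (1u << 0)  /* fault on a present page, not a missing one */
#define PF_WRITE   (1u << 1)  /* access was a write */
#define PF_USER    (1u << 2)  /* access came from user mode */
#define PF_RSVD    (1u << 3)  /* reserved bit set in a paging structure */
#define PF_FETCH   (1u << 4)  /* access was an instruction fetch */

typedef struct {
    bool present;
    bool write;
    bool user;
    bool rsvd;
    bool fetch;
} pf_info_t;

/* Split an error code into its individual flags. */
pf_info_t pf_decode(uint32_t errorcode)
{
    pf_info_t info;

    info.present = (errorcode & PF_PRESENT) != 0;
    info.write   = (errorcode & PF_WRITE) != 0;
    info.user    = (errorcode & PF_USER) != 0;
    info.rsvd    = (errorcode & PF_RSVD) != 0;
    info.fetch   = (errorcode & PF_FETCH) != 0;
    return info;
}
```

Under that assumption, 0x0003 is a supervisor-mode write to a present
page, and 0x000b is the same with the reserved-bit flag also set.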

I can forward a more complete time snippet if you'd like. The vcpu0 and
corresponding vcpu1 files total 85000 lines and come to ~500k compressed.

I did not see the FLOODED trace come out during this sample though I did bump
the count from 3 to 4 as you suggested.


Correlating rip addresses to the 2.4 kernel:

c0160d00-c0161290 = page_referenced
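
The correlation itself is just a range check, sketched below. Only the
page_referenced range comes from this message; treating the upper bound
as exclusive is an assumption on my part, and the table and function
names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t start;    /* first address of the function */
    uint32_t end;      /* assumed exclusive upper bound */
    const char *name;
} sym_range_t;

/* Only this one range is taken from the message. */
static const sym_range_t sym_ranges[] = {
    { 0xc0160d00u, 0xc0161290u, "page_referenced" },
};

/* Return the name of the range containing rip, or NULL if none matches. */
const char *sym_lookup(uint32_t rip)
{
    size_t i;

    for (i = 0; i < sizeof(sym_ranges) / sizeof(sym_ranges[0]); i++) {
        if (rip >= sym_ranges[i].start && rip < sym_ranges[i].end)
            return sym_ranges[i].name;
    }
    return NULL;
}
```

All four rip values in the trace (c016127c, c016104a, c01610e7 and
c01610ee) fall inside that range.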

It looks like the event is kscand running through the pages. I suspected this
some time ago, and tried tweaking the kscand_work_percent sysctl variable. It
appeared to lower the peak of the spikes, but maybe I imagined it. I believe
lowering that value makes kscand wake up more often but do less work (page
scanning) each time it is awakened.

david


Avi Kivity wrote:
 Can you add a trace at mmu_guess_page_from_pte_write(), right before if 
 (is_present_pte(gpte))?  I'm interested in gpa and gpte.  Also a trace 
 at kvm_mmu_pte_write(), where it sets flooded = 1 (hmm, try to increase 
 the 3 to 4 in the line right above that, maybe the fork detector is 
 misfiring).
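
To make the "3 to 4" threshold concrete, here is a deliberately
simplified model of a write-flood detector: it counts consecutive writes
that land in the same guest page frame and reports a flood once the count
reaches a threshold. This is not the kvm_mmu_pte_write() code and may
well omit conditions the real function checks; the struct and function
names, and the 4 KiB frame size, are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12   /* assumes 4 KiB guest page frames */

typedef struct {
    uint64_t last_gfn;    /* frame hit by the previous write */
    unsigned count;       /* consecutive writes to that frame */
    unsigned threshold;   /* e.g. 3, or 4 after the suggested bump */
} flood_detector_t;

void flood_init(flood_detector_t *d, unsigned threshold)
{
    d->last_gfn = UINT64_MAX;   /* no frame seen yet */
    d->count = 0;
    d->threshold = threshold;
}

/* Record one write at guest physical address gpa; true means "flooded". */
bool flood_note_write(flood_detector_t *d, uint64_t gpa)
{
    uint64_t gfn = gpa >> MODEL_PAGE_SHIFT;

    if (gfn == d->last_gfn) {
        d->count++;
    } else {
        d->last_gfn = gfn;
        d->count = 1;
    }
    return d->count >= d->threshold;
}
```

In this model, the writes to gpa 9db4 and 9db0 from the trace above count
toward the same frame, while a write to 3d5981d0 lands in a different
frame and restarts the count.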

