Re: Kernel mode VGAs?

2012-02-14 Thread Jan Kiszka
On 2012-02-14 08:12, Gerhard Wiesinger wrote:
 Hello,
 
 The current QEMU-KVM VGA implementation has the following problem with
 legacy OSes (e.g. DOS with INT 10h calls): performance is low when
 accessing the A000:0 page and doing bank switching at the 64k page.

Do we already understand the mode and access patterns here? Which VGA
adapter? Cirrus, standard, or any? What is the concrete test case (one
that won't require me to dig for MS-DOS floppy disks in my basement)?

 
 Would a kernel mode VGA solve these problems?
 How complicated is it?
 Is it possible to have only some parts in kernel mode?
 Any further ideas or suggestions?

Assuming we currently take heavyweight exits, in-kernel acceleration may
reduce the exit overhead by a factor of, hmm, maybe 3-4. Better is to
avoid exits completely, i.e. switch the region to RAM mode. But that
depends on the graphics mode, and I'm afraid we have already covered all
modes that can be mapped like this.

In any case, before discussing solutions, we need to analyze the problem.
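As a starting point for that analysis, the arithmetic of a banked 64k window can be sketched in C. This assumes a single 64 KiB window at A000:0000 (physical 0xA0000) with 64 KiB bank granularity; real adapters (Cirrus in particular) may use a finer granularity, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: one 64 KiB window at segment A000h, 64 KiB granularity. */
#define VGA_WINDOW_BASE 0xA0000u
#define VGA_WINDOW_SIZE 0x10000u

struct banked_addr {
    uint32_t bank; /* value the guest would select via the bank register */
    uint32_t phys; /* guest-physical address it then touches in the window */
};

/* Map a linear framebuffer offset to (bank, address inside the window). */
static struct banked_addr vga_banked_addr(uint32_t fb_offset)
{
    struct banked_addr a;
    a.bank = fb_offset / VGA_WINDOW_SIZE;
    a.phys = VGA_WINDOW_BASE + fb_offset % VGA_WINDOW_SIZE;
    return a;
}

/* Number of distinct banks a sequential write of len bytes (len > 0)
 * starting at fb_offset has to select, i.e. the minimum bank switches. */
static uint32_t vga_banks_touched(uint32_t fb_offset, uint32_t len)
{
    assert(len > 0);
    uint32_t first = fb_offset / VGA_WINDOW_SIZE;
    uint32_t last = (fb_offset + len - 1) / VGA_WINDOW_SIZE;
    return last - first + 1;
}
```

Under these assumptions a sequential full-screen write at 800x600 and 32 bits per pixel (1,920,000 bytes) has to select 30 different banks, and each bank selection is at least one port write the hypervisor must handle, on top of whatever it costs to service the window accesses themselves.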

Jan





Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-14 Thread Jan Kiszka
On 2012-02-14 08:54, Gleb Natapov wrote:
 On Mon, Feb 13, 2012 at 08:22:21PM +0100, Jan Kiszka wrote:
 Unfortunately, this is only an internal structure, not officially
 documented by MS. However, all supported OS versions are legacy by now
 and no longer change its structure.

 This and a note about the supported OS versions could be added as comment.

 OK.

 For the folks that developed it in qemu-kvm: This targets Windows XP,
 Vista and Server 2003, all 32-bit, right?

 Not Vista. Not sure about Server 2003.

I think I saw some 2003 reference in the qemu-kvm git logs.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-14 Thread Gleb Natapov
On Tue, Feb 14, 2012 at 09:55:46AM +0100, Jan Kiszka wrote:
 On 2012-02-14 08:54, Gleb Natapov wrote:
  On Mon, Feb 13, 2012 at 08:22:21PM +0100, Jan Kiszka wrote:
  Unfortunately, this is only an internal structure, not officially
  documented by MS. However, all supported OS versions are legacy by now
  and no longer change its structure.
 
  This and a note about the supported OS versions could be added as comment.
 
  OK.
 
  For the folks that developed it in qemu-kvm: This targets Windows XP,
  Vista and Server 2003, all 32-bit, right?
 
  Not Vista. Not sure about Server 2003.
 
 I think I saw some 2003 reference in the qemu-kvm git logs.
 
Very likely. AFAIK it uses the same kernel as XP.

--
Gleb.


Re: AESNI and guest hosts

2012-02-14 Thread Ryan Brown
Sorry for being a noob here. Any clues about this, anyone?

On Mon, Feb 13, 2012 at 2:05 AM, Ryan Brown mp3g...@gmail.com wrote:
 Host/KVM server is running Linux 3.2.4 (Debian wheezy), and the guest
 kernel is running 3.2.5. The CPU is an E3-1230, but for some reason
 it's not able to supply the guest with AES-NI. Is there a config option,
 or is there something we're missing?

    <cpu>
      <arch>x86_64</arch>
      <model>Westmere</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='4' threads='2'/>
      <feature name='rdtscp'/>
      <feature name='x2apic'/>
      <feature name='xtpr'/>
      <feature name='tm2'/>
      <feature name='est'/>
      <feature name='vmx'/>
      <feature name='ds_cpl'/>
      <feature name='monitor'/>
      <feature name='pbe'/>
      <feature name='tm'/>
      <feature name='ht'/>
      <feature name='ss'/>
      <feature name='acpi'/>
      <feature name='ds'/>
      <feature name='vme'/>
    </cpu>

 Guest:
 [root@fanboy:~]# cat /proc/cpuinfo
 processor       : 0
 vendor_id       : GenuineIntel
 cpu family      : 6
 model           : 2
 model name      : QEMU Virtual CPU version 1.0
 stepping        : 3
 microcode       : 0x1
 cpu MHz         : 3192.748
 cache size      : 4096 KB
 fdiv_bug        : no
 hlt_bug         : no
 f00f_bug        : no
 coma_bug        : no
 fpu             : yes
 fpu_exception   : yes
 cpuid level     : 4
 wp              : yes
 flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
 cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
 hypervisor lahf_lm
 bogomips        : 6385.49
 clflush size    : 64
 cache_alignment : 64
 address sizes   : 40 bits physical, 48 bits virtual
 power management:

 processor       : 1
 vendor_id       : GenuineIntel
 cpu family      : 6
 model           : 2
 model name      : QEMU Virtual CPU version 1.0
 stepping        : 3
 microcode       : 0x1
 cpu MHz         : 3192.748
 cache size      : 4096 KB
 fdiv_bug        : no
 hlt_bug         : no
 f00f_bug        : no
 coma_bug        : no
 fpu             : yes
 fpu_exception   : yes
 cpuid level     : 4
 wp              : yes
 flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
 cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
 hypervisor lahf_lm
 bogomips        : 6385.49
 clflush size    : 64
 cache_alignment : 64
 address sizes   : 40 bits physical, 48 bits virtual
 power management:
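The guest above reports the default "QEMU Virtual CPU version 1.0" model, and its flags line contains no "aes" token. A small C sketch for checking this programmatically: AES-NI is reported in CPUID leaf 1, ECX bit 25, and Linux exposes it as the "aes" flag in /proc/cpuinfo. The function names are hypothetical, and the direct CPUID query is only compiled on x86 with GCC or Clang, which provide <cpuid.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* CPUID leaf 1, ECX bit 25 reports AES-NI support. */
static int cpuid1_ecx_has_aes(uint32_t ecx)
{
    return (ecx >> 25) & 1u;
}

/* Return 1 if the whitespace-separated /proc/cpuinfo "flags" value
 * contains exactly the token `name` (so "sse" does not match "sse2"). */
static int cpuinfo_has_flag(const char *flags, const char *name)
{
    size_t n = strlen(name);
    const char *p = flags;

    assert(n > 0);
    while (*p) {
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        const char *start = p;
        while (*p && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        if ((size_t)(p - start) == n && strncmp(start, name, n) == 0)
            return 1;
    }
    return 0;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
/* Query the CPU we are running on (host or guest) directly. */
static int running_cpu_has_aes(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return cpuid1_ecx_has_aes(ecx);
}
#endif
```

Running the same check on the host and inside the guest separates "the CPU lacks AES-NI" from "the guest CPU model does not expose it".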


Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-14 Thread Avi Kivity
On 02/10/2012 03:25 PM, Takuya Yoshikawa wrote:
 Avi Kivity a...@redhat.com wrote:

   2. When we create (and shift?) a memory slot, we call 
   kvm_arch_flush_shadow()
   to clear all mmio sptes, again not restricted to that slot.
  
 /*
  * If the new memory slot is created, we need to clear all
  * mmio sptes.
  */
 if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
         kvm_arch_flush_shadow(kvm);
  
  This is pretty rare outside the previous scenario (memory/pci hotplug).

 Is this condition correct?

 When npages != 0 and old.npages == 0, the slot is being newly created;
 do we really need to flush shadow pages in that case?

 This should be
   if (npages && old.npages && (old.base_gfn != base_gfn))


Your condition is more correct, but in practice there's no difference. 
If old.npages == 0, then old.base_gfn will be 0, and the condition will
fail, except for the first slot created (when the shadow cache is empty
anyway).
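For reference, Takuya's proposed condition can be written as a standalone C predicate. The struct and function names below are hypothetical simplifications, not KVM's actual types, and the sketch encodes only the proposed condition (flush when an existing slot is kept but moved to a different base gfn).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two memslot fields the condition reads. */
struct slot_state {
    uint64_t base_gfn;
    uint64_t npages;
};

/* Proposed condition: the slot existed before (old.npages != 0), still
 * exists afterwards (npages != 0), and its base gfn changed. */
static int needs_flush_proposed(struct slot_state old, struct slot_state new_slot)
{
    return new_slot.npages && old.npages && old.base_gfn != new_slot.base_gfn;
}
```

Newly created slots (old.npages == 0), deleted slots (npages == 0) and slots resized in place all evaluate to 0 under this predicate.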

-- 
error compiling committee.c: too many arguments to function



Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-14 Thread Avi Kivity
On 02/10/2012 07:16 PM, Marcelo Tosatti wrote:
 On Thu, Feb 09, 2012 at 04:25:36PM +0200, Avi Kivity wrote:
  On 02/08/2012 08:45 PM, Marcelo Tosatti wrote:
BTW do we really need fast slot creation/destruction?
  
   At the moment yes. Boot a RHEL/Fedora installation disk (or any other
   guest which uses SYSLINUX splash screen) and you will see.
  
  Another workload that suffers is Windows XP clearing the screen during boot.
  
   That particular case is a limitation of cirrus in QEMU; ideally it
   should be optimized there.
  
  Why do you say that?

 There is no fundamental need to create/destroy the 0xa VGA memory
 slot repeatedly.

If the guest writes to it, then the need exists.

 But you are right that the aim should be decent performance
 nevertheless.


-- 
error compiling committee.c: too many arguments to function



Re: Pe: [PATCH v5 1/3] virtio-scsi: first version

2012-02-14 Thread Paolo Bonzini

On 02/14/2012 02:11 AM, Michael S. Tsirkin wrote:

On Tue, Feb 14, 2012 at 11:49:55AM +1100, ronnie sahlberg wrote:

By just exposing this device to the kernel, the kernel (or, if not the
kernel, maybe some other process trying to poll the status?) keeps
sending, every few seconds:
PREVENT_ALLOW_MEDIUM_REMOVAL to prevent removal
PREVENT_ALLOW_MEDIUM_REMOVAL to immediately change it back to allow
removal again
TEST_UNIT_READY


After I run this
mount /dev/sdd1 /mnt

The kernel sends a single
PREVENT_ALLOW_MEDIUM_REMOVAL to prevent removal
then every few seconds a
TEST_UNIT_READY


Sorry to interrupt you again, guys, but the discussion started with
virtio-blk hotplug and now we're talking about SCSI commands? Surely
somebody switched topics at some point :) and anyway this is irrelevant
to what virtio-blk can or cannot do.


BTW, for virtio-scsi the spec provides a way to do hotplug and hotunplug 
without any polling, though it's not implemented yet in the driver.


Paolo


The way of mapping BIOS into the guest's address space

2012-02-14 Thread Yang Bai
Hi all,

Since on x86 the BIOS is always at the end of the address space, I
have some thoughts about how to implement SeaBIOS support for the kvm
tool.

1. Use kvm__register_mem to map the end of the address space into the
guest, then copy the SeaBIOS code into this memory region, just
emulating the BIOS chip.

2. Leave the BIOS code alone and don't touch the guest's address
space. If the guest accesses an address belonging to the BIOS, it
will be an I/O request and we can emulate the I/O access to the BIOS
chip.

Any ideas about this?
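Option 1 can be made concrete with a little address arithmetic. The sketch below assumes a layout modelled on QEMU's PC machine as I understand it: the whole image is mapped so that it ends at 4 GiB, and its last 128 KiB (or the whole image, if smaller) is also aliased so that it ends at 1 MiB. The x86 reset vector 0xFFFFFFF0 then falls into the last 16 bytes of the image, and the same bytes are visible at 0xFFFF0 through the low alias. The names and the 64 KiB size restriction are assumptions of this sketch, not kvm tool code.

```c
#include <assert.h>
#include <stdint.h>

#define FOUR_GIB     0x100000000ull
#define ONE_MIB      0x100000ull
#define ISA_BIOS_MAX (128u * 1024u) /* assumed size of the legacy alias */

struct bios_layout {
    uint64_t high_base; /* image mapped so that it ends exactly at 4 GiB */
    uint64_t low_base;  /* legacy alias ending exactly at 1 MiB */
    uint64_t low_size;  /* bytes from the end of the image visible below 1 MiB */
};

static struct bios_layout bios_layout_for(uint64_t image_size)
{
    struct bios_layout l;

    /* This sketch only accepts whole multiples of 64 KiB. */
    assert(image_size > 0 && image_size % 65536 == 0);
    l.high_base = FOUR_GIB - image_size;
    l.low_size = image_size < ISA_BIOS_MAX ? image_size : ISA_BIOS_MAX;
    l.low_base = ONE_MIB - l.low_size;
    return l;
}
```

For a 128 KiB image this gives a high mapping at 0xFFFE0000 and a low alias at 0xE0000, which is the classic E/F segment BIOS area.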

And a question: how can I set the address of the first instruction to
execute after we issue the vmlaunch instruction?

Thanks,
Yang


Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Pekka Enberg
On Tue, Feb 14, 2012 at 1:03 PM, Yang Bai hamo...@gmail.com wrote:
 Since on X86, bios is always at the end of the address space, so I
 have some thought about how to implement the seabios support for kvm
 tool.

 1. using kvm__register_mem to map the end of address space to the
 guest then copy the code of seabios to this mem region. Just emulating
 the bios chip.

 2. leave the bios code alone and don't touch the guest's address
 space. If the guest accesses the address belonging to the bios, it
 will be an IO request and we can emulate the IO access to the bios
 chip.

 Any ideas about this?

The latter solution doesn't make any sense to me. Cyrill, do we really
need to put the BIOS at the end of the address space? Don't we have
unused space below 1 MB?

 And question:  How could I set the first instruction address after we
 issue the vmlaunch instruction?

You need to set ->boot_ip and friends. See
tools/kvm/x86/kvm.c::load_bzimage() for an example.
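The address arithmetic behind choosing the first instruction can be sketched as follows. In real mode a segment:offset pair resolves to (segment << 4) + offset, while right after a hardware reset CS holds selector 0xF000 with a hidden base of 0xFFFF0000, so the first fetch is from 0xFFFFFFF0. If the loader instead sets the CS base to selector << 4 (which I assume is what kvm tool does for its boot selector, but have not checked), F000:FFF0 resolves to 0xFFFF0, inside the low BIOS area. Function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode linear address: (segment << 4) + offset (no A20 wrap modelled). */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* State right after an x86 hardware reset: CS selector 0xF000, but the
 * hidden CS base is 0xFFFF0000 rather than 0xF000 << 4. */
#define RESET_CS_SELECTOR 0xF000u
#define RESET_CS_BASE     0xFFFF0000u
#define RESET_IP          0xFFF0u

static uint32_t reset_fetch_address(void)
{
    return RESET_CS_BASE + RESET_IP;
}
```

Both addresses point at the same last 16 bytes of the BIOS image when it is mapped at the top of 4 GiB and aliased below 1 MiB.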

Pekka


Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Cyrill Gorcunov
On Tue, Feb 14, 2012 at 01:10:59PM +0200, Pekka Enberg wrote:
 On Tue, Feb 14, 2012 at 1:03 PM, Yang Bai hamo...@gmail.com wrote:
  Since on X86, bios is always at the end of the address space, so I
  have some thought about how to implement the seabios support for kvm
  tool.
 
  1. using kvm__register_mem to map the end of address space to the
  guest then copy the code of seabios to this mem region. Just emulating
  the bios chip.

I think this is what should be done.

 
  2. leave the bios code alone and don't touch the guest's address
  space. If the guest accesses the address belonging to the bios, it
  will be an IO request and we can emulate the IO access to the bios
  chip.
 
  Any ideas about this?
 
 The latter solution doesn't make any sense to me. Cyrill, do we really
 need to put the BIOS at the end of the address space? Don't we have
 unused space below 1 MB?

I don't remember for sure how SeaBIOS actually works. What I remember
is that it acquires whatever hardware environment it might have. So without
a real look into the SeaBIOS code I fear I can't answer. But reserving the
end of the 4G address space for a BIOS copy sounds reasonable if we are
going to behave like real hardware. Maybe we could poke someone from the
KVM camp for a hint?

Cyrill


Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-14 Thread jamal
On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote:

 The use case here is multiple VFs, but the same solution should work with
 multiple PFs as well. FDB controls should be independent of how the ports
 are exposed: VFs, PFs, VMDQ/queue pairs, macvlan, etc.

Makes sense.

 With events and ADD/DEL/GET FDB controls we can solve both cases. This also
 solves Roopa's case with macvlan where he wants to add additional addresses
 to macvlan ports.

Not familiar with that issue - I'll prowl the list.

 Yes, it should flood here, unless it's acting as an 802.1Qbg VEB or VEPA.

Ok. So there is a toggle somewhere which controls how flooding should
happen.

 
 Maybe not. But the kernel already has the needed signals; with one extra
 hook we can save running a daemon in user space. Maybe that's not a great
 argument for adding kernel code, though.

You make a reasonable argument for having it in the kernel, but I think
we win more if we separate the control. So while I empathize, I am
hoping that you'd go down the path that is harder to travel ;->

 The PF_BRIDGE:RTM_GETNEIGH,RTM_NEWNEIGH,RTM_DELNEIGH are registered in the
 br_netlink_init() path. 

Hrm, hadn't paid attention to that before. Nasty.
The bridge seems to be hard-coding policy on station movement, no?
This is a good example of the qualms I have about adding things to the
kernel ;->
I may not want to auto-update a MAC address that moves ports as part of
some policy I have. I can go and add YAK (Yet Another Knob), but where
is the line drawn?
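For comparison with the daemon-in-user-space approach, the PF_BRIDGE RTM_NEWNEIGH handler registered in br_netlink_init() is reached with an ordinary rtnetlink message. The sketch below only builds (does not send) such a request using the standard <linux/rtnetlink.h> macros; the ifindex, the MAC address, the choice of NUD_PERMANENT and the struct name are assumptions, and whether the kernel under discussion accepts exactly this message is not something I have verified.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>       /* AF_BRIDGE */
#include <linux/neighbour.h>  /* struct ndmsg, NDA_LLADDR, NUD_PERMANENT */
#include <linux/netlink.h>
#include <linux/rtnetlink.h>  /* RTM_NEWNEIGH, struct rtattr, RTA_* macros */

/* Hypothetical request buffer: header, ndmsg, then room for attributes. */
struct fdb_req {
    struct nlmsghdr nh;
    struct ndmsg ndm;
    char attrbuf[64];
};

/* Build an RTM_NEWNEIGH request in family AF_BRIDGE (== PF_BRIDGE) asking
 * for a permanent FDB entry for `mac` on the port with index `ifindex`. */
static void build_fdb_add(struct fdb_req *req, int ifindex,
                          const unsigned char mac[6])
{
    struct rtattr *rta;

    assert(sizeof(req->attrbuf) >= RTA_SPACE(6));
    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg));
    req->nh.nlmsg_type = RTM_NEWNEIGH;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
    req->ndm.ndm_family = AF_BRIDGE;
    req->ndm.ndm_ifindex = ifindex;
    req->ndm.ndm_state = NUD_PERMANENT;

    /* Append the NDA_LLADDR attribute right after the aligned ndmsg. */
    rta = (struct rtattr *)((char *)&req->nh + NLMSG_ALIGN(req->nh.nlmsg_len));
    rta->rta_type = NDA_LLADDR;
    rta->rta_len = RTA_LENGTH(6);
    memcpy(RTA_DATA(rta), mac, 6);
    req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len) + RTA_ALIGN(rta->rta_len);
}
```

Sending it would take a NETLINK_ROUTE socket and sendto(), omitted here; the point is that the policy decision can stay in whatever user-space program builds this message.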

cheers,
jamal




Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub

2012-02-14 Thread Paul Brook
  In a nutshell, I don't know what a SHPC is (nor OSHP), so I'm looking
  for an additional Ack.
 
 No problem, I'll get an Ack :)
 Meanwhile - here's a summary, as far as I understand it.
 
 Originally the PCI SIG only defined the electrical
 and mechanical requirements for hotplug, with no standard
 software interface, so ACPI was needed to drive device-specific
 registers to actually do hotplug.
 At some point the PCI SIG defined standard interfaces
 for PCI hotplug. There are two of them: the standard
 hot plug controller (SHPC) for PCI and PCIe hotplug
 for Express.
 
 Now an OS can have a standard driver and use it
 to activate hotplug functionality. This is OS hotplug (OSHP).

So presumably this will work on targets that don't have ACPI?
Assuming a competent guest OS of course.  Have you tested this?

Paul


Re: performance trouble

2012-02-14 Thread Avi Kivity
On 02/10/2012 12:09 PM, David Cure wrote:
   hello,

 Le Sun, Feb 05, 2012 at 11:38:34AM +0200, Avi Kivity ecrivait :
  
  Please post a trace as documented in http://www.linux-kvm.org/page/Tracing.

   I made the trace: it starts just before the slow function is launched
 and stops just after. I started only one VM, with 2 vCPUs/16G RAM, and only
 one user connected to the VM to launch the test.

   The trace file is too big to post here, so I gzipped it; the file
 is available here: http://www.roullier.net/report.txt.gz

   I hope you can find something strange.


It's reading the HPET like crazy.  There are also tons of interrupts. 
Please use the Windows performance tools to see which devices trigger
these interrupts.

The HPET issue will be fixed by the Hyper-V enlightenments, but these
will take some time to cook.

You can also try vhost-net to improve networking latency.
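To quantify the HPET traffic in such a trace, one can count MMIO reads that fall into the HPET register block. This sketch assumes trace lines produced by the kvm_mmio tracepoint in the form "mmio read len 4 gpa 0xfed000f0 val ..." and the conventional x86 HPET base of 0xFED00000 (1 KiB of registers, main counter at offset 0xF0); adjust both if the trace or the guest differs. Function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HPET_BASE 0xfed00000ull /* conventional x86 HPET MMIO base */
#define HPET_SIZE 0x400ull      /* 1 KiB register block */

/* Return 1 if a trace line records an MMIO read inside the HPET block. */
static int is_hpet_read(const char *line)
{
    const char *p = strstr(line, "mmio read len ");
    unsigned int len;
    unsigned long long gpa;

    if (!p)
        return 0;
    if (sscanf(p, "mmio read len %u gpa 0x%llx", &len, &gpa) != 2)
        return 0;
    return gpa >= HPET_BASE && gpa < HPET_BASE + HPET_SIZE;
}

/* Count HPET reads in a whole trace report, one event per line. */
static unsigned long count_hpet_reads(FILE *in)
{
    char line[512];
    unsigned long n = 0;

    assert(in != NULL);
    while (fgets(line, sizeof(line), in))
        n += is_hpet_read(line);
    return n;
}
```

Comparing this count against the total number of exits in the same window gives a rough idea of how much of the slowdown the HPET accounts for.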

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Cyrill Gorcunov
On Tue, Feb 14, 2012 at 09:20:18PM +0800, Yang Bai wrote:
 And will SeaBIOS replace the present BIOS implementation or co-exist?

Ideally we should get rid of our minibios completely and only have
seabios here instead.

Cyrill


Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub

2012-02-14 Thread Michael S. Tsirkin
On Tue, Feb 14, 2012 at 12:49:08PM +, Paul Brook wrote:
   In a nutshell, I don't know what a SHPC is (nor OSHP), so I'm looking
   for an additional Ack.
  
  No problem, I'll get an Ack :)
  Meanwhile - here's a summary, as far as I understand it.
  
  Originally the PCI SIG only defined the electrical
  and mechanical requirements for hotplug, with no standard
  software interface, so ACPI was needed to drive device-specific
  registers to actually do hotplug.
  At some point the PCI SIG defined standard interfaces
  for PCI hotplug. There are two of them: the standard
  hot plug controller (SHPC) for PCI and PCIe hotplug
  for Express.
  
  Now an OS can have a standard driver and use it
  to activate hotplug functionality. This is OS hotplug (OSHP).
 
 So presumably this will work on targets that don't have ACPI?
 Assuming a competent guest OS of course.  Have you tested this?
 
 Paul

This being the qemu side of things? I run Linux
and verified that it calls OSHP and afterwards,
runs the native driver and handles hotplug/unplug
without invoking ACPI at all.

It seems that at least the SHPC driver in Linux
doesn't work if you don't have an ACPI table
with the OSHP method; not many people run with acpi=off
nowadays, so it's probably just a bug.
I'll check how hard it is to fix this.

-- 
MST


Re: performance trouble

2012-02-14 Thread Gleb Natapov
On Tue, Feb 14, 2012 at 03:32:16PM +0200, Avi Kivity wrote:
 On 02/10/2012 12:09 PM, David Cure wrote:
  hello,
 
  Le Sun, Feb 05, 2012 at 11:38:34AM +0200, Avi Kivity ecrivait :
   
   Please post a trace as documented in 
   http://www.linux-kvm.org/page/Tracing.
 
  I made the trace: it starts just before the slow function is launched
  and stops just after. I started only one VM, with 2 vCPUs/16G RAM, and only
  one user connected to the VM to launch the test.
 
  The trace file is too big to post here, I gzip it and the file
  is available here : http://www.roullier.net/report.txt.gz
 
  I hope you can find something strange.
 
 
 It's reading the HPET like crazy.  There are also tons of interrupts. 
 Please use the windows performance tools to see which devices trigger
 these interrupts.
 
 The HPET issue will be fixed by the Hyper-V enlightenments, but these
 will take some time to cook.
 
Try adding -no-hpet to the qemu command line and see if it helps.

 You can also try vhost-net to improve networking latency.
 
 -- 
 error compiling committee.c: too many arguments to function
 

--
Gleb.


Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub

2012-02-14 Thread Paul Brook
   Now an OS can have a standard driver and use it
   to activate hotplug functionality. This is OS hotplug (OSHP).
  
  So presumably this will work on targets that don't have ACPI?
  Assuming a competent guest OS of course.  Have you tested this?
 
 This being the qemu side of things? I run Linux
 and verified that it calls OSHP and afterwards,
 runs the native driver and handles hotplug/unplug
 without invoking ACPI at all.

I mean using your shiny new hotplug PCI-PCI bridge on arm/ppc/mips targets
(i.e. anything other than x86 PC). From your description it sounds like it
*should* work.
 
 It seems that at least the SHPC driver in linux
 doesn't work if you don't have an acpi table
 with the OSHP method - not many people run with acpi=off
 nowdays, so it's probably just a bug.
 I'll check how hard it is to fix this.

Targets other than x86 don't have ACPI to start with.

Paul


Re: performance trouble

2012-02-14 Thread Vadim Rozenfeld


- Original Message -
From: Avi Kivity a...@redhat.com
To: David Cure k...@cure.nom.fr
Cc: kvm@vger.kernel.org, Vadim Rozenfeld vroze...@redhat.com
Sent: Tuesday, February 14, 2012 3:32:16 PM
Subject: Re: performance trouble

On 02/10/2012 12:09 PM, David Cure wrote:
   hello,

 Le Sun, Feb 05, 2012 at 11:38:34AM +0200, Avi Kivity ecrivait :
  
  Please post a trace as documented in http://www.linux-kvm.org/page/Tracing.

   I made the trace: it starts just before the slow function is launched
 and stops just after. I started only one VM, with 2 vCPUs/16G RAM, and only
 one user connected to the VM to launch the test.

   The trace file is too big to post here, so I gzipped it; the file
 is available here: http://www.roullier.net/report.txt.gz

   I hope you can find something strange.


It's reading the HPET like crazy.  There are also tons of interrupts. 
Please use the windows performance tools to see which devices trigger
these interrupts.

[VR]
+1 
Try the Microsoft Windows Performance Toolkit from the Windows SDK:
http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=3138
It's really good.


The HPET issue will be fixed by the Hyper-V enlightenments, but these
will take some time to cook.

You can also try vhost-net to improve networking latency.

-- 
error compiling committee.c: too many arguments to function



Re: Win 2000 driver for -vga std ?

2012-02-14 Thread Reeted

On 02/14/12 07:25, Michael Tokarev wrote:

On 14.02.2012 05:42, Reeted wrote:

Hello, subject says it all

The driver for Windows 2000 for -vga std should be the Anapa VBE VESA
driver (VBEMP), if I understand correctly,

but I cannot for the life of me find this executable:
http://navozhdeniye.narod.ru/vbemp.htm
all the download links all over the world are dead!

Does anybody still have a copy of this very important driver?


This adapter works in all versions of Windows with the built-in
VESA driver just fine; no replacement is necessary or desired.

The only problem is that some versions of windows consider that
driver to be problematic somehow and mark the corresponding
device with yellow exclamation sign.  Go ask M$ about this.


I don't think so...

It detects new hardware (I am virtualizing an existing machine) and asks
me where to look for a driver. I point it at the Win2000 installation CD
and at Windows Update, but it says it cannot find a driver for this
video adapter. It then asks whether I want to disable the device or be
prompted again at the next boot.

And it keeps running at 16 colors (4-bit depth) at 800x600, with very
poor performance when moving windows around.



Re: [Qemu-devel] [PATCH RFC] seabios: add OSHP method stub

2012-02-14 Thread Michael S. Tsirkin
On Tue, Feb 14, 2012 at 01:47:59PM +, Paul Brook wrote:
Now an OS can have a standard driver and use it
to activate hotplug functionality. This is OS hotplug (OSHP).
   
   So presumably this will work on targets that don't have ACPI?
   Assuming a competent guest OS of course.  Have you tested this?
  
  This being the qemu side of things? I run Linux
  and verified that it calls OSHP and afterwards,
  runs the native driver and handles hotplug/unplug
  without invoking ACPI at all.
 
 I mean using your shiny new hotplug PCI-PCI bridge on arm/ppc/mips targets 
 (i.e anything other than x86 PC).  From your description it sounds like it 
 *should* work.
  
  It seems that at least the SHPC driver in linux
  doesn't work if you don't have an acpi table
  with the OSHP method - not many people run with acpi=off
  nowdays, so it's probably just a bug.
  I'll check how hard it is to fix this.
 
 Targets other than x86 don't have ACPI to start with.
 
 Paul

So

#ifdef CONFIG_ACPI
#include <linux/pci-acpi.h>
static inline int get_hp_hw_control_from_firmware(struct pci_dev *dev)
{
	u32 flags = OSC_SHPC_NATIVE_HP_CONTROL;
	return acpi_get_hp_hw_control_from_firmware(dev, flags);
}
#else
#define get_hp_hw_control_from_firmware(dev) (0)
#endif

So if you build your guest without acpi, things should work fine.
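The point of the stub is that without CONFIG_ACPI the firmware query collapses to a constant 0, so a probe routine that only bails out on a nonzero result always proceeds with native SHPC control. A userspace sketch of that compile-time fallback pattern follows; acpi_query_firmware() and shpc_probe_sketch() are hypothetical stand-ins, not the actual shpchp code.

```c
#include <assert.h>

/* Returns 0 if the OS may take native hotplug control, nonzero otherwise. */
#ifdef CONFIG_ACPI
/* Hypothetical stand-in for the ACPI _OSC/OSHP negotiation. */
static int acpi_query_firmware(void)
{
    return -1; /* pretend the firmware refused native control */
}
static inline int get_hp_hw_control_from_firmware(void)
{
    return acpi_query_firmware();
}
#else
/* No ACPI: there is nobody to ask, so native control is granted. */
#define get_hp_hw_control_from_firmware() (0)
#endif

/* Hypothetical probe: give up only if firmware refuses native control. */
static int shpc_probe_sketch(void)
{
    int refused = get_hp_hw_control_from_firmware();

    assert(refused == 0 || refused == -1);
    return refused ? -1 : 0;
}
```

Built without CONFIG_ACPI, shpc_probe_sketch() always returns 0, mirroring why a non-ACPI guest kernel should not be blocked by the missing OSHP method.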

-- 
MST


Re: KVM call agenda for Tuesday 14

2012-02-14 Thread Juan Quintela
Juan Quintela quint...@redhat.com wrote:
 Hi

 Please send in any agenda items you are interested in covering.

As there are no topics, the call is cancelled.

Happy hacking,

Juan.

P.S. You should use the extra time to draw a QEMU mascot O:-)

 Cheers,

 Juan.


[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

2012-02-14 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #28 from Avi Kivity a...@redhat.com  2012-02-14 14:47:38 ---
(In reply to comment #27)
 and there soon will be video capture with 'perf top'
 
 http://vbox7.com/play:199e9ede30

Run it while the guest is also running.



Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread cody

On 02/14/2012 07:03 PM, Yang Bai wrote:

Hi all,

Since on X86, bios is always at the end of the address space, so I
have some thought about how to implement the seabios support for kvm
tool.

1. using kvm__register_mem to map the end of address space to the
guest then copy the code of seabios to this mem region. Just emulating
the bios chip.

2. leave the bios code alone and don't touch the guest's address
space. If the guest accesses the address belonging to the bios, it
will be an IO request and we can emulate the IO access to the bios
chip.

Any ideas about this?
   
Can I ask what the purpose of mapping the BIOS code into the guest is? Is
there any use for it? Shouldn't the BIOS's behavior be emulated by the
hypervisor? Thanks.


-cody

And question:  How could I set the first instruction address after we
issue the vmlaunch instruction?

Thanks,
Yang




[PATCH v3 5/9] kvmvapic: Add option ROM

2012-02-14 Thread Jan Kiszka
This imports and builds the original VAPIC option ROM of qemu-kvm.
Its interaction with QEMU is described in the commit that introduces the
corresponding device model.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 .gitignore                   |    1 +
 Makefile                     |    2 +-
 pc-bios/optionrom/Makefile   |    2 +-
 pc-bios/optionrom/kvmvapic.S |  341 ++
 4 files changed, 344 insertions(+), 2 deletions(-)
 create mode 100644 pc-bios/optionrom/kvmvapic.S

diff --git a/.gitignore b/.gitignore
index f5aab2c..d3b78c3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -75,6 +75,7 @@ pc-bios/vgabios-pq/status
 pc-bios/optionrom/linuxboot.bin
 pc-bios/optionrom/multiboot.bin
 pc-bios/optionrom/multiboot.raw
+pc-bios/optionrom/kvmvapic.bin
 .stgit-*
 cscope.*
 tags
diff --git a/Makefile b/Makefile
index 47acf3d..c2ef135 100644
--- a/Makefile
+++ b/Makefile
@@ -255,7 +255,7 @@ pxe-e1000.rom pxe-eepro100.rom pxe-ne2k_pci.rom \
 pxe-pcnet.rom pxe-rtl8139.rom pxe-virtio.rom \
 bamboo.dtb petalogix-s3adsp1800.dtb petalogix-ml605.dtb \
 mpc8544ds.dtb \
-multiboot.bin linuxboot.bin \
+multiboot.bin linuxboot.bin kvmvapic.bin \
 s390-zipl.rom \
 spapr-rtas.bin slof.bin \
 palcode-clipper
diff --git a/pc-bios/optionrom/Makefile b/pc-bios/optionrom/Makefile
index 2caf7e6..f6b4027 100644
--- a/pc-bios/optionrom/Makefile
+++ b/pc-bios/optionrom/Makefile
@@ -14,7 +14,7 @@ CFLAGS += -I$(SRC_PATH)
 CFLAGS += $(call cc-option, $(CFLAGS), -fno-stack-protector)
 QEMU_CFLAGS = $(CFLAGS)
 
-build-all: multiboot.bin linuxboot.bin
+build-all: multiboot.bin linuxboot.bin kvmvapic.bin
 
 # suppress auto-removal of intermediate files
 .SECONDARY:
diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
new file mode 100644
index 000..e1d8f18
--- /dev/null
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -0,0 +1,341 @@
+#
+# Local APIC acceleration for Windows XP and related guests
+#
+# Copyright 2011 Red Hat, Inc. and/or its affiliates
+#
+# Author: Avi Kivity a...@redhat.com
+#
+# This work is licensed under the terms of the GNU GPL, version 2, or (at your
+# option) any later version.  See the COPYING file in the top-level directory.
+#
+
+   .text 0
+   .code16
+.global _start
+_start:
+   .short 0xaa55
+   .byte (_end - _start) / 512
+   # clear vapic area: firmware load using rep insb may cause
+   # stale tpr/isr/irr data to corrupt the vapic area.
+   push %es
+   push %cs
+   pop %es
+   xor %ax, %ax
+   mov $vapic_size/2, %cx
+   lea vapic, %di
+   cld
+   rep stosw
+   pop %es
+   mov $vapic_base, %ax
+   out %ax, $0x7e
+   lret
+
+   .code32
+vapic_size = 2*4096
+
+.macro fixup delta=-4
+777:
+   .text 1
+   .long 777b + \delta  - vapic_base
+   .text 0
+.endm
+
+.macro reenable_vtpr
+   out %al, $0x7e
+.endm
+
+.text 1
+   fixup_start = .
+.text 0
+
+.align 16
+
+vapic_base:
+   .ascii "kvm aPiC"
+
+   /* relocation data */
+   .long vapic_base; fixup
+   .long fixup_start   ; fixup
+   .long fixup_end ; fixup
+
+   .long vapic ; fixup
+   .long vapic_size
+vcpu_shift:
+   .long 0
+real_tpr:
+   .long 0
+   .long up_set_tpr; fixup
+   .long up_set_tpr_eax; fixup
+   .long up_get_tpr_eax; fixup
+   .long up_get_tpr_ecx; fixup
+   .long up_get_tpr_edx; fixup
+   .long up_get_tpr_ebx; fixup
+   .long 0 /* esp. won't work. */
+   .long up_get_tpr_ebp; fixup
+   .long up_get_tpr_esi; fixup
+   .long up_get_tpr_edi; fixup
+   .long up_get_tpr_stack  ; fixup
+   .long mp_set_tpr; fixup
+   .long mp_set_tpr_eax; fixup
+   .long mp_get_tpr_eax; fixup
+   .long mp_get_tpr_ecx; fixup
+   .long mp_get_tpr_edx; fixup
+   .long mp_get_tpr_ebx; fixup
+   .long 0 /* esp. won't work. */
+   .long mp_get_tpr_ebp; fixup
+   .long mp_get_tpr_esi; fixup
+   .long mp_get_tpr_edi; fixup
+   .long mp_get_tpr_stack  ; fixup
+
+.macro kvm_hypercall
+   .byte 0x0f, 0x01, 0xc1
+.endm
+
+kvm_hypercall_vapic_poll_irq = 1
+
+pcr_cpu = 0x51
+
+.align 64
+
+mp_get_tpr_eax:
+   pushf
+   cli
+   reenable_vtpr
+   push %ecx
+
+   fs/movzbl pcr_cpu, %eax
+
+   mov vcpu_shift, %ecx; fixup
+   shl %cl, %eax
+   testb $1, vapic+4(%eax) ; fixup delta=-5
+   jz mp_get_tpr_bad
+   movzbl vapic(%eax), %eax ; fixup
+
+mp_get_tpr_out:
+   pop %ecx
+   popf
+   ret
+
+mp_get_tpr_bad:
+   mov real_tpr, %eax  ; fixup
+   mov (%eax), %eax
+   jmp mp_get_tpr_out
+
+mp_get_tpr_ebx:
+   mov %eax, %ebx
+   call mp_get_tpr_eax
+   xchg %eax, %ebx
+   ret
+
+mp_get_tpr_ecx:
+   mov %eax, %ecx
+   call mp_get_tpr_eax
+   xchg %eax, %ecx
+   ret
+
+mp_get_tpr_edx:
+   mov %eax, %edx

[PATCH v3 7/9] kvmvapic: Simplify mp/up_set_tpr

2012-02-14 Thread Jan Kiszka
The CH register is only written, never read. So we can remove these
writes and, in the case of up_set_tpr, also the ECX push/pop. With one
push fewer, the stack argument of up_set_tpr moves from 20(%esp) to
16(%esp).

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 pc-bios/optionrom/kvmvapic.S |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
index e1d8f18..856c1e5 100644
--- a/pc-bios/optionrom/kvmvapic.S
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -202,7 +202,6 @@ mp_isr_is_bigger:
mov %bh, %bl
 mp_tpr_is_bigger:
/* %bl = ppr */
-   mov %bl, %ch   /* ch = ppr */
rol $8, %ebx
/* now: %bl = irr, %bh = ppr */
cmp %bh, %bl
@@ -276,7 +275,6 @@ up_set_tpr_eax:
 up_set_tpr:
pushf
push %eax
-   push %ecx
push %ebx
reenable_vtpr
 
@@ -284,7 +282,7 @@ up_set_tpr_failed:
mov vapic, %eax ; fixup
 
mov %eax, %ebx
-   mov 20(%esp), %bl
+   mov 16(%esp), %bl
 
/* %ebx = new vapic (%bl = tpr, %bh = isr, %b3 = irr) */
 
@@ -298,7 +296,6 @@ up_isr_is_bigger:
mov %bh, %bl
 up_tpr_is_bigger:
/* %bl = ppr */
-   mov %bl, %ch   /* ch = ppr */
rol $8, %ebx
/* now: %bl = irr, %bh = ppr */
cmp %bh, %bl
@@ -306,7 +303,6 @@ up_tpr_is_bigger:
 
 up_set_tpr_out:
pop %ebx
-   pop %ecx
pop %eax
popf
ret $4
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

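up_set_tpr in the option ROM combines the new TPR with the ISR byte into a processor priority (the `%bl = ppr` step in the hunks above) and then compares it with the IRR byte to decide whether to poll for a pending IRQ via the hypercall. A minimal C sketch of that decision, assuming the architectural local APIC rule in which only bits 7:4 (the priority class) matter; it does not reproduce the assembly instruction by instruction, and the `toy_*` names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Processor priority: the TPR if its class is at least the in-service
 * class, otherwise the in-service class with the low nibble cleared. */
static uint8_t toy_compute_ppr(uint8_t tpr, uint8_t isrv)
{
    if ((tpr & 0xf0) >= (isrv & 0xf0)) {
        return tpr;
    }
    return isrv & 0xf0;
}

/* Non-zero if a pending vector (irrv, 0 = none) beats the current
 * priority class, i.e. the case in which the ROM has to poll for an IRQ. */
static int toy_irq_deliverable(uint8_t tpr, uint8_t isrv, uint8_t irrv)
{
    uint8_t ppr = toy_compute_ppr(tpr, isrv);

    return irrv != 0 && (irrv & 0xf0) > (ppr & 0xf0);
}
```

For example, lowering the TPR to 0x20 while vector 0x41 is pending and nothing is in service makes toy_irq_deliverable(0x20, 0x00, 0x41) return 1, which corresponds to the poll path.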

[PATCH v3 2/9] Remove useless casts from cpu iterators

2012-02-14 Thread Jan Kiszka
CPUState::next_cpu is already of type CPUState *, so the casts are redundant.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 cpus.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/cpus.c b/cpus.c
index d0c8340..4e65894 100644
--- a/cpus.c
+++ b/cpus.c
@@ -853,7 +853,7 @@ static int all_vcpus_paused(void)
 if (!penv->stopped) {
 return 0;
 }
-penv = (CPUState *)penv->next_cpu;
+penv = penv->next_cpu;
 }
 
 return 1;
@@ -867,7 +867,7 @@ void pause_all_vcpus(void)
 while (penv) {
 penv->stop = 1;
 qemu_cpu_kick(penv);
-penv = (CPUState *)penv->next_cpu;
+penv = penv->next_cpu;
 }
 
 while (!all_vcpus_paused()) {
@@ -875,7 +875,7 @@ void pause_all_vcpus(void)
 penv = first_cpu;
 while (penv) {
 qemu_cpu_kick(penv);
-penv = (CPUState *)penv->next_cpu;
+penv = penv->next_cpu;
 }
 }
 }
@@ -889,7 +889,7 @@ void resume_all_vcpus(void)
 penv->stop = 0;
 penv->stopped = 0;
 qemu_cpu_kick(penv);
-penv = (CPUState *)penv->next_cpu;
+penv = penv->next_cpu;
 }
 }
 
-- 
1.7.3.4


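Because next_cpu is declared with the element type itself, every CPU walk can follow the pointer directly. A reduced sketch of the same loop shapes, using a hypothetical `ToyCPUState` that keeps only the fields these loops touch (the VCPU kick is left out):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of CPUState. */
typedef struct ToyCPUState {
    int stop;
    int stopped;
    struct ToyCPUState *next_cpu;   /* already the right type: no cast */
} ToyCPUState;

/* Same shape as all_vcpus_paused(): walk the list from the first CPU. */
static int toy_all_vcpus_paused(ToyCPUState *first_cpu)
{
    ToyCPUState *penv = first_cpu;

    while (penv) {
        if (!penv->stopped) {
            return 0;
        }
        penv = penv->next_cpu;      /* was: (CPUState *)penv->next_cpu */
    }
    return 1;
}

/* Same shape as resume_all_vcpus(), without qemu_cpu_kick(). */
static void toy_resume_all_vcpus(ToyCPUState *first_cpu)
{
    ToyCPUState *penv;

    for (penv = first_cpu; penv; penv = penv->next_cpu) {
        penv->stop = 0;
        penv->stopped = 0;
    }
}
```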

[PATCH v3 1/9] kvm: Set cpu_single_env only once

2012-02-14 Thread Jan Kiszka
As cpu_single_env is now thread-local and KVM uses exactly one thread
per VCPU, we can drop the cpu_single_env updates from the run loop and
initialize this variable only once, during VCPU thread setup.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 cpus.c|1 +
 kvm-all.c |5 -
 2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/cpus.c b/cpus.c
index f45a438..d0c8340 100644
--- a/cpus.c
+++ b/cpus.c
@@ -714,6 +714,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
 qemu_mutex_lock(&qemu_global_mutex);
 qemu_thread_get_self(env->thread);
 env->thread_id = qemu_get_thread_id();
+cpu_single_env = env;
 
 r = kvm_init_vcpu(env);
 if (r < 0) {
diff --git a/kvm-all.c b/kvm-all.c
index c4babda..e2cbc03 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1118,8 +1118,6 @@ int kvm_cpu_exec(CPUState *env)
 return EXCP_HLT;
 }
 
-cpu_single_env = env;
-
 do {
 if (env->kvm_vcpu_dirty) {
 kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
@@ -1136,13 +1134,11 @@ int kvm_cpu_exec(CPUState *env)
  */
 qemu_cpu_kick_self();
 }
-cpu_single_env = NULL;
 qemu_mutex_unlock_iothread();
 
 run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
 
 qemu_mutex_lock_iothread();
-cpu_single_env = env;
 kvm_arch_post_run(env, run);
 
 kvm_flush_coalesced_mmio_buffer();
@@ -1206,7 +1202,6 @@ int kvm_cpu_exec(CPUState *env)
 }
 
 env->exit_request = 0;
-cpu_single_env = NULL;
 return ret;
 }
 
-- 
1.7.3.4


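With a thread-local pointer, each VCPU thread sees only its own copy, so assigning it once at thread start is enough and the per-iteration set/clear pairs in kvm_cpu_exec() become redundant. A minimal pthread sketch of that invariant; it uses C11 `_Thread_local` in place of QEMU's DECLARE_TLS machinery, and all `Toy*`/`toy_*` names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for CPUState. */
typedef struct ToyVCPU {
    int index;
    int seen_self;   /* set if the TLS pointer still matched at the end */
} ToyVCPU;

/* Modeled after cpu_single_env: one private copy per thread. */
static _Thread_local ToyVCPU *toy_single_env;

/* Stand-in for one pass through the run loop: no set/clear needed. */
static void toy_exec_once(void)
{
    assert(toy_single_env != NULL);
}

static void *toy_vcpu_thread_fn(void *arg)
{
    ToyVCPU *env = arg;

    toy_single_env = env;            /* set exactly once, at thread setup */
    for (int i = 0; i < 1000; i++) {
        toy_exec_once();
    }
    env->seen_self = (toy_single_env == env);
    return NULL;
}

/* Start one thread per VCPU and wait for all of them. */
static void toy_run_vcpus(ToyVCPU *cpus, int n)
{
    pthread_t threads[8];

    assert(n <= 8);
    for (int i = 0; i < n; i++) {
        pthread_create(&threads[i], NULL, toy_vcpu_thread_fn, &cpus[i]);
    }
    for (int i = 0; i < n; i++) {
        pthread_join(threads[i], NULL);
    }
}
```

The main thread never assigns toy_single_env, so its own copy stays NULL while each VCPU thread keeps pointing at its own ToyVCPU.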

[PATCH v3 9/9] kvmvapic: Use optionrom helpers

2012-02-14 Thread Jan Kiszka
Use OPTION_ROM_START/END from the common header file and add a comment
to the init code.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 pc-bios/optionrom/kvmvapic.S |   18 --
 1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
index 856c1e5..aa17a40 100644
--- a/pc-bios/optionrom/kvmvapic.S
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -9,12 +9,10 @@
 # option) any later version.  See the COPYING file in the top-level directory.
 #
 
-   .text 0
-   .code16
-.global _start
-_start:
-   .short 0xaa55
-   .byte (_end - _start) / 512
+#include "optionrom.h"
+
+OPTION_ROM_START
+
# clear vapic area: firmware load using rep insb may cause
# stale tpr/isr/irr data to corrupt the vapic area.
push %es
@@ -26,8 +24,11 @@ _start:
cld
rep stosw
pop %es
+
+   # announce presence to the hypervisor
mov $vapic_base, %ax
out %ax, $0x7e
+
lret
 
.code32
@@ -331,7 +332,4 @@ up_set_tpr_poll_irq:
 vapic:
 . = . + vapic_size
 
-.byte 0  # reserve space for signature
-.align 512, 0
-
-_end:
+OPTION_ROM_END
-- 
1.7.3.4


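The lines removed from kvmvapic.S spell out the legacy option ROM header by hand: the 0xaa55 signature (stored little-endian, so the bytes 0x55, 0xAA) followed by the image size in 512-byte blocks. A hypothetical checker for that header, assuming OPTION_ROM_START emits the same layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: accept an image only if it starts with 0x55 0xAA
 * and the size byte (in 512-byte blocks) fits inside the buffer. */
static int toy_option_rom_header_ok(const uint8_t *rom, size_t len)
{
    size_t declared;

    if (len < 3 || rom[0] != 0x55 || rom[1] != 0xaa) {
        return 0;
    }
    declared = (size_t)rom[2] * 512;
    return declared != 0 && declared <= len;
}
```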

[PATCH v3 4/9] target-i386: Add infrastructure for reporting TPR MMIO accesses

2012-02-14 Thread Jan Kiszka
This will allow the APIC core to file a TPR access report. Depending on
the accelerator and kernel irqchip mode, it will either be delivered
right away or queued for later reporting.

In TCG mode, we can restart the triggering instruction and can therefore
forward the event directly. KVM does not allow us to restart, so we
postpone the delivery of events recorded in the user space APIC until
the current instruction has completed.

Note that KVM without in-kernel irqchip will report the address after
the instruction that triggered a write access. In contrast, read
accesses will report the precise address.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 cpu-all.h|3 ++-
 hw/apic.h|2 ++
 hw/apic_common.c |4 
 target-i386/cpu.h|   11 +++
 target-i386/helper.c |   19 +++
 target-i386/kvm.c|   24 ++--
 6 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index e2c3c49..80e6d42 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -375,8 +375,9 @@ DECLARE_TLS(CPUState *,cpu_single_env);
 #define CPU_INTERRUPT_TGT_INT_0   0x0100
 #define CPU_INTERRUPT_TGT_INT_1   0x0400
 #define CPU_INTERRUPT_TGT_INT_2   0x0800
+#define CPU_INTERRUPT_TGT_INT_3   0x2000
 
-/* First unused bit: 0x2000.  */
+/* First unused bit: 0x4000.  */
 
 /* The set of all bits that should be masked when single-stepping.  */
 #define CPU_INTERRUPT_SSTEP_MASK \
diff --git a/hw/apic.h b/hw/apic.h
index a62d83b..45598bd 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -18,6 +18,8 @@ void cpu_set_apic_tpr(DeviceState *s, uint8_t val);
 uint8_t cpu_get_apic_tpr(DeviceState *s);
 void apic_init_reset(DeviceState *s);
 void apic_sipi(DeviceState *s);
+void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip,
+   int access);
 
 /* pc.c */
 int cpu_is_bsp(CPUState *env);
diff --git a/hw/apic_common.c b/hw/apic_common.c
index 8373d79..588531b 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -68,6 +68,10 @@ uint8_t cpu_get_apic_tpr(DeviceState *d)
 return s ? s->tpr >> 4 : 0;
 }
 
+void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, int access)
+{
+}
+
 void apic_report_irq_delivered(int delivered)
 {
 apic_irq_delivered += delivered;
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 37dde79..c2e9ca3 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -482,6 +482,7 @@
 #define CPU_INTERRUPT_VIRQ  CPU_INTERRUPT_TGT_INT_0
 #define CPU_INTERRUPT_INIT  CPU_INTERRUPT_TGT_INT_1
 #define CPU_INTERRUPT_SIPI  CPU_INTERRUPT_TGT_INT_2
+#define CPU_INTERRUPT_TPR   CPU_INTERRUPT_TGT_INT_3
 
 
 enum {
@@ -772,6 +773,9 @@ typedef struct CPUX86State {
 XMMReg ymmh_regs[CPU_NB_REGS];
 
 uint64_t xcr0;
+
+target_ulong tpr_access_ip;
+int tpr_access_type;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
@@ -1064,4 +1068,11 @@ void svm_check_intercept(CPUState *env1, uint32_t type);
 
 uint32_t cpu_cc_compute_all(CPUState *env1, int op);
 
+typedef enum TPRAccess {
+TPR_ACCESS_READ,
+TPR_ACCESS_WRITE,
+} TPRAccess;
+
+void cpu_report_tpr_access(CPUState *env, TPRAccess access);
+
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 2586aff..79aeb8f 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1189,6 +1189,25 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
 }
 }
 }
+
+void cpu_report_tpr_access(CPUState *env, TPRAccess access)
+{
+TranslationBlock *tb;
+
+if (kvm_enabled()) {
+cpu_synchronize_state(env);
+
+env->tpr_access_ip = env->eip;
+env->tpr_access_type = access;
+
+cpu_interrupt(env, CPU_INTERRUPT_TPR);
+} else {
+tb = tb_find_pc(env->mem_io_pc);
+cpu_restore_state(tb, env, env->mem_io_pc);
+
+apic_handle_tpr_access_report(env->apic_state, env->eip, access);
+}
+}
 #endif /* !CONFIG_USER_ONLY */
 
 static void mce_init(CPUX86State *cenv)
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 981192d..fa77f9d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1635,8 +1635,10 @@ void kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
 }
 
 if (!kvm_irqchip_in_kernel()) {
-/* Force the VCPU out of its inner loop to process the INIT request */
-if (env->interrupt_request & CPU_INTERRUPT_INIT) {
+/* Force the VCPU out of its inner loop to process any INIT requests
+ * or pending TPR access reports. */
+if (env->interrupt_request &
+(CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
 env-exit_request = 1;
 }
 
@@ -1730,6 +1732,11 @@ int kvm_arch_process_async_events(CPUState *env)
 kvm_cpu_synchronize_state(env);
 do_cpu_sipi(env);
 }
+if (env->interrupt_request & CPU_INTERRUPT_TPR) {
+env->interrupt_request &= ~CPU_INTERRUPT_TPR;
+

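The deferred path this patch sets up for KVM (record the IP and access type, raise CPU_INTERRUPT_TPR, and let kvm_arch_process_async_events() consume it later) can be modeled in isolation. A minimal sketch; the struct and `toy_*` functions are hypothetical stand-ins for CPUX86State, cpu_report_tpr_access() and the async-event hook, and delivery to the APIC is reduced to a counter because that part of the diff is cut off here:

```c
#include <assert.h>
#include <stdint.h>

/* Same value as CPU_INTERRUPT_TGT_INT_3 in the cpu-all.h hunk above. */
#define TOY_INTERRUPT_TPR 0x2000u

typedef enum { TOY_TPR_ACCESS_READ, TOY_TPR_ACCESS_WRITE } ToyTPRAccess;

typedef struct {
    uint32_t interrupt_request;
    uint64_t eip;
    uint64_t tpr_access_ip;
    ToyTPRAccess tpr_access_type;
    int reports;               /* reports delivered to the (toy) APIC */
    uint64_t last_report_ip;
} ToyX86State;

/* KVM path: the instruction cannot be restarted, so record the access
 * and raise a request bit instead of reporting immediately. */
static void toy_report_tpr_access(ToyX86State *env, ToyTPRAccess access)
{
    env->tpr_access_ip = env->eip;
    env->tpr_access_type = access;
    env->interrupt_request |= TOY_INTERRUPT_TPR;
}

/* Later, outside KVM_RUN: consume the bit and deliver the report once. */
static void toy_process_async_events(ToyX86State *env)
{
    if (env->interrupt_request & TOY_INTERRUPT_TPR) {
        env->interrupt_request &= ~TOY_INTERRUPT_TPR;
        env->reports++;
        env->last_report_ip = env->tpr_access_ip;
    }
}
```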
[PATCH v3 3/9] Allow to use pause_all_vcpus from VCPU context

2012-02-14 Thread Jan Kiszka
Performing critical manipulations of the VM state from the context of a
VCPU, specifically code patching, may require stopping and resuming all
VCPUs. resume_all_vcpus is already compatible; now enable
pause_all_vcpus for this use case by stopping the calling context
before starting to wait for the whole gang.

CC: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 cpus.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/cpus.c b/cpus.c
index 4e65894..290daa8 100644
--- a/cpus.c
+++ b/cpus.c
@@ -870,6 +870,18 @@ void pause_all_vcpus(void)
 penv = penv-next_cpu;
 }
 
+if (!qemu_thread_is_self(&io_thread)) {
+cpu_stop_current();
+if (!kvm_enabled()) {
+while (penv) {
+penv->stop = 0;
+penv->stopped = 1;
+penv = penv->next_cpu;
+}
+return;
+}
+}
+
 while (!all_vcpus_paused()) {
 qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex);
 penv = first_cpu;
-- 
1.7.3.4


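The pause-from-VCPU case can be modeled with one global mutex and two condition variables. A minimal pthread sketch, assuming one thread per VCPU as with KVM (the TCG branch of the patch, which marks everything stopped and returns, is not modeled); all `toy_*` names are hypothetical stand-ins for pause_all_vcpus(), resume_all_vcpus() and cpu_stop_current(). Without the "stop ourselves first" step, VCPU 0 would wait forever for its own stopped flag:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_VCPUS 3

static pthread_mutex_t toy_global_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t toy_pause_cond = PTHREAD_COND_INITIALIZER;
static pthread_cond_t toy_resume_cond = PTHREAD_COND_INITIALIZER;
static bool toy_stop[TOY_NR_VCPUS], toy_stopped[TOY_NR_VCPUS];
static bool toy_done;
static int toy_patched;
static _Thread_local int toy_self = -1;  /* VCPU index, -1 elsewhere */

static bool toy_all_paused(void)
{
    for (int i = 0; i < TOY_NR_VCPUS; i++) {
        if (!toy_stopped[i]) {
            return false;
        }
    }
    return true;
}

/* Caller holds toy_global_mutex. */
static void toy_pause_all_vcpus(void)
{
    for (int i = 0; i < TOY_NR_VCPUS; i++) {
        toy_stop[i] = true;
    }
    if (toy_self >= 0) {
        /* Called from a VCPU: stop ourselves first. */
        toy_stop[toy_self] = false;
        toy_stopped[toy_self] = true;
    }
    while (!toy_all_paused()) {
        pthread_cond_wait(&toy_pause_cond, &toy_global_mutex);
    }
}

/* Caller holds toy_global_mutex. */
static void toy_resume_all_vcpus(void)
{
    for (int i = 0; i < TOY_NR_VCPUS; i++) {
        toy_stop[i] = false;
        toy_stopped[i] = false;
    }
    pthread_cond_broadcast(&toy_resume_cond);
}

/* Caller holds toy_global_mutex: park while a stop is requested. */
static void toy_check_stop(int i)
{
    while (toy_stop[i]) {
        toy_stopped[i] = true;
        pthread_cond_broadcast(&toy_pause_cond);
        pthread_cond_wait(&toy_resume_cond, &toy_global_mutex);
    }
}

static void *toy_vcpu_fn(void *arg)
{
    int i = (int)(intptr_t)arg;

    toy_self = i;
    pthread_mutex_lock(&toy_global_mutex);
    if (i == 0) {
        toy_pause_all_vcpus();          /* issued from VCPU context */
        toy_patched++;                  /* stand-in for code patching */
        toy_done = true;
        toy_resume_all_vcpus();
    }
    while (!toy_done) {
        toy_check_stop(i);
        /* Drop the lock briefly, as a VCPU does while running guest code. */
        pthread_mutex_unlock(&toy_global_mutex);
        sched_yield();
        pthread_mutex_lock(&toy_global_mutex);
    }
    pthread_mutex_unlock(&toy_global_mutex);
    return NULL;
}

static void toy_run(void)
{
    pthread_t t[TOY_NR_VCPUS];

    for (int i = 0; i < TOY_NR_VCPUS; i++) {
        pthread_create(&t[i], NULL, toy_vcpu_fn, (void *)(intptr_t)i);
    }
    for (int i = 0; i < TOY_NR_VCPUS; i++) {
        pthread_join(t[i], NULL);
    }
}
```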

[PATCH v3 8/9] optionsrom: Reserve space for checksum

2012-02-14 Thread Jan Kiszka
Always add a byte before the final 512-byte alignment to reserve space
for the ROM checksum.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 pc-bios/optionrom/optionrom.h |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/pc-bios/optionrom/optionrom.h b/pc-bios/optionrom/optionrom.h
index aa783de..3daf7da 100644
--- a/pc-bios/optionrom/optionrom.h
+++ b/pc-bios/optionrom/optionrom.h
@@ -124,7 +124,8 @@
movw%ax, %ds;
 
 #define OPTION_ROM_END \
-.align 512, 0; \
+   .byte   0;  \
+   .align  512, 0; \
 _end:
 
 #define BOOT_ROM_END   \
-- 
1.7.3.4


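`.byte 0` followed by `.align 512, 0` guarantees at least one spare byte at the end of the image, even when the code happens to end exactly on a 512-byte boundary. A sketch of the size arithmetic and of a signing step that makes all bytes sum to zero modulo 256; placing the checksum in the last byte, and the `toy_*` helpers, are illustrative assumptions rather than something this patch shows:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size after ".byte 0; .align 512, 0": round code_len + 1 up to 512. */
static size_t toy_padded_rom_size(size_t code_len)
{
    return (code_len + 1 + 511) / 512 * 512;
}

/* Sum of all bytes modulo 256; a valid option ROM sums to zero. */
static uint8_t toy_rom_sum(const uint8_t *rom, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum += rom[i];
    }
    return sum;
}

/* Hypothetical signing step: store the checksum in the reserved last byte. */
static void toy_sign_rom(uint8_t *rom, size_t len)
{
    assert(len > 0 && len % 512 == 0);
    rom[len - 1] = 0;
    rom[len - 1] = (uint8_t)(0x100 - toy_rom_sum(rom, len));
}
```

Without the extra byte, a 512-byte body would stay at 512 bytes and leave no room for the checksum; with it, the image grows to 1024 bytes.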

[PATCH v3 0/9] uq/master: TPR access optimization for Windows guests

2012-02-14 Thread Jan Kiszka
v3 comes with the following changes:
 - clear TPR access report on system reset
   (in case we load a guest without the option ROM)
 - addressed review comments on details in kvmvapic.c
 - streamlined 16-bit VAPIC port handling
 - included cleanup for useless next_cpu casts in cpus.c
   (to avoid conflicts on merge)

The series is also available at

git://git.kiszka.org/qemu-kvm.git queues/kvm-tpr

Please review/apply.

CC: Paolo Bonzini pbonz...@redhat.com

Jan Kiszka (9):
  kvm: Set cpu_single_env only once
  Remove useless casts from cpu iterators
  Allow to use pause_all_vcpus from VCPU context
  target-i386: Add infrastructure for reporting TPR MMIO accesses
  kvmvapic: Add option ROM
  kvmvapic: Introduce TPR access optimization for Windows guests
  kvmvapic: Simplify mp/up_set_tpr
  optionsrom: Reserve space for checksum
  kvmvapic: Use optionrom helpers

 .gitignore|1 +
 Makefile  |2 +-
 Makefile.target   |3 +-
 cpu-all.h |3 +-
 cpus.c|   21 +-
 hw/apic.c |  126 ++-
 hw/apic.h |2 +
 hw/apic_common.c  |   68 -
 hw/apic_internal.h|   27 ++
 hw/kvm/apic.c |   32 ++
 hw/kvmvapic.c |  803 +
 kvm-all.c |5 -
 pc-bios/optionrom/Makefile|2 +-
 pc-bios/optionrom/kvmvapic.S  |  335 +
 pc-bios/optionrom/optionrom.h |3 +-
 target-i386/cpu.h |   11 +
 target-i386/helper.c  |   19 +
 target-i386/kvm.c |   24 ++-
 18 files changed, 1458 insertions(+), 29 deletions(-)
 create mode 100644 hw/kvmvapic.c
 create mode 100644 pc-bios/optionrom/kvmvapic.S

-- 
1.7.3.4



[PATCH v3 6/9] kvmvapic: Introduce TPR access optimization for Windows guests

2012-02-14 Thread Jan Kiszka
This enables acceleration for MMIO-based TPR register accesses of
32-bit Windows guest systems. It is mostly useful with KVM enabled,
either on older Intel CPUs (without the FlexPriority feature, which can
also be disabled manually for testing) or on any current AMD processor.

The approach introduced here is derived from the original version in
qemu-kvm. It was refactored, documented, and extended with support for
user space APIC emulation, both with and without KVM acceleration. The
VMState format was kept compatible, as was the ABI to the option ROM
that implements the guest-side para-virtualized driver service. This
enables seamless migration from qemu-kvm to upstream or, one day,
between KVM and TCG mode.

The basic concept goes like this:
 - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
   irqchip) a vmcall hypercall is registered
 - VAPIC option ROM is loaded into guest
 - option ROM activates TPR MMIO access reporting via port 0x7e
 - TPR accesses are trapped and patched in the guest to call into option
   ROM instead, VAPIC support is enabled
 - option ROM TPR helpers track state in memory and invoke hypercall to
   poll for pending IRQs if required

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Makefile.target|3 +-
 hw/apic.c  |  126 -
 hw/apic_common.c   |   64 -
 hw/apic_internal.h |   27 ++
 hw/kvm/apic.c  |   32 ++
 hw/kvmvapic.c  |  803 
 6 files changed, 1041 insertions(+), 14 deletions(-)
 create mode 100644 hw/kvmvapic.c

diff --git a/Makefile.target b/Makefile.target
index 68481a3..ec7eff8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -230,7 +230,8 @@ obj-y += device-hotplug.o
 
 # Hardware support
 obj-i386-y += mc146818rtc.o pc.o
-obj-i386-y += sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o
+obj-i386-y += apic_common.o apic.o kvmvapic.o
+obj-i386-y += sga.o ioapic_common.o ioapic.o piix_pci.o
 obj-i386-y += vmport.o
 obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
diff --git a/hw/apic.c b/hw/apic.c
index 086c544..2ebf3ca 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -35,6 +35,10 @@
 #define MSI_ADDR_DEST_ID_SHIFT 12
 #define MSI_ADDR_DEST_ID_MASK   0x000
 
+#define SYNC_FROM_VAPIC 0x1
+#define SYNC_TO_VAPIC   0x2
+#define SYNC_ISR_IRR_TO_VAPIC   0x4
+
 static APICCommonState *local_apics[MAX_APICS + 1];
 
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
@@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
 return !!(tab[i] & mask);
 }
 
+/* return -1 if no bit is set */
+static int get_highest_priority_int(uint32_t *tab)
+{
+int i;
+for (i = 7; i >= 0; i--) {
+if (tab[i] != 0) {
+return i * 32 + fls_bit(tab[i]);
+}
+}
+return -1;
+}
+
+static void apic_sync_vapic(APICCommonState *s, int sync_type)
+{
+VAPICState vapic_state;
+size_t length;
+off_t start;
+int vector;
+
+if (!s->vapic_paddr) {
+return;
+}
+if (sync_type & SYNC_FROM_VAPIC) {
+cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
+   sizeof(vapic_state), 0);
+s->tpr = vapic_state.tpr;
+}
+if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
+start = offsetof(VAPICState, isr);
+length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
+
+if (sync_type & SYNC_TO_VAPIC) {
+assert(qemu_cpu_is_self(s->cpu_env));
+
+vapic_state.tpr = s->tpr;
+vapic_state.enabled = 1;
+start = 0;
+length = sizeof(VAPICState);
+}
+
+vector = get_highest_priority_int(s->isr);
+if (vector < 0) {
+vector = 0;
+}
+vapic_state.isr = vector & 0xf0;
+
+vapic_state.zero = 0;
+
+vector = get_highest_priority_int(s->irr);
+if (vector < 0) {
+vector = 0;
+}
+vapic_state.irr = vector & 0xff;
+
+cpu_physical_memory_write_rom(s->vapic_paddr + start,
+  ((void *)&vapic_state) + start, length);
+}
+}
+
+static void apic_vapic_base_update(APICCommonState *s)
+{
+apic_sync_vapic(s, SYNC_TO_VAPIC);
+}
+
 static void apic_local_deliver(APICCommonState *s, int vector)
 {
 uint32_t lvt = s->lvt[vector];
@@ -239,20 +307,17 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
 
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
 {
-s->tpr = (val & 0x0f) << 4;
-apic_update_irq(s);
+/* Updates from cr8 are ignored while the VAPIC is active */
+if (!s->vapic_paddr) {
+s->tpr = val << 4;
+apic_update_irq(s);
+}
 }
 
-/* return -1 if no bit is set */
-static int get_highest_priority_int(uint32_t *tab)
+static uint8_t apic_get_tpr(APICCommonState *s)
 {
-int i;
-for(i 

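apic_sync_vapic() reduces the 256-bit ISR and IRR bitmaps to single bytes via get_highest_priority_int(). A self-contained sketch of that reduction; it substitutes the GCC/Clang builtin `__builtin_clz` for QEMU's fls_bit() (assumed here to return the 0-based index of the most significant set bit), and the `toy_*` names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Index of the most significant set bit; x must be non-zero. */
static int toy_fls_bit(uint32_t x)
{
    assert(x != 0);
    return 31 - __builtin_clz(x);
}

/* 256 vectors stored as eight 32-bit words; -1 if no bit is set. */
static int toy_highest_priority_int(const uint32_t tab[8])
{
    for (int i = 7; i >= 0; i--) {
        if (tab[i] != 0) {
            return i * 32 + toy_fls_bit(tab[i]);
        }
    }
    return -1;
}

/* ISR byte as stored in the VAPIC area: priority class only, 0 if empty. */
static uint8_t toy_vapic_isr_byte(const uint32_t isr[8])
{
    int vector = toy_highest_priority_int(isr);

    if (vector < 0) {
        vector = 0;
    }
    return (uint8_t)(vector & 0xf0);
}
```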
Re: level in kvm_mmu_page_role

2012-02-14 Thread Avi Kivity
On 02/13/2012 11:30 PM, Sanidhya Kashyap wrote:
 I have been going through the kvm code but didn't get the significance
 of level in kvm_mmu_page_role. So, it would be nice if anyone can
 explain it what is its use?



It's the page table level.  Level 1 contains page table entries pointing
to 4k pages.  Level 2 contains page directory entries pointing to level
1 page tables, or pointers to 2M pages, and so forth.
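To make the numbering concrete for a 64-bit guest with 4-level paging and 4 KiB base pages (an assumption: other paging modes use different index widths), each level consumes 9 bits of the address above the 12-bit page offset. A small sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Index into the table at a given level (1 = lowest, 4 = top). */
static unsigned toy_pt_index(uint64_t addr, int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned)((addr >> (12 + 9 * (level - 1))) & 0x1ff);
}

/* Bytes covered by one entry: 4 KiB at level 1, 2 MiB at level 2,
 * 1 GiB at level 3, 512 GiB at level 4. */
static uint64_t toy_pt_entry_span(int level)
{
    assert(level >= 1 && level <= 4);
    return (uint64_t)1 << (12 + 9 * (level - 1));
}
```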

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host

2012-02-14 Thread Eric B Munson
On Wed, 08 Feb 2012, Eric B Munson wrote:

 
 When a guest kernel is stopped by the host hypervisor it can look like a soft
 lockup to the guest kernel.  This false warning can mask later soft lockup
 warnings which may be real.  This patch series adds a method for a host
 hypervisor to communicate to a guest kernel that it is being stopped.  The
 final patch in the series has the watchdog check this flag when it goes to
 issue a soft lockup warning and skip the warning if the guest knows it was
 stopped.
 
 It was attempted to solve this in Qemu, but the side effects of saving and
 restoring the clock and tsc for each vcpu put the wall clock of the guest behind
 by the amount of time of the pause.  This forces a guest to have ntp running
 in order to keep the wall clock accurate.

Avi,

Is this set fit for merging or is there something else you want changed?

Eric

 
 Cc: mi...@redhat.com
 Cc: h...@zytor.com
 Cc: ry...@linux.vnet.ibm.com
 Cc: aligu...@us.ibm.com
 Cc: mtosa...@redhat.com
 Cc: kvm@vger.kernel.org
 Cc: linux-a...@vger.kernel.org
 Cc: x...@kernel.org
 Cc: linux-ker...@vger.kernel.org
 
 Eric B Munson (4):
   Add flag to indicate that a vm was stopped by the host
   Add functions to check if the host has stopped the vm
   Add ioctl for KVM_KVMCLOCK_CTRL
   Add check for suspended vm in softlockup detector
 
  Documentation/virtual/kvm/api.txt   |   13 +
  arch/ia64/include/asm/kvm_para.h|5 +
  arch/powerpc/include/asm/kvm_para.h |5 +
  arch/s390/include/asm/kvm_para.h|5 +
  arch/x86/include/asm/kvm_para.h |8 
  arch/x86/include/asm/pvclock-abi.h  |1 +
  arch/x86/kernel/kvmclock.c  |   21 +
  arch/x86/kvm/x86.c  |   22 ++
  include/asm-generic/kvm_para.h  |   14 ++
  include/linux/kvm.h |3 +++
  kernel/watchdog.c   |   12 
  11 files changed, 109 insertions(+), 0 deletions(-)
  create mode 100644 include/asm-generic/kvm_para.h
 
 -- 
 1.7.5.4
 



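The mechanism in the cover letter boils down to a flag that the host sets and the guest watchdog tests and clears before warning. A minimal sketch with C11 atomics; every name here is hypothetical, and the actual series spreads this across kvmclock, the pvclock ABI and kernel/watchdog.c:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag in memory shared between host and guest. */
static atomic_bool toy_guest_stopped;

/* Host side: set before resuming a guest that was paused. */
static void toy_host_mark_stopped(void)
{
    atomic_store(&toy_guest_stopped, true);
}

/* Guest side: test and clear in one step, so each pause suppresses
 * at most one warning. */
static bool toy_check_and_clear_stopped(void)
{
    return atomic_exchange(&toy_guest_stopped, false);
}

/* Returns true if a soft lockup warning should be printed. */
static bool toy_watchdog_should_warn(unsigned long now,
                                     unsigned long *touch_ts,
                                     unsigned long thresh)
{
    if (now - *touch_ts <= thresh) {
        return false;
    }
    if (toy_check_and_clear_stopped()) {
        /* The host paused us: restart the window instead of warning. */
        *touch_ts = now;
        return false;
    }
    return true;
}
```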

Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Pekka Enberg
On Tue, Feb 14, 2012 at 09:20:18PM +0800, Yang Bai wrote:
 And will seabios replace the present bios implement or co-exsit?

On Tue, Feb 14, 2012 at 3:32 PM, Cyrill Gorcunov gorcu...@openvz.org wrote:
 Ideally we should get rid of our minibios completely and only have
 seabios here instead.

No, no, they should co-exist. There's absolutely no reason to force
people to use a BIOS to boot Linux.


Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Cyrill Gorcunov
On Tue, Feb 14, 2012 at 05:35:47PM +0200, Pekka Enberg wrote:
 On Tue, Feb 14, 2012 at 09:20:18PM +0800, Yang Bai wrote:
  And will seabios replace the present bios implement or co-exsit?
 
 On Tue, Feb 14, 2012 at 3:32 PM, Cyrill Gorcunov gorcu...@openvz.org wrote:
  Ideally we should get rid of our minibios completely and only have
  seabios here instead.
 
 No, no, they should co-exist. There's absolutely no reason to force
 people to use a BIOS to boot Linux.
 

I meant at run time (i.e. in memory). I didn't mean substituting our
minibios, but rather having the ability to run either with the
compiled-in BIOS or with SeaBIOS instead.

Cyrill


Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Pekka Enberg
On Tue, Feb 14, 2012 at 5:38 PM, Cyrill Gorcunov gorcu...@openvz.org wrote:
  Ideally we should get rid of our minibios completely and only have
  seabios here instead.

 No, no, they should co-exist. There's absolutely no reason to force
 people to use a BIOS to boot Linux.

 I meant run-time (ie in memory). I didn't mean substitude our minibios,
 but rather have an ability to either run with compiled-in bios or with
 seabios instead.

Sure.


Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host

2012-02-14 Thread Marcelo Tosatti
On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote:
 On Wed, 08 Feb 2012, Eric B Munson wrote:
 
  
  When a guest kernel is stopped by the host hypervisor it can look like a soft
  lockup to the guest kernel.  This false warning can mask later soft lockup
  warnings which may be real.  This patch series adds a method for a host
  hypervisor to communicate to a guest kernel that it is being stopped.  The
  final patch in the series has the watchdog check this flag when it goes to
  issue a soft lockup warning and skip the warning if the guest knows it was
  stopped.
  
  It was attempted to solve this in Qemu, but the side effects of saving and
  restoring the clock and tsc for each vcpu put the wall clock of the guest behind
  by the amount of time of the pause.  This forces a guest to have ntp running
  in order to keep the wall clock accurate.
 
 Avi,
 
 Is this set fit for merging or is there something else you want changed?

Eric,

In Message-ID: 20120210160536.ga23...@amt.cnet, I asked:

How is the stub getting included for other architectures again?



Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host

2012-02-14 Thread Eric B Munson
On Tue, 14 Feb 2012, Marcelo Tosatti wrote:

 On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote:
  On Wed, 08 Feb 2012, Eric B Munson wrote:
  
   
   When a guest kernel is stopped by the host hypervisor it can look like a soft
   lockup to the guest kernel.  This false warning can mask later soft lockup
   warnings which may be real.  This patch series adds a method for a host
   hypervisor to communicate to a guest kernel that it is being stopped.  The
   final patch in the series has the watchdog check this flag when it goes to
   issue a soft lockup warning and skip the warning if the guest knows it was
   stopped.
   
   It was attempted to solve this in Qemu, but the side effects of saving and
   restoring the clock and tsc for each vcpu put the wall clock of the guest behind
   by the amount of time of the pause.  This forces a guest to have ntp running
   in order to keep the wall clock accurate.
  
  Avi,
  
  Is this set fit for merging or is there something else you want changed?
 
 Eric,
 
 On Message-ID: 20120210160536.ga23...@amt.cnet, i asked:
 
 How is the stub getting included for other architectures again?
 

Marcelo,

Sorry, I put out V13 to answer that.  There is a stub in asm-generic that was
lost in the V11-V12 rebase.  This stub has been included in the V13 set.

Eric




Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host

2012-02-14 Thread Marcelo Tosatti
On Tue, Feb 14, 2012 at 10:50:13AM -0500, Eric B Munson wrote:
 On Tue, 14 Feb 2012, Marcelo Tosatti wrote:
 
  On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote:
   On Wed, 08 Feb 2012, Eric B Munson wrote:
   

    When a guest kernel is stopped by the host hypervisor it can look like a soft
    lockup to the guest kernel.  This false warning can mask later soft lockup
    warnings which may be real.  This patch series adds a method for a host
    hypervisor to communicate to a guest kernel that it is being stopped.  The
    final patch in the series has the watchdog check this flag when it goes to
    issue a soft lockup warning and skip the warning if the guest knows it was
    stopped.

    It was attempted to solve this in Qemu, but the side effects of saving and
    restoring the clock and tsc for each vcpu put the wall clock of the guest behind
    by the amount of time of the pause.  This forces a guest to have ntp running
    in order to keep the wall clock accurate.
   
   Avi,
   
   Is this set fit for merging or is there something else you want changed?
  
  Eric,
  
  On Message-ID: 20120210160536.ga23...@amt.cnet, i asked:
  
  How is the stub getting included for other architectures again?
  
 
 Marcelo,
 
 Sorry, I put out V13 to answer that.  There is a stub in asm-generic that was
 lost in the V11-V12 rebase.  This stub has be included in the V13 set.
 
 Eric

Eric, 

I know the stub has been included in the series. But I am asking how
it is #include'd for other architectures (I can't see that).




[PATCH] virt: Fix migration bg command

2012-02-14 Thread Lucas Meneghel Rodrigues
In migration tests, the command we were using as a 'watchdog' command
was tcpdump, but without specifying which interface it should listen
to. As this may fail depending on the interface ordering, let's change
the command to listen on all interfaces (-i any), which is safer and no
longer depends on the interface ordering.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
 client/virt/subtests.cfg.sample |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/virt/subtests.cfg.sample b/client/virt/subtests.cfg.sample
index b08a5c4..56043e0 100644
--- a/client/virt/subtests.cfg.sample
+++ b/client/virt/subtests.cfg.sample
@@ -350,7 +350,7 @@ variants:
 - migrate: install setup image_copy unattended_install.cdrom
 type = migration
 migration_test_command = help
-migration_bg_command = cd /tmp; nohup tcpdump -q -t ip host localhost
+migration_bg_command = cd /tmp; nohup tcpdump -q -i any -t ip host localhost
 migration_bg_check_command = pgrep tcpdump
 migration_bg_kill_command = pkill tcpdump
 kill_vm_on_error = yes
-- 
1.7.7.6



Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Andrea Arcangeli
On Fri, Feb 10, 2012 at 03:28:31PM +0900, Takuya Yoshikawa wrote:
 Other threads may process the same page in that small window and skip
 TLB flush and then return before these functions do flush.

It's correct to flush the shadow MMU TLB without the mmu_lock only in
the context of mmu notifier methods. So the below, while it won't hurt,
is a performance regression and shouldn't be applied (and
it obfuscates the code by not being strict anymore).

By contrast, every other place that does a shadow/secondary MMU SMP
TLB flush _must_ do it inside the mmu_lock; otherwise the
serialization is no longer correct against the very mmu_lock taken by
kvm_mmu_notifier_invalidate_page/kvm_mmu_notifier_invalidate_range_start
in the patch quoted below.

The explanation is in commit 4539b35881ae9664b0e2953438dd83f5ee02c0b4.

I'll try to explain it more clearly: the moment you drop mmu_lock,
pages can be freed. So if you invalidate a spte anywhere inside
the KVM code (except the mmu notifier methods, where a reference to the
page is implicitly held by the caller, so the page can't go away
under an mmu notifier method by design), then the
kvm_mmu_notifier_invalidate_page/kvm_mmu_notifier_invalidate_range_start
methods below won't get their need_tlb_flush set anymore, and they won't
run the TLB flush before freeing the page.

So every other place (except mmu notifier) must flush the secondary
MMU smp tlb _before_ releasing the mmu_lock.

Only mmu notifier is safe to flush the secondary MMU TLB _after_
releasing the mmu_lock.
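The rule above can be illustrated with a deliberately tiny, single-threaded C model of the window being described. This is a sketch under stated assumptions, not KVM code: struct model, zap_spte(), notifier_invalidate_page() and flush_remote_tlbs() are hypothetical stand-ins for a non-notifier spte zap, kvm_mmu_notifier_invalidate_page() and kvm_flush_remote_tlbs(), and the lock is only a flag. The notifier is run on purpose between the zapping path's unlock and its deferred flush.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one guest page mapped by one spte. All names are
 * hypothetical stand-ins, not KVM identifiers. */
struct model {
    bool spte_present;  /* shadow/secondary MMU mapping exists */
    bool remote_tlb;    /* some vCPU still caches the translation */
    bool page_freed;    /* the host has freed the page */
    bool lock_held;     /* stands in for mmu_lock */
};

static void lock(struct model *m)   { assert(!m->lock_held); m->lock_held = true; }
static void unlock(struct model *m) { assert(m->lock_held); m->lock_held = false; }
static void flush_remote_tlbs(struct model *m) { m->remote_tlb = false; }

/* Non-notifier path that zaps the spte. With flush_inside_lock == false the
 * flush is deferred until after unlock; the model stops right at that point,
 * so the deferred flush has not run yet when this returns. */
static void zap_spte(struct model *m, bool flush_inside_lock)
{
    lock(m);
    m->spte_present = false;
    if (flush_inside_lock)
        flush_remote_tlbs(m);
    unlock(m);
}

/* Notifier path: flushes only if it found an spte itself, then the caller
 * drops its page reference and the page is freed. */
static void notifier_invalidate_page(struct model *m)
{
    lock(m);
    bool need_tlb_flush = m->spte_present;
    m->spte_present = false;
    unlock(m);
    if (need_tlb_flush)
        flush_remote_tlbs(m);  /* fine after unlock: the caller still pins the page */
    m->page_freed = true;
}

/* True if a vCPU could still reach the freed page through a stale TLB entry. */
static bool use_after_free_possible(bool flush_inside_lock)
{
    struct model m = { .spte_present = true, .remote_tlb = true };
    zap_spte(&m, flush_inside_lock);
    notifier_invalidate_page(&m);  /* runs inside the window */
    return m.page_freed && m.remote_tlb;
}
```

With the flush deferred, use_after_free_possible(false) reports a stale translation to a freed page; flushing before unlock (true) closes the window.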

If we changed the mmu notifier methods to unconditionally flush the
shadow TLB (regardless of whether a spte was present), we might not
need to hold the mmu_lock in every TLB flush outside the context of
the mmu notifier methods. But then the mmu notifier methods would
become more expensive, and I haven't fully evaluated the side
effects of such a change. Also note that only
kvm_mmu_notifier_invalidate_page and
kvm_mmu_notifier_invalidate_range_start would need to do that, because
they're the only two where the page reference gets dropped.

Even shorter: because in the mmu notifier an implicit reference on the
page exists and is held by the caller, the notifier methods can flush outside the
mmu_lock. Every other place in KVM holds an implicit valid
reference on the page only as long as it holds the mmu_lock, or while
a spte is still established.

Well, it's not easy logic, so it's not surprising it wasn't totally
clear.

It's probably not heavily documented, and the fact that you changed it
is still good, because it made us refresh our minds on the exact rules of mmu
notifier locking. Thanks!

Andrea

 
 Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
 ---
  virt/kvm/kvm_main.c |   19 ++-
  1 files changed, 10 insertions(+), 9 deletions(-)
 
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 470e305..2b4bc77 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -289,15 +289,15 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
*/
   idx = srcu_read_lock(&kvm->srcu);
   spin_lock(&kvm->mmu_lock);
 +
   kvm->mmu_notifier_seq++;
   need_tlb_flush = kvm_unmap_hva(kvm, address) | kvm->tlbs_dirty;
 - spin_unlock(&kvm->mmu_lock);
 - srcu_read_unlock(&kvm->srcu, idx);
 -
   /* we've to flush the tlb before the pages can be freed */
   if (need_tlb_flush)
   kvm_flush_remote_tlbs(kvm);
  
 + spin_unlock(&kvm->mmu_lock);
 + srcu_read_unlock(&kvm->srcu, idx);
  }
  
  static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
 @@ -335,12 +335,12 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
   for (; start < end; start += PAGE_SIZE)
   need_tlb_flush |= kvm_unmap_hva(kvm, start);
   need_tlb_flush |= kvm->tlbs_dirty;
 - spin_unlock(&kvm->mmu_lock);
 - srcu_read_unlock(&kvm->srcu, idx);
 -
   /* we've to flush the tlb before the pages can be freed */
   if (need_tlb_flush)
   kvm_flush_remote_tlbs(kvm);
 +
 + spin_unlock(&kvm->mmu_lock);
 + srcu_read_unlock(&kvm->srcu, idx);
  }
  
  static void kvm_mmu_notifier_invalidate_range_end(struct mmu_notifier *mn,
 @@ -378,13 +378,14 @@ static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
  
   idx = srcu_read_lock(&kvm->srcu);
   spin_lock(&kvm->mmu_lock);
 - young = kvm_age_hva(kvm, address);
 - spin_unlock(&kvm->mmu_lock);
 - srcu_read_unlock(&kvm->srcu, idx);
  
 + young = kvm_age_hva(kvm, address);
   if (young)
   kvm_flush_remote_tlbs(kvm);
  
 + spin_unlock(&kvm->mmu_lock);
 + srcu_read_unlock(&kvm->srcu, idx);
 +
   return young;
  }
  
 -- 
 1.7.5.4
 

Re: [PATCH 2/2] KVM: MMU: Flush TLBs only once in invlpg() before releasing mmu_lock

2012-02-14 Thread Andrea Arcangeli
On Tue, Feb 14, 2012 at 01:56:17PM +0900, Takuya Yoshikawa wrote:
 (2012/02/14 13:36), Takuya Yoshikawa wrote:
 
  BTW, do you think that kvm_mmu_flush_tlb() should be moved inside of the
  mmu_lock critical section?
 
 
 Ah, forget about this.  Trivially no.

Yes, the reason is that it's a local flush, and guest mode isn't
running while we're running that function, so it's OK to run it later.

About the other change you made in this patch 2/2, I can't find the
code you're patching in the 3.2 upstream source. When I added the TLB
flush to invlpg, I immediately used a cumulative need_flush at the end
(before releasing mmu_lock, of course).

   if (need_flush)
  kvm_flush_remote_tlbs(vcpu->kvm);
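The cumulative need_flush approach mentioned above amounts to batching: drop every spte first, remember whether anything was dropped, and flush once before releasing the lock. A minimal sketch, assuming a plain array of present bits in place of real sptes; zap_range(), invlpg_like() and the flush counter are hypothetical names, not KVM functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int flushes;  /* number of simulated remote TLB flushes */

static void flush_remote_tlbs(void) { flushes++; }

/* Drop every present entry and report whether anything was dropped,
 * instead of flushing once per entry. */
static bool zap_range(bool *spte_present, size_t n)
{
    bool need_flush = false;
    for (size_t i = 0; i < n; i++) {
        if (spte_present[i]) {
            spte_present[i] = false;
            need_flush = true;
        }
    }
    return need_flush;
}

/* invlpg-like caller: in KVM, mmu_lock would be held across the zap and
 * the single flush, and released only afterwards. */
static void invlpg_like(bool *spte_present, size_t n)
{
    assert(spte_present != NULL || n == 0);
    if (zap_range(spte_present, n))
        flush_remote_tlbs();
}
```

Zapping three entries costs one flush; a call that zaps nothing costs none.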


Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Andrea Arcangeli
On Fri, Feb 10, 2012 at 03:52:49PM +0800, Xiao Guangrong wrote:
 On 02/10/2012 02:28 PM, Takuya Yoshikawa wrote:
 
  Other threads may process the same page in that small window and skip
  TLB flush and then return before these functions do flush.
  
 
 
 Is it possible to flush the TLB inside mmu_lock only when a writable
 spte is invalidated? Sometimes kvm_flush_remote_tlbs needs a
 long time to wait.

Read-only isn't enough to defer the flush until after mmu_lock is
released... if you do it only for writable sptes, then what can happen
is that the guest may read random data and crash.

However, for this case, the mmu_notifier methods (and only them) are
perfectly safe to flush the shadow MMU TLB after the mmu_lock is
released, because the page reference is guaranteed to be held by the caller.
That is not the case for any other place where a spte gets dropped in KVM:
all other places dropping sptes can only rely on the mmu notifier blocking
on the mmu_lock to guarantee that the page isn't freed under them. So in
every other place the shadow MMU TLB flush must happen before releasing
the mmu_lock, so that the mmu_notifier will wait and prevent the page from
being freed until all other CPUs running in guest mode have stopped
accessing it.


Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Marcelo Tosatti
On Tue, Feb 14, 2012 at 06:10:44PM +0100, Andrea Arcangeli wrote:
 On Fri, Feb 10, 2012 at 03:28:31PM +0900, Takuya Yoshikawa wrote:
  Other threads may process the same page in that small window and skip
  TLB flush and then return before these functions do flush.
 
 It's correct to flush the shadow MMU TLB without the mmu_lock only in
 the context of mmu notifier methods. So the below, while it won't hurt,
 is a performance regression and shouldn't be applied (and
 it obfuscates the code by not being strict anymore).
 
 By contrast, every other place that does a shadow/secondary MMU SMP
 TLB flush _must_ do it inside the mmu_lock; otherwise the
 serialization is no longer correct against the very mmu_lock taken by
 kvm_mmu_notifier_invalidate_page/kvm_mmu_notifier_invalidate_range_start
 in the patch quoted below.
 
 The explanation is in commit 4539b35881ae9664b0e2953438dd83f5ee02c0b4.
 
 I'll try to explain it more clearly: the moment you drop mmu_lock,
 pages can be freed. So if you invalidate a spte anywhere inside
 the KVM code (except the mmu notifier methods, where a reference to the
 page is implicitly held by the caller, so the page can't go away
 under an mmu notifier method by design), then the
 kvm_mmu_notifier_invalidate_page/kvm_mmu_notifier_invalidate_range_start
 methods below won't get their need_tlb_flush set anymore, and they won't
 run the TLB flush before freeing the page.
 
 So every other place (except mmu notifier) must flush the secondary
 MMU smp tlb _before_ releasing the mmu_lock.
 
 Only mmu notifier is safe to flush the secondary MMU TLB _after_
 releasing the mmu_lock.
 
 If we changed the mmu notifier methods to unconditionally flush the
 shadow TLB (regardless of whether a spte was present), we might not
 need to hold the mmu_lock in every TLB flush outside the context of
 the mmu notifier methods. But then the mmu notifier methods would
 become more expensive, and I haven't fully evaluated the side
 effects of such a change. Also note that only
 kvm_mmu_notifier_invalidate_page and
 kvm_mmu_notifier_invalidate_range_start would need to do that, because
 they're the only two where the page reference gets dropped.
 
 Even shorter: because in the mmu notifier an implicit reference on the
 page exists and is held by the caller, the notifier methods can flush outside the
 mmu_lock. Every other place in KVM holds an implicit valid
 reference on the page only as long as it holds the mmu_lock, or while
 a spte is still established.
 
 Well, it's not easy logic, so it's not surprising it wasn't totally
 clear.
 
 It's probably not heavily documented, and the fact that you changed it
 is still good, because it made us refresh our minds on the exact rules of mmu
 notifier locking. Thanks!

The problem the patch is fixing is not related to page freeing, but
rmap_write_protect. From 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d
(replace A (get_dirty_log) with mmu_notifier_invalidate_page):


While protecting pages for dirty logging, other threads may also try
to protect a page in mmu_sync_children() or kvm_mmu_get_page().

In such a case, because get_dirty_log releases mmu_lock before flushing
TLBs, the following race condition can happen:

  A (get_dirty_log)          B (another thread)

  lock(mmu_lock)
  clear pte.w
  unlock(mmu_lock)
                             lock(mmu_lock)
                             pte.w is already cleared
                             unlock(mmu_lock)
                             skip TLB flush
                             return
  ...
  TLB flush

Though thread B assumes the page has already been protected when it
returns, the remaining TLB entry will break that assumption.
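The interleaving in the diagram can be replayed deterministically. In this sketch (hypothetical names throughout, with a flag standing in for mmu_lock), thread A is split into the part up to its unlock and the late TLB flush, and thread B runs in between. B reports whether a writable translation still survives at the moment it returns believing the page is protected.

```c
#include <assert.h>
#include <stdbool.h>

struct wp_model {
    bool pte_w;         /* writable bit in the spte */
    bool tlb_writable;  /* some vCPU still caches a writable translation */
    bool lock_held;     /* stands in for mmu_lock */
};

static void wp_lock(struct wp_model *m)   { assert(!m->lock_held); m->lock_held = true; }
static void wp_unlock(struct wp_model *m) { assert(m->lock_held); m->lock_held = false; }

/* Thread A (get_dirty_log-like), up to its unlock. */
static void thread_a_begin(struct wp_model *m, bool flush_inside_lock)
{
    wp_lock(m);
    m->pte_w = false;
    if (flush_inside_lock)
        m->tlb_writable = false;
    wp_unlock(m);
}

/* Thread A's late "TLB flush" from the diagram, if it was deferred. */
static void thread_a_finish(struct wp_model *m, bool flush_inside_lock)
{
    if (!flush_inside_lock)
        m->tlb_writable = false;
}

/* Thread B: flushes only if it cleared pte.w itself. Returns true if the
 * guest can still write when B returns assuming the page is protected. */
static bool thread_b(struct wp_model *m)
{
    wp_lock(m);
    if (m->pte_w) {
        m->pte_w = false;
        m->tlb_writable = false;
    }
    wp_unlock(m);
    return m->tlb_writable;
}

static bool b_sees_stale_writable_tlb(bool flush_inside_lock)
{
    struct wp_model m = { .pte_w = true, .tlb_writable = true };
    thread_a_begin(&m, flush_inside_lock);
    bool stale = thread_b(&m);  /* B runs in A's window */
    thread_a_finish(&m, flush_inside_lock);
    assert(!m.tlb_writable);    /* eventually flushed either way */
    return stale;
}
```

Only the ordering differs between the two runs: with A's flush after unlock, B returns while a writable translation is still cached.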


 
 Andrea
 
  
  Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
  ---
   virt/kvm/kvm_main.c |   19 ++-
   1 files changed, 10 insertions(+), 9 deletions(-)
  
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index 470e305..2b4bc77 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -289,15 +289,15 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
   */
  idx = srcu_read_lock(&kvm->srcu);
  spin_lock(&kvm->mmu_lock);
  +
  kvm->mmu_notifier_seq++;
  need_tlb_flush = kvm_unmap_hva(kvm, address) | kvm->tlbs_dirty;
  -   spin_unlock(&kvm->mmu_lock);
  -   srcu_read_unlock(&kvm->srcu, idx);
  -
  /* we've to flush the tlb before the pages can be freed */
  if (need_tlb_flush)
  kvm_flush_remote_tlbs(kvm);
   
  +   spin_unlock(&kvm->mmu_lock);
  +   srcu_read_unlock(&kvm->srcu, idx);
   }
   
   static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
  @@ -335,12 +335,12 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
  for (; start < end; start += PAGE_SIZE)
  need_tlb_flush |= kvm_unmap_hva(kvm, start);
  need_tlb_flush |= kvm->tlbs_dirty;
  -   spin_unlock(&kvm->mmu_lock);
  -   srcu_read_unlock(&kvm->srcu, idx);
  -
  /* we've to flush the tlb before the pages can 

Re: [VT-d reboot problems] Re: [PATCH] x86 / reboot: Blacklist Dell OptiPlex 990 known to require PCI reboot

2012-02-14 Thread Bastien ROUCARIES
On Tue, Jan 31, 2012 at 1:15 PM, Ingo Molnar mi...@elte.hu wrote:

 (added KVM folks to the Cc:)

 * Bastien ROUCARIES roucaries.bast...@gmail.com wrote:

 Ping^2

 Bastien
 On Mon, Jan 23, 2012 at 11:28 AM, Bastien ROUCARIES
 roucaries.bast...@gmail.com wrote:
  On Mon, Jan 16, 2012 at 8:21 PM, H. Peter Anvin h...@zytor.com wrote:
  On 01/16/2012 03:27 AM, Bastien ROUCARIES wrote:
 
  Does it work if you disable VT-d in the firmware? If so, then adding it
  to the reboot method blacklist is the wrong fix - we need to figure out
  why VT-d interferes with Dell's reboot code.
 
  Yes, it works.
 
 
  This is particularly so since we are very close to having a full Dell
  model catalogue in the kernel...
 
  Ping? Do you need a dump? Testing?

 So disabling VT-d in the BIOS fixes the reboot problem and
 Matthew Garrett suggests we should figure out why and how VT-d
 on this Dell box interferes with the reboot method.

 Thanks,

        Ingo


Re: AESNI and guest hosts

2012-02-14 Thread Brian Jackson
On Tuesday, February 14, 2012 03:31:10 AM Ryan Brown wrote:
 Sorry for being a noob here. Any clues on this, anyone?
 
 On Mon, Feb 13, 2012 at 2:05 AM, Ryan Brown mp3g...@gmail.com wrote:
  Host/KVM server is running Linux 3.2.4 (Debian wheezy), and the guest
  kernel is running 3.2.5. The CPU is an E3-1230, but for some reason
  it's not able to supply the guest with AES-NI. Is there a config option,
  or is there something we're missing?



I don't think it's supported to pass that functionality to the guest.



  
 <cpu>
   <arch>x86_64</arch>
   <model>Westmere</model>
   <vendor>Intel</vendor>
   <topology sockets='1' cores='4' threads='2'/>
   <feature name='rdtscp'/>
   <feature name='x2apic'/>
   <feature name='xtpr'/>
   <feature name='tm2'/>
   <feature name='est'/>
   <feature name='vmx'/>
   <feature name='ds_cpl'/>
   <feature name='monitor'/>
   <feature name='pbe'/>
   <feature name='tm'/>
   <feature name='ht'/>
   <feature name='ss'/>
   <feature name='acpi'/>
   <feature name='ds'/>
   <feature name='vme'/>
 </cpu>
  
  Guest:
  [root@fanboy:~]# cat /proc/cpuinfo
  processor   : 0
  vendor_id   : GenuineIntel
  cpu family  : 6
  model   : 2
  model name  : QEMU Virtual CPU version 1.0
  stepping: 3
  microcode   : 0x1
  cpu MHz : 3192.748
  cache size  : 4096 KB
  fdiv_bug: no
  hlt_bug : no
  f00f_bug: no
  coma_bug: no
  fpu : yes
  fpu_exception   : yes
  cpuid level : 4
  wp  : yes
  flags   : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
  cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
  hypervisor lahf_lm
  bogomips: 6385.49
  clflush size: 64
  cache_alignment : 64
  address sizes   : 40 bits physical, 48 bits virtual
  power management:
  
  processor   : 1
  vendor_id   : GenuineIntel
  cpu family  : 6
  model   : 2
  model name  : QEMU Virtual CPU version 1.0
  stepping: 3
  microcode   : 0x1
  cpu MHz : 3192.748
  cache size  : 4096 KB
  fdiv_bug: no
  hlt_bug : no
  f00f_bug: no
  coma_bug: no
  fpu : yes
  fpu_exception   : yes
  cpuid level : 4
  wp  : yes
  flags   : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
  cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
  hypervisor lahf_lm
  bogomips: 6385.49
  clflush size: 64
  cache_alignment : 64
  address sizes   : 40 bits physical, 48 bits virtual
 
  power management:
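The flags lines in the guest output above contain no aes entry. One way to check directly from inside a guest whether AES-NI is exposed is to read CPUID leaf 1, where AES-NI is reported in ECX bit 25. This sketch assumes GCC or Clang on x86, whose <cpuid.h> provides __get_cpuid() and bit_AES; cpu_has_aesni() is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <cpuid.h>  /* GCC/Clang helper for the x86 CPUID instruction */

/* AES-NI is reported in CPUID leaf 1, ECX bit 25 (bit_AES in <cpuid.h>). */
static_assert(bit_AES == (1 << 25), "AES-NI is CPUID.01H:ECX bit 25");

/* Returns true if the CPU, as seen by this (guest) OS, advertises AES-NI. */
static bool cpu_has_aesni(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;  /* leaf 1 not available */
    return (ecx & bit_AES) != 0;
}
```

For a guest like the one shown, this would be expected to return false, consistent with its /proc/cpuinfo.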


Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Andrea Arcangeli
On Tue, Feb 14, 2012 at 03:29:47PM -0200, Marcelo Tosatti wrote:
 The problem the patch is fixing is not related to page freeing, but
 rmap_write_protect. From 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d

Can't find the commit on kvm.git.

 (replace A (get_dirty_log) with mmu_notifier_invalidate_page):
 
 
 While protecting pages for dirty logging, other threads may also try
 to protect a page in mmu_sync_children() or kvm_mmu_get_page().
 
 In such a case, because get_dirty_log releases mmu_lock before flushing
 TLBs, the following race condition can happen:
 
   A (get_dirty_log)          B (another thread)
 
   lock(mmu_lock)
   clear pte.w
   unlock(mmu_lock)
                              lock(mmu_lock)
                              pte.w is already cleared
                              unlock(mmu_lock)
                              skip TLB flush

Not sure which tree it is, but in kvm and upstream I see an
unconditional TLB flush here, not a skip (both
kvm_mmu_slot_remove_write_access and kvm_mmu_rmap_write_protect). So I
assume this has been updated in your tree to be conditional.

Also note that kvm_mmu_rmap_write_protect flushes outside of the mmu_lock
(like in the quoted description), so two write_protect_slot calls running
in parallel against each other may not be OK. But that may be enforced by
design if QEMU never calls that ioctl from two different userland threads
(it doesn't sound security-related, so it should be OK to enforce its
safety by userland design).

                              return
   ...
   TLB flush
 
 Though thread B assumes the page has already been protected when it
 returns, the remaining TLB entry will break that assumption.

Now I get the question of why not running the TLB flush inside the
mmu_lock only if the spte was writable :).

kvm_mmu_get_page, as long as it only runs in the context of a KVM page
fault, is OK, because the page fault would be inhibited by the mmu
notifier invalidates, so maybe it's safe.

mmu_sync_children seems to have a problem instead. In your tree,
get_dirty_log also has an issue if it has been updated to skip the
flush on read-only sptes, as I guess it has.

Interesting how the spte is already non-present and the page is just
being freed shortly afterwards, yet we still need to trigger write
faults synchronously and prevent other CPUs in guest mode from further
modifying the page, to avoid losing dirty bit updates or updates to
pagetables that map pagetables in the non-NPT/EPT case. If the page
were really only going to be freed, it would be OK if the other CPUs
still wrote to it for a little longer until the page was freed.

Like I wrote in my previous email, I was wondering whether, if we changed
the mmu notifier methods to do an unconditional flush, every other flush
could also run outside of the mmu_lock. But I didn't think enough
about this to be sure. My guess is we could move all flushes outside
the mmu_lock if we stopped flushing the TLB conditionally on the current
spte values. It'd clearly be slower for a UP guest though :). Large
SMP guests might benefit, if that is feasible at all... It depends on how
problematic the mmu_lock is on large SMP guests and how much we're
saving by doing conditional TLB flushes.


Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-14 Thread John Fastabend
On 2/14/2012 5:18 AM, jamal wrote:
 On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote:
 
 The use case here is multiple VFs but the same solution should work with
 multiple PFs as well. FDB controls should be independent of how the ports
 are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc.
 
 Makes sense.
 
 With events and ADD/DEL/GET FDB controls we can solve both cases. This also
 solves Roopa's case with macvlan where she wants to add additional addresses
 to macvlan ports.
 
 Not familiar with that issue - I'll prowl the list.

Roopa was likely on the right track here,

http://patchwork.ozlabs.org/patch/123064/

But I think the proper syntax is to use the existing PF_BRIDGE:RTM_XXX
netlink messages. And if possible drive this without extending ndo_ops.

An ideal user space interaction IMHO would look like,

[root@jf-dev1-dcblab iproute2]# ./br/br fdb add 52:e5:62:7b:57:88 dev veth10
[root@jf-dev1-dcblab iproute2]# ./br/br fdb
port    mac addr            flags
veth2   36:a6:35:9b:96:c4   local
veth4   aa:54:b0:7b:42:ef   local
veth0   2a:e8:5c:95:6c:1b   local
veth6   6e:26:d5:43:a3:36   local
veth0   f2:c1:39:76:6a:fb
veth8   4e:35:16:af:87:13   local
veth10  52:e5:62:7b:57:88   static
veth10  aa:a9:35:21:15:c4   local
[root@jf-dev1-dcblab iproute2]# ./br/br fdb add dev eth3 to 52:e5:62:7b:57:88
RTNETLINK answers: Invalid argument

This uses Stephen's br tool. The first command adds an FDB entry to the SW
bridge. If the same tool could be used to add entries to the embedded bridge,
I think that would be the best case, so there would be no RTNETLINK error on
the second command. Embedded FDB entries could then be dumped the same way,
so I get a complete view of my FDB setup across multiple SW bridges and
embedded bridges.

I don't think br is part of iproute2 yet; I just pulled it out of some RFC,
but it works reasonably well and is intuitive enough.
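For reference, a rough sketch of what the userspace side of an FDB add over PF_BRIDGE RTM_NEWNEIGH might look like, built with the standard Linux netlink headers. It only assembles the request (struct ndmsg plus an NDA_LLADDR attribute); opening a NETLINK_ROUTE socket and sending it are omitted. struct fdb_req and build_fdb_add() are hypothetical, and using NUD_NOARP for a static entry is an assumption based on how the Linux bridge reports static FDB entries, worth checking against the bridge code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>        /* AF_BRIDGE */
#include <linux/netlink.h>
#include <linux/rtnetlink.h>   /* RTM_NEWNEIGH, struct ndmsg, RTA_* macros */
#include <linux/neighbour.h>   /* NDA_LLADDR, NUD_* */

/* Hypothetical request buffer: header, ndmsg, then room for attributes. */
struct fdb_req {
    struct nlmsghdr n;
    struct ndmsg    ndm;
    char            buf[64];
};

/* Builds (does not send) an RTM_NEWNEIGH request adding `mac` on ifindex.
 * Returns 0 on success, -1 if `mac` cannot be parsed. */
static int build_fdb_add(struct fdb_req *req, const char *mac, int ifindex)
{
    unsigned char addr[6];
    assert(ifindex > 0);
    if (sscanf(mac, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
               &addr[0], &addr[1], &addr[2], &addr[3], &addr[4], &addr[5]) != 6)
        return -1;

    memset(req, 0, sizeof(*req));
    req->n.nlmsg_len     = NLMSG_LENGTH(sizeof(struct ndmsg));
    req->n.nlmsg_type    = RTM_NEWNEIGH;
    req->n.nlmsg_flags   = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
    req->ndm.ndm_family  = AF_BRIDGE;
    req->ndm.ndm_ifindex = ifindex;
    req->ndm.ndm_state   = NUD_NOARP;  /* assumed encoding of a static entry */

    /* Append the NDA_LLADDR attribute at the aligned end of the message. */
    struct rtattr *rta = (struct rtattr *)((char *)&req->n +
                                           NLMSG_ALIGN(req->n.nlmsg_len));
    rta->rta_type = NDA_LLADDR;
    rta->rta_len  = RTA_LENGTH(sizeof(addr));
    memcpy(RTA_DATA(rta), addr, sizeof(addr));
    req->n.nlmsg_len = NLMSG_ALIGN(req->n.nlmsg_len) + RTA_ALIGN(rta->rta_len);
    return 0;
}
```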

 
 Yes, it should flood here, unless it's acting as an 802.1Qbg VEB or VEPA.
 
 Ok. So there is a toggle somewhere which controls how flooding should
 happen.
 

Yes. The hardware has a bit to support this which is currently not exposed
to user space. That's a case where we have 'yet another knob' that needs
a clean solution. This causes real bugs today when users try to use the
macvlan devices in VEPA mode on top of SR-IOV. By the way these modes are
all part of the 802.1Qbg spec which people actually want to use with Linux
so a good clean solution is probably needed.


 Maybe not. But the kernel already has the needed signals with one extra
 hook we can save running a daemon in user space. Maybe that's not a great
 argument to add kernel code though.
 
 You make a reasonable argument to have it in the kernel, but I think we
 win more if we separate the control. So while I empathize, I am hoping
 that you'd go with the path that is hard to travel ;-)
 
 The PF_BRIDGE:RTM_GETNEIGH,RTM_NEWNEIGH,RTM_DELNEIGH are registered in the
 br_netlink_init() path. 
 
 Hrm - hadn't paid attention to that before. Nasty.
 The bridge seems to be hard-coding policy on station movement, no? 
 This is a good example of the qualms I have about adding things to the
 kernel ;-)
 I may not want to auto-update a MAC address moving ports as part of
 some policy I have. I can go and add YAK (Yet Another Knob) - but where
 is the line drawn?
 

I have no problem with drawing the line here and trying to implement something
over PF_BRIDGE:RTM_xxx nlmsgs. I'll work with Roopa and see if we can come
up with something in the next couple days.

W.r.t. VEPA/VEB and flooding behavior, we could probably have a bit to indicate
whether the port is a flooding port or not. Then users could build any sort of
forwarding table they wanted, OR we could just drive it through a notifier
(ndo_ops?) in the macvlan path, which does VEPA today.
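One reading of the per-port flooding bit proposed above, as a hedged sketch: a known destination goes to its FDB port, and an unknown destination is flooded only to ports whose (hypothetical) flood bit is set, never back out the ingress port. struct port, struct fdb_entry and egress_ports() are illustrative names only, not kernel or driver interfaces, and VEPA's reflective relay is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PORTS 8

struct port {
    bool flood;          /* hypothetical "flooding port" bit */
};

struct fdb_entry {
    unsigned char mac[6];
    int port;
};

/* Writes egress ports for a frame to `dst` that arrived on `in_port` into
 * out[] and returns how many were written. */
static int egress_ports(const struct port *ports, int nports,
                        const struct fdb_entry *fdb, int nfdb,
                        const unsigned char dst[6], int in_port, int *out)
{
    assert(nports <= MAX_PORTS);
    for (int i = 0; i < nfdb; i++) {
        if (memcmp(fdb[i].mac, dst, 6) == 0) {
            if (fdb[i].port == in_port)
                return 0;        /* destination is on the ingress port: filter */
            out[0] = fdb[i].port;
            return 1;            /* known unicast: exactly one port */
        }
    }
    int n = 0;
    for (int p = 0; p < nports; p++)  /* unknown: flood to flagged ports only */
        if (p != in_port && ports[p].flood)
            out[n++] = p;
    return n;
}
```

Keeping the decision as a per-port flag leaves the forwarding policy to whoever sets the bits, which matches the preference in this thread for not hard-coding policy in the kernel.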

OK I'll try to write some actual code now that can be critiqued.

 cheers,
 jamal
 
 



Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-14 Thread Stephen Hemminger
On Tue, 14 Feb 2012 10:57:04 -0800
John Fastabend john.r.fastab...@intel.com wrote:

 On 2/14/2012 5:18 AM, jamal wrote:
  On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote:
  
  The use case here is multiple VFs but the same solution should work with
  multiple PFs as well. FDB controls should be independent of how the ports
  are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc.
  
  Makes sense.
  
  With events and ADD/DEL/GET FDB controls we can solve both cases. This also
  solves Roopa's case with macvlan where she wants to add additional addresses
  to macvlan ports.
  
  Not familiar with that issue - I'll prowl the list.
 
 Roopa was likely on the right track here,
 
 http://patchwork.ozlabs.org/patch/123064/
 
 But I think the proper syntax is to use the existing PF_BRIDGE:RTM_XXX
 netlink messages. And if possible drive this without extending ndo_ops.
 
 An ideal user space interaction IMHO would look like,
 
 [root@jf-dev1-dcblab iproute2]# ./br/br fdb add 52:e5:62:7b:57:88 dev veth10
 [root@jf-dev1-dcblab iproute2]# ./br/br fdb
 port    mac addr            flags
 veth2   36:a6:35:9b:96:c4   local
 veth4   aa:54:b0:7b:42:ef   local
 veth0   2a:e8:5c:95:6c:1b   local
 veth6   6e:26:d5:43:a3:36   local
 veth0   f2:c1:39:76:6a:fb
 veth8   4e:35:16:af:87:13   local
 veth10  52:e5:62:7b:57:88   static
 veth10  aa:a9:35:21:15:c4   local
 [root@jf-dev1-dcblab iproute2]# ./br/br fdb add dev eth3 to 52:e5:62:7b:57:88
 RTNETLINK answers: Invalid argument

I am going to put the tool into iproute2 (soon), named bridge because of a name clash with br.


Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-14 Thread John Fastabend
On 2/14/2012 11:05 AM, Stephen Hemminger wrote:
 On Tue, 14 Feb 2012 10:57:04 -0800
 John Fastabend john.r.fastab...@intel.com wrote:
 
 On 2/14/2012 5:18 AM, jamal wrote:
 On Mon, 2012-02-13 at 07:13 -0800, John Fastabend wrote:

 The use case here is multiple VFs but the same solution should work with
 multiple PFs as well. FDB controls should be independent of how the ports
 are exposed VFs, PFs, VMDQ/queue pairs, macvlan, etc.

 Makes sense.

 With events and ADD/DEL/GET FDB controls we can solve both cases. This also
 solves Roopa's case with macvlan where she wants to add additional addresses
 to macvlan ports.

 Not familiar with that issue - I'll prowl the list.

 Roopa was likely on the right track here,

 http://patchwork.ozlabs.org/patch/123064/

 But I think the proper syntax is to use the existing PF_BRIDGE:RTM_XXX
 netlink messages. And if possible drive this without extending ndo_ops.

 An ideal user space interaction IMHO would look like,

 [root@jf-dev1-dcblab iproute2]# ./br/br fdb add 52:e5:62:7b:57:88 dev veth10
 [root@jf-dev1-dcblab iproute2]# ./br/br fdb
 port    mac addr            flags
 veth2   36:a6:35:9b:96:c4   local
 veth4   aa:54:b0:7b:42:ef   local
 veth0   2a:e8:5c:95:6c:1b   local
 veth6   6e:26:d5:43:a3:36   local
 veth0   f2:c1:39:76:6a:fb
 veth8   4e:35:16:af:87:13   local
 veth10  52:e5:62:7b:57:88   static
 veth10  aa:a9:35:21:15:c4   local
 [root@jf-dev1-dcblab iproute2]# ./br/br fdb add dev eth3 to 52:e5:62:7b:57:88
 RTNETLINK answers: Invalid argument
 
 I am going to put the tool into iproute2 (soon), named bridge because of a name clash with br.

I've been using it on my dev box for a while now, and it works well for
me.


Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock

2012-02-14 Thread Marcelo Tosatti
On Tue, Feb 14, 2012 at 07:53:56PM +0100, Andrea Arcangeli wrote:
 On Tue, Feb 14, 2012 at 03:29:47PM -0200, Marcelo Tosatti wrote:
  The problem the patch is fixing is not related to page freeing, but
  rmap_write_protect. From 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d
 
 Can't find the commit on kvm.git.

Sorry, we got kvm.git out of sync. But you can see an equivalent below.

 
  (replace A (get_dirty_log) with mmu_notifier_invalidate_page):
  
  
  While protecting pages for dirty logging, other threads may also try
  to protect a page in mmu_sync_children() or kvm_mmu_get_page().
  
  In such a case, because get_dirty_log releases mmu_lock before flushing
  TLBs, the following race condition can happen:
  
    A (get_dirty_log)          B (another thread)
  
    lock(mmu_lock)
    clear pte.w
    unlock(mmu_lock)
                               lock(mmu_lock)
                               pte.w is already cleared
                               unlock(mmu_lock)
                               skip TLB flush
 
 Not sure which tree it is, but in kvm and upstream I see an
 unconditional TLB flush here, not a skip (both
 kvm_mmu_slot_remove_write_access and kvm_mmu_rmap_write_protect). So I
 assume this has been updated in your tree to be conditional.

if (!direct) {
if (rmap_write_protect(vcpu->kvm, gfn))
kvm_flush_remote_tlbs(vcpu->kvm);


 Also note that kvm_mmu_rmap_write_protect flushes outside of the mmu_lock
 (like in the quoted description), so two write_protect_slot calls running
 in parallel against each other may not be OK. But that may be enforced by
 design if QEMU never calls that ioctl from two different userland threads
 (it doesn't sound security-related, so it should be OK to enforce its
 safety by userland design).

Yes, here is the fix:

http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=02b48d00d7f1853bdf8a06da19ca5413ebe334c6

This is an equivalent of 8bf3f6f06fcdfd097b6c6ec51531d8292fa0d81d.

 
                               return
    ...
    TLB flush
  
  Though thread B assumes the page has already been protected when it
  returns, the remaining TLB entry will break that assumption.
 
 Now I get the question of why not running the TLB flush inside the
 mmu_lock only if the spte was writable :).
 
 kvm_mmu_get_page, as long as it only runs in the context of a KVM page
 fault, is OK, because the page fault would be inhibited by the mmu
 notifier invalidates, so maybe it's safe.

Ah, perhaps, but this was not taken into account before. Can you confirm
this is the case so we can revert the invalidate_page patch?

 mmu_sync_children seems to have a problem instead. In your tree,
 get_dirty_log also has an issue if it has been updated to skip the
 flush on read-only sptes, as I guess it has.
 
 Interesting how the spte is already non-present and the page is just
 being freed shortly afterwards, yet we still need to trigger write
 faults synchronously and prevent other CPUs in guest mode from further
 modifying the page, to avoid losing dirty bit updates or updates to
 pagetables that map pagetables in the non-NPT/EPT case. If the page
 were really only going to be freed, it would be OK if the other CPUs
 still wrote to it for a little longer until the page was freed.
 
 Like I wrote in my previous email, I was wondering whether, if we changed
 the mmu notifier methods to do an unconditional flush, every other flush
 could also run outside of the mmu_lock. But I didn't think enough
 about this to be sure. My guess is we could move all flushes outside
 the mmu_lock if we stopped flushing the TLB conditionally on the current
 spte values. It'd clearly be slower for a UP guest though :). Large
 SMP guests might benefit, if that is feasible at all... It depends on how
 problematic the mmu_lock is on large SMP guests and how much we're
 saving by doing conditional TLB flushes.

Also, it should not be necessary for these flushes to be inside mmu_lock
in the EPT/NPT case (since there is no write protection there). But it would
be awkward to differentiate the unlock position based on EPT/NPT.



Re: [PATCH 0/4 V13] Avoid soft lockup message when KVM is stopped by host

2012-02-14 Thread Eric B Munson
On Tue, 14 Feb 2012, Marcelo Tosatti wrote:

 On Tue, Feb 14, 2012 at 10:50:13AM -0500, Eric B Munson wrote:
  On Tue, 14 Feb 2012, Marcelo Tosatti wrote:
  
   On Tue, Feb 14, 2012 at 10:29:31AM -0500, Eric B Munson wrote:
On Wed, 08 Feb 2012, Eric B Munson wrote:

 
 When a guest kernel is stopped by the host hypervisor it can look like a soft
 lockup to the guest kernel.  This false warning can mask later soft lockup
 warnings which may be real.  This patch series adds a method for a host
 hypervisor to communicate to a guest kernel that it is being stopped.  The
 final patch in the series has the watchdog check this flag when it goes to
 issue a soft lockup warning and skip the warning if the guest knows it was
 stopped.
 
 It was attempted to solve this in Qemu, but the side effects of saving and
 restoring the clock and tsc for each vcpu put the wall clock of the guest
 behind by the amount of time of the pause.  This forces a guest to have ntp
 running in order to keep the wall clock accurate.

Avi,

Is this set fit for merging or is there something else you want changed?
   
   Eric,
   
   In Message-ID: 20120210160536.ga23...@amt.cnet, I asked:
   
   How is the stub getting included for other architectures again?
   
  
  Marcelo,
  
  Sorry, I put out V13 to answer that.  There is a stub in asm-generic that
  was lost in the V11-V12 rebase.  This stub has been included in the V13 set.
  
  Eric
 
 Eric, 
 
 I know the stub has been included in the series. But I am asking how
 it is #include'ed for other architectures (I can't see that).

Marcelo,

kernel/watchdog.c now includes linux/kvm_para.h which includes asm/kvm_para.h.
The check_and_clear function is defined in arch include/asm/kvm_para.h or in
asm-generic/kvm_para.h for any arch lacking the specific header in their asm
include dir.  If I have misunderstood how these headers work, please let me
know and I will fix it.

Eric
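As an illustration only, the pause-notification flow and the generic fallback discussed in this thread can be modelled in a few lines of Python. The names (`SharedArea`, `host_stop_guest`, `arch_check_and_clear_guest_paused`, `generic_check_and_clear_guest_paused`, `watchdog_should_warn`) are hypothetical and do not reflect the series' actual interface. The point is that an architecture without its own hook falls back to a stub that always reports "not paused", so the watchdog keeps its old behaviour there.

```python
class SharedArea:
    """Hypothetical per-vCPU area shared between host and guest."""

    def __init__(self):
        self.guest_paused = False


def host_stop_guest(area):
    # Host side: record that the guest was stopped by the hypervisor.
    area.guest_paused = True


def arch_check_and_clear_guest_paused(area):
    # Arch with paravirt support: report the flag once, then clear it.
    was_paused = area.guest_paused
    area.guest_paused = False
    return was_paused


def generic_check_and_clear_guest_paused(area):
    # Fallback for architectures without their own implementation, like an
    # asm-generic stub: the guest can never tell that it was paused.
    return False


def watchdog_should_warn(stalled, check_and_clear, area):
    # Soft-lockup path: only warn if the stall was not caused by a host pause.
    if not stalled:
        return False
    return not check_and_clear(area)
```

Because the flag is cleared when consumed, a pause suppresses exactly one warning; a later genuine lockup is still reported.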




Re: AESNI and guest hosts

2012-02-14 Thread Ryan Brown
Thanks for the reply. I thought AESNI was supported the same way
SSE/MMX and other CPU flags are. Is this a QEMU or a KVM issue?

On Wed, Feb 15, 2012 at 7:18 AM, Brian Jackson i...@theiggy.com wrote:
 On Tuesday, February 14, 2012 03:31:10 AM Ryan Brown wrote:
 Sorry for being a noob here. Any clues on this, anyone?

 On Mon, Feb 13, 2012 at 2:05 AM, Ryan Brown mp3g...@gmail.com wrote:
  The host/KVM server is running Linux 3.2.4 (Debian wheezy), and the guest
  kernel is running 3.2.5. The CPU is an E3-1230, but for some reason
  it's not able to supply the guest with AES-NI. Is there a config option,
  or is there something we're missing?



 I don't think it's supported to pass that functionality to the guest.



 
    <cpu>
      <arch>x86_64</arch>
      <model>Westmere</model>
      <vendor>Intel</vendor>
      <topology sockets='1' cores='4' threads='2'/>
      <feature name='rdtscp'/>
      <feature name='x2apic'/>
      <feature name='xtpr'/>
      <feature name='tm2'/>
      <feature name='est'/>
      <feature name='vmx'/>
      <feature name='ds_cpl'/>
      <feature name='monitor'/>
      <feature name='pbe'/>
      <feature name='tm'/>
      <feature name='ht'/>
      <feature name='ss'/>
      <feature name='acpi'/>
      <feature name='ds'/>
      <feature name='vme'/>
    </cpu>
 
  Guest:
  [root@fanboy:~]# cat /proc/cpuinfo
  processor       : 0
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 2
  model name      : QEMU Virtual CPU version 1.0
  stepping        : 3
  microcode       : 0x1
  cpu MHz         : 3192.748
  cache size      : 4096 KB
  fdiv_bug        : no
  hlt_bug         : no
  f00f_bug        : no
  coma_bug        : no
  fpu             : yes
  fpu_exception   : yes
  cpuid level     : 4
  wp              : yes
  flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
  cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
  hypervisor lahf_lm
  bogomips        : 6385.49
  clflush size    : 64
  cache_alignment : 64
  address sizes   : 40 bits physical, 48 bits virtual
  power management:
 
  processor       : 1
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 2
  model name      : QEMU Virtual CPU version 1.0
  stepping        : 3
  microcode       : 0x1
  cpu MHz         : 3192.748
  cache size      : 4096 KB
  fdiv_bug        : no
  hlt_bug         : no
  f00f_bug        : no
  coma_bug        : no
  fpu             : yes
  fpu_exception   : yes
  cpuid level     : 4
  wp              : yes
  flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca
  cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm pni cx16 popcnt
  hypervisor lahf_lm
  bogomips        : 6385.49
  clflush size    : 64
  cache_alignment : 64
  address sizes   : 40 bits physical, 48 bits virtual

  power management:
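From inside the guest, AES-NI shows up as the `aes` flag in `/proc/cpuinfo`; it is missing from the guest output quoted above, which reports the default "QEMU Virtual CPU version 1.0" model. A small Python helper (hypothetical names, a sketch only) to check whether every listed processor advertises a given flag:

```python
def cpu_flags(cpuinfo_text):
    """Return one set of flags per processor found in /proc/cpuinfo-style text."""
    per_cpu = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            per_cpu.append(set(value.split()))
    return per_cpu


def all_cpus_have(cpuinfo_text, flag):
    """True if at least one processor is listed and every one reports `flag`."""
    per_cpu = cpu_flags(cpuinfo_text)
    return bool(per_cpu) and all(flag in flags for flags in per_cpu)
```

For example, `all_cpus_have(open("/proc/cpuinfo").read(), "aes")` would return False in the guest shown above.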


Re: The way of mapping BIOS into the guest's address space

2012-02-14 Thread Kevin O'Connor
On Tue, Feb 14, 2012 at 04:13:42PM +0400, Cyrill Gorcunov wrote:
 On Tue, Feb 14, 2012 at 01:10:59PM +0200, Pekka Enberg wrote:
  On Tue, Feb 14, 2012 at 1:03 PM, Yang Bai hamo...@gmail.com wrote:
   Since on x86 the BIOS is always at the end of the address space, I
   have some thoughts about how to implement SeaBIOS support for the kvm
   tool.
   
   1. Use kvm__register_mem to map the end of the address space into the
   guest, then copy the SeaBIOS code into this memory region, just
   emulating the BIOS chip.
 
 I think this is what should be done.
 
  
   2. Leave the BIOS code alone and don't touch the guest's address
   space. If the guest accesses an address belonging to the BIOS, it
   will be an IO request and we can emulate the IO access to the BIOS
   chip.
  
   Any ideas about this?
  
  The latter solution doesn't make any sense to me. Cyrill, do we really
  need to put the BIOS at the end of the address space? Don't we have
  unused space below 1 MB?
 
 I don't remember for sure how SeaBIOS actually works. What I remember
 is that it acquires all the hardware the environment might have. So without
 a real look into the SeaBIOS code I fear I can't answer. But reserving the
 end of the 4G address space for a BIOS copy sounds reasonable if we are going
 to behave like real hardware. Maybe we could poke someone from the KVM camp
 for a hint?

SeaBIOS has two ways to be deployed - first is to copy the image to
the top of the first 1MB (eg, 0xe0000-0xfffff) and jump to
0xf000:0xfff0 in 16bit mode.  The second way is to use the SeaBIOS elf
and deploy into memory (according to the elf memory map) and jump to
SeaBIOS in 32bit mode (according to the elf entry point).
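The two deployment methods above can be sketched in Python, assuming guest RAM is modelled as a `bytearray` and the SeaBIOS ELF is a 32-bit little-endian image. The helper names (`real_mode_linear`, `load_flat_image`, `load_elf32`) are hypothetical; a real loader in the kvm tool would also have to map the memory into the guest and set up vCPU state before jumping to the entry point.

```python
import struct

PT_LOAD = 1
ELF32_EHDR = struct.Struct("<16sHHIIIIIHHHHHH")  # 52-byte ELF32 file header
ELF32_PHDR = struct.Struct("<IIIIIIII")          # 32-byte ELF32 program header


def real_mode_linear(segment, offset):
    # 16-bit real mode: linear address = segment * 16 + offset.
    return (segment << 4) + offset


def load_flat_image(mem, image, top=0x100000):
    # Method 1: place a flat BIOS image so that it ends at the 1MB boundary.
    base = top - len(image)
    mem[base:top] = image
    return base


def load_elf32(mem, elf):
    # Method 2: copy each PT_LOAD segment to its physical address, zero-fill
    # the part beyond the file data (.bss), and return the 32-bit entry point.
    hdr = ELF32_EHDR.unpack_from(elf, 0)
    ident, entry, phoff, phentsize, phnum = hdr[0], hdr[4], hdr[5], hdr[9], hdr[10]
    # EI_CLASS 1 = ELFCLASS32, EI_DATA 1 = little-endian
    if ident[:4] != b"\x7fELF" or ident[4] != 1 or ident[5] != 1:
        raise ValueError("not a 32-bit little-endian ELF image")
    for i in range(phnum):
        p_type, p_offset, _vaddr, p_paddr, p_filesz, p_memsz, _flags, _align = \
            ELF32_PHDR.unpack_from(elf, phoff + i * phentsize)
        if p_type != PT_LOAD:
            continue
        if p_paddr + p_memsz > len(mem):
            raise ValueError("segment does not fit in guest memory")
        mem[p_paddr:p_paddr + p_filesz] = elf[p_offset:p_offset + p_filesz]
        mem[p_paddr + p_filesz:p_paddr + p_memsz] = bytes(p_memsz - p_filesz)
    return entry
```

For example, `real_mode_linear(0xF000, 0xFFF0)` gives 0xFFFF0, the reset vector 16 bytes below 1MB, and a 128KB flat image loaded with `load_flat_image` starts at 0xE0000.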

SeaBIOS doesn't really need to be in the top 4G of ram.  SeaBIOS does
expect to have normal PC hardware devices (eg, a PIC), though many
hardware devices can be compiled out via its kconfig interface.  The
more interesting challenge will likely be in communicating critical
pieces of information (eg, total memory size) into SeaBIOS.

The SeaBIOS mailing list (seab...@seabios.org) is probably a better
location for technical seabios questions.

-Kevin


buildbot failure in kvm on i386

2012-02-14 Thread kvm
The Buildbot has detected a new failure on builder i386 while building kvm.
Full details are available at:
 http://buildbot.b1-systems.de/kvm/builders/i386/builds/454

Buildbot URL: http://buildbot.b1-systems.de/kvm/

Buildslave for this Build: b1_kvm_1

Build Reason: The Nightly scheduler named 'nightly_master' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot



[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

2012-02-14 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #29 from Rosen sandik...@yandex.ru  2012-02-15 07:49:35 ---
(In reply to comment #28)
 (In reply to comment #27)
  and there soon will be video capture with 'perf top'
  
  http://vbox7.com/play:199e9ede30
 
 Run it while the guest is also running.

Good Morning!
There will be a video at http://vbox7.com/play:7128f03f1f in a few moments.
