Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Dor Laor

On 01/06/2010 05:16 PM, Anthony Liguori wrote:

On 01/06/2010 08:48 AM, Dor Laor wrote:

On 01/06/2010 04:32 PM, Avi Kivity wrote:

On 01/06/2010 04:22 PM, Michael S. Tsirkin wrote:

We can probably default -enable-kvm to -cpu host, as long as we
explain
very carefully that if users wish to preserve cpu features across
upgrades, they can't depend on the default.

Hardware upgrades or software upgrades?


Yes.



I just want to remind everyone that the main motivation for using -cpu
realModelThatWasOnceShipped is to provide correct cpu emulation for the
guest. Using a random qemu|kvm64+flag1-flag2 might really cause
trouble for the guest OS or guest apps.

On top of -cpu nehalem we can always add fancy features like x2apic, etc.


I think it boils down to, how are people going to use this.

For individuals, code names like Nehalem are too obscure. From my own
personal experience, even power users often have no clue whether their
processor is a Nehalem or not.

For management tools, Nehalem is a somewhat imprecise target because it
covers a wide range of potential processors. In general, I think what we
really need to do is simplify the process of going from "here's the
output of /proc/cpuinfo for 100 nodes" to "this is what I need to pass to
qemu so that migration always works for these systems".

I don't think -cpu nehalem really helps with that problem. -cpu none
helps a bit, but I hope we can find something nicer.
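
As a rough illustration of the /proc/cpuinfo question above, the sketch below intersects the CPU flag sets of several nodes. The function names and the two sample excerpts are made up for this example. Real cpuinfo output is much longer, and a flag intersection alone does not account for vendor, stepping, or any filtering done by the KVM kernel module.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def common_flags(cpuinfo_texts):
    """Intersect the flag sets of all nodes; only these flags exist everywhere."""
    flag_sets = [parse_cpu_flags(text) for text in cpuinfo_texts]
    if not flag_sets:
        return set()
    return set.intersection(*flag_sets)


# Hypothetical, heavily shortened cpuinfo excerpts for two nodes.
NODE_A = "model name\t: Example CPU A\nflags\t\t: fpu sse2 nx cx16 ssse3 sse4_1\n"
NODE_B = "model name\t: Example CPU B\nflags\t\t: fpu sse2 nx cx16\n"

if __name__ == "__main__":
    print(sorted(common_flags([NODE_A, NODE_B])))
```

A management tool could feed the resulting set into whatever mechanism finally builds the -cpu argument; how that mapping should look is exactly what this thread is debating.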


We can debate about the exact name/model to represent the Nehalem 
family, I don't have an issue with that and actually Intel and Amd 
should define it.


There are two main motivations behind the above approach:
1. Sound guest cpu definition.
   Using a predefined model should automatically set all the relevant
   vendor/stepping/cpuid flags/cache sizes/etc.
   We can't just let every management application deal with it; that
   breaks guest OSes/apps. For instance, MSI support in Windows guests
   relies on the stepping.

2. Simplifying end user and mgmt tools.
   qemu/kvm have the best knowledge about these low-level details. If we
   push it up the stack, it eventually reaches the user: the end user,
   not a 'qemu-devel user', who is actually far more knowledgeable than
   the average user.

   This means that such users will have to know what popcount is and
   whether or not to limit migration on one host by adding sse4.2.

This is exactly what vmware are doing:
 - Intel CPUs :
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991
 - AMD CPUs :
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1992


Why should we reinvent the wheel (qemu64..)? Let's learn from their
experience.


This is the test description of the original patch by John:


# Intel
# -

# Management layers remove pentium3 by default.
# It primarily remains here for testing of 32-bit migration.
#
[0:Pentium 3 Intel
:vmx
:pentium3;]

# Core 2, 65nm
# possible option sets: (+nx,+cx16), (+nx,+cx16,+ssse3)
#
1:Merom
:vmx,sse2
:qemu64,-nx,+sse2;

# Core2 45nm
#
2:Penryn
:vmx,sse2,nx,cx16,ssse3,sse4_1
:qemu64,+sse2,+cx16,+ssse3,+sse4_1;

# Core i7 45/32nm
#
3:Nehalem
:vmx,sse2,nx,cx16,ssse3,sse4_1,sse4_2,popcnt
:qemu64,+sse2,+cx16,+ssse3,+sse4_1,+sse4_2,+popcnt;


# AMD
# ---

# Management layers remove pentium3 by default.
# It primarily remains here for testing of 32-bit migration.
#
[0:Pentium 3 AMD
:svm
:pentium3;]

# Opteron 90nm stepping E1/E4/E6
# possible option sets: (-nx) for 130nm
#
1:Opteron G1
:svm,sse2,nx
:qemu64,+sse2;

# Opteron 90nm stepping F2/F3
#
2:Opteron G2
:svm,sse2,nx,cx16,rdtscp
:qemu64,+sse2,+cx16,+rdtscp;

# Opteron 65/45nm
#
3:Opteron G3
:svm,sse2,nx,cx16,sse4a,misalignsse,popcnt,abm
:qemu64,+sse2,+cx16,+sse4a,+misalignsse,+popcnt,+abm;
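
To make the layout of these entries concrete, here is a small parsing sketch. It assumes each entry has the form "index:name", then ":host flags", then ":qemu -cpu definition;", optionally wrapped in square brackets, with "#" comment lines. That reading of the three fields is an interpretation of the listing above, not something the patch description states, and parse_models is a hypothetical helper.

```python
def parse_models(text):
    """Parse 'index:name / :host flags / :cpu definition;' entries into dicts."""
    useful = [line.strip() for line in text.splitlines()
              if line.strip() and not line.strip().startswith("#")]
    models = []
    for entry in " ".join(useful).split(";"):
        # Drop the optional [ ] wrapper and surrounding blanks.
        entry = entry.strip("[] ")
        if not entry:
            continue
        index, name, host_flags, cpu_arg = (part.strip()
                                            for part in entry.split(":"))
        models.append({
            "index": int(index),
            "name": name,
            "host_flags": host_flags.split(","),
            "cpu_arg": cpu_arg,
        })
    return models


# Two entries copied from the listing above.
SAMPLE = """\
[0:Pentium 3 Intel
:vmx
:pentium3;]

# Core2 45nm
#
2:Penryn
:vmx,sse2,nx,cx16,ssse3,sse4_1
:qemu64,+sse2,+cx16,+ssse3,+sse4_1;
"""

if __name__ == "__main__":
    for model in parse_models(SAMPLE):
        print(model["index"], model["name"], "-cpu", model["cpu_arg"])
```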





Regards,

Anthony Liguori




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Add the ability to use *.xml as unattended_install file for kvm.

2010-01-07 Thread sshang

Signed-off-by: sshang ssh...@redhat.com
---
 client/tests/kvm/scripts/unattended.py |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/scripts/unattended.py b/client/tests/kvm/scripts/unattended.py
index 562d317..ee20b60 100755
--- a/client/tests/kvm/scripts/unattended.py
+++ b/client/tests/kvm/scripts/unattended.py
@@ -91,6 +91,8 @@ class UnattendedInstall(object):
             shutil.copyfile(setup_file_path, setup_file_dest)
         elif self.unattended_file.endswith('.ks'):
             dest_fname = 'ks.cfg'
+        elif self.unattended_file.endswith('.xml'):
+            dest_fname = "autounattend.xml"
 
         dest = os.path.join(self.floppy_mount, dest_fname)
         shutil.copyfile(self.unattended_file, dest)
-- 
1.5.3.6
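
For readers who want to try the extension handling in isolation, the sketch below restates the two branches visible in the hunk as a lookup table. DEST_NAMES and dest_filename are hypothetical names, not part of unattended.py, and any branches of the real script that are outside the hunk are not covered here.

```python
import os

# Only the two extensions visible in the hunk above; the real script may
# handle more cases that are not shown in the diff context.
DEST_NAMES = {
    ".ks": "ks.cfg",
    ".xml": "autounattend.xml",
}


def dest_filename(unattended_file):
    """Return the file name the answer file should get on the floppy image."""
    extension = os.path.splitext(unattended_file)[1]
    if extension not in DEST_NAMES:
        raise ValueError("unsupported unattended file: %s" % unattended_file)
    return DEST_NAMES[extension]


if __name__ == "__main__":
    print(dest_filename("win2008.xml"))
```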



Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Avi Kivity

On 01/07/2010 10:03 AM, Dor Laor wrote:


We can debate about the exact name/model to represent the Nehalem 
family, I don't have an issue with that and actually Intel and Amd 
should define it.


AMD and Intel already defined their names (in cat /proc/cpuinfo).  They 
don't define families, the whole idea is to segment the market.




There are two main motivations behind the above approach:
1. Sound guest cpu definition.
   Using a predefined model should automatically set all the relevant
   vendor/stepping/cpuid flags/cache sizes/etc.
   We just can let every management application deal with it. It breaks
   guest OS/apps. For instance there are MSI support in windows guest
   relay on the stepping.

2. Simplifying end user and mgmt tools.
   qemu/kvm have the best knowledge about these low levels. If we push
   it up in the stack, eventually it reaches the user. The end user,
   not a 'qemu-devel user' which is actually far better from the
   average user.

   This means that such users will have to know what is popcount and
   whether or not to limit migration on one host by adding sse4.2 or
   not.

This is exactly what vmware are doing:
 - Intel CPUs :
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991

 - AMD CPUs :
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1992



They don't have to deal with different qemu and kvm versions.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Daniel P. Berrange
On Thu, Jan 07, 2010 at 10:03:28AM +0200, Dor Laor wrote:
 On 01/06/2010 05:16 PM, Anthony Liguori wrote:
 On 01/06/2010 08:48 AM, Dor Laor wrote:
 On 01/06/2010 04:32 PM, Avi Kivity wrote:
 On 01/06/2010 04:22 PM, Michael S. Tsirkin wrote:
 We can probably default -enable-kvm to -cpu host, as long as we
 explain
 very carefully that if users wish to preserve cpu features across
 upgrades, they can't depend on the default.
 Hardware upgrades or software upgrades?
 
 Yes.
 
 
 I just want to remind all the the main motivation for using -cpu
 realModelThatWasOnceShiped is to provide correct cpu emulation for the
 guest. Using a random qemu|kvm64+flag1-flag2 might really cause
 trouble for the guest OS or guest apps.
 
 On top of -cpu nehalem we can always add fancy features like x2apic, etc.
 
 I think it boils down to, how are people going to use this.
 
 For individuals, code names like Nehalem are too obscure. From my own
 personal experience, even power users often have no clue whether there
 processor is a Nehalem or not.
 
 For management tools, Nehalem is a somewhat imprecise target because it
 covers a wide range of potential processors. In general, I think what we
 really need to do is simplify the process of going from, here's the
 output of /proc/cpuinfo for a 100 nodes, what do I need to pass to qemu
 so that migration always works for these systems.
 
 I don't think -cpu nehalem really helps with that problem. -cpu none
 helps a bit, but I hope we can find something nicer.
 
 We can debate about the exact name/model to represent the Nehalem 
 family, I don't have an issue with that and actually Intel and Amd 
 should define it.
 
 There are two main motivations behind the above approach:
 1. Sound guest cpu definition.
Using a predefined model should automatically set all the relevant
vendor/stepping/cpuid flags/cache sizes/etc.
We just can let every management application deal with it. It breaks
guest OS/apps. For instance there are MSI support in windows guest
relay on the stepping.
 
 2. Simplifying end user and mgmt tools.
qemu/kvm have the best knowledge about these low levels. If we push
it up in the stack, eventually it reaches the user. The end user,
not a 'qemu-devel user' which is actually far better from the
average user.
 
This means that such users will have to know what is popcount and
whether or not to limit migration on one host by adding sse4.2 or
not.
 
 This is exactly what vmware are doing:
  - Intel CPUs :
 http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991
  - AMD CPUs :
 http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1992
 
 Why should we invent the wheel (qemu64..)? Let's learn from their 
 experience.

NB, be careful to distinguish the different levels of VMware's mgmt stack. In
terms of guest configuration, the VMware ESX APIs require the management app to
specify the raw CPUID masks. With VirtualCenter VMotion they defined this
handful of common Intel/AMD CPU sets, and will automatically classify hosts
into one of these sets and use that to specify a default CPUID mask, in the
case that the guest does not have an explicit one in its config. This gives
them good default, out-of-the-box behaviour, while also allowing mgmt apps
100% control over each guest's CPUID should they want it.
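
The two-level scheme described above (classify the host into a predefined set, but let an explicit guest setting win) can be sketched roughly as follows. This is not VMware's or libvirt's code. The flag lists are copied from the Intel entries of the model table quoted elsewhere in this thread, and classify_host, guest_cpu and the "cpu" config key are hypothetical names.

```python
# Ordered from least to most capable; flags taken from the Intel entries
# of the model table posted in this thread.
INTEL_SETS = [
    ("Merom", {"vmx", "sse2"}),
    ("Penryn", {"vmx", "sse2", "nx", "cx16", "ssse3", "sse4_1"}),
    ("Nehalem", {"vmx", "sse2", "nx", "cx16", "ssse3", "sse4_1",
                 "sse4_2", "popcnt"}),
]


def classify_host(host_flags, cpu_sets=INTEL_SETS):
    """Return the most capable set whose flags the host fully provides."""
    best = None
    for name, required in cpu_sets:
        if required <= set(host_flags):
            best = name
    return best


def guest_cpu(guest_config, host_flags):
    """An explicit guest setting wins; otherwise use the host's class."""
    explicit = guest_config.get("cpu")
    if explicit is not None:
        return explicit
    return classify_host(host_flags)


if __name__ == "__main__":
    host = {"vmx", "sse2", "nx", "cx16", "ssse3", "sse4_1"}
    print(guest_cpu({}, host))
    print(guest_cpu({"cpu": "Merom"}, host))
```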

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: WinXP virtual crashes on 0.12.1.2 but not 0.12.1.1

2010-01-07 Thread Avi Kivity

On 01/06/2010 05:45 PM, Mark Cave-Ayland wrote:

Avi Kivity wrote:

It probably did make some kind of difference.  Please try a clean 
install.


After several hours of testing, I've finally found out what the 
problem is.


I tried a clean WinXP guest install and that worked, so it was 
obviously a driver issue. After disabling various drivers in the WinXP 
guest, I didn't get anywhere so I decided to take a break and test 
Marcelo's VNC patch. With this applied, I could actually see all of 
the information in the BSOD which showed the error was in intelppm.sys.


A quick search took me to this page here: 
http://blogs.msdn.com/virtual_pc_guy/archive/2005/10/24/484461.aspx 
which explains the issue in more detail. I first tried disabling the 
intelppm driver and rebooting, but that didn't make a difference; 
however disabling the Processor driver worked and my guest VM booted 
in Normal Mode :)


I think the issue is probably similar to that explained in the article 
above; with a new processor reported to the guest, the internal 
processor driver tries to upload some kind of microcode to the new 
device which fails and causes the guest to fall over. Can we teach KVM 
to silently discard these kinds of updates?




Can you try loading kvm.ko with the ignore_msrs module parameter set?

--
error compiling committee.c: too many arguments to function



Re: WinXP virtual crashes on 0.12.1.2 but not 0.12.1.1

2010-01-07 Thread Avi Kivity

On 01/06/2010 07:08 PM, Mark Cave-Ayland wrote:

Mark Cave-Ayland wrote:

A quick search took me to this page here: 
http://blogs.msdn.com/virtual_pc_guy/archive/2005/10/24/484461.aspx 
which explains the issue in more detail. I first tried disabling the 
intelppm driver and rebooting, but that didn't make a difference; 
however disabling the Processor driver worked and my guest VM booted 
in Normal Mode :)


I've just re-created the KVM image fresh from the VDI image once again 
and can confirm that disabling just the Processor driver is enough to 
allow the guest WinXP VM to function in qemu-kvm-0.12.1.2. Perhaps the 
default for -cpu host should not be changed in a micro release as 
there is a risk of breaking existing VMs?




That was actually a fix for a regression relative to 0.11.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Dor Laor

On 01/07/2010 10:18 AM, Avi Kivity wrote:

On 01/07/2010 10:03 AM, Dor Laor wrote:


We can debate about the exact name/model to represent the Nehalem
family, I don't have an issue with that and actually Intel and Amd
should define it.


AMD and Intel already defined their names (in cat /proc/cpuinfo). They
don't define families, the whole idea is to segment the market.


The idea here is to minimize the number of models. We should have the
following range for Intel, for example:

  pentium3 - merom - penryn - Nehalem - host - kvm/qemu64
So we're supplying a wide range of cpus: p3 for maximum flexibility and
migration, nehalem for performance and migration, host for maximum
performance, and qemu/kvm64 for custom-made configurations.






There are two main motivations behind the above approach:
1. Sound guest cpu definition.
Using a predefined model should automatically set all the relevant
vendor/stepping/cpuid flags/cache sizes/etc.
We just can let every management application deal with it. It breaks
guest OS/apps. For instance there are MSI support in windows guest
relay on the stepping.

2. Simplifying end user and mgmt tools.
qemu/kvm have the best knowledge about these low levels. If we push
it up in the stack, eventually it reaches the user. The end user,
not a 'qemu-devel user' which is actually far better from the
average user.

This means that such users will have to know what is popcount and
whether or not to limit migration on one host by adding sse4.2 or
not.

This is exactly what vmware are doing:
- Intel CPUs :
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991

- AMD CPUs :
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1992



They don't have to deal with different qemu and kvm versions.



Both our customers - the end users. It's not their problem.
IMO what's missing today is a safe and sound cpu emulation that is
simple and friendly to represent. qemu64,+popcount is not simple for the
end user. There is no reason to throw it at higher-level mgmt.



Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Dor Laor

On 01/07/2010 11:24 AM, Avi Kivity wrote:

On 01/07/2010 11:11 AM, Dor Laor wrote:

On 01/07/2010 10:18 AM, Avi Kivity wrote:

On 01/07/2010 10:03 AM, Dor Laor wrote:


We can debate about the exact name/model to represent the Nehalem
family, I don't have an issue with that and actually Intel and Amd
should define it.


AMD and Intel already defined their names (in cat /proc/cpuinfo). They
don't define families, the whole idea is to segment the market.


The idea here is to minimize the number of models we should have the
following range for Intel for example:
pentium3 - merom - penry - Nehalem - host - kvm/qemu64
So we're supplying wide range of cpus, p3 for maximum flexibility and
migration, nehalem for performance and migration, host for maximum
performance and qemu/kvm64 for custom maid.


There's no such thing as Nehalem.


Intel were ok with it. Again, you can name it corei7 or xeon34234234234,
I don't care, the principle remains the same.






This is exactly what vmware are doing:
- Intel CPUs :
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991


- AMD CPUs :
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1992




They don't have to deal with different qemu and kvm versions.



Both our customers - the end users. It's not their problem.
IMO what's missing today is a safe and sound cpu emulation that is
simply and friendly to represent. qemu64,+popcount is not simple for
the end user. There is no reason to through it on higher level mgmt.


There's no simple solution except to restrict features to what was
available on the first processors.


What's not simple about the above 4 options?
What's a better alternative (one that ensures users understand it and use
it, and that guest MSI and even the Skype application are happy with it)?





Re: WinXP virtual crashes on 0.12.1.2 but not 0.12.1.1

2010-01-07 Thread Mark Cave-Ayland

Avi Kivity wrote:

I think the issue is probably similar to that explained in the article 
above; with a new processor reported to the guest, the internal 
processor driver tries to upload some kind of microcode to the new 
device which fails and causes the guest to fall over. Can we teach KVM 
to silently discard these kinds of updates?




Can you try loading kvm.ko with the ignore_msrs module parameter set?


Hi Avi,

I've just done a quick test re-enabling processor.sys on my WinXP guest 
and then did the following:


virsh stop winxp
rmmod kvm_intel
rmmod kvm
modprobe kvm ignore_msrs=1
modprobe kvm_intel
virsh start winxp

Unfortunately it still crashes with the same 
DRIVER_UNLOADED_WITHOUT_CANCELING_PENDING_OPERATIONS BSOD :(
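
When repeating a test like this, it can help to confirm that the module parameter really took effect after the reload. The sketch below assumes the parameter is exported through sysfs at /sys/module/kvm/parameters/ignore_msrs and that boolean parameters read back as "Y" or "N". Both are assumptions about the running kernel, and the helper names are invented for this example.

```python
def parse_bool_param(text):
    """Interpret the text of a boolean module parameter file."""
    return text.strip() in ("Y", "y", "1")


def ignore_msrs_enabled(path="/sys/module/kvm/parameters/ignore_msrs"):
    """Return True/False for the parameter, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return parse_bool_param(handle.read())
    except OSError:
        # kvm not loaded, or the parameter is not exported on this kernel.
        return None


if __name__ == "__main__":
    print("ignore_msrs:", ignore_msrs_enabled())
```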



HTH,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs


Re: WinXP virtual crashes on 0.12.1.2 but not 0.12.1.1

2010-01-07 Thread Mark Cave-Ayland

Avi Kivity wrote:

I've just re-created the KVM image fresh from the VDI image once again 
and can confirm that disabling just the Processor driver is enough to 
allow the guest WinXP VM to function in qemu-kvm-0.12.1.2. Perhaps the 
default for -cpu host should not be changed in a micro release as 
there is a risk of breaking existing VMs?




That was actually a fix for a regression relative to 0.11.


Really? Damn :(  Any pointers towards the relevant bug in the bug tracker?


ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs


Re: [PATCH 0/2] eventfd: new EFD_STATE flag

2010-01-07 Thread Michael S. Tsirkin
On Wed, Jan 06, 2010 at 11:25:40PM -0800, Davide Libenzi wrote:
 On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:
 
  OK. What I think we need is a way to remove ourselves from the eventfd
  wait queue and clear the counter atomically.
  
  We currently do
  remove_wait_queue(irqfd->wqh, &irqfd->wait);
  where wqh saves the eventfd wait queue head.
 
 You do a remove_wait_queue() from inside a callback wakeup on the same 
 wait queue head?
 

No, not from callback, in ioctl context.

  If we do this before proposed eventfd_read_ctx, we can lose events.
  If we do this after, we can get spurious events.
  
  An unlocked read is one way to fix this.
 
 You posted one line of code and a two lines analysis of the issue. Can you 
 be a little bit more verbose and show me more code, so that I can actually 
 see what is going on?
 
 
 - Davide


Sure, I was trying to be as brief as possible, here's a detailed summary.

Description of the system (MSI emulation in KVM):

KVM supports an ioctl to assign/deassign an eventfd file to an interrupt
message in the guest OS.  When this eventfd is signalled, the interrupt
message is sent.  This assignment is done from the qemu system emulator.

The eventfd is signalled from device emulation in another thread in
userspace or from the kernel, which talks with the guest OS through another
eventfd and shared memory (the possibility of out-of-process emulation was
discussed but has not been implemented yet).

Note: it's okay to delay messages from a correctness point of view, but
generally this is a latency-sensitive path. If multiple identical messages
are requested, it's okay to send a single last message, but missing a
message altogether causes deadlocks.  Sending a message when none was
requested might in theory cause crashes; in practice doing this causes
performance degradation.

Another KVM feature is interrupt masking: the guest OS requests that we
stop sending some interrupt message, possibly modifies the mapping,
and re-enables the message. This needs to be done without
involving the device, which might keep requesting events:
while masked, the message is marked pending, and the guest might test
the pending status.

We can implement masking in the system emulator in userspace by using the
assign/deassign ioctls: when a message is masked, we simply deassign all
eventfds, and when it is unmasked, we assign them back.

Here's some code to illustrate how this all works: assign/deassign code
in kernel looks like the following:


this is called to unmask interrupt

static int
kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
{
	struct _irqfd *irqfd, *tmp;
	struct file *file = NULL;
	struct eventfd_ctx *eventfd = NULL;
	int ret;
	unsigned int events;

	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);

	...

	file = eventfd_fget(fd);
	if (IS_ERR(file)) {
		ret = PTR_ERR(file);
		goto fail;
	}

	eventfd = eventfd_ctx_fileget(file);
	if (IS_ERR(eventfd)) {
		ret = PTR_ERR(eventfd);
		goto fail;
	}

	irqfd->eventfd = eventfd;

	/*
	 * Install our own custom wake-up handling so we are notified via
	 * a callback whenever someone signals the underlying eventfd
	 */
	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);

	spin_lock_irq(&kvm->irqfds.lock);

	events = file->f_op->poll(file, &irqfd->pt);

	list_add_tail(&irqfd->list, &kvm->irqfds.items);
	spin_unlock_irq(&kvm->irqfds.lock);

A.
	/*
	 * Check if there was an event already pending on the eventfd
	 * before we registered, and trigger it as if we didn't miss it.
	 */
	if (events & POLLIN)
		schedule_work(&irqfd->inject);

	/*
	 * do not drop the file until the irqfd is fully initialized, otherwise
	 * we might race against the POLLHUP
	 */
	fput(file);

	return 0;

fail:
	...
}

This is called to mask interrupt

/*
 * shutdown any irqfd's that match fd+gsi
 */
static int
kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
{
	struct _irqfd *irqfd, *tmp;
	struct eventfd_ctx *eventfd;

	eventfd = eventfd_ctx_fdget(fd);
	if (IS_ERR(eventfd))
		return PTR_ERR(eventfd);

	spin_lock_irq(&kvm->irqfds.lock);

	list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
		if (irqfd->eventfd == eventfd && irqfd->gsi == gsi)
			irqfd_deactivate(irqfd);
	}

	spin_unlock_irq(&kvm->irqfds.lock);
	eventfd_ctx_put(eventfd);

	/*
	 * Block until we know all outstanding shutdown jobs have completed
	 * so that we guarantee there will not be any more interrupts on this
	 * gsi once this deassign function returns.
	 */
	flush_workqueue(irqfd_cleanup_wq);

	return 0;
}
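
To make the intended mask/unmask behaviour easier to follow, here is a deliberately simplified, single-threaded toy model. It is not the kernel code: ToyEventFd and ToyIrqfd are invented names, there is no locking or concurrency (so the race under discussion cannot occur here), and consuming the counter on delivery is an assumption of the toy rather than something the functions above do. It only shows the desired outcome: signals that arrive while masked stay pending and yield one message on unmask.

```python
class ToyEventFd:
    """A counter plus an optional wake-up callback, loosely like an eventfd."""

    def __init__(self):
        self.count = 0
        self.waiter = None

    def signal(self):
        self.count += 1
        if self.waiter is not None:
            self.waiter(self)

    def read_and_clear(self):
        value, self.count = self.count, 0
        return value


class ToyIrqfd:
    """Toy stand-in for an irqfd: counts interrupt messages sent to the guest."""

    def __init__(self, eventfd):
        self.eventfd = eventfd
        self.injected = 0

    def _wakeup(self, eventfd):
        # Several identical requests may be coalesced into one message.
        if eventfd.read_and_clear():
            self.injected += 1

    def assign(self):
        """Unmask: register, then deliver anything already pending."""
        self.eventfd.waiter = self._wakeup
        if self.eventfd.count:
            self._wakeup(self.eventfd)

    def deassign(self):
        """Mask: stop delivering; later signals only raise the counter."""
        self.eventfd.waiter = None

    def pending(self):
        return self.eventfd.count > 0


if __name__ == "__main__":
    efd = ToyEventFd()
    irq = ToyIrqfd(efd)
    irq.assign()
    efd.signal()
    irq.deassign()
    efd.signal()
    efd.signal()
    print("pending while masked:", irq.pending())
    irq.assign()
    print("messages injected:", irq.injected)
```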


And deactivation deep down does this (from 

Very bad Speed with Virtio-net

2010-01-07 Thread Benjamin Schweikert

Hello everybody,
this is my first post on a mailing list, so I hope everything works fine.

My host is an AMD X2 4850e with a 64-bit Gentoo (unstable). I have tested
qemu-kvm 0.11, 0.12.x and the git version from 6 Jan.
I created my own bridges, so I don't need the option from libvirt. I
bridged a 1 Gb LAN card for my VMs. When I use the virtio net driver,
I get about 200-300 Mbit from my desktop to one of my VMs. If
I use the e1000 driver instead of virtio I get about
500-600 Mbit.
I tested this with the following kernels:
Host: 2.6.31.6, 2.6.32.1, 2.6.32.2
Guests: 2.6.26, 2.6.30, 2.6.32 (debian)
2.6.32 (gentoo)

Here is a default result, virtio vs. e1000:

iperf -c 192.168.0.3 -w 512k -l 512k

Client connecting to 192.168.0.3, TCP port 5001
TCP window size:   256 KByte (WARNING: requested   512 KByte)

[  3] local 192.168.0.2 port 52968 connected with 192.168.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   438 MBytes   267 Mbits/sec


iperf -c 192.168.0.3 -w 512k -l 512k

Client connecting to 192.168.0.3, TCP port 5001
TCP window size:   256 KByte (WARNING: requested   512 KByte)

[  3] local 192.168.0.2 port 52995 connected with 192.168.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   602 MBytes   505 Mbits/sec

Any ideas what this could be? I attach a dmesg output of my host.
Thx.

Ben
Linux version 2.6.32-gentoo-r1 (r...@tux) (gcc version 4.4.2 (Gentoo 4.4.2 
p1.0) ) #2 SMP Wed Jan 6 12:04:57 CET 2010
Command line: root=/dev/sda2
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009f400 (usable)
 BIOS-e820: 0009f400 - 000a (reserved)
 BIOS-e820: 000e6000 - 0010 (reserved)
 BIOS-e820: 0010 - cfeb (usable)
 BIOS-e820: cfeb - cfebe000 (ACPI data)
 BIOS-e820: cfebe000 - cfee (ACPI NVS)
 BIOS-e820: cfee - cfeee000 (reserved)
 BIOS-e820: cfef - cff0 (reserved)
 BIOS-e820: ff70 - 0001 (reserved)
 BIOS-e820: 0001 - 0001a000 (usable)
DMI present.
AMI BIOS detected: BIOS may corrupt low RAM, working around it.
e820 update range:  - 0001 (usable) == (reserved)
last_pfn = 0x1a max_arch_pfn = 0x4
MTRR default type: uncachable
MTRR fixed ranges enabled:
  0-9 write-back
  A-E uncachable
  F-F write-protect
MTRR variable ranges enabled:
  0 base 00 mask FF8000 write-back
  1 base 008000 mask FFC000 write-back
  2 base 00C000 mask FFF000 write-back
  3 disabled
  4 disabled
  5 disabled
  6 disabled
  7 disabled
TOM2: 0001b000 aka 6912M
e820 update range: d000 - 0001 (usable) == (reserved)
last_pfn = 0xcfeb0 max_arch_pfn = 0x4
initial memory mapped : 0 - 2000
init_memory_mapping: -cfeb
 00 - 00cfe0 page 2M
 00cfe0 - 00cfeb page 4k
kernel direct mapping tables up to cfeb @ 1-16000
init_memory_mapping: 0001-0001a000
 01 - 01a000 page 2M
kernel direct mapping tables up to 1a000 @ 14000-1c000
ACPI: RSDP 000f9e40 00014 (v00 ACPIAM)
ACPI: RSDT cfeb 0003C (v01 110608 RSDT1133 20081106 MSFT 0097)
ACPI: FACP cfeb0200 00084 (v02 110608 FACP1133 20081106 MSFT 0097)
ACPI: DSDT cfeb0440 04D44 (v01  1 1000  INTL 20051117)
ACPI: FACS cfebe000 00040
ACPI: APIC cfeb0390 0006C (v01 110608 APIC1133 20081106 MSFT 0097)
ACPI: MCFG cfeb0400 0003C (v01 110608 OEMMCFG  20081106 MSFT 0097)
ACPI: OEMB cfebe040 00071 (v01 110608 OEMB1133 20081106 MSFT 0097)
ACPI: HPET cfeb5190 00038 (v01 110608 OEMHPET  20081106 MSFT 0097)
ACPI: SSDT cfeb51d0 0028A (v01 A M I  POWERNOW 0001 AMD  0001)
ACPI: Local APIC address 0xfee0
(7 early reservations) == bootmem [00 - 01a000]
  #0 [00 - 001000]   BIOS data page == [00 - 001000]
  #1 [006000 - 008000]   TRAMPOLINE == [006000 - 008000]
  #2 [000100 - 0001a7ca84]TEXT DATA BSS == [000100 - 0001a7ca84]
  #3 [09f400 - 10]BIOS reserved == [09f400 - 10]
  #4 [0001a7d000 - 0001a7d0f1]  BRK == [0001a7d000 - 0001a7d0f1]
  #5 [01 - 014000]  PGTABLE == [01 - 014000]
  #6 [014000 - 017000]  PGTABLE == [014000 - 017000]
found SMP MP-table at [880ff780] ff780
 

Re: Very bad Speed with Virtio-net

2010-01-07 Thread Riccardo Veraldi

I have similar results, like yours, using CentOS 5.4 x86_64
I do not think it is possible to gain more than this right now... or 
better I wish it could be possible


If you can get better result please let me know

Rick

Benjamin Schweikert wrote:

Hello everybody,
this is my first post on a mailing list, so i hope everything works fine.

My host is a AMD X2 4850e with a 64bit Gentoo (unstable). I have 
tested qemu-kvm 0.11, 0.12.x and the git version from the 6. jan.
I created my own bridges, so i dont need the option from libvirt. I 
bridged a 1 Gb lan card for my VMs. When I use the virtio net driver,
i get something about 200-300 mbit form my desktop to one if my VMs. 
If iI use the e1000 driver instead of the virtio I get about

500 - 600 mbit.
I tested this with the following kernels:
Host: 2.6.31.6, 2.6.32.1, 2.6.32.2
Guests: 2.6.26, 2.6.30, 2.6.32 (debian)
2.6.32 (gentoo)

Here is a default result, virtio vs. e1000:

iperf -c 192.168.0.3 -w 512k -l 512k

Client connecting to 192.168.0.3, TCP port 5001
TCP window size:   256 KByte (WARNING: requested   512 KByte)

[  3] local 192.168.0.2 port 52968 connected with 192.168.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   438 MBytes   267 Mbits/sec


iperf -c 192.168.0.3 -w 512k -l 512k

Client connecting to 192.168.0.3, TCP port 5001
TCP window size:   256 KByte (WARNING: requested   512 KByte)

[  3] local 192.168.0.2 port 52995 connected with 192.168.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   602 MBytes   505 Mbits/sec

Any ideas what this could be? I attach a dmesg output of my host.
Thx.

Ben




Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Anthony Liguori

On 01/07/2010 03:40 AM, Dor Laor wrote:

There's no simple solution except to restrict features to what was
available on the first processors.


What's not simple about the above 4 options?
What's a better alternative (that insures users understand it and use 
it and guest msi and even skype application is happy about it)?


Even if you have -cpu Nehalem, different versions of the KVM kernel 
module may additionally filter cpuid flags.


So if you had a 2.6.18 kernel and a 2.6.33 kernel, it may be necessary 
to say:


(2.6.33) qemu -cpu Nehalem,-syscall
(2.6.18) qemu -cpu Nehalem

In order to be compatible.
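
One way to picture how such a command line could be derived is sketched below. It assumes a tool already knows which flags a model implies and which flags each host/kernel combination actually exposes. The flag lists in the example are invented for illustration and do not describe what any particular kernel filters, and compatible_cpu_arg is a hypothetical helper, not a qemu or libvirt function.

```python
def compatible_cpu_arg(model, model_flags, host_flag_sets):
    """Build a '-cpu' value that drops model flags missing on any host."""
    usable = set(model_flags)
    for flags in host_flag_sets:
        usable &= set(flags)
    # Keep the model's own ordering so the output is stable.
    dropped = [flag for flag in model_flags if flag not in usable]
    return ",".join([model] + ["-" + flag for flag in dropped])


if __name__ == "__main__":
    # Invented example data: the second host does not expose 'syscall'.
    model_flags = ["sse2", "syscall", "popcnt"]
    hosts = [{"sse2", "syscall", "popcnt"}, {"sse2", "popcnt"}]
    print("qemu -cpu", compatible_cpu_arg("Nehalem", model_flags, hosts))
```

The sketch emits one argument for the whole pool; whether every host should get the same string or a per-kernel one, as in the example above, is a policy choice it does not settle.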

Regards,

Anthony Liguori



Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Dor Laor

On 01/07/2010 01:39 PM, Anthony Liguori wrote:

On 01/07/2010 03:40 AM, Dor Laor wrote:

There's no simple solution except to restrict features to what was
available on the first processors.


What's not simple about the above 4 options?
What's a better alternative (one that ensures users understand it and use
it, and that guest MSI and even the Skype application are happy with it)?


Even if you have -cpu Nehalem, different versions of the KVM kernel
module may additionally filter cpuid flags.

So if you had a 2.6.18 kernel and a 2.6.33 kernel, it may be necessary
to say:

(2.6.33) qemu -cpu Nehalem,-syscall
(2.6.18) qemu -cpu Nehalem


Or let qemu do it automatically for you.



In order to be compatible.

Regards,

Anthony Liguori





Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Avi Kivity

On 01/07/2010 11:40 AM, Dor Laor wrote:

There's no such thing as Nehalem.



Intel were ok with it. Again, you can name it corei7 or 
xeon34234234234; I don't care, the principle remains the same.




There are several processors belonging to the Nehalem family and each 
has different features.




What's not simple about the above 4 options?


If a qemu/kvm/processor combo doesn't support a feature (say, nx) we 
have to remove it from the migration pool even if the Nehalem processor 
class says it's included.  Or else not admit that combination into the 
migration pool in the first place.


What's a better alternative (one that ensures users understand it and use 
it, and that guest MSI and even the Skype application are happy with it)?




Have management scan new nodes and classify them.
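
One way to picture "scan and classify" is as a mask intersection: management collects the feature mask each node can actually expose (after qemu and kvm filtering) and keeps only the bits common to all of them. The sketch below is illustrative only; the FEAT_* bits and pool_common_features() are hypothetical names, not part of qemu, kvm, or libvirt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; real CPUID leaves carry many more. */
#define FEAT_NX      (1u << 0)
#define FEAT_SYSCALL (1u << 1)
#define FEAT_SSE4_2  (1u << 2)

/*
 * Intersect the feature masks reported by every node in a pool.
 * The result is the largest feature set that every node can expose,
 * so a guest started with it could run on any node in the pool.
 * An empty pool yields an empty feature set.
 */
static uint32_t pool_common_features(const uint32_t *node_masks, size_t n)
{
    uint32_t common = ~0u;
    size_t i;

    for (i = 0; i < n; i++)
        common &= node_masks[i];
    return n ? common : 0;
}
```

A node whose kernel filters out syscall simply clears that bit for the whole pool, which is the same effect as passing ,-syscall on every host.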

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Avi Kivity

On 01/07/2010 01:44 PM, Dor Laor wrote:

So if you had a 2.6.18 kernel and a 2.6.33 kernel, it may be necessary
to say:

(2.6.33) qemu -cpu Nehalem,-syscall
(2.6.18) qemu -cpu Nehalem



Or let qemu do it automatically for you.


qemu on 2.6.33 doesn't know that you're running qemu on 2.6.18 on 
another node.


--
error compiling committee.c: too many arguments to function



[PATCH 0/3] Lazy fpu for svm/npt

2010-01-07 Thread Avi Kivity
This patchset (on top of the previous cr0 patchset) brings lazy fpu to npt.
For the cases where guest and host cr0 match (the majority) it will disable
intercepts for cr0.ts once the guest fpu is loaded, so the guest can do its
own lazy fpu without trapping.

Avi Kivity (3):
  KVM: SVM: Fix SVM_CR0_SELECTIVE_MASK
  KVM: SVM: Initialize fpu_active in init_vmcb()
  KVM: SVM: Lazy fpu for npt

 arch/x86/include/asm/svm.h |2 +-
 arch/x86/kvm/svm.c |   73 +--
 2 files changed, 37 insertions(+), 38 deletions(-)



[PATCH 1/3] KVM: SVM: Fix SVM_CR0_SELECTIVE_MASK

2010-01-07 Thread Avi Kivity
Instead of selecting TS and MP as the comments say, the macro included TS and
PE.  Luckily the macro is unused now, but fix in order to save a few hours of
debugging from anyone who attempts to use it.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/include/asm/svm.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 1fecb7e..38638cd 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -313,7 +313,7 @@ struct __attribute__ ((__packed__)) vmcb {
 
 #define SVM_EXIT_ERR   -1
 
-#define SVM_CR0_SELECTIVE_MASK (1 << 3 | 1) /* TS and MP */
+#define SVM_CR0_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)
 
 #define SVM_VMLOAD .byte 0x0f, 0x01, 0xda
 #define SVM_VMRUN  .byte 0x0f, 0x01, 0xd8
-- 
1.6.5.3
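
To see why the old expansion was wrong, it helps to write the bits out. The sketch below uses the architectural CR0 bit positions (PE is bit 0, MP is bit 1, TS is bit 3); OLD_SELECTIVE_MASK, NEW_SELECTIVE_MASK and the two helper functions are names made up for this illustration.

```c
#include <assert.h>

/* Architectural CR0 bit positions. */
#define X86_CR0_PE (1UL << 0) /* Protection Enable */
#define X86_CR0_MP (1UL << 1) /* Monitor Coprocessor */
#define X86_CR0_TS (1UL << 3) /* Task Switched */

/* What the macro used to expand to: bit 3 and bit 0, i.e. TS and PE. */
#define OLD_SELECTIVE_MASK (1 << 3 | 1)
/* What the comment promised: TS and MP. */
#define NEW_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)

/* Returns nonzero if the old mask wrongly included PE. */
static int old_mask_selects_pe(void)
{
    return (OLD_SELECTIVE_MASK & X86_CR0_PE) != 0;
}

/* Returns nonzero if the old mask included MP (it did not). */
static int old_mask_selects_mp(void)
{
    return (OLD_SELECTIVE_MASK & X86_CR0_MP) != 0;
}
```

The old value is 0x9 and the new one is 0xa, so the two masks differ in exactly the PE and MP bits.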



[PATCH 2/3] KVM: SVM: Initialize fpu_active in init_vmcb()

2010-01-07 Thread Avi Kivity
init_vmcb() sets up the intercepts as if the fpu is active, so initialize it
there.  This avoids an INIT from setting up intercepts inconsistent with
fpu_active.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/svm.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2a3890f..f4418e2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -540,6 +540,8 @@ static void init_vmcb(struct vcpu_svm *svm)
         struct vmcb_control_area *control = &svm->vmcb->control;
         struct vmcb_save_area *save = &svm->vmcb->save;
 
+        svm->vcpu.fpu_active = 1;
+
         control->intercept_cr_read =    INTERCEPT_CR0_MASK |
                                         INTERCEPT_CR3_MASK |
                                         INTERCEPT_CR4_MASK;
@@ -730,7 +732,6 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
         init_vmcb(svm);
 
         fx_init(&svm->vcpu);
-        svm->vcpu.fpu_active = 1;
         svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
         if (kvm_vcpu_is_bsp(&svm->vcpu))
                 svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
-- 
1.6.5.3



[PATCH 3/3] KVM: SVM: Lazy fpu for npt

2010-01-07 Thread Avi Kivity
If two conditions apply:
 - no bits outside TS and EM differ between the host and guest cr0
 - the fpu is active

then we can activate the selective cr0 write intercept and drop the
unconditional cr0 read and write intercept, and allow the guest to run
with the host fpu state.  This reduces the heavyweight context switch
when npt is enabled.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/svm.c |   70 +--
 1 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f4418e2..7f3d890 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -571,6 +571,7 @@ static void init_vmcb(struct vcpu_svm *svm)
         control->intercept =    (1ULL << INTERCEPT_INTR) |
                                 (1ULL << INTERCEPT_NMI) |
                                 (1ULL << INTERCEPT_SMI) |
+                                (1ULL << INTERCEPT_SELECTIVE_CR0) |
                                 (1ULL << INTERCEPT_CPUID) |
                                 (1ULL << INTERCEPT_INVD) |
                                 (1ULL << INTERCEPT_HLT) |
@@ -643,10 +644,8 @@ static void init_vmcb(struct vcpu_svm *svm)
                 control->intercept &= ~((1ULL << INTERCEPT_TASK_SWITCH) |
                                         (1ULL << INTERCEPT_INVLPG));
                 control->intercept_exceptions &= ~(1 << PF_VECTOR);
-                control->intercept_cr_read &= ~(INTERCEPT_CR0_MASK|
-                                                INTERCEPT_CR3_MASK);
-                control->intercept_cr_write &= ~(INTERCEPT_CR0_MASK|
-                                                 INTERCEPT_CR3_MASK);
+                control->intercept_cr_read &= ~INTERCEPT_CR3_MASK;
+                control->intercept_cr_write &= ~INTERCEPT_CR3_MASK;
                 save->g_pat = 0x0007040600070406ULL;
                 save->cr3 = 0;
                 save->cr4 = 0;
@@ -965,6 +964,27 @@ static void svm_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
 {
 }
 
+static void update_cr0_intercept(struct vcpu_svm *svm)
+{
+        ulong gcr0 = svm->vcpu.arch.cr0;
+        u64 *hcr0 = &svm->vmcb->save.cr0;
+
+        if (!svm->vcpu.fpu_active)
+                *hcr0 |= SVM_CR0_SELECTIVE_MASK;
+        else
+                *hcr0 = (*hcr0 & ~SVM_CR0_SELECTIVE_MASK)
+                        | (gcr0 & SVM_CR0_SELECTIVE_MASK);
+
+
+        if (gcr0 == *hcr0 && svm->vcpu.fpu_active) {
+                svm->vmcb->control.intercept_cr_read &= ~INTERCEPT_CR0_MASK;
+                svm->vmcb->control.intercept_cr_write &= ~INTERCEPT_CR0_MASK;
+        } else {
+                svm->vmcb->control.intercept_cr_read |= INTERCEPT_CR0_MASK;
+                svm->vmcb->control.intercept_cr_write |= INTERCEPT_CR0_MASK;
+        }
+}
+
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
struct vcpu_svm *svm = to_svm(vcpu);
@@ -982,12 +1002,11 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
                 }
         }
 #endif
-        if (npt_enabled)
-                goto set;
-
         vcpu->arch.cr0 = cr0;
-        cr0 |= X86_CR0_PG | X86_CR0_WP;
-set:
+
+        if (!npt_enabled)
+                cr0 |= X86_CR0_PG | X86_CR0_WP;
+
         /*
          * re-enable caching here because the QEMU bios
          * does not do it - this results in some delay at
@@ -995,6 +1014,7 @@ set:
          */
         cr0 &= ~(X86_CR0_CD | X86_CR0_NW);
         svm->vmcb->save.cr0 = cr0;
+        update_cr0_intercept(svm);
 }
 
 static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
@@ -1240,11 +1260,8 @@ static int ud_interception(struct vcpu_svm *svm)
 static int nm_interception(struct vcpu_svm *svm)
 {
         svm->vmcb->control.intercept_exceptions &= ~(1 << NM_VECTOR);
-        if (!kvm_read_cr0_bits(&svm->vcpu, X86_CR0_TS))
-                svm->vmcb->save.cr0 &= ~X86_CR0_TS;
-        else
-                svm->vmcb->save.cr0 |= X86_CR0_TS;
         svm->vcpu.fpu_active = 1;
+        update_cr0_intercept(svm);
 
         return 1;
 }
@@ -2297,7 +2314,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
         [SVM_EXIT_READ_CR3] = emulate_on_interception,
         [SVM_EXIT_READ_CR4] = emulate_on_interception,
         [SVM_EXIT_READ_CR8] = emulate_on_interception,
-        /* for now: */
+        [SVM_EXIT_CR0_SEL_WRITE]= emulate_on_interception,
         [SVM_EXIT_WRITE_CR0]= emulate_on_interception,
         [SVM_EXIT_WRITE_CR3]= emulate_on_interception,
         [SVM_EXIT_WRITE_CR4]= emulate_on_interception,
@@ -2383,21 +2400,10 @@ static int handle_exit(struct kvm_vcpu *vcpu)
 
         svm_complete_interrupts(svm);
 
-        if (npt_enabled) {
-                int mmu_reload = 0;
-                if ((kvm_read_cr0_bits(vcpu, X86_CR0_PG) ^ svm->vmcb->save.cr0)
-                    & X86_CR0_PG) {
-                        svm_set_cr0(vcpu, svm->vmcb->save.cr0);
-                        mmu_reload = 1;
-  
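
The decision made by update_cr0_intercept() can be followed outside the kernel with a small stand-alone model. This is only a sketch of the logic in the patch: struct cr0_state and its field names are hypothetical stand-ins for the vcpu and vmcb fields, and the two intercept bitmaps are collapsed into a single flag.

```c
#include <assert.h>
#include <stdint.h>

#define X86_CR0_PE (1ULL << 0)
#define X86_CR0_MP (1ULL << 1)
#define X86_CR0_TS (1ULL << 3)
#define X86_CR0_PG (1ULL << 31)
#define SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)

/* Hypothetical stand-in for the vcpu/vmcb fields the patch touches. */
struct cr0_state {
    uint64_t guest_cr0;  /* value the guest believes is in cr0 */
    uint64_t hw_cr0;     /* value actually in effect while the guest runs */
    int fpu_active;      /* nonzero once the guest fpu state is loaded */
    int intercept_cr0;   /* nonzero: intercept every cr0 read and write */
};

static void update_cr0_intercept(struct cr0_state *s)
{
    if (!s->fpu_active)
        /* Set TS (and MP) so the guest's next fpu instruction traps. */
        s->hw_cr0 |= SELECTIVE_MASK;
    else
        /* Let the guest's own TS/MP values reach the hardware. */
        s->hw_cr0 = (s->hw_cr0 & ~SELECTIVE_MASK)
                    | (s->guest_cr0 & SELECTIVE_MASK);

    /* Drop the unconditional intercept only when both views agree. */
    s->intercept_cr0 = !(s->guest_cr0 == s->hw_cr0 && s->fpu_active);
}
```

With the fpu active and the two cr0 values differing only in TS, the function makes them equal and clears the unconditional intercept; with the fpu inactive it forces TS and MP on and keeps the intercept.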

Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Dor Laor

On 01/07/2010 02:00 PM, Avi Kivity wrote:

On 01/07/2010 01:44 PM, Dor Laor wrote:

So if you had a 2.6.18 kernel and a 2.6.33 kernel, it may be necessary
to say:

(2.6.33) qemu -cpu Nehalem,-syscall
(2.6.18) qemu -cpu Nehalem



Or let qemu do it automatically for you.


qemu on 2.6.33 doesn't know that you're running qemu on 2.6.18 on
another node.



We can live with it: either have qemu infer the kernel version from 
another existing feature, or query uname.


Alternatively, the matching libvirt package can be the one adding or 
removing it in the right distribution.
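
For the uname route, a minimal sketch using the standard uname(2) call might look like the following. parse_kernel_release() and running_kernel_version() are hypothetical helper names, not qemu functions. Note that this only reports the local kernel, so it does not by itself answer the point that qemu on one node cannot see the kernel on another, and a distribution kernel with backports may not match its version number.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Parse "major.minor.patch" from a release string such as "2.6.33-rc1".
 * Returns 0 on success, -1 if the string does not start with three
 * dot-separated numbers.
 */
static int parse_kernel_release(const char *release,
                                int *major, int *minor, int *patch)
{
    return sscanf(release, "%d.%d.%d", major, minor, patch) == 3 ? 0 : -1;
}

/* Query the running kernel via uname(2). Returns 0 on success. */
static int running_kernel_version(int *major, int *minor, int *patch)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return parse_kernel_release(u.release, major, minor, patch);
}
```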



Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Anthony Liguori

On 01/07/2010 06:20 AM, Dor Laor wrote:

On 01/07/2010 02:00 PM, Avi Kivity wrote:

On 01/07/2010 01:44 PM, Dor Laor wrote:

So if you had a 2.6.18 kernel and a 2.6.33 kernel, it may be necessary
to say:

(2.6.33) qemu -cpu Nehalem,-syscall
(2.6.18) qemu -cpu Nehalem



Or let qemu do it automatically for you.


qemu on 2.6.33 doesn't know that you're running qemu on 2.6.18 on
another node.



We can live with it, either have qemu realize the kernel version out 
of another existing feature or query uname.


Alternatively, the matching libvirt package can be the one adding or 
removing it in the right distribution.


There's another option.

Make cpuid information part of live migration protocol, and then support 
something like -cpu Xeon-3550.  We would remember the exact cpuid mask 
we present to the guest and then we could validate that we can obtain 
the same mask on the destination.
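
The validation step on the destination amounts to a subset test: every bit that was presented to the guest on the source must also be obtainable on the destination. A minimal sketch, assuming the cpuid feature set is flattened into a single 32-bit mask (real cpuid state spans several leaves and registers), with cpuid_mask_compatible() as a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if every feature bit presented to the guest on the source
 * is also obtainable on the destination, 0 otherwise.
 */
static int cpuid_mask_compatible(uint32_t source_mask, uint32_t dest_supported)
{
    return (source_mask & ~dest_supported) == 0;
}
```

The destination may support more than the source presented; it only has to be able to reproduce the same mask, so extra bits on the destination are not an error.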


Regards,

Anthony Liguori






Re: WinXP virtual crashes on 0.12.1.2 but not 0.12.1.1

2010-01-07 Thread Avi Kivity

On 01/07/2010 11:57 AM, Mark Cave-Ayland wrote:

Avi Kivity wrote:

I've just re-created the KVM image fresh from the VDI image once 
again and can confirm that disabling just the Processor driver is 
enough to allow the guest WinXP VM to function in qemu-kvm-0.12.1.2. 
Perhaps the default for -cpu host should not be changed in a micro 
release as there is a risk of breaking existing VMs?




That was actually a fix for a regression relative to 0.11.


Really? Damn :(  Any pointers towards the relevant bug in the bug 
tracker?





No, it was reported on list.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] make help output be a little more self-consistent

2010-01-07 Thread Marcelo Tosatti
Bruce,

Can you please send two patches, one for qemu upstream
(qemu-de...@nongnu.org), and another for qemu-kvm (relative to qemu-kvm
specific options).

Thanks.

On Wed, Jan 06, 2010 at 12:31:20PM -0700, Bruce Rogers wrote:
 Signed-off-by: Bruce Rogers brog...@novell.com
 ---
  qemu-options.hx |   58 --
  1 files changed, 30 insertions(+), 28 deletions(-)
 
 diff --git a/qemu-options.hx b/qemu-options.hx
 index 812d067..fdd5884 100644
 --- a/qemu-options.hx
 +++ b/qemu-options.hx
 @@ -42,7 +42,7 @@ DEF(smp, HAS_ARG, QEMU_OPTION_smp,
  -smp 
 n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n
  set the number of CPUs to 'n' [default=1]\n
  maxcpus= maximum number of total cpus, including\n
 -  offline CPUs for hotplug etc.\n
 +offline CPUs for hotplug, etc\n
  cores= number of CPU cores on one socket\n
  threads= number of threads on one CPU core\n
  sockets= number of discrete sockets in the system\n)
 @@ -406,8 +406,9 @@ ETEXI
  DEF(device, HAS_ARG, QEMU_OPTION_device,
  -device driver[,options]  add device\n)
  DEF(name, HAS_ARG, QEMU_OPTION_name,
 --name string1[,process=string2]set the name of the guest\n
 -string1 sets the window title and string2 the process name 
 (on Linux)\n)
 +-name string1[,process=string2]\n
 +set the name of the guest\n
 +string1 sets the window title and string2 the process 
 name (on Linux)\n)
  STEXI
  @item -name @var{name}
  Sets the @var{name} of the guest.
 @@ -484,7 +485,7 @@ ETEXI
  



Re: [PATCH resend] Fix the explanation of write_emulated

2010-01-07 Thread Marcelo Tosatti
On Wed, Jan 06, 2010 at 05:55:23PM +0900, Takuya Yoshikawa wrote:
 The explanation of write_emulated is confused with
 that of read_emulated. This patch fix it.
 
 Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

Applied, thanks.



Re: [PATCH] configure: Correct KVM options in help output

2010-01-07 Thread Marcelo Tosatti
On Wed, Jan 06, 2010 at 10:23:54AM +0100, Pierre Riteau wrote:
 Signed-off-by: Pierre Riteau pierre.rit...@irisa.fr
 ---
  configure |8 
  1 files changed, 4 insertions(+), 4 deletions(-)

Applied, thanks.



Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Avi Kivity

On 01/07/2010 02:33 PM, Anthony Liguori wrote:


There's another option.

Make cpuid information part of live migration protocol, and then 
support something like -cpu Xeon-3550.  We would remember the exact 
cpuid mask we present to the guest and then we could validate that we 
can obtain the same mask on the destination.


Currently, our policy is to only migrate dynamic (from the guest's point 
of view) state, and specify static state on the command line [1].


I think your suggestion makes a lot of sense, but I'd like to expand it 
to move all guest state, whether dynamic or static.  So '-m 1G' would be 
migrated as well (but not -mem-path).  Similarly, in -drive 
file=...,if=ide,index=1, everything but file=... would be migrated.


This has an advantage wrt hotplug: since qemu is responsible for 
migrating all guest visible information, the migrator is no longer 
responsible for replaying hotplug events in the exact sequence they 
happened.


In short, I think we should apply your suggestion as broadly as possible.

[1] cpuid state is actually dynamic; repeated cpuid instruction 
execution with the same operands can return different results.  kvm 
supports querying and setting this state.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Daniel P. Berrange
On Thu, Jan 07, 2010 at 02:40:34PM +0200, Avi Kivity wrote:
 On 01/07/2010 02:33 PM, Anthony Liguori wrote:
 
 There's another option.
 
 Make cpuid information part of live migration protocol, and then 
 support something like -cpu Xeon-3550.  We would remember the exact 
 cpuid mask we present to the guest and then we could validate that we 
 can obtain the same mask on the destination.
 
 Currently, our policy is to only migrate dynamic (from the guest's point 
 of view) state, and specify static state on the command line [1].
 
 I think your suggestion makes a lot of sense, but I'd like to expand it 
 to move all guest state, whether dynamic or static.  So '-m 1G' would be 
 migrated as well (but not -mem-path).  Similarly, in -drive 
 file=...,if=ide,index=1, everything but file=... would be migrated.
 
 This has an advantage wrt hotplug: since qemu is responsible for 
 migrating all guest visible information, the migrator is no longer 
 responsible for replaying hotplug events in the exact sequence they 
 happened.

With the introduction of the new -device support, there's no need to
replay hotplug events in order any more. Instead just use static
PCI addresses when starting the guest, and the same addresses after
migration. You could argue that QEMU should preserve the addressing
automatically during migration, but apps need to do it manually
already to keep addresses stable across power-offs, so doing it manually
across migration too is no extra burden.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Avi Kivity

On 01/07/2010 02:47 PM, Daniel P. Berrange wrote:


With the introduction of the new -device support, there's no need to
replay hotplug events in order any more. Instead just use static
PCI addresses when starting the guest, and the same addresses after
migration. You could argue that QEMU should preserve the addressing
automatically during migration, but apps need to do it manually
already to keep addresses stable across power-offs, so doing it manually
across migration too is no extra burden.

   


That's true - shutdown and startup are an equivalent problem to live 
migration from that point of view.


--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Anthony Liguori

On 01/07/2010 06:40 AM, Avi Kivity wrote:

On 01/07/2010 02:33 PM, Anthony Liguori wrote:


There's another option.

Make cpuid information part of live migration protocol, and then 
support something like -cpu Xeon-3550.  We would remember the exact 
cpuid mask we present to the guest and then we could validate that we 
can obtain the same mask on the destination.


Currently, our policy is to only migrate dynamic (from the guest's 
point of view) state, and specify static state on the command line [1].


I think your suggestion makes a lot of sense, but I'd like to expand 
it to move all guest state, whether dynamic or static.  So '-m 1G' 
would be migrated as well (but not -mem-path).  Similarly, in -drive 
file=...,if=ide,index=1, everything but file=... would be migrated.


Yes, I agree with this and it should be in the form of an fdt.  This 
means we need full qdev conversion.


But I think cpuid is somewhere in the middle with respect to static vs. 
dynamic.  For instance, -cpu host is very dynamic in that you get very 
different results on different systems.  Likewise, because of kvm 
filtering, even -cpu qemu64 can be dynamic.


So if we didn't have filtering and -cpu host, I'd agree that it's 
totally static but I think in the current state, it's dynamic.


This has an advantage wrt hotplug: since qemu is responsible for 
migrating all guest visible information, the migrator is no longer 
responsible for replaying hotplug events in the exact sequence they 
happened.


Yup, 100% in agreement as a long term goal.


In short, I think we should apply your suggestion as broadly as possible.

[1] cpuid state is actually dynamic; repeated cpuid instruction 
execution with the same operands can return different results.  kvm 
supports querying and setting this state.


Yes, and we save some cpuid state in cpu.  We just don't save all of it.

Regards,

Anthony Liguori



Re: [Qemu-devel] cpuid problem in upstream qemu with kvm

2010-01-07 Thread Dor Laor

On 01/07/2010 03:14 PM, Anthony Liguori wrote:

On 01/07/2010 06:40 AM, Avi Kivity wrote:

On 01/07/2010 02:33 PM, Anthony Liguori wrote:


There's another option.

Make cpuid information part of live migration protocol, and then
support something like -cpu Xeon-3550. We would remember the exact
cpuid mask we present to the guest and then we could validate that we
can obtain the same mask on the destination.


That solves controlling the destination qemu all right, but it does not 
help with spawning the original guest in the first place: we still need 
to know whether ,-syscall is needed or not.


Anyway, I'm in favor of it too.



Currently, our policy is to only migrate dynamic (from the guest's
point of view) state, and specify static state on the command line [1].

I think your suggestion makes a lot of sense, but I'd like to expand
it to move all guest state, whether dynamic or static. So '-m 1G'
would be migrated as well (but not -mem-path). Similarly, in -drive
file=...,if=ide,index=1, everything but file=... would be migrated.


Yes, I agree with this and it should be in the form of an fdt. This
means we need full qdev conversion.

But I think cpuid is somewhere in the middle with respect to static vs.
dynamic. For instance, -cpu host is very dynamic in that you get very
different results on different systems. Likewise, because of kvm
filtering, even -cpu qemu64 can be dynamic.

So if we didn't have filtering and -cpu host, I'd agree that it's
totally static but I think in the current state, it's dynamic.


This has an advantage wrt hotplug: since qemu is responsible for
migrating all guest visible information, the migrator is no longer
responsible for replaying hotplug events in the exact sequence they
happened.


Yup, 100% in agreement as a long term goal.


In short, I think we should apply your suggestion as broadly as possible.

[1] cpuid state is actually dynamic; repeated cpuid instruction
execution with the same operands can return different results. kvm
supports querying and setting this state.


Yes, and we save some cpuid state in cpu. We just don't save all of it.

Regards,

Anthony Liguori





may be offtopic question

2010-01-07 Thread Vasiliy G Tolstov
Hello. I'm new to kvm, and I'm trying to run Exherbo Linux under it.
Kvm runs on 2.6.32 on Gentoo Linux.

The virtual machine runs slowly, and sometimes I see this error in syslog:

[ 1929.705897] BUG: MAX_LOCK_DEPTH too low!
[ 1929.705902] turning off the locking correctness validator.
[ 1929.705906] Pid: 6523, comm: vm1 Not tainted 2.6.32-gentoo-r1vase #1
[ 1929.705909] Call Trace:
[ 1929.705918]  [8109127e] __lock_acquire+0x8d/0x3f0
[ 1929.705924]  [81092856] lock_acquire+0xd7/0xfc
[ 1929.705930]  [810f46bf] ? mm_take_all_locks+0x92/0x105
[ 1929.705936]  [816365fa] _spin_lock_nest_lock+0x40/0x75
[ 1929.705940]  [810f46bf] ? mm_take_all_locks+0x92/0x105
[ 1929.705945]  [810f46bf] mm_take_all_locks+0x92/0x105
[ 1929.705949]  [810feda9] ? do_mmu_notifier_register+0x80/0x149
[ 1929.705954]  [810fedb1] do_mmu_notifier_register+0x88/0x149
[ 1929.705958]  [810fee8d] mmu_notifier_register+0xe/0x10
[ 1929.705964]  [8100c44c] kvm_dev_ioctl+0x138/0x2f7
[ 1929.705969]  [81143854] compat_sys_ioctl+0x1b5/0x40d
[ 1929.705975]  [8163585f] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1929.705980]  [8126a678] ? __up_read+0x1c/0x8c
[ 1929.705987]  [81056f7f] sysenter_dispatch+0x7/0x2e

qemu run as:

qemu-system-x86_64 -boot c -m 512 -sdl -net vde,name=tap0,vlan=0 -net
nic,vlan=0,macaddr=52:54:00:00:EE:03 -localtime -name vm1,process=vm1
-vga std -balloon virtio -enable-kvm -runas vase -enable-nesting
-cdrom /media/kvm/install-x86-minimal-20091103.iso -hda
~vase/exherbo-kvm-amd64-20091013.img




-- 
Vasiliy G Tolstov v.tols...@selfip.ru
Selfip.Ru



[PATCH] kvm: don't treat NULL parent_pte as multimapped in mmu_parent_walk()

2010-01-07 Thread Roel Kluin
If a kvm_mmu_page is not multimapped but parent_pte is NULL
don't treat it as multimapped and dereference it.

Signed-off-by: Roel Kluin roel.kl...@gmail.com
---
This wasn't tested and maybe I misunderstood so please review.

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4c3e5b2..eb17287 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1031,10 +1031,12 @@ static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
         struct kvm_mmu_page *parent_sp;
         int i;
 
-        if (!sp->multimapped && sp->parent_pte) {
-                parent_sp = page_header(__pa(sp->parent_pte));
-                fn(vcpu, parent_sp);
-                mmu_parent_walk(vcpu, parent_sp, fn);
+        if (!sp->multimapped) {
+                if (sp->parent_pte) {
+                        parent_sp = page_header(__pa(sp->parent_pte));
+                        fn(vcpu, parent_sp);
+                        mmu_parent_walk(vcpu, parent_sp, fn);
+                }
                 return;
         }
         hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)



Re: may be offtopic question

2010-01-07 Thread Avi Kivity

On 01/07/2010 05:00 PM, Vasiliy G Tolstov wrote:

Hello. I'm new to kvm, and I'm trying to run Exherbo Linux under it.
Kvm runs on 2.6.32 on Gentoo Linux.

The virtual machine runs slowly, and sometimes I see this error in syslog:

[ 1929.705897] BUG: MAX_LOCK_DEPTH too low!
[ 1929.705902] turning off the locking correctness validator.
   


You're running a debugging configuration.  Turn off CONFIG_LOCKDEP and 
other debugging options if you want reasonable performance.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] kvm: don't treat NULL parent_pte as multimapped in mmu_parent_walk()

2010-01-07 Thread Avi Kivity

On 01/07/2010 05:56 PM, Roel Kluin wrote:

If a kvm_mmu_page is not multimapped but parent_pte is NULL
don't treat it as multimapped and dereference it.

Signed-off-by: Roel Kluin roel.kl...@gmail.com
---
This wasn't tested and maybe I misunderstood so please review.

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4c3e5b2..eb17287 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1031,10 +1031,12 @@ static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
         struct kvm_mmu_page *parent_sp;
         int i;

-        if (!sp->multimapped && sp->parent_pte) {
-                parent_sp = page_header(__pa(sp->parent_pte));
-                fn(vcpu, parent_sp);
-                mmu_parent_walk(vcpu, parent_sp, fn);
+        if (!sp->multimapped) {
+                if (sp->parent_pte) {
+                        parent_sp = page_header(__pa(sp->parent_pte));
+                        fn(vcpu, parent_sp);
+                        mmu_parent_walk(vcpu, parent_sp, fn);
+                }
                 return;
         }
         hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)

   


If sp->parent_pte is NULL then the list walk terminates immediately.
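
The reason the walk terminates is presumably that parent_pte and parent_ptes share storage in a union, so a NULL parent_pte reads back as an empty list head. The stand-alone sketch below illustrates that under this assumption; struct shadow_page is a hypothetical, simplified stand-in for struct kvm_mmu_page, the hlist types are minimal copies, and it relies on a null pointer being all-zero bits, as on the platforms Linux supports.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

/* Hypothetical, simplified shadow page: the two parent fields share storage. */
struct shadow_page {
    int multimapped;
    union {
        unsigned long long *parent_pte;  /* used when !multimapped */
        struct hlist_head parent_ptes;   /* used when multimapped */
    };
};

/* Count the entries a list walk over parent_ptes would visit. */
static int count_parent_chains(const struct shadow_page *sp)
{
    const struct hlist_node *n;
    int count = 0;

    for (n = sp->parent_ptes.first; n; n = n->next)
        count++;
    return count;
}
```

With parent_pte set to NULL, parent_ptes.first is NULL as well, so the loop body never runs and nothing is dereferenced.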

--
error compiling committee.c: too many arguments to function



Re: [PATCH 3/3] KVM: SVM: Lazy fpu for npt

2010-01-07 Thread Joerg Roedel
On Thu, Jan 07, 2010 at 02:15:44PM +0200, Avi Kivity wrote:
 If two conditions apply:
  - no bits outside TS and EM differ between the host and guest cr0
  - the fpu is active
 
 then we can activate the selective cr0 write intercept and drop the
 unconditional cr0 read and write intercept, and allow the guest to run
 with the host fpu state.  This reduces the heavyweight context switch
 when npt is enabled.


 - if (npt_enabled) {
 - int mmu_reload = 0;
 - if ((kvm_read_cr0_bits(vcpu, X86_CR0_PG) ^ svm->vmcb->save.cr0)
 -  & X86_CR0_PG) {
 - svm_set_cr0(vcpu, svm->vmcb->save.cr0);
 - mmu_reload = 1;
 - }
 + if (!(svm->vmcb->control.intercept_cr_write & INTERCEPT_CR0_MASK))
   vcpu->arch.cr0 = svm->vmcb->save.cr0;
 + if (npt_enabled)
   vcpu->arch.cr3 = svm->vmcb->save.cr3;
 - if (mmu_reload) {
 - kvm_mmu_reset_context(vcpu);
 - kvm_mmu_load(vcpu);
 - }
 - }
 -

Hmm, I think removing this hack is a separate issue. Should it be a
separate patch which enables cr0 intercept for npt and removes these
lines? It would make this change clearer in the logs.

Joerg



Re: network shutdown under heavy load

2010-01-07 Thread rek2
Hi guys, it happened again (on this server I didn't apply the fix you 
guys sent), but I left it that way so that if it happened I could test 
with tcpdump. It seems that the guest can receive packets but can't send.

When I opened tcpdump I saw traffic coming in, but not out.

Hope this helps.
Also, I need to know whether the patch you guys sent me will be in newer 
versions; if not, I'd like to know, since I can't update.



On 12/21/09 11:39 a.m., rek2 wrote:
You say "this version". Is there a newer version with this patch 
already applied to it?


Thanks



On 12/17/09 20:27 p.m., Herbert Xu wrote:

On Thu, Dec 17, 2009 at 01:15:46PM -0500, rek2 wrote:

I've been told that today the network went down again and one of the guys
here had to log in using the console and restart it for that particular
guest.

on the guest:
  uname -a
Linux  2.6.27.25-170.2.72.fc10.x86_64 #1 SMP Sun Jun 21 18:39:34 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Next time it goes down I will try to run a sniffer and try both sides.

OK I'm fairly sure this version has a buggy virtio-net.  Does
this patch (if it applies :) help?

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9eec5a5..74b3854 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -521,8 +521,10 @@ static void xmit_tasklet(unsigned long data)
  vi->svq->vq_ops->kick(vi->svq);
  vi->last_xmit_skb = NULL;
  }
-if (vi->free_in_tasklet)
+if (vi->free_in_tasklet) {
  free_old_xmit_skbs(vi);
+netif_wake_queue(vi->dev);
+}
  netif_tx_unlock_bh(vi->dev);
  }
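
A toy model of why a missing wake-up could produce the "receives but cannot send" symptom. This is a sketch under assumptions, not the driver's logic: the ring size, the struct and the function names are all invented for illustration.

```c
#include <stdbool.h>

#define RING_SIZE 4   /* arbitrary size for the illustration */

struct tx_model {
    int in_flight;   /* buffers handed to the device, not yet reclaimed */
    bool stopped;    /* true once the stack has been told to stop sending */
};

/* Try to queue one packet; returns false if the queue is stopped. */
static bool tx_model_xmit(struct tx_model *q)
{
    if (q->stopped)
        return false;
    q->in_flight++;
    if (q->in_flight == RING_SIZE)
        q->stopped = true;   /* ring full: stop the queue */
    return true;
}

/* Reclaim all completed buffers; wake the queue only if asked to. */
static void tx_model_reclaim(struct tx_model *q, bool wake)
{
    q->in_flight = 0;
    if (wake)
        q->stopped = false;
}
```

In this model, reclaiming without waking leaves the queue stopped forever even though the ring is empty, which is the situation the added netif_wake_queue() call appears intended to avoid.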

Cheers,




[PATCH 2/2] make help output be a little more self-consistent

2010-01-07 Thread Bruce Rogers

This is the part which applies to qemu-kvm. 

Signed-off-by: Bruce Rogers brog...@novell.com 
---
 qemu-options.hx |   19 ++-
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 788d849..fdd5884 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1938,7 +1938,7 @@ DEF(readconfig, HAS_ARG, QEMU_OPTION_readconfig,
 -readconfig file\n)
 DEF(writeconfig, HAS_ARG, QEMU_OPTION_writeconfig,
 -writeconfig file\n
-read/write config file)
+read/write config file\n)
 
 DEF(no-kvm, 0, QEMU_OPTION_no_kvm,
 -no-kvm disable KVM hardware virtualization\n)
@@ -1947,26 +1947,27 @@ DEF(no-kvm-irqchip, 0, QEMU_OPTION_no_kvm_irqchip,
 DEF(no-kvm-pit, 0, QEMU_OPTION_no_kvm_pit,
 -no-kvm-pit disable KVM kernel mode PIT\n)
 DEF(no-kvm-pit-reinjection, 0, QEMU_OPTION_no_kvm_pit_reinjection,
--no-kvm-pit-reinjection disable KVM kernel mode PIT interrupt 
reinjection\n)
+-no-kvm-pit-reinjection\n
+disable KVM kernel mode PIT interrupt reinjection\n)
 #if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(TARGET_IA64) || 
defined(__linux__)
 DEF(pcidevice, HAS_ARG, QEMU_OPTION_pcidevice,
 -pcidevice host=bus:dev.func[,dma=none][,name=string]\n
-expose a PCI device to the guest OS.\n
+expose a PCI device to the guest OS\n
 dma=none: don't perform any dma translations (default is 
to use an iommu)\n
-'string' is used in log output.\n)
+'string' is used in log output\n)
 #endif
 DEF(enable-nesting, 0, QEMU_OPTION_enable_nesting,
 -enable-nesting enable support for running a VM inside the VM (AMD 
only)\n)
 DEF(nvram, HAS_ARG, QEMU_OPTION_nvram,
--nvram FILE  provide ia64 nvram contents\n)
+-nvram FILE provide ia64 nvram contents\n)
 DEF(tdf, 0, QEMU_OPTION_tdf,
--tdf enable guest time drift compensation\n)
+-tdfenable guest time drift compensation\n)
 DEF(kvm-shadow-memory, HAS_ARG, QEMU_OPTION_kvm_shadow_memory,
 -kvm-shadow-memory MEGABYTES\n
- allocate MEGABYTES for kvm mmu shadowing\n)
+allocate MEGABYTES for kvm mmu shadowing\n)
 DEF(mem-path, HAS_ARG, QEMU_OPTION_mempath,
--mem-path FILE   provide backing storage for guest RAM\n)
+-mem-path FILE  provide backing storage for guest RAM\n)
 #ifdef MAP_POPULATE
 DEF(mem-prealloc, 0, QEMU_OPTION_mem_prealloc,
--mem-preallocpreallocate guest memory (use with -mempath)\n)
+-mem-prealloc   preallocate guest memory (use with -mempath)\n)
 #endif




[PATCH 1/2] [RESEND] make help output be a little more self-consistent

2010-01-07 Thread Bruce Rogers
This is the part which applies to the base qemu.
(btw: it was sent to qemu-de...@nongnu.org yesterday.)

Signed-off-by: Bruce Rogers 
---
 qemu-options.hx |   39 ---
 1 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index ecd50eb..20b696d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -42,7 +42,7 @@ DEF(smp, HAS_ARG, QEMU_OPTION_smp,
 -smp n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n
 set the number of CPUs to 'n' [default=1]\n
 maxcpus= maximum number of total cpus, including\n
-  offline CPUs for hotplug etc.\n
+offline CPUs for hotplug, etc\n
 cores= number of CPU cores on one socket\n
 threads= number of threads on one CPU core\n
 sockets= number of discrete sockets in the system\n)
@@ -405,8 +405,9 @@ ETEXI
 DEF(device, HAS_ARG, QEMU_OPTION_device,
 -device driver[,options]  add device\n)
 DEF(name, HAS_ARG, QEMU_OPTION_name,
--name string1[,process=string2]set the name of the guest\n
-string1 sets the window title and string2 the process name 
(on Linux)\n)
+-name string1[,process=string2]\n
+set the name of the guest\n
+string1 sets the window title and string2 the process 
name (on Linux)\n)
 STEXI
 @item -name @var{name}
 Sets the @var{name} of the guest.
@@ -483,7 +484,7 @@ ETEXI
 
 #ifdef CONFIG_SDL
 DEF(ctrl-grab, 0, QEMU_OPTION_ctrl_grab,
--ctrl-grab   use Right-Ctrl to grab mouse (instead of Ctrl-Alt)\n)
+-ctrl-grab  use Right-Ctrl to grab mouse (instead of Ctrl-Alt)\n)
 #endif
 STEXI
 @item -ctrl-grab
@@ -756,12 +757,12 @@ ETEXI
 #ifdef TARGET_I386
 DEF(smbios, HAS_ARG, QEMU_OPTION_smbios,
 -smbios file=binary\n
-Load SMBIOS entry from binary file\n
+load SMBIOS entry from binary file\n
 -smbios type=0[,vendor=str][,version=str][,date=str][,release=%%d.%%d]\n
-Specify SMBIOS type 0 fields\n
+specify SMBIOS type 0 fields\n
 -smbios 
type=1[,manufacturer=str][,product=str][,version=str][,serial=str]\n
   [,uuid=uuid][,sku=str][,family=str]\n
-Specify SMBIOS type 1 fields\n)
+specify SMBIOS type 1 fields\n)
 #endif
 STEXI
 @item -smbios fi...@var{binary}
@@ -816,13 +817,13 @@ DEF(net, HAS_ARG, QEMU_OPTION_net,
 -net 
tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off]\n
 connect the host TAP network interface to VLAN 'n' and 
use the\n
 network scripts 'file' (default=%s)\n
-and 'dfile' (default=%s);\n
-use '[down]script=no' to disable script execution;\n
+and 'dfile' (default=%s)\n
+use '[down]script=no' to disable script execution\n
 use 'fd=h' to connect to an already opened TAP 
interface\n
-use 'sndbuf=nbytes' to limit the size of the send buffer; 
the\n
-default of 'sndbuf=1048576' can be disabled using 
'sndbuf=0'\n
-use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap 
flag; use\n
-vnet_hdr=on to make the lack of IFF_VNET_HDR support an 
error condition\n
+use 'sndbuf=nbytes' to limit the size of the send buffer 
(the\n
+default of 'sndbuf=1048576' can be disabled using 
'sndbuf=0')\n
+use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap 
flag\n
+use vnet_hdr=on to make the lack of IFF_VNET_HDR support 
an error condition\n
 #endif
 -net 
socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n
 connect the vlan 'n' to another VLAN using a socket 
connection\n
@@ -837,7 +838,7 @@ DEF(net, HAS_ARG, QEMU_OPTION_net,
 #endif
 -net dump[,vlan=n][,file=f][,len=n]\n
 dump traffic on vlan 'n' to file 'f' (max n bytes per 
packet)\n
--net none   use it alone to have zero network devices; if no -net 
option\n
+-net none   use it alone to have zero network devices. If no -net 
option\n
 is provided, the default is '-net nic -net user'\n)
 DEF(netdev, HAS_ARG, QEMU_OPTION_netdev,
 -netdev [
@@ -1589,7 +1590,7 @@ The default device is @code{vc} in graphical mode and 
@code{stdio} in
 non graphical mode.
 ETEXI
 DEF(qmp, HAS_ARG, QEMU_OPTION_qmp, \
--qmp devlike -monitor but opens in 'control' mode.\n)
+-qmp devlike -monitor but opens in 'control' mode\n)
 
 DEF(mon, HAS_ARG, QEMU_OPTION_mon, \
 -mon chardev=[name][,mode=readline|control][,default]\n)
@@ -1607,7 +1608,7 @@ from a script.
 ETEXI
 
 DEF(singlestep, 

Re: [PATCH 0/2] eventfd: new EFD_STATE flag

2010-01-07 Thread Davide Libenzi
On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:

 Sure, I was trying to be as brief as possible, here's a detailed summary.
 
 Description of the system (MSI emulation in KVM):
 
 KVM supports an ioctl to assign/deassign an eventfd file to interrupt message
 in guest OS.  When this eventfd is signalled, interrupt message is sent.
 This assignment is done from qemu system emulator.
 
 eventfd is signalled from device emulation in another thread in
 userspace or from kernel, which talks with guest OS through another
 eventfd and shared memory (possibility of out of process was discussed
 but never got implemented yet).
 
 Note: it's okay to delay messages from correctness point of view, but
 generally this is latency-sensitive path. If multiple identical messages
 are requested, it's okay to send a single last message, but missing a
 message altogether causes deadlocks.  Sending a message when none were
 requested might in theory cause crashes, in practice doing this causes
 performance degradation.
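
The coalescing described in the note matches how a plain (non-semaphore) eventfd behaves from userspace: several signals accumulate in one counter and a single read returns and clears the total. A minimal Linux-only sketch, where the helper name is made up for illustration:

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal a fresh eventfd `signals` times, then read it once.
 * Returns the value read, or 0 on any error (including the case
 * where nothing was pending and the non-blocking read fails). */
static uint64_t signal_then_read(int signals)
{
    uint64_t one = 1, value = 0;
    int fd = eventfd(0, EFD_NONBLOCK);

    if (fd < 0)
        return 0;

    for (int i = 0; i < signals; i++) {
        if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(fd);
            return 0;
        }
    }

    /* One read drains the whole counter. */
    if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value))
        value = 0;

    close(fd);
    return value;
}
```

Three signals followed by one read yield the value 3, so the consumer sees that something happened but cannot tell the three requests apart.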
 
 Another KVM feature is interrupt masking: guest OS requests that we
 stop sending some interrupt message, possibly modified mapping
 and re-enables this message. This needs to be done without
 involving the device that might keep requesting events:
 while masked, message is marked pending, and guest might test
 the pending status.
 
 We can implement masking in system emulator in userspace, by using
 assign/deassign ioctls: when message is masked, we simply deassign all
 eventfd, and when it is unmasked, we assign them back.
 
 Here's some code to illustrate how this all works: assign/deassign code
 in kernel looks like the following:
 
 
 this is called to unmask interrupt
 
 static int
 kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
 {
   struct _irqfd *irqfd, *tmp;
   struct file *file = NULL;
   struct eventfd_ctx *eventfd = NULL;
   int ret;
   unsigned int events;
 
   irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
 
 ...
 
   file = eventfd_fget(fd);
   if (IS_ERR(file)) {
   ret = PTR_ERR(file);
   goto fail;
   }
 
   eventfd = eventfd_ctx_fileget(file);
   if (IS_ERR(eventfd)) {
   ret = PTR_ERR(eventfd);
   goto fail;
   }
 
   irqfd->eventfd = eventfd;
 
   /*
    * Install our own custom wake-up handling so we are notified via
    * a callback whenever someone signals the underlying eventfd
    */
   init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
   init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
 
   spin_lock_irq(&kvm->irqfds.lock);
 
   events = file->f_op->poll(file, &irqfd->pt);
 
   list_add_tail(&irqfd->list, &kvm->irqfds.items);
   spin_unlock_irq(&kvm->irqfds.lock);
 
 A.
   /*
    * Check if there was an event already pending on the eventfd
    * before we registered, and trigger it as if we didn't miss it.
    */
   if (events & POLLIN)
       schedule_work(&irqfd->inject);
 
   /*
    * do not drop the file until the irqfd is fully initialized, otherwise
    * we might race against the POLLHUP
    */
   fput(file);
 
   return 0;
 
 fail:
   ...
 }
 
 This is called to mask interrupt
 
 /*
  * shutdown any irqfd's that match fd+gsi
  */
 static int
 kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
 {
   struct _irqfd *irqfd, *tmp;
   struct eventfd_ctx *eventfd;
 
   eventfd = eventfd_ctx_fdget(fd);
   if (IS_ERR(eventfd))
   return PTR_ERR(eventfd);
 
   spin_lock_irq(&kvm->irqfds.lock);
 
   list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
       if (irqfd->eventfd == eventfd && irqfd->gsi == gsi)
           irqfd_deactivate(irqfd);
   }
 
   spin_unlock_irq(&kvm->irqfds.lock);
   eventfd_ctx_put(eventfd);
 
   /*
    * Block until we know all outstanding shutdown jobs have completed
    * so that we guarantee there will not be any more interrupts on this
    * gsi once this deassign function returns.
    */
   flush_workqueue(irqfd_cleanup_wq);
 
   return 0;
 }
 
 
 And deactivation deep down does this (from irqfd_cleanup_wq workqueue,
 so this is not under the spinlock):
 
 /*
  * Synchronize with the wait-queue and unhook ourselves to
  * prevent
  * further events.
  */
 B.
 remove_wait_queue(irqfd->wqh, &irqfd->wait);
 
   
 
 /*
  * It is now safe to release the object's resources
  */
 eventfd_ctx_put(irqfd->eventfd);
 kfree(irqfd);
 
 
 The problems (really the same bug) in KVM that I am trying to fix:
 1. Because of A above, if event was requested while message was masked,
    we will not miss a message. However, because we never clear
    the counter, we currently get a spurious message each time
    we unmask.
 
We should clear the counter either each time we
deliver message, 

Re: [PATCH 0/2] eventfd: new EFD_STATE flag

2010-01-07 Thread Davide Libenzi
On Thu, 7 Jan 2010, Davide Libenzi wrote:

 When you unmask, shouldn't the fd be in a clear state anyway?

Forget that. If it's used for IRQs you likely need to have a current state 
when you unmask.


- Davide




Re: [PATCH 0/2] eventfd: new EFD_STATE flag

2010-01-07 Thread Davide Libenzi
On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:

 Sure, I was trying to be as brief as possible, here's a detailed summary.
 
 Description of the system (MSI emulation in KVM):
 
 KVM supports an ioctl to assign/deassign an eventfd file to interrupt message
 in guest OS.  When this eventfd is signalled, interrupt message is sent.
 This assignment is done from qemu system emulator.
 
 eventfd is signalled from device emulation in another thread in
 userspace or from kernel, which talks with guest OS through another
 eventfd and shared memory (possibility of out of process was discussed
 but never got implemented yet).
 
 Note: it's okay to delay messages from correctness point of view, but
 generally this is latency-sensitive path. If multiple identical messages
 are requested, it's okay to send a single last message, but missing a
 message altogether causes deadlocks.  Sending a message when none were
 requested might in theory cause crashes, in practice doing this causes
 performance degradation.
 
 Another KVM feature is interrupt masking: guest OS requests that we
 stop sending some interrupt message, possibly modified mapping
 and re-enables this message. This needs to be done without
 involving the device that might keep requesting events:
 while masked, message is marked pending, and guest might test
 the pending status.
 
 We can implement masking in system emulator in userspace, by using
 assign/deassign ioctls: when message is masked, we simply deassign all
 eventfd, and when it is unmasked, we assign them back.
 
 Here's some code to illustrate how this all works: assign/deassign code
 in kernel looks like the following:
 
 
 this is called to unmask interrupt
 
 static int
 kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
 {
   struct _irqfd *irqfd, *tmp;
   struct file *file = NULL;
   struct eventfd_ctx *eventfd = NULL;
   int ret;
   unsigned int events;
 
   irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
 
 ...
 
   file = eventfd_fget(fd);
   if (IS_ERR(file)) {
   ret = PTR_ERR(file);
   goto fail;
   }
 
   eventfd = eventfd_ctx_fileget(file);
   if (IS_ERR(eventfd)) {
   ret = PTR_ERR(eventfd);
   goto fail;
   }
 
   irqfd->eventfd = eventfd;
 
   /*
    * Install our own custom wake-up handling so we are notified via
    * a callback whenever someone signals the underlying eventfd
    */
   init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
   init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
 
   spin_lock_irq(&kvm->irqfds.lock);
 
   events = file->f_op->poll(file, &irqfd->pt);
 
   list_add_tail(&irqfd->list, &kvm->irqfds.items);
   spin_unlock_irq(&kvm->irqfds.lock);
 
 A.
   /*
    * Check if there was an event already pending on the eventfd
    * before we registered, and trigger it as if we didn't miss it.
    */
   if (events & POLLIN)
       schedule_work(&irqfd->inject);
 
   /*
    * do not drop the file until the irqfd is fully initialized, otherwise
    * we might race against the POLLHUP
    */
   fput(file);
 
   return 0;
 
 fail:
   ...
 }

What if you do (under proper irqfd locking) something like:

eventfd_ctx_read(ctx, 1, &cnt);
if (irqfd->cnt != cnt) {
    irqfd->cnt = cnt;
    schedule_work(&irqfd->inject);
}
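
The idea in this snippet, together with the deactivation-side snippet further down in this message, can be modelled in plain C. This is a sketch under assumptions: the struct is a hypothetical stand-in with only the `cnt` field taken from the snippet, and the counter read is assumed not to reset the counter, as the suggestion appears to require.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-irqfd state; only `cnt` mirrors
 * the field used in the snippet, the rest is for illustration. */
struct irqfd_model {
    uint64_t cnt;       /* eventfd counter value seen at the last check */
    unsigned injected;  /* number of interrupts injected so far */
};

/* Mask: remember the counter value at the time of deactivation. */
static void irqfd_model_mask(struct irqfd_model *irqfd, uint64_t cnt)
{
    irqfd->cnt = cnt;
}

/* Unmask: inject only if the counter moved while we were masked.
 * Returns true if an interrupt was injected. */
static bool irqfd_model_unmask(struct irqfd_model *irqfd, uint64_t cnt)
{
    if (irqfd->cnt == cnt)
        return false;
    irqfd->cnt = cnt;
    irqfd->injected++;
    return true;
}
```

With this bookkeeping an unmask that follows a quiet masked period injects nothing, while an unmask after a signal injects exactly once.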




 And deactivation deep down does this (from irqfd_cleanup_wq workqueue,
 so this is not under the spinlock):
 
 /*
  * Synchronize with the wait-queue and unhook ourselves to
  * prevent
  * further events.
  */
 B.
 remove_wait_queue(irqfd->wqh, &irqfd->wait);
 
   
 
 /*
  * It is now safe to release the object's resources
  */
 eventfd_ctx_put(irqfd->eventfd);
 kfree(irqfd);

And:

eventfd_ctx_read(ctx, 1, &irqfd->cnt);
remove_wait_queue(irqfd->wqh, &irqfd->wait);




- Davide




Re: pci-stub error and MSI-X for KVM guest

2010-01-07 Thread Chris Wright
* Fischer, Anna (anna.fisc...@hp.com) wrote:
 So, when setting a breakpoint for the exit() call I'm getting a bit closer to 
 figuring where it kills my guest.

Thanks, this helps clarify what is happening.

 Breakpoint 1, exit (status=1) at exit.c:99
 99{
 Current language:  auto
 The current source language is auto; currently c.
 (gdb) bt
 #0  exit (status=1) at exit.c:99
 #1  0x00470c6e in assigned_dev_pci_read_config (d=0x259c6f0, 
 address=64, len=4)

assigned_dev_pci_read_config(..., 64, 4)
  ^^
This is a libvirt issue.  When you use virt-manager it has libvirtd
fork/exec qemu-kvm.  libvirtd will drop privileges and run qemu-kvm as
user qemu (or perhaps root if you've edited qemu.conf).  Regardless of
the user, it clears capabilities.  Reading PCI config space beyond just
the header requires CAP_SYS_ADMIN.  The above is reading the first 4
bytes of device dependent config space, and the kernel is returning 0
because qemu doesn't have CAP_SYS_ADMIN.

Basically, this means that device assignment w/ libvirt will break
MSI/MSI-X because qemu will never be able to see that the host device
has those PCI capabilities.  This, in turn, renders VF device assignment
useless (since a VF is required to support MSI and/or MSI-X).
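
To illustrate why zeroed reads beyond the header hide MSI/MSI-X, here is a sketch of a capability-list walk over a 256-byte config-space snapshot. The register offsets and capability IDs are the standard PCI ones; the function name and the snapshot-buffer interface are assumptions made for the example, not qemu's code.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_STATUS            0x06  /* status register (low byte) */
#define PCI_STATUS_CAP_LIST   0x10  /* capability list present */
#define PCI_CAPABILITY_LIST   0x34  /* offset of first capability */
#define PCI_CAP_ID_MSI        0x05
#define PCI_CAP_ID_MSIX       0x11
#define PCI_CFG_SIZE          256

/* Walk the capability list in a config-space snapshot and report
 * whether capability `id` is present. Capabilities live at offset
 * 0x40 (64) and above, i.e. beyond the standard header. */
static bool pci_cfg_has_cap(const uint8_t cfg[PCI_CFG_SIZE], uint8_t id)
{
    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return false;

    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;

    /* Bound the number of hops so a corrupt list cannot loop forever. */
    for (int hops = 0; pos >= 0x40 && hops < 48; hops++) {
        if (cfg[pos] == id)
            return true;
        pos = cfg[pos + 1] & 0xfc;
    }
    return false;
}
```

If everything from offset 64 onwards reads back as zero, the walk stops at the first entry and neither PCI_CAP_ID_MSI nor PCI_CAP_ID_MSIX is ever found, which matches the failure described above.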

Granting CAP_SYS_ADMIN for each qemu instance that does device assignment
would render the privilege reduction useless (CAP_SYS_ADMIN is the
kitchen sink catchall of the Linux capability system).

Hmmph...


[PATCH 9/9] KVM: PPC: Pass program interrupt flags to the guest

2010-01-07 Thread Alexander Graf
When we need to reinject a program interrupt into the guest, we also need to
reinject the corresponding flags into the guest.
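
A small sketch of what "the corresponding flags" could look like, assuming the mask is 0x1f0000, i.e. the SRR1 bits that carry the program-interrupt reason. The bit names follow the kernel's asm/reg.h as far as I recall them; the helper name is hypothetical.

```c
#include <stdint.h>

#define PROGRAM_FLAGS_MASK 0x1f0000ULL   /* assumed mask, see lead-in */

/* Program interrupt reason bits (names as recalled from asm/reg.h). */
#define SRR1_PROGFPE   0x00100000ULL  /* floating point enabled exception */
#define SRR1_PROGILL   0x00080000ULL  /* illegal instruction */
#define SRR1_PROGPRIV  0x00040000ULL  /* privileged instruction */
#define SRR1_PROGTRAP  0x00020000ULL  /* trap */

/* Hypothetical helper: keep only the reason flags of a saved MSR/SRR1
 * value so they can be handed on to the guest. */
static uint64_t program_flags(uint64_t shadow_msr)
{
    return shadow_msr & PROGRAM_FLAGS_MASK;
}
```

For a saved value with the illegal-instruction bit set alongside ordinary MSR bits, only SRR1_PROGILL survives the mask.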

Signed-off-by: Alexander Graf ag...@suse.de
Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---
 arch/powerpc/kvm/book3s.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 66b5924..02861fd 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -633,6 +633,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
case BOOK3S_INTERRUPT_PROGRAM:
{
enum emulation_result er;
+   ulong flags;
+
+   flags = (vcpu->arch.shadow_msr & 0x1f0000ull);
 
if (vcpu->arch.msr & MSR_PR) {
 #ifdef EXIT_DEBUG
@@ -640,7 +643,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 #endif
if ((vcpu->arch.last_inst & 0xff0007ff) !=
(INS_DCBZ & 0xfffffff7)) {
-   kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+   kvmppc_core_queue_program(vcpu, flags);
r = RESUME_GUEST;
break;
}
@@ -655,7 +658,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
case EMULATE_FAIL:
printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n",
   __func__, vcpu->arch.pc, vcpu->arch.last_inst);
-   kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+   kvmppc_core_queue_program(vcpu, flags);
r = RESUME_GUEST;
break;
default:
-- 
1.6.0.2



[PATCH 2/9] KVM: PPC: Add helpers for CR, XER

2010-01-07 Thread Alexander Graf
We now have helpers for the GPRs, so let's also add some for CR and XER.

Having them in the PACA simplifies code a lot, as we don't need to care
about where to store CC or not to overflow any integers.
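
A standalone sketch of the accessor pattern, using hypothetical stand-in types rather than the real struct kvm_vcpu. The CR0[EQ] constant 0x20000000 is an assumption based on the PowerPC condition register layout; the tlbsx-style caller only shows how code goes through the helpers instead of touching the field directly.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the real vcpu structures. */
struct vcpu_arch_model { uint32_t cr; uint32_t xer; };
struct vcpu_model { struct vcpu_arch_model arch; };

#define CR0_EQ 0x20000000u   /* EQ bit of CR field 0 (assumed constant) */

static inline void model_set_cr(struct vcpu_model *vcpu, uint32_t val)
{
    vcpu->arch.cr = val;
}

static inline uint32_t model_get_cr(struct vcpu_model *vcpu)
{
    return vcpu->arch.cr;
}

/* Record the outcome of a TLB search in CR0[EQ], touching the
 * register only through the accessors. */
static void model_record_tlb_search(struct vcpu_model *vcpu, int gtlb_index)
{
    uint32_t cr = model_get_cr(vcpu);

    if (gtlb_index < 0)
        model_set_cr(vcpu, cr & ~CR0_EQ);   /* not found: clear EQ */
    else
        model_set_cr(vcpu, cr | CR0_EQ);    /* found: set EQ */
}
```

Because every caller goes through the helpers, the backing storage can later move (for example into the PACA) without touching the callers.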

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_ppc.h |   40 
 arch/powerpc/kvm/44x_tlb.c |6 +++-
 arch/powerpc/kvm/book3s.c  |8 +++---
 arch/powerpc/kvm/booke.c   |8 +++---
 4 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index ba01b9c..d60b2f0 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -108,6 +108,26 @@ static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
return vcpu->arch.gpr[num];
 }
 
+static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
+{
+   vcpu->arch.cr = val;
+}
+
+static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.cr;
+}
+
+static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
+{
+   vcpu->arch.xer = val;
+}
+
+static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.xer;
+}
+
 #else
 
 static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
@@ -120,6 +140,26 @@ static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
return vcpu->arch.gpr[num];
 }
 
+static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
+{
+   vcpu->arch.cr = val;
+}
+
+static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.cr;
+}
+
+static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
+{
+   vcpu->arch.xer = val;
+}
+
+static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.xer;
+}
+
+
 #endif
 
 #endif /* __POWERPC_KVM_PPC_H__ */
diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c
index 8b37736..2570fcc 100644
--- a/arch/powerpc/kvm/44x_tlb.c
+++ b/arch/powerpc/kvm/44x_tlb.c
@@ -506,10 +506,12 @@ int kvmppc_44x_emul_tlbsx(struct kvm_vcpu *vcpu, u8 rt, u8 ra, u8 rb, u8 rc)
 
gtlb_index = kvmppc_44x_tlb_index(vcpu, ea, pid, as);
if (rc) {
+   u32 cr = kvmppc_get_cr(vcpu);
+
if (gtlb_index < 0)
-   vcpu->arch.cr &= ~0x20000000;
+   kvmppc_set_cr(vcpu, cr & ~0x20000000);
else
-   vcpu->arch.cr |= 0x20000000;
+   kvmppc_set_cr(vcpu, cr | 0x20000000);
}
kvmppc_set_gpr(vcpu, rt, gtlb_index);
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 574b24f..09ba8db 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -717,10 +717,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
int i;
 
regs->pc = vcpu->arch.pc;
-   regs->cr = vcpu->arch.cr;
+   regs->cr = kvmppc_get_cr(vcpu);
regs->ctr = vcpu->arch.ctr;
regs->lr = vcpu->arch.lr;
-   regs->xer = vcpu->arch.xer;
+   regs->xer = kvmppc_get_xer(vcpu);
regs->msr = vcpu->arch.msr;
regs->srr0 = vcpu->arch.srr0;
regs->srr1 = vcpu->arch.srr1;
@@ -744,10 +744,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
int i;
 
vcpu->arch.pc = regs->pc;
-   vcpu->arch.cr = regs->cr;
+   kvmppc_set_cr(vcpu, regs->cr);
vcpu->arch.ctr = regs->ctr;
vcpu->arch.lr = regs->lr;
-   vcpu->arch.xer = regs->xer;
+   kvmppc_set_xer(vcpu, regs->xer);
kvmppc_set_msr(vcpu, regs->msr);
vcpu->arch.srr0 = regs->srr0;
vcpu->arch.srr1 = regs->srr1;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 49af80e..338baf9 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -449,10 +449,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
int i;
 
regs->pc = vcpu->arch.pc;
-   regs->cr = vcpu->arch.cr;
+   regs->cr = kvmppc_get_cr(vcpu);
regs->ctr = vcpu->arch.ctr;
regs->lr = vcpu->arch.lr;
-   regs->xer = vcpu->arch.xer;
+   regs->xer = kvmppc_get_xer(vcpu);
regs->msr = vcpu->arch.msr;
regs->srr0 = vcpu->arch.srr0;
regs->srr1 = vcpu->arch.srr1;
@@ -476,10 +476,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
int i;
 
vcpu->arch.pc = regs->pc;
-   vcpu->arch.cr = regs->cr;
+   kvmppc_set_cr(vcpu, regs->cr);
vcpu->arch.ctr = regs->ctr;
vcpu->arch.lr = regs->lr;
-   vcpu->arch.xer = regs->xer;
+   kvmppc_set_xer(vcpu, regs->xer);
kvmppc_set_msr(vcpu, regs->msr);
vcpu->arch.srr0 = regs->srr0;
vcpu->arch.srr1 = regs->srr1;
-- 
1.6.0.2


[PATCH 0/9] KVM: PPC: Reduce races, fix code

2010-01-07 Thread Alexander Graf
We've been a bit lax with how we use fields in the PACA so far. Most of
the time we just overwrote random fields that another interrupt handler
would have used as well.

That is racy.

We also jumped over to real mode from IR=1 using RFI. Unfortunately,
that transition takes 3 operations which need to be fully atomic,
as any interrupt coming in between those instructions can possibly break
us.

That is racy too.

So let's get rid of all the racy code and clean up some pieces along the way.

Alexander Graf (9):
  KVM: PPC: Use accessor functions for GPR access
  KVM: PPC: Add helpers for CR, XER
  KVM: PPC: Use PACA backed shadow vcpu
  KVM: PPC: Implement 'skip instruction' mode
  KVM: PPC: Get rid of unnecessary RFI
  KVM: PPC: Call SLB patching code in interrupt safe manner
  KVM: PPC: Emulate trap SRR1 flags properly
  KVM: PPC: Fix HID5 setting code
  KVM: PPC: Pass program interrupt flags to the guest

 arch/powerpc/include/asm/kvm_asm.h   |6 +
 arch/powerpc/include/asm/kvm_book3s.h|4 +
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |   18 ++
 arch/powerpc/include/asm/kvm_host.h  |6 +-
 arch/powerpc/include/asm/kvm_ppc.h   |   76 -
 arch/powerpc/include/asm/paca.h  |5 +
 arch/powerpc/include/asm/reg.h   |4 +
 arch/powerpc/kernel/asm-offsets.c|   34 -
 arch/powerpc/kvm/44x_emulate.c   |   25 ++--
 arch/powerpc/kvm/44x_tlb.c   |   20 ++-
 arch/powerpc/kvm/book3s.c|   35 +++--
 arch/powerpc/kvm/book3s_64_emulate.c |   77 +
 arch/powerpc/kvm/book3s_64_exports.c |1 +
 arch/powerpc/kvm/book3s_64_interrupts.S  |  242 +-
 arch/powerpc/kvm/book3s_64_rmhandlers.S  |   85 +++---
 arch/powerpc/kvm/book3s_64_slb.S |  158 +++---
 arch/powerpc/kvm/booke.c |   27 ++--
 arch/powerpc/kvm/booke_emulate.c |  107 ++--
 arch/powerpc/kvm/e500_emulate.c  |   95 ++-
 arch/powerpc/kvm/e500_tlb.c  |4 +-
 arch/powerpc/kvm/emulate.c   |  112 +++--
 arch/powerpc/kvm/powerpc.c   |   21 ++-
 22 files changed, 672 insertions(+), 490 deletions(-)



[PATCH 4/9] KVM: PPC: Implement 'skip instruction' mode

2010-01-07 Thread Alexander Graf
To fetch the last instruction we were interrupted on, we enable DR in early
exit code, where we are still in a very transitional phase between guest
and host state.

Most of the time this seemed to work, but another CPU can easily flush our
TLB and HTAB which makes us go in the Linux page fault handler which totally
breaks because we still use the guest's SLB entries.

To work around that, let's introduce a second KVM guest mode that defines
that whenever we get a trap, we don't call the Linux handler or go into
the KVM exit code, but just jump over the faulting instruction.

That way a potentially bad lwz doesn't trigger any faults and we can later
on interpret the invalid instruction we fetched as fetch didn't work.
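
The dispatch this describes can be modelled in C. The three mode names and values are the ones the patch defines; everything else (the struct, the action enum, the function name) is a hypothetical model of the assembly trampoline, not a translation of it.

```c
#include <stdint.h>

enum kvm_guest_mode {
    KVM_GUEST_MODE_NONE  = 0,   /* not in a guest: normal Linux handling */
    KVM_GUEST_MODE_GUEST = 1,   /* in a guest: take the KVM exit path */
    KVM_GUEST_MODE_SKIP  = 2    /* just step over the faulting instruction */
};

enum trap_action { ACTION_LINUX_HANDLER, ACTION_KVM_EXIT, ACTION_SKIP_INSN };

/* Hypothetical model of the per-CPU state the trampoline looks at. */
struct cpu_model {
    enum kvm_guest_mode in_guest;
    uint64_t srr0;              /* address of the interrupted instruction */
};

/* Decide what to do with a trap, advancing srr0 in skip mode. */
static enum trap_action model_trap(struct cpu_model *cpu)
{
    switch (cpu->in_guest) {
    case KVM_GUEST_MODE_NONE:
        return ACTION_LINUX_HANDLER;
    case KVM_GUEST_MODE_SKIP:
        cpu->srr0 += 4;         /* PowerPC instructions are 4 bytes */
        return ACTION_SKIP_INSN;
    default:
        return ACTION_KVM_EXIT;
    }
}
```

In skip mode the return address moves past the faulting load, so execution resumes at the next instruction with the preloaded "fetch failed" value still in the destination register.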

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_asm.h  |6 
 arch/powerpc/kvm/book3s_64_rmhandlers.S |   39 ++-
 arch/powerpc/kvm/book3s_64_slb.S|   16 
 arch/powerpc/kvm/emulate.c  |4 +++
 4 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h 
b/arch/powerpc/include/asm/kvm_asm.h
index af2abe7..aadf2dd 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -97,4 +97,10 @@
 #define RESUME_HOST RESUME_FLAG_HOST
 #define RESUME_HOST_NV  (RESUME_FLAG_HOST|RESUME_FLAG_NV)
 
+#define KVM_GUEST_MODE_NONE    0
+#define KVM_GUEST_MODE_GUEST   1
+#define KVM_GUEST_MODE_SKIP    2
+
+#define KVM_INST_FETCH_FAILED  -1
+
 #endif /* __POWERPC_KVM_ASM_H__ */
diff --git a/arch/powerpc/kvm/book3s_64_rmhandlers.S 
b/arch/powerpc/kvm/book3s_64_rmhandlers.S
index cd9f0b6..9ad1c26 100644
--- a/arch/powerpc/kvm/book3s_64_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_64_rmhandlers.S
@@ -49,7 +49,7 @@ kvmppc_trampoline_\intno:
mfcr    r12
stw r12, PACA_KVM_SCRATCH1(r13)
lbz r12, PACA_KVM_IN_GUEST(r13)
-   cmpwi   r12, 0
+   cmpwi   r12, KVM_GUEST_MODE_NONE
bne ..kvmppc_handler_hasmagic_\intno
/* No KVM guest? Then jump back to the Linux handler! */
lwz r12, PACA_KVM_SCRATCH1(r13)
@@ -60,6 +60,11 @@ kvmppc_trampoline_\intno:
 
/* Now we know we're handling a KVM guest */
 ..kvmppc_handler_hasmagic_\intno:
+
+   /* Should we just skip the faulting instruction? */
+   cmpwi   r12, KVM_GUEST_MODE_SKIP
+   beq kvmppc_handler_skip_ins
+
/* Let's store which interrupt we're handling */
li  r12, \intno
 
@@ -86,6 +91,38 @@ INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_ALTIVEC
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_VSX
 
 /*
+ * Bring us back to the faulting code, but skip the
+ * faulting instruction.
+ *
+ * This is a generic exit path from the interrupt
+ * trampolines above.
+ *
+ * Input Registers:
+ *
+ * R12   = free
+ * R13   = PACA
+ * PACA.KVM.SCRATCH0 = guest R12
+ * PACA.KVM.SCRATCH1 = guest CR
+ * SPRG_SCRATCH0 = guest R13
+ *
+ */
+kvmppc_handler_skip_ins:
+
+   /* Patch the IP to the next instruction */
+   mfsrr0  r12
+   addi    r12, r12, 4
+   mtsrr0  r12
+
+   /* Clean up all state */
+   lwz r12, PACA_KVM_SCRATCH1(r13)
+   mtcr    r12
+   ld  r12, PACA_KVM_SCRATCH0(r13)
+   mfspr   r13, SPRN_SPRG_SCRATCH0
+
+   /* And get back into the code */
+   RFI
+
+/*
  * This trampoline brings us back to a real mode handler
  *
  * Input Registers:
diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
index 7188c11..d07b886 100644
--- a/arch/powerpc/kvm/book3s_64_slb.S
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -212,10 +212,6 @@ kvmppc_handler_trampoline_exit:
mfdar   r5
mfdsisr r6
 
-   /* Unset guest state */
-   li  r9, 0
-   stb r9, PACA_KVM_IN_GUEST(r13)
-
/*
 * In order for us to easily get the last instruction,
 * we got the #vmexit at, we exploit the fact that the
@@ -233,18 +229,28 @@ kvmppc_handler_trampoline_exit:
 
 ld_last_inst:
/* Save off the guest instruction we're at */
+
+   /* Set guest mode to 'jump over instruction' so if lwz faults
+* we'll just continue at the next IP. */
+   li  r9, KVM_GUEST_MODE_SKIP
+   stb r9, PACA_KVM_IN_GUEST(r13)
+
/*1) enable paging for data */
mfmsr   r9
ori r11, r9, MSR_DR /* Enable paging for data */
mtmsr   r11
/*2) fetch the instruction */
-   /* XXX implement PACA_KVM_IN_GUEST=2 path to safely jump over this */
+   li  r0, KVM_INST_FETCH_FAILED   /* In case lwz faults */
lwz r0, 0(r3)
/*3) disable paging again */
mtmsr   r9
 
 no_ld_last_inst:
 
+   /* Unset guest mode */
+   li  r9, KVM_GUEST_MODE_NONE
+   stb r9, PACA_KVM_IN_GUEST(r13)
+
/* Restore bolted entries from the shadow and fix it along the way */
 
  

[PATCH 5/9] KVM: PPC: Get rid of unnecessary RFI

2010-01-07 Thread Alexander Graf
Using an RFI in IR=1 is dangerous. We need to set two SRRs and then do an RFI
without getting interrupted at all, because every interrupt could potentially
overwrite the SRR values.

Fortunately, we don't need to RFI in at least this particular case of the code,
so we can just replace it with an mtmsr and b.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_interrupts.S |   22 +++---
 1 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
index 66e3b11..3c0ba55 100644
--- a/arch/powerpc/kvm/book3s_64_interrupts.S
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -221,15 +221,8 @@ no_dcbz32_off:
mflr    r5
std r5, VCPU_LR(r7)
 
-   /* XXX convert to safe function call */
-
/* Restore host msr - SRR1 */
ld  r6, VCPU_HOST_MSR(r7)
-   mtsrr1  r6
-
-   /* Restore host IP - SRR0 */
-   ld  r5, VCPU_HOST_RETIP(r7)
-   mtsrr0  r5
 
/*
 * For some interrupts, we need to call the real Linux
@@ -246,8 +239,9 @@ no_dcbz32_off:
cmpwi   r12, BOOK3S_INTERRUPT_DECREMENTER
beq call_linux_handler
 
-   /* Back to Interruptable Mode! (goto kvm_return_point) */
-   RFI
+   /* Back to EE=1 */
+   mtmsr   r6
+   b   kvm_return_point
 
 call_linux_handler:
 
@@ -260,10 +254,16 @@ call_linux_handler:
 * interrupt handler!
 *
 * R3 still contains the exit code,
-* R6 VCPU_HOST_RETIP and
-* R7 VCPU_HOST_MSR
+* R5 VCPU_HOST_RETIP and
+* R6 VCPU_HOST_MSR
 */
 
+   /* Restore host IP - SRR0 */
+   ld  r5, VCPU_HOST_RETIP(r7)
+
+   /* XXX Better move to a safe function?
+* What if we get an HTAB flush in between mtsrr0 and mtsrr1? */
+
mtlr    r12
 
ld  r4, VCPU_TRAMPOLINE_LOWMEM(r7)
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 8/9] KVM: PPC: Fix HID5 setting code

2010-01-07 Thread Alexander Graf
The code to unset HID5.dcbz32 is broken.
This patch makes it do the right rotate magic.

Signed-off-by: Alexander Graf ag...@suse.de
Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---
 arch/powerpc/kvm/book3s_64_interrupts.S |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
index 33aef53..2ff0b21 100644
--- a/arch/powerpc/kvm/book3s_64_interrupts.S
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -177,8 +177,9 @@ kvmppc_handler_highmem:
rldicl. r5, r5, 0, 63   /* CR = ((r5 & 1) == 0) */
beq no_dcbz32_off
 
+   li  r4, 0
mfspr   r5,SPRN_HID5
-   rldimi  r5,r5,6,56
+   rldimi  r5,r4,6,56
mtspr   SPRN_HID5,r5
 
 no_dcbz32_off:
-- 
1.6.0.2
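
For readers who want to see why the source register matters in the HID5 fix above, here is a minimal user-space C sketch of rldimi. It is a model written for illustration only, not kernel code: the helper names rotl64, ppc_mask and rldimi are made up here, and the model assumes the non-wrapping case mb <= 63 - sh. With sh=6 and mb=56 the insert mask covers IBM bits 56..57 (0xC0 in LSB terms), so inserting from the register itself copies its low two bits into the field, while inserting from a zeroed register clears the field.

```c
#include <stdint.h>

/* Rotate a 64-bit value left by n bits (0 <= n < 64). */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    return n ? (x << n) | (x >> (64 - n)) : x;
}

/*
 * Mask with IBM-numbered bits mb..me set (bit 0 is the MSB).
 * Assumption of this sketch: mb <= me <= 63, i.e. no wrap-around.
 */
static uint64_t ppc_mask(unsigned mb, unsigned me)
{
    return (~0ULL >> mb) & (~0ULL << (63 - me));
}

/*
 * Model of "rldimi ra, rs, sh, mb": rotate rs left by sh and insert it
 * into ra under the mask mb..(63 - sh); bits outside the mask keep ra.
 */
static uint64_t rldimi(uint64_t ra, uint64_t rs, unsigned sh, unsigned mb)
{
    uint64_t m = ppc_mask(mb, 63 - sh);

    return (rotl64(rs, sh) & m) | (ra & ~m);
}
```

With a sample value of 0x83, rldimi(0x83, 0x83, 6, 56) yields 0xC3 (the field is refilled from the low bits), whereas rldimi(0x83, 0, 6, 56) yields 0x03 (the field is cleared), which matches the intent of loading r4 with 0 first.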



[PATCH 1/9] KVM: PPC: Use accessor functions for GPR access

2010-01-07 Thread Alexander Graf
All code in PPC KVM currently accesses gprs in the vcpu struct directly.

While there's nothing wrong with that wrt the current way gprs are stored
and loaded, it doesn't suffice for the PACA acceleration that will follow
in this patchset.

So let's just create little wrapper inline functions that we call whenever
a GPR needs to be read from or written to. The compiled code shouldn't really
change at all for now.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_ppc.h   |   26 
 arch/powerpc/kvm/44x_emulate.c   |   25 
 arch/powerpc/kvm/44x_tlb.c   |   14 ++--
 arch/powerpc/kvm/book3s.c|8 +-
 arch/powerpc/kvm/book3s_64_emulate.c |   77 +
 arch/powerpc/kvm/booke.c |   16 +++---
 arch/powerpc/kvm/booke_emulate.c |  107 +-
 arch/powerpc/kvm/e500_emulate.c  |   95 --
 arch/powerpc/kvm/e500_tlb.c  |4 +-
 arch/powerpc/kvm/emulate.c   |  106 ++---
 arch/powerpc/kvm/powerpc.c   |   21 ---
 11 files changed, 274 insertions(+), 225 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index abfd0c4..ba01b9c 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -96,4 +96,30 @@ extern void kvmppc_booke_exit(void);
 
 extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
 
+#ifdef CONFIG_PPC_BOOK3S
+
+static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
+{
+   vcpu->arch.gpr[num] = val;
+}
+
+static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
+{
+   return vcpu->arch.gpr[num];
+}
+
+#else
+
+static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
+{
+   vcpu->arch.gpr[num] = val;
+}
+
+static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
+{
+   return vcpu->arch.gpr[num];
+}
+
+#endif
+
 #endif /* __POWERPC_KVM_PPC_H__ */
diff --git a/arch/powerpc/kvm/44x_emulate.c b/arch/powerpc/kvm/44x_emulate.c
index 61af58f..0ff0d40 100644
--- a/arch/powerpc/kvm/44x_emulate.c
+++ b/arch/powerpc/kvm/44x_emulate.c
@@ -65,13 +65,14 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
 */
switch (dcrn) {
case DCRN_CPR0_CONFIG_ADDR:
-   vcpu->arch.gpr[rt] = vcpu->arch.cpr0_cfgaddr;
+   kvmppc_set_gpr(vcpu, rt, vcpu->arch.cpr0_cfgaddr);
break;
case DCRN_CPR0_CONFIG_DATA:
local_irq_disable();
mtdcr(DCRN_CPR0_CONFIG_ADDR,
  vcpu->arch.cpr0_cfgaddr);
-   vcpu->arch.gpr[rt] = mfdcr(DCRN_CPR0_CONFIG_DATA);
+   kvmppc_set_gpr(vcpu, rt,
+  mfdcr(DCRN_CPR0_CONFIG_DATA));
local_irq_enable();
break;
default:
@@ -93,11 +94,11 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
/* emulate some access in kernel */
switch (dcrn) {
case DCRN_CPR0_CONFIG_ADDR:
-   vcpu->arch.cpr0_cfgaddr = vcpu->arch.gpr[rs];
+   vcpu->arch.cpr0_cfgaddr = kvmppc_get_gpr(vcpu, rs);
break;
default:
run->dcr.dcrn = dcrn;
-   run->dcr.data = vcpu->arch.gpr[rs];
+   run->dcr.data = kvmppc_get_gpr(vcpu, rs);
run->dcr.is_write = 1;
vcpu->arch.dcr_needed = 1;
kvmppc_account_exit(vcpu, DCR_EXITS);
@@ -146,13 +147,13 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs)
 
switch (sprn) {
case SPRN_PID:
-   kvmppc_set_pid(vcpu, vcpu->arch.gpr[rs]); break;
+   kvmppc_set_pid(vcpu, kvmppc_get_gpr(vcpu, rs)); break;
case SPRN_MMUCR:
-   vcpu->arch.mmucr = vcpu->arch.gpr[rs]; break;
+   vcpu->arch.mmucr = kvmppc_get_gpr(vcpu, rs); break;
case SPRN_CCR0:
-   vcpu->arch.ccr0 = vcpu->arch.gpr[rs]; break;
+   vcpu->arch.ccr0 = kvmppc_get_gpr(vcpu, rs); break;
case SPRN_CCR1:
-   vcpu->arch.ccr1 = vcpu->arch.gpr[rs]; break;
+   vcpu->arch.ccr1 = kvmppc_get_gpr(vcpu, rs); break;
default:
emulated = kvmppc_booke_emulate_mtspr(vcpu, sprn, rs);
}
@@ -167,13 +168,13 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
 

[PATCH 3/9] KVM: PPC: Use PACA backed shadow vcpu

2010-01-07 Thread Alexander Graf
We're being horribly racy right now. All the entry and exit code hijacks
random fields from the PACA that could easily be used by different code in
case we get interrupted, for example by a #MC or even page fault.

After discussing this with Ben, we figured it's best to reserve some more
space in the PACA and just shove off some vcpu state to there.

That way we can drastically improve the readability of the code, make it
less racy and less complex.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h|2 +
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |   19 +++
 arch/powerpc/include/asm/kvm_host.h  |5 +-
 arch/powerpc/include/asm/kvm_ppc.h   |   20 ++-
 arch/powerpc/include/asm/paca.h  |5 +
 arch/powerpc/kernel/asm-offsets.c|   35 -
 arch/powerpc/kvm/book3s.c|4 +
 arch/powerpc/kvm/book3s_64_interrupts.S  |  216 +-
 arch/powerpc/kvm/book3s_64_rmhandlers.S  |   32 +---
 arch/powerpc/kvm/book3s_64_slb.S |  150 +++---
 10 files changed, 250 insertions(+), 238 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 74b7369..f192017 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -23,6 +23,7 @@
 #include <linux/types.h>
 #include <linux/kvm_host.h>
 #include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s_64_asm.h>
 
 struct kvmppc_slb {
u64 esid;
@@ -69,6 +70,7 @@ struct kvmppc_sid_map {
 
 struct kvmppc_vcpu_book3s {
struct kvm_vcpu vcpu;
+   struct kvmppc_book3s_shadow_vcpu shadow_vcpu;
struct kvmppc_sid_map sid_map[SID_MAP_NUM];
struct kvmppc_slb slb[64];
struct {
diff --git a/arch/powerpc/include/asm/kvm_book3s_64_asm.h b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
index 2e06ee8..fca9404 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
@@ -20,6 +20,8 @@
 #ifndef __ASM_KVM_BOOK3S_ASM_H__
 #define __ASM_KVM_BOOK3S_ASM_H__
 
+#ifdef __ASSEMBLY__
+
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER
 
 #include <asm/kvm_asm.h>
@@ -55,4 +57,21 @@ kvmppc_resume_\intno:
 
 #endif /* CONFIG_KVM_BOOK3S_64_HANDLER */
 
+#else  /*__ASSEMBLY__ */
+
+struct kvmppc_book3s_shadow_vcpu {
+   ulong gpr[14];
+   u32 cr;
+   u32 xer;
+   ulong host_r1;
+   ulong host_r2;
+   ulong handler;
+   ulong scratch0;
+   ulong scratch1;
+   ulong vmhandler;
+   ulong rmhandler;
+};
+
+#endif /*__ASSEMBLY__ */
+
 #endif /* __ASM_KVM_BOOK3S_ASM_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1201f62..d615fa8 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -175,10 +175,13 @@ struct kvm_vcpu_arch {
ulong gpr[32];
 
ulong pc;
-   u32 cr;
ulong ctr;
ulong lr;
+
+#ifdef CONFIG_BOOKE
ulong xer;
+   u32 cr;
+#endif
 
ulong msr;
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index d60b2f0..89c5d79 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -98,34 +98,42 @@ extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
 
 #ifdef CONFIG_PPC_BOOK3S
 
+/* We assume we're always acting on the current vcpu */
+
 static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
 {
-   vcpu->arch.gpr[num] = val;
+   if ( num < 14 )
+   get_paca()->shadow_vcpu.gpr[num] = val;
+   else
+   vcpu->arch.gpr[num] = val;
 }
 
 static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
 {
-   return vcpu->arch.gpr[num];
+   if ( num < 14 )
+   return get_paca()->shadow_vcpu.gpr[num];
+   else
+   return vcpu->arch.gpr[num];
 }
 
 static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
 {
-   vcpu->arch.cr = val;
+   get_paca()->shadow_vcpu.cr = val;
 }
 
 static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
 {
-   return vcpu->arch.cr;
+   return get_paca()->shadow_vcpu.cr;
 }
 
 static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
 {
-   vcpu->arch.xer = val;
+   get_paca()->shadow_vcpu.xer = val;
 }
 
 static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
 {
-   return vcpu->arch.xer;
+   return get_paca()->shadow_vcpu.xer;
 }
 
 #else
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 5e9b4ef..d8a6931 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -19,6 +19,9 @@
 #include <asm/mmu.h>
 #include <asm/page.h>
 #include <asm/exception-64e.h>
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+#include <asm/kvm_book3s_64_asm.h>
+#endif
 
 register struct paca_struct *local_paca asm("r13");
 
@@ -135,6 +138,8 @@ struct paca_struct {
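
The accessor split introduced in this patch can be tried outside the kernel with a small C sketch. Everything below is a stand-in invented for illustration: this_cpu_shadow plays the role of get_paca()->shadow_vcpu, the struct layouts are trimmed to the fields needed, and the names set_gpr/get_gpr are not the kernel's. It only shows the routing rule: registers below 14 go to the per-CPU shadow copy, the rest stay in the vcpu struct.

```c
#include <stdint.h>

#define SHADOW_GPRS 14  /* r0..r13 are kept in the per-CPU shadow area */

/* Trimmed stand-in for struct kvmppc_book3s_shadow_vcpu. */
struct shadow_vcpu {
    uint64_t gpr[SHADOW_GPRS];
    uint32_t cr;
    uint32_t xer;
};

/* Trimmed stand-in for the vcpu's own register file. */
struct vcpu {
    uint64_t gpr[32];
};

/* Hypothetical replacement for get_paca()->shadow_vcpu in this sketch. */
static struct shadow_vcpu this_cpu_shadow;

/* Write a guest GPR, picking the backing store by register number. */
static void set_gpr(struct vcpu *v, int num, uint64_t val)
{
    if (num < SHADOW_GPRS)
        this_cpu_shadow.gpr[num] = val;
    else
        v->gpr[num] = val;
}

/* Read a guest GPR from whichever store currently holds it. */
static uint64_t get_gpr(const struct vcpu *v, int num)
{
    if (num < SHADOW_GPRS)
        return this_cpu_shadow.gpr[num];
    return v->gpr[num];
}
```

Because callers only ever go through the two accessors, the decision about where a register lives stays in one place, which is the point of converting the direct gpr[] accesses in patch 1/9 first.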
  

[PATCH 6/9] KVM: PPC: Call SLB patching code in interrupt safe manner

2010-01-07 Thread Alexander Graf
Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.

To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:

  A small helper in linear mapped memory that does mtmsr with IR=0 and
  then RFIs into the actual handler.

Thanks to that trick we can safely take page faults in the entry code
and only need to be really careful from the SLB switching part
onward.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h|1 +
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |1 -
 arch/powerpc/include/asm/kvm_host.h  |1 +
 arch/powerpc/kernel/asm-offsets.c|3 +--
 arch/powerpc/kvm/book3s.c|1 +
 arch/powerpc/kvm/book3s_64_exports.c |1 +
 arch/powerpc/kvm/book3s_64_interrupts.S  |   25 +++--
 arch/powerpc/kvm/book3s_64_rmhandlers.S  |   18 ++
 arch/powerpc/kvm/book3s_64_slb.S |4 
 9 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index f192017..c91be0f 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -121,6 +121,7 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,
 
 extern u32 kvmppc_trampoline_lowmem;
 extern u32 kvmppc_trampoline_enter;
+extern void kvmppc_rmcall(ulong srr0, ulong srr1);
 
 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/include/asm/kvm_book3s_64_asm.h b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
index fca9404..183461b 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
@@ -69,7 +69,6 @@ struct kvmppc_book3s_shadow_vcpu {
ulong scratch0;
ulong scratch1;
ulong vmhandler;
-   ulong rmhandler;
 };
 
 #endif /*__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d615fa8..f7215e6 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -167,6 +167,7 @@ struct kvm_vcpu_arch {
ulong trampoline_lowmem;
ulong trampoline_enter;
ulong highmem_handler;
+   ulong rmcall;
ulong host_paca_phys;
struct kvmppc_mmu mmu;
 #endif
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 03b4fcd..be90ced 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -214,8 +214,6 @@ int main(void)
DEFINE(PACA_KVM_HOST_R2, offsetof(struct paca_struct, shadow_vcpu.host_r2));
DEFINE(PACA_KVM_VMHANDLER, offsetof(struct paca_struct,
shadow_vcpu.vmhandler));
-   DEFINE(PACA_KVM_RMHANDLER, offsetof(struct paca_struct,
-   shadow_vcpu.rmhandler));
DEFINE(PACA_KVM_SCRATCH0, offsetof(struct paca_struct,
   shadow_vcpu.scratch0));
DEFINE(PACA_KVM_SCRATCH1, offsetof(struct paca_struct,
@@ -437,6 +435,7 @@ int main(void)
DEFINE(VCPU_TRAMPOLINE_LOWMEM, offsetof(struct kvm_vcpu, arch.trampoline_lowmem));
DEFINE(VCPU_TRAMPOLINE_ENTER, offsetof(struct kvm_vcpu, arch.trampoline_enter));
DEFINE(VCPU_HIGHMEM_HANDLER, offsetof(struct kvm_vcpu, arch.highmem_handler));
+   DEFINE(VCPU_RMCALL, offsetof(struct kvm_vcpu, arch.rmcall));
DEFINE(VCPU_HFLAGS, offsetof(struct kvm_vcpu, arch.hflags));
DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr));
 #else
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 3e06eae..1317392 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -919,6 +919,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
vcpu->arch.trampoline_lowmem = kvmppc_trampoline_lowmem;
vcpu->arch.trampoline_enter = kvmppc_trampoline_enter;
vcpu->arch.highmem_handler = (ulong)kvmppc_handler_highmem;
+   vcpu->arch.rmcall = *(ulong*)kvmppc_rmcall;
 
vcpu->arch.shadow_msr = MSR_USER64;
 
diff --git a/arch/powerpc/kvm/book3s_64_exports.c b/arch/powerpc/kvm/book3s_64_exports.c
index 5b2db38..99b0712 100644
--- a/arch/powerpc/kvm/book3s_64_exports.c
+++ b/arch/powerpc/kvm/book3s_64_exports.c
@@ -22,3 +22,4 @@
 
 EXPORT_SYMBOL_GPL(kvmppc_trampoline_enter);
 EXPORT_SYMBOL_GPL(kvmppc_trampoline_lowmem);
+EXPORT_SYMBOL_GPL(kvmppc_rmcall);
diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
index 3c0ba55..33aef53 100644
--- a/arch/powerpc/kvm/book3s_64_interrupts.S
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -95,17 +95,14 @@ kvm_start_entry:
ld  r3, 

[PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly

2010-01-07 Thread Alexander Graf
Book3S needs some flags in SRR1 to get to know details about an interrupt.

One such example is the trap instruction. It tells the guest kernel that
a program interrupt is due to a trap using a bit in SRR1.

This patch implements the above behavior, making WARN_ON behave like WARN_ON.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h |1 +
 arch/powerpc/include/asm/kvm_ppc.h|2 +-
 arch/powerpc/include/asm/reg.h|4 
 arch/powerpc/kvm/book3s.c |7 +--
 arch/powerpc/kvm/booke.c  |3 ++-
 arch/powerpc/kvm/emulate.c|2 +-
 6 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index c91be0f..79ab8fa 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -91,6 +91,7 @@ struct kvmppc_vcpu_book3s {
u64 vsid_next;
u64 vsid_max;
int context_id;
+   ulong prog_flags; /* flags to inject when giving a 700 trap */
 };
 
 #define CONTEXT_HOST   0
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 89c5d79..09816da 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -80,7 +80,7 @@ extern void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu);
 
 extern void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu);
 extern int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu);
-extern void kvmppc_core_queue_program(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags);
 extern void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index bc8dd53..5572e86 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -426,6 +426,10 @@
 #define   SRR1_WAKEMT  0x00280000 /* mtctrl */
 #define   SRR1_WAKEDEC 0x00180000 /* Decrementer interrupt */
 #define   SRR1_WAKETHERM   0x00100000 /* Thermal management interrupt */
+#define   SRR1_PROGFPE 0x00100000 /* Floating Point Enabled */
+#define   SRR1_PROGPRIV    0x00040000 /* Privileged instruction */
+#define   SRR1_PROGTRAP    0x00020000 /* Trap */
+#define   SRR1_PROGADDR    0x00010000 /* SRR0 contains subsequent addr */
 #define SPRN_HSRR0 0x13A   /* Save/Restore Register 0 */
 #define SPRN_HSRR1 0x13B   /* Save/Restore Register 1 */
 
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 1317392..66b5924 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -168,8 +168,9 @@ void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
 }
 
 
-void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
 {
+   to_book3s(vcpu)->prog_flags = flags;
kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_PROGRAM);
 }
 
@@ -198,6 +199,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
 {
int deliver = 1;
int vec = 0;
+   ulong flags = 0ULL;
 
switch (priority) {
case BOOK3S_IRQPRIO_DECREMENTER:
@@ -231,6 +233,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
break;
case BOOK3S_IRQPRIO_PROGRAM:
vec = BOOK3S_INTERRUPT_PROGRAM;
+   flags = to_book3s(vcpu)->prog_flags;
break;
case BOOK3S_IRQPRIO_VSX:
vec = BOOK3S_INTERRUPT_VSX;
@@ -261,7 +264,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
 #endif
 
if (deliver)
-   kvmppc_inject_interrupt(vcpu, vec, 0ULL);
+   kvmppc_inject_interrupt(vcpu, vec, flags);
 
return deliver;
 }
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 338baf9..e283e44 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -82,8 +82,9 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
set_bit(priority, &vcpu->arch.pending_exceptions);
 }
 
-void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
 {
+   /* BookE does flags in ESR, so ignore those we get here */
kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
 }
 
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 04e317c..8b0ba0b 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -154,7 +154,7 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 #else
vcpu->arch.esr |= ESR_PTR;
 #endif
-   
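
The queue-then-deliver flow this patch adds for program interrupts can be sketched in plain C. This is an illustrative model under stated assumptions, not the kernel implementation: the struct, the function names and DEMO_PROG_TRAP are hypothetical, and the line that ORs the flags into SRR1 at delivery time is an assumption about what the injection step does with its flags argument.

```c
#include <stdint.h>

/* Illustrative flag bit only; not taken from asm/reg.h. */
#define DEMO_PROG_TRAP (1ULL << 17)

/* Minimal stand-in for the vcpu state touched by this flow. */
struct vcpu {
    uint64_t msr;          /* guest MSR at the time of delivery */
    uint64_t srr1;         /* what the guest handler will see */
    uint64_t prog_flags;   /* remembered between queue and deliver */
    int program_pending;   /* 1 while a program interrupt is queued */
};

/* Queue a program interrupt and remember why it was raised. */
static void queue_program(struct vcpu *v, uint64_t flags)
{
    v->prog_flags = flags;
    v->program_pending = 1;
}

/*
 * Deliver a pending program interrupt. Returns 1 if one was delivered.
 * Assumption: the saved flags are merged into the SRR1 image.
 */
static int deliver_program(struct vcpu *v)
{
    if (!v->program_pending)
        return 0;
    v->srr1 = v->msr | v->prog_flags;
    v->program_pending = 0;
    return 1;
}
```

The flags have to be stored at queue time because delivery happens later, from the generic priority loop, where the reason for the program interrupt is no longer known.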

[PATCH 9/9] KVM: PPC: Pass program interrupt flags to the guest

2010-01-07 Thread Alexander Graf
When we need to reinject a program interrupt into the guest, we also need to
reinject the corresponding flags into the guest.

Signed-off-by: Alexander Graf ag...@suse.de
Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---
 arch/powerpc/kvm/book3s.c |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 66b5924..02861fd 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -633,6 +633,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
case BOOK3S_INTERRUPT_PROGRAM:
{
enum emulation_result er;
+   ulong flags;
+
+   flags = (vcpu->arch.shadow_msr & 0x1f0000ull);
 
if (vcpu->arch.msr & MSR_PR) {
 #ifdef EXIT_DEBUG
@@ -640,7 +643,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 #endif
if ((vcpu->arch.last_inst & 0xff0007ff) !=
(INS_DCBZ & 0xfffffff7)) {
-   kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+   kvmppc_core_queue_program(vcpu, flags);
r = RESUME_GUEST;
break;
}
@@ -655,7 +658,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
case EMULATE_FAIL:
printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n",
   __func__, vcpu->arch.pc, vcpu->arch.last_inst);
-   kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+   kvmppc_core_queue_program(vcpu, flags);
r = RESUME_GUEST;
break;
default:
-- 
1.6.0.2



[PATCH 4/9] KVM: PPC: Implement 'skip instruction' mode

2010-01-07 Thread Alexander Graf
To fetch the last instruction we were interrupted on, we enable DR in early
exit code, where we are still in a very transitional phase between guest
and host state.

Most of the time this seemed to work, but another CPU can easily flush our
TLB and HTAB which makes us go in the Linux page fault handler which totally
breaks because we still use the guest's SLB entries.

To work around that, let's introduce a second KVM guest mode that defines
that whenever we get a trap, we don't call the Linux handler or go into
the KVM exit code, but just jump over the faulting instruction.

That way a potentially bad lwz doesn't trigger any faults, and we can later
interpret the invalid instruction value we fetched as "the fetch didn't work".

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_asm.h  |6 
 arch/powerpc/kvm/book3s_64_rmhandlers.S |   39 ++-
 arch/powerpc/kvm/book3s_64_slb.S|   16 
 arch/powerpc/kvm/emulate.c  |4 +++
 4 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index af2abe7..aadf2dd 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -97,4 +97,10 @@
 #define RESUME_HOST RESUME_FLAG_HOST
 #define RESUME_HOST_NV  (RESUME_FLAG_HOST|RESUME_FLAG_NV)
 
+#define KVM_GUEST_MODE_NONE    0
+#define KVM_GUEST_MODE_GUEST   1
+#define KVM_GUEST_MODE_SKIP    2
+
+#define KVM_INST_FETCH_FAILED  -1
+
 #endif /* __POWERPC_KVM_ASM_H__ */
diff --git a/arch/powerpc/kvm/book3s_64_rmhandlers.S b/arch/powerpc/kvm/book3s_64_rmhandlers.S
index cd9f0b6..9ad1c26 100644
--- a/arch/powerpc/kvm/book3s_64_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_64_rmhandlers.S
@@ -49,7 +49,7 @@ kvmppc_trampoline_\intno:
mfcr    r12
stw r12, PACA_KVM_SCRATCH1(r13)
lbz r12, PACA_KVM_IN_GUEST(r13)
-   cmpwi   r12, 0
+   cmpwi   r12, KVM_GUEST_MODE_NONE
bne ..kvmppc_handler_hasmagic_\intno
/* No KVM guest? Then jump back to the Linux handler! */
lwz r12, PACA_KVM_SCRATCH1(r13)
@@ -60,6 +60,11 @@ kvmppc_trampoline_\intno:
 
/* Now we know we're handling a KVM guest */
 ..kvmppc_handler_hasmagic_\intno:
+
+   /* Should we just skip the faulting instruction? */
+   cmpwi   r12, KVM_GUEST_MODE_SKIP
+   beq kvmppc_handler_skip_ins
+
/* Let's store which interrupt we're handling */
li  r12, \intno
 
@@ -86,6 +91,38 @@ INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_ALTIVEC
 INTERRUPT_TRAMPOLINE   BOOK3S_INTERRUPT_VSX
 
 /*
+ * Bring us back to the faulting code, but skip the
+ * faulting instruction.
+ *
+ * This is a generic exit path from the interrupt
+ * trampolines above.
+ *
+ * Input Registers:
+ *
+ * R12   = free
+ * R13   = PACA
+ * PACA.KVM.SCRATCH0 = guest R12
+ * PACA.KVM.SCRATCH1 = guest CR
+ * SPRG_SCRATCH0 = guest R13
+ *
+ */
+kvmppc_handler_skip_ins:
+
+   /* Patch the IP to the next instruction */
+   mfsrr0  r12
+   addi    r12, r12, 4
+   mtsrr0  r12
+
+   /* Clean up all state */
+   lwz r12, PACA_KVM_SCRATCH1(r13)
+   mtcr    r12
+   ld  r12, PACA_KVM_SCRATCH0(r13)
+   mfspr   r13, SPRN_SPRG_SCRATCH0
+
+   /* And get back into the code */
+   RFI
+
+/*
  * This trampoline brings us back to a real mode handler
  *
  * Input Registers:
diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
index 7188c11..d07b886 100644
--- a/arch/powerpc/kvm/book3s_64_slb.S
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -212,10 +212,6 @@ kvmppc_handler_trampoline_exit:
mfdar   r5
mfdsisr r6
 
-   /* Unset guest state */
-   li  r9, 0
-   stb r9, PACA_KVM_IN_GUEST(r13)
-
/*
 * In order for us to easily get the last instruction,
 * we got the #vmexit at, we exploit the fact that the
@@ -233,18 +229,28 @@ kvmppc_handler_trampoline_exit:
 
 ld_last_inst:
/* Save off the guest instruction we're at */
+
+   /* Set guest mode to 'jump over instruction' so if lwz faults
+* we'll just continue at the next IP. */
+   li  r9, KVM_GUEST_MODE_SKIP
+   stb r9, PACA_KVM_IN_GUEST(r13)
+
/*1) enable paging for data */
mfmsr   r9
ori r11, r9, MSR_DR /* Enable paging for data */
mtmsr   r11
/*2) fetch the instruction */
-   /* XXX implement PACA_KVM_IN_GUEST=2 path to safely jump over this */
+   li  r0, KVM_INST_FETCH_FAILED   /* In case lwz faults */
lwz r0, 0(r3)
/*3) disable paging again */
mtmsr   r9
 
 no_ld_last_inst:
 
+   /* Unset guest mode */
+   li  r9, KVM_GUEST_MODE_NONE
+   stb r9, PACA_KVM_IN_GUEST(r13)
+
/* Restore bolted entries from the shadow and fix it along the way */
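
The control flow of the 'skip instruction' mode may be easier to follow as a small C model. This is only a sketch with invented names (struct cpu, on_fault, fetch_last_inst); a NULL pointer stands in for a load that would fault, and srr0 stands in for the address of the guarded lwz. It mirrors the three steps of the assembly: preload the failure marker, switch to skip mode around the load, then drop back to "no guest" mode.

```c
#include <stddef.h>
#include <stdint.h>

enum guest_mode { MODE_NONE = 0, MODE_GUEST = 1, MODE_SKIP = 2 };

/* Marker value, matching -1 truncated to 32 bits. */
#define INST_FETCH_FAILED 0xffffffffu

struct cpu {
    enum guest_mode mode;
    uint64_t srr0;   /* address of the instruction that would fault */
};

/*
 * Fault path: in skip mode, just step over the faulting instruction.
 * Returns 1 if the fault was swallowed, 0 if normal handling applies.
 */
static int on_fault(struct cpu *c)
{
    if (c->mode != MODE_SKIP)
        return 0;
    c->srr0 += 4;
    return 1;
}

/*
 * Guarded fetch of the last guest instruction. A NULL guest_word
 * simulates a load that faults (for example after a TLB/HTAB flush).
 */
static uint32_t fetch_last_inst(struct cpu *c, const uint32_t *guest_word)
{
    uint32_t inst = INST_FETCH_FAILED;   /* preloaded in case the load faults */

    c->mode = MODE_SKIP;
    if (guest_word)
        inst = *guest_word;
    else
        on_fault(c);                     /* load skipped, marker survives */
    c->mode = MODE_NONE;

    return inst;
}
```

The preload is what lets later code tell a skipped load from a real instruction: if the load never executed, the register still holds the marker.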
 
  

[PATCH 6/9] KVM: PPC: Call SLB patching code in interrupt safe manner

2010-01-07 Thread Alexander Graf
Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.

To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:

  A small helper in linear mapped memory that does mtmsr with IR=0 and
  then RFIs info the actual handler.

Thanks to that trick we can safely take page faults in the entry code
and only need to be really wary of what to do as of the SLB switching
part.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h|1 +
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |1 -
 arch/powerpc/include/asm/kvm_host.h  |1 +
 arch/powerpc/kernel/asm-offsets.c|3 +--
 arch/powerpc/kvm/book3s.c|1 +
 arch/powerpc/kvm/book3s_64_exports.c |1 +
 arch/powerpc/kvm/book3s_64_interrupts.S  |   25 +++--
 arch/powerpc/kvm/book3s_64_rmhandlers.S  |   18 ++
 arch/powerpc/kvm/book3s_64_slb.S |4 
 9 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index f192017..c91be0f 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -121,6 +121,7 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct 
kvmppc_bat *bat,
 
 extern u32 kvmppc_trampoline_lowmem;
 extern u32 kvmppc_trampoline_enter;
+extern void kvmppc_rmcall(ulong srr0, ulong srr1);
 
 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/include/asm/kvm_book3s_64_asm.h b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
index fca9404..183461b 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
@@ -69,7 +69,6 @@ struct kvmppc_book3s_shadow_vcpu {
ulong scratch0;
ulong scratch1;
ulong vmhandler;
-   ulong rmhandler;
 };
 
 #endif /*__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d615fa8..f7215e6 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -167,6 +167,7 @@ struct kvm_vcpu_arch {
ulong trampoline_lowmem;
ulong trampoline_enter;
ulong highmem_handler;
+   ulong rmcall;
ulong host_paca_phys;
struct kvmppc_mmu mmu;
 #endif
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 03b4fcd..be90ced 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -214,8 +214,6 @@ int main(void)
DEFINE(PACA_KVM_HOST_R2, offsetof(struct paca_struct, shadow_vcpu.host_r2));
DEFINE(PACA_KVM_VMHANDLER, offsetof(struct paca_struct,
shadow_vcpu.vmhandler));
-   DEFINE(PACA_KVM_RMHANDLER, offsetof(struct paca_struct,
-   shadow_vcpu.rmhandler));
DEFINE(PACA_KVM_SCRATCH0, offsetof(struct paca_struct,
   shadow_vcpu.scratch0));
DEFINE(PACA_KVM_SCRATCH1, offsetof(struct paca_struct,
@@ -437,6 +435,7 @@ int main(void)
DEFINE(VCPU_TRAMPOLINE_LOWMEM, offsetof(struct kvm_vcpu, arch.trampoline_lowmem));
DEFINE(VCPU_TRAMPOLINE_ENTER, offsetof(struct kvm_vcpu, arch.trampoline_enter));
DEFINE(VCPU_HIGHMEM_HANDLER, offsetof(struct kvm_vcpu, arch.highmem_handler));
+   DEFINE(VCPU_RMCALL, offsetof(struct kvm_vcpu, arch.rmcall));
DEFINE(VCPU_HFLAGS, offsetof(struct kvm_vcpu, arch.hflags));
DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr));
 #else
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 3e06eae..1317392 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -919,6 +919,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
vcpu->arch.trampoline_lowmem = kvmppc_trampoline_lowmem;
vcpu->arch.trampoline_enter = kvmppc_trampoline_enter;
vcpu->arch.highmem_handler = (ulong)kvmppc_handler_highmem;
+   vcpu->arch.rmcall = *(ulong*)kvmppc_rmcall;
 
vcpu->arch.shadow_msr = MSR_USER64;
 
diff --git a/arch/powerpc/kvm/book3s_64_exports.c b/arch/powerpc/kvm/book3s_64_exports.c
index 5b2db38..99b0712 100644
--- a/arch/powerpc/kvm/book3s_64_exports.c
+++ b/arch/powerpc/kvm/book3s_64_exports.c
@@ -22,3 +22,4 @@
 
 EXPORT_SYMBOL_GPL(kvmppc_trampoline_enter);
 EXPORT_SYMBOL_GPL(kvmppc_trampoline_lowmem);
+EXPORT_SYMBOL_GPL(kvmppc_rmcall);
diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
index 3c0ba55..33aef53 100644
--- a/arch/powerpc/kvm/book3s_64_interrupts.S
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -95,17 +95,14 @@ kvm_start_entry:
ld  r3,