[PATCH] KVM: update mmu documetation for role.nxe.

2010-05-11 Thread Gui Jianfeng
There's no member cr4_nxe in struct kvm_mmu_page_role, it names nxe now.
Update mmu document.

Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
---
 Documentation/kvm/mmu.txt |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index 0cc28fb..fde7989 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -161,7 +161,7 @@ Shadow pages contain the following information:
   role.cr4_pae:
 Contains the value of cr4.pae for which the page is valid (e.g. whether
 32-bit or 64-bit gptes are in use).
-  role.cr4_nxe:
+  role.nxe:
 Contains the value of efer.nxe for which the page is valid.
   gfn:
 Either the guest page table containing the translations shadowed by this
-- 
1.6.5.2


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] fix 80000001.EDX supported bit filtering

2010-05-11 Thread Gleb Natapov
On AMD some bits from 1.EDX are reported in 8001.EDX. The mask used
to copy bits from 1.EDX to 8001.EDX is incorrect resulting in
unsupported features passed into a guest.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 76c1adb..6463390 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -111,7 +111,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, 
uint32_t function, int reg)
  * so add missing bits according to the AMD spec:
  */
 cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, R_EDX);
-ret |= cpuid_1_edx  0xdfeff7ff;
+ret |= cpuid_1_edx  0x183f7ff;
 break;
 }
 break;
--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Avi Kivity

On 05/10/2010 08:25 PM, Anthony Liguori wrote:

On 05/10/2010 11:59 AM, Avi Kivity wrote:

On 05/10/2010 06:38 PM, Anthony Liguori wrote:


Otherwise, if the BAR is allocated during initialization, I would 
have

to use MAP_FIXED to mmap the memory.  This is what I did before the
qemu_ram_mmap() function was added.


What would happen to any data written to the BAR before the the 
handshake completed?  I think it would disappear.


You don't have to do MAP_FIXED.  You can allocate a ram area and map 
that in when disconnected.  When you connect, you create another ram 
area and memcpy() the previous ram area to the new one.  You then 
map the second ram area in.


But it's a shared memory area.  Other peers could have connected and 
written some data in.  The memcpy() would destroy their data.


Why try to attempt to support multi-master shared memory?  What's the 
use-case?


(presuming you mean multiple writers?)

This is a surprising take.  What's the use of a single master shared 
memory area?


Most uses of shared memory among processes or threads are multi-master.  
One use case can be a shared cache among the various guests.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Avi Kivity

On 05/10/2010 08:52 PM, Anthony Liguori wrote:

Why try to attempt to support multi-master shared memory?  What's the
use-case?

I don't see it as multi-master, but that the latest guest to join
shouldn't have their contents take precedence.  In developing this
patch, my motivation has been to let the guests decide.  If the memcpy
is always done, even when no data is written, a guest cannot join
without overwriting everything.

One use case we're looking at is having VMs using a map reduce
framework like Hadoop or Phoenix running in VMs.  However, if a
workqueue is stored or data transfer passes through shared memory, a
system can't scale up the number of workers because each new guest
will erase the shared memory (and the workqueue or in progress data
transfer).


(Replying again to list)

What data structure would you use?  For a lockless ring queue, you can 
only support a single producer and consumer.  To achieve bidirectional 
communication in virtio, we always use two queues.


You don't have to use a lockless ring queue.  You can use locks 
(spinlocks without interrupt support, full mutexes with interrupts) and 
any data structure you like.  Say a hash table + LRU for a shared cache.




If you're adding additional queues to support other levels of 
communication, you can always use different areas of shared memory.


You'll need O(n^2) shared memory areas (n=peer count), and it is a lot 
less flexible that real shared memory.  Consider using threading where 
the only communication among threads is a pipe (erlang?)



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] Support for booting from virtio disks

2010-05-11 Thread Avi Kivity

On 05/10/2010 06:48 PM, Anthony Liguori wrote:

On 05/10/2010 03:11 AM, Gleb Natapov wrote:

This patch adds native support for booting from virtio disks to Seabios.

Signed-off-by: Gleb Natapovg...@redhat.com


A related problem that I think we need to think about how we solve is 
indicating to Seabios which device we want to boot from


With your patch, a user can select a virtio device explicitly or if 
they use only one virtio device, it will Just Work.


However, if a user uses IDE and virtio, or a user has multiple disks, 
they cannot select a device via -boot.


Is this something we need to address?  I don't think we'd break 
libvirt if we didn't.


BIOSes traditionally address this by storing the boot order in RTC 
non-volatile memory, and allow the user to configure the order via a 
menu.  We could do the same (storing the RTC memory in a small disk image).


Alternatively we can seed the order from the command line (-boot 
id1,id2,id3 where id* are some qdev property attached to disks, this is 
more flexible than the current syntax I think).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Avi Kivity

On 05/11/2010 02:17 AM, Cam Macdonell wrote:

On Mon, May 10, 2010 at 5:59 AM, Avi Kivitya...@redhat.com  wrote:
   

On 04/21/2010 08:53 PM, Cam Macdonell wrote:
 

Support an inter-vm shared memory device that maps a shared-memory object
as a
PCI device in the guest.  This patch also supports interrupts between
guest by
communicating over a unix domain socket.  This patch applies to the
qemu-kvm
repository.

 -device ivshmem,size=size in format accepted by -m[,shm=shm name]

Interrupts are supported between multiple VMs by using a shared memory
server
by using a chardev socket.

 -device ivshmem,size=size in format accepted by -m[,shm=shm name]
 [,chardev=id][,msi=on][,irqfd=on][,vectors=n]
 -chardev socket,path=path,id=id

(shared memory server is qemu.git/contrib/ivshmem-server)

Sample programs and init scripts are in a git repo here:


+typedef struct EventfdEntry {
+PCIDevice *pdev;
+int vector;
+} EventfdEntry;
+
+typedef struct IVShmemState {
+PCIDevice dev;
+uint32_t intrmask;
+uint32_t intrstatus;
+uint32_t doorbell;
+
+CharDriverState * chr;
+CharDriverState ** eventfd_chr;
+int ivshmem_mmio_io_addr;
+
+pcibus_t mmio_addr;
+unsigned long ivshmem_offset;
+uint64_t ivshmem_size; /* size of shared memory region */
+int shm_fd; /* shared memory file descriptor */
+
+int nr_allocated_vms;
+/* array of eventfds for each guest */
+int ** eventfds;
+/* keep track of # of eventfds for each guest*/
+int * eventfds_posn_count;

   

More readable:

  typedef struct Peer {
  int nb_eventfds;
  int *eventfds;
  } Peer;
  int nb_peers;
  Peer *peers;

Does eventfd_chr need to be there as well?
 

No it does not, eventfd_chr store character devices for receiving
interrupts when irqfd is not available, so we only them for this
guest, not for our peers.
   


Ok.


Does this need to be part of the state?
 

They are because they're passed in as qdev properties from the
command-line so I thought they needed to be in the state struct to be
assigned via DEFINE_PROP_...
   


Well I'm not q-ualified to comment on qdev, so I'm fine either way with 
this.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 5/8] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID

2010-05-11 Thread Avi Kivity

On 05/06/2010 12:27 AM, Glauber Costa wrote:

Right now, we were using individual KVM_CAP entities to communicate
userspace about which cpuids we support. This is suboptimal, since it
generates a delay between the feature arriving in the host, and
being available at the guest.

A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID.
This makes userspace automatically aware of what we provide. And if we
ever add a new cpuid bit in the future, we have to do that again,
which create some complexity and delay in feature adoption.


--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1971,6 +1971,23 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, 
u32 function,
}
break;
}
+   case KVM_CPUID_SIGNATURE: {
+   char signature[12] = KVMKVMKVM\0\0;
+   u32 *sigptr = (u32 *)signature;
+   entry-eax = 1;
   


Don't understand where the value for eax comes from.  qemu-kvm-x86.c has 0.


@@ -2017,6 +2034,19 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct 
kvm_cpuid2 *cpuid,
for (func = 0x8001; func= limit  nent  cpuid-nent; ++func)
do_cpuid_ent(cpuid_entries[nent], func, 0,
nent, cpuid-nent);
+
+   
+
+   r = -E2BIG;
+   if (nent= cpuid-nent)
+   goto out_free;
+
+   do_cpuid_ent(cpuid_entries[nent], KVM_CPUID_SIGNATURE, 0,nent,
+cpuid-nent);
   


Need a check here too, or the next call can overflow.  Would have been 
better to make do_cpuid_ent() check and return an error.



+
+   do_cpuid_ent(cpuid_entries[nent], KVM_CPUID_FEATURES, 0,nent,
+cpuid-nent);
+
r = -E2BIG;
if (nent= cpuid-nent)
goto out_free;
   



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/8] pvclock fixes - v3

2010-05-11 Thread Avi Kivity

On 05/06/2010 12:27 AM, Glauber Costa wrote:

Hello,

Avi, this version fixes the issues you raised in the last
review. Hopte it is better now.

   


Sorry about the late review.  Looks reasonable, apart from a couple of 
minor points.



A cpuid.txt wil follow as soon as we're set on everything, so
I am not including it yet.
   


Pity, for a reviewer it is a lot easier to review a spec, then the 
code's conformity to the spec, instead of having to reverse engineer the 
intent from the code.  Please include it next time.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/2] fix kvmclock bug - memory corruption (v2)

2010-05-11 Thread Avi Kivity

On 05/05/2010 10:19 PM, Glauber Costa wrote:

Hi,

Avi, in this patch, I am resetting all msrs upon CPU reset.
Hope it is better. Patch 1 is yet another cleanup.
   


Looks good.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 00/10] Redirct and make use of the guest serial console

2010-05-11 Thread Jason Wang
The guest console is useful for failure troubleshooting especially for
the one who has calltrace. And as we plan to push the network related
test in the next few weeks, we found the serial session in more
reliable during the network testing. So this patchset logs the guest
serial throught the redirectied serial of guest and also enable the
ability to log into guest through serial console. I only open the
serial console for linux, I would do some investigation on windows
guests. 

Change from v1:

- Coding style improvement according to the suggestions from Michael Goldish
- Improve the username sending handling in remote_login()
- Change the matching re of login to [Ll]ogin:\s*$
- Check whether vm have already dead in dumpping thread
- Return none rather than raise exception when met unknown shell_client
- Keep tty0 for all linux guests
- Enable the serial console in unattended installation
- Add a helper to check whether the panic information was occured 
- Keep the porcess() at its original location in preprocess()

---

Jason Wang (10):
  KVM test: Introduce prompt assist
  KVM test: Send the username in remote_login()
  KVM test: Make the login re suitable for serial console
  KVM test: Redirect the serial to the unix domain socket
  KVM test: Log the content from guest serial console
  KVM test: Return none when met unknown type in kvm_vm.remote_login().
  KVM test: Introduce local_login()
  KVM test: Enable the serial console for all linux guests
  KVM test: Enable the serial console during unattended installation
  KVM test: Add a helper to search the panic in the log


 client/tests/kvm/kvm_preprocessing.py|   59 ++
 client/tests/kvm/kvm_utils.py|   32 ++
 client/tests/kvm/kvm_vm.py   |   49 ++
 client/tests/kvm/scripts/check_serial.py |   41 ++
 client/tests/kvm/tests_base.cfg.sample   |   16 ---
 client/tests/kvm/unattended/Fedora-10.ks |2 -
 client/tests/kvm/unattended/Fedora-11.ks |2 -
 client/tests/kvm/unattended/Fedora-12.ks |2 -
 client/tests/kvm/unattended/Fedora-8.ks  |2 -
 client/tests/kvm/unattended/Fedora-9.ks  |2 -
 client/tests/kvm/unattended/OpenSUSE-11.xml  |1 
 client/tests/kvm/unattended/RHEL-3-series.ks |2 -
 client/tests/kvm/unattended/RHEL-4-series.ks |2 -
 client/tests/kvm/unattended/RHEL-5-series.ks |2 -
 client/tests/kvm/unattended/SLES-11.xml  |1 
 15 files changed, 183 insertions(+), 32 deletions(-)
 create mode 100644 client/tests/kvm/scripts/check_serial.py

-- 
Signature
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 01/10] KVM test: Introduce prompt assist

2010-05-11 Thread Jason Wang
We need to send an assist string to a session in order to get the
prompt when re-connecting to session through serial. This patch sends
assist string before matching the prompt in remote_login().

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_utils.py |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 25f3c8c..11f2b1a 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -451,7 +451,8 @@ def check_kvm_source_dir(source_dir):
 # The following are functions used for SSH, SCP and Telnet communication with
 # guests.
 
-def remote_login(command, password, prompt, linesep=\n, timeout=10):
+def remote_login(command, password, prompt, linesep=\n, timeout=10,
+ prompt_assist=None):
 
 Log into a remote host (guest) using SSH or Telnet. Run the given command
 using kvm_spawn and provide answers to the questions asked. If timeout
@@ -468,7 +469,8 @@ def remote_login(command, password, prompt, linesep=\n, 
timeout=10):
 @param timeout: The maximal time duration (in seconds) to wait for each
 step of the login procedure (i.e. the Are you sure prompt, the
 password prompt, the shell prompt, etc)
-
+@param prompt_assist: An assistant string sent before the pattern
+matching in order to get the prompt for some kinds of shell_client.
 @return Return the kvm_spawn object on success and None on failure.
 
 sub = kvm_subprocess.kvm_shell_session(command,
@@ -479,6 +481,9 @@ def remote_login(command, password, prompt, linesep=\n, 
timeout=10):
 
 logging.debug(Trying to login with command '%s' % command)
 
+if prompt_assist is not None:
+sub.sendline(prompt_assist)
+
 while True:
 (match, text) = sub.read_until_last_line_matches(
 [r[Aa]re you sure, r[Pp]assword:\s*$, r^\s*[Ll]ogin:\s*$,

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 02/10] KVM test: Send the username in remote_login()

2010-05-11 Thread Jason Wang
In order to let the serial console work, we must let the
remote_login() send the username when met the username prompt. This
patch fails the progress if if it met the username prompt twice.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_utils.py |   23 ---
 1 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 11f2b1a..2800fae 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -451,7 +451,7 @@ def check_kvm_source_dir(source_dir):
 # The following are functions used for SSH, SCP and Telnet communication with
 # guests.
 
-def remote_login(command, password, prompt, linesep=\n, timeout=10,
+def remote_login(command, username, password, prompt, linesep=\n, timeout=10,
  prompt_assist=None):
 
 Log into a remote host (guest) using SSH or Telnet. Run the given command
@@ -462,6 +462,7 @@ def remote_login(command, password, prompt, linesep=\n, 
timeout=10,
 @brief: Log into a remote host (guest) using SSH or Telnet.
 
 @param command: The command to execute (e.g. ssh r...@localhost)
+@param username: The username to send in reply to a username prompt
 @param password: The password to send in reply to a password prompt
 @param prompt: The shell prompt that indicates a successful login
 @param linesep: The line separator to send instead of \\n
@@ -471,6 +472,7 @@ def remote_login(command, password, prompt, linesep=\n, 
timeout=10,
 password prompt, the shell prompt, etc)
 @param prompt_assist: An assistant string sent before the pattern
 matching in order to get the prompt for some kinds of shell_client.
+
 @return Return the kvm_spawn object on success and None on failure.
 
 sub = kvm_subprocess.kvm_shell_session(command,
@@ -478,6 +480,7 @@ def remote_login(command, password, prompt, linesep=\n, 
timeout=10,
prompt=prompt)
 
 password_prompt_count = 0
+login_prompt_count = 0
 
 logging.debug(Trying to login with command '%s' % command)
 
@@ -505,9 +508,15 @@ def remote_login(command, password, prompt, linesep=\n, 
timeout=10,
 sub.close()
 return None
 elif match == 2:  # login:
-logging.debug(Got unexpected login prompt)
-sub.close()
-return None
+if login_prompt_count == 0:
+logging.debug(Got username prompt; sending '%s' % username)
+sub.sendline(username)
+login_prompt_count += 1
+continue
+else:
+logging.debug(Got username prompt again)
+sub.close()
+return None
 elif match == 3:  # Connection closed
 logging.debug(Got 'Connection closed')
 sub.close()
@@ -644,7 +653,7 @@ def ssh(host, port, username, password, prompt, 
linesep=\n, timeout=10):
 command = (ssh -o UserKnownHostsFile=/dev/null 
-o PreferredAuthentications=password -p %s %...@%s %
(port, username, host))
-return remote_login(command, password, prompt, linesep, timeout)
+return remote_login(command, username, password, prompt, linesep, timeout)
 
 
 def telnet(host, port, username, password, prompt, linesep=\n, timeout=10):
@@ -661,7 +670,7 @@ def telnet(host, port, username, password, prompt, 
linesep=\n, timeout=10):
 @return: kvm_spawn object on success and None on failure.
 
 command = telnet -l %s %s %s % (username, host, port)
-return remote_login(command, password, prompt, linesep, timeout)
+return remote_login(command, username, password, prompt, linesep, timeout)
 
 
 def netcat(host, port, username, password, prompt, linesep=\n, timeout=10):
@@ -678,7 +687,7 @@ def netcat(host, port, username, password, prompt, 
linesep=\n, timeout=10):
 @return: kvm_spawn object on success and None on failure.
 
 command = nc %s %s % (host, port)
-return remote_login(command, password, prompt, linesep, timeout)
+return remote_login(command, username, password, prompt, linesep, timeout)
 
 
 # The following are utility functions related to ports.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 03/10] KVM test: Make the login re suitable for serial console

2010-05-11 Thread Jason Wang
Current matching re ^\s*[Ll]ogin:\s*$ is not suitable for the serial
console, so change it to [Ll]ogin:\s*$.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_utils.py |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_utils.py b/client/tests/kvm/kvm_utils.py
index 2800fae..5cb3efb 100644
--- a/client/tests/kvm/kvm_utils.py
+++ b/client/tests/kvm/kvm_utils.py
@@ -489,7 +489,7 @@ def remote_login(command, username, password, prompt, 
linesep=\n, timeout=10,
 
 while True:
 (match, text) = sub.read_until_last_line_matches(
-[r[Aa]re you sure, r[Pp]assword:\s*$, r^\s*[Ll]ogin:\s*$,
+[r[Aa]re you sure, r[Pp]assword:\s*$, r[Ll]ogin:\s*$,
  r[Cc]onnection.*closed, r[Cc]onnection.*refused,
  r[Pp]lease wait, prompt],
  timeout=timeout, internal_timeout=0.5)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 04/10] KVM test: Redirect the serial to the unix domain socket

2010-05-11 Thread Jason Wang
This patch redirect the guest serial to the unix domain socket which
would be used by the following patches to dump its content or use it
as a session.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_vm.py |   19 ---
 1 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 6bc7987..8f4753f 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -116,17 +116,20 @@ class VM:
 self.address_cache = address_cache
 self.pci_assignable = None
 
-# Find available monitor filename
+# Find available filenames for monitor and guest serial redirection
 while True:
-# The monitor filename should be unique
+# The filenames should be unique
 self.instance = (time.strftime(%Y%m%d-%H%M%S-) +
  kvm_utils.generate_random_string(4))
-self.monitor_file_name = os.path.join(/tmp,
-  monitor- + self.instance)
-if not os.path.exists(self.monitor_file_name):
-break
-
 
+names = [os.path.join(/tmp, type + self.instance) for type in
+ monitor-, serial-]
+if True in [os.path.exists(file) for file in names]:
+continue
+else:
+[self.monitor_file_name, self.serial_file_name] = names
+break
+ 
 def clone(self, name=None, params=None, root_dir=None, address_cache=None):
 
 Return a clone of the VM object with optionally modified parameters.
@@ -316,6 +319,8 @@ class VM:
 for pci_id in self.pa_pci_ids:
 qemu_cmd +=  -pcidevice host=%s % pci_id
 
+qemu_cmd +=  -serial unix:%s,server,nowait % self.serial_file_name
+
 return qemu_cmd
 
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 05/10] KVM test: Log the content from guest serial console

2010-05-11 Thread Jason Wang
This patch tries to get the content of guest serial and log it into
the debug directoy of the testcase through a dedicated thread which is
created in the preprocessing and ended in the postprocessing. The
params of serial_mode must be set to dump in order to make use of
this feature.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_preprocessing.py  |   59 +++-
 client/tests/kvm/tests_base.cfg.sample |1 +
 2 files changed, 59 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_preprocessing.py 
b/client/tests/kvm/kvm_preprocessing.py
index 4b9290c..0a33d24 100644
--- a/client/tests/kvm/kvm_preprocessing.py
+++ b/client/tests/kvm/kvm_preprocessing.py
@@ -1,4 +1,5 @@
 import sys, os, time, commands, re, logging, signal, glob, threading, shutil
+import socket, select
 from autotest_lib.client.bin import test, utils
 from autotest_lib.client.common_lib import error
 import kvm_vm, kvm_utils, kvm_subprocess, ppm_utils
@@ -13,7 +14,8 @@ except ImportError:
 
 _screendump_thread = None
 _screendump_thread_termination_event = None
-
+_serialdump_thread = None
+_serialdump_thread_termination_event = None
 
 def preprocess_image(test, params):
 
@@ -267,6 +269,16 @@ def preprocess(test, params, env):
   args=(test, params, env))
 _screendump_thread.start()
 
+# Start the serial dump thread
+if params.get(serial_mode) == dump:
+logging.debug(Starting serialdump thread)
+global _serialdump_thread, _serialdump_thread_termination_event
+_serialdump_thread_termination_event = threading.Event()
+_serialdump_thread = threading.Thread(target=_dump_serial_console,
+  args=(test, params, env))
+_serialdump_thread.start()
+
+
 
 def postprocess(test, params, env):
 
@@ -286,6 +298,13 @@ def postprocess(test, params, env):
 _screendump_thread_termination_event.set()
 _screendump_thread.join(10)
 
+# Terminate the serialdump thread
+global _serialdump_thread, _serialdump_thread_termination_event
+if _serialdump_thread:
+logging.debug(Terminating serialdump thread...)
+_serialdump_thread_termination_event.set()
+_serialdump_thread.join(10)
+
 # Warn about corrupt PPM files
 for f in glob.glob(os.path.join(test.debugdir, *.ppm)):
 if not ppm_utils.image_verify_ppm_file(f):
@@ -450,3 +469,41 @@ def _take_screendumps(test, params, env):
 if _screendump_thread_termination_event.isSet():
 break
 _screendump_thread_termination_event.wait(delay)
+
+def _dump_serial_console(test, params, env):
+global _serialdump_thread_termination_event
+rs = []
+files = {}
+
+while True:
+for vm in kvm_utils.env_get_all_vms(env):
+if not vm in files and not vm.is_dead():
+try:
+serial_socket = socket.socket(socket.AF_UNIX,
+  socket.SOCK_STREAM)
+serial_socket.setblocking(False)
+serial_socket.connect(vm.serial_file_name)
+except:
+logging.debug(Could not connect to serial socket for %s %
+  vm.name)
+continue
+rs.append(serial_socket)
+serial_dump_filename = os.path.join(test.debugdir,
+serial-%s % vm.name)
+files[vm] = [serial_socket, file(serial_dump_filename, a+)]
+
+r, w, x = select.select(rs, [], [], 0.5)
+for vm in files.keys():
+(s ,d) = files[vm]
+if s in r:
+data = s.recv(16384)
+if not data:
+rs.remove(s)
+files.pop(vm)
+else:
+d.write(data)
+
+if _serialdump_thread_termination_event.isSet():
+break
+
+
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 9f82ffb..169a69e 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -13,6 +13,7 @@ start_vm = yes
 kill_vm = no
 kill_vm_gracefully = yes
 kill_unresponsive_vms = yes
+serial_mode = dump
 
 # Screendump specific stuff
 convert_ppm_files_to_png_on_error = yes

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 06/10] KVM test: Return none when met unknown type in kvm_vm.remote_login().

2010-05-11 Thread Jason Wang
None is returned when met the unknown type of shell_client in
kvm_vm.remote_login() in order to avoid the traceback.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_vm.py |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index 8f4753f..cfebfc1 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -806,7 +806,10 @@ class VM:
 elif client == nc:
 session = kvm_utils.netcat(address, port, username, password,
prompt, linesep, timeout)
-
+else:
+logging.error(Unknown shell_client type %s % client)
+return None
+
 if session:
 session.set_status_test_command(self.params.get(status_test_
 command, ))

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 07/10] KVM test: Introduce local_login()

2010-05-11 Thread Jason Wang
This patch introduces a new method which is used to log into the guest
through the guest serial console. The serial_mode must be set to
session in order to make use of this patch. Serial does not support
cocurrent sessions, so it doest not aim at replacing the regular
remote shell servers. But it would be useful for the network related
test which may cause the network unresponsive such as driver
load/unload or some kinds of network stress testing.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/kvm_vm.py |   25 +
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/kvm_vm.py b/client/tests/kvm/kvm_vm.py
index cfebfc1..b7151c5 100755
--- a/client/tests/kvm/kvm_vm.py
+++ b/client/tests/kvm/kvm_vm.py
@@ -815,7 +815,32 @@ class VM:
 command, ))
 return session
 
+def local_login(self, timeout=240):
+
+Log into the guest via serial console
+If timeout expires while waiting for output from the guest (e.g. a
+password prompt or a shell prompt) -- fail.
+
+
+serial_mode = self.params.get(serial_mode)
+username = self.params.get(username, )
+password = self.params.get(password, )
+prompt = self.params.get(shell_prompt, [\#\$])
+linesep = eval('%s' % self.params.get(shell_linesep, r\n))
 
+if serial_mode != session:
+logging.debug(serial_mode is not session)
+return None
+else:
+command = nc -U %s  % self.serial_file_name
+assist = self.params.get(prompt_assist, )
+session = kvm_utils.remote_login(command, username, password, 
prompt,
+ linesep, timeout, assist)
+if session:
+session.set_status_test_command(self.params.get(status_test_
+command, ))
+return session
+
 def copy_files_to(self, local_path, remote_path, nic_index=0, timeout=300):
 
 Transfer files to the guest.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 08/10] KVM test: Enable the serial console for all linux guests

2010-05-11 Thread Jason Wang
As we have the ability to dump the content from serial console or use
a session through it, we need to redirect the console to serial
through unattended files to make use of it. The patch also keep the
tty0 accroding to the suggestion of Michael Goldish.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/unattended/Fedora-10.ks |2 +-
 client/tests/kvm/unattended/Fedora-11.ks |2 +-
 client/tests/kvm/unattended/Fedora-12.ks |2 +-
 client/tests/kvm/unattended/Fedora-8.ks  |2 +-
 client/tests/kvm/unattended/Fedora-9.ks  |2 +-
 client/tests/kvm/unattended/OpenSUSE-11.xml  |1 +
 client/tests/kvm/unattended/RHEL-3-series.ks |2 +-
 client/tests/kvm/unattended/RHEL-4-series.ks |2 +-
 client/tests/kvm/unattended/RHEL-5-series.ks |2 +-
 client/tests/kvm/unattended/SLES-11.xml  |1 +
 10 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/client/tests/kvm/unattended/Fedora-10.ks 
b/client/tests/kvm/unattended/Fedora-10.ks
index 61e59d7..8da9eec 100644
--- a/client/tests/kvm/unattended/Fedora-10.ks
+++ b/client/tests/kvm/unattended/Fedora-10.ks
@@ -11,7 +11,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200 console=tty0
 zerombr
 clearpart --all --initlabel
 autopart
diff --git a/client/tests/kvm/unattended/Fedora-11.ks 
b/client/tests/kvm/unattended/Fedora-11.ks
index 0be7d06..ef978e8 100644
--- a/client/tests/kvm/unattended/Fedora-11.ks
+++ b/client/tests/kvm/unattended/Fedora-11.ks
@@ -10,7 +10,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200 console=tty0
 zerombr
 
 clearpart --all --initlabel
diff --git a/client/tests/kvm/unattended/Fedora-12.ks 
b/client/tests/kvm/unattended/Fedora-12.ks
index 0be7d06..ef978e8 100644
--- a/client/tests/kvm/unattended/Fedora-12.ks
+++ b/client/tests/kvm/unattended/Fedora-12.ks
@@ -10,7 +10,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200 console=tty0
 zerombr
 
 clearpart --all --initlabel
diff --git a/client/tests/kvm/unattended/Fedora-8.ks 
b/client/tests/kvm/unattended/Fedora-8.ks
index f4a872d..e9a4c53 100644
--- a/client/tests/kvm/unattended/Fedora-8.ks
+++ b/client/tests/kvm/unattended/Fedora-8.ks
@@ -11,7 +11,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200 console=tty0
 zerombr
 clearpart --all --initlabel
 autopart
diff --git a/client/tests/kvm/unattended/Fedora-9.ks 
b/client/tests/kvm/unattended/Fedora-9.ks
index f4a872d..e9a4c53 100644
--- a/client/tests/kvm/unattended/Fedora-9.ks
+++ b/client/tests/kvm/unattended/Fedora-9.ks
@@ -11,7 +11,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200 console=tty0
 zerombr
 clearpart --all --initlabel
 autopart
diff --git a/client/tests/kvm/unattended/OpenSUSE-11.xml 
b/client/tests/kvm/unattended/OpenSUSE-11.xml
index 7dd44fa..64140bf 100644
--- a/client/tests/kvm/unattended/OpenSUSE-11.xml
+++ b/client/tests/kvm/unattended/OpenSUSE-11.xml
@@ -50,6 +50,7 @@
 moduleedd/module
   /initrd_module
 /initrd_modules
+appendconsole=ttyS0,115200 console=tty0/append
 loader_typegrub/loader_type
 sections config:type=list/
   /bootloader
diff --git a/client/tests/kvm/unattended/RHEL-3-series.ks 
b/client/tests/kvm/unattended/RHEL-3-series.ks
index 884b386..dab32f5 100644
--- a/client/tests/kvm/unattended/RHEL-3-series.ks
+++ b/client/tests/kvm/unattended/RHEL-3-series.ks
@@ -10,7 +10,7 @@ rootpw 123456
 firewall --enabled --ssh
 timezone America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200 console=tty0
 clearpart --all --initlabel
 autopart
 reboot
diff --git a/client/tests/kvm/unattended/RHEL-4-series.ks 
b/client/tests/kvm/unattended/RHEL-4-series.ks
index ce4a430..4816aa8 100644
--- a/client/tests/kvm/unattended/RHEL-4-series.ks
+++ b/client/tests/kvm/unattended/RHEL-4-series.ks
@@ -11,7 +11,7 @@ firewall --enabled --ssh
 selinux --enforcing
 timezone --utc America/New_York
 firstboot --disable
-bootloader --location=mbr
+bootloader --location=mbr --append=console=ttyS0,115200 console=tty0
 zerombr
 clearpart --all --initlabel
 autopart
diff --git a/client/tests/kvm/unattended/RHEL-5-series.ks 
b/client/tests/kvm/unattended/RHEL-5-series.ks
index f4a872d..e9a4c53 100644
--- a/client/tests/kvm/unattended/RHEL-5-series.ks
+++ b/client/tests/kvm/unattended/RHEL-5-series.ks

[PATCH v2 09/10] KVM test: Enable the serial console during unattended installation

2010-05-11 Thread Jason Wang
This patch enable the serial console during unattended installation
for all linux guests.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/tests_base.cfg.sample |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 169a69e..3c0933e 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -452,7 +452,7 @@ variants:
 pxe_initrd = initrd.img
 tftp = images/tftpboot
 extra_params +=  -bootp /pxelinux.0 -boot cn
-kernel_args = ks=floppy nicdelay=60
+kernel_args = ks=floppy nicdelay=60 console=ttyS0,115200 
console=tty0
 
 variants:
 - 8.32:
@@ -592,7 +592,7 @@ variants:
 pxe_initrd = initrd
 tftp = images/tftpboot
 extra_params +=  -bootp /pxelinux.0 -boot cn
-kernel_args = autoyast=floppy
+kernel_args = autoyast=floppy console=ttyS0,115200 
console=tty0
 post_install_delay = 10
 
 variants:
@@ -674,7 +674,7 @@ variants:
 pxe_image = linux
 pxe_initrd = initrd
 extra_params +=  -bootp /pxelinux.0 -boot n
-kernel_args = autoyast=floppy
+kernel_args = autoyast=floppy console=ttyS0,115200 
console=tty0
 post_install_delay = 10
 
 variants:
@@ -733,7 +733,7 @@ variants:
 pxe_initrd = initrd.img
 tftp = images/tftpboot
 extra_params +=  -bootp /pxelinux.0 -boot cn
-kernel_args = ks=floppy nicdelay=60
+kernel_args = ks=floppy nicdelay=60 console=ttyS0,115200 
console=tty0
 
 variants:
 - 3.9.i386:

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 10/10] KVM test: Add a helper to search the panic in the log

2010-05-11 Thread Jason Wang
This checker serves as the post_command to find the panic information
in the file which contains the content of guest serial console.

Signed-off-by: Jason Wang jasow...@redhat.com
---
 client/tests/kvm/scripts/check_serial.py |   41 ++
 client/tests/kvm/tests_base.cfg.sample   |7 -
 2 files changed, 46 insertions(+), 2 deletions(-)
 create mode 100644 client/tests/kvm/scripts/check_serial.py

diff --git a/client/tests/kvm/scripts/check_serial.py 
b/client/tests/kvm/scripts/check_serial.py
new file mode 100644
index 000..969bbe3
--- /dev/null
+++ b/client/tests/kvm/scripts/check_serial.py
@@ -0,0 +1,41 @@
+import os, sys, glob, re
+
+
+class SerialCheckerError(Exception):
+
+Simple wrapper for the builtin Exception class.
+
+pass
+
+
+class SerialChecker(object):
+
+Serach the panic or other keywords in the guest serial.
+
+def __init__(self):
+
+Gets params from environment variables and sets class attributes.
+
+client_dir =  os.environ['AUTODIR']
+self.pattern = os.environ['KVM_TEST_search_pattern']
+self.shortname = os.environ['KVM_TEST_shortname']
+self.debugdir = os.path.join(client_dir, 
results/default/kvm.%s/debug \
+ % self.shortname)
+self.serial_files = glob.glob(os.path.join(self.debugdir, 'serial*'))
+
+
+def check(self):
+
+Check whether the pattern were found in the serial files
+
+fail = [ f for f in self.serial_files if
+ re.findall(self.pattern, file(f).read(), re.I) ]
+if fail:
+print %s is found in %s % (self.pattern, fail)
+raise SerialCheckerError(Error found during the check, Please \
+  check the log)
+
+
+if __name__ == __main__:
+checker = SerialChecker()
+checker.check()
diff --git a/client/tests/kvm/tests_base.cfg.sample 
b/client/tests/kvm/tests_base.cfg.sample
index 3c0933e..3ac8f0d 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -52,6 +52,10 @@ address_index = 0
 # Misc
 profilers = kvm_stat
 
+# pattern serach in guest serial console
+serach_pattern = panic
+post_command = python scripts/check_serial.py
+post_command_noncritical = no
 
 # Tests
 variants:
@@ -1314,10 +1318,9 @@ virtio|virtio_blk|e1000|balloon_check:
 variants:
 - @qcow2:
 image_format = qcow2
-post_command =  python scripts/check_image.py;
+post_command =  python scripts/check_image.py; python 
scripts/check_serial.py
 remove_image = no
 post_command_timeout = 600
-post_command_noncritical = yes
 - vmdk:
 only Fedora Ubuntu Windows
 only smp2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] Support for booting from virtio disks

2010-05-11 Thread Stefan Hajnoczi
From what I can tell SeaBIOS is reading CMOS_BIOS_BOOTFLAG1 and
CMOS_BIOS_BOOTFLAG2 from non-volatile memory.  The values index into
bev[], which contains IPL entries (the drives).

Is the order of bev[] entries well-defined?  Is there a way for QEMU
command-line to know that the first virtio-blk device corresponds to x
and the IDE CD-ROM corresponds to y?

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4 v3] KVM: VMX: VMXON/VMXOFF usage changes.

2010-05-11 Thread Avi Kivity

On 05/07/2010 05:43 AM, Xu, Dongxiao wrote:

From: Dongxiao Xudongxiao...@intel.com

SDM suggests VMXON should be called before VMPTRLD, and VMXOFF
should be called after doing VMCLEAR.

Therefore in vmm coexistence case, we should firstly call VMXON
before any VMCS operation, and then call VMXOFF after the
operation is done.


@@ -4118,7 +4143,10 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
struct kvm_cpuid_entry2 *best;
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 exec_control;
+   int cpu;

+   cpu = get_cpu();
+   vmx_vcpu_load(vmx-vcpu, cpu);
vmx-rdtscp_enabled = false;
if (vmx_rdtscp_supported()) {
exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
@@ -4133,6 +4161,9 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
}
}
}
+   vmx_vcpu_put(vmx-vcpu);
+   put_cpu();
+
  }

  static void vmx_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
   


I'm not sure why this is needed.  vmx_cpuid_update() is called from a 
vcpu ioctl which should have called vcpu_load() before.


Apart from that, everything looks good for merging.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] VMX: Invalid guest state detection enhancements and bug fixes

2010-05-11 Thread Avi Kivity

On 05/10/2010 06:51 PM, Mohammed Gamal wrote:

- Correct unusable flag check on SS, DS, ES, FS, GS, and LDTR
- Add rflags checks
- Report failed instruction on emulation failure

   


Please post as separate patches.


diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 777e00d..968384b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2122,7 +2122,7 @@ static bool stack_segment_valid(struct kvm_vcpu *vcpu)
ss_rpl = ss.selector  SELECTOR_RPL_MASK;

if (ss.unusable)
-   return true;
+   return false;
   


Where does it say that ss must be usable?


if (ss.type != 3  ss.type != 7)
return false;
if (!ss.s)
@@ -2144,7 +2144,7 @@ static bool data_segment_valid(struct kvm_vcpu *vcpu, int 
seg)
rpl = var.selector  SELECTOR_RPL_MASK;

if (var.unusable)
-   return true;
+   return false;
if (!var.s)
return false;
if (!var.present)
   


Ditto.


@@ -2185,7 +2185,7 @@ static bool ldtr_valid(struct kvm_vcpu *vcpu)
vmx_get_segment(vcpu,ldtr, VCPU_SREG_LDTR);

if (ldtr.unusable)
-   return true;
+   return false;
if (ldtr.selector  SELECTOR_TI_MASK)   /* TI = 1 */
return false;
if (ldtr.type != 2)
   


Ditto.


@@ -2207,6 +2207,27 @@ static bool cs_ss_rpl_check(struct kvm_vcpu *vcpu)
 (ss.selector  SELECTOR_RPL_MASK));
  }

+static bool rflags_valid(struct kvm_vcpu *vcpu)
+{
+   unsigned long rflags;
+   u32 entry_intr_info;
+
+   rflags = vmcs_readl(GUEST_RFLAGS);
+   entry_intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
+#ifdef CONFIG_X86_64
+   if (is_long_mode(vcpu))
+   if (rflags  X86_EFLAGS_VM)
+   return false;   
+#endif
   


This is architecturally illegal, not just for vmx entries.  The check 
should be made when emulating setting rflags and in KVM_SET_REGS.


These checks should assume the state is architecturally legal, and only 
check if they are legal for vmx entries.



+   if ((entry_intr_info  INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_EXT_INTR
+ (entry_intr_info  INTR_INFO_VALID_MASK)) {
+   if (!(rflags  X86_EFLAGS_IF))
+   return false;
+   }
+
   


Ditto.


@@ -3559,6 +3581,7 @@ static int handle_invalid_guest_state(struct kvm_vcpu 
*vcpu)
}

if (err != EMULATE_DONE) {
+   kvm_report_emulation_failure(vcpu, invalid guest state 
handler);
vcpu-run-exit_reason = KVM_EXIT_INTERNAL_ERROR;
vcpu-run-internal.suberror = 
KVM_INTERNAL_ERROR_EMULATION;
vcpu-run-internal.ndata = 0;
   


Uneeded, userspace can report it.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] pci: cleanly backout of pci_qdev_init()

2010-05-11 Thread Markus Armbruster
Alex Williamson alex.william...@redhat.com writes:

 If the init function of a device fails, as might happen with device
 assignment, we never undo the work done by do_pci_register_device().
 This not only causes a bit of a memory leak, but also leaves a bogus
 pointer in the bus devices array that can cause a segfault or
 garbage data from 'info pci'.

 Signed-off-by: Alex Williamson alex.william...@redhat.com

Looks good to me.  Gerd?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] pci: cleanly backout of pci_qdev_init()

2010-05-11 Thread Gerd Hoffmann

On 05/11/10 11:25, Markus Armbruster wrote:

Alex Williamsonalex.william...@redhat.com  writes:


If the init function of a device fails, as might happen with device
assignment, we never undo the work done by do_pci_register_device().
This not only causes a bit of a memory leak, but also leaves a bogus
pointer in the bus devices array that can cause a segfault or
garbage data from 'info pci'.

Signed-off-by: Alex Williamsonalex.william...@redhat.com


Looks good to me.  Gerd?


Yes, looks fine.

cheers,
  Gerd


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 4/4 v3] KVM: VMX: VMXON/VMXOFF usage changes.

2010-05-11 Thread Xu, Dongxiao
Avi Kivity wrote:
 On 05/07/2010 05:43 AM, Xu, Dongxiao wrote:
 From: Dongxiao Xudongxiao...@intel.com
 
 SDM suggests VMXON should be called before VMPTRLD, and VMXOFF
 should be called after doing VMCLEAR.
 
 Therefore in vmm coexistence case, we should firstly call VMXON
 before any VMCS operation, and then call VMXOFF after the
 operation is done.
 
 
 @@ -4118,7 +4143,10 @@ static void vmx_cpuid_update(struct kvm_vcpu
  *vcpu) struct kvm_cpuid_entry2 *best;
  struct vcpu_vmx *vmx = to_vmx(vcpu);
  u32 exec_control;
 +int cpu;
 
 +cpu = get_cpu();
 +vmx_vcpu_load(vmx-vcpu, cpu);
  vmx-rdtscp_enabled = false;
  if (vmx_rdtscp_supported()) {
  exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
 @@ -4133,6 +4161,9 @@ static void vmx_cpuid_update(struct kvm_vcpu
  *vcpu)  } }
  }
 +vmx_vcpu_put(vmx-vcpu);
 +put_cpu();
 +
   }
 
   static void vmx_set_supported_cpuid(u32 func, struct
 kvm_cpuid_entry2 *entry) 
 
 
 I'm not sure why this is needed.  vmx_cpuid_update() is called from a
 vcpu ioctl which should have called vcpu_load() before.
 
 Apart from that, everything looks good for merging.

Vcpu_load() and vcpu_put() is not called in that ioctl. I will add that and
send out another version of patchset.

Thanks,
Dongxiao

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4 v3] KVM: VMX: VMXON/VMXOFF usage changes.

2010-05-11 Thread Avi Kivity

On 05/11/2010 12:38 PM, Xu, Dongxiao wrote:



I'm not sure why this is needed.  vmx_cpuid_update() is called from a
vcpu ioctl which should have called vcpu_load() before.

Apart from that, everything looks good for merging.
 

Vcpu_load() and vcpu_put() is not called in that ioctl. I will add that and
send out another version of patchset.
   


That's a serious bug even before this patchset, since vmx_cpuid_update 
accesses the vmcs.  Please send the fix as a separate patch (which will 
be backported to -stable).


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 4/4 v3] KVM: VMX: VMXON/VMXOFF usage changes.

2010-05-11 Thread Xu, Dongxiao
Avi Kivity wrote:
 On 05/11/2010 12:38 PM, Xu, Dongxiao wrote:
 
 I'm not sure why this is needed.  vmx_cpuid_update() is called from
 a vcpu ioctl which should have called vcpu_load() before.
 
 Apart from that, everything looks good for merging.
 
 Vcpu_load() and vcpu_put() is not called in that ioctl. I will add
 that and send out another version of patchset.
 
 
 That's a serious bug even before this patchset, since vmx_cpuid_update
 accesses the vmcs.  Please send the fix as a separate patch (which
 will be backported to -stable).

That's OK. 

Thanks
Dongxiao--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space

2010-05-11 Thread Takuya Yoshikawa




In usual workload, the number of dirty pages varies a lot for each
iteration
and we should gain really a lot for relatively clean cases.


Can you post such a test, for an idle large guest?


OK, I'll do!



Result of low workload test (running top during migration) first,

4GB guest
picked up slots[1](len=3757047808) only
*
get.org get.optswitch.opt

1060875 310292 190335
1076754 301295 188600
 655504 318284 196029
 529769 301471325
 694796  70216 221172
 651868 353073 196184
 543339 312865 213236
1061938  72785 203090
 689527 323901 249519
 621364 323881473
1063671  70703 192958
 915903 336318 174008
1046462 332384782
1037942  72783 190655
 680122 318305 243544
 688156 314935 193526
 558658 265934 190550
 652454 372135 196270
 660140  68613352
1101947 378642 186575
.........
*

As expected we've got the difference more clearly.

In this case, switch.opt reduced 1/3 (.1 msec) compared to get.opt
for each iteration.

And when the slot is cleaner, the ratio is bigger.










--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86: Call vcpu_load and vcpu_put in cpuid_update.

2010-05-11 Thread Xu, Dongxiao
From: Dongxiao Xu dongxiao...@intel.com

cpuid_update may operate VMCS, so vcpu_load() and vcpu_put()
should be called to ensure correctness.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/x86.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..08edfc8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1773,6 +1773,7 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
if (copy_from_user(cpuid_entries, entries,
   cpuid-nent * sizeof(struct kvm_cpuid_entry)))
goto out_free;
+   vcpu_load(vcpu);
for (i = 0; i  cpuid-nent; i++) {
vcpu-arch.cpuid_entries[i].function = 
cpuid_entries[i].function;
vcpu-arch.cpuid_entries[i].eax = cpuid_entries[i].eax;
@@ -1790,6 +1791,7 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
r = 0;
kvm_apic_set_version(vcpu);
kvm_x86_ops-cpuid_update(vcpu);
+   vcpu_put(vcpu);
 
 out_free:
vfree(cpuid_entries);
@@ -1810,9 +1812,11 @@ static int kvm_vcpu_ioctl_set_cpuid2(struct kvm_vcpu 
*vcpu,
if (copy_from_user(vcpu-arch.cpuid_entries, entries,
   cpuid-nent * sizeof(struct kvm_cpuid_entry2)))
goto out;
+   vcpu_load(vcpu);
vcpu-arch.cpuid_nent = cpuid-nent;
kvm_apic_set_version(vcpu);
kvm_x86_ops-cpuid_update(vcpu);
+   vcpu_put(vcpu);
return 0;
 
 out:
-- 
1.6.3
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4 v4] KVM: VMX: Support hosted VMM coexsitence.

2010-05-11 Thread Xu, Dongxiao
Hi all,

This is hosted VMM coexistence support v4.

Main changes from v3:
Remove the change to vmx_cpuid_update() since vcpu_load() and
vcpu_put() will be added around cpuid_update by another patch.

Main changes from v2:
1) Change vmm_coexistence to vmm_exclusive.
2) Some code structure changes. Split the original 3 patches to 4.
3) Address some comments from Avi.

Main changes from v1:
1) Add an module option vmm_coexistence to decide whether to
enable this feature. Currently it is off defaultly.
2) Each time when a KVM vcpu is scheduled in, it will invalidate EPT
and VPID TLBs to avoid conflict between different VMMs.

VMX: Support for coexistence of KVM and other hosted VMMs. 

The following NOTE is picked up from Intel SDM 3B 27.3 chapter, 
MANAGING VMCS REGIONS AND POINTERS.

--
NOTE
As noted in Section 21.1, the processor may optimize VMX operation
by maintaining the state of an active VMCS (one for which VMPTRLD
has been executed) on the processor. Before relinquishing control to
other system software that may, without informing the VMM, remove
power from the processor (e.g., for transitions to S3 or S4) or leave
VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures
that all VMCS data cached by the processor are flushed to memory
and that no other software can corrupt the current VMM's VMCS data.
It is also recommended that the VMM execute VMXOFF after such
executions of VMCLEAR.
--

Currently, VMCLEAR is called at VCPU migration. To support hosted
VMM coexistence, this patch modifies the VMCLEAR/VMPTRLD and
VMXON/VMXOFF usages. VMCLEAR will be called when VCPU is
scheduled out of a physical CPU, while VMPTRLD is called when VCPU
is scheduled in a physical CPU. Also this approach could eliminates
the IPI mechanism for original VMCLEAR. As suggested by SDM,
VMXOFF will be called after VMCLEAR, and VMXON will be called
before VMPTRLD.

With this patchset, KVM and VMware Workstation 7 could launch
serapate guests and they can work well with each other. --
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4 v4] KVM: VMX: Define new functions to wrapper direct call of asm code.

2010-05-11 Thread Xu, Dongxiao
From: Dongxiao Xu dongxiao...@intel.com

Define vmcs_load() and kvm_cpu_vmxon() to avoid direct call of asm
code. Also move VMXE bit operation out of kvm_cpu_vmxoff().

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c |   36 +++-
 1 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 875b785..e77da89 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -445,6 +445,19 @@ static void vmcs_clear(struct vmcs *vmcs)
   vmcs, phys_addr);
 }
 
+static void vmcs_load(struct vmcs *vmcs)
+{
+   u64 phys_addr = __pa(vmcs);
+   u8 error;
+
+   asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) ; setna %0
+   : =g(error) : a(phys_addr), m(phys_addr)
+   : cc, memory);
+   if (error)
+   printk(KERN_ERR kvm: vmptrld %p/%llx fail\n,
+  vmcs, phys_addr);
+}
+
 static void __vcpu_clear(void *arg)
 {
struct vcpu_vmx *vmx = arg;
@@ -769,7 +782,6 @@ static void vmx_load_host_state(struct vcpu_vmx *vmx)
 static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
-   u64 phys_addr = __pa(vmx-vmcs);
u64 tsc_this, delta, new_offset;
 
if (vcpu-cpu != cpu) {
@@ -783,15 +795,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
}
 
if (per_cpu(current_vmcs, cpu) != vmx-vmcs) {
-   u8 error;
-
per_cpu(current_vmcs, cpu) = vmx-vmcs;
-   asm volatile (__ex(ASM_VMX_VMPTRLD_RAX) ; setna %0
- : =g(error) : a(phys_addr), m(phys_addr)
- : cc);
-   if (error)
-   printk(KERN_ERR kvm: vmptrld %p/%llx fail\n,
-  vmx-vmcs, phys_addr);
+   vmcs_load(vmx-vmcs);
}
 
if (vcpu-cpu != cpu) {
@@ -1220,6 +1225,13 @@ static __init int vmx_disabled_by_bios(void)
/* locked but not enabled */
 }
 
+static void kvm_cpu_vmxon(u64 addr)
+{
+   asm volatile (ASM_VMX_VMXON_RAX
+   : : a(addr), m(addr)
+   : memory, cc);
+}
+
 static int hardware_enable(void *garbage)
 {
int cpu = raw_smp_processor_id();
@@ -1240,9 +1252,7 @@ static int hardware_enable(void *garbage)
   FEATURE_CONTROL_LOCKED |
   FEATURE_CONTROL_VMXON_ENABLED);
write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-   asm volatile (ASM_VMX_VMXON_RAX
- : : a(phys_addr), m(phys_addr)
- : memory, cc);
+   kvm_cpu_vmxon(phys_addr);
 
ept_sync_global();
 
@@ -1266,13 +1276,13 @@ static void vmclear_local_vcpus(void)
 static void kvm_cpu_vmxoff(void)
 {
asm volatile (__ex(ASM_VMX_VMXOFF) : : : cc);
-   write_cr4(read_cr4()  ~X86_CR4_VMXE);
 }
 
 static void hardware_disable(void *garbage)
 {
vmclear_local_vcpus();
kvm_cpu_vmxoff();
+   write_cr4(read_cr4()  ~X86_CR4_VMXE);
 }
 
 static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
-- 
1.6.3
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4 v4] KVM: VMX: VMCLEAR/VMPTRLD usage changes.

2010-05-11 Thread Xu, Dongxiao
From: Dongxiao Xu dongxiao...@intel.com

Originally VMCLEAR/VMPTRLD is called on vcpu migration. To
support hosted VMM coexistance, VMCLEAR is executed on vcpu
schedule out, and VMPTRLD is executed on vcpu schedule in.
This could also eliminate the IPI when doing VMCLEAR.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c |7 ++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 49b0850..c536b9d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -62,6 +62,9 @@ module_param_named(unrestricted_guest,
 static int __read_mostly emulate_invalid_guest_state = 0;
 module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 
+static int __read_mostly vmm_exclusive = 1;
+module_param(vmm_exclusive, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST  \
(X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
 #define KVM_GUEST_CR0_MASK \
@@ -784,7 +787,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
u64 tsc_this, delta, new_offset;
 
-   if (vcpu-cpu != cpu)
+   if (vmm_exclusive  vcpu-cpu != cpu)
vcpu_clear(vmx);
 
if (per_cpu(current_vmcs, cpu) != vmx-vmcs) {
@@ -830,6 +833,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
__vmx_load_host_state(to_vmx(vcpu));
+   if (!vmm_exclusive)
+   __vcpu_clear(to_vmx(vcpu));
 }
 
 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
-- 
1.6.3
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4 v4] KVM: VMX: VMXON/VMXOFF usage changes.

2010-05-11 Thread Xu, Dongxiao
From: Dongxiao Xu dongxiao...@intel.com

SDM suggests VMXON should be called before VMPTRLD, and VMXOFF
should be called after doing VMCLEAR.

Therefore in vmm coexistence case, we should firstly call VMXON
before any VMCS operation, and then call VMXOFF after the
operation is done.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c |   38 +++---
 1 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c536b9d..dbd47a7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -168,6 +168,8 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 
 static int init_rmode(struct kvm *kvm);
 static u64 construct_eptp(unsigned long root_hpa);
+static void kvm_cpu_vmxon(u64 addr);
+static void kvm_cpu_vmxoff(void);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -786,8 +788,11 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
u64 tsc_this, delta, new_offset;
+   u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
 
-   if (vmm_exclusive  vcpu-cpu != cpu)
+   if (!vmm_exclusive)
+   kvm_cpu_vmxon(phys_addr);
+   else if (vcpu-cpu != cpu)
vcpu_clear(vmx);
 
if (per_cpu(current_vmcs, cpu) != vmx-vmcs) {
@@ -833,8 +838,10 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
 {
__vmx_load_host_state(to_vmx(vcpu));
-   if (!vmm_exclusive)
+   if (!vmm_exclusive) {
__vcpu_clear(to_vmx(vcpu));
+   kvm_cpu_vmxoff();
+   }
 }
 
 static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
@@ -1257,9 +1264,11 @@ static int hardware_enable(void *garbage)
   FEATURE_CONTROL_LOCKED |
   FEATURE_CONTROL_VMXON_ENABLED);
write_cr4(read_cr4() | X86_CR4_VMXE); /* FIXME: not cpu hotplug safe */
-   kvm_cpu_vmxon(phys_addr);
 
-   ept_sync_global();
+   if (vmm_exclusive) {
+   kvm_cpu_vmxon(phys_addr);
+   ept_sync_global();
+   }
 
return 0;
 }
@@ -1285,8 +1294,10 @@ static void kvm_cpu_vmxoff(void)
 
 static void hardware_disable(void *garbage)
 {
-   vmclear_local_vcpus();
-   kvm_cpu_vmxoff();
+   if (vmm_exclusive) {
+   vmclear_local_vcpus();
+   kvm_cpu_vmxoff();
+   }
write_cr4(read_cr4()  ~X86_CR4_VMXE);
 }
 
@@ -3949,6 +3960,19 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
kmem_cache_free(kvm_vcpu_cache, vmx);
 }
 
+static inline void vmcs_init(struct vmcs *vmcs)
+{
+   u64 phys_addr = __pa(per_cpu(vmxarea, raw_smp_processor_id()));
+
+   if (!vmm_exclusive)
+   kvm_cpu_vmxon(phys_addr);
+
+   vmcs_clear(vmcs);
+
+   if (!vmm_exclusive)
+   kvm_cpu_vmxoff();
+}
+
 static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
 {
int err;
@@ -3974,7 +3998,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, 
unsigned int id)
if (!vmx-vmcs)
goto free_msrs;
 
-   vmcs_clear(vmx-vmcs);
+   vmcs_init(vmx-vmcs);
 
cpu = get_cpu();
vmx_vcpu_load(vmx-vcpu, cpu);
-- 
1.6.3
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4 v4] KVM: VMX: Some minor changes to code structure.

2010-05-11 Thread Xu, Dongxiao
From: Dongxiao Xu dongxiao...@intel.com

Do some preparations for vmm coexistence support.

Signed-off-by: Dongxiao Xu dongxiao...@intel.com
---
 arch/x86/kvm/vmx.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e77da89..49b0850 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -784,15 +784,8 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
u64 tsc_this, delta, new_offset;
 
-   if (vcpu-cpu != cpu) {
+   if (vcpu-cpu != cpu)
vcpu_clear(vmx);
-   kvm_migrate_timers(vcpu);
-   set_bit(KVM_REQ_TLB_FLUSH, vcpu-requests);
-   local_irq_disable();
-   list_add(vmx-local_vcpus_link,
-per_cpu(vcpus_on_cpu, cpu));
-   local_irq_enable();
-   }
 
if (per_cpu(current_vmcs, cpu) != vmx-vmcs) {
per_cpu(current_vmcs, cpu) = vmx-vmcs;
@@ -803,6 +796,13 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
struct desc_ptr dt;
unsigned long sysenter_esp;
 
+   kvm_migrate_timers(vcpu);
+   set_bit(KVM_REQ_TLB_FLUSH, vcpu-requests);
+   local_irq_disable();
+   list_add(vmx-local_vcpus_link,
+per_cpu(vcpus_on_cpu, cpu));
+   local_irq_enable();
+
vcpu-cpu = cpu;
/*
 * Linux uses per-cpu TSS and GDT, so set these when switching
-- 
1.6.3
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] VMX: Invalid guest state detection enhancements and bug fixes

2010-05-11 Thread Mohammed Gamal
On Tue, May 11, 2010 at 12:24 PM, Avi Kivity a...@redhat.com wrote:
 On 05/10/2010 06:51 PM, Mohammed Gamal wrote:

 - Correct unusable flag check on SS, DS, ES, FS, GS, and LDTR
 - Add rflags checks
 - Report failed instruction on emulation failure



 Please post as separate patches.

 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 777e00d..968384b 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2122,7 +2122,7 @@ static bool stack_segment_valid(struct kvm_vcpu
 *vcpu)
        ss_rpl = ss.selector  SELECTOR_RPL_MASK;

        if (ss.unusable)
 -               return true;
 +               return false;


 Where does it say that ss must be usable?

Oh, I must have misread it. The unusable bit check is for virtual 8086
mode. However, with this check returning true when the segment is
unusable we still return prematurely as we don't do the other checks.


        if (ss.type != 3  ss.type != 7)
                return false;
        if (!ss.s)
 @@ -2144,7 +2144,7 @@ static bool data_segment_valid(struct kvm_vcpu
 *vcpu, int seg)
        rpl = var.selector  SELECTOR_RPL_MASK;

        if (var.unusable)
 -               return true;
 +               return false;
        if (!var.s)
                return false;
        if (!var.present)


 Ditto.

 @@ -2185,7 +2185,7 @@ static bool ldtr_valid(struct kvm_vcpu *vcpu)
        vmx_get_segment(vcpu,ldtr, VCPU_SREG_LDTR);

        if (ldtr.unusable)
 -               return true;
 +               return false;
        if (ldtr.selector  SELECTOR_TI_MASK)   /* TI = 1 */
                return false;
        if (ldtr.type != 2)


 Ditto.

 @@ -2207,6 +2207,27 @@ static bool cs_ss_rpl_check(struct kvm_vcpu *vcpu)
                 (ss.selector  SELECTOR_RPL_MASK));
  }

 +static bool rflags_valid(struct kvm_vcpu *vcpu)
 +{
 +       unsigned long rflags;
 +       u32 entry_intr_info;
 +
 +       rflags = vmcs_readl(GUEST_RFLAGS);
 +       entry_intr_info = vmcs_read32(VM_ENTRY_INTR_INFO_FIELD);
 +#ifdef CONFIG_X86_64
 +       if (is_long_mode(vcpu))
 +               if (rflags  X86_EFLAGS_VM)
 +                       return false;
 +#endif


 This is architecturally illegal, not just for vmx entries.  The check should
 be made when emulating setting rflags and in KVM_SET_REGS.

 These checks should assume the state is architecturally legal, and only
 check if they are legal for vmx entries.


 +       if ((entry_intr_info  INTR_INFO_INTR_TYPE_MASK) ==
 INTR_TYPE_EXT_INTR
 +                 (entry_intr_info  INTR_INFO_VALID_MASK)) {
 +               if (!(rflags  X86_EFLAGS_IF))
 +                       return false;
 +       }
 +


 Ditto.

 @@ -3559,6 +3581,7 @@ static int handle_invalid_guest_state(struct
 kvm_vcpu *vcpu)
                }

                if (err != EMULATE_DONE) {
 +                       kvm_report_emulation_failure(vcpu, invalid guest
 state handler);
                        vcpu-run-exit_reason = KVM_EXIT_INTERNAL_ERROR;
                        vcpu-run-internal.suberror =
 KVM_INTERNAL_ERROR_EMULATION;
                        vcpu-run-internal.ndata = 0;


 Uneeded, userspace can report it.

Userspace does report the address at which a VM fails, it doesn't for
instance show the contents of RIP which is necessary for knowing which
instruction failed, this is at least needed for testing purposes. IIRC
I've seen some trace_kvm_failed_insn() functions somewhere. But I
don't know how to use tracing utilities so I'd be delighted if someone
could point me to some relevant documentation or something.


 --
 error compiling committee.c: too many arguments to function


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Qemu-KVM Livate Migration 0.12.2 - 0.12.3/4 broken?

2010-05-11 Thread Peter Lieven

Hi Qemu/KVM Devel Team,

Live Migration from a 0.12.2 qemu-kvm to a 0.12.3 (and 0.12.4)
does not work: load of migration failed

Is there any way to find out, why exactly it fails? I have
a lot of VMs running on 0.12.2 and would like to migrate
them to 0.12.4

cmdline:
-net tap,vlan=6,script=no,downscript=no,ifname=tap7 -net 
nic,vlan=6,model=e1000,macaddr=52:54:00:fe:00:88   -net 
tap,vlan=651,script=no,downscript=no,ifname=tap8 -net 
nic,vlan=651,model=e1000,macaddr=52:54:00:ff:00:69   -drive 
file=/dev/mapper/iqn.2001-05.com.equallogic:0-8a0906-b6eca7604-6280020bc4b4bde8-quagga,if=ide,boot=on,cache=none,aio=native 
 -m 256 -monitor tcp:0:4090,server,nowait -vnc :90 -name 'Quagga' 
-boot order=dc,menu=on  -k de  -incoming tcp:172.21.59.132:5090 
-pidfile /var/run/qemu/vm-148.pid  -rtc base=utc,clock=vm  -usb 
-usbdevice tablet  -no-kvm-irqchip  -vga cirrus


Any hints would be appreciated!

Thanks,
Peter



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] VMX: Invalid guest state detection enhancements and bug fixes

2010-05-11 Thread Gleb Natapov
On Tue, May 11, 2010 at 01:53:51PM +0300, Mohammed Gamal wrote:
                 if (err != EMULATE_DONE) {
  +                       kvm_report_emulation_failure(vcpu, invalid guest
  state handler);
                         vcpu-run-exit_reason = KVM_EXIT_INTERNAL_ERROR;
                         vcpu-run-internal.suberror =
  KVM_INTERNAL_ERROR_EMULATION;
                         vcpu-run-internal.ndata = 0;
 
 
  Uneeded, userspace can report it.
 
 Userspace does report the address at which a VM fails, it doesn't for
 instance show the contents of RIP which is necessary for knowing which
 instruction failed, this is at least needed for testing purposes. IIRC
 I've seen some trace_kvm_failed_insn() functions somewhere. But I
 don't know how to use tracing utilities so I'd be delighted if someone
 could point me to some relevant documentation or something.
Userspace can be extended to print instruction at RIP on emulation
failure. Furthermore userspace can even disassemble it for you. IIRC
kvm_report_emulation_failure() was once removed from here and I've
actually sent patch to remove kvm_report_emulation_failure() at all.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.33.3: possible recursive locking detected

2010-05-11 Thread CaT
On Wed, May 05, 2010 at 10:52:50AM +0800, Américo Wang wrote:
 On Wed, May 5, 2010 at 10:32 AM, Yong Zhang yong.zh...@windriver.com wrote:
  On Tue, May 04, 2010 at 11:37:37AM +0300, Avi Kivity wrote:
  On 05/04/2010 10:03 AM, CaT wrote:
  I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo
  on 1 cpu with virtio HDs running on top of a core2duo host running 
  2.6.33.3.
  qemu-kvm version 0.12.3.
 
  Can you try commit 6992f5334995af474c2b58d010d08bc597f0f2fe in the latest
  kernel?
 
 
 Hmm, 2.6.33 -stable has commit 846f99749ab68bbc7f75c74fec305de675b1a1bf?
 
 Actually, these 3 commits fixed it:
 
 6992f5334995af474c2b58d010d08bc597f0f2fe sysfs: Use one lockdep class
 per sysfs ttribute.
 a2db6842873c8e5a70652f278d469128cb52db70 sysfs: Only take active
 references on attributes.
 e72ceb8ccac5f770b3e696e09bb673dca7024b20 sysfs: Remove 
 sysfs_get/put_active_two
 
 However, there are many other patches needed to amend these, so I think
 it's not suitable for -stable to include, perhaps a revert of
 846f99749ab68bbc7f75c74fec305de675b1a1bf is better.

Slightly at a loss as to what to do, now. It's a virt instance so I can
apply patches at will but, well, clarity is good. :)

-- 
  A search of his car uncovered pornography, a homemade sex aid, women's 
  stockings and a Jack Russell terrier.
- http://www.news.com.au/story/0%2C27574%2C24675808-421%2C00.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] Support for booting from virtio disks

2010-05-11 Thread Gleb Natapov
On Tue, May 11, 2010 at 11:19:07AM +0300, Avi Kivity wrote:
 On 05/10/2010 06:48 PM, Anthony Liguori wrote:
 On 05/10/2010 03:11 AM, Gleb Natapov wrote:
 This patch adds native support for booting from virtio disks to Seabios.
 
 Signed-off-by: Gleb Natapovg...@redhat.com
 
 A related problem that I think we need to think about how we solve
 is indicating to Seabios which device we want to boot from
 
 With your patch, a user can select a virtio device explicitly or
 if they use only one virtio device, it will Just Work.
 
 However, if a user uses IDE and virtio, or a user has multiple
 disks, they cannot select a device via -boot.
 
 Is this something we need to address?  I don't think we'd break
 libvirt if we didn't.
 
 BIOSes traditionally address this by storing the boot order in RTC
 non-volatile memory, and allow the user to configure the order via a
 menu.  We could do the same (storing the RTC memory in a small disk
 image).
 
Real BIOS can do that because it enumerates all bootable devices,
attach name for each one of them and then asks user to configure
boot order using names it attached to devices. In our case we
want to provide boot order on qemu command line before BIOS
enumerated devices, so qemu should be able to pass enough information
about boot device so that BIOS can uniquely identify it after it will
discover all bootable devices. bus/device pair can be such thing.

 Alternatively we can seed the order from the command line (-boot
 id1,id2,id3 where id* are some qdev property attached to disks, this
 is more flexible than the current syntax I think).
 
The problem is how to communicate this order to Seabios.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] Support for booting from virtio disks

2010-05-11 Thread Gleb Natapov
On Tue, May 11, 2010 at 10:04:25AM +0100, Stefan Hajnoczi wrote:
 From what I can tell SeaBIOS is reading CMOS_BIOS_BOOTFLAG1 and
 CMOS_BIOS_BOOTFLAG2 from non-volatile memory.  The values index into
 bev[], which contains IPL entries (the drives).
 
 Is the order of bev[] entries well-defined?  Is there a way for QEMU
 command-line to know that the first virtio-blk device corresponds to x
 and the IDE CD-ROM corresponds to y?
 
The order of bev[] is the order in which bootable devices where discovered
and can change from one version of Seabios to another.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS] [PATCHv2] Support for booting from virtio disks

2010-05-11 Thread Kevin O'Connor
On Tue, May 11, 2010 at 10:04:25AM +0100, Stefan Hajnoczi wrote:
 From what I can tell SeaBIOS is reading CMOS_BIOS_BOOTFLAG1 and
 CMOS_BIOS_BOOTFLAG2 from non-volatile memory.  The values index into
 bev[], which contains IPL entries (the drives).
 
 Is the order of bev[] entries well-defined?  Is there a way for QEMU
 command-line to know that the first virtio-blk device corresponds to x
 and the IDE CD-ROM corresponds to y?

SeaBIOS arranges for bev[0] = floppy, bev[1] = hd, bev[2] = cdrom, and
bev[3] to be the first network card - it does this so that the boot
order can be read from qemu.  However, it's a pain to force this
order.

-Kevin
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS] [PATCHv2] Support for booting from virtio disks

2010-05-11 Thread Gleb Natapov
On Tue, May 11, 2010 at 08:45:29AM -0400, Kevin O'Connor wrote:
 On Tue, May 11, 2010 at 10:04:25AM +0100, Stefan Hajnoczi wrote:
  From what I can tell SeaBIOS is reading CMOS_BIOS_BOOTFLAG1 and
  CMOS_BIOS_BOOTFLAG2 from non-volatile memory.  The values index into
  bev[], which contains IPL entries (the drives).
  
  Is the order of bev[] entries well-defined?  Is there a way for QEMU
  command-line to know that the first virtio-blk device corresponds to x
  and the IDE CD-ROM corresponds to y?
 
 SeaBIOS arranges for bev[0] = floppy, bev[1] = hd, bev[2] = cdrom, and
 bev[3] to be the first network card - it does this so that the boot
 order can be read from qemu.  However, it's a pain to force this
 order.
 
What if there are more then one disk?

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Anthony Liguori

On 05/11/2010 02:59 AM, Avi Kivity wrote:

(Replying again to list)

What data structure would you use?  For a lockless ring queue, you 
can only support a single producer and consumer.  To achieve 
bidirectional communication in virtio, we always use two queues.



You don't have to use a lockless ring queue.  You can use locks 
(spinlocks without interrupt support, full mutexes with interrupts) 
and any data structure you like.  Say a hash table + LRU for a shared 
cache.


Yeah, the mailslot enables this.

I think the question boils down to whether we can support transparent 
peer connections and disconnections.  I think that's important in order 
to support transparent live migration.


If you have two peers that are disconnected and then connect to each 
other, there's simply no way to choose who's content gets preserved.  
It's necessary to designate one peer as a master in order to break the tie.


So this could simply involve an additional option to the shared memory 
driver: role=master|peer.  If role=master, when a new shared memory 
segment is mapped, the contents of the BAR ram is memcpy()'d to the 
shared memory segment.  In either case, the contents of the shared 
memory segment should be memcpy()'d to the BAR ram whenever the shared 
memory segment is disconnected.


I believe role=master should be default because I think a relationship 
of master/slave is going to be much more common than peering.




If you're adding additional queues to support other levels of 
communication, you can always use different areas of shared memory.


You'll need O(n^2) shared memory areas (n=peer count), and it is a lot 
less flexible that real shared memory.  Consider using threading where 
the only communication among threads is a pipe (erlang?)


I can't think of a use of multiple peers via shared memory today with 
virtualization.  I know lots of master/slave uses of shared memory 
though.  I agree that it's useful to support from an academic 
perspective but I don't believe it's going to be the common use.


Regards,

Anthony Liguori


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: VMX: blocked-by-sti must not defer NMI injections

2010-05-11 Thread Jan Kiszka
As the processor may not consider GUEST_INTR_STATE_STI as a reason for
blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW
when we asked for it. But as we consider this state as NMI-blocking, we
can run into an endless loop.

Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is
active (originally suggested by Gleb). Intel confirmed that this is
safe, the processor will never complain about NMI injection in this
state.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
KVM-Stable-Tag
---
 arch/x86/kvm/vmx.c |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 777e00d..fa3959b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2824,8 +2824,7 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
return 0;
 
return  !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) 
-   (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS |
-   GUEST_INTR_STATE_NMI));
+   (GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_NMI));
 }
 
 static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call agenda for May 11

2010-05-11 Thread Chris Wright
* Chris Wright (chr...@redhat.com) wrote:
 Please send in any agenda items you are interested in covering.
 
 If we have a lack of agenda items I'll cancel the week's call.

No agenda, so no call this week.

thanks,
-chris
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for May 11

2010-05-11 Thread Luiz Capitulino
On Mon, 10 May 2010 08:02:50 -0700
Chris Wright chr...@redhat.com wrote:

 Please send in any agenda items you are interested in covering.

- Exposing named errno in QMP errors

 (hope it's not too late)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: VMX: blocked-by-sti must not defer NMI injections

2010-05-11 Thread Gleb Natapov
On Tue, May 11, 2010 at 03:16:46PM +0200, Jan Kiszka wrote:
 As the processor may not consider GUEST_INTR_STATE_STI as a reason for
 blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW
 when we asked for it. But as we consider this state as NMI-blocking, we
 can run into an endless loop.
 
 Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is
 active (originally suggested by Gleb). Intel confirmed that this is
 safe, the processor will never complain about NMI injection in this
 state.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
Acked-by: Gleb Natapovg...@redhat.com

 KVM-Stable-Tag
 ---
  arch/x86/kvm/vmx.c |3 +--
  1 files changed, 1 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 777e00d..fa3959b 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2824,8 +2824,7 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
   return 0;
  
   return  !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) 
 - (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS |
 - GUEST_INTR_STATE_NMI));
 + (GUEST_INTR_STATE_MOV_SS | GUEST_INTR_STATE_NMI));
  }
  
  static bool vmx_get_nmi_mask(struct kvm_vcpu *vcpu)

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for May 11

2010-05-11 Thread Alexander Graf
Luiz Capitulino wrote:
 On Mon, 10 May 2010 08:02:50 -0700
 Chris Wright chr...@redhat.com wrote:

   
 Please send in any agenda items you are interested in covering.
 

 - Exposing named errno in QMP errors
   

- flush=on

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for May 11

2010-05-11 Thread Luiz Capitulino
On Tue, 11 May 2010 15:50:32 +0200
Alexander Graf ag...@suse.de wrote:

 Luiz Capitulino wrote:
  On Mon, 10 May 2010 08:02:50 -0700
  Chris Wright chr...@redhat.com wrote:
 

  Please send in any agenda items you are interested in covering.
  
 
  - Exposing named errno in QMP errors

 
 - flush=on

 We're doing it by irc right now, no need for the call then.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for May 11

2010-05-11 Thread Jes Sorensen
On 05/11/10 15:53, Luiz Capitulino wrote:
 On Tue, 11 May 2010 15:50:32 +0200
 Alexander Graf ag...@suse.de wrote:
 
 Luiz Capitulino wrote:
 On Mon, 10 May 2010 08:02:50 -0700
 Chris Wright chr...@redhat.com wrote:

   
 Please send in any agenda items you are interested in covering.
 

 - Exposing named errno in QMP errors
   

 - flush=on
 
  We're doing it by irc right now, no need for the call then.
 

Where?

Jes

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/2] fix kvmclock bug - memory corruption (v2)

2010-05-11 Thread Marcelo Tosatti
On Wed, May 05, 2010 at 03:19:19PM -0400, Glauber Costa wrote:
 Hi,
 
 Avi, in this patch, I am resetting all msrs upon CPU reset.
 Hope it is better. Patch 1 is yet another cleanup.
 
 Glauber Costa (2):
   change header for kvm_get_msr_list
   turn off kvmclock when resetting cpu

Applied, thanks.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for May 11

2010-05-11 Thread Luiz Capitulino
On Tue, 11 May 2010 15:57:10 +0200
Jes Sorensen jes.soren...@redhat.com wrote:

 On 05/11/10 15:53, Luiz Capitulino wrote:
  On Tue, 11 May 2010 15:50:32 +0200
  Alexander Graf ag...@suse.de wrote:
  
  Luiz Capitulino wrote:
  On Mon, 10 May 2010 08:02:50 -0700
  Chris Wright chr...@redhat.com wrote:
 

  Please send in any agenda items you are interested in covering.
  
 
  - Exposing named errno in QMP errors

 
  - flush=on
  
   We're doing it by irc right now, no need for the call then.
  
 
 Where?

 #virtualization on irc.oftc.net, it's too specific to QMP errors though,
but everyone is welcome to join of course.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Avi Kivity

On 05/11/2010 04:10 PM, Anthony Liguori wrote:

On 05/11/2010 02:59 AM, Avi Kivity wrote:

(Replying again to list)

What data structure would you use?  For a lockless ring queue, you 
can only support a single producer and consumer.  To achieve 
bidirectional communication in virtio, we always use two queues.



You don't have to use a lockless ring queue.  You can use locks 
(spinlocks without interrupt support, full mutexes with interrupts) 
and any data structure you like.  Say a hash table + LRU for a shared 
cache.


Yeah, the mailslot enables this.

I think the question boils down to whether we can support transparent 
peer connections and disconnections.  I think that's important in 
order to support transparent live migration.


If you have two peers that are disconnected and then connect to each 
other, there's simply no way to choose who's content gets preserved.  
It's necessary to designate one peer as a master in order to break the 
tie.


The master is the shared memory area.  It's a completely separate entity 
that is represented by the backing file (or shared memory server handing 
out the fd to mmap).  It can exists independently of any guest.




So this could simply involve an additional option to the shared memory 
driver: role=master|peer.  If role=master, when a new shared memory 
segment is mapped, the contents of the BAR ram is memcpy()'d to the 
shared memory segment.  In either case, the contents of the shared 
memory segment should be memcpy()'d to the BAR ram whenever the shared 
memory segment is disconnected.


I don't understand why we need separate BAR ram and shared memory.  Have 
just shared memory, exposed by the BAR when connected.  When the PCI 
card is disconnected from shared memory, the BAR should discard writes 
and return all 1s for reads.


Having a temporary RAM area while disconnected doesn't serve a purpose 
(since it exists only for a short while) and increases the RAM load.


I believe role=master should be default because I think a relationship 
of master/slave is going to be much more common than peering.


What if you have N guests?  What if the master disconnects?





If you're adding additional queues to support other levels of 
communication, you can always use different areas of shared memory.


You'll need O(n^2) shared memory areas (n=peer count), and it is a 
lot less flexible that real shared memory.  Consider using threading 
where the only communication among threads is a pipe (erlang?)


I can't think of a use of multiple peers via shared memory today with 
virtualization.  I know lots of master/slave uses of shared memory 
though.  I agree that it's useful to support from an academic 
perspective but I don't believe it's going to be the common use.


Large shared cache.  That use case even survives live migration if you 
use lockless algorithms.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 11/12] KVM: introduce new API for getting/switching dirty bitmaps

2010-05-11 Thread Marcelo Tosatti
On Tue, May 11, 2010 at 02:53:54PM +0900, Takuya Yoshikawa wrote:
 (2010/05/11 12:43), Marcelo Tosatti wrote:
 On Tue, May 04, 2010 at 10:08:21PM +0900, Takuya Yoshikawa wrote:
 +How to Get
 +
 +Before calling this, you have to set the slot member of kvm_user_dirty_log
 +to indicate the target memory slot.
 +
 +struct kvm_user_dirty_log {
 +   __u32 slot;
 +   __u32 flags;
 +   __u64 dirty_bitmap;
 +   __u64 dirty_bitmap_old;
 +};
 +
 +The addresses will be set in the paired members: dirty_bitmap and _old.
 
 Why not pass the bitmap address to the kernel, instead of having the
 kernel allocate it. Because apparently you plan to do that in a new
 generation anyway?
 
 Yes, we want to make qemu allocate and free bitmaps in the future.
 But currently, those are strictly tied with memory slot registration and
 changing it in one patch set is really difficult.
 
 Anyway, we need kernel side allocation mechanism to keep the current
 GET_DIRTY_LOG api. I don't mind not exporting kernel allocated bitmaps
 in this patch set and later introducing a bitmap registration mechanism
 in another patch set.
 
 As this RFC is suggesting, kernel side double buffering infrastructure is
 designed as general as possible and adding a new API like SWITCH can be done
 naturally.
 
 
 Also, why does the kernel need to know about different bitmaps?
 
 Because we need to support current GET API. We don't have any way to get
 a new bitmap in the case of GET and we don't want to do_mmap() every time
 for GET.
 
 
 One alternative would be:
 
 KVM_SWITCH_DIRTY_LOG passing the address of a bitmap. If the active
 bitmap was clean, it returns 0, no switch performed. If the active
 bitmap was dirty, the kernel switches to the new bitmap and returns 1.
 
 And the responsability of cleaning the new bitmap could also be left
 for userspace.
 
 
 That is a beautiful approach but we can do that only when we give up using
 GET api.
 
 
 I follow you and Avi's advice about that kind of maintenance policy!
 What do you think?

If you introduce a switch ioctl that frees the bitmap vmalloc'ed by the
current set_memory_region (if its not freed already), after pointing the
memslot to the user supplied one, it should be fine?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RESENT] fix 80000001.EDX supported bit filtering

2010-05-11 Thread Gleb Natapov
On AMD some bits from 1.EDX are reported in 8001.EDX. The mask used
to copy bits from 1.EDX to 8001.EDX is incorrect resulting in
unsupported features passed into a guest.

Signed-off-by: Gleb Natapov g...@redhat.com

Resent since Avi's email bounced first time.

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 76c1adb..6463390 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -111,7 +111,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, 
uint32_t function, int reg)
  * so add missing bits according to the AMD spec:
  */
 cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, R_EDX);
-ret |= cpuid_1_edx  0xdfeff7ff;
+ret |= cpuid_1_edx  0x183f7ff;
 break;
 }
 break;
--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Cam Macdonell
On Tue, May 11, 2010 at 8:03 AM, Avi Kivity a...@redhat.com wrote:
 On 05/11/2010 04:10 PM, Anthony Liguori wrote:

 On 05/11/2010 02:59 AM, Avi Kivity wrote:

 (Replying again to list)

 What data structure would you use?  For a lockless ring queue, you can
 only support a single producer and consumer.  To achieve bidirectional
 communication in virtio, we always use two queues.


 You don't have to use a lockless ring queue.  You can use locks
 (spinlocks without interrupt support, full mutexes with interrupts) and any
 data structure you like.  Say a hash table + LRU for a shared cache.

 Yeah, the mailslot enables this.

 I think the question boils down to whether we can support transparent peer
 connections and disconnections.  I think that's important in order to
 support transparent live migration.

 If you have two peers that are disconnected and then connect to each
 other, there's simply no way to choose who's content gets preserved.  It's
 necessary to designate one peer as a master in order to break the tie.

 The master is the shared memory area.  It's a completely separate entity
 that is represented by the backing file (or shared memory server handing out
 the fd to mmap).  It can exists independently of any guest.

I think the master/peer idea would be necessary if we were sharing
guest memory (sharing guest A's memory with guest B).  Then if the
master (guest A) dies, perhaps something needs to happen to preserve
the memory contents.  But since we're sharing host memory, the
applications in the guests can race to determine the master by
grabbing a lock at offset 0 or by using lowest VM ID.

Looking at it another way, it is the applications using shared memory
that may or may not need a master, the Qemu processes don't need the
concept of a master since the memory belongs to the host.



 So this could simply involve an additional option to the shared memory
 driver: role=master|peer.  If role=master, when a new shared memory segment
 is mapped, the contents of the BAR ram is memcpy()'d to the shared memory
 segment.  In either case, the contents of the shared memory segment should
 be memcpy()'d to the BAR ram whenever the shared memory segment is
 disconnected.

 I don't understand why we need separate BAR ram and shared memory.  Have
 just shared memory, exposed by the BAR when connected.  When the PCI card is
 disconnected from shared memory, the BAR should discard writes and return
 all 1s for reads.

 Having a temporary RAM area while disconnected doesn't serve a purpose
 (since it exists only for a short while) and increases the RAM load.

I agree with Avi here.  If a guest wants to view shared memory, then
it needs to stay connected.  I think being able to see the contents
(via the memcpy()) even though the guest is disconnected would be
confusing.


 I believe role=master should be default because I think a relationship of
 master/slave is going to be much more common than peering.

 What if you have N guests?  What if the master disconnects?



 If you're adding additional queues to support other levels of
 communication, you can always use different areas of shared memory.

 You'll need O(n^2) shared memory areas (n=peer count), and it is a lot
 less flexible that real shared memory.  Consider using threading where the
 only communication among threads is a pipe (erlang?)

 I can't think of a use of multiple peers via shared memory today with
 virtualization.  I know lots of master/slave uses of shared memory though.
  I agree that it's useful to support from an academic perspective but I
 don't believe it's going to be the common use.

 Large shared cache.  That use case even survives live migration if you use
 lockless algorithms.

 --
 error compiling committee.c: too many arguments to function


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Avi Kivity

On 05/11/2010 05:17 PM, Cam Macdonell wrote:



The master is the shared memory area.  It's a completely separate entity
that is represented by the backing file (or shared memory server handing out
the fd to mmap).  It can exists independently of any guest.
 

I think the master/peer idea would be necessary if we were sharing
guest memory (sharing guest A's memory with guest B).  Then if the
master (guest A) dies, perhaps something needs to happen to preserve
the memory contents.


Definitely.  But we aren't...


   But since we're sharing host memory, the
applications in the guests can race to determine the master by
grabbing a lock at offset 0 or by using lowest VM ID.

Looking at it another way, it is the applications using shared memory
that may or may not need a master, the Qemu processes don't need the
concept of a master since the memory belongs to the host.
   


Exactly.  Furthermore, even in a master/slave relationship, there will 
be different masters for different sub-areas, it would be a pity to 
expose all this in the hardware abstraction.  This way we have an 
external device, and PCI HBAs which connect to it - just like a 
multi-tailed SCSI disk.



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.33.3: possible recursive locking detected

2010-05-11 Thread Greg KH
On Tue, May 11, 2010 at 09:33:50PM +1000, CaT wrote:
 On Wed, May 05, 2010 at 10:52:50AM +0800, Américo Wang wrote:
  On Wed, May 5, 2010 at 10:32 AM, Yong Zhang yong.zh...@windriver.com 
  wrote:
   On Tue, May 04, 2010 at 11:37:37AM +0300, Avi Kivity wrote:
   On 05/04/2010 10:03 AM, CaT wrote:
   I'm currently running 2.6.33.3 in a KVM instance emulating a core2duo
   on 1 cpu with virtio HDs running on top of a core2duo host running 
   2.6.33.3.
   qemu-kvm version 0.12.3.
  
   Can you try commit 6992f5334995af474c2b58d010d08bc597f0f2fe in the latest
   kernel?
  
  
  Hmm, 2.6.33 -stable has commit 846f99749ab68bbc7f75c74fec305de675b1a1bf?
  
  Actually, these 3 commits fixed it:
  
  6992f5334995af474c2b58d010d08bc597f0f2fe sysfs: Use one lockdep class
  per sysfs ttribute.
  a2db6842873c8e5a70652f278d469128cb52db70 sysfs: Only take active
  references on attributes.
  e72ceb8ccac5f770b3e696e09bb673dca7024b20 sysfs: Remove 
  sysfs_get/put_active_two
  
  However, there are many other patches needed to amend these, so I think
  it's not suitable for -stable to include, perhaps a revert of
  846f99749ab68bbc7f75c74fec305de675b1a1bf is better.
 
 Slightly at a loss as to what to do, now. It's a virt instance so I can
 apply patches at will but, well, clarity is good. :)

Just ignore the lockdep warnings as they are bogus, or turn them off, or
use .34-rc7, as they are resolved there.

thanks,

greg k-h
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Anthony Liguori

On 05/11/2010 09:53 AM, Avi Kivity wrote:

On 05/11/2010 05:17 PM, Cam Macdonell wrote:


The master is the shared memory area.  It's a completely separate 
entity
that is represented by the backing file (or shared memory server 
handing out

the fd to mmap).  It can exists independently of any guest.

I think the master/peer idea would be necessary if we were sharing
guest memory (sharing guest A's memory with guest B).  Then if the
master (guest A) dies, perhaps something needs to happen to preserve
the memory contents.


Definitely.  But we aren't...


Then transparent live migration is impossible.  IMHO, that's a 
fundamental mistake that we will regret down the road.



   But since we're sharing host memory, the
applications in the guests can race to determine the master by
grabbing a lock at offset 0 or by using lowest VM ID.

Looking at it another way, it is the applications using shared memory
that may or may not need a master, the Qemu processes don't need the
concept of a master since the memory belongs to the host.


Exactly.  Furthermore, even in a master/slave relationship, there will 
be different masters for different sub-areas, it would be a pity to 
expose all this in the hardware abstraction.  This way we have an 
external device, and PCI HBAs which connect to it - just like a 
multi-tailed SCSI disk.


To support transparent live migration, it's necessary to do two things:

1) Preserve the memory contents of the PCI BAR after disconnected from a 
shared memory segment
2) Synchronize any changes made to the PCI BAR with the shared memory 
segment upon reconnect/initial connection.


N.B. savevm/loadvm both constitute disconnect and reconnect events 
respectively.


Supporting (1) is easy since we just need to memcpy() the contents of 
the shared memory segment to a temporary RAM area upon disconnect.


Supporting (2) is easy when the shared memory segment is viewed as owned 
by the guest since it has the definitive copy of the data.  IMHO, this 
is what role=master means.  However, if we want to support a model where 
the guest does not have a definitive copy of the data, upon reconnect, 
we need to throw away the guest's changes and make the shared memory 
segment appear to simultaneously update to the guest.  This is what 
role=peer means.


For role=peer, it's necessary to signal to the guest when it's not 
connected.  This means prior to savevm it's necessary to indicate to the 
guest that it's been disconnected.


I think it's important that we build this mechanism in from the start 
because as I've stated in the past, I don't think role=peer is going to 
be the dominant use-case.  I actually don't think that shared memory 
between guests is all that interesting compared to shared memory to an 
external process on the host.


Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 0/12] KVM, x86, ppc, asm-generic: moving dirty bitmaps to user space

2010-05-11 Thread Alexander Graf
Takuya Yoshikawa wrote:
 Hi, sorry for sending from my personal account.
 The following series are all from me:

   From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

   The 3rd version of moving dirty bitmaps to user space.

 From this version, we add x86 and ppc and asm-generic people to CC lists.


 [To KVM people]

 Sorry for being late to reply your comments.

 Avi,
  - I've wrote an answer to your question in patch 5/12: drivers/vhost/vhost.c 
 .

  - I've considered to change the set_bit_user_non_atomic to an inline 
 function,
but did not change because the other helpers in the uaccess.h are written 
 as
macros. Anyway, I hope that x86 people will give us appropriate suggestions
about this.

  - I thought that documenting about making bitmaps 64-bit aligned will be
written when we add an API to register user-allocated bitmaps. So probably
in the next series.

 Avi, Alex,
  - Could you check the ia64 and ppc parts, please? I tried to keep the logical
changes as small as possible.

I personally tried to build these with cross compilers. For ia64, I could 
 check
build success with my patch series. But book3s, even without my patch 
 series,
it failed with the following errors:

   arch/powerpc/kvm/book3s_paired_singles.c: In function 
 'kvmppc_emulate_paired_single':
   arch/powerpc/kvm/book3s_paired_singles.c:1289: error: the frame size of 
 2288 bytes is larger than 2048 bytes
   make[1]: *** [arch/powerpc/kvm/book3s_paired_singles.o] Error 1
   make: *** [arch/powerpc/kvm] Error 2
   

This is bad. I haven't encountered that one at all so far, but I guess
my compiler version is different from yours. Sigh.


 About changelog: there are two main changes from the 2nd version:
   1. I changed the treatment of clean slots (see patch 1/12).
  This was already applied today, thanks!
   2. I changed the switch API. (see patch 11/12).

 To show this API's advantage, I also did a test (see the end of this mail).


 [To x86 people]

 Hi, Thomas, Ingo, Peter,

 Please review the patches 4,5/12. Because this is the first experience for
 me to send patches to x86, please tell me if this lacks anything.


 [To ppc people]

 Hi, Benjamin, Paul, Alex,

 Please see the patches 6,7/12. I first say sorry for that I've not tested 
 these
 yet. In that sense, these may not be in the quality for precise reviews. But I
 will be happy if you would give me any comments.

 Alex, could you help me? Though I have a plan to get PPC box in the future,
 currently I cannot test these.
   

Could you please point me to a git tree where everything's readily
applied? That would make testing a lot easier.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [qemu-kvm tests PATCH v3] qemu-kvm tests: fix linker script problem

2010-05-11 Thread Marcelo Tosatti
On Wed, May 05, 2010 at 08:28:15PM +0300, Naphtali Sprei wrote:
 
 commit 848bd0c89c83814023cf51c72effdbc7de0d18b7 causes the linker script
 itself (flat.lds) to become part of the linked objects, which messed
 the output file, specifically, the symbol edata is not the last symbol
 anymore.
 
 changes v2 - v3
 
 Instead of using a separate rule, which doesn't really adds the flat.lds to 
 the prerequisite
 list, use Avi suggestion of filtering.
 
 Signed-off-by: Naphtali Sprei nsp...@redhat.com
 ---
  kvm/user/config-x86-common.mak |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix segfault after device assignment hot remove

2010-05-11 Thread Marcelo Tosatti
On Thu, May 06, 2010 at 12:58:12PM -0600, Alex Williamson wrote:
 We keep a qlist of assigned devices for irq updates, but we forgot to
 remove entries from it if they're hot unplugged.  This makes
 assigned_dev_update_irqs() a timebomb that goes off when the guest is
 rebooted.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---

Applied to master and stable-0.12, thanks.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC][PATCH 7/12 not tested yet] PPC: introduce __set_bit() like function for bitmaps in user space

2010-05-11 Thread Alexander Graf
Takuya Yoshikawa wrote:
 During the work of KVM's dirty page logging optimization, we encountered
 the need of manipulating bitmaps in user space efficiantly. To achive this,
 we introduce a uaccess function for setting a bit in user space following
 Avi's suggestion.

   KVM is now using dirty bitmaps for live-migration and VGA. Although we need
   to update them from kernel side, copying them every time for updating the
   dirty log is a big bottleneck. Especially, we tested that zero-copy bitmap
   manipulation improves responses of GUI manipulations a lot.

 We also found one similar need in drivers/vhost/vhost.c in which the author
 implemented set_bit_to_user() locally using inefficient functions: see TODO
 at the top of that.

 Probably, this kind of need would be common for virtualization area.

 So we introduce a function set_bit_user_non_atomic().

 Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
 Signed-off-by: Fernando Luis Vazquez Cao ferna...@oss.ntt.co.jp
 CC: Alexander Graf ag...@suse.de
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 ---
  arch/powerpc/include/asm/uaccess.h |   19 +++
  1 files changed, 19 insertions(+), 0 deletions(-)

 diff --git a/arch/powerpc/include/asm/uaccess.h 
 b/arch/powerpc/include/asm/uaccess.h
 index 3a01ce8..f878326 100644
 --- a/arch/powerpc/include/asm/uaccess.h
 +++ b/arch/powerpc/include/asm/uaccess.h
 @@ -321,6 +321,25 @@ do { 
 \
   __gu_err;   \
  })
  
 +static inline int set_bit_user_non_atomic(int nr, void __user *addr)
 +{
 + u8 __user *p;
 + u8 val;
 +
 + p = (u8 __user *)((unsigned long)addr + nr / BITS_PER_BYTE);
   

Does C do the + or the / first? Either way, I'd like to see brackets here :)


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/9] pvclock misc fixes - v4.

2010-05-11 Thread Glauber Costa
This is the fourth version ov kvmclock fixes.

Just two minor changes in patch 5, per avi request, and
the addition of cpuid.txt file, documenting all cpuid flags
we use.

As a side effect, this patch removes the time-travel feature
in kvm guests.

Glauber Costa (9):
  Enable pvclock flags in vcpu_time_info structure
  Add a global synchronization point for pvclock
  change msr numbers for kvmclock
  add new KVMCLOCK cpuid feature
  export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID
  Try using new kvm clock msrs
  don't compute pvclock adjustments if we trust the tsc
  Tell the guest we'll warn it about tsc stability
  Add cpuid.txt file

 Documentation/kvm/cpuid.txt|   44 
 arch/x86/include/asm/kvm_para.h|   13 
 arch/x86/include/asm/pvclock-abi.h |4 ++-
 arch/x86/include/asm/pvclock.h |1 +
 arch/x86/kernel/kvmclock.c |   56 ++-
 arch/x86/kernel/pvclock.c  |   37 +++
 arch/x86/kvm/x86.c |   44 +++-
 7 files changed, 176 insertions(+), 23 deletions(-)
 create mode 100644 Documentation/kvm/cpuid.txt

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/9] Enable pvclock flags in vcpu_time_info structure

2010-05-11 Thread Glauber Costa
This patch removes one padding byte and transform it into a flags
field. New versions of guests using pvclock will query these flags
upon each read.

Flags, however, will only be interpreted when the guest decides to.
It uses the pvclock_valid_flags function to signal that a specific
set of flags should be taken into consideration. Which flags are valid
are usually devised via HV negotiation.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Jeremy Fitzhardinge jer...@goop.org
---
 arch/x86/include/asm/pvclock-abi.h |3 ++-
 arch/x86/include/asm/pvclock.h |1 +
 arch/x86/kernel/pvclock.c  |9 +
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/pvclock-abi.h 
b/arch/x86/include/asm/pvclock-abi.h
index 6d93508..ec5c41a 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -29,7 +29,8 @@ struct pvclock_vcpu_time_info {
u64   system_time;
u32   tsc_to_system_mul;
s8tsc_shift;
-   u8pad[3];
+   u8flags;
+   u8pad[2];
 } __attribute__((__packed__)); /* 32 bytes */
 
 struct pvclock_wall_clock {
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 53235fd..cd02f32 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -6,6 +6,7 @@
 
 /* some helper functions for xen and kvm pv clock sources */
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+void pvclock_set_flags(u8 flags);
 unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
 void pvclock_read_wallclock(struct pvclock_wall_clock *wall,
struct pvclock_vcpu_time_info *vcpu,
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 03801f2..f7fdd56 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -31,8 +31,16 @@ struct pvclock_shadow_time {
u32 tsc_to_nsec_mul;
int tsc_shift;
u32 version;
+   u8  flags;
 };
 
+static u8 valid_flags __read_mostly = 0;
+
+void pvclock_set_flags(u8 flags)
+{
+   valid_flags = flags;
+}
+
 /*
  * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
  * yielding a 64-bit result.
@@ -91,6 +99,7 @@ static unsigned pvclock_get_time_values(struct 
pvclock_shadow_time *dst,
dst-system_timestamp  = src-system_time;
dst-tsc_to_nsec_mul   = src-tsc_to_system_mul;
dst-tsc_shift = src-tsc_shift;
+   dst-flags = src-flags;
rmb();  /* test version after fetching data */
} while ((src-version  1) || (dst-version != src-version));
 
-- 
1.6.2.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/9] export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID

2010-05-11 Thread Glauber Costa
Right now, we were using individual KVM_CAP entities to communicate
userspace about which cpuids we support. This is suboptimal, since it
generates a delay between the feature arriving in the host, and
being available at the guest.

A much better mechanism is to list para features in KVM_GET_SUPPORTED_CPUID.
This makes userspace automatically aware of what we provide. And if we
ever add a new cpuid bit in the future, we have to do that again,
which create some complexity and delay in feature adoption.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/kvm/x86.c |   34 ++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eb84947..a51b560 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1971,6 +1971,23 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, 
u32 function,
}
break;
}
+   case KVM_CPUID_SIGNATURE: {
+   char signature[12] = KVMKVMKVM\0\0;
+   u32 *sigptr = (u32 *)signature;
+   entry-eax = 0;
+   entry-ebx = sigptr[0];
+   entry-ecx = sigptr[1];
+   entry-edx = sigptr[2];
+   break;
+   }
+   case KVM_CPUID_FEATURES:
+   entry-eax = (1  KVM_FEATURE_CLOCKSOURCE) |
+(1  KVM_FEATURE_NOP_IO_DELAY) |
+(1  KVM_FEATURE_CLOCKSOURCE2);
+   entry-ebx = 0;
+   entry-ecx = 0;
+   entry-edx = 0;
+   break;
case 0x8000:
entry-eax = min(entry-eax, 0x801a);
break;
@@ -2017,6 +2034,23 @@ static int kvm_dev_ioctl_get_supported_cpuid(struct 
kvm_cpuid2 *cpuid,
for (func = 0x8001; func = limit  nent  cpuid-nent; ++func)
do_cpuid_ent(cpuid_entries[nent], func, 0,
 nent, cpuid-nent);
+
+   
+
+   r = -E2BIG;
+   if (nent = cpuid-nent)
+   goto out_free;
+
+   do_cpuid_ent(cpuid_entries[nent], KVM_CPUID_SIGNATURE, 0, nent,
+cpuid-nent);
+
+   r = -E2BIG;
+   if (nent = cpuid-nent)
+   goto out_free;
+
+   do_cpuid_ent(cpuid_entries[nent], KVM_CPUID_FEATURES, 0, nent,
+cpuid-nent);
+
r = -E2BIG;
if (nent = cpuid-nent)
goto out_free;
-- 
1.6.2.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/9] Try using new kvm clock msrs

2010-05-11 Thread Glauber Costa
We now added a new set of clock-related msrs in replacement of the old
ones. In theory, we could just try to use them and get a return value
indicating they do not exist, due to our use of kvm_write_msr_save.

However, kvm clock registration happens very early, and if we ever
try to write to a non-existant MSR, we raise a lethal #GP, since our
idt handlers are not in place yet.

So this patch tests for a cpuid feature exported by the host to
decide which set of msrs are supported.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/kernel/kvmclock.c |   53 ++-
 1 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index feaeb0d..59c740f 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -29,6 +29,8 @@
 #define KVM_SCALE 22
 
 static int kvmclock = 1;
+static int msr_kvm_system_time = MSR_KVM_SYSTEM_TIME;
+static int msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK;
 
 static int parse_no_kvmclock(char *arg)
 {
@@ -54,7 +56,8 @@ static unsigned long kvm_get_wallclock(void)
 
low = (int)__pa_symbol(wall_clock);
high = ((u64)__pa_symbol(wall_clock)  32);
-   native_write_msr(MSR_KVM_WALL_CLOCK, low, high);
+
+   native_write_msr(msr_kvm_wall_clock, low, high);
 
vcpu_time = get_cpu_var(hv_clock);
pvclock_read_wallclock(wall_clock, vcpu_time, ts);
@@ -130,7 +133,8 @@ static int kvm_register_clock(char *txt)
high = ((u64)__pa(per_cpu(hv_clock, cpu))  32);
printk(KERN_INFO kvm-clock: cpu %d, msr %x:%x, %s\n,
   cpu, high, low, txt);
-   return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
+
+   return native_write_msr_safe(msr_kvm_system_time, low, high);
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
@@ -165,14 +169,14 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 #ifdef CONFIG_KEXEC
 static void kvm_crash_shutdown(struct pt_regs *regs)
 {
-   native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+   native_write_msr(msr_kvm_system_time, 0, 0);
native_machine_crash_shutdown(regs);
 }
 #endif
 
 static void kvm_shutdown(void)
 {
-   native_write_msr_safe(MSR_KVM_SYSTEM_TIME, 0, 0);
+   native_write_msr(msr_kvm_system_time, 0, 0);
native_machine_shutdown();
 }
 
@@ -181,27 +185,34 @@ void __init kvmclock_init(void)
if (!kvm_para_available())
return;
 
-   if (kvmclock  kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)) {
-   if (kvm_register_clock(boot clock))
-   return;
-   pv_time_ops.sched_clock = kvm_clock_read;
-   x86_platform.calibrate_tsc = kvm_get_tsc_khz;
-   x86_platform.get_wallclock = kvm_get_wallclock;
-   x86_platform.set_wallclock = kvm_set_wallclock;
+   if (kvmclock  kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE2)) {
+   msr_kvm_system_time = MSR_KVM_SYSTEM_TIME_NEW;
+   msr_kvm_wall_clock = MSR_KVM_WALL_CLOCK_NEW;
+   } else if (!(kvmclock  kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE)))
+   return;
+
+   printk(KERN_INFO kvm-clock: Using msrs %x and %x,
+   msr_kvm_system_time, msr_kvm_wall_clock);
+
+   if (kvm_register_clock(boot clock))
+   return;
+   pv_time_ops.sched_clock = kvm_clock_read;
+   x86_platform.calibrate_tsc = kvm_get_tsc_khz;
+   x86_platform.get_wallclock = kvm_get_wallclock;
+   x86_platform.set_wallclock = kvm_set_wallclock;
 #ifdef CONFIG_X86_LOCAL_APIC
-   x86_cpuinit.setup_percpu_clockev =
-   kvm_setup_secondary_clock;
+   x86_cpuinit.setup_percpu_clockev =
+   kvm_setup_secondary_clock;
 #endif
 #ifdef CONFIG_SMP
-   smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
+   smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
 #endif
-   machine_ops.shutdown  = kvm_shutdown;
+   machine_ops.shutdown  = kvm_shutdown;
 #ifdef CONFIG_KEXEC
-   machine_ops.crash_shutdown  = kvm_crash_shutdown;
+   machine_ops.crash_shutdown  = kvm_crash_shutdown;
 #endif
-   kvm_get_preset_lpj();
-   clocksource_register(kvm_clock);
-   pv_info.paravirt_enabled = 1;
-   pv_info.name = KVM;
-   }
+   kvm_get_preset_lpj();
+   clocksource_register(kvm_clock);
+   pv_info.paravirt_enabled = 1;
+   pv_info.name = KVM;
 }
-- 
1.6.2.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 9/9] Add cpuid.txt file

2010-05-11 Thread Glauber Costa
This file documents cpuid bits used by KVM.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 Documentation/kvm/cpuid.txt |   44 +++
 1 files changed, 44 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/kvm/cpuid.txt

diff --git a/Documentation/kvm/cpuid.txt b/Documentation/kvm/cpuid.txt
new file mode 100644
index 000..ca1e89f
--- /dev/null
+++ b/Documentation/kvm/cpuid.txt
@@ -0,0 +1,44 @@
+KVM CPUID bits
+Glauber Costa glom...@redhat.com, Red Hat Inc, 2010
+=
+
+A guest running on a kvm host, can check some of its features using
+cpuid. This is not always guaranteed to work, since userspace can
+mask-out some, or even all KVM-related cpuid features before launching
+a guest.
+
+KVM cpuid functions are:
+
+function: KVM_CPUID_SIGNATURE (0x4000)
+returns : eax = 0,
+  ebx = 0x4b4d564b,
+  ecx = 0x564b4d56,
+  edx = 0x4d.
+Note that this value in ebx, ecx and edx corresponds to the string KVMKVMKVM.
+This function queries the presence of KVM cpuid leafs.
+
+
+function: define KVM_CPUID_FEATURES (0x4001)
+returns : ebx, ecx, edx = 0
+  eax = and OR'ed group of (1  flag), where each flags is:
+
+
+flag   || value || meaning
+=
+KVM_FEATURE_CLOCKSOURCE|| 0 || kvmclock available at msrs
+   ||   || 0x11 and 0x12. 
+--
+KVM_FEATURE_NOP_IO_DELAY   || 1 || not necessary to perform delays
+   ||   || on PIO operations.
+--
+KVM_FEATURE_MMU_OP || 2 || deprecated.
+--
+KVM_FEATURE_CLOCKSOURCE2   || 3 || kvmclock available at msrs
+   ||   || 0x4b564d00 and 0x4b564d01
+--
+KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will wanr if no guest-side
+   ||   || per-cpu warps are expected in
+   ||   || kvmclock.
+--
+
+
-- 
1.6.2.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 8/9] Tell the guest we'll warn it about tsc stability

2010-05-11 Thread Glauber Costa
This patch puts up the flag that tells the guest that we'll warn it
about the tsc being trustworthy or not. By now, we also say
it is not.
---
 arch/x86/kvm/x86.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a51b560..7922401 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -855,6 +855,8 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
vcpu-hv_clock.system_time = ts.tv_nsec +
 (NSEC_PER_SEC * (u64)ts.tv_sec) + 
v-kvm-arch.kvmclock_offset;
 
+   vcpu-hv_clock.flags = 0;
+
/*
 * The interface expects us to write an even number signaling that the
 * update is finished. Since the guest won't see the intermediate
@@ -1983,7 +1985,8 @@ static void do_cpuid_ent(struct kvm_cpuid_entry2 *entry, 
u32 function,
case KVM_CPUID_FEATURES:
entry-eax = (1  KVM_FEATURE_CLOCKSOURCE) |
 (1  KVM_FEATURE_NOP_IO_DELAY) |
-(1  KVM_FEATURE_CLOCKSOURCE2);
+(1  KVM_FEATURE_CLOCKSOURCE2) |
+(1  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
entry-ebx = 0;
entry-ecx = 0;
entry-edx = 0;
-- 
1.6.2.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/9] don't compute pvclock adjustments if we trust the tsc

2010-05-11 Thread Glauber Costa
If the HV told us we can fully trust the TSC, skip any
correction

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_para.h|5 +
 arch/x86/include/asm/pvclock-abi.h |1 +
 arch/x86/kernel/kvmclock.c |3 +++
 arch/x86/kernel/pvclock.c  |4 
 4 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index f019f8c..05eba5e 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -21,6 +21,11 @@
  */
 #define KVM_FEATURE_CLOCKSOURCE23
 
+/* The last 8 bits are used to indicate how to interpret the flags field
+ * in pvclock structure. If no bits are set, all flags are ignored.
+ */
+#define KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24
+
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
 
diff --git a/arch/x86/include/asm/pvclock-abi.h 
b/arch/x86/include/asm/pvclock-abi.h
index ec5c41a..35f2d19 100644
--- a/arch/x86/include/asm/pvclock-abi.h
+++ b/arch/x86/include/asm/pvclock-abi.h
@@ -39,5 +39,6 @@ struct pvclock_wall_clock {
u32   nsec;
 } __attribute__((__packed__));
 
+#define PVCLOCK_TSC_STABLE_BIT (1  0)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 59c740f..eb9b76c 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -215,4 +215,7 @@ void __init kvmclock_init(void)
clocksource_register(kvm_clock);
pv_info.paravirt_enabled = 1;
pv_info.name = KVM;
+
+   if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
+   pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
 }
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index f5bc40e..239427c 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -135,6 +135,10 @@ cycle_t pvclock_clocksource_read(struct 
pvclock_vcpu_time_info *src)
barrier();
} while (version != src-version);
 
+   if ((valid_flags  PVCLOCK_TSC_STABLE_BIT) 
+   (shadow.flags  PVCLOCK_TSC_STABLE_BIT))
+   return ret;
+
/*
 * Assumption here is that last_value, a global accumulator, always goes
 * forward. If we are less than that, we should not be much smaller.
-- 
1.6.2.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/9] add new KVMCLOCK cpuid feature

2010-05-11 Thread Glauber Costa
This cpuid, KVM_CPUID_CLOCKSOURCE2, will indicate to the guest
that kvmclock is available through a new set of MSRs. The old ones
are deprecated.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_para.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 9734808..f019f8c 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -16,6 +16,10 @@
 #define KVM_FEATURE_CLOCKSOURCE0
 #define KVM_FEATURE_NOP_IO_DELAY   1
 #define KVM_FEATURE_MMU_OP 2
+/* This indicates that the new set of kvmclock msrs
+ * are available. The use of 0x11 and 0x12 is deprecated
+ */
+#define KVM_FEATURE_CLOCKSOURCE23
 
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
-- 
1.6.2.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/9] change msr numbers for kvmclock

2010-05-11 Thread Glauber Costa
Avi pointed out a while ago that those MSRs falls into the pentium
PMU range. So the idea here is to add new ones, and after a while,
deprecate the old ones.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 arch/x86/include/asm/kvm_para.h |4 
 arch/x86/kvm/x86.c  |7 ++-
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index ffae142..9734808 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -20,6 +20,10 @@
 #define MSR_KVM_WALL_CLOCK  0x11
 #define MSR_KVM_SYSTEM_TIME 0x12
 
+/* Custom MSRs falls in the range 0x4b564d00-0x4b564dff */
+#define MSR_KVM_WALL_CLOCK_NEW  0x4b564d00
+#define MSR_KVM_SYSTEM_TIME_NEW 0x4b564d01
+
 #define KVM_MAX_MMU_OP_BATCH   32
 
 /* Operations for KVM_HC_MMU_OP */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b2ce1d..eb84947 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -664,9 +664,10 @@ static inline u32 bit(int bitno)
  * kvm-specific. Those are put in the beginning of the list.
  */
 
-#define KVM_SAVE_MSRS_BEGIN5
+#define KVM_SAVE_MSRS_BEGIN7
 static u32 msrs_to_save[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
+   MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
HV_X64_MSR_APIC_ASSIST_PAGE,
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
@@ -1192,10 +1193,12 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
u64 data)
case MSR_IA32_MISC_ENABLE:
vcpu-arch.ia32_misc_enable_msr = data;
break;
+   case MSR_KVM_WALL_CLOCK_NEW:
case MSR_KVM_WALL_CLOCK:
vcpu-kvm-arch.wall_clock = data;
kvm_write_wall_clock(vcpu-kvm, data);
break;
+   case MSR_KVM_SYSTEM_TIME_NEW:
case MSR_KVM_SYSTEM_TIME: {
if (vcpu-arch.time_page) {
kvm_release_page_dirty(vcpu-arch.time_page);
@@ -1467,9 +1470,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
u64 *pdata)
data = vcpu-arch.efer;
break;
case MSR_KVM_WALL_CLOCK:
+   case MSR_KVM_WALL_CLOCK_NEW:
data = vcpu-kvm-arch.wall_clock;
break;
case MSR_KVM_SYSTEM_TIME:
+   case MSR_KVM_SYSTEM_TIME_NEW:
data = vcpu-arch.time;
break;
case MSR_IA32_P5_MC_ADDR:
-- 
1.6.2.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/9] Add a global synchronization point for pvclock

2010-05-11 Thread Glauber Costa
In recent stress tests, it was found that pvclock-based systems
could seriously warp in smp systems. Using ingo's time-warp-test.c,
I could trigger a scenario as bad as 1.5mi warps a minute in some systems.
(to be fair, it wasn't that bad in most of them). Investigating further, I
found out that such warps were caused by the very offset-based calculation
pvclock is based on.

This happens even on some machines that report constant_tsc in its tsc flags,
specially on multi-socket ones.

Two reads of the same kernel timestamp at approx the same time, will likely
have tsc timestamped in different occasions too. This means the delta we
calculate is unpredictable at best, and can probably be smaller in a cpu
that is legitimately reading clock in a forward ocasion.

Some adjustments on the host could make this window less likely to happen,
but still, it pretty much poses as an intrinsic problem of the mechanism.

A while ago, I though about using a shared variable anyway, to hold clock
last state, but gave up due to the high contention locking was likely
to introduce, possibly rendering the thing useless on big machines. I argue,
however, that locking is not necessary.

We do a read-and-return sequence in pvclock, and between read and return,
the global value can have changed. However, it can only have changed
by means of an addition of a positive value. So if we detected that our
clock timestamp is less than the current global, we know that we need to
return a higher one, even though it is not exactly the one we compared to.

OTOH, if we detect we're greater than the current time source, we atomically
replace the value with our new readings. This do causes contention on big
boxes (but big here means *BIG*), but it seems like a good trade off, since
it provide us with a time source guaranteed to be stable wrt time warps.

After this patch is applied, I don't see a single warp in time during 5 days
of execution, in any of the machines I saw them before.

Signed-off-by: Glauber Costa glom...@redhat.com
CC: Jeremy Fitzhardinge jer...@goop.org
CC: Avi Kivity a...@redhat.com
CC: Marcelo Tosatti mtosa...@redhat.com
CC: Zachary Amsden zams...@redhat.com
---
 arch/x86/kernel/pvclock.c |   24 
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index f7fdd56..f5bc40e 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -118,11 +118,14 @@ unsigned long pvclock_tsc_khz(struct 
pvclock_vcpu_time_info *src)
return pv_tsc_khz;
 }
 
+static atomic64_t last_value = ATOMIC64_INIT(0);
+
 cycle_t pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
 {
struct pvclock_shadow_time shadow;
unsigned version;
cycle_t ret, offset;
+   u64 last;
 
do {
version = pvclock_get_time_values(shadow, src);
@@ -132,6 +135,27 @@ cycle_t pvclock_clocksource_read(struct 
pvclock_vcpu_time_info *src)
barrier();
} while (version != src-version);
 
+   /*
+* Assumption here is that last_value, a global accumulator, always goes
+* forward. If we are less than that, we should not be much smaller.
+* We assume there is an error marging we're inside, and then the 
correction
+* does not sacrifice accuracy.
+*
+* For reads: global may have changed between test and return,
+* but this means someone else updated poked the clock at a later time.
+* We just need to make sure we are not seeing a backwards event.
+*
+* For updates: last_value = ret is not enough, since two vcpus could be
+* updating at the same time, and one of them could be slightly behind,
+* making the assumption that last_value always go forward fail to hold.
+*/
+   last = atomic64_read(last_value);
+   do {
+   if (ret  last)
+   return last;
+   last = atomic64_cmpxchg(last_value, last, ret);
+   } while (unlikely(last != ret));
+
return ret;
 }
 
-- 
1.6.2.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Cam Macdonell
On Tue, May 11, 2010 at 9:51 AM, Anthony Liguori anth...@codemonkey.ws wrote:
 On 05/11/2010 09:53 AM, Avi Kivity wrote:

 On 05/11/2010 05:17 PM, Cam Macdonell wrote:

 The master is the shared memory area.  It's a completely separate entity
 that is represented by the backing file (or shared memory server handing
 out
 the fd to mmap).  It can exists independently of any guest.

 I think the master/peer idea would be necessary if we were sharing
 guest memory (sharing guest A's memory with guest B).  Then if the
 master (guest A) dies, perhaps something needs to happen to preserve
 the memory contents.

 Definitely.  But we aren't...

 Then transparent live migration is impossible.  IMHO, that's a fundamental
 mistake that we will regret down the road.

   But since we're sharing host memory, the
 applications in the guests can race to determine the master by
 grabbing a lock at offset 0 or by using lowest VM ID.

 Looking at it another way, it is the applications using shared memory
 that may or may not need a master, the Qemu processes don't need the
 concept of a master since the memory belongs to the host.

 Exactly.  Furthermore, even in a master/slave relationship, there will be
 different masters for different sub-areas, it would be a pity to expose all
 this in the hardware abstraction.  This way we have an external device, and
 PCI HBAs which connect to it - just like a multi-tailed SCSI disk.

 To support transparent live migration, it's necessary to do two things:

 1) Preserve the memory contents of the PCI BAR after disconnected from a
 shared memory segment
 2) Synchronize any changes made to the PCI BAR with the shared memory
 segment upon reconnect/initial connection.

 N.B. savevm/loadvm both constitute disconnect and reconnect events
 respectively.

 Supporting (1) is easy since we just need to memcpy() the contents of the
 shared memory segment to a temporary RAM area upon disconnect.

 Supporting (2) is easy when the shared memory segment is viewed as owned by
 the guest since it has the definitive copy of the data.  IMHO, this is what
 role=master means.  However, if we want to support a model where the guest
 does not have a definitive copy of the data, upon reconnect, we need to
 throw away the guest's changes and make the shared memory segment appear to
 simultaneously update to the guest.  This is what role=peer means.

 For role=peer, it's necessary to signal to the guest when it's not
 connected.  This means prior to savevm it's necessary to indicate to the
 guest that it's been disconnected.

 I think it's important that we build this mechanism in from the start
 because as I've stated in the past, I don't think role=peer is going to be
 the dominant use-case.  I actually don't think that shared memory between
 guests is all that interesting compared to shared memory to an external
 process on the host.


Most of the people I hear from who are using my patch are using a peer
model to share data between applications (simulations, JVMs, etc).
But guest-to-host applications work as well of course.

I think transparent migration can be achieved by making the
connected/disconnected state transparent to the application.

When using the shared memory server, the server has to be setup anyway
on the new host and copying the memory region could be part of that as
well if the application needs the contents preserved.  I don't think
it has to be handled by the savevm/loadvm operations.  There's little
difference between naming one VM the master or letting the shared
memory server act like a master.

I think abstractions on top of shared memory could handle
disconnection issues (sort of how TCP handles them for networks) if
the application needs it.  Again, my opinion is to leave it to the
application to decide what it necessary.

Cam
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] VMX: Fix and improve guest state validity checks

2010-05-11 Thread Mohammed Gamal
- Add 's' and 'g' field checks on segment registers
- Correct SS checks for request and descriptor privilege levels

Signed-off-by: Mohammed Gamal m.gamal...@gmail.com
---
 arch/x86/kvm/vmx.c |   73 +++
 1 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 777e00d..9805c2a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2121,16 +2121,30 @@ static bool stack_segment_valid(struct kvm_vcpu *vcpu)
vmx_get_segment(vcpu, ss, VCPU_SREG_SS);
ss_rpl = ss.selector  SELECTOR_RPL_MASK;
 
-   if (ss.unusable)
+   if (ss.dpl != ss_rpl) /* DPL != RPL */
+   return false;
+
+   if (ss.unusable) /* Short-circuit */
return true;
+
if (ss.type != 3  ss.type != 7)
return false;
if (!ss.s)
return false;
-   if (ss.dpl != ss_rpl) /* DPL != RPL */
-   return false;
if (!ss.present)
return false;
+   if (ss.limit  0xfff0) {
+if ((ss.limit  0xfff)  0xfff)
+return false;
+if (!ss.g)
+return false;
+} else {
+if ((ss.limit  0xfff) == 0xfff)
+return false;
+if (ss.g)
+return false;
+}
+
 
return true;
 }
@@ -2143,8 +2157,15 @@ static bool data_segment_valid(struct kvm_vcpu *vcpu, 
int seg)
vmx_get_segment(vcpu, var, seg);
rpl = var.selector  SELECTOR_RPL_MASK;
 
-   if (var.unusable)
+   if (var.unusable)  /* Short-circuit */
return true;
+   if (!(var.type  AR_TYPE_ACCESSES_MASK))
+   return false;
+   if (var.type  AR_TYPE_CODE_MASK) {
+   if (!(var.type  AR_TYPE_READABLE_MASK))
+   return false;
+   }
+
if (!var.s)
return false;
if (!var.present)
@@ -2154,6 +2175,18 @@ static bool data_segment_valid(struct kvm_vcpu *vcpu, 
int seg)
return false;
}
 
+   if (var.limit  0xfff0) {
+   if ((var.limit  0xfff)  0xfff)
+   return false;
+   if (!var.g)
+   return false;
+   } else {
+   if ((var.limit  0xfff) == 0xfff)
+   return false;
+   if (var.g)
+   return false;
+   }
+
/* TODO: Add other members to kvm_segment_field to allow checking for 
other access
 * rights flags
 */
@@ -2172,6 +2205,21 @@ static bool tr_valid(struct kvm_vcpu *vcpu)
return false;
if (tr.type != 3  tr.type != 11) /* TODO: Check if guest is in IA32e 
mode */
return false;
+   if (tr.s)
+   return false;
+   if (tr.limit  0xfff0) {
+if ((tr.limit  0xfff)  0xfff)
+return false;
+if (!tr.g)
+return false;
+} else {
+if ((tr.limit  0xfff) == 0xfff)
+return false;
+if (tr.g)
+return false;
+}
+
+
if (!tr.present)
return false;
 
@@ -2184,7 +2232,7 @@ static bool ldtr_valid(struct kvm_vcpu *vcpu)
 
vmx_get_segment(vcpu, ldtr, VCPU_SREG_LDTR);
 
-   if (ldtr.unusable)
+   if (ldtr.unusable)  /* Short-circuit */
return true;
if (ldtr.selector  SELECTOR_TI_MASK)   /* TI = 1 */
return false;
@@ -2192,6 +2240,20 @@ static bool ldtr_valid(struct kvm_vcpu *vcpu)
return false;
if (!ldtr.present)
return false;
+   if (ldtr.s)
+   return false;
+   if (ldtr.limit  0xfff0) {
+if ((ldtr.limit  0xfff)  0xfff)
+return false;
+if (!ldtr.g)
+return false;
+} else {
+if ((ldtr.limit  0xfff) == 0xfff)
+return false;
+if (ldtr.g)
+return false;
+}
+
 
return true;
 }
@@ -2251,7 +2313,6 @@ static bool guest_state_valid(struct kvm_vcpu *vcpu)
}
/* TODO:
 * - Add checks on RIP
-* - Add checks on RFLAGS
 */
 
return true;
-- 
1.7.0.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Anthony Liguori

On 05/11/2010 11:39 AM, Cam Macdonell wrote:


Most of the people I hear from who are using my patch are using a peer
model to share data between applications (simulations, JVMs, etc).
But guest-to-host applications work as well of course.

I think transparent migration can be achieved by making the
connected/disconnected state transparent to the application.

When using the shared memory server, the server has to be setup anyway
on the new host and copying the memory region could be part of that as
well if the application needs the contents preserved.  I don't think
it has to be handled by the savevm/loadvm operations.  There's little
difference between naming one VM the master or letting the shared
memory server act like a master.
   


Except that to make it work with the shared memory server, you need the 
server to participate in the live migration protocol which is something 
I'd prefer to avoid at it introduces additional down time.


Regards,

Anthony Liguori


I think abstractions on top of shared memory could handle
disconnection issues (sort of how TCP handles them for networks) if
the application needs it.  Again, my opinion is to leave it to the
application to decide what it necessary.

Cam
   


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC] vhost: fix barrier pairing

2010-05-11 Thread Michael S. Tsirkin
According to memory-barriers.txt, an smp memory barrier
should always be paired with another smp memory barrier,
and I quote a lack of appropriate pairing is almost certainly an
error.

In case of vhost, failure to flush out used index
update before looking at the interrupt disable flag
could result in missed interrupts, resulting in
networking hang under stress.

This might happen when flags read bypasses used index write.
So we see interrupts disabled and do not interrupt, at the
same time guest writes flags value to enable interrupt,
reads an old used index value, thinks that
used ring is empty and waits for interrupt.

Note: the barrier we pair with here is in
drivers/virtio/virtio_ring.c, function
vring_enable_cb.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

Dave, I think this is needed in 2.6.34, I'll send a pull
request after doing some more testing.

Rusty, Juan, could you take a look as well please?
Thanks!

 drivers/vhost/vhost.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e69d238..14fa2f5 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1035,7 +1035,10 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned 
int head, int len)
 /* This actually signals the guest, using eventfd. */
 void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
-   __u16 flags = 0;
+   __u16 flags;
+   /* Flush out used index updates. */
+   smp_mb();
+
if (get_user(flags, vq-avail-flags)) {
vq_err(vq, Failed to get flags);
return;
-- 
1.7.1.12.g42b7f
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Cam Macdonell
On Tue, May 11, 2010 at 11:05 AM, Anthony Liguori anth...@codemonkey.ws wrote:
 On 05/11/2010 11:39 AM, Cam Macdonell wrote:

 Most of the people I hear from who are using my patch are using a peer
 model to share data between applications (simulations, JVMs, etc).
 But guest-to-host applications work as well of course.

 I think transparent migration can be achieved by making the
 connected/disconnected state transparent to the application.

 When using the shared memory server, the server has to be setup anyway
 on the new host and copying the memory region could be part of that as
 well if the application needs the contents preserved.  I don't think
 it has to be handled by the savevm/loadvm operations.  There's little
 difference between naming one VM the master or letting the shared
 memory server act like a master.


 Except that to make it work with the shared memory server, you need the
 server to participate in the live migration protocol which is something I'd
 prefer to avoid at it introduces additional down time.

Fair enough, then to move to a resolution on this can we either

not support migration at this point, which leaves us free to add it
later as migration use cases become better understand. (my preference)

OR

1 - not support migration when the server is used
2 - if role=master is specified in the non-server case, then that
guest will copy the memory with it.  If role=peer is specified, the
guest will use the shared memory object on the destination host as is
(possibly creating it or output an error if memory object doesn't
exist).

Cam
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Avi Kivity

On 05/11/2010 06:51 PM, Anthony Liguori wrote:

On 05/11/2010 09:53 AM, Avi Kivity wrote:

On 05/11/2010 05:17 PM, Cam Macdonell wrote:


The master is the shared memory area.  It's a completely separate 
entity
that is represented by the backing file (or shared memory server 
handing out

the fd to mmap).  It can exists independently of any guest.

I think the master/peer idea would be necessary if we were sharing
guest memory (sharing guest A's memory with guest B).  Then if the
master (guest A) dies, perhaps something needs to happen to preserve
the memory contents.


Definitely.  But we aren't...


Then transparent live migration is impossible.  IMHO, that's a 
fundamental mistake that we will regret down the road.


I don't see why the two cases are any different.  In all cases, all 
guests have to be migrated simultaneously, or we have to support 
distributed shared memory (likely at the kernel level).  Who owns the 
memory makes no difference.


There is a two non-transparent variants:
- forcibly disconnect the migrating guest, and migrate it later
  - puts all the burden on the guest application
- ask the guest to detach from the memory device
  - host is at the mercy of the guest

Since the consumers of shared memory are academia, they'll probably 
implement DSM.





   But since we're sharing host memory, the
applications in the guests can race to determine the master by
grabbing a lock at offset 0 or by using lowest VM ID.

Looking at it another way, it is the applications using shared memory
that may or may not need a master, the Qemu processes don't need the
concept of a master since the memory belongs to the host.


Exactly.  Furthermore, even in a master/slave relationship, there 
will be different masters for different sub-areas, it would be a pity 
to expose all this in the hardware abstraction.  This way we have an 
external device, and PCI HBAs which connect to it - just like a 
multi-tailed SCSI disk.


To support transparent live migration, it's necessary to do two things:

1) Preserve the memory contents of the PCI BAR after disconnected from 
a shared memory segment
2) Synchronize any changes made to the PCI BAR with the shared memory 
segment upon reconnect/initial connection.


Disconnect/reconnect mean it's no longer transparent.



N.B. savevm/loadvm both constitute disconnect and reconnect events 
respectively.


Supporting (1) is easy since we just need to memcpy() the contents of 
the shared memory segment to a temporary RAM area upon disconnect.


Supporting (2) is easy when the shared memory segment is viewed as 
owned by the guest since it has the definitive copy of the data.  
IMHO, this is what role=master means. 


There is no 'the guest', if the memory is to be shared there will be 
multiple guests (or multiple entities).


However, if we want to support a model where the guest does not have a 
definitive copy of the data, upon reconnect, we need to throw away the 
guest's changes and make the shared memory segment appear to 
simultaneously update to the guest.  This is what role=peer means.


For role=peer, it's necessary to signal to the guest when it's not 
connected.  This means prior to savevm it's necessary to indicate to 
the guest that it's been disconnected.


I think it's important that we build this mechanism in from the start 
because as I've stated in the past, I don't think role=peer is going 
to be the dominant use-case.  I actually don't think that shared 
memory between guests is all that interesting compared to shared 
memory to an external process on the host.


I'd like to avoid making the distinction.  Why limit at the outset?

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 4/5] Inter-VM shared memory PCI device

2010-05-11 Thread Avi Kivity

On 05/11/2010 08:05 PM, Anthony Liguori wrote:

On 05/11/2010 11:39 AM, Cam Macdonell wrote:


Most of the people I hear from who are using my patch are using a peer
model to share data between applications (simulations, JVMs, etc).
But guest-to-host applications work as well of course.

I think transparent migration can be achieved by making the
connected/disconnected state transparent to the application.

When using the shared memory server, the server has to be setup anyway
on the new host and copying the memory region could be part of that as
well if the application needs the contents preserved.  I don't think
it has to be handled by the savevm/loadvm operations.  There's little
difference between naming one VM the master or letting the shared
memory server act like a master.


Except that to make it work with the shared memory server, you need 
the server to participate in the live migration protocol which is 
something I'd prefer to avoid at it introduces additional down time.


We can tunnel its migration data through qemu.  Of course, gathering its 
dirty bitmap will be interesting.  DSM may be the way to go here (we can 
even live migrate qemu through DSM: share the guest address space and 
immediately start running on the destination node; the guest will fault 
its memory to the destination.  An advantage is that that the cpu load 
is immediately transferred.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


virtio: power management

2010-05-11 Thread Michael S. Tsirkin
Anyone looked at power management with virtio?
virtio-pci has callbacks to save/restore pci config
on suspend/resume, but it seems that more action,
such as restoring queue state, would be needed
for e.g. suspend to disk to work.

Rusty, any hints on the code in virtio-pci
that deals with suspend? Is it incomplete?

Thanks,

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] pci: cleanly backout of pci_qdev_init()

2010-05-11 Thread Blue Swirl
On 5/10/10, Alex Williamson alex.william...@redhat.com wrote:
 If the init function of a device fails, as might happen with device
  assignment, we never undo the work done by do_pci_register_device().
  This not only causes a bit of a memory leak, but also leaves a bogus
  pointer in the bus devices array that can cause a segfault or
  garbage data from 'info pci'.

  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---

   hw/pci.c |   17 -
   1 files changed, 12 insertions(+), 5 deletions(-)

  diff --git a/hw/pci.c b/hw/pci.c
  index f167436..3d3560e 100644
  --- a/hw/pci.c
  +++ b/hw/pci.c
  @@ -625,6 +625,14 @@ static PCIDevice *do_pci_register_device(PCIDevice 
 *pci_dev, PCIBus *bus,
  return pci_dev;
   }

  +static void do_pci_unregister_device(PCIDevice *pci_dev)
  +{
  +qemu_free_irqs(pci_dev-irq);
  +pci_dev-bus-devices[pci_dev-devfn] = NULL;
  +pci_config_free(pci_dev);
  +return;

Isn't this 'return' useless?

  +}
  +
   PCIDevice *pci_register_device(PCIBus *bus, const char *name,
 int instance_size, int devfn,
 PCIConfigReadFunc *config_read,
  @@ -680,10 +688,7 @@ static int pci_unregister_device(DeviceState *dev)
  return ret;

  pci_unregister_io_regions(pci_dev);
  -
  -qemu_free_irqs(pci_dev-irq);
  -pci_dev-bus-devices[pci_dev-devfn] = NULL;
  -pci_config_free(pci_dev);
  +do_pci_unregister_device(pci_dev);
  return 0;
   }

  @@ -1652,8 +1657,10 @@ static int pci_qdev_init(DeviceState *qdev, 
 DeviceInfo *base)
  if (pci_dev == NULL)
  return -1;
  rc = info-init(pci_dev);
  -if (rc != 0)
  +if (rc != 0) {
  +do_pci_unregister_device(pci_dev);
  return rc;
  +}

  /* rom loading */
  if (pci_dev-romfile == NULL  info-romfile != NULL)



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Net KVM

2010-05-11 Thread Willi
Dears,

I tried to install NetKVM in the guest machine windows xp with your
download kvm-guest-drivers-windows-2.zip. But there is not a content
like your page
http://www.linux-kvm.com/content/using-windows-installer-paravirtual-net
work-drivers 

I am missing the netkvminstall.exe. 

And a windows xp guest machine on a vista host dont allow to start with
qemu-system-x86_64 -hda windows.img -m 400 -net user -net
nic,model=virtio. 

How can I get the NetKVM in the xp - machine?

Thanks for your answers

Willi

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/9] pvclock misc fixes - v4.

2010-05-11 Thread Avi Kivity

On 05/11/2010 07:17 PM, Glauber Costa wrote:

This is the fourth version ov kvmclock fixes.

Just two minor changes in patch 5, per avi request, and
the addition of cpuid.txt file, documenting all cpuid flags
we use.
   


Looks good.


As a side effect, this patch removes the time-travel feature
in kvm guests.
   


Too many reports of users accidentally killing their grandparents anyway.

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] pci: cleanly backout of pci_qdev_init()

2010-05-11 Thread Alex Williamson
On Tue, 2010-05-11 at 21:17 +0300, Blue Swirl wrote:
 On 5/10/10, Alex Williamson alex.william...@redhat.com wrote:
 
hw/pci.c |   17 -
1 files changed, 12 insertions(+), 5 deletions(-)
 
   diff --git a/hw/pci.c b/hw/pci.c
   index f167436..3d3560e 100644
   --- a/hw/pci.c
   +++ b/hw/pci.c
   @@ -625,6 +625,14 @@ static PCIDevice *do_pci_register_device(PCIDevice 
  *pci_dev, PCIBus *bus,
   return pci_dev;
}
 
   +static void do_pci_unregister_device(PCIDevice *pci_dev)
   +{
   +qemu_free_irqs(pci_dev-irq);
   +pci_dev-bus-devices[pci_dev-devfn] = NULL;
   +pci_config_free(pci_dev);
   +return;
 
 Isn't this 'return' useless?

Yeah, I'll remove it.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] pci: cleanly backout of pci_qdev_init()

2010-05-11 Thread Alex Williamson
If the init function of a device fails, as might happen with device
assignment, we never undo the work done by do_pci_register_device().
This not only causes a bit of a memory leak, but also leaves a bogus
pointer in the bus devices array that can cause a segfault or
garbage data from 'info pci'.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

v2: Remove extraneous return from do_pci_unregister_device()(

 hw/pci.c |   16 +++-
 1 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index f167436..3ca5f09 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -625,6 +625,13 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 return pci_dev;
 }
 
+static void do_pci_unregister_device(PCIDevice *pci_dev)
+{
+qemu_free_irqs(pci_dev-irq);
+pci_dev-bus-devices[pci_dev-devfn] = NULL;
+pci_config_free(pci_dev);
+}
+
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
int instance_size, int devfn,
PCIConfigReadFunc *config_read,
@@ -680,10 +687,7 @@ static int pci_unregister_device(DeviceState *dev)
 return ret;
 
 pci_unregister_io_regions(pci_dev);
-
-qemu_free_irqs(pci_dev-irq);
-pci_dev-bus-devices[pci_dev-devfn] = NULL;
-pci_config_free(pci_dev);
+do_pci_unregister_device(pci_dev);
 return 0;
 }
 
@@ -1652,8 +1656,10 @@ static int pci_qdev_init(DeviceState *qdev, DeviceInfo 
*base)
 if (pci_dev == NULL)
 return -1;
 rc = info-init(pci_dev);
-if (rc != 0)
+if (rc != 0) {
+do_pci_unregister_device(pci_dev);
 return rc;
+}
 
 /* rom loading */
 if (pci_dev-romfile == NULL  info-romfile != NULL)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] virtio: put last seen used index into ring itself

2010-05-11 Thread Ryan Harper
* Michael S. Tsirkin m...@redhat.com [2010-05-05 16:37]:
 Generally, the Host end of the virtio ring doesn't need to see where
 Guest is up to in consuming the ring.  However, to completely understand
 what's going on from the outside, this information must be exposed.
 For example, host can reduce the number of interrupts by detecting
 that the guest is currently handling previous buffers.
 
 Fortunately, we have room to expand: the ring is always a whole number
 of pages and there's hundreds of bytes of padding after the avail ring
 and the used ring, whatever the number of descriptors (which must be a
 power of 2).
 
 We add a feature bit so the guest can tell the host that it's writing
 out the current value there, if it wants to use that.
 
 This is based on a patch by Rusty Russell, with the main difference
 being that we dedicate a feature bit to guest to tell the host it is
 writing the used index.  This way we don't need to force host to publish
 the last available index until we have a use for it.
 
 Signed-off-by: Rusty Russell ru...@rustcorp.com.au
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 Rusty,
 this is a simplified form of a patch you posted in the past.
 I have a vhost patch that, using this feature, shows external
 to host bandwidth grow from 5 to 7 GB/s, by avoiding
 an interrupt in the window after previous interrupt
 was sent and before interrupts were disabled for the vq.
 With vhost under some external to host loads I see
 this window being hit about 30% sometimes.
 
 I'm finalizing the host bits and plan to send
 the final version for inclusion when all's ready,
 but I'd like to hear comments meanwhile.
 
  drivers/virtio/virtio_ring.c |   28 +---
  include/linux/virtio_ring.h  |   14 +-
  2 files changed, 30 insertions(+), 12 deletions(-)
 
 diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
 index 1ca8890..7729aba 100644
 --- a/drivers/virtio/virtio_ring.c
 +++ b/drivers/virtio/virtio_ring.c
 @@ -89,9 +89,6 @@ struct vring_virtqueue
   /* Number we've added since last sync. */
   unsigned int num_added;
 
 - /* Last used index we've seen. */
 - u16 last_used_idx;
 -
   /* How to notify other side. FIXME: commonalize hcalls! */
   void (*notify)(struct virtqueue *vq);
 
 @@ -285,12 +282,13 @@ static void detach_buf(struct vring_virtqueue *vq, 
 unsigned int head)
 
  static inline bool more_used(const struct vring_virtqueue *vq)
  {
 - return vq-last_used_idx != vq-vring.used-idx;
 + return *vq-vring.last_used_idx != vq-vring.used-idx;
  }
 
  void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
  {
   struct vring_virtqueue *vq = to_vvq(_vq);
 + struct vring_used_elem *u;
   void *ret;
   unsigned int i;
 
 @@ -307,12 +305,13 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned 
 int *len)
   return NULL;
   }
 
 - /* Only get used array entries after they have been exposed by host. */
 - virtio_rmb();
 -
 - i = vq-vring.used-ring[vq-last_used_idx%vq-vring.num].id;
 - *len = vq-vring.used-ring[vq-last_used_idx%vq-vring.num].len;
 + /* Only get used array entries after they have been exposed by host.
 +  * Need mb(), not just rmb() because we write last_used_idx below. */
 + virtio_mb();
 
 + u = vq-vring.used-ring[*vq-vring.last_used_idx % vq-vring.num];
 + i = u-id;
 + *len = u-len;
   if (unlikely(i = vq-vring.num)) {
   BAD_RING(vq, id %u out of range\n, i);
   return NULL;
 @@ -325,7 +324,8 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned 
 int *len)
   /* detach_buf clears data, so grab it now. */
   ret = vq-data[i];
   detach_buf(vq, i);
 - vq-last_used_idx++;
 + (*vq-vring.last_used_idx)++;
 +
   END_USE(vq);
   return ret;
  }
 @@ -431,7 +431,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
   vq-vq.name = name;
   vq-notify = notify;
   vq-broken = false;
 - vq-last_used_idx = 0;
 + *vq-vring.last_used_idx = 0;
   vq-num_added = 0;
   list_add_tail(vq-vq.list, vdev-vqs);
  #ifdef DEBUG
 @@ -440,6 +440,10 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
 
   vq-indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
 
 + /* We publish used index whether Host offers it or not: if not, it's
 +  * junk space anyway.  But calling this acknowledges the feature. */
 + virtio_has_feature(vdev, VIRTIO_RING_F_PUBLISH_USED);
 +

You use VIRTIO_RING_F_PUBLISH_USED here, but
VIRTIO_RING_F_PUBLISH_INDICES below... 


   /* No callback?  Tell other side not to bother us. */
   if (!callback)
   vq-vring.avail-flags |= VRING_AVAIL_F_NO_INTERRUPT;
 @@ -473,6 +477,8 @@ void vring_transport_features(struct virtio_device *vdev)
   switch (i) {
   case VIRTIO_RING_F_INDIRECT_DESC:
   break;
 +   

[PATCH] x86 emulator: Add test acc, imm instruction (opcodes 0xA8 - 0xA9)

2010-05-11 Thread Mohammed Gamal
This adds test acc, imm instruction to the x86 emulator

Signed-off-by: Mohammed Gamal m.gamal...@gmail.com
---
 arch/x86/kvm/emulate.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index b43ac98..be5e78d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -181,7 +181,7 @@ static u32 opcode_table[256] = {
ByteOp | SrcSI | DstDI | Mov | String, SrcSI | DstDI | Mov | String,
ByteOp | SrcSI | DstDI | String, SrcSI | DstDI | String,
/* 0xA8 - 0xAF */
-   0, 0, ByteOp | DstDI | Mov | String, DstDI | Mov | String,
+   DstAcc | SrcImmByte | ByteOp, DstAcc | SrcImm, ByteOp | DstDI | Mov | 
String, DstDI | Mov | String,
ByteOp | SrcSI | DstAcc | Mov | String, SrcSI | DstAcc | Mov | String,
ByteOp | DstDI | String, DstDI | String,
/* 0xB0 - 0xB7 */
@@ -2754,6 +2754,7 @@ special_insn:
}
break;
case 0x84 ... 0x85:
+   test:
emulate_2op_SrcV(test, c-src, c-dst, ctxt-eflags);
break;
case 0x86 ... 0x87: /* xchg */
@@ -2852,6 +2853,8 @@ special_insn:
c-dst.type = OP_NONE; /* Disable writeback. */
DPRINTF(cmps: mem1=0x%p mem2=0x%p\n, c-src.ptr, c-dst.ptr);
goto cmp;
+   case 0xa8 ... 0xa9: /* test ax, imm */
+   goto test;  
case 0xaa ... 0xab: /* stos */
c-dst.val = c-regs[VCPU_REGS_RAX];
break;
-- 
1.7.0.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: MMU: Fix free memory accounting race in mmu_alloc_roots()

2010-05-11 Thread Marcelo Tosatti
On Mon, May 10, 2010 at 12:09:56PM +0300, Avi Kivity wrote:
 We drop the mmu lock between freeing memory and allocating the roots; this
 allows some other vcpu to sneak in and allocate memory.
 
 While the race is benign (resulting only in temporary overallocation, not oom)
 it is simple and easy to fix by moving the freeing close to the allocation.
 
 Signed-off-by: Avi Kivity a...@redhat.com
 ---
  arch/x86/kvm/mmu.c |5 ++---
  1 files changed, 2 insertions(+), 3 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] KVM: inject #UD if instruction emulation fails and exit to userspace

2010-05-11 Thread Marcelo Tosatti
On Mon, May 10, 2010 at 11:16:56AM +0300, Gleb Natapov wrote:
 Do not kill VM when instruction emulation fails. Inject #UD and report
 failure to userspace instead. Userspace may choose to reenter guest if
 vcpu is in userspace (cpl == 3) in which case guest OS will kill
 offending process and continue running.
 
 Signed-off-by: Gleb Natapov g...@redhat.com
 ---
 
 Changelog:
   v1-v2:
 - always inject #UD and report failure to userspace.

Applied, thanks.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4] KVM: x86: Remove kvm_mmu_reset_context() in kvm_set_efer()

2010-05-11 Thread Marcelo Tosatti
On Tue, May 11, 2010 at 01:30:06PM +0800, Sheng Yang wrote:
 Modify EFER won't result in mode switch directly. After EFER.LME set, the
 following set CR0.PG would result in mode switch to IA32e. And the later
 action already covered by kvm_set_cr0().
 
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
  arch/x86/kvm/x86.c |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 764f89b..b59fc67 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -721,7 +721,6 @@ static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
   kvm_x86_ops-set_efer(vcpu, efer);
  
   vcpu-arch.mmu.base_role.nxe = (efer  EFER_NX)  !tdp_enabled;
 - kvm_mmu_reset_context(vcpu);

But there are different sets of shadow pagetables for NXE on/off. See
commit 9645bb56b31a1b.

Without the reset, after NXE 1-0 transition, a spte retains the NXE
validity check, and subsequent use of such gpte with bit 63 set does not
cause a fault.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] VMX: x86: Only reset MMU when necessary

2010-05-11 Thread Marcelo Tosatti
On Tue, May 11, 2010 at 01:30:07PM +0800, Sheng Yang wrote:
 Only modifying some bits of CR0/CR4 needs paging mode switch.
 
 Add update_rsvd_bits_mask() to address EFER.NX bit updating for reserved bits.
 
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
  arch/x86/include/asm/kvm_host.h |1 +
  arch/x86/kvm/mmu.c  |   17 ++---
  arch/x86/kvm/x86.c  |   14 --
  3 files changed, 27 insertions(+), 5 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index ed48904..c8c8a03 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -553,6 +553,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, 
 int slot);
  void kvm_mmu_zap_all(struct kvm *kvm);
  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int 
 kvm_nr_mmu_pages);
 +void update_rsvd_bits_mask(struct kvm_vcpu *vcpu);
  
  int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
  
 diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
 index 5412185..98abdcf 100644
 --- a/arch/x86/kvm/mmu.c
 +++ b/arch/x86/kvm/mmu.c
 @@ -2335,6 +2335,19 @@ static void reset_rsvds_bits_mask(struct kvm_vcpu 
 *vcpu, int level)
   }
  }
  
 +void update_rsvd_bits_mask(struct kvm_vcpu *vcpu)
 +{
 + if (!is_paging(vcpu))
 + return;
 + if (is_long_mode(vcpu))
 + reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
 + else if (is_pae(vcpu))
 + reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
 + else
 + reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
 +}
 +EXPORT_SYMBOL_GPL(update_rsvd_bits_mask);
 +
  static int paging64_init_context_common(struct kvm_vcpu *vcpu, int level)
  {
   struct kvm_mmu *context = vcpu-arch.mmu;
 @@ -2400,18 +2413,16 @@ static int init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
   context-gva_to_gpa = nonpaging_gva_to_gpa;
   context-root_level = 0;
   } else if (is_long_mode(vcpu)) {
 - reset_rsvds_bits_mask(vcpu, PT64_ROOT_LEVEL);
   context-gva_to_gpa = paging64_gva_to_gpa;
   context-root_level = PT64_ROOT_LEVEL;
   } else if (is_pae(vcpu)) {
 - reset_rsvds_bits_mask(vcpu, PT32E_ROOT_LEVEL);
   context-gva_to_gpa = paging64_gva_to_gpa;
   context-root_level = PT32E_ROOT_LEVEL;
   } else {
 - reset_rsvds_bits_mask(vcpu, PT32_ROOT_LEVEL);
   context-gva_to_gpa = paging32_gva_to_gpa;
   context-root_level = PT32_ROOT_LEVEL;
   }
 + update_rsvd_bits_mask(vcpu);
  
   return 0;
  }
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index b59fc67..1c76e08 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -416,6 +416,9 @@ out:
  
  static int __kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
  {
 + unsigned long old_cr0 = kvm_read_cr0(vcpu);
 + unsigned long update_bits = X86_CR0_PG | X86_CR0_PE;

If PAE paging would be in use following an execution of MOV to CR0 or
MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, or CR4.PSE; then the PDPTEs
are loaded from the address in CR3.

If the PDPTRS changed, the mmu must be reloaded.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Do not stop VM if emulation failed in userspace.

2010-05-11 Thread Marcelo Tosatti
On Mon, May 10, 2010 at 11:21:34AM +0300, Gleb Natapov wrote:
 Continue vcpu execution in case emulation failure happened while vcpu
 was in userspace. In this case #UD will be injected into the guest
 allowing guest OS to kill offending process and continue.
 
 Signed-off-by: Gleb Natapov g...@redhat.com

Applied to uq/master, thanks.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] virtio: put last seen used index into ring itself

2010-05-11 Thread Michael S. Tsirkin
On Tue, May 11, 2010 at 01:46:08PM -0500, Ryan Harper wrote:
 * Michael S. Tsirkin m...@redhat.com [2010-05-05 16:37]:
  Generally, the Host end of the virtio ring doesn't need to see where
  Guest is up to in consuming the ring.  However, to completely understand
  what's going on from the outside, this information must be exposed.
  For example, host can reduce the number of interrupts by detecting
  that the guest is currently handling previous buffers.
  
  Fortunately, we have room to expand: the ring is always a whole number
  of pages and there's hundreds of bytes of padding after the avail ring
  and the used ring, whatever the number of descriptors (which must be a
  power of 2).
  
  We add a feature bit so the guest can tell the host that it's writing
  out the current value there, if it wants to use that.
  
  This is based on a patch by Rusty Russell, with the main difference
  being that we dedicate a feature bit to guest to tell the host it is
  writing the used index.  This way we don't need to force host to publish
  the last available index until we have a use for it.
  
  Signed-off-by: Rusty Russell ru...@rustcorp.com.au
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
  
  Rusty,
  this is a simplified form of a patch you posted in the past.
  I have a vhost patch that, using this feature, shows external
  to host bandwidth grow from 5 to 7 GB/s, by avoiding
  an interrupt in the window after previous interrupt
  was sent and before interrupts were disabled for the vq.
  With vhost under some external to host loads I see
  this window being hit about 30% sometimes.
  
  I'm finalizing the host bits and plan to send
  the final version for inclusion when all's ready,
  but I'd like to hear comments meanwhile.
  
   drivers/virtio/virtio_ring.c |   28 +---
   include/linux/virtio_ring.h  |   14 +-
   2 files changed, 30 insertions(+), 12 deletions(-)
  
  diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
  index 1ca8890..7729aba 100644
  --- a/drivers/virtio/virtio_ring.c
  +++ b/drivers/virtio/virtio_ring.c
  @@ -89,9 +89,6 @@ struct vring_virtqueue
  /* Number we've added since last sync. */
  unsigned int num_added;
  
  -   /* Last used index we've seen. */
  -   u16 last_used_idx;
  -
  /* How to notify other side. FIXME: commonalize hcalls! */
  void (*notify)(struct virtqueue *vq);
  
  @@ -285,12 +282,13 @@ static void detach_buf(struct vring_virtqueue *vq, 
  unsigned int head)
  
   static inline bool more_used(const struct vring_virtqueue *vq)
   {
  -   return vq-last_used_idx != vq-vring.used-idx;
  +   return *vq-vring.last_used_idx != vq-vring.used-idx;
   }
  
   void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
   {
  struct vring_virtqueue *vq = to_vvq(_vq);
  +   struct vring_used_elem *u;
  void *ret;
  unsigned int i;
  
  @@ -307,12 +305,13 @@ void *virtqueue_get_buf(struct virtqueue *_vq, 
  unsigned int *len)
  return NULL;
  }
  
  -   /* Only get used array entries after they have been exposed by host. */
  -   virtio_rmb();
  -
  -   i = vq-vring.used-ring[vq-last_used_idx%vq-vring.num].id;
  -   *len = vq-vring.used-ring[vq-last_used_idx%vq-vring.num].len;
  +   /* Only get used array entries after they have been exposed by host.
  +* Need mb(), not just rmb() because we write last_used_idx below. */
  +   virtio_mb();
  
  +   u = vq-vring.used-ring[*vq-vring.last_used_idx % vq-vring.num];
  +   i = u-id;
  +   *len = u-len;
  if (unlikely(i = vq-vring.num)) {
  BAD_RING(vq, id %u out of range\n, i);
  return NULL;
  @@ -325,7 +324,8 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned 
  int *len)
  /* detach_buf clears data, so grab it now. */
  ret = vq-data[i];
  detach_buf(vq, i);
  -   vq-last_used_idx++;
  +   (*vq-vring.last_used_idx)++;
  +
  END_USE(vq);
  return ret;
   }
  @@ -431,7 +431,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
  vq-vq.name = name;
  vq-notify = notify;
  vq-broken = false;
  -   vq-last_used_idx = 0;
  +   *vq-vring.last_used_idx = 0;
  vq-num_added = 0;
  list_add_tail(vq-vq.list, vdev-vqs);
   #ifdef DEBUG
  @@ -440,6 +440,10 @@ struct virtqueue *vring_new_virtqueue(unsigned int num,
  
  vq-indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
  
  +   /* We publish used index whether Host offers it or not: if not, it's
  +* junk space anyway.  But calling this acknowledges the feature. */
  +   virtio_has_feature(vdev, VIRTIO_RING_F_PUBLISH_USED);
  +
 
 You use VIRTIO_RING_F_PUBLISH_USED here, but
 VIRTIO_RING_F_PUBLISH_INDICES below... 
 
 
  /* No callback?  Tell other side not to bother us. */
  if (!callback)
  vq-vring.avail-flags |= VRING_AVAIL_F_NO_INTERRUPT;
  @@ -473,6 +477,8 @@ void vring_transport_features(struct virtio_device 
  *vdev)
  

Re: [Qemu-devel] [PATCH RFC] virtio: put last seen used index into ring itself

2010-05-11 Thread Michael S. Tsirkin
On Tue, May 11, 2010 at 10:27:22PM +0300, Avi Kivity wrote:
 On 05/07/2010 06:23 AM, Rusty Russell wrote:
 On Thu, 6 May 2010 07:30:00 pm Avi Kivity wrote:

 On 05/05/2010 11:58 PM, Michael S. Tsirkin wrote:
  
 +  /* We publish the last-seen used index at the end of the available ring.
 +   * It is at the end for backwards compatibility. */
 +  vr-last_used_idx =(vr)-avail-ring[num];
 +  /* Verify that last used index does not spill over the used ring. */
 +  BUG_ON((void *)vr-last_used_idx +
 + sizeof *vr-last_used_idx   (void *)vr-used);
}


 Shouldn't this be on its own cache line?
  
 It's next to the available ring; because that's where the guest publishes
 its data.  That whole page is guest-write, host-read.

 Putting it on a cacheline by itself would be a slight pessimization; the host
 cpu would have to get the last_used_idx cacheline and the avail descriptor
 cacheline every time.  This way, they are sometimes the same cacheline.


 If one peer writes the tail of the available ring, while the other reads  
 last_used_idx, it's a false bounce, no?

 Having things on the same cacheline is only worthwhile if they are  
 accessed at the same time.

Yes, this is what I was trying to say.
avail flags and used index *are* accessed at the same time, so
there could be an advantage to sharing a cache line there.

All this should be kept in mind if we ever do
VIRTIO_RING_F_NEW_LAYOUT.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix 80000001.EDX supported bit filtering

2010-05-11 Thread Marcelo Tosatti
On Tue, May 11, 2010 at 09:41:25AM +0300, Gleb Natapov wrote:
 On AMD some bits from 1.EDX are reported in 8001.EDX. The mask used
 to copy bits from 1.EDX to 8001.EDX is incorrect resulting in
 unsupported features passed into a guest.
 
 Signed-off-by: Gleb Natapov g...@redhat.com
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 76c1adb..6463390 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -111,7 +111,7 @@ uint32_t kvm_arch_get_supported_cpuid(CPUState *env, 
 uint32_t function, int reg)
   * so add missing bits according to the AMD spec:
   */
  cpuid_1_edx = kvm_arch_get_supported_cpuid(env, 1, 
 R_EDX);
 -ret |= cpuid_1_edx  0xdfeff7ff;
 +ret |= cpuid_1_edx  0x183f7ff;
  break;
  }
  break;

Applied to uq/master, thanks.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: update mmu documetation for role.nxe.

2010-05-11 Thread Marcelo Tosatti
On Tue, May 11, 2010 at 02:36:58PM +0800, Gui Jianfeng wrote:
 There's no member cr4_nxe in struct kvm_mmu_page_role, it names nxe now.
 Update mmu document.
 
 Signed-off-by: Gui Jianfeng guijianf...@cn.fujitsu.com
 ---
  Documentation/kvm/mmu.txt |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)

Applied, thanks.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] qemu-kvm problem with DOS/4GW extender and EMM386.EXE

2010-05-11 Thread Avi Kivity

On 05/11/2010 11:56 PM, Andy Walls wrote:

Running an MS-DOS 6.22 image with qemu-kvm on a RedHat Linux OS, I
noticed the guest OS becomes hung and my dmesg gets spammed with

set_cr0: #GP, set PG flag with a clear PE flag

That message appears to be the linux kernel's kvm emulator griping about
Paging Enable bit being enabled while the Protection Enable bit is set
for real mode.  (The Intel manual says this should be a protection
fault).

The program that causes this has the DOS/4GW DOS extender runtime
compiled into it.

I found that when I don't load the EMM386.EXE memory manager, the
problem doesn't occur.

Here's a kvmtrace segment of when things are not working:

   


Please post kvm issues to k...@vger.


0 (+   0)  CR_READ   vcpu = 0x  pid = 0x1997 [ CR# = 0, 
value = 0x 8011 ]
28471049900815 (+4000)  VMENTRY   vcpu = 0x  pid = 
0x1997
28471049903815 (+3000)  VMEXITvcpu = 0x  pid = 
0x1997 [ exitcode = 0x0010, rip = 0x 2a73 ]
0 (+   0)  LMSW  vcpu = 0x  pid = 0x1997 [ value = 
0x8010 ]
28471049933815 (+   3)  VMENTRY   vcpu = 0x  pid = 
0x1997
28471049936815 (+3000)  VMEXITvcpu = 0x  pid = 
0x1997 [ exitcode = 0x007b, rip = 0x 1fd6 ]




To me it appears EMM386.EXE enables paging, and the DOS/4GW DOS extender
tries to manipulate the PE bit in CR0 with LMSW but doesn't succeed.

These programs appear to work fine in VMWare and on real hardware.


Any ideas on how to make EMM386.EXE and the DOS/$GW extender work in
qemu-kvm?
   


Looks like a bug in the implementation of LMSW.  The manual says:

If the PE flag of the source operand (bit 0) is set to 1, the 
instruction causes the
processor to switch to protected mode. While in protected mode, the 
LMSW instruc-
tion cannot be used to clear the PE flag and force a switch back to 
real-address mode.


But kvm doesn't implement that.  Instead, it follows the operation section:


Operation
CR0[0:3] ← SRC[0:3];


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Don't allow lmsw to clear cr0.pe

2010-05-11 Thread Avi Kivity
The current lmsw implementation allows the guest to clear cr0.pe, contrary
to the manual, which breaks EMM386.EXE.

Fix by ORing the old cr0.pe with lmsw's operand.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/x86.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cd8a606..a40170c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -462,7 +462,7 @@ EXPORT_SYMBOL_GPL(kvm_set_cr0);
 
 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
 {
-   kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0ful) | (msw  0x0f));
+   kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw  0x0f));
 }
 EXPORT_SYMBOL_GPL(kvm_lmsw);
 
-- 
1.7.0.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >