[KVM-AUTOTEST][PATCH] Win2003-64.steps: fix a barrier at step 17 that sometimes fails due to popups.

2009-04-19 Thread Uri Lublin
From: Michael Goldish mgold...@redhat.com

Signed-off-by: Michael Goldish mgold...@redhat.com

diff --git a/client/tests/kvm_runtest_2/steps/Win2003-64.steps b/client/tests/kvm_runtest_2/steps/Win2003-64.steps
index 2ce794f..9a4d3e7 100644
--- a/client/tests/kvm_runtest_2/steps/Win2003-64.steps
+++ b/client/tests/kvm_runtest_2/steps/Win2003-64.steps
@@ -124,7 +124,7 @@ key 0xdc
 step 1081.26
 screendump 20080101_17_defe197bc5b379b442c5bbbf9fe649f1.ppm
 # Windows 2003 Start Menu Opened
-barrier_2 29 19 321 429 2c7df6a8da99f7a33e37b94d165129f4 19
+barrier_2 118 73 209 68 7c9eca05f228fe133007bffb22e14a0c 19
 # Sending keys: u
 key u
 # 
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[KVM-AUTOTEST][PATCH] kvm_guest_wizard.py, stepmaker.py: do not explicitly print failure messages.

2009-04-19 Thread Uri Lublin
From: Michael Goldish mgold...@redhat.com

This is now taken care of by kvm_runtest_2.py.

Signed-off-by: Michael Goldish mgold...@redhat.com

diff --git a/client/tests/kvm_runtest_2/kvm_guest_wizard.py b/client/tests/kvm_runtest_2/kvm_guest_wizard.py
index 2f08c78..afe6b02 100644
--- a/client/tests/kvm_runtest_2/kvm_guest_wizard.py
+++ b/client/tests/kvm_runtest_2/kvm_guest_wizard.py
@@ -139,24 +139,16 @@ def barrier_2(vm, words, fail_if_stuck_for, stuck_detection_history, output_dir,
 def run_steps(test, params, env):
     vm = kvm_utils.env_get_vm(env, params.get("main_vm"))
     if not vm:
-        message = "VM object not found in environment"
-        kvm_log.error(message)
-        raise error.TestError, message
+        raise error.TestError, "VM object not found in environment"
     if not vm.is_alive():
-        message = "VM seems to be dead; Guestwizard requires a living VM"
-        kvm_log.error(message)
-        raise error.TestError, message
+        raise error.TestError, "VM seems to be dead; Guestwizard requires a living VM"
 
     steps_filename = params.get("steps")
     if not steps_filename:
-        message = "Steps filename not specified"
-        kvm_log.error(message)
-        raise error.TestError, message
+        raise error.TestError, "Steps filename not specified"
     steps_filename = os.path.join(test.bindir, "steps", steps_filename)
     if not os.path.exists(steps_filename):
-        message = "Steps file not found: %s" % steps_filename
-        kvm_log.error(message)
-        raise error.TestError, message
+        raise error.TestError, "Steps file not found: %s" % steps_filename
 
     fail_if_stuck_for = params.get("fail_if_stuck_for")
     if fail_if_stuck_for:
diff --git a/client/tests/kvm_runtest_2/stepmaker.py b/client/tests/kvm_runtest_2/stepmaker.py
index 476c145..54a1a4a 100644
--- a/client/tests/kvm_runtest_2/stepmaker.py
+++ b/client/tests/kvm_runtest_2/stepmaker.py
@@ -323,24 +323,16 @@ class StepMaker(stepeditor.StepMakerWindow):
 def run_stepmaker(test, params, env):
     vm = kvm_utils.env_get_vm(env, params.get("main_vm"))
     if not vm:
-        message = "VM object not found in environment"
-        kvm_log.error(message)
-        raise error.TestError, message
+        raise error.TestError, "VM object not found in environment"
     if not vm.is_alive():
-        message = "VM seems to be dead; Step Maker requires a living VM"
-        kvm_log.error(message)
-        raise error.TestError, message
+        raise error.TestError, "VM seems to be dead; Step Maker requires a living VM"
 
     steps_filename = params.get("steps")
     if not steps_filename:
-        message = "Steps filename not specified"
-        kvm_log.error(message)
-        raise error.TestError, message
+        raise error.TestError, "Steps filename not specified"
     steps_filename = os.path.join(test.bindir, "steps", steps_filename)
     if os.path.exists(steps_filename):
-        message = "Steps file %s already exists" % steps_filename
-        kvm_log.error(message)
-        raise error.TestError, message
+        raise error.TestError, "Steps file %s already exists" % steps_filename
 
     StepMaker(vm, steps_filename, test.debugdir, params)
     gtk.main()
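The pattern adopted by this patch, raising without logging and leaving reporting to a single caller, can be sketched as follows (a minimal illustration; run_step and run_test are hypothetical stand-ins, not actual framework code):

```python
import logging

def run_step(name):
    # Raise without logging; reporting is left to the caller, so each
    # failure is printed exactly once rather than at every raise site.
    raise RuntimeError("step %s failed" % name)

def run_test():
    try:
        run_step("barrier")
    except RuntimeError as exc:
        logging.error(exc)  # the single place that prints failure messages
        raise
```

Here kvm_runtest_2.py plays the role of run_test: it catches the exception and prints the message, so the raise sites in kvm_guest_wizard.py and stepmaker.py no longer need their own kvm_log.error calls.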


[KVM-AUTOTEST][PATCH] kvm_guest_wizard.py: barrier_2: sleep at the end of each iteration.

2009-04-19 Thread Uri Lublin
From: Michael Goldish mgold...@redhat.com

Sleep at the end of the iteration, after checking the md5sum for the first
time. This enables us to pass the barrier very quickly in some cases.
(Currently we sleep first and check the md5sum later.)

Signed-off-by: Michael Goldish mgold...@redhat.com

diff --git a/client/tests/kvm_runtest_2/kvm_guest_wizard.py b/client/tests/kvm_runtest_2/kvm_guest_wizard.py
index afe6b02..c89e12e 100644
--- a/client/tests/kvm_runtest_2/kvm_guest_wizard.py
+++ b/client/tests/kvm_runtest_2/kvm_guest_wizard.py
@@ -51,8 +51,6 @@ def barrier_2(vm, words, fail_if_stuck_for, stuck_detection_history, output_dir,
 
     failure_message = None
 
-    iteration = 0
-
     # Main loop
     while True:
         # Check for timeouts
@@ -63,13 +61,6 @@ def barrier_2(vm, words, fail_if_stuck_for, stuck_detection_history, output_dir,
             failure_message = "guest is stuck"
             break
 
-        # Sleep for a while
-        if iteration == 0 and sleep_duration == 1.0:
-            time.sleep(0.5)
-        else:
-            time.sleep(sleep_duration)
-        iteration += 1
-
         # Make sure vm is alive
         if not vm.is_alive():
             failure_message = "VM is dead"
@@ -108,6 +99,9 @@ def barrier_2(vm, words, fail_if_stuck_for, stuck_detection_history, output_dir,
         # Limit queue length to stuck_detection_history
         prev_whole_image_md5sums = prev_whole_image_md5sums[:stuck_detection_history]
 
+        # Sleep for a while
+        time.sleep(sleep_duration)
+
     # Failure
     message = "Barrier failed at step %s after %.2f seconds (%s)" % \
               (current_step_num, time.time() - start_time, failure_message)
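The reordered loop can be sketched as follows (a minimal illustration of the check-then-sleep ordering, not the actual barrier_2 code; check_screen is a hypothetical stand-in for the md5sum comparison):

```python
import time

def wait_for_match(check_screen, timeout=10.0, sleep_duration=1.0):
    """Poll check_screen() until it returns True or the timeout expires.

    Checking *before* sleeping lets the loop return immediately when the
    expected screen is already displayed, instead of always paying at
    least one sleep_duration as the old sleep-first ordering did.
    """
    start_time = time.time()
    while time.time() - start_time < timeout:
        if check_screen():          # check first ...
            return True
        time.sleep(sleep_duration)  # ... sleep at the end of the iteration
    return False
```

With check_screen returning True on the first call, the function returns without sleeping at all, which is how the barrier can now be passed very quickly.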


[KVM-AUTOTEST][PATCH] RHEL-3.9 step files: fix the first barriers, which are a little risky.

2009-04-19 Thread Uri Lublin
From: Michael Goldish mgold...@redhat.com

In these step files, the regions picked for the barriers are updated before the
regions that actually indicate readiness for keyboard input (the boot: prompt).
Therefore, when the step files run, keyboard input (Enter key) is given before
the VM is ready, and thus has no effect.

Signed-off-by: Michael Goldish mgold...@redhat.com

diff --git a/client/tests/kvm_runtest_2/steps/RHEL-3.9-i386.steps b/client/tests/kvm_runtest_2/steps/RHEL-3.9-i386.steps
index 0b449d0..7e053f8 100644
--- a/client/tests/kvm_runtest_2/steps/RHEL-3.9-i386.steps
+++ b/client/tests/kvm_runtest_2/steps/RHEL-3.9-i386.steps
@@ -5,14 +5,14 @@
 # 
 step unknown
 screendump 20090404_143150_210be5cd4155aa4fdbd89494106b0f4d.ppm
-barrier_2 176 37 27 270 a3ea4e66598c86efeecc5926e244a8f5 20 optional
+barrier_2 42 171 0 265 1e64135a6da7be7e77f9de86f9eff237 20 optional
 # Sending keys: ret
 key ret
 # 
 step 33.24
 screendump 20080101_01_9e427763ab38cb59e66287f190585eb2.ppm
 # boot options
-barrier_2 272 19 30 26 4559684d52ebb532b4ccae01c190e45f 15 optional
+barrier_2 47 170 0 27 bb2399341a2c56e29213445c375779a6 15 optional
 # Sending keys: ret
 key ret
 # 
diff --git a/client/tests/kvm_runtest_2/steps/RHEL-3.9-x86_64.steps b/client/tests/kvm_runtest_2/steps/RHEL-3.9-x86_64.steps
index 9ef221a..6b3e8bf 100644
--- a/client/tests/kvm_runtest_2/steps/RHEL-3.9-x86_64.steps
+++ b/client/tests/kvm_runtest_2/steps/RHEL-3.9-x86_64.steps
@@ -7,7 +7,7 @@ step 33.24
 screendump 20080101_01_9e427763ab38cb59e66287f190585eb2.ppm
 # boot options
 sleep 5
-barrier_2 272 19 30 26 4559684d52ebb532b4ccae01c190e45f 166 optional
+barrier_2 47 167 0 27 a4964917d318bfb8b9094569e19b4d60 166 optional
 # Sending keys: ret
 key ret
 # 


[KVM-AUTOTEST][PATCH] make_html_report.py: don't use head -n 1 when getting the KVM version.

2009-04-19 Thread Uri Lublin
From: Michael Goldish mgold...@redhat.com

Use Python's splitlines() instead of 'head -n 1', because the latter causes
broken-pipe messages in some configurations.

Signed-off-by: Michael Goldish mgold...@redhat.com

diff --git a/client/tests/kvm_runtest_2/make_html_report.py b/client/tests/kvm_runtest_2/make_html_report.py
index 677da78..002062c 100755
--- a/client/tests/kvm_runtest_2/make_html_report.py
+++ b/client/tests/kvm_runtest_2/make_html_report.py
@@ -1636,12 +1636,14 @@ def get_keyval_value(result_dir, key):
 If no appropriate line is found, return 'Unknown'.
 
     keyval_pattern = os.path.join(result_dir, "kvm_runtest_2.*", "keyval")
-    keyval_line = commands.getoutput("grep -h %s %s | head -n 1" % (key, keyval_pattern))
+    keyval_lines = commands.getoutput(r"grep -h '\b%s\b.*=' %s" % (key, keyval_pattern))
+    if not keyval_lines:
+        return "Unknown"
+    keyval_line = keyval_lines.splitlines()[0]
     if key in keyval_line and "=" in keyval_line:
-        value = keyval_line.split("=")[1].strip()
+        return keyval_line.split("=")[1].strip()
     else:
-        value = "Unknown"
-    return value
+        return "Unknown"
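The idea behind the change can be shown with a small pure-Python sketch (a hypothetical equivalent that parses text already read into memory; the real get_keyval_value shells out to grep as above):

```python
import re

def get_keyval_value(text, key):
    """Return the value of the first 'key = value' line in text,
    or 'Unknown' if no matching line exists.

    Taking the first match via splitlines()[0] replaces the shell
    pipeline "grep ... | head -n 1": grep keeps writing while head
    exits after one line, which is what produced the broken-pipe
    messages the patch description mentions.
    """
    pattern = re.compile(r"\b%s\b.*=" % re.escape(key))
    lines = [l for l in text.splitlines() if pattern.search(l)]
    if not lines:
        return "Unknown"
    return lines[0].split("=", 1)[1].strip()
```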
 
 
 def get_kvm_version(result_dir):


[KVM-AUTOTEST][PATCH] Fedora-8-i386.steps: fix a barrier at step 26 that sometimes fails.

2009-04-19 Thread Uri Lublin
From: Alexey Eromenko aerom...@redhat.com

Signed-off-by: Alexey Eromenko aerom...@redhat.com
Signed-off-by: Michael Goldish mgold...@redhat.com

diff --git a/client/tests/kvm_runtest_2/steps/Fedora-8-i386.steps b/client/tests/kvm_runtest_2/steps/Fedora-8-i386.steps
index b23878a..732d705 100644
--- a/client/tests/kvm_runtest_2/steps/Fedora-8-i386.steps
+++ b/client/tests/kvm_runtest_2/steps/Fedora-8-i386.steps
@@ -190,7 +190,7 @@ key alt-y
 step 1438.85
 screendump 20080101_26_ecf306a3f7ace4cb936f009c80afab21.ppm
 # date and time
-barrier_2 280 292 520 308 5b1d9fd42d7025be66355a94c4c93dbe 21
+barrier_2 221 290 579 310 4502dc2f76ed6aba1a2229c8385a8eae 21
 # Sending keys: alt-f
 key alt-f
 # 


[KVM-AUTOTEST][PATCH] WinXP-32.steps: add an optional barrier to deal with closed start menu.

2009-04-19 Thread Uri Lublin
From: Michael Goldish mgold...@redhat.com

Usually the barrier at step 24 should find an open start menu. Sometimes
the start menu is closed, so an additional step is needed to open it in that
case.

Signed-off-by: Michael Goldish mgold...@redhat.com

diff --git a/client/tests/kvm_runtest_2/steps/WinXP-32.steps b/client/tests/kvm_runtest_2/steps/WinXP-32.steps
index ca3cef8..b0c6e35 100644
--- a/client/tests/kvm_runtest_2/steps/WinXP-32.steps
+++ b/client/tests/kvm_runtest_2/steps/WinXP-32.steps
@@ -148,8 +148,20 @@ key alt-f
 # 
 step 2279.61
 screendump 20080101_24_51acae003d73d0f4a4c58fa0a053b471.ppm
-# Win XP Start Menu (opened)
-barrier_2 104 24 0 576 d14fcd1ec7ffcd0fa3e942b9caa4d4e5 240
+# Win XP desktop
+barrier_2 48 51 391 288 bbac8a522510d7c8d6e515f6a3fbd4c3 240
+# 
+step 2279.61
+screendump 20090416_150641_b72ad5c48ec2dbc9814d569e38cbb4cc.ppm
+# Win XP Start Menu (closed)
+barrier_2 104 41 0 559 a7cc02cecff2cb495f300aefbb99d9ae 5 optional
+# Sending keys: ctrl-esc
+key ctrl-esc
+# 
+step 2279.61
+screendump 20080101_24_51acae003d73d0f4a4c58fa0a053b471.ppm
+# Win XP Start Menu (open)
+barrier_2 189 67 0 533 69a5571c58bdf9e0acee8337883bec43 20
 # Sending keys: u
 key u
 # 


[KVM-AUTOTEST][PATCH] kvm_tests.cfg.sample: fix typo in RHEL.4.7.x86_64 ISO filename.

2009-04-19 Thread Uri Lublin
From: Alexey Eromenko aerom...@redhat.com

Signed-off-by: Alexey Eromenko aerom...@redhat.com
Signed-off-by: Michael Goldish mgold...@redhat.com
Signed-off-by: Uri Lublin u...@redhat.com

diff --git a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
index f621abe..68df834 100644
--- a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
+++ b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
@@ -236,7 +236,7 @@ variants:
 image_name = rhel4-64
 install:
 steps=RHEL-4.7-x86_64.steps
-cdrom=linux/RHEL-4-x86_64-DVD.iso
+cdrom=linux/RHEL-4.7-x86_64-DVD.iso
 md5sum=ea9dae16dd86f7d94092d0e672333292
 md5sum_1m=58fa63eaee68e269f4cb1d2edf479792
 


[KVM-AUTOTEST][PATCH] kvm_tests.cfg.sample: Windows: change migration_test_command to ver && vol.

2009-04-19 Thread Uri Lublin
From: Michael Goldish mgold...@redhat.com

The current migration_test_command is "help", which for some reason returns a
non-zero %errorlevel%. This makes get_command_status_output report the output
as if an error had occurred. "ver && vol" returns 0, and is also probably more
informative.
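The behaviour described can be illustrated with a rough Python analogue of get_command_status_output (the real helper belongs to the kvm_runtest_2 framework and runs the command inside the Windows guest; this sketch uses subprocess with a POSIX shell purely to show the status check):

```python
import subprocess

def command_status_output(cmd):
    """Run cmd through a shell and return (exit status, combined output).

    A test command is treated as failed when its exit status is
    non-zero, which is why a command that exits non-zero even on
    success (like 'help' under cmd.exe) is a poor choice for a
    migration test command.
    """
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr
```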

Signed-off-by: Michael Goldish mgold...@redhat.com

diff --git a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
index 68df834..cc0907b 100644
--- a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
+++ b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
@@ -269,6 +269,8 @@ variants:
 ssh_status_test_command = echo %errorlevel%
 username = Administrator
 password = 123456
+migrate:
+migration_test_command = ver && vol
 
 variants:
 - Win2000:


[KVM-AUTOTEST][PATCH] kvm_tests.cfg.sample: remove -snapshot from boot and reboot tests.

2009-04-19 Thread Uri Lublin
From: Michael Goldish mgold...@redhat.com

It shouldn't be too risky to run these tests without -snapshot, and doing so
will save some time: the tests that follow do not use -snapshot, so when boot
and reboot ran with it, the framework had to shut the VM down and restart it
without -snapshot.

Signed-off-by: Michael Goldish mgold...@redhat.com
Signed-off-by: Uri Lublin u...@redhat.com

diff --git a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
index cc0907b..9798c83 100644
--- a/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
+++ b/client/tests/kvm_runtest_2/kvm_tests.cfg.sample
@@ -42,13 +42,11 @@ variants:
 
 - boot: install setup
 type = boot
-extra_params += " -snapshot"
 kill_vm_on_error = yes
 
 - reboot:   install setup
 type = boot
 reboot = yes
-extra_params += " -snapshot"
 kill_vm_on_error = yes
 
 - migrate:  install setup


[KVM-AUTOTEST][PATCH] kvm_install: make builddir (where kvm is installed) next to srcdir

2009-04-19 Thread Uri Lublin
From: Uri Lublin u...@redhat.com

If the user specifies a directory for the source (srcdir), it makes
sense for the build directory to reside in the same directory.

This is useful for server-dispatched client tests, as without
this patch one would have to reinstall kvm for every job.

Note: this works for the most commonly used source modes, including
git, snapshot and release. It may not work (meaning the build directory
will not be next to srcdir) for e.g. localsrc.

Signed-off-by: Uri Lublin u...@redhat.com

diff --git a/client/tests/kvm_runtest_2/kvm_install.py b/client/tests/kvm_runtest_2/kvm_install.py
index 843a621..6d8fcb2 100755
--- a/client/tests/kvm_runtest_2/kvm_install.py
+++ b/client/tests/kvm_runtest_2/kvm_install.py
@@ -228,7 +228,10 @@ def __load_kvm_modules():
 
 def __install_kvm(test, srcdir):
     # create destination dir
-    kvm_build_dir = os.path.join(test.bindir, "build")
+
+    kvm_build_dir = os.path.join(srcdir, '..', '..', 'build')
+    kvm_build_dir = os.path.abspath(kvm_build_dir)
+
     if not os.path.exists(kvm_build_dir):
         os.mkdir(kvm_build_dir)
 
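The path computation in the patch above can be illustrated with a small sketch (the paths in the example are hypothetical, chosen only to show the layout):

```python
import os

def build_dir_for(srcdir):
    """Place the build directory two levels above srcdir, so it sits
    next to the per-release source tree instead of under test.bindir,
    mirroring the change in __install_kvm above."""
    return os.path.abspath(os.path.join(srcdir, '..', '..', 'build'))
```

For example, a srcdir of /tmp/kvm/src/kvm-85/user yields /tmp/kvm/src/build, so subsequent jobs that reuse the same source directory find the installed build next to it.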


Re: Split kvm source tarballs

2009-04-19 Thread Avi Kivity

Christoph Hellwig wrote:
> Any idea when the split will happen?

I'm working on it, but hit a snag when rewriting upstream qemu commits
to match the official qemu.git:
- my repo and qemu.git used different -k options ($keyword$
  squashing), so I can't match tree objects
- some non-UTF-8 characters got corrupted, so I can't match log messages
  (some are dups anyway)
- I can't match the git-svn-id line since I ripped it off in
  kvm-userspace.git
- I can't match the ordinal position of the commit in the changelog
  since some bogus commits made their way into kvm-userspace.git

I expect I'll work around these issues, but it isn't like I have my
script ready to roll.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] deal with interrupt shadow state for emulated instruction

2009-04-19 Thread Avi Kivity

H. Peter Anvin wrote:
> Avi Kivity wrote:
>> We don't emulate guest user mode.
>>
>> Well, if guest userspace can convince its kernel to give it access to
>> some memory mapped I/O register, I guess it can execute repeated 'mov
>> ss, mmio' and starve the guest kernel.
>
> It doesn't need a MMIO register to do that, even.

Can you explain how?



Re: [PATCH] Add MCE support to KVM

2009-04-19 Thread Avi Kivity

Anthony Liguori wrote:
> Avi Kivity wrote:
>> Then we would need to tell which read-only MSRs are setup writeable
>> and which aren't...
>>
>> I'm okay with an ioctl to setup MCE, but just make sure userspace has
>> all the information to know what the kernel can do rather than the
>> try-and-see-if-it-works approach.  We can publish this information
>> via KVM_CAP things, or via another ioctl (see
>> KVM_GET_SUPPORTED_CPUID2 for an example).
>
> Why not introduce a new exit type for MSR reads/writes that aren't
> handled by the kernel?  You just need a bit on the return that
> indicates whether to GPF because of an invalid MSR access.
>
> KVM_SET_MSRs should be reserved for MSRs that are performance
> sensitive.  Not all of them will be.

Right now everything in the vcpu is emulated in the kernel.  Everything
else is emulated either in the kernel (irqchip) or in userspace.  This
makes things easier to understand, and is more future friendly if more
cpu features become virtualized by hardware.

While these are not compelling reasons, they at least lean the balance
in favour of a kernel implementation.




Re: [PATCH -v2] Add MCE support to KVM

2009-04-19 Thread Avi Kivity

Anthony Liguori wrote:
> Huang Ying wrote:
>> The related MSRs are emulated. MCE capability is exported via
>> extension KVM_CAP_MCE and ioctl KVM_X86_GET_MCE_CAP_SUPPORTED. A new
>> vcpu ioctl command KVM_X86_SETUP_MCE is used to setup MCE emulation
>> such as the mcg_cap. MCE is injected via vcpu ioctl command
>> KVM_X86_SET_MCE. Extended machine-check state (MCG_EXT_P) and CMCI are
>> not simulated.
>
> Maybe I'm missing something, but couldn't this be implemented entirely
> within userspace?  There's nothing VT/SVM specific about this.  If the
> issue is setting these MSRs from userspace via KVM_SET_MSRS isn't
> enough, perhaps we should add userspace MSR handling.

You also need to inject the MCE.



Re: [PATCH] return default values for apic probe functions.

2009-04-19 Thread Avi Kivity

Glauber Costa wrote:
> As KVM cpus run on threads, it is possible that
> we call kvm_load_registers() from a cpu thread while the
> apic has not yet fully initialized. kvm_load_registers() is called
> from ap_main_loop.
>
> This is not a problem when we're starting the whole machine together,
> but it is a problem for hotplug, since we don't have the protection
> of the locks that protect machine initialization. Currently, some executions
> of cpu hotplug on rainy sundays fail with a segfault.
>
> Moving apic initialization to before kvm_init_vcpu proved fruitful,
> as there are some dependencies involved. (kvm irqchip would fail to
> initialize).

I presume you mean unfruitful (or perhaps a nasty kind of fruit).

> This patch provides default values to be used for tpr and apic_base,
> that will be returned when the apic is not yet properly initialized.
> It is aimed at kvm, where the problem exists, but it could equally be
> used for qemu too, if there is agreement.

Seems like a hack... can you try not to make the vcpu visible until it
is completely initialized?

(And what is the problem exactly - someone accessing the registers from
a different thread? That shouldn't happen.)




Re: [PATCH 05/15] Coalesce userspace/kernel irqchip interrupt injection logic.

2009-04-19 Thread Avi Kivity

Jan Kiszka wrote:
>> -
>> -	if (!npt_enabled && event_injection)
>> -		kvm_mmu_unprotect_page_virt(&svm->vcpu, fault_address);
>> +	else {
>> +		if (svm->vcpu.arch.interrupt.pending ||
>> +		    svm->vcpu.arch.exception.pending)
>> +			kvm_mmu_unprotect_page_virt(&svm->vcpu, fault_address);
>> +	}
>
> Without understanding yet why kvm_mmu_unprotect_page_virt is required
> here,

That is needed in case interrupt injection failed because the process of
injecting the interrupt (usually pushing data on the stack) writes to a
write-protected page table.

I guess we need nmi_pending here as well.



Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Avi Kivity

Jan Kiszka wrote:
> First, this must return 1 (or set an exit reason, but there is no reason
> to escape to user space here). And second, I think a corner case is not
> handled the same way as on real iron: If there is already the next NMI
> waiting, we will inject it before iret, not after its execution as it
> should be.

Yes, good catch.

> No easy solution for this yet. Maybe emulating iret, but there is no
> implementation, specifically for protected mode.

That will be a disaster, IRET is horribly complicated.

> Maybe setting a
> breakpoint. Or maybe enforcing a single step exception.

Single step looks good, except that it may conflict with a guest
debugging itself (guest debugging an NMI handler?!).

> Nothing trivial
> in this list. On the other hand, this may only be a slight imprecision
> of the virtualization. Need to think about it.

It may cause a stack overflow if we have a huge stream of NMIs (if an
NMI triggers an NMI in the handler).  Of course that's a totally
unrealistic scenario.




Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Jan Kiszka
Avi Kivity wrote:
> Jan Kiszka wrote:
>> First, this must return 1 (or set an exit reason, but there is no reason
>> to escape to user space here). And second, I think a corner case is not
>> handled the same way as on real iron: If there is already the next NMI
>> waiting, we will inject it before iret, not after its execution as it
>> should be.
>
> Yes, good catch.
>
>> No easy solution for this yet. Maybe emulating iret, but there is no
>> implementation, specifically for protected mode.
>
> That will be a disaster, IRET is horribly complicated.

Yeah, I know...

>> Maybe setting a
>> breakpoint. Or maybe enforcing a single step exception.
>
> Single step looks good, except that it may conflict with a guest
> debugging itself (guest debugging an NMI handler?!).

But that should be solvable without too much effort.

BTW, guest single-stepping in and out of interrupt handlers is not
properly working anyway. We only set TF in current eflags but do not
bother about the CPU state that will get loaded next. Given some rainy
days (or a paying customer) I think I'll look into this once again. Same
goes for suppressing IRQ injection while single-stepping, just as QEMU
does, which may even come earlier as someone already asked for it.

>> Nothing trivial
>> in this list. On the other hand, this may only be a slight imprecision
>> of the virtualization. Need to think about it.
>
> It may cause a stack overflow if we have a huge stream of NMIs (if an
> NMI triggers an NMI in the handler).  Of course that's a totally
> unrealistic scenario.

Good point. But as it is a corner case, I think we can fly without a
complete solution first, fixing it in a second step.

Jan





Re: [PATCH] Add shared memory PCI device that shares a memory object betweens VMs

2009-04-19 Thread Avi Kivity

Cameron Macdonell wrote:
> Hi Avi and Anthony,
>
> Sorry for the top-reply, but we haven't discussed this aspect here
> before.
>
> I've been thinking about how to implement interrupts.  As far as I can
> tell, unix domain sockets in Qemu/KVM are used point-to-point with one
> VM being the server by specifying "server" along with the "unix:"
> option.  This works simply for two VMs, but I'm unsure how this can
> extend to multiple VMs.  How would a server VM know how many clients
> to wait for?  How can messages then be multicast or broadcast?  Is a
> separate interrupt server necessary?

I don't think unix provides a reliable multicast RPC.  So yes, an
interrupt server seems necessary.

You could expand its role and make it a shared memory PCI card server,
and have it also be responsible for providing the backing file using an
SCM_RIGHTS fd.  That would reduce setup headaches for users (setting up
a file for which all VMs have permissions).
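Passing an open backing-file descriptor over a unix domain socket with SCM_RIGHTS can be sketched as follows (a minimal illustration of the mechanism only, not qemu code; it uses Python's socket.send_fds/recv_fds helpers, available since Python 3.9):

```python
import os
import socket

def send_backing_fd(sock, fd):
    """Send an open file descriptor to the peer via SCM_RIGHTS.

    send_fds requires at least one byte of regular data alongside the
    ancillary fd payload; "shm" here is an arbitrary tag.
    """
    socket.send_fds(sock, [b"shm"], [fd])

def recv_backing_fd(sock):
    """Receive one file descriptor from the peer; returns (message, fd).

    The received fd is a new descriptor in this process referring to the
    same open file, so every client can mmap the same backing object.
    """
    msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    return msg, fds[0]
```

An interrupt/shared-memory server built this way would accept each client VM's connection and hand it the same backing fd, so no shared file path or common permissions need to be configured up front.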




Re: Luvalley-2 has been released: running KVM below any operating system

2009-04-19 Thread Avi Kivity
Xiaodong Yi wrote:
> Hi,
>
> I've tested the guest Linux using UnixBench 5.1.2. The platform is:
>   * Intel's Core Duo CPU with 2 cores, 2GB RAM
>   * CentOS 5.2 as the dom0 Linux, i.e., the host Linux for KVM
>   * CentOS 5.2 as the guest Linux, i.e., the Linux running on the
> virtual machine provided by Qemu
>
> The first set of results is for Luvalley, and the second one is for
> KVM. As a result, Luvalley's guest Linux is 20% ~ 30% faster than
> KVM's guest! It is very surprising to me. I had thought Luvalley's guest
> should have the same performance as KVM's.

Yes, it is surprising.

> Double-Precision Whetstone    12287.7 MWIPS (10.0 s, 2 samples)
> Double-Precision Whetstone     2166.3 MWIPS (10.2 s, 2 samples)

That's by far the biggest difference. Can you confirm it isn't a typo?

If not, then it looks like we have a bug in floating point handling. I
don't think this benchmark uses sse.



Re: compiling git -85rc

2009-04-19 Thread Avi Kivity

Michael Tokarev wrote:
> For quite some time the compilation of userspace is
> broken in my environment.  I'm not sure if it's new
> or it was this way before -- at least kvm-84 builds
> fine.
>
> The problem happens when the kernel running is 64-bit
> and the userspace is 32-bit.  In the source tree, the
> qemu/ directory correctly configures itself to use the
> -m32 compiler flag (the default anyway), but other
> directories -- libkvm/ and user/ -- use x86_64 for
> $(ARCH) and hence -m64 and /lib64.  And sure thing,
> the link stage does not work because -lkvm is not
> compatible (32 vs 64 bits).
>
> I had to change
>   include config-$(ARCH).mak
> into
>   include config-i386.mak
> in libkvm/Makefile and in user/Makefile to build it.
>
> Maybe, just maybe, it's sufficient to do a
>   make ARCH=i386
> step instead - I'll try that too.

That's a bug, I'll fix it up.



Re: Luvalley-2 has been released: running KVM below any operating system

2009-04-19 Thread Jan Kiszka
Avi Kivity wrote:
> Xiaodong Yi wrote:
>> Hi,
>>
>> I've tested the guest Linux using UnixBench 5.1.2. The platform is:
>>   * Intel's Core Duo CPU with 2 cores, 2GB RAM
>>   * CentOS 5.2 as the dom0 Linux, i.e., the host Linux for KVM
>>   * CentOS 5.2 as the guest Linux, i.e., the Linux running on the
>> virtual machine provided by Qemu
>>
>> The first set of results is for Luvalley, and the second one is for
>> KVM. As a result, Luvalley's guest Linux is 20% ~ 30% faster than
>> KVM's guest! It is very surprising to me. I had thought Luvalley's guest
>> should have the same performance as KVM's.
>
> Yes, it is surprising.
>
>> Double-Precision Whetstone    12287.7 MWIPS (10.0 s, 2 samples)
>> Double-Precision Whetstone     2166.3 MWIPS (10.2 s, 2 samples)
>
> That's by far the biggest difference. Can you confirm it isn't a typo?
>
> If not, then it looks like we have a bug in floating point handling. I
> don't think this benchmark uses sse.

Even the native Linux numbers are not that high, rather comparable to
KVM. I suspect Luvalley is fooling the benchmark here...

Jan






Re: Luvalley-2 has been released: running KVM below any operating system

2009-04-19 Thread Avi Kivity
Jan Kiszka wrote:
 Avi Kivity wrote:
   
 Xiaodong Yi wrote:
 
 Hi,

 I've tested the guest Linux using UnixBench 5.1.2. The platform is:
   * Intel's Core Due CPU with 2 cores, 2GB RAM
   * CentOS 5.2 as the dom0 Linux, i.e., the host Linux for KVM
   * CentOS 5.2 as the guest Linux, i.e., the Linux running on the
 virtual machine provided by Qemu

 The first set of results is for Luvalley, and the second one is for
 KVM. As the result, Luvalley's guest Linux is 20% ~ 30% faster than
  KVM's guest! It is very surprising to me. I had thought Luvalley's guest
  should have the same performance as KVM's.

   
   
 Yes, it is surprising.

 
  Double-Precision Whetstone    12287.7 MWIPS (10.0 s, 2 samples)
  Double-Precision Whetstone     2166.3 MWIPS (10.2 s, 2 samples)
   
 That's by far the biggest difference. Can you confirm it isn't a typo?

 If not, then it looks like we have a bug in floating point handling. I
 don't think this benchmark uses sse.

 

 Even the native Linux numbers are not that high, rather comparable to
 KVM. I suspect Luvalley is fooling the benchmark here...
   

Most likely timing. Luvalley's guest time probably runs too slowly, so the
guests achieve more loops per unit of guest time.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: port kvm to arm

2009-04-19 Thread Avi Kivity

(copying kvm@vger.kernel.org, where most kvm development happens)

sen wang wrote:

I have some idea about the ARM KVM.

I have registered a project, named arm-kvm, to port KVM to ARM.
I'll kick off the ARM KVM porting next week.

This is my plan:

step 1   Make a guest linux run on the Hypervisor-linux.

From the Hypervisor-linux's viewpoint, the guest is a thread of it,
which is like the x86 KVM. But in this step the guest Linux has no IO;
it is based on a rootfs. The physical memory is allocated to the guest
OS at creation time. The guest Linux still runs in privileged mode,
handling user-mode memory exceptions, SWI system calls, and so on, but
it will not touch the hardware devices. Physical interrupts should be
controlled by Hypervisor-linux, so Hypervisor-linux can switch the
guest off when an interrupt happens. Hypervisor-linux will also provide
some virtual system devices, like a system timer and an interrupt
controller, for the guest.


Because of the lack of hardware support, the guest Linux should know
when to exit; for example at idle time or on a kernel panic, the guest
should voluntarily exit the VM, which is like a thread switch.


So there is another job: modifying Hypervisor-linux to support ARM, and
modifying qemu for ARM KVM.


You can hook read[bwl]() and similar functions to use hypercalls,
which will allow you to use emulated devices in qemu, even without full
MMU virtualization.




step 2   Make guest IO work

The guest should have IO functions; of course, that is virtio. We
should make virtio work on ARM KVM. ARM has no hardware virtualization
extension like Intel VT, but in the guest virtio driver, when IO needs
to start, the guest should voluntarily exit the VM, so it is like the
IO exit of Intel VT.


If you make emulated devices work, then you can defer this to the last 
step.  virtio is only an optimization.





step 3   add memory virtualization for guest

Just make the guest run in user mode, porting the x86 KVM memory
virtualization code to ARM KVM, and so on. But I think we should keep
the work mode of step 1 as an option for the user; sometimes, letting
the guest handle physical memory directly is not a bad thing.



step 4   use the trustzone

From ARMv6 on, there is a hardware extension called TrustZone. We can
make use of it: let Hypervisor-linux run in the monitor state and the
guest run in the non-secure state, but with the guest OS running as
normal (the Linux kernel in privileged mode, user programs in user
mode). So the architecture is more elegant.


Does the monitor state have its own userspace?  If not, how are you 
going to run qemu?




step 5   more guest OSes
The work mode of step 1 has an advantage: we don't need to modify the
guest much, which makes porting WinCE or Symbian possible; as you know,
we don't have much source code for those OSes.


You are welcome to join the arm-kvm team. Thanks.


Good luck :)

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Gleb Natapov
On Fri, Apr 17, 2009 at 03:12:57PM +, Dmitry Eremin-Solenikov wrote:
 
 This patch does expose some problems on real HW. The first NMI completes w/o
 problems. However If I try to boot the kernel w/ nmi_watchdog=1 or to trigger
 two NMIs from the monitor, kernel is stuck somewhere.
 
Can you try this patch instead of patch 13:


diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8b6f6e9..057a612 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -766,6 +766,7 @@ enum {
 #define HF_GIF_MASK		(1 << 0)
 #define HF_HIF_MASK		(1 << 1)
 #define HF_VINTR_MASK		(1 << 2)
+#define HF_NMI_MASK		(1 << 3)
 
 /*
  * Hardware virtualization extension instructions may fault if a
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c605477..0a2b3f1 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1834,6 +1834,13 @@ static int cpuid_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 	return 1;
 }
 
+static int iret_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
+{
+	svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
+	svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
+	return 1;
+}
+
 static int invlpg_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 {
 	if (emulate_instruction(&svm->vcpu, kvm_run, 0, 0, 0) != EMULATE_DONE)
@@ -2111,6 +2118,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm,
 	[SVM_EXIT_VINTR]			= interrupt_window_interception,
 	/* [SVM_EXIT_CR0_SEL_WRITE]		= emulate_on_interception, */
 	[SVM_EXIT_CPUID]			= cpuid_interception,
+	[SVM_EXIT_IRET]				= iret_interception,
 	[SVM_EXIT_INVD]				= emulate_on_interception,
 	[SVM_EXIT_HLT]				= halt_interception,
 	[SVM_EXIT_INVLPG]			= invlpg_interception,
@@ -2218,6 +2226,12 @@ static void pre_svm_run(struct vcpu_svm *svm)
 		new_asid(svm, svm_data);
 }
 
+static void svm_inject_nmi(struct vcpu_svm *svm)
+{
+	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
+	svm->vcpu.arch.hflags |= HF_NMI_MASK;
+	svm->vmcb->control.intercept |= (1UL << INTERCEPT_IRET);
+}
 
 static inline void svm_inject_irq(struct vcpu_svm *svm, int irq)
 {
@@ -2269,6 +2283,14 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu)
 		vmcb->control.intercept_cr_write |= INTERCEPT_CR8_MASK;
 }
 
+static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *vmcb = svm->vmcb;
+	return !(vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK) &&
+		!(svm->vcpu.arch.hflags & HF_NMI_MASK);
+}
+
 static int svm_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -2284,16 +2306,35 @@ static void enable_irq_window(struct kvm_vcpu *vcpu)
 		svm_inject_irq(to_svm(vcpu), 0x0);
 }
 
+static void enable_nmi_window(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	if (svm->vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK)
+		enable_irq_window(vcpu);
+}
+
 static void svm_intr_inject(struct kvm_vcpu *vcpu)
 {
 	/* try to reinject previous events if any */
+	if (vcpu->arch.nmi_injected) {
+		svm_inject_nmi(to_svm(vcpu));
+		return;
+	}
+
 	if (vcpu->arch.interrupt.pending) {
 		svm_queue_irq(to_svm(vcpu), vcpu->arch.interrupt.nr);
 		return;
 	}
 
 	/* try to inject new event if pending */
-	if (kvm_cpu_has_interrupt(vcpu)) {
+	if (vcpu->arch.nmi_pending) {
+		if (svm_nmi_allowed(vcpu)) {
+			vcpu->arch.nmi_pending = false;
+			vcpu->arch.nmi_injected = true;
+			svm_inject_nmi(to_svm(vcpu));
+		}
+	} else if (kvm_cpu_has_interrupt(vcpu)) {
 		if (svm_interrupt_allowed(vcpu)) {
 			kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu));
 			svm_queue_irq(to_svm(vcpu), vcpu->arch.interrupt.nr);
@@ -2312,7 +2353,10 @@ static void svm_intr_assist(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
 	svm_intr_inject(vcpu);
 
-	if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
+	/* enable NMI/IRQ window open exits if needed */
+	if (vcpu->arch.nmi_pending)
+		enable_nmi_window(vcpu);
+	else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
 		enable_irq_window(vcpu);
 
 out:
--
Gleb.


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Gleb Natapov
On Fri, Apr 17, 2009 at 09:55:45PM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
  Signed-off-by: Gleb Natapov g...@redhat.com
  ---
   arch/x86/include/asm/kvm_host.h |1 +
   arch/x86/kvm/svm.c  |   49 
  +-
   2 files changed, 48 insertions(+), 2 deletions(-)
  
  diff --git a/arch/x86/include/asm/kvm_host.h 
  b/arch/x86/include/asm/kvm_host.h
  index 8b6f6e9..057a612 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -766,6 +766,7 @@ enum {
   #define HF_GIF_MASK    (1 << 0)
   #define HF_HIF_MASK    (1 << 1)
   #define HF_VINTR_MASK  (1 << 2)
  +#define HF_NMI_MASK    (1 << 3)
   
   /*
* Hardware virtualization extension instructions may fault if a
  diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
  index c605477..cd60fd7 100644
  --- a/arch/x86/kvm/svm.c
  +++ b/arch/x86/kvm/svm.c
  @@ -1834,6 +1834,13 @@ static int cpuid_interception(struct vcpu_svm *svm, 
  struct kvm_run *kvm_run)
  return 1;
   }
   
  +static int iret_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
  +{
  +   svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
  +   svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
  +   return 0;
  +}
 
 First, this must return 1 (or set an exit reason, but there is no reason
 to escape to user space here). And second, I think a corner case is not
 handled the same way as on real iron: If there is already the next NMI
 waiting, we will inject it before iret, not after its execution as it
 should be.
 
 No easy solution for this yet. Maybe emulating iret, but there is no
 implementation, specifically for protected mode. Maybe setting a
 breakpoint. Or maybe enforcing a single step exception. Nothing trivial
 in this list. On the other hand, this may only be a slight imprecision
 of the virtualization. Need to think about it.
 
What about this:
Instead of clearing HF_NMI_MASK in iret_interception() we can set
another flag (HF_IRET) and on guest entry clear HF_NMI_MASK (and
HF_IRET) if HF_IRET is set, but do that after checking for NMI
injection. The pending NMI will be injected on the next entry.
Also not how real HW works, but it may be better than the current situation.


--
Gleb.


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Avi Kivity

Gleb Natapov wrote:

On Fri, Apr 17, 2009 at 09:55:45PM +0200, Jan Kiszka wrote:
  

Gleb Natapov wrote:


Signed-off-by: Gleb Natapov g...@redhat.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/svm.c  |   49 +-
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8b6f6e9..057a612 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -766,6 +766,7 @@ enum {
 #define HF_GIF_MASK    (1 << 0)
 #define HF_HIF_MASK    (1 << 1)
 #define HF_VINTR_MASK  (1 << 2)
+#define HF_NMI_MASK    (1 << 3)
 
 /*

  * Hardware virtualization extension instructions may fault if a
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c605477..cd60fd7 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1834,6 +1834,13 @@ static int cpuid_interception(struct vcpu_svm *svm, 
struct kvm_run *kvm_run)
return 1;
 }
 
+static int iret_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)

+{
+   svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
+   svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
+   return 0;
+}
  

First, this must return 1 (or set an exit reason, but there is no reason
to escape to user space here). And second, I think a corner case is not
handled the same way as on real iron: If there is already the next NMI
waiting, we will inject it before iret, not after its execution as it
should be.

No easy solution for this yet. Maybe emulating iret, but there is no
implementation, specifically for protected mode. Maybe setting a
breakpoint. Or maybe enforcing a single step exception. Nothing trivial
in this list. On the other hand, this may only be a slight imprecision
of the virtualization. Need to think about it.



What about this:
Instead of clearing HF_NMI_MASK in iret_interception() we can set
another flag (HF_IRET) and on guest entry clear HF_NMI_MASK (and
HF_IRET) if HF_IRET is set, but do that after checking for NMI
injection. The pending NMI will be injected on the next entry.
Also not how real HW works, but may be better then current situation.
  


There may not be a next entry if the guest is in a tight loop.  Given 
NMIs are used for watchdogs, that's not good.


btw, injection before IRET is executed is broken if interrupt stack 
tables are used, since the injection will reset rsp instead of nesting.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Gleb Natapov
On Sun, Apr 19, 2009 at 04:21:29PM +0300, Avi Kivity wrote:
 Gleb Natapov wrote:
 On Fri, Apr 17, 2009 at 09:55:45PM +0200, Jan Kiszka wrote:
   
 Gleb Natapov wrote:
 
 Signed-off-by: Gleb Natapov g...@redhat.com
 ---
  arch/x86/include/asm/kvm_host.h |1 +
  arch/x86/kvm/svm.c  |   49 
 +-
  2 files changed, 48 insertions(+), 2 deletions(-)

 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index 8b6f6e9..057a612 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -766,6 +766,7 @@ enum {
  #define HF_GIF_MASK   (1 << 0)
  #define HF_HIF_MASK   (1 << 1)
  #define HF_VINTR_MASK (1 << 2)
 +#define HF_NMI_MASK   (1 << 3)
   /*
   * Hardware virtualization extension instructions may fault if a
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index c605477..cd60fd7 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1834,6 +1834,13 @@ static int cpuid_interception(struct vcpu_svm *svm, 
 struct kvm_run *kvm_run)
return 1;
  }
  +static int iret_interception(struct vcpu_svm *svm, struct kvm_run 
 *kvm_run)
 +{
 +  svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
 +  svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
 +  return 0;
 +}
   
 First, this must return 1 (or set an exit reason, but there is no reason
 to escape to user space here). And second, I think a corner case is not
 handled the same way as on real iron: If there is already the next NMI
 waiting, we will inject it before iret, not after its execution as it
 should be.

 No easy solution for this yet. Maybe emulating iret, but there is no
 implementation, specifically for protected mode. Maybe setting a
 breakpoint. Or maybe enforcing a single step exception. Nothing trivial
 in this list. On the other hand, this may only be a slight imprecision
 of the virtualization. Need to think about it.

 
 What about this:
 Instead of clearing HF_NMI_MASK in iret_interception() we can set
 another flag (HF_IRET) and on guest entry clear HF_NMI_MASK (and
 HF_IRET) if HF_IRET is set, but do that after checking for NMI
 injection. The pending NMI will be injected on the next entry.
 Also not how real HW works, but may be better then current situation.
   

 There may not be a next entry if the guest is in a tight loop.  Given  
 NMIs are used for watchdogs, that's not good.

Don't we exit the guest after its kvm time slice ends?

--
Gleb.


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Jan Kiszka
Gleb Natapov wrote:
 On Fri, Apr 17, 2009 at 09:55:45PM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 Signed-off-by: Gleb Natapov g...@redhat.com
 ---
  arch/x86/include/asm/kvm_host.h |1 +
  arch/x86/kvm/svm.c  |   49 
 +-
  2 files changed, 48 insertions(+), 2 deletions(-)

 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index 8b6f6e9..057a612 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -766,6 +766,7 @@ enum {
  #define HF_GIF_MASK    (1 << 0)
  #define HF_HIF_MASK    (1 << 1)
  #define HF_VINTR_MASK  (1 << 2)
 +#define HF_NMI_MASK    (1 << 3)
  
  /*
   * Hardware virtualization extension instructions may fault if a
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index c605477..cd60fd7 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1834,6 +1834,13 @@ static int cpuid_interception(struct vcpu_svm *svm, 
 struct kvm_run *kvm_run)
 return 1;
  }
  
 +static int iret_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
 +{
  +   svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
  +   svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
 +   return 0;
 +}
 First, this must return 1 (or set an exit reason, but there is no reason
 to escape to user space here). And second, I think a corner case is not
 handled the same way as on real iron: If there is already the next NMI
 waiting, we will inject it before iret, not after its execution as it
 should be.

 No easy solution for this yet. Maybe emulating iret, but there is no
 implementation, specifically for protected mode. Maybe setting a
 breakpoint. Or maybe enforcing a single step exception. Nothing trivial
 in this list. On the other hand, this may only be a slight imprecision
 of the virtualization. Need to think about it.

 What about this:
 Instead of clearing HF_NMI_MASK in iret_interception() we can set
 another flag (HF_IRET) and on guest entry clear HF_NMI_MASK (and
 HF_IRET) if HF_IRET is set, but do that after checking for NMI
 injection. The pending NMI will be injected on the next entry.
 Also not how real HW works, but may be better then current situation.

It's OK as a first step towards correct NMI emulation. Additionally, you
could enable the IRQ window interception in case there is an NMI pending.
The resulting behavior should then be much like the VNMI mask emulation for
vmx.

The next step should then be setting TF in the eflags stored on the
guest's stack before returning *if* there is already the next NMI
pending. But I wonder how much additional effort this will actually mean
(compared to the band-aid work)... :)

Jan





Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Avi Kivity

Gleb Natapov wrote:
There may not be a next entry if the guest is in a tight loop.  Given  
NMIs are used for watchdogs, that's not good.




We don't exit a guest after kvm time slice ends?
  


There are no time slices any more.  If there's only one thread for a
vcpu, you might have no exits at all with a tickless kernel.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Gleb Natapov
On Sun, Apr 19, 2009 at 03:27:22PM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
  On Fri, Apr 17, 2009 at 09:55:45PM +0200, Jan Kiszka wrote:
  Gleb Natapov wrote:
  Signed-off-by: Gleb Natapov g...@redhat.com
  ---
   arch/x86/include/asm/kvm_host.h |1 +
   arch/x86/kvm/svm.c  |   49 
  +-
   2 files changed, 48 insertions(+), 2 deletions(-)
 
  diff --git a/arch/x86/include/asm/kvm_host.h 
  b/arch/x86/include/asm/kvm_host.h
  index 8b6f6e9..057a612 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -766,6 +766,7 @@ enum {
   #define HF_GIF_MASK   (1 << 0)
   #define HF_HIF_MASK   (1 << 1)
   #define HF_VINTR_MASK (1 << 2)
  +#define HF_NMI_MASK   (1 << 3)
   
   /*
* Hardware virtualization extension instructions may fault if a
  diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
  index c605477..cd60fd7 100644
  --- a/arch/x86/kvm/svm.c
  +++ b/arch/x86/kvm/svm.c
  @@ -1834,6 +1834,13 @@ static int cpuid_interception(struct vcpu_svm 
  *svm, struct kvm_run *kvm_run)
return 1;
   }
   
  +static int iret_interception(struct vcpu_svm *svm, struct kvm_run 
  *kvm_run)
  +{
   + svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
   + svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
  + return 0;
  +}
  First, this must return 1 (or set an exit reason, but there is no reason
  to escape to user space here). And second, I think a corner case is not
  handled the same way as on real iron: If there is already the next NMI
  waiting, we will inject it before iret, not after its execution as it
  should be.
 
  No easy solution for this yet. Maybe emulating iret, but there is no
  implementation, specifically for protected mode. Maybe setting a
  breakpoint. Or maybe enforcing a single step exception. Nothing trivial
  in this list. On the other hand, this may only be a slight imprecision
  of the virtualization. Need to think about it.
 
  What about this:
  Instead of clearing HF_NMI_MASK in iret_interception() we can set
  another flag (HF_IRET) and on guest entry clear HF_NMI_MASK (and
  HF_IRET) if HF_IRET is set, but do that after checking for NMI
  injection. The pending NMI will be injected on the next entry.
  Also not how real HW works, but may be better then current situation.
 
 It's OK as a first step towards correct NMI emulation. Additionally, you
 could enable the IRQ window interception in case the is an NMI pending.
 The resulting behavior should then much like the VNMI mask emulation for
 vmx.
 
Yeah, but the question is: if the IRQ window is already open, will the exit
happen before or after the IRET?

--
Gleb.


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Gleb Natapov
On Sun, Apr 19, 2009 at 04:28:09PM +0300, Avi Kivity wrote:
 Gleb Natapov wrote:
 There may not be a next entry if the guest is in a tight loop.  Given 
  NMIs are used for watchdogs, that's not good.

 
 We don't exit a guest after kvm time slice ends?
   

 There are no time slices any more.  If there's only once thread for a  
 vcpu, you might have no exits at all with a tickless kernel.

Well, KVM may request some kind of event (timer) that will cause an exit
on the VCPU. This looks hacky though.

--
Gleb.


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Jan Kiszka
Gleb Natapov wrote:
 On Sun, Apr 19, 2009 at 03:27:22PM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Fri, Apr 17, 2009 at 09:55:45PM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
 Signed-off-by: Gleb Natapov g...@redhat.com
 ---
  arch/x86/include/asm/kvm_host.h |1 +
  arch/x86/kvm/svm.c  |   49 
 +-
  2 files changed, 48 insertions(+), 2 deletions(-)

 diff --git a/arch/x86/include/asm/kvm_host.h 
 b/arch/x86/include/asm/kvm_host.h
 index 8b6f6e9..057a612 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -766,6 +766,7 @@ enum {
   #define HF_GIF_MASK   (1 << 0)
   #define HF_HIF_MASK   (1 << 1)
   #define HF_VINTR_MASK (1 << 2)
  +#define HF_NMI_MASK   (1 << 3)
  
  /*
   * Hardware virtualization extension instructions may fault if a
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index c605477..cd60fd7 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1834,6 +1834,13 @@ static int cpuid_interception(struct vcpu_svm 
 *svm, struct kvm_run *kvm_run)
   return 1;
  }
  
 +static int iret_interception(struct vcpu_svm *svm, struct kvm_run 
 *kvm_run)
 +{
  + svm->vmcb->control.intercept &= ~(1UL << INTERCEPT_IRET);
  + svm->vcpu.arch.hflags &= ~HF_NMI_MASK;
 + return 0;
 +}
 First, this must return 1 (or set an exit reason, but there is no reason
 to escape to user space here). And second, I think a corner case is not
 handled the same way as on real iron: If there is already the next NMI
 waiting, we will inject it before iret, not after its execution as it
 should be.

 No easy solution for this yet. Maybe emulating iret, but there is no
 implementation, specifically for protected mode. Maybe setting a
 breakpoint. Or maybe enforcing a single step exception. Nothing trivial
 in this list. On the other hand, this may only be a slight imprecision
 of the virtualization. Need to think about it.

 What about this:
 Instead of clearing HF_NMI_MASK in iret_interception() we can set
 another flag (HF_IRET) and on guest entry clear HF_NMI_MASK (and
 HF_IRET) if HF_IRET is set, but do that after checking for NMI
 injection. The pending NMI will be injected on the next entry.
 Also not how real HW works, but may be better then current situation.
 It's OK as a first step towards correct NMI emulation. Additionally, you
 could enable the IRQ window interception in case the is an NMI pending.
 The resulting behavior should then much like the VNMI mask emulation for
 vmx.

 Yeah, but the question is if IRQ windows is already opened will exit
 happens before or after IRET.
 

Hey, this is a band-aid; it won't heal all cases. ;)

Jan





Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Avi Kivity

Gleb Natapov wrote:

It's OK as a first step towards correct NMI emulation. Additionally, you
could enable the IRQ window interception in case the is an NMI pending.
The resulting behavior should then much like the VNMI mask emulation for
vmx.



Yeah, but the question is if IRQ windows is already opened will exit
happens before or after IRET.
  


You mean if the NMI handler enabled interrupts?

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Gleb Natapov
On Sun, Apr 19, 2009 at 04:40:51PM +0300, Avi Kivity wrote:
 Gleb Natapov wrote:
 It's OK as a first step towards correct NMI emulation. Additionally, you
 could enable the IRQ window interception in case the is an NMI pending.
 The resulting behavior should then much like the VNMI mask emulation for
 vmx.

 
 Yeah, but the question is if IRQ windows is already opened will exit
 happens before or after IRET.
   

 You mean if the NMI handler enabled interrupts?

Yes.

--
Gleb.


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Avi Kivity

Gleb Natapov wrote:

On Sun, Apr 19, 2009 at 04:40:51PM +0300, Avi Kivity wrote:
  

Gleb Natapov wrote:


It's OK as a first step towards correct NMI emulation. Additionally, you
could enable the IRQ window interception in case the is an NMI pending.
The resulting behavior should then much like the VNMI mask emulation for
vmx.




Yeah, but the question is if IRQ windows is already opened will exit
happens before or after IRET.
  
  

You mean if the NMI handler enabled interrupts?



Yes.

  


Then the guest deserves whatever it gets...

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Jan Kiszka
Gleb Natapov wrote:
 On Sun, Apr 19, 2009 at 04:28:09PM +0300, Avi Kivity wrote:
 Gleb Natapov wrote:
 There may not be a next entry if the guest is in a tight loop.  Given 
  NMIs are used for watchdogs, that's not good.

 
 We don't exit a guest after kvm time slice ends?
   
 There are no time slices any more.  If there's only once thread for a  
 vcpu, you might have no exits at all with a tickless kernel.

 Well, KVM may request some kind of even (timer) that will cause exit to
 VCPU. This looks hacky though.

We have already spent too many electrons and brain cycles on possibly much
simpler workarounds. I think injecting and handling a single-step, even
while guest debugging is going on, or the guest itself single-steps, or
both, will not be more complicated, but more correct.

Jan





Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Gleb Natapov
On Sun, Apr 19, 2009 at 04:43:12PM +0300, Avi Kivity wrote:
 Gleb Natapov wrote:
 On Sun, Apr 19, 2009 at 04:40:51PM +0300, Avi Kivity wrote:
   
 Gleb Natapov wrote:
 
 It's OK as a first step towards correct NMI emulation. Additionally, you
 could enable the IRQ window interception in case the is an NMI pending.
 The resulting behavior should then much like the VNMI mask emulation for
 vmx.

 
 Yeah, but the question is if IRQ windows is already opened will exit
 happens before or after IRET.
 
 You mean if the NMI handler enabled interrupts?

 
 Yes.

   

 Then the guest deserves whatever it gets...

I suspect Windows may do this, since it uses NMI for task switching.

--
Gleb.


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Avi Kivity

Jan Kiszka wrote:

We already spent to much electrons and brain cycles on possibly much
simpler workarounds. I think injecting and handling a single-step, even
while there is guest debugging going on or the guest itself single-steps
or both, will not be more complicated - but more correct.
  


I agree.  I'm still worried about interactions between the IRET single 
stepping code and other things which use the debug registers.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Gleb Natapov
On Sun, Apr 19, 2009 at 04:49:18PM +0300, Avi Kivity wrote:
 Jan Kiszka wrote:
 We already spent to much electrons and brain cycles on possibly much
 simpler workarounds. I think injecting and handling a single-step, even
 while there is guest debugging going on or the guest itself single-steps
 or both, will not be more complicated - but more correct.
   

 I agree.  I'm still worried about interactions between the IRET single  
 stepping code and other things which use the debug registers.

I don't disagree either. Just throwing out other ideas :)

--
Gleb.


Re: [PATCH 05/15] Coalesce userspace/kernel irqchip interrupt injection logic.

2009-04-19 Thread Gleb Natapov
On Sat, Apr 18, 2009 at 07:28:20PM +0300, Gleb Natapov wrote:
  
  So this patch may either expose a bug in the svm emulation of qemu or
  come with a subtle regression that only triggers due to qemu's timing.
  This needs to be understood. Gleb, any progress on reproducing it on
  your side?
  
 I reproduced it and I am debugging it. In my case the boot hangs on a
 sti;hlt sequence. Instrumentation thus far shows that at this point
 interrupts are no longer injected because the ppr value is too big. Need
 to see why, but tpr handling is not complete in qemu svm. Maybe this is
 the reason. Will know more tomorrow.
 
I've looked into this and my conclusion is that if you are not going to
develop SVM in qemu, don't use it just yet. QEMU doesn't handle exceptions
during event injection properly. Actually, it does not handle them at all,
so if a #PF happens during interrupt injection, the interrupt is lost and,
what's worse, never acked. If the interrupt was high priority, it blocks
all other interrupts.

The patch below adds exception handling during event injection. The valid
flag is now removed from EVENTINJ only after successful injection, and
EVENTINJ is copied to EXITINTINFO on exit. Can you give it a try?

And this is not the only problem I saw, but the one that caused my guest
to hang.

diff --git a/target-i386/op_helper.c b/target-i386/op_helper.c
index be09263..9264afd 100644
--- a/target-i386/op_helper.c
+++ b/target-i386/op_helper.c
@@ -1249,6 +1249,10 @@ void do_interrupt(int intno, int is_int, int error_code,
     } else {
         do_interrupt_real(intno, is_int, error_code, next_eip);
     }
+    if (env->hflags & HF_SVMI_MASK) {
+        uint32_t event_inj = ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj));
+        stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj), event_inj & ~SVM_EVTINJ_VALID);
+    }
 }
 
 /* This should come from sysemu.h - if we could include it here... */
@@ -4994,7 +4998,6 @@ void helper_vmrun(int aflag, int next_eip_addend)
         uint8_t vector = event_inj & SVM_EVTINJ_VEC_MASK;
         uint16_t valid_err = event_inj & SVM_EVTINJ_VALID_ERR;
         uint32_t event_inj_err = ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj_err));
-        stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj), event_inj & ~SVM_EVTINJ_VALID);
 
         qemu_log_mask(CPU_LOG_TB_IN_ASM, "Injecting(%#hx): ", valid_err);
         /* FIXME: need to implement valid_err */
@@ -5331,6 +5334,8 @@ void helper_vmexit(uint32_t exit_code, uint64_t exit_info_1)
     cpu_x86_set_cpl(env, 0);
     stq_phys(env->vm_vmcb + offsetof(struct vmcb, control.exit_code), exit_code);
     stq_phys(env->vm_vmcb + offsetof(struct vmcb, control.exit_info_1), exit_info_1);
+    stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.exit_int_info), ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj)));
+    stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.exit_int_info_err), ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj_err)));
 
     env->hflags2 &= ~HF2_GIF_MASK;
     /* FIXME: Resets the current ASID register to zero (host ASID). */
--
Gleb.


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Jan Kiszka
Avi Kivity wrote:
 Jan Kiszka wrote:
 We already spent too many electrons and brain cycles on possibly much
 simpler workarounds. I think injecting and handling a single-step, even
 while there is guest debugging going on or the guest itself single-steps
 or both, will not be more complicated - but more correct.
   
 
 I agree.  I'm still worried about interactions between the IRET single
 stepping code and other things which use the debug registers.

The interaction is inside KVM, so under our control. We may simply save
the related states, hook into the guest-debugging parts that evaluate
single step exceptions, and handle this case with highest prio,
transparently to users of lower prio. In theory - you never really know
until you try to implement it...

Jan





Re: [PATCH 05/15] Coalesce userspace/kernel irqchip interrupt injection logic.

2009-04-19 Thread Jan Kiszka
Gleb Natapov wrote:
 On Sat, Apr 18, 2009 at 07:28:20PM +0300, Gleb Natapov wrote:
 So this patch may either expose a bug in the svm emulation of qemu or
 come with a subtle regression that only triggers due to qemu's timing.
 This needs to be understood. Gleb, any progress on reproducing it on
 your side?

 I reproduced it and I am debugging it. In my case the boot hangs on a
 sti;hlt sequence. Instrumentation thus far shows that at this point
 interrupts are no longer injected because the ppr value is too big. Need
 to see why, but tpr handling is not complete in qemu svm. Maybe this is
 the reason. Will know more tomorrow.

 I've looked into this and my conclusion is that if you are not going to
 develop SVM in qemu, don't use it just yet.

We had a resource conflict regarding SVM capable AMD boxes and a tight
schedule, so we decided to pick qemu as initial development platform.
Turns out that this was a bit too optimistic. :)

 QEMU doesn't handle exceptions
 during event injection properly. Actually, it does not handle them at all,
 so if a #PF happens during interrupt injection, the interrupt is lost and,
 what's worse, never acked. If the interrupt was high priority, it blocks
 all other interrupts.
 
 The patch below adds exception handling during event injection. The valid
 flag is now removed from EVENTINJ only after successful injection, and
 EVENTINJ is copied to EXITINTINFO on exit. Can you give it a try?

Ah, great, thanks. Will test.

 
 And this is not the only problem I saw, but the one that caused my guest
 to hang.

OK, good to know. I added Alex (though he's said to be on vacation ATM)
and qemu to CC. Maybe you can quickly list the other issues you've
stumbled over, for the records and for motivating contributors...

 
 diff --git a/target-i386/op_helper.c b/target-i386/op_helper.c
 index be09263..9264afd 100644
 --- a/target-i386/op_helper.c
 +++ b/target-i386/op_helper.c
 @@ -1249,6 +1249,10 @@ void do_interrupt(int intno, int is_int, int error_code,
      } else {
          do_interrupt_real(intno, is_int, error_code, next_eip);
      }
 +    if (env->hflags & HF_SVMI_MASK) {
 +        uint32_t event_inj = ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj));
 +        stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj), event_inj & ~SVM_EVTINJ_VALID);
 +    }
  }
  
  /* This should come from sysemu.h - if we could include it here... */
 @@ -4994,7 +4998,6 @@ void helper_vmrun(int aflag, int next_eip_addend)
      uint8_t vector = event_inj & SVM_EVTINJ_VEC_MASK;
      uint16_t valid_err = event_inj & SVM_EVTINJ_VALID_ERR;
      uint32_t event_inj_err = ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj_err));
 -    stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj), event_inj & ~SVM_EVTINJ_VALID);
  
      qemu_log_mask(CPU_LOG_TB_IN_ASM, "Injecting(%#hx): ", valid_err);
      /* FIXME: need to implement valid_err */
 @@ -5331,6 +5334,8 @@ void helper_vmexit(uint32_t exit_code, uint64_t exit_info_1)
      cpu_x86_set_cpl(env, 0);
      stq_phys(env->vm_vmcb + offsetof(struct vmcb, control.exit_code), exit_code);
      stq_phys(env->vm_vmcb + offsetof(struct vmcb, control.exit_info_1), exit_info_1);
 +    stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.exit_int_info), ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj)));
 +    stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.exit_int_info_err), ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj_err)));
  
      env->hflags2 &= ~HF2_GIF_MASK;
      /* FIXME: Resets the current ASID register to zero (host ASID). */
 --
   Gleb.

Thanks again,
Jan





Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Julian Stecklina
Gleb Natapov g...@redhat.com writes:

 On Sun, Apr 19, 2009 at 04:43:12PM +0300, Avi Kivity wrote:
 Gleb Natapov wrote:
 On Sun, Apr 19, 2009 at 04:40:51PM +0300, Avi Kivity wrote:
   
 Gleb Natapov wrote:
 
 It's OK as a first step towards correct NMI emulation. Additionally, you
 could enable the IRQ window interception in case there is an NMI pending.
 The resulting behavior should then be much like the VNMI mask emulation for
 vmx.

 
 Yeah, but the question is: if the IRQ window is already open, will the
 exit happen before or after IRET?
 
 You mean if the NMI handler enabled interrupts?

 
 Yes.

   

 Then the guest deserves whatever it gets...

 I suspect windows may do this since it uses NMI for task switching.

Could you elaborate on that? How/why does it use NMIs for task
switching?

Regards,
-- 
Julian Stecklina

The day Microsoft makes something that doesn't suck is probably the day
they start making vacuum cleaners - Ernst Jan Plugge



Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Gleb Natapov
On Sun, Apr 19, 2009 at 04:07:52PM +0200, Julian Stecklina wrote:
 Gleb Natapov g...@redhat.com writes:
 
  On Sun, Apr 19, 2009 at 04:43:12PM +0300, Avi Kivity wrote:
  Gleb Natapov wrote:
  On Sun, Apr 19, 2009 at 04:40:51PM +0300, Avi Kivity wrote:

  Gleb Natapov wrote:
  
  It's OK as a first step towards correct NMI emulation. Additionally, you
  could enable the IRQ window interception in case there is an NMI pending.
  The resulting behavior should then be much like the VNMI mask emulation
  for vmx.
 
  
  Yeah, but the question is: if the IRQ window is already open, will the
  exit happen before or after IRET?
  
  You mean if the NMI handler enabled interrupts?
 
  
  Yes.
 

 
  Then the guest deserves whatever it gets...
 
  I suspect windows may do this since it uses NMI for task switching.
 
 Could you elaborate on that? How/why does it use NMIs for task
 switching?
 
During WHQL testing (or if you just enable the verifier on Windows 2003)
Windows changes hibernate to not power down the PC but to resume
immediately. During this immediate resume it sends an NMI to the non-boot
CPUs while the IDT entry for NMI is configured as a task gate. I am not
sure it actually executes IRET after that.

--
Gleb.


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Avi Kivity

Gleb Natapov wrote:

Could you elaborate on that? How/why does it use NMIs for task
switching?



During WHQL testing (or if you just enable the verifier on Windows 2003)
Windows changes hibernate to not power down the PC but to resume
immediately. During this immediate resume it sends an NMI to the non-boot
CPUs while the IDT entry for NMI is configured as a task gate. I am not
sure it actually executes IRET after that.
  


If it doesn't call IRET, it will never see another NMI.

But of course it will execute IRET, as part of normal execution.  You 
can't do anything without it.
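The NMI masking rule Avi is appealing to (delivering an NMI blocks further NMI delivery until the handler's IRET, with at most one more NMI latched while blocked) can be modeled as a tiny state machine. This is an illustrative sketch, not KVM or hardware code; all names are made up for the example:

```c
#include <assert.h>
#include <stdbool.h>

struct nmi_state {
    bool blocked;   /* an NMI was delivered and IRET has not yet run */
    bool pending;   /* one NMI latched while blocked */
};

/* An NMI arrives: deliver it, or latch it if delivery is blocked.
 * Returns true if the NMI is delivered now. */
static bool nmi_arrives(struct nmi_state *s)
{
    if (s->blocked) {
        s->pending = true;      /* latched, delivered after IRET */
        return false;
    }
    s->blocked = true;          /* delivered; block until IRET */
    return true;
}

/* IRET executes: lift the block, deliver a latched NMI if any.
 * Returns true if a latched NMI fires. */
static bool iret(struct nmi_state *s)
{
    s->blocked = false;
    if (s->pending) {
        s->pending = false;
        s->blocked = true;      /* the latched NMI re-enters the block */
        return true;
    }
    return false;
}
```

This also shows why a guest that never executes IRET "will never see another NMI": nothing else clears `blocked`.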


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[ kvm-Bugs-2768533 ] SCSI/VirtIO: VM unable to reboot from non-IDE controller

2009-04-19 Thread SourceForge.net
Bugs item #2768533, was opened at 2009-04-16 16:01
Message generated for change (Comment added) made by technologov
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2768533&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: SCSI/VirtIO: VM unable to reboot from non-IDE controller

Initial Comment:
All Red Hat-based systems (RHEL, Fedora) have failed during automated tests on 
SCSI disks on KVM.
It turned out to be a problem of Qemu/KVM, where after a software reboot (i.e. 
one initiated by the guest OS), the system is unable to boot from the SCSI controller.

To reboot successfully you need either a soft reboot with IDE, or a hard reboot
(Qemu cold boot) with SCSI.

The command sent to Qemu/KVM:
/usr/local/bin/qemu-system-x86_64 -m 512 -monitor tcp:localhost:4602,server,nowait -cdrom /isos/linux/Fedora-8-i386-DVD.iso -drive file=/vm/fedora8-32.qcow2,if=scsi,boot=on -name fedora8-32

Host: RHEL 5/x64, KVM-85rc6. (tried both Intel and AMD)
Guest: RHEL 5, Fedora 8, Fedora 9 (tried both 32 and 64-bit).

-Alexey, 16.4.2009.

--

Comment By: Technologov (technologov)
Date: 2009-04-19 17:25

Message:
Bug was also opened in Ubuntu Launchpad.
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/363743

Two reasons: 
1. The Qemu project does not have a bugzilla - so Ubuntu Launchpad became the
main bug tracker for Qemu
2. This bug affects Ubuntu-8.10

--

Comment By: Technologov (technologov)
Date: 2009-04-16 17:12

Message:
This problem also exists for VirtIO block device.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2768533&group_id=180599


Re: [PATCH 05/15] Coalesce userspace/kernel irqchip interrupt injection logic.

2009-04-19 Thread Gleb Natapov
On Sun, Apr 19, 2009 at 04:05:21PM +0200, Jan Kiszka wrote:
  And this is not the only problem I saw, but the one that caused my guest
  to hang.
 
 OK, good to know. I added Alex (though he's said to be on vacation ATM)
 and qemu to CC. Maybe you can quickly list the other issues you've
 stumbled over, for the records and for motivating contributors...
 
Another one that I remember (because it was my first suspect) is
interrupt shadow handling. HF_INHIBIT_IRQ_MASK is cleared on exit when
the shadow bit is set in int_state, and is not set on entry if the
hypervisor set the shadow bit by itself. I am not sure how real HW
actually handles this, but the patch below demonstrates how I think it
does it :) And of course, comments like /* FIXME: this should respect
TPR */ don't look promising.


diff --git a/target-i386/op_helper.c b/target-i386/op_helper.c
index be09263..691a7f0 100644
--- a/target-i386/op_helper.c
+++ b/target-i386/op_helper.c
@@ -4971,6 +4997,15 @@ void helper_vmrun(int aflag, int next_eip_addend)
     env->dr[6] = ldq_phys(env->vm_vmcb + offsetof(struct vmcb, save.dr6));
     cpu_x86_set_cpl(env, ldub_phys(env->vm_vmcb + offsetof(struct vmcb, save.cpl)));
 
+    {
+        uint32_t aaa;
+        aaa = ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.int_state));
+        if (aaa & SVM_INTERRUPT_SHADOW_MASK)
+            helper_set_inhibit_irq();
+        else
+            helper_reset_inhibit_irq();
+    }
+
     /* FIXME: guest state consistency checks */
 
     switch(ldub_phys(env->vm_vmcb + offsetof(struct vmcb, control.tlb_ctl))) {
@@ -5243,7 +5280,6 @@ void helper_vmexit(uint32_t exit_code, uint64_t exit_info_1)
 
     if(env->hflags & HF_INHIBIT_IRQ_MASK) {
         stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.int_state), SVM_INTERRUPT_SHADOW_MASK);
-        env->hflags &= ~HF_INHIBIT_IRQ_MASK;
     } else {
         stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.int_state), 0);
     }
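Abstractly, the interrupt shadow this patch propagates between VMCB and emulator state works as follows (an illustrative sketch only, assuming an STI-style instruction inhibits interrupt delivery until the next instruction retires; not QEMU code):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for HF_INHIBIT_IRQ_MASK: set while in the interrupt shadow. */
static bool inhibit_irq;

static void exec_sti(void)
{
    inhibit_irq = true;         /* enter the one-instruction shadow */
}

static void retire_next_insn(void)
{
    inhibit_irq = false;        /* shadow expires after one instruction */
}

/* Can an interrupt be delivered right now? */
static bool irq_window_open(void)
{
    return !inhibit_irq;
}
```

The bug being fixed is exactly about this flag surviving a #VMEXIT/VMRUN round trip: the shadow must be written to `int_state` on exit and restored from it on entry, not silently dropped.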
--
Gleb.


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Gleb Natapov
On Sun, Apr 19, 2009 at 05:20:37PM +0300, Avi Kivity wrote:
 Gleb Natapov wrote:
 Could you elaborate on that? How/why does it use NMIs for task
 switching?

 
 During WHQL testing (or if you just enable the verifier on Windows 2003)
 Windows changes hibernate to not power down the PC but to resume
 immediately. During this immediate resume it sends an NMI to the non-boot
 CPUs while the IDT entry for NMI is configured as a task gate. I am not
 sure it actually executes IRET after that.
   

 If it doesn't call IRET, it will never see another NMI.

 But of course it will execute IRET, as part of normal execution.  You  
 can't do anything without it.

Boot CPU can send INIT after task switch (and I think this is what
happens).

--
Gleb.


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Avi Kivity

Gleb Natapov wrote:

On Sun, Apr 19, 2009 at 05:20:37PM +0300, Avi Kivity wrote:
  

Gleb Natapov wrote:


Could you elaborate on that? How/why does it use NMIs for task
switching?




During WHQL testing (or if you just enable the verifier on Windows 2003)
Windows changes hibernate to not power down the PC but to resume
immediately. During this immediate resume it sends an NMI to the non-boot
CPUs while the IDT entry for NMI is configured as a task gate. I am not
sure it actually executes IRET after that.
  
  

If it doesn't call IRET, it will never see another NMI.

But of course it will execute IRET, as part of normal execution.  You  
can't do anything without it.




Boot CPU can send INIT after task switch (and I think this is what
happens).
  


But eventually it will execute IRET.

(We need to fix INIT to clear the NMI blocking flag, not that it matters 
so much)


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH 05/15] Coalesce userspace/kernel irqchip interrupt injection logic.

2009-04-19 Thread Jan Kiszka
Jan Kiszka wrote:
 Gleb Natapov wrote:
 On Sat, Apr 18, 2009 at 07:28:20PM +0300, Gleb Natapov wrote:
 So this patch may either expose a bug in the svm emulation of qemu or
 come with a subtle regression that only triggers due to qemu's timing.
 This needs to be understood. Gleb, any progress on reproducing it on
 your side?

 I reproduced it and I am debugging it. In my case the boot hangs on a
 sti;hlt sequence. Instrumentation thus far shows that at this point
 interrupts are no longer injected because the ppr value is too big. Need
 to see why, but tpr handling is not complete in qemu svm. Maybe this is
 the reason. Will know more tomorrow.

 I've looked into this and my conclusion is that if you are not going to
 develop SVM in qemu, don't use it just yet.
 
 We had a resource conflict regarding SVM capable AMD boxes and a tight
 schedule, so we decided to pick qemu as initial development platform.
 Turns out that this was a bit too optimistic. :)
 
 QEMU doesn't handle exceptions
 during event injection properly. Actually, it does not handle them at all,
 so if a #PF happens during interrupt injection, the interrupt is lost and,
 what's worse, never acked. If the interrupt was high priority, it blocks
 all other interrupts.

 The patch below adds exception handling during event injection. The valid
 flag is now removed from EVENTINJ only after successful injection, and
 EVENTINJ is copied to EXITINTINFO on exit. Can you give it a try?
 
 Ah, great, thanks. Will test.

I can confirm: patch below makes my kvm-in-qemu test case happy, too.
Maybe you want to post this with changelog and signed-off to qemu-devel.

Jan

 diff --git a/target-i386/op_helper.c b/target-i386/op_helper.c
 index be09263..9264afd 100644
 --- a/target-i386/op_helper.c
 +++ b/target-i386/op_helper.c
 @@ -1249,6 +1249,10 @@ void do_interrupt(int intno, int is_int, int error_code,
      } else {
          do_interrupt_real(intno, is_int, error_code, next_eip);
      }
 +    if (env->hflags & HF_SVMI_MASK) {
 +        uint32_t event_inj = ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj));
 +        stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj), event_inj & ~SVM_EVTINJ_VALID);
 +    }
  }
  
  /* This should come from sysemu.h - if we could include it here... */
 @@ -4994,7 +4998,6 @@ void helper_vmrun(int aflag, int next_eip_addend)
      uint8_t vector = event_inj & SVM_EVTINJ_VEC_MASK;
      uint16_t valid_err = event_inj & SVM_EVTINJ_VALID_ERR;
      uint32_t event_inj_err = ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj_err));
 -    stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj), event_inj & ~SVM_EVTINJ_VALID);
  
      qemu_log_mask(CPU_LOG_TB_IN_ASM, "Injecting(%#hx): ", valid_err);
      /* FIXME: need to implement valid_err */
 @@ -5331,6 +5334,8 @@ void helper_vmexit(uint32_t exit_code, uint64_t exit_info_1)
      cpu_x86_set_cpl(env, 0);
      stq_phys(env->vm_vmcb + offsetof(struct vmcb, control.exit_code), exit_code);
      stq_phys(env->vm_vmcb + offsetof(struct vmcb, control.exit_info_1), exit_info_1);
 +    stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.exit_int_info), ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj)));
 +    stl_phys(env->vm_vmcb + offsetof(struct vmcb, control.exit_int_info_err), ldl_phys(env->vm_vmcb + offsetof(struct vmcb, control.event_inj_err)));
  
      env->hflags2 &= ~HF2_GIF_MASK;
      /* FIXME: Resets the current ASID register to zero (host ASID). */
 --
  Gleb.






Re: [PATCH 05/15] Coalesce userspace/kernel irqchip interrupt injection logic.

2009-04-19 Thread Gleb Natapov
On Sun, Apr 19, 2009 at 05:06:29PM +0200, Jan Kiszka wrote:
 Jan Kiszka wrote:
  Gleb Natapov wrote:
  On Sat, Apr 18, 2009 at 07:28:20PM +0300, Gleb Natapov wrote:
  So this patch may either expose a bug in the svm emulation of qemu or
  come with a subtle regression that only triggers due to qemu's timing.
  This needs to be understood. Gleb, any progress on reproducing it on
  your side?
 
  I reproduced it and I am debugging it. In my case the boot hangs on a
  sti;hlt sequence. Instrumentation thus far shows that at this point
  interrupts are no longer injected because the ppr value is too big. Need
  to see why, but tpr handling is not complete in qemu svm. Maybe this is
  the reason. Will know more tomorrow.
 
  I've looked into this and my conclusion is that if you are not going to
  develop SVM in qemu, don't use it just yet.
  
  We had a resource conflict regarding SVM capable AMD boxes and a tight
  schedule, so we decided to pick qemu as initial development platform.
  Turns out that this was a bit too optimistic. :)
  
  QEMU doesn't handle exceptions
  during event injection properly. Actually, it does not handle them at all,
  so if a #PF happens during interrupt injection, the interrupt is lost and,
  what's worse, never acked. If the interrupt was high priority, it blocks
  all other interrupts.
 
  The patch below adds exception handling during event injection. The valid
  flag is now removed from EVENTINJ only after successful injection, and
  EVENTINJ is copied to EXITINTINFO on exit. Can you give it a try?
  
  Ah, great, thanks. Will test.
 
 I can confirm: patch below makes my kvm-in-qemu test case happy, too.
 Maybe you want to post this with changelog and signed-off to qemu-devel.
 
Yeah, I'll reformat and submit.

--
Gleb.


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Gleb Natapov
On Sun, Apr 19, 2009 at 05:57:56PM +0300, Avi Kivity wrote:
 Gleb Natapov wrote:
 On Sun, Apr 19, 2009 at 05:20:37PM +0300, Avi Kivity wrote:
   
 Gleb Natapov wrote:
 
 Could you elaborate on that? How/why does it use NMIs for task
 switching?

 
 During WHQL testing (or if you just enable the verifier on Windows 2003)
 Windows changes hibernate to not power down the PC but to resume
 immediately. During this immediate resume it sends an NMI to the non-boot
 CPUs while the IDT entry for NMI is configured as a task gate. I am not
 sure it actually executes IRET after that.
 
 If it doesn't call IRET, it will never see another NMI.

 But of course it will execute IRET, as part of normal execution.  You 
  can't do anything without it.

 
 Boot CPU can send INIT after task switch (and I think this is what
 happens).
   

 But eventually it will execute IRET.

Yes :) But I strongly suspect that the NMI window will be opened after
SIPI even before the first IRET.

 (We need to fix INIT to clear the NMI blocking flag, not that it matters  
 so much)
If we reset the intercept mask on INIT but don't clear the NMI blocking
flag, we will never receive NMIs on the vcpu.

--
Gleb.


Re: [PATCH] KVM: Defer remote tlb flushes on invlpg (v4)

2009-04-19 Thread Marcelo Tosatti
On Sat, Apr 18, 2009 at 05:34:27PM +0200, Andrea Arcangeli wrote:
 On Sun, Apr 12, 2009 at 07:31:58PM -0300, Marcelo Tosatti wrote:
  mmu_sync_children needs to find any _reachable_ sptes that are unsync,
  read the guest pagetable, and sync the sptes. Before it can inspect the
  guest pagetable, it needs to write protect it, with rmap_write_protect.
 
 So far so good.
 
  In theory it wouldn't be necessary to call
  kvm_flush_remote_tlbs_cond(protected) here, but only
  kvm_flush_remote_tlbs(), since the kvm->remote_tlbs_dirty information
  is not pertinent to mmu_sync_children.
 
 Hmm I'm not sure I fully understand how it is not pertinent.
 
 When we have a cr3 write in a remote vcpu, mmu_sync_children runs and
 it resyncs all sptes reachable for that remote vcpu context. But to
 resync the sptes, it also has to get rid of any old writable tlb entry
 for its remote vcpu where cr3 write is running. Checking only sptes to
 find writable ones, updating the sptes that are mapped by the writable
 sptes, and marking them wrprotected, isn't enough, as old spte
 contents may still be cached in the tlb if remote_tlbs_dirty is true!

 Think if the invlpg'd spte that got nuked by rmap_remove in the invlpg
 handler running in the current vcpu has a writable tlb entry active in
 the vcpu that later does cr3 overwrite. mmu_sync_children running in
 the remote vcpu will find no writable spte in the rmap chain
 representing that spte (because that spte that is still cached in the
 remote tlb, has already been zapped by the current vcpu) but it is
 still cached and writable in the remote vcpu TLB cache, when cr3
 overwrite runs.

Right, so you have to cope with the fact that invlpg can skip a TLB
flush. OK.
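The deferral scheme under discussion reduces to a small sketch. The function names follow the thread's usage, but the bodies are illustrative stand-ins, not the actual KVM implementation: invlpg only marks remote TLBs dirty, and a single later flush covers any number of deferred ones.

```c
#include <assert.h>
#include <stdbool.h>

static bool remote_tlbs_dirty;      /* a deferred remote flush is outstanding */
static int  remote_flush_count;     /* counts actual (expensive) remote flushes */

/* invlpg path: zap the spte, but defer the remote TLB flush. */
static void invlpg_zap_spte(void)
{
    /* ... zap the shadow pte here ... */
    remote_tlbs_dirty = true;
}

/* Stand-in for the IPI-based flush of all remote vcpu TLBs. */
static void flush_remote_tlbs(void)
{
    remote_flush_count++;
    remote_tlbs_dirty = false;      /* the deferred work is now done */
}

/* Flush if the caller needs one anyway, or if a deferred flush is pending. */
static void flush_remote_tlbs_cond(bool condition)
{
    if (condition || remote_tlbs_dirty)
        flush_remote_tlbs();
}
```

The payoff is batching: any number of invlpg-time deferrals collapse into one remote flush at the next point that actually requires TLB consistency (e.g. before inspecting a write-protected guest page table).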

  But this is done here to clear remote_tlbs_dirty (after a
  kvm_flush_remote_tlbs remote_tlbs_dirty is clean), ie: as an
  optimization.
 
 The whole point of remote_tlbs_dirty is to defer any smp tlb flush to
 the latest possible time, so how can it be an optimization to run a
 kvm_flush_remote_tlbs earlier than necessary? The only way this can be
 an optimization, is to never run kvm_flush_remote_tlbs unless
 absolutely necessary, and to leave remote_tlbs_dirty true instead of
 calling kvm_flush_remote_tlbs. Calling kvm_flush_remote_tlbs_cond
 instead of kvm_flush_remote_tlbs cannot ever be an optimization.

Right.

 @@ -465,7 +464,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
  			rmap_remove(vcpu->kvm, sptep);
  			if (is_large_pte(*sptep))
  				--vcpu->kvm->stat.lpages;
 -			need_flush = 1;
 +			vcpu->kvm->remote_tlbs_dirty = true;
  		}
  		set_shadow_pte(sptep, shadow_trap_nonpresent_pte);
  		break;
 @@ -475,8 +474,6 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
  			break;
  	}
 -	if (need_flush)
 -		kvm_flush_remote_tlbs(vcpu->kvm);
  	spin_unlock(&vcpu->kvm->mmu_lock);
   
   AFAIK, to be compliant with lowlevel archs (without ASN it doesn't
   matter I think, as vmx always flushes on exit), we have to flush the
   local tlb here, with set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests). I
   don't see why it's missing. Or am I wrong?
  
  Caller does it:
  
  void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
  {
  	vcpu->arch.mmu.invlpg(vcpu, gva);
  	kvm_mmu_flush_tlb(vcpu);
  	++vcpu->stat.invlpg;
  }
 
 Ah great! Avi also mentioned it I recall, but I didn't figure out it
 was after FNAME(invlpg) returns. But isn't it always more efficient to
 set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests) instead, like I'm doing?

Sure that works too.

 See my version of kvm_flush_local_tlb, that's a bit different from
 kvm_mmu_flush_tlb, and I made the old no-mmu-notifier kernels safe too,
 which I think is worth it. If you like to keep my version of
 kvm_flush_local_tlb, then I've simply to remove the kvm_mmu_flush_tlb
 from kvm_mmu_invlpg and move it inside the arch.mmu.invlpg handler so
 each shadow implementation does it its way. Comments?

I'm fine with your kvm_flush_local_tlb. Just one minor nit:

+	/* get new asid before returning to guest mode */
+	if (!test_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
+		set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests);

What's the test_bit for?

 If you disagree, and you want to run kvm_mmu_flush_tlb in
 kvm_mmu_invlpg instead of set_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests),
 then I can expand the kvm_flush_local_tlb with the smp_wmb() in the
 FNAME(invlpg) code. Like:
 
 if (need_flush) {
smp_wmb();
remote_tlbs_dirty = true;
 }
 spin_unlock(mmu_lock);
 
 Then the local tlb flush will run when we return from FNAME(invlpg)
 and remote_tlbs_dirty is set _after_ 

Re: [PATCH -v2] Add MCE support to KVM

2009-04-19 Thread Huang Ying
On Sat, 2009-04-18 at 23:54 +0800, Anthony Liguori wrote:
 Huang Ying wrote:
  The related MSRs are emulated. MCE capability is exported via
  extension KVM_CAP_MCE and ioctl KVM_X86_GET_MCE_CAP_SUPPORTED. A new
  vcpu ioctl command KVM_X86_SETUP_MCE is used to setup MCE emulation
  such as the mcg_cap. MCE is injected via vcpu ioctl command
  KVM_X86_SET_MCE. Extended machine-check state (MCG_EXT_P) and CMCI are
  not simulated.

 
 Maybe I'm missing something, but couldn't this be implemented entirely 
 within userspace?  There's nothing VT/SVM specific about this.  If the 
 issue is setting these MSRs from userspace via KVM_SET_MSRS isn't 
 enough, perhaps we should add userspace MSR handling.
 
 Also, if you implement the MSR logic in userspace, it's pretty simple to 
 make it work in the non-TCG case which will be a requirement for 
 upstream merging.

There is more logic than just KVM_SET_MSRS, such as bank reporting
disabling, overwriting rules, and triple fault for UC MCE during MCIP.
Although this logic could be implemented in user space, I think putting
it in kernel space is easier to understand. And the code is pretty short.

Best Regards,
Huang Ying





Re: [PATCH -v2 2/2] kvm userspace: Add MCE simulation to kvm

2009-04-19 Thread Huang Ying
On Fri, 2009-04-17 at 22:49 +0800, Marcelo Tosatti wrote:
 Hi Huang,
 
 On Fri, Apr 17, 2009 at 03:38:45PM +0800, Huang Ying wrote:
  - MCE features are initialized when VCPU is initialized according to CPUID.
  - A monitor command mce is added to inject a MCE.
  
  
  ChangeLog:
  
  v2:
  
- Use the new kernel MCE capability export interface.
  
  
  Signed-off-by: Huang Ying ying.hu...@intel.com
  
  ---
   libkvm/libkvm-x86.c|   33 +
   libkvm/libkvm.h|4 
   qemu/monitor.c |   26 ++
   qemu/qemu-kvm-x86.c|   33 +
   qemu/qemu-kvm.c|   26 ++
   qemu/qemu-kvm.h|4 
   qemu/target-i386/cpu.h |3 +++
   7 files changed, 129 insertions(+)
  
  --- a/qemu/monitor.c
  +++ b/qemu/monitor.c
  @@ -1557,6 +1557,31 @@ static void do_info_status(Monitor *mon)
   }
   
   
  +#if defined(TARGET_I386) || defined(TARGET_X86_64)
  +static void do_inject_mce(Monitor *mon,
  + int cpu_index, int bank,
  + unsigned status_hi, unsigned status_lo,
  + unsigned mcg_status_hi, unsigned mcg_status_lo,
  + unsigned addr_hi, unsigned addr_lo,
  + unsigned misc_hi, unsigned misc_lo)
  +{
  +CPUState *env;
  +struct kvm_x86_mce mce = {
  +   .bank = bank,
  +   .status = ((uint64_t)status_hi << 32) | status_lo,
  +   .mcg_status = ((uint64_t)mcg_status_hi << 32) | mcg_status_lo,
  +   .addr = ((uint64_t)addr_hi << 32) | addr_lo,
  +   .misc = ((uint64_t)misc_hi << 32) | misc_lo,
  +};
  +
  +for (env = first_cpu; env != NULL; env = env->next_cpu)
  +   if (env->cpu_index == cpu_index && env->mcg_cap) {
  +   kvm_inject_x86_mce(env, mce);
  +   break;
  +   }
  +}
  +#endif
 
 This will break compilation on hosts with older kernel headers. To ease
 the merge with QEMU, do_inject_mce should call qemu_inject_x86_mce,
 which does nothing for the case where KVM is disabled, or call KVM
 specific function otherwise (knowledge about struct kvm_x86_mce should
 be contained within KVM code).

Yes. I will fix this.
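(As a side note: the hi/lo splitting that do_inject_mce uses to pass the 64-bit MCE fields — status, mcg_status, addr, misc — through pairs of 32-bit monitor arguments can be sketched as below. This is a standalone illustration, not part of the patch, and the example value is made up.)

```python
# Recombining two 32-bit halves into a 64-bit MCE field, as
# ((uint64_t)hi << 32) | lo does in the C code above, plus the
# inverse split a monitor frontend would perform.

def split64(value):
    """Split a 64-bit value into (hi, lo) 32-bit halves."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF

def join64(hi, lo):
    """Recombine the halves into the original 64-bit value."""
    return ((hi << 32) | lo) & 0xFFFFFFFFFFFFFFFF

status = 0xB200000000000000  # illustrative MCi_STATUS-like value
hi, lo = split64(status)
assert (hi, lo) == (0xB2000000, 0)
assert join64(hi, lo) == status
```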

  +
   static void do_balloon(Monitor *mon, int value)
   {
   ram_addr_t target = value;
  @@ -1758,6 +1783,7 @@ static const mon_cmd_t mon_cmds[] = {
 [tap,user,socket,vde] options, add host VLAN client },
   { host_net_remove, is, net_host_device_remove,
 vlan_id name, remove host VLAN client },
  +{ mce, ii, do_inject_mce, cpu bank status mcgstatus addr 
  misc, inject a MCE on the given CPU},
   #endif
   { balloon, i, do_balloon,
 target, request VM to change it's memory allocation (in MB) },
  --- a/libkvm/libkvm-x86.c
  +++ b/libkvm/libkvm-x86.c
  @@ -379,6 +379,39 @@ int kvm_set_msrs(kvm_context_t kvm, int 
   return r;
   }
   
 
  +int kvm_get_mce_cap_supported(kvm_context_t kvm, uint64_t *mce_cap,
  + int *max_banks)
  +{
  +int r;
  +
  +r = ioctl(kvm->fd, KVM_CHECK_EXTENSION, KVM_CAP_MCE);
  +if (r > 0) {
  +   *max_banks = r;
  +   return ioctl(kvm->fd, KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
  +}
  +return -ENOSYS;
  +}
  +
  +int kvm_setup_mce(kvm_context_t kvm, int vcpu, uint64_t *mcg_cap)
  +{
  +int r;
  +
  +r = ioctl(kvm->fd, KVM_CHECK_EXTENSION, KVM_CAP_MCE);
  +if (r > 0)
  +   return ioctl(kvm->vcpu_fd[vcpu], KVM_X86_SETUP_MCE, mcg_cap);
  +return -ENOSYS;
  +}
  +
  +int kvm_set_mce(kvm_context_t kvm, int vcpu, struct kvm_x86_mce *m)
  +{
  +int r;
  +
  +r = ioctl(kvm->fd, KVM_CHECK_EXTENSION, KVM_CAP_MCE);
  +if (r > 0)
  +   return ioctl(kvm->vcpu_fd[vcpu], KVM_X86_SET_MCE, m);
  +return -ENOSYS;
  +}
 
 Have to #ifdef KVM_CAP_MCE where appropriate, otherwise compilation 
 on hosts with older kernel headers fail.

Yes. I will fix this.

Best Regards,
Huang Ying





[PATCH 0/5] ksm - dynamic page sharing driver for linux v4

2009-04-19 Thread Izik Eidus
Introduction:
KSM is a Linux driver that allows dynamically sharing identical memory
pages between one or more processes.

Unlike traditional page sharing, which is established when the memory is
allocated, ksm does it dynamically after the memory has been created.
Memory is periodically scanned; identical pages are identified and
merged.

The sharing is transparent to the processes that use it.

Ksm is highly important for hypervisors (kvm), where in production
environments there may be many copies of the same data in host memory.
This kind of data can be:
similar kernels, libraries, caches, and so on.

Even though ksm was written for kvm, any userspace application that
wants to use it to share its data can try it.

Ksm may be useful for any application that has similar (page aligned)
data structures in memory; ksm will find this data and merge it into one
copy, and even if it is later changed and therefore copied on write, ksm
will merge it again as soon as it becomes identical again.

Another reason to consider using ksm is that it can greatly simplify the
userspace code of applications that want to use shared private data:
instead of the application managing a shared area, ksm does this for the
application, and even writes to this data are allowed without any
synchronization on the application's part.

Ksm was designed to be a loadable module that doesn't change the VM code
of linux.


From v3 -> v4:

1) Mostly fixes of coding style, and a few bugs that Andrew found.
   * get_user_pages return value check - we now check == 1.
   * protecting the vma under the mmap_sem() while checking its fields.
   * kthread_* renaming into ksm_thread_*
   * const for the file_operations structures.

   (The only thing I didn't change from your comments is the number
of pages we are allocating for the hash table; this is performance
critical for ksm.)

2) Changed get_pte() to be a generic Linux function available for other users.


Description of the ksm interface:

The ksm interface is split into two areas:
The administration sysfs interface - an interface, available through
sysfs, to control ksm cpu timing, the maximum allocation of kernel pages,
and statistics.

This interface is available at /sys/kernel/mm/ksm/ and its fields are:

kernel_pages_allocated - how many kernel pages ksm has allocated; these
pages are not swappable, and each such page is used by ksm to share pages
with identical content.

pages_shared - how many pages were shared by ksm

run - set to 1 when you want ksm to run, 0 when not

max_kernel_pages - set the maximum number of kernel pages to be allocated
by ksm, set 0 for unlimited.

pages_to_scan - how many pages to scan before ksm sleeps

sleep - how many usecs ksm sleeps.

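(As a concrete illustration of the fields above, a ksm run could be tuned from a root shell roughly like this. This is a sketch only: the values are arbitrary, and the paths exist only once this patchset is applied.)

```shell
# Tune and start ksm through the sysfs interface described above.
echo 2000 > /sys/kernel/mm/ksm/max_kernel_pages  # cap kernel pages, 0 = unlimited
echo 300  > /sys/kernel/mm/ksm/pages_to_scan     # pages scanned before sleeping
echo 5000 > /sys/kernel/mm/ksm/sleep             # usecs to sleep between batches
echo 1    > /sys/kernel/mm/ksm/run               # start scanning

cat /sys/kernel/mm/ksm/pages_shared              # pages shared so far
cat /sys/kernel/mm/ksm/kernel_pages_allocated    # unswappable pages ksm holds
```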

The interface for applications that want their memory scanned by ksm:
this interface is built around ioctls; when an application wants its
memory scanned, it does something like this:

static int ksm_register_memory(unsigned long phys_ram_size,
                               unsigned long phys_ram_base)
{
    int fd;
    int ksm_fd;
    int r = 1;
    struct ksm_memory_region ksm_region;

    fd = open("/dev/ksm", O_RDWR | O_TRUNC, (mode_t)0600);
    if (fd == -1)
        goto out;

    ksm_fd = ioctl(fd, KSM_CREATE_SHARED_MEMORY_AREA);
    if (ksm_fd == -1)
        goto out_free;

    ksm_region.npages = phys_ram_size / TARGET_PAGE_SIZE;
    ksm_region.addr = phys_ram_base;
    r = ioctl(ksm_fd, KSM_REGISTER_MEMORY_REGION, &ksm_region);
    if (r)
        goto out_free1;

    return r;

out_free1:
    close(ksm_fd);
out_free:
    close(fd);
out:
    return r;
}

These ioctls are:

KSM_GET_API_VERSION:
Gives userspace the api version of the module.

KSM_CREATE_SHARED_MEMORY_AREA:
Creates a shared memory region fd, which later allows the user to register
memory regions to be scanned, using:
KSM_REGISTER_MEMORY_REGION and KSM_REMOVE_MEMORY_REGION

KSM_REGISTER_MEMORY_REGION:
Registers a userspace virtual address range to be scanned by ksm.
This ioctl uses the ksm_memory_region structure:
ksm_memory_region:
__u32 npages;
 number of pages to share inside this memory region.
__u32 pad;
__u64 addr:
the beginning of the virtual address of this region.
__u64 reserved_bits;
reserved bits for future usage.

KSM_REMOVE_MEMORY_REGION:
Removes a memory region from ksm.

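(To illustrate the layout the field list above implies — two __u32 fields followed by two __u64 fields, i.e. 24 bytes with the usual x86-64 sizes — the ioctl argument could be packed from a non-C frontend like this. A sketch only; the "=IIQQ" format string is our assumption about the ABI.)

```python
import struct

# ksm_memory_region as described above:
#   __u32 npages, __u32 pad, __u64 addr, __u64 reserved_bits
# "=IIQQ" = native byte order, standard sizes, no extra padding
# (our assumption for the x86-64 layout).
KSM_REGION_FMT = "=IIQQ"

def pack_ksm_memory_region(npages, addr):
    """Build the argument buffer for KSM_REGISTER_MEMORY_REGION."""
    return struct.pack(KSM_REGION_FMT, npages, 0, addr, 0)

buf = pack_ksm_memory_region(npages=512, addr=0x7F0000000000)
assert len(buf) == struct.calcsize(KSM_REGION_FMT) == 24
```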

Testing ksm:
Considering the fact that I got some mails asking me how to use this,
I guess it wasn't explained well in the last posts; I will try to improve
on this:

The following steps should allow you to test ksm and play with it:

1) First patch your kernel with these patches.

2) Patch Avi's kvm git tree:
   (git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git) with the patches
   from:
   http://lkml.org/lkml/2009/3/30/534

3) Patch the kvm-userspace tree:
   git.kernel.org/pub/scm/virt/kvm/kvm-userspace.git with the patches from:
   http://lkml.org/lkml/2009/3/30/538 

4) try to do:
   echo 300  

[PATCH 1/5] MMU_NOTIFIERS: add set_pte_at_notify()

2009-04-19 Thread Izik Eidus
This macro allows setting the pte in the shadow page tables directly,
instead of flushing the shadow page table entry and then taking a vmexit
in order to set it.

This function is an optimization for kvm/users of mmu_notifiers for COW
pages. It is useful for kvm when ksm is used, because it allows kvm not
to have to receive a VMEXIT and only then map the shared page into the
mmu shadow pages, but instead to map it directly at the same time linux
maps the page into the host page table.

This mmu notifier macro works by calling a callback that maps the
physical page directly into the shadow page tables.

(Users of mmu_notifiers that don't implement the set_pte_at_notify()
callback will just receive the mmu_notifier_invalidate_page callback.)

Signed-off-by: Izik Eidus iei...@redhat.com
---
 include/linux/mmu_notifier.h |   34 ++
 mm/memory.c  |   10 --
 mm/mmu_notifier.c|   20 
 3 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index b77486d..8bb245f 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -61,6 +61,15 @@ struct mmu_notifier_ops {
 struct mm_struct *mm,
 unsigned long address);
 
+   /*
+    * change_pte is called in cases where the pte mapping of a page is
+    * changed, for example when ksm remaps a pte to point to a new
+    * shared page.
+    */
+   void (*change_pte)(struct mmu_notifier *mn,
+  struct mm_struct *mm,
+  unsigned long address,
+  pte_t pte);
+
/*
 * Before this is invoked any secondary MMU is still ok to
 * read/write to the page previously pointed to by the Linux
@@ -154,6 +163,8 @@ extern void __mmu_notifier_mm_destroy(struct mm_struct *mm);
 extern void __mmu_notifier_release(struct mm_struct *mm);
 extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
  unsigned long address);
+extern void __mmu_notifier_change_pte(struct mm_struct *mm, 
+ unsigned long address, pte_t pte);
 extern void __mmu_notifier_invalidate_page(struct mm_struct *mm,
  unsigned long address);
 extern void __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
@@ -175,6 +186,13 @@ static inline int mmu_notifier_clear_flush_young(struct 
mm_struct *mm,
return 0;
 }
 
+static inline void mmu_notifier_change_pte(struct mm_struct *mm,
+  unsigned long address, pte_t pte)
+{
+   if (mm_has_notifiers(mm))
+   __mmu_notifier_change_pte(mm, address, pte);
+}
+
 static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
  unsigned long address)
 {
@@ -236,6 +254,16 @@ static inline void mmu_notifier_mm_destroy(struct 
mm_struct *mm)
__young;\
 })
 
+#define set_pte_at_notify(__mm, __address, __ptep, __pte)  \
+({ \
+   struct mm_struct *___mm = __mm; \
+   unsigned long ___address = __address;   \
+   pte_t ___pte = __pte;   \
+   \
+   set_pte_at(___mm, ___address, __ptep, ___pte);  \
+   mmu_notifier_change_pte(___mm, ___address, ___pte); \
+})
+
 #else /* CONFIG_MMU_NOTIFIER */
 
 static inline void mmu_notifier_release(struct mm_struct *mm)
@@ -248,6 +276,11 @@ static inline int mmu_notifier_clear_flush_young(struct 
mm_struct *mm,
return 0;
 }
 
+static inline void mmu_notifier_change_pte(struct mm_struct *mm,
+  unsigned long address, pte_t pte)
+{
+}
+
 static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
  unsigned long address)
 {
@@ -273,6 +306,7 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct 
*mm)
 
 #define ptep_clear_flush_young_notify ptep_clear_flush_young
 #define ptep_clear_flush_notify ptep_clear_flush
+#define set_pte_at_notify set_pte_at
 
 #endif /* CONFIG_MMU_NOTIFIER */
 
diff --git a/mm/memory.c b/mm/memory.c
index cf6873e..1e1a14b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2051,9 +2051,15 @@ gotten:
 * seen in the presence of one thread doing SMC and another
 * thread doing COW.
 */
-   ptep_clear_flush_notify(vma, address, page_table);
+   ptep_clear_flush(vma, address, page_table);
page_add_new_anon_rmap(new_page, vma, address);
-  

[PATCH 2/5] add get_pte(): helper function: fetching pte for va

2009-04-19 Thread Izik Eidus
get_pte() receives the mm_struct of a task and a virtual address, and
returns the pte corresponding to it.

This function returns NULL in case it couldn't fetch the pte.

Signed-off-by: Izik Eidus iei...@redhat.com
---
 include/linux/mm.h |   24 
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bff1f0d..9a34109 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -894,6 +894,30 @@ int vma_wants_writenotify(struct vm_area_struct *vma);
 
 extern pte_t *get_locked_pte(struct mm_struct *mm, unsigned long addr, 
spinlock_t **ptl);
 
+static inline pte_t *get_pte(struct mm_struct *mm, unsigned long addr)
+{
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *ptep = NULL;
+
+   pgd = pgd_offset(mm, addr);
+   if (!pgd_present(*pgd))
+   goto out;
+
+   pud = pud_offset(pgd, addr);
+   if (!pud_present(*pud))
+   goto out;
+
+   pmd = pmd_offset(pud, addr);
+   if (!pmd_present(*pmd))
+   goto out;
+
+   ptep = pte_offset_map(pmd, addr);
+out:
+   return ptep;
+}
+
 #ifdef __PAGETABLE_PUD_FOLDED
 static inline int __pud_alloc(struct mm_struct *mm, pgd_t *pgd,
unsigned long address)
-- 
1.5.6.5

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] add replace_page(): change the page pte is pointing to.

2009-04-19 Thread Izik Eidus
replace_page() allows changing the mapping of a pte from one physical page
to a different physical page.

This function works by removing oldpage from the rmap and calling
put_page() on it, and by setting the pte to point to newpage and
inserting it into the rmap using page_add_file_rmap().

Note: newpage must be a non-anonymous page. The reason for this is:
replace_page() is built to allow mapping one page at more than one
virtual address, and the mapping of this page can be at different
offsets inside each vma, so we cannot trust page->index anymore.

The side effect of this issue is that newpage cannot be anything but a
kernel-allocated page that is not swappable.

Signed-off-by: Izik Eidus iei...@redhat.com
---
 include/linux/mm.h |5 +++
 mm/memory.c|   80 
 2 files changed, 85 insertions(+), 0 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9a34109..a0ddfb5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1264,6 +1264,11 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned 
long addr,
 int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
 
+#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+int replace_page(struct vm_area_struct *vma, struct page *oldpage,
+struct page *newpage, pte_t orig_pte, pgprot_t prot);
+#endif
+
 struct page *follow_page(struct vm_area_struct *, unsigned long address,
unsigned int foll_flags);
 #define FOLL_WRITE 0x01/* check pte is writable */
diff --git a/mm/memory.c b/mm/memory.c
index 1e1a14b..d6e53c2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1567,6 +1567,86 @@ int vm_insert_mixed(struct vm_area_struct *vma, unsigned 
long addr,
 }
 EXPORT_SYMBOL(vm_insert_mixed);
 
+#if defined(CONFIG_KSM) || defined(CONFIG_KSM_MODULE)
+
+/**
+ * replace_page - replace page in vma with new page
+ * @vma:  vma that holds the pte @oldpage is mapped by
+ * @oldpage:  the page we are replacing with newpage
+ * @newpage:  the page we replace oldpage with
+ * @orig_pte: the original value of the pte
+ * @prot: page protection bits
+ *
+ * Returns 0 on success, -EFAULT on failure.
+ *
+ * Note: @newpage must not be an anonymous page because replace_page() does
+ * not change the mapping of @newpage to have the same values as @oldpage.
+ * @newpage can be mapped in several vmas at different offsets (page->index).
+ */
+int replace_page(struct vm_area_struct *vma, struct page *oldpage,
+struct page *newpage, pte_t orig_pte, pgprot_t prot)
+{
+   struct mm_struct *mm = vma->vm_mm;
+   pgd_t *pgd;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *ptep;
+   spinlock_t *ptl;
+   unsigned long addr;
+   int ret;
+
+   BUG_ON(PageAnon(newpage));
+
+   ret = -EFAULT;
+   addr = page_address_in_vma(oldpage, vma);
+   if (addr == -EFAULT)
+   goto out;
+
+   pgd = pgd_offset(mm, addr);
+   if (!pgd_present(*pgd))
+   goto out;
+
+   pud = pud_offset(pgd, addr);
+   if (!pud_present(*pud))
+   goto out;
+
+   pmd = pmd_offset(pud, addr);
+   if (!pmd_present(*pmd))
+   goto out;
+
+   ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
+   if (!ptep)
+   goto out;
+
+   if (!pte_same(*ptep, orig_pte)) {
+   pte_unmap_unlock(ptep, ptl);
+   goto out;
+   }
+
+   ret = 0;
+   get_page(newpage);
+   page_add_file_rmap(newpage);
+
+   flush_cache_page(vma, addr, pte_pfn(*ptep));
+   ptep_clear_flush(vma, addr, ptep);
+   set_pte_at_notify(mm, addr, ptep, mk_pte(newpage, prot));
+
+   page_remove_rmap(oldpage);
+   if (PageAnon(oldpage)) {
+   dec_mm_counter(mm, anon_rss);
+   inc_mm_counter(mm, file_rss);
+   }
+   put_page(oldpage);
+
+   pte_unmap_unlock(ptep, ptl);
+
+out:
+   return ret;
+}
+EXPORT_SYMBOL_GPL(replace_page);
+
+#endif
+
 /*
  * maps a range of physical memory into the requested pages. the old
  * mappings are removed. any references to nonexistent pages results
-- 
1.5.6.5



[PATCH 5/5] add ksm kernel shared memory driver.

2009-04-19 Thread Izik Eidus
Ksm is a driver that allows merging identical pages between one or more
applications, in a way invisible to the applications that use it.
Pages that are merged are marked readonly and are COWed when any
application tries to change them.

Ksm is used for cases where using fork() is not suitable;
one such case is where the pages of the application keep changing
dynamically and the application cannot know in advance which pages are
going to be identical.

Ksm works by walking over the memory pages of the applications it
scans in order to find identical pages.
It uses two sorted data structures, called the stable and unstable trees,
to find identical pages in an effective way.

When ksm finds two identical pages, it marks them as readonly and merges
them into a single page;
after the pages are marked readonly and merged into one page, linux
will treat these pages as normal copy_on_write pages and will fork them
when a write access happens to them.

Ksm scans just the memory areas that were registered to be scanned by it.

Signed-off-by: Izik Eidus iei...@redhat.com
Signed-off-by: Chris Wright chr...@redhat.com
Signed-off-by: Andrea Arcangeli aarca...@redhat.com
---
 include/linux/ksm.h|   48 ++
 include/linux/miscdevice.h |1 +
 mm/Kconfig |6 +
 mm/Makefile|1 +
 mm/ksm.c   | 1675 
 5 files changed, 1731 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/ksm.h
 create mode 100644 mm/ksm.c

diff --git a/include/linux/ksm.h b/include/linux/ksm.h
new file mode 100644
index 000..2c11e9a
--- /dev/null
+++ b/include/linux/ksm.h
@@ -0,0 +1,48 @@
+#ifndef __LINUX_KSM_H
+#define __LINUX_KSM_H
+
+/*
+ * Userspace interface for /dev/ksm - kvm shared memory
+ */
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#include <asm/types.h>
+
+#define KSM_API_VERSION 1
+
+#define ksm_control_flags_run 1
+
+/* for KSM_REGISTER_MEMORY_REGION */
+struct ksm_memory_region {
+   __u32 npages; /* number of pages to share */
+   __u32 pad;
+   __u64 addr; /* the beginning of the virtual address */
+   __u64 reserved_bits;
+};
+
+#define KSMIO 0xAB
+
+/* ioctls for /dev/ksm */
+
+#define KSM_GET_API_VERSION  _IO(KSMIO,   0x00)
+/*
+ * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory region fd
+ */
+#define KSM_CREATE_SHARED_MEMORY_AREA _IO(KSMIO,   0x01) /* return SMA fd */
+
+/* ioctls for SMA fds */
+
+/*
+ * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
+ * scanned by ksm.
+ */
+#define KSM_REGISTER_MEMORY_REGION   _IOW(KSMIO,  0x20,\
+ struct ksm_memory_region)
+/*
+ * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
+ */
+#define KSM_REMOVE_MEMORY_REGION _IO(KSMIO,   0x21)
+
+#endif
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index beb6ec9..297c0bb 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -30,6 +30,7 @@
 #define HPET_MINOR 228
 #define FUSE_MINOR 229
 #define KVM_MINOR  232
+#define KSM_MINOR  233
 #define MISC_DYNAMIC_MINOR 255
 
 struct device;
diff --git a/mm/Kconfig b/mm/Kconfig
index 57971d2..fb8ac63 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -225,3 +225,9 @@ config HAVE_MLOCKED_PAGE_BIT
 
 config MMU_NOTIFIER
bool
+
+config KSM
+   tristate "Enable KSM for page sharing"
+   help
+ Enable the KSM kernel module to allow page sharing of equal pages
+ among different tasks.
diff --git a/mm/Makefile b/mm/Makefile
index ec73c68..b885513 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
 obj-$(CONFIG_TMPFS_POSIX_ACL) += shmem_acl.o
 obj-$(CONFIG_SLOB) += slob.o
 obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_KSM) += ksm.o
 obj-$(CONFIG_PAGE_POISONING) += debug-pagealloc.o
 obj-$(CONFIG_SLAB) += slab.o
 obj-$(CONFIG_SLUB) += slub.o
diff --git a/mm/ksm.c b/mm/ksm.c
new file mode 100644
index 000..7fd4158
--- /dev/null
+++ b/mm/ksm.c
@@ -0,0 +1,1675 @@
+/*
+ * Memory merging driver for Linux
+ *
+ * This module enables dynamic sharing of identical pages found in different
+ * memory areas, even if they are not shared by fork()
+ *
+ * Copyright (C) 2008 Red Hat, Inc.
+ * Authors:
+ * Izik Eidus
+ * Andrea Arcangeli
+ * Chris Wright
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/mm.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/vmalloc.h>
+#include <linux/file.h>
+#include <linux/mman.h>
+#include <linux/sched.h>
+#include <linux/rwsem.h>
+#include <linux/pagemap.h>
+#include <linux/sched.h>
+#include <linux/rmap.h>
+#include <linux/spinlock.h>
+#include <linux/jhash.h>
+#include <linux/delay.h>
+#include <linux/kthread.h>
+#include <linux/wait.h>

Re: Luvalley-2 has been released: running KVM below any operating system

2009-04-19 Thread Xiaodong Yi
It is not a typo. I copied it from the UnixBench output directly. However,
it must be a bug in Luvalley, because even the native Linux benchmark on
Double-Precision Whetstone is not that high. I also noticed that the other
benchmarks are all lower than native Linux.

About timing, Luvalley does nothing more than KVM except that Luvalley
implemented the VLAPIC timer using TSC while KVM uses the services of
Linux kernel. The other timers of both Luvalley and KVM, I think, are
all implemented in Qemu.

Moreover, I could not explain why Luvalley's benchmarks on process
creation, execl throughput, file copy and shell script are 20% ~ 40%
higher than KVM, while other benchmarks on pipe throughput, context
switching and syscall overhead are almost the same as KVM.

Thanks for your attention,

Xiaodong

2009/4/19 Avi Kivity a...@redhat.com:
 Jan Kiszka wrote:
 Avi Kivity wrote:

 Xiaodong Yi wrote:

 Hi,

 I've tested the guest Linux using UnixBench 5.1.2. The platform is:
   * Intel's Core Due CPU with 2 cores, 2GB RAM
   * CentOS 5.2 as the dom0 Linux, i.e., the host Linux for KVM
   * CentOS 5.2 as the guest Linux, i.e., the Linux running on the
 virtual machine provided by Qemu

 The first set of results is for Luvalley, and the second one is for
 KVM. As the result, Luvalley's guest Linux is 20% ~ 30% faster than
 KVM's guest! It is very surprise to me. I had through Luvalley's guest
 should be the same performance as KVM's.



 Yes, it is surprising.


 Double-Precision Whetstone                    12287.7 MWIPS (10.0 s, 2 
 samples)
 Double-Precision Whetstone                     2166.3 MWIPS (10.2 s, 2 
 samples

 That's by far the biggest difference. Can you confirm it isn't a typo?

 If not, then it looks like we have a bug in floating point handling. I
 don't think this benchmark uses sse.



 Even the native Linux numbers are not that high, rather comparable to
 KVM. I suspect Luvalley is fooling the benchmark here...


 Most likely timing. Likely Luvalley guest time runs too slowly, so they
 achieve more loops per unit of guest time.

 --
 Do not meddle in the internals of kernels, for they are subtle and quick to 
 panic.




Re: [PATCH -v2] Add MCE support to KVM

2009-04-19 Thread Kyle Moffett
On Sun, Apr 19, 2009 at 9:19 PM, Huang Ying ying.hu...@intel.com wrote:
 On Sat, 2009-04-18 at 23:54 +0800, Anthony Liguori wrote:
 Huang Ying wrote:
  The related MSRs are emulated. MCE capability is exported via
  extension KVM_CAP_MCE and ioctl KVM_X86_GET_MCE_CAP_SUPPORTED. A new
  vcpu ioctl command KVM_X86_SETUP_MCE is used to setup MCE emulation
  such as the mcg_cap. MCE is injected via vcpu ioctl command
  KVM_X86_SET_MCE. Extended machine-check state (MCG_EXT_P) and CMCI are
  not simulated.
 

 Maybe I'm missing something, but couldn't this be implemented entirely
 within userspace?  There's nothing VT/SVM specific about this.  If the
 issue is setting these MSRs from userspace via KVM_SET_MSRS isn't
 enough, perhaps we should add userspace MSR handling.

 Also, if you implement the MSR logic in userspace, it's pretty simple to
 make it work in the non-TCG case which will be a requirement for
 upstream merging.

 There is more logic than just KVM_SET_MSRS, such as BANK reporting
 disabling, overwriting rules, triple fault for UC MCE during MCIP.
 Although these logic can be implemented in user space, I think put them
 in kernel space is easy to be understood. And the code is pretty short.

IMO the main reason to put this in kernel-space would be to make it
possible to automatically forward some MCE errors generated by the
real hardware (RAM ECC errors for example) down into the VM.  Right
now I suppose you could do that with the patches to forward RAM-based
hard MCEs to userspace using SIGSEGV and handling the SIGSEGV in
userspace, but that seems more fragile to me.

Cheers,
Kyle Moffett


Re: [PATCH -v2] Add MCE support to KVM

2009-04-19 Thread Huang Ying
On Mon, 2009-04-20 at 11:57 +0800, Kyle Moffett wrote:
 On Sun, Apr 19, 2009 at 9:19 PM, Huang Ying ying.hu...@intel.com wrote:
  On Sat, 2009-04-18 at 23:54 +0800, Anthony Liguori wrote:
  Huang Ying wrote:
   The related MSRs are emulated. MCE capability is exported via
   extension KVM_CAP_MCE and ioctl KVM_X86_GET_MCE_CAP_SUPPORTED. A new
   vcpu ioctl command KVM_X86_SETUP_MCE is used to setup MCE emulation
   such as the mcg_cap. MCE is injected via vcpu ioctl command
   KVM_X86_SET_MCE. Extended machine-check state (MCG_EXT_P) and CMCI are
   not simulated.
  
 
  Maybe I'm missing something, but couldn't this be implemented entirely
  within userspace?  There's nothing VT/SVM specific about this.  If the
  issue is setting these MSRs from userspace via KVM_SET_MSRS isn't
  enough, perhaps we should add userspace MSR handling.
 
  Also, if you implement the MSR logic in userspace, it's pretty simple to
  make it work in the non-TCG case which will be a requirement for
  upstream merging.
 
  There is more logic than just KVM_SET_MSRS, such as BANK reporting
  disabling, overwriting rules, triple fault for UC MCE during MCIP.
  Although these logic can be implemented in user space, I think put them
  in kernel space is easy to be understood. And the code is pretty short.
 
 IMO the main reason to put this in kernel-space would be to make it
 possible to automatically forward some MCE errors generated by the
 real hardware (RAM ECC errors for example) down into the VM.  Right
 now I suppose you could do that with the patches to forward RAM-based
 hard MCEs to userspace using SIGSEGV and handling the SIGSEGV in
 userspace, but that seems more fragile to me.

Yes. Putting this in kernel space would make it more reliable to forward
MCEs to the KVM guest.

Best Regards,
Huang Ying

