date:20120210

https://bugzilla.kernel.org/show_bug.cgi?id=42755

   Summary: KVM is being extremely slow on AMD Athlon64 4000+ Dual
Core 2.1GHz Brisbane
   Product: Virtualization
   Version: unspecified
Kernel Version: 3.2.2
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: sandik...@yandex.ru
Regression: No


Hello.

KVM seems to be broken on an AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane. A
kernel build in the guest takes over 2 hours, compared to 15 minutes on the
host. I'm running with virtio for storage, and this does not seem to be
related to I/O. A simple test with dd also demonstrates that the problem is
CPU-related:

Host test:
dd if=/dev/zero of=/dev/null bs=1k count=8M
8388608+0 records in
8388608+0 records out
8589934592 bytes (8.6 GB) copied, 7.04456 s, 1.2 GB/s

This also does not seem to be a CPU bug, as both VirtualBox and VMware
work fine.

Vmware/Virtualbox test:
dd if=/dev/zero of=/dev/null bs=1k count=8M
8388608+0 records in
8388608+0 records out
8589934592 bytes (8.6 GB) copied, 5.02745 s, 1.7 GB/s

I've tried different kernels on the host, between 2.6.22 and current git,
with the same results. The guest is running Gentoo Hardened kernel 3.2.2,
but I could try another distro and a vanilla kernel if required.

My kvm version is kvm 1.0

Help would be much appreciated.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #1 from Rosen sandik...@yandex.ru  2012-02-10 08:06:25 ---
In the KVM guest this speed is about 2 MB/s, which is extremely slow. The
guest takes about 15 minutes to boot.

The same guest machine boots in about 4-7 seconds on VirtualBox/VMware.
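
For scale, the dd figures quoted in this report can be turned into rates with
simple arithmetic. The sketch below uses only numbers from the report
(8589934592 bytes, 7.04456 s on the host, 5.02745 s under VMware/VirtualBox,
and the "about 2 MB/s" reported for the KVM guest). The function name is
illustrative, and since the guest figure is the reporter's approximation, the
resulting ratio is only an order of magnitude.

```python
# Throughput implied by the dd runs in the report (decimal units, as dd prints them).
TOTAL_BYTES = 8 * 1024 * 1024 * 1024  # bs=1k count=8M -> 8589934592 bytes


def rate_gb_per_s(total_bytes, seconds):
    """Return throughput in GB/s (1 GB = 10**9 bytes)."""
    return total_bytes / seconds / 1e9


host = rate_gb_per_s(TOTAL_BYTES, 7.04456)   # host run
other = rate_gb_per_s(TOTAL_BYTES, 5.02745)  # VMware/VirtualBox run
kvm_guest = 2e6 / 1e9                        # "about 2 MB/s" from comment #1

print("host:  %.1f GB/s" % host)
print("other: %.1f GB/s" % other)
print("host is roughly %.0fx faster than the KVM guest" % (host / kvm_guest))
```

A gap of this size is far larger than ordinary virtualization overhead, which
is why the follow-up asks whether the guest is actually running under KVM.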


[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #2 from Michael Tokarev m...@tls.msk.ru  2012-02-10 09:18:35 ---
This is very unlikely to have anything to do with the kernel.  Qemu (which
you're using to run your guest -- kvm is just an in-kernel module) will
emulate the whole machine, including the CPU, if it can't access /dev/kvm or
if you asked it not to use kvm in the first place.  If the problem is a lack
of access to /dev/kvm, qemu tells you about that on stderr.

So far, you haven't provided any of:

 - qemu version
 - qemu command line
 - messages generated by qemu

And for this issue, please also provide output of

 info kvm

from the qemu monitor.  If kvm is enabled, it should tell you just that --
enabled. If not, it'll tell you disabled, which means that your guest is
running in full emulation mode.

Without that, this bug report is not very useful, especially bearing in mind
that this setup works just fine for lots of other people.
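
As a rough host-side complement to `info kvm`, access to the device node can
be checked before starting qemu. This is only a sketch: the function name and
return strings are made up for illustration, and it checks file access only.
It does not prove that qemu actually uses kvm, which is what `info kvm`
reports.

```python
import os


def kvm_device_status(path="/dev/kvm"):
    """Classify access to the KVM device node for the current user."""
    if not os.path.exists(path):
        return "missing"        # e.g. the kvm module is not loaded
    if not os.access(path, os.R_OK | os.W_OK):
        return "no-permission"  # node exists but this user cannot open it
    return "accessible"


print(kvm_device_status())
```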


[KVM-Autotest][PATCH 1/2] kvm test: Change cpuflags test call structure.

2012-02-10 Thread Jiří Župka

These changes are necessary because people who use
the test want to use it in another way.

Signed-off-by: Jiří Župka jzu...@redhat.com
---
 client/tests/kvm/tests/cpuflags.py |  890 +---
 client/virt/subtests.cfg.sample|   24 +-
 2 files changed, 438 insertions(+), 476 deletions(-)

diff --git a/client/tests/kvm/tests/cpuflags.py 
b/client/tests/kvm/tests/cpuflags.py
index c2111d4..97dcbaa 100644
--- a/client/tests/kvm/tests/cpuflags.py
+++ b/client/tests/kvm/tests/cpuflags.py
@@ -2,8 +2,6 @@ import logging, re, random, os, time, socket, pickle
 from autotest_lib.client.common_lib import error, utils
 from autotest_lib.client.virt import kvm_vm
 from autotest_lib.client.virt import virt_utils, aexpect
-from autotest_lib.client.common_lib.test import Subtest, subtest_nocleanup
-from autotest_lib.client.common_lib.test import subtest_fatal
 
 
 def run_cpuflags(test, params, env):
@@ -31,7 +29,7 @@ def run_cpuflags(test, params, env):
 class HgFlags(object):
 def __init__(self, cpu_model, extra_flags=set([])):
 virtual_flags = set(map(virt_utils.Flag,
-params.get(guest_spec_flags, 
).split()))
+   params.get(guest_spec_flags, ).split()))
 self.hw_flags = set(map(virt_utils.Flag,
 params.get(host_spec_flags, ).split()))
 self.qemu_support_flags = get_all_qemu_flags()
@@ -61,7 +59,6 @@ def run_cpuflags(test, params, env):
 self.host_all_unsupported_flags -= (self.host_support_flags |
 virtual_flags)
 
-
 def start_guest_with_cpuflags(cpuflags, smp=None, migration=False,
   wait=True):
 
@@ -87,7 +84,6 @@ def run_cpuflags(test, params, env):
 
 return (vm, session)
 
-
 def get_guest_system_cpuflags(vm_session):
 
 Get guest system cpuflags.
@@ -101,7 +97,6 @@ def run_cpuflags(test, params, env):
 flags = flags_re.search(out).groups()[0].split()
 return set(map(virt_utils.Flag, flags))
 
-
 def get_guest_host_cpuflags(cpumodel):
 
 Get cpu flags correspond with cpumodel parameters.
@@ -123,7 +118,6 @@ def run_cpuflags(test, params, env):
 flags += flag_group.split()
 return set(map(virt_utils.Flag, flags))
 
-
 def get_all_qemu_flags():
 cmd = qemu_binary +  -cpu ?cpuid
 output = utils.run(cmd).stdout
@@ -137,7 +131,6 @@ def run_cpuflags(test, params, env):
 
 return set(map(virt_utils.Flag, flags))
 
-
 def get_flags_full_name(cpu_flag):
 
 Get all name of Flag.
@@ -151,7 +144,6 @@ def run_cpuflags(test, params, env):
 return virt_utils.Flag(f)
 return []
 
-
 def parse_qemu_cpucommand(cpumodel):
 
 Parse qemu cpu params.
@@ -175,7 +167,6 @@ def run_cpuflags(test, params, env):
 
 return real_flags
 
-
 def get_cpu_models():
 
 Get all cpu models from qemu.
@@ -188,7 +179,6 @@ def run_cpuflags(test, params, env):
 cpu_re = re.compile(\w+\s+\[?(\w+)\]?)
 return cpu_re.findall(output)
 
-
 def check_cpuflags(cpumodel, vm_session):
 
 Check if vm flags are same like flags select by cpumodel.
@@ -206,7 +196,6 @@ def run_cpuflags(test, params, env):
 logging.debug(Flags on guest not defined by host: %s, (gf - rf))
 return rf - gf
 
-
 def disable_cpu(vm_session, cpu, disable=True):
 
 Disable cpu in guest system.
@@ -224,7 +213,6 @@ def run_cpuflags(test, params, env):
 vm_session.cmd(echo 1  %s % cpu_online)
 logging.debug(Guest cpu %d is enabled., cpu)
 
-
 def install_cpuflags_test_on_vm(vm, dst_dir):
 
 Install stress to vm.
@@ -240,7 +228,6 @@ def run_cpuflags(test, params, env):
 session.cmd(sync)
 session.close()
 
-
 def check_cpuflags_work(vm, path, flags):
 
 Check which flags work.
@@ -258,7 +245,7 @@ def run_cpuflags(test, params, env):
 try:
 for tc in virt_utils.kvm_map_flags_to_test[f]:
 session.cmd(%s/cpuflags-test --%s %
-(os.path.join(path,test_cpu_flags), tc))
+(os.path.join(path, test_cpu_flags), tc))
 pass_Flags.append(f)
 except aexpect.ShellCmdError:
 not_working.append(f)
@@ -268,7 +255,6 @@ def run_cpuflags(test, params, env):
 set(map(virt_utils.Flag, not_working)),
 set(map(virt_utils.Flag, not_tested)))
 
-
 def run_stress(vm, timeout, guest_flags):
 
 Run stress on vm for timeout time.
@@ -292,7 +278,6 @@ def run_cpuflags(test, params, env):
 dd_session.close()
 return ret
 
-
 def separe_cpu_model(cpu_model):

[KVM-Autotest][PATCH 2/2] kvm test: Adds new subtest to cpuflags test.

2012-02-10 Thread Jiří Župka

Adds a new subtest which tests the warnings printed by
qemu-kvm when -cpu is given the check parameter.

Signed-off-by: Jiří Župka jzu...@redhat.com
---
 client/tests/kvm/tests/cpuflags.py |   35 +++
 client/virt/subtests.cfg.sample|2 ++
 client/virt/virt_utils.py  |1 +
 3 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/tests/cpuflags.py b/client/tests/kvm/tests/cpuflags.py
index 97dcbaa..1b8306d 100644
--- a/client/tests/kvm/tests/cpuflags.py
+++ b/client/tests/kvm/tests/cpuflags.py
@@ -429,6 +429,41 @@ def run_cpuflags(test, params, env):
  (str(Flags[1])))
 
 # 3) fail boot unsupported flags
+    class test_boot_warn_with_host_unsupported_flags(MiniSubtest):
+        def test(self):
+            #This is virtual cpu flags which are supported by
+            #qemu but no with host cpu.
+            cpu_model, extra_flags = parse_cpu_model()
+
+            flags = HgFlags(cpu_model, extra_flags)
+
+            logging.debug("Unsupported flags %s.",
+                          str(flags.host_all_unsupported_flags))
+            cpuf_model = cpu_model + ",check"
+
+            # Add unsupported flags.
+            for fadd in flags.host_all_unsupported_flags:
+                cpuf_model += ",+" + fadd
+
+            cmd = qemu_binary + " -cpu " + cpuf_model
+            out = None
+
+            try:
+                try:
+                    out = utils.run(cmd, timeout=5, ignore_status=True).stderr
+                    raise error.TestFail("Guest not boot with unsupported "
+                                         "flags.")
+                except error.CmdError, e:
+                    out = e.result_obj.stderr
+            finally:
+                uns_re = re.compile("^warning:.*flag '(.+)'", re.MULTILINE)
+                warn_flags = set(map(virt_utils.Flag, uns_re.findall(out)))
+                fwarn_flags = flags.host_all_unsupported_flags - warn_flags
+                if fwarn_flags:
+                    raise error.TestFail("Qemu did not warn the use of "
+                                         "flags %s" % str(fwarn_flags))
+
+    # 3) fail boot unsupported flags
     class test_fail_boot_with_host_unsupported_flags(MiniSubtest):
         def test(self):
             #This is virtual cpu flags which are supported by
diff --git a/client/virt/subtests.cfg.sample b/client/virt/subtests.cfg.sample
index ad24075..58e2928 100644
--- a/client/virt/subtests.cfg.sample
+++ b/client/virt/subtests.cfg.sample
@@ -1437,6 +1437,8 @@ variants:
 test_type = test_boot_cpu_model
 - qemu_boot_cpu_model_and_flags:
 test_type = test_boot_cpu_model_and_additional_flags
+- qemu_warn_boot_check_cpu_model:
+test_type = test_boot_warn_with_host_unsupported_flags
 - qemu_boot_fail_cpu_model:
 test_type = test_fail_boot_with_host_unsupported_flags
 - stress_guest:
diff --git a/client/virt/virt_utils.py b/client/virt/virt_utils.py
index 20ed4ba..e65b322 100644
--- a/client/virt/virt_utils.py
+++ b/client/virt/virt_utils.py
@@ -1331,6 +1331,7 @@ kvm_map_flags_to_test = {
 kvm_map_flags_aliases = {
 'sse4.1'  :'sse4_1',
 'sse4.2'  :'sse4_2',
+'pclmulqdq'   :'pclmuldq',
 }
 
 
-- 
1.7.7.6
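
The core of the new subtest is a regular expression that collects the flag
names qemu warned about and subtracts them from the set of flags that were
requested. A minimal standalone sketch of that check follows. The stderr text
is hypothetical (the real qemu wording may differ), plain strings stand in for
virt_utils.Flag objects, and the pattern assumes the quoting shown in the
patch.

```python
import re

# Same pattern as in the patch: capture the quoted flag name on each warning line.
UNS_RE = re.compile(r"^warning:.*flag '(.+)'", re.MULTILINE)

# Hypothetical stderr text; the real qemu wording may differ.
stderr_text = """\
warning: host lacks requested flag 'sse4a'
some unrelated line
warning: host lacks requested flag 'xop'
"""

requested = {"sse4a", "xop", "fma4"}        # flags passed as ,+flag
warned = set(UNS_RE.findall(stderr_text))   # flags qemu warned about
not_warned = requested - warned             # a non-empty set means the test fails

print(sorted(warned))
print(sorted(not_warned))
```

Because the capture group is greedy, a warning line containing more than one
quoted string would capture everything up to the last quote on that line.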


[Bug 42755] KVM is being extremely slow on AMD Athlon64 4000+ Dual Core 2.1GHz Brisbane

https://bugzilla.kernel.org/show_bug.cgi?id=42755





--- Comment #3 from Rosen sandik...@yandex.ru  2012-02-10 09:36:35 ---
cmdline:

sudo kvm -enable-kvm -cpu 'host' -smp '2,cores=2,threads=2,sockets=1' -m 512M
-vga vmware -usbdevice tablet -soundhw es1370 -drive
file=test-amd64.img,if=virtio,cache=none -netdev
type=tap,script=/etc/qemu-ifup,id=net0 -device virtio-net-pci,netdev=net0
-balloon virtio -daemonize -name 'Gentoo Hardened 64 20120207'

version:

qemu -version
QEMU emulator version 1.0 (qemu-kvm-1.0), Copyright (c) 2003-2008 Fabrice
Bellard

info:

(qemu) info kvm
kvm support: enabled
(qemu)

Host dmesg (kvm lines only):

roko__@ArchXFCE64(11:34:21-12-02-10)( ~ pts/1 )# dmesg | grep kvm
[12347.567390] kvm: Nested Virtualization enabled
[12374.734909] kvm: unreliable cycle conversion on adjustable rate TSC
[12451.916934] kvm: 13119: cpu0 unhandled rdmsr: 0xc001100d
[12451.916959] kvm: 13119: cpu0 unhandled rdmsr: 0xc0010112
[12452.286858] kvm: 13119: cpu0 unhandled rdmsr: 0xc0010001
[16345.719346] kvm: 23604: cpu0 unhandled rdmsr: 0xc0010112
[16346.104688] kvm: 23604: cpu0 unhandled rdmsr: 0xc0010001
[23694.763207] kvm: 4941: cpu0 unhandled rdmsr: 0xc0010112
[23694.969082] kvm: 4941: cpu0 unhandled rdmsr: 0xc0010001
[23830.158470] kvm: 4941: cpu0 unhandled rdmsr: 0xc0010112
[23830.360768] kvm: 4941: cpu0 unhandled rdmsr: 0xc0010001
[23882.579070] kvm: 5419: cpu0 unhandled rdmsr: 0xc0010112
[23882.779127] kvm: 5419: cpu0 unhandled rdmsr: 0xc0010001
[24081.106328] kvm: 5770: cpu0 unhandled rdmsr: 0xc0010112
[24081.439772] kvm: 5770: cpu0 unhandled rdmsr: 0xc0010001
[24478.792921] kvm: 7706: cpu0 unhandled rdmsr: 0xc0010112
[24478.986782] kvm: 7706: cpu0 unhandled rdmsr: 0xc0010001
[24931.920782] kvm: SMP vm created on host with unstable TSC; guest TSC will not be reliable
[24942.147749] kvm: 8966: cpu0 unhandled rdmsr: 0xc0010112
[24942.301014] kvm: 8966: cpu0 unhandled rdmsr: 0xc0010001
[27046.175073] kvm: 15606: cpu0 unhandled rdmsr: 0xc0010112
[27046.332410] kvm: 15606: cpu0 unhandled rdmsr: 0xc0010001
[28459.859245] kvm: 23994: cpu0 unhandled rdmsr: 0xc0010112
[28460.058113] kvm: 23994: cpu0 unhandled rdmsr: 0xc0010001
[30892.864676] kvm: 6341: cpu0 unhandled rdmsr: 0xc0010001
[31438.995057] kvm: 11147: cpu0 unhandled rdmsr: 0xc0010112
[31439.214065] kvm: 11147: cpu0 unhandled rdmsr: 0xc0010001
[32561.185944] kvm: 20349: cpu0 unhandled rdmsr: 0xc0010112
[32561.448391] kvm: 20349: cpu0 unhandled rdmsr: 0xc0010001
[33633.457536] kvm: 28893: cpu0 unhandled rdmsr: 0xc0010112
[33633.801818] kvm: 28893: cpu0 unhandled rdmsr: 0xc0010001
[33879.382766] kvm: 30299: cpu0 unhandled rdmsr: 0xc0010112
[33879.577796] kvm: 30299: cpu0 unhandled rdmsr: 0xc0010001
[34654.862032] kvm: 32668: cpu0 unhandled rdmsr: 0xc0010112
[34655.234667] kvm: 32668: cpu0 unhandled rdmsr: 0xc0010001
[36724.031803] kvm: 32668: cpu0 unhandled rdmsr: 0xc0010112
[36724.147844] kvm: 32668: cpu0 unhandled rdmsr: 0xc0010001
[36836.657695] kvm: 32668: cpu0 unhandled rdmsr: 0xc0010112
[36836.896359] kvm: 32668: cpu0 unhandled rdmsr: 0xc0010001
[36891.644691] kvm: 8665: cpu0 unhandled rdmsr: 0xc0010112
[36891.778808] kvm: 8665: cpu0 unhandled rdmsr: 0xc0010001
[37065.796742] kvm: 9545: cpu0 unhandled rdmsr: 0xc0010112
[37065.951394] kvm: 9545: cpu0 unhandled rdmsr: 0xc0010001
[37198.255255] kvm: 10184: cpu0 unhandled rdmsr: 0xc0010112
[37198.385243] kvm: 10184: cpu0 unhandled rdmsr: 0xc0010001
[37531.276399] kvm: 11308: cpu0 unhandled rdmsr: 0xc0010112
[37531.404946] kvm: 11308: cpu0 unhandled rdmsr: 0xc0010001
[37633.200338] kvm: 12026: cpu0 unhandled rdmsr: 0xc0010112
[37633.335279] kvm: 12026: cpu0 unhandled rdmsr: 0xc0010001
[37974.339382] kvm: 13679: cpu0 unhandled rdmsr: 0xc0010112
[37974.463314] kvm: 13679: cpu0 unhandled rdmsr: 0xc0010001
[37998.836470] kvm: 14017: cpu0 unhandled rdmsr: 0xc0010112
[37998.952253] kvm: 14017: cpu0 unhandled rdmsr: 0xc0010001
[38194.260773] kvm: 14756: cpu0 unhandled rdmsr: 0xc0010112
[38194.400660] kvm: 14756: cpu0 unhandled rdmsr: 0xc0010001
[40609.026332] kvm: 23045: cpu0 unhandled rdmsr: 0xc0010001
[41580.924327] kvm: 28276: cpu0 unhandled rdmsr: 0xc0010112
[41581.040324] kvm: 28276: cpu0 unhandled rdmsr: 0xc0010001
[89588.419228] kvm: 5045: cpu0 unhandled rdmsr: 0xc0010112
[89588.558034] kvm: 5045: cpu0 unhandled rdmsr: 0xc0010001
[126250.670121] kvm: 27098: cpu0 unhandled rdmsr: 0xc0010112
[126250.800737] kvm: 27098: cpu0 unhandled rdmsr: 0xc0010001
[128187.690739] kvm: 27098: cpu0 unhandled rdmsr: 0xc0010112
[128187.821740] kvm: 27098: cpu0 unhandled rdmsr: 0xc0010001
[128452.392446] kvm: 1746: cpu0 unhandled rdmsr: 0xc0010112
[128452.524364] kvm: 1746: cpu0 unhandled rdmsr: 0xc0010001
[128642.028625] kvm: 1746: cpu0 unhandled rdmsr: 0xc0010112
[128642.165803] kvm: 1746: cpu0 unhandled rdmsr: 0xc0010001
[129900.176718] kvm: 6916: cpu0 unhandled rdmsr: 0xc0010112
[129900.301721] kvm: 6916: cpu0 unhandled rdmsr: 0xc0010001
[130077.225227] kvm: 8083: cpu0 unhandled rdmsr: 0xc0010112
[130077.355747]
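
A log this repetitive is easier to read as a tally per MSR address. The sketch
below counts "unhandled rdmsr" lines by address. The regular expression is
written against the line format shown above, the helper name is illustrative,
and only a few lines from the log are used as sample input.

```python
import re
from collections import Counter

# Matches lines such as "[12451.916934] kvm: 13119: cpu0 unhandled rdmsr: 0xc001100d"
RDMSR_RE = re.compile(r"kvm: (\d+): cpu\d+ unhandled rdmsr: (0x[0-9a-f]+)")


def count_unhandled_rdmsr(dmesg_text):
    """Return a Counter mapping MSR address -> number of matching log lines."""
    return Counter(m.group(2) for m in RDMSR_RE.finditer(dmesg_text))


sample = """\
[12451.916934] kvm: 13119: cpu0 unhandled rdmsr: 0xc001100d
[12451.916959] kvm: 13119: cpu0 unhandled rdmsr: 0xc0010112
[12452.286858] kvm: 13119: cpu0 unhandled rdmsr: 0xc0010001
[16345.719346] kvm: 23604: cpu0 unhandled rdmsr: 0xc0010112
"""

for msr, hits in sorted(count_unhandled_rdmsr(sample).items()):
    print(msr, hits)
```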

[PATCH 1/4] kvm tool: Stop init if check_extensions failed

If kvm__check_extensions finds that a required KVM extension
is not supported by the OS, we should stop the init and free
all allocated resources.

Signed-off-by: Yang Bai hamo...@gmail.com
---
 tools/kvm/kvm.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 9a0bd67..8e749ad 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -384,6 +384,7 @@ struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_s
 	if (kvm__check_extensions(kvm)) {
 		pr_err("A required KVM extention is not supported by OS");
 		ret = -ENOSYS;
+		goto err;
 	}
 
 	kvm__arch_init(kvm, hugetlbfs_path, ram_size);
-- 
1.7.8.3


[PATCH 2/4] kvm tool: unite the error handle in kvm__init

When an error occurs, set ret to the error code and
jump to the matching error-handling label.
This makes the code more readable.

Signed-off-by: Yang Bai hamo...@gmail.com
---
 tools/kvm/kvm.c |   11 ++-
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 8e749ad..192d70e 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -339,7 +339,8 @@ struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_s
 
 	if (!kvm__arch_cpu_supports_vm()) {
 		pr_err("Your CPU does not support hardware virtualization");
-		return ERR_PTR(-ENOSYS);
+		ret = -ENOSYS;
+		goto err;
 	}
 
 	kvm = kvm__new();
@@ -378,13 +379,13 @@ struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_s
 	kvm->name = strdup(name);
 	if (!kvm->name) {
 		ret = -ENOMEM;
-		goto err;
+		goto err_vm_fd;
 	}
 
 	if (kvm__check_extensions(kvm)) {
 		pr_err("A required KVM extention is not supported by OS");
 		ret = -ENOSYS;
-		goto err;
+		goto err_vm_fd;
 	}
 
 	kvm__arch_init(kvm, hugetlbfs_path, ram_size);
@@ -394,13 +395,13 @@ struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_s
 
 	return kvm;
 
-err:
+err_vm_fd:
 	close(kvm->vm_fd);
 err_sys_fd:
 	close(kvm->sys_fd);
 err_free:
 	free(kvm);
-
+err:
 	return ERR_PTR(ret);
 }
 
-- 
1.7.8.3


[PATCH 3/4] kvm tool: if kvm_ipc__start failed, return negative

kvm_ipc__start now returns a negative value on failure; by checking
this return value, the caller can tell whether it succeeded.

Signed-off-by: Yang Bai hamo...@gmail.com
---
 tools/kvm/kvm-ipc.c |   38 --
 tools/kvm/kvm.c |7 ++-
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/tools/kvm/kvm-ipc.c b/tools/kvm/kvm-ipc.c
index 6a0bd21..257c806c 100644
--- a/tools/kvm/kvm-ipc.c
+++ b/tools/kvm/kvm-ipc.c
@@ -166,27 +166,53 @@ static void *kvm_ipc__thread(void *param)
 
 int kvm_ipc__start(int sock)
 {
+	int ret;
 	struct epoll_event ev = {0};
 
 	server_fd = sock;
 
 	epoll_fd = epoll_create(KVM_IPC_MAX_MSGS);
+	if (epoll_fd < 0) {
+		ret = epoll_fd;
+		goto err;
+	}
 
 	ev.events = EPOLLIN | EPOLLET;
 	ev.data.fd = sock;
-	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, sock, &ev) < 0)
-		die("Failed starting IPC thread");
+	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, sock, &ev) < 0) {
+		pr_err("Failed starting IPC thread");
+		ret = -EFAULT;
+		goto err_epoll;
+	}
 
 	stop_fd = eventfd(0, 0);
+	if (stop_fd < 0) {
+		ret = stop_fd;
+		goto err_epoll;
+	}
+
 	ev.events = EPOLLIN | EPOLLET;
 	ev.data.fd = stop_fd;
-	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, stop_fd, &ev) < 0)
-		die("Failed adding stop event to epoll");
+	if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, stop_fd, &ev) < 0) {
+		pr_err("Failed adding stop event to epoll");
+		ret = -EFAULT;
+		goto err_stop;
+	}
 
-	if (pthread_create(&thread, NULL, kvm_ipc__thread, NULL) != 0)
-		die("Failed starting IPC thread");
+	if (pthread_create(&thread, NULL, kvm_ipc__thread, NULL) != 0) {
+		pr_err("Failed starting IPC thread");
+		ret = -EFAULT;
+		goto err_stop;
+	}
 
 	return 0;
+
+err_stop:
+	close(stop_fd);
+err_epoll:
+	close(epoll_fd);
+err:
+	return ret;
 }
 
 int kvm_ipc__stop(void)
diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 192d70e..f02d5df 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -390,7 +390,12 @@ struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_s
 
 	kvm__arch_init(kvm, hugetlbfs_path, ram_size);
 
-	kvm_ipc__start(kvm__create_socket(kvm));
+	ret = kvm_ipc__start(kvm__create_socket(kvm));
+	if (ret < 0) {
+		pr_err("Starting ipc failed.");
+		goto err_vm_fd;
+	}
+
 	kvm_ipc__register_handler(KVM_IPC_PID, kvm__pid);
 
 	return kvm;
-- 
1.7.8.3
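
The label chain in kvm_ipc__start releases resources in the reverse order of
their acquisition, and a failure only undoes the steps that had already
succeeded. The sketch below illustrates that ordering only. Python has no
goto, so contextlib.ExitStack is used as a stand-in, and the resource names
and event strings are placeholders rather than real file descriptors or
threads.

```python
from contextlib import ExitStack


def ipc_start(fail_at=None):
    """Simulate acquiring resources in order; undo them in reverse on failure."""
    events = []
    with ExitStack() as cleanup:
        for name in ("epoll_fd", "stop_fd", "thread"):
            if name == fail_at:
                events.append("error at " + name)
                break                      # leave the loop; cleanup runs on exit
            events.append("open " + name)
            # Registered callbacks run last-in, first-out when the block exits.
            cleanup.callback(events.append, "close " + name)
        else:
            cleanup.pop_all()              # success: keep everything open
    return events


print(ipc_start(fail_at="thread"))
print(ipc_start())
```

A failure at the last step closes stop_fd and then epoll_fd, matching the
err_stop and err_epoll labels; a failure at the first step closes nothing,
matching the bare err label.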


[PATCH 4/4] kvm tool: ensure kvm_ipc__register_handler success

By checking the return value from kvm_ipc__register_handler,
we can ensure that it succeeds.

Signed-off-by: Yang Bai hamo...@gmail.com
---
 tools/kvm/kvm.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index f02d5df..99bcef4 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -396,10 +396,16 @@ struct kvm *kvm__init(const char *kvm_dev, const char *hugetlbfs_path, u64 ram_s
 		goto err_vm_fd;
 	}
 
-	kvm_ipc__register_handler(KVM_IPC_PID, kvm__pid);
+	ret = kvm_ipc__register_handler(KVM_IPC_PID, kvm__pid);
+	if (ret < 0) {
+		pr_err("Register ipc handler failed.");
+		goto err_ipc;
+	}
 
 	return kvm;
 
+err_ipc:
+	kvm_ipc__stop();
 err_vm_fd:
 	close(kvm->vm_fd);
 err_sys_fd:
-- 
1.7.8.3


Re: x86: kvmclock: abstract save/restore sched_clock_state

2012-02-10 Thread Amit Shah

On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote:

> Stalls are probably caused by uninitialized percpu hv_clock, with
> following patch I don't see stalls. Although I might be just lucky.
> http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487

Your commit does make things better, I don't see any stalls on the
first resume.

However, a subsequent s4 causes the stall to re-appear on resume, and
this time there are no stall messages; the kernel just sits there
spinning on something.  I've not found the solution to this one yet (I
had a commit similar to Marcelo's in the works, which got me to the
previous works-but-stalls behaviour).

> However there is/are a warning/s on suspend path and with following patch:

I didn't see this.

Amit

[PATCH v3 2/3] KVM: PPC: epapr: Add idle hcall support for host

And add a new flag definition in kvm_ppc_pvinfo to indicate
whether the host supports the EV_IDLE hcall.

Signed-off-by: Liu Yu yu@freescale.com
---
v3:
no change

 arch/powerpc/include/asm/kvm_para.h |   14 --
 arch/powerpc/kvm/powerpc.c  |8 
 include/linux/kvm.h |2 ++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
index 50533f9..e8632b6 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -41,9 +41,19 @@ struct kvm_vcpu_arch_shared {
 };
 
 #define KVM_SC_MAGIC_R0		0x4b564d21 /* KVM! */
-#define HC_VENDOR_KVM		(42 << 16)
+
+#include <asm/epapr_hcalls.h>
+
+/* ePAPR Hypercall Vendor ID */
+#define HC_VENDOR_EPAPR		(EV_EPAPR_VENDOR_ID << 16)
+#define HC_VENDOR_KVM		(EV_KVM_VENDOR_ID << 16)
+
+/* ePAPR Hypercall Token */
+#define HC_EV_IDLE		EV_IDLE
+
+/* ePAPR Hypercall Return Codes */
 #define HC_EV_SUCCESS		0
-#define HC_EV_UNIMPLEMENTED	12
+#define HC_EV_UNIMPLEMENTED	EV_UNIMPLEMENTED
 
 #define KVM_FEATURE_MAGIC_PAGE	1
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index c33f6a7..1242ee1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -81,6 +81,10 @@ int kvmppc_kvm_pv(struct kvm_vcpu *vcpu)
 
 		/* Second return value is in r4 */
 		break;
+	case HC_VENDOR_EPAPR | HC_EV_IDLE:
+		r = HC_EV_SUCCESS;
+		kvm_vcpu_block(vcpu);
+		break;
 	default:
 		r = HC_EV_UNIMPLEMENTED;
 		break;
@@ -772,6 +776,10 @@ static int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo)
 	pvinfo->hcall[2] = inst_sc;
 	pvinfo->hcall[3] = inst_nop;
 
+#ifdef CONFIG_BOOKE
+	pvinfo->flags |= KVM_PPC_PVINFO_FLAGS_EV_IDLE;
+#endif
+
 	return 0;
 }
 
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index c107fae..501712d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -429,6 +429,8 @@ struct kvm_ppc_pvinfo {
 	__u8  pad[108];
 };
 
+#define KVM_PPC_PVINFO_FLAGS_EV_IDLE   (1<<0)
+
 #define KVMIO 0xAE
 
 /*
-- 
1.7.0.4



[PATCH v3 1/3] KVM: PPC: epapr: Factor out the epapr init

from the kvm guest paravirt init code.

Signed-off-by: Liu Yu yu@freescale.com
---
v3:
apply the epapr init for all ppc platform

 arch/powerpc/Kconfig|4 +++
 arch/powerpc/include/asm/epapr_hcalls.h |8 +
 arch/powerpc/kernel/Makefile|1 +
 arch/powerpc/kernel/epapr_para.c|   46 +++
 arch/powerpc/kernel/kvm.c   |   13 +++--
 arch/powerpc/kvm/Kconfig|1 +
 6 files changed, 64 insertions(+), 9 deletions(-)
 create mode 100644 arch/powerpc/kernel/epapr_para.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 47682b6..00bd508 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -196,6 +196,10 @@ config EPAPR_BOOT
  Used to allow a board to specify it wants an ePAPR compliant wrapper.
default n
 
+config EPAPR_PARA
+   bool
+   default n
+
 config DEFAULT_UIMAGE
bool
help
diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h
index f3b0c2c..c4b86e4 100644
--- a/arch/powerpc/include/asm/epapr_hcalls.h
+++ b/arch/powerpc/include/asm/epapr_hcalls.h
@@ -148,6 +148,14 @@
 #define EV_HCALL_CLOBBERS2 EV_HCALL_CLOBBERS3, r5
 #define EV_HCALL_CLOBBERS1 EV_HCALL_CLOBBERS2, r4
 
+extern u32 *epapr_hcall_insts;
+extern int epapr_hcall_insts_len;
+
+static inline void epapr_get_hcall_insts(u32 **instp, int *lenp)
+{
+   *instp = epapr_hcall_insts;
+   *lenp = epapr_hcall_insts_len;
+}
 
 /*
  * We use uintptr_t to define a register because it's guaranteed to be a
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index ce4f7f1..1e41c76 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -134,6 +134,7 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),)
 obj-y  += ppc_save_regs.o
 endif
 
+obj-$(CONFIG_EPAPR_PARA)   += epapr_para.o
 obj-$(CONFIG_KVM_GUEST)+= kvm.o kvm_emul.o
 
 # Disable GCOV in odd or sensitive code
diff --git a/arch/powerpc/kernel/epapr_para.c b/arch/powerpc/kernel/epapr_para.c
new file mode 100644
index 000..7e1561a
--- /dev/null
+++ b/arch/powerpc/kernel/epapr_para.c
@@ -0,0 +1,46 @@
+/*
+ * ePAPR para-virtualization support.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) 2012 Freescale Semiconductor, Inc.
+ */
+
+#include <linux/of.h>
+#include <asm/epapr_hcalls.h>
+#include <asm/cacheflush.h>
+
+u32 *epapr_hcall_insts;
+int epapr_hcall_insts_len;
+
+static int __init epapr_para_init(void)
+{
+	struct device_node *hyper_node;
+	u32 *insts;
+	int len;
+
+	hyper_node = of_find_node_by_path("/hypervisor");
+	if (!hyper_node)
+		return -ENODEV;
+
+	insts = (u32*)of_get_property(hyper_node, "hcall-instructions", &len);
+	if (!(len % 4) && (len <= (4 * 4))) {
+		epapr_hcall_insts = insts;
+		epapr_hcall_insts_len = len;
+	}
+
+	return 0;
+}
+
+early_initcall(epapr_para_init);
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index b06bdae..2e03ab8 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -28,6 +28,7 @@
 #include <asm/sections.h>
 #include <asm/cacheflush.h>
 #include <asm/disassemble.h>
+#include <asm/epapr_hcalls.h>
 
 #define KVM_MAGIC_PAGE		(-4096L)
 #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
@@ -535,18 +536,12 @@ EXPORT_SYMBOL_GPL(kvm_hypercall);
 static int kvm_para_setup(void)
 {
 	extern u32 kvm_hypercall_start;
-	struct device_node *hyper_node;
 	u32 *insts;
 	int len, i;
 
-	hyper_node = of_find_node_by_path("/hypervisor");
-	if (!hyper_node)
-		return -1;
-
-	insts = (u32*)of_get_property(hyper_node, "hcall-instructions", &len);
-	if (len % 4)
-		return -1;
-	if (len > (4 * 4))
+	insts = epapr_hcall_insts;
+	len = epapr_hcall_insts_len;
+	if (insts == NULL)
 		return -1;
 
 	for (i = 0; i < (len / 4); i++)
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 78133de..cd1ee68 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -20,6 +20,7 @@ config KVM
bool
select PREEMPT_NOTIFIERS
select ANON_INODES
+   select EPAPR_PARA
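
The length check that epapr_para_init applies to the hcall-instructions
property can be stated as a small predicate. The sketch below assumes the
operators lost in this archived copy are a logical AND and a less-than-or-equal
comparison, which is what the two removed checks in kvm.c suggest. The function
name is illustrative.

```python
def usable_hcall_insts(len_bytes):
    """True if the length is a multiple of 4 bytes and at most 4 instructions."""
    return len_bytes % 4 == 0 and len_bytes <= 4 * 4


for n in (8, 16, 18, 20):
    print(n, usable_hcall_insts(n))
```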

[PATCH v3 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

If the guest hypervisor node contains has-idle property.

Signed-off-by: Liu Yu yu@freescale.com
---
v3:
1. apply the hcall idle for all ppc platform
2. add a loop to prevent spurious wakeups

 arch/powerpc/kernel/Makefile |2 +-
 arch/powerpc/kernel/epapr.S  |   47 ++
 arch/powerpc/kernel/epapr_para.c |   14 ++-
 3 files changed, 61 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/kernel/epapr.S

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 1e41c76..65e24be 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -134,7 +134,7 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),)
 obj-y  += ppc_save_regs.o
 endif
 
-obj-$(CONFIG_EPAPR_PARA)   += epapr_para.o
+obj-$(CONFIG_EPAPR_PARA)   += epapr_para.o epapr.o
 obj-$(CONFIG_KVM_GUEST)+= kvm.o kvm_emul.o
 
 # Disable GCOV in odd or sensitive code
diff --git a/arch/powerpc/kernel/epapr.S b/arch/powerpc/kernel/epapr.S
new file mode 100644
index 000..34cc54f
--- /dev/null
+++ b/arch/powerpc/kernel/epapr.S
@@ -0,0 +1,47 @@
+/*
+ * Copyright (C) 2012 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include linux/threads.h
+#include asm/reg.h
+#include asm/page.h
+#include asm/cputable.h
+#include asm/thread_info.h
+#include asm/ppc_asm.h
+#include asm/asm-offsets.h
+
+#define HC_VENDOR_EPAPR(1  16)
+#define HC_EV_IDLE 16
+
+_GLOBAL(epapr_ev_idle)
+epapr_ev_idle:
+#ifdef CONFIG_E500
+   rlwinm  r3,r1,0,0,31-THREAD_SHIFT   /* current thread_info */
+   lwz r4,TI_LOCAL_FLAGS(r3)   /* set napping bit */
+   ori r4,r4,_TLF_NAPPING  /* so when we take an exception */
+   stw r4,TI_LOCAL_FLAGS(r3)   /* it will return to our caller */
+#endif
+   wrteei  1
+
+idle_loop:
+   LOAD_REG_IMMEDIATE(r11, HC_VENDOR_EPAPR | HC_EV_IDLE)
+
+/* Hypercall entry point. Will be patched with device tree instructions. */
+.global epapr_hypercall_start
+epapr_hypercall_start:
+   li  r3, -1
+   nop
+   nop
+   nop
+
+   /*
+    * Guard against spurious wakeups (e.g. from a hypervisor) --
+    * any real interrupt will cause us to return to LR due to
+    * _TLF_NAPPING.
+    */
+   b   idle_loop
diff --git a/arch/powerpc/kernel/epapr_para.c b/arch/powerpc/kernel/epapr_para.c
index 7e1561a..ff8fb78 100644
--- a/arch/powerpc/kernel/epapr_para.c
+++ b/arch/powerpc/kernel/epapr_para.c
@@ -20,6 +20,10 @@
 #include <linux/of.h>
 #include <asm/epapr_hcalls.h>
 #include <asm/cacheflush.h>
+#include <asm/machdep.h>
+
+extern void epapr_ev_idle(void);
+extern u32 epapr_hypercall_start[];
 
 u32 *epapr_hcall_insts;
 int epapr_hcall_insts_len;
@@ -28,7 +32,7 @@ static int __init epapr_para_init(void)
 {
struct device_node *hyper_node;
u32 *insts;
-   int len;
+   int len, i;
 
hyper_node = of_find_node_by_path("/hypervisor");
if (!hyper_node)
@@ -38,8 +42,16 @@ static int __init epapr_para_init(void)
if (!(len % 4)  (len = (4 * 4))) {
epapr_hcall_insts = insts;
epapr_hcall_insts_len = len;
+
+   for (i = 0; i < (len / 4); i++)
+   epapr_hypercall_start[i] = insts[i];
+   flush_icache_range((ulong)epapr_hypercall_start,
+  (ulong)epapr_hypercall_start + len);
}
 
+   if (of_get_property(hyper_node, "has-idle", NULL))
+   ppc_md.power_save = epapr_ev_idle;
+
return 0;
 }
 
-- 
1.7.0.4


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
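
The patch above copies the hypercall instruction sequence found in the hypervisor node into the four-word slot at epapr_hypercall_start, and loads r11 with HC_VENDOR_EPAPR | HC_EV_IDLE before entering that slot. What follows is only a minimal userspace sketch of that copy-and-validate step, not the kernel code. The names hcall_slot, patch_hcall_slot and ev_idle_token are hypothetical, the instruction words are illustrative encodings, and the upper bound on the length is an assumption made here so that the copy cannot overrun the slot.

```c
#include <stddef.h>
#include <stdint.h>

#define HC_VENDOR_EPAPR (1u << 16)	/* vendor field, as in the patch */
#define HC_EV_IDLE      16u		/* idle hypercall number, as in the patch */
#define SLOT_WORDS      4		/* li + three nops at the patch site */

/* Hypothetical stand-in for the patch site at epapr_hypercall_start. */
static uint32_t hcall_slot[SLOT_WORDS] = {
	0x3860ffffu,	/* li r3,-1 (unpatched default) */
	0x60000000u,	/* nop */
	0x60000000u,	/* nop */
	0x60000000u,	/* nop */
};

/* Value the idle loop loads into r11 before entering the slot. */
static uint32_t ev_idle_token(void)
{
	return HC_VENDOR_EPAPR | HC_EV_IDLE;
}

/*
 * Copy len_bytes of instructions into the slot.
 * Returns 0 on success, -1 if the sequence is missing, empty, not a whole
 * number of 32-bit words, or (assumption of this sketch) larger than the slot.
 */
static int patch_hcall_slot(const uint32_t *insts, size_t len_bytes)
{
	size_t i;

	if (!insts || len_bytes == 0 || len_bytes % 4 != 0 ||
	    len_bytes > sizeof(hcall_slot))
		return -1;

	for (i = 0; i < len_bytes / 4; i++)
		hcall_slot[i] = insts[i];

	/* The kernel code additionally calls flush_icache_range() here. */
	return 0;
}
```

A caller would pass the property buffer and its byte length, and would install the idle hook only when the copy succeeded. Rejecting an over-long sequence is a design choice of this sketch, since the slot reserves exactly four instruction words.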

Re: WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()

2012-02-10 Thread Srivatsa S. Bhat


Adding Suresh and Peter to Cc.

On 02/10/2012 01:16 AM, Sasha Levin wrote:

 On Wed, Feb 8, 2012 at 7:59 PM, Josh Boyer jwbo...@gmail.com wrote:
 On Wed, Feb 8, 2012 at 8:31 PM, Sasha Levin levinsasha...@gmail.com wrote:
 Hi all,

 I got the following warning when shutting down a KVM guest with a whole 
 bunch of cores (254 in this case).

 It's actually pretty easy to reproduce it, it happens every once in 2-3 
 shutdowns.

 [   32.448626] [ cut here ]
 [   32.449160] WARNING: at arch/x86/kernel/smp.c:119 
 native_smp_send_reschedule+0x25/0x43()
 [   32.449621] Pid: 1, comm: init_stage2 Not tainted 3.2.0+ #14
 [   32.449621] Call Trace:
 [   32.449621]  IRQ  [81041a44] ? 
 native_smp_send_reschedule+0x25/0x43
 [   32.449621]  [810735b2] warn_slowpath_common+0x7b/0x93
 [   32.449621]  [810962cc] ? tick_nohz_handler+0xc9/0xc9
 [   32.449621]  [81073675] warn_slowpath_null+0x15/0x18
 [   32.449621]  [81041a44] native_smp_send_reschedule+0x25/0x43
 [   32.449621]  [81067a00] smp_send_reschedule+0xa/0xc
 [   32.449621]  [8106f25e] scheduler_tick+0x21a/0x242
 [   32.449621]  [8107da10] update_process_times+0x62/0x73
 [   32.449621]  [81096336] tick_sched_timer+0x6a/0x8a
 [   32.449621]  [8108c5eb] __run_hrtimer.clone.26+0x55/0xcb
 [   32.449621]  [8108cd77] hrtimer_interrupt+0xcb/0x19b
 [   32.449621]  [810428a8] smp_apic_timer_interrupt+0x72/0x85
 [   32.449621]  [8165a8de] apic_timer_interrupt+0x6e/0x80
 [   32.449621]  EOI  [8165928e] ? 
 _raw_spin_unlock_irqrestore+0x3a/0x3e
 [   32.449621]  [81042f4e] ? arch_local_irq_restore+0x6/0xd
 [   32.449621]  [810430c4] 
 default_send_IPI_mask_allbutself_phys+0x78/0x88
 [   32.449621]  [8106c3c4] ? __migrate_task+0xf1/0xf1
 [   32.449621]  [81045445] physflat_send_IPI_allbutself+0x12/0x14
 [   32.449621]  [81041aaf] native_stop_other_cpus+0x4d/0xa8
 [   32.449621]  [810411c6] native_machine_shutdown+0x56/0x6d
 [   32.449621]  [81048499] kvm_shutdown+0x1a/0x1c
 [   32.449621]  [810411f9] machine_shutdown+0xa/0xc
 [   32.449621]  [81041265] native_machine_restart+0x20/0x32
 [   32.449621]  [81041297] machine_restart+0xa/0xc
 [   32.449621]  [81081d53] kernel_restart+0x49/0x4d
 [   32.449621]  [81081f26] sys_reboot+0x14b/0x18a
 [   32.449621]  [81089937] ? remove_wait_queue+0x4c/0x51
 [   32.449621]  [8107637f] ? do_wait+0x1a4/0x1e7
 [   32.449621]  [8107735a] ? sys_wait4+0xa8/0xbc
 [   32.449621]  [8107522b] ? clear_tsk_thread_flag+0xf/0xf
 [   32.449621]  [81659a25] ? async_page_fault+0x25/0x30
 [   32.449621]  [81659e92] system_call_fastpath+0x16/0x1b
 [   32.449621] ---[ end trace d0f03651493fd3d6 ]--

 You don't really point out exactly which kernel this is, but we saw this in
 3.3 git and it was fixed by commit 71325960d16cd68ea0e22a8da15b2495b0f363f7.
 Or at least something very like it was.
 
 The kernel there was vanilla 3.2 (as stated in the warning header).
 
 I've tried it again with linux-next from today which includes the
 commit you mentioned, and still get the same error.



Re: performance trouble

2012-02-10 Thread David Cure

Hello,

On Sun, Feb 05, 2012 at 11:38:34AM +0200, Avi Kivity wrote:

 Please post a trace as documented in http://www.linux-kvm.org/page/Tracing.

I captured the trace: I started it just before launching the slow function
and stopped it just after. Only one VM was running, with 2 vcpus and 16G of
RAM, and only one user was connected to the VM to launch the test.

The trace file is too big to post here, so I gzipped it; the file is
available here: http://www.roullier.net/report.txt.gz

I hope you can find something strange.

David.


Re: x86: kvmclock: abstract save/restore sched_clock_state

2012-02-10 Thread Igor Mammedov

Could you send me your .config and commit id of kernel you are using?

- Original Message -
 From: Amit Shah amit.s...@redhat.com
 To: Igor Mammedov imamm...@redhat.com
 Cc: Marcelo Tosatti mtosa...@redhat.com, kvm@vger.kernel.org, 
 t...@linutronix.de, mi...@redhat.com,
 h...@zytor.com, x...@kernel.org, johns...@us.ibm.com, r...@redhat.com, 
 a...@redhat.com
 Sent: Friday, February 10, 2012 11:02:11 AM
 Subject: Re: x86: kvmclock: abstract save/restore sched_clock_state

 On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote:

  Stalls are probably caused by uninitialized percpu hv_clock, with
  following patch I don't see stalls. Although I might be just lucky.
  http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487

 Your commit does make things better, I don't see any stalls on the
 first resume.

 However, a subsequent s4 causes the stall to re-appear on resume, and
 this time there are no stall messages; the kernel just sits there
 spinning on something.  I've not found the solution to this one yet (I
 had a commit similar to Marcelo's in the works, which got me to the
 previous works-but-stalls behaviour).

  However there is/are a warning/s on suspend path and with following
  patch:

 I didn't see this.

   Amit

Re: x86: kvmclock: abstract save/restore sched_clock_state

2012-02-10 Thread Amit Shah

On (Fri) 10 Feb 2012 [05:11:00], Igor Mammedov wrote:
 Could you send me your .config and commit id of kernel you are using?

Kernel's based on bd3ce7d57c380af110c86d19e256115d0e7053ca plus your
commit + Marcelo's patch.

config is attached below.

#
# Automatically generated file; DO NOT EDIT.
# Linux/x86_64 3.3.0-rc2 Kernel Configuration
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT=elf64-x86-64
CONFIG_ARCH_DEFCONFIG=arch/x86/configs/x86_64_defconfig
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11"
# CONFIG_KTIME_SCALAR is not set
CONFIG_ARCH_CPU_PROBE_RELEASE=y
CONFIG_DEFCONFIG_LIST=/lib/modules/$UNAME_RELEASE/.config
CONFIG_HAVE_IRQ_WORK=y
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=
CONFIG_LOCALVERSION=
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
CONFIG_KERNEL_LZMA=y
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
CONFIG_DEFAULT_HOSTNAME=virthost
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
# CONFIG_FHANDLE is not set
# CONFIG_TASKSTATS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y
# CONFIG_AUDIT_LOGINUID_IMMUTABLE is not set
CONFIG_HAVE_GENERIC_HARDIRQS=y

#
# IRQ subsystem
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_HAVE_SPARSE_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y

#
# RCU Subsystem
#
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=64
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_RCU_FAST_NO_HZ is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_RCU_BOOST is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_RESOURCE_COUNTERS is not set
# CONFIG_CGROUP_PERF is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
CONFIG_USER_NS=y
CONFIG_PID_NS=y
CONFIG_NET_NS=y
CONFIG_SCHED_AUTOGROUP=y
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_RD_XZ=y
CONFIG_RD_LZO=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EXPERT is not set
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
# CONFIG_EMBEDDED is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_PERF_COUNTERS is not set
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_PROFILING is not set
CONFIG_TRACEPOINTS=y
CONFIG_HAVE_OPROFILE=y
CONFIG_OPROFILE_NMI_TIMER=y
# CONFIG_KPROBES is not set
# CONFIG_JUMP_LABEL is not set

QEMU applying for Google Summer of Code 2012

2012-02-10 Thread Stefan Hajnoczi

This year's Google Summer of Code has been announced:

http://www.google-melange.com/gsoc/events/google/gsoc2012

For those who haven't heard of GSoC before, it funds university
students to work on open source projects during the summer.
Organizations, such as QEMU, can participate to attract students who
will tackle projects for 12 weeks this summer.  The GSoC program has
been very successful because it gives students real open source
experience and organizations can grow their development community.

QEMU has participated for several years and I would like to organize
our participation this year.  Luiz was QEMU organization administrator
last year and contacted me because he will not have time this year.  I
will prepare the application form for QEMU so that we will be
considered for 2012.

Umbrella organization
---------------------
Like last year, we can provide a home for KVM kernel module and
libvirt projects too if those organizations prefer not to apply to
GSoC themselves.  Please let us know so we can work together!

Ideas list
----------
The starting point for student candidates is our Ideas List.  I have
created a new page for this year - please add project ideas that you'd
like students to work on:

http://wiki.qemu.org/Google_Summer_of_Code_2012

Here is last year's list:

http://wiki.qemu.org/Google_Summer_of_Code_2011

A GSoC project should be achievable in 12 weeks by someone who is
competent in C programming but does not have prior QEMU coding
experience.  Students normally work full-time (5 days per week).
Please also indicate if you are willing to mentor a student for your
project idea.  I have provided a wiki template on the page so you can
easily add project ideas.

Mentors needed
--------------
Each student that we accept needs a mentor.  Mentors are QEMU
developers who are willing to answer questions, review code, give
advice, and evaluate the student's progress.  This is a time
commitment but also a good experience that I have enjoyed and would
recommend.

Timeline
--------
Please add project ideas to the wiki now:
http://wiki.qemu.org/Google_Summer_of_Code_2012

Feb 27 - Mar 9: I will submit QEMU's application form

Mar 17 - Apr 20: Mentors respond to student candidates, interview
them, and select the best candidate

May 21 - Aug 20: Students work on their projects

Please let me know if you have any questions or suggestions!

Stefan

Re: [libvirt] QEMU applying for Google Summer of Code 2012

2012-02-10 Thread Daniel P. Berrange

On Fri, Feb 10, 2012 at 10:30:24AM +, Stefan Hajnoczi wrote:
 This year's Google Summer of Code has been announced:
 
 http://www.google-melange.com/gsoc/events/google/gsoc2012
 
 For those who haven't heard of GSoC before, it funds university
 students to work on open source projects during the summer.
 Organizations, such as QEMU, can participate to attract students who
 will tackle projects for 12 weeks this summer.  The GSoC program has
 been very successful because it gives students real open source
 experience and organizations can grow their development community.
 
 QEMU has participated for several years and I would like to organize
 our participation this year.  Luiz was QEMU organization administrator
 last year and contacted me because he will not have time this year.  I
 will prepare the application form for QEMU so that we will be
 considered for 2012.
 
 Umbrella organization
 ---------------------
 Like last year, we can provide a home for KVM kernel module and
 libvirt projects too if those organizations prefer not to apply to
 GSoC themselves.  Please let us know so we can work together!

To maximise the spirit of collaboration between libvirt & QEMU/KVM
communities I think it would make sense for us to work together under
the same GSoC Umbrella organization.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [PATCH v5 0/3] virtio-scsi driver

2012-02-10 Thread Stefan Hajnoczi

On Sun, Feb 5, 2012 at 11:15 AM, Paolo Bonzini pbonz...@redhat.com wrote:
 This is the first implementation of the virtio-scsi driver, a virtual
 HBA that will be supported by KVM.  It implements a subset of the spec,
 in particular it does not implement asynchronous notifications for either
 LUN reset/removal/addition or CD-ROM media events, but it is already
 functional and usable.

 Other matching bits:

 - spec at http://people.redhat.com/pbonzini/virtio-spec.pdf

 - QEMU implementation at git://github.com/bonzini/qemu.git,
  branch virtio-scsi

 Please review.  Getting this in 3.3 is starting to look like wishful thinking,
 but the possibility of regressions is obviously zero so I'm still dreaming.
 Otherwise, that would be 3.4.

Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Re: [libvirt] QEMU applying for Google Summer of Code 2012

2012-02-10 Thread Stefan Hajnoczi

On Fri, Feb 10, 2012 at 10:59 AM, Daniel P. Berrange
berra...@redhat.com wrote:
 On Fri, Feb 10, 2012 at 10:30:24AM +, Stefan Hajnoczi wrote:
 This year's Google Summer of Code has been announced:

 http://www.google-melange.com/gsoc/events/google/gsoc2012

 For those who haven't heard of GSoC before, it funds university
 students to work on open source projects during the summer.
 Organizations, such as QEMU, can participate to attract students who
 will tackle projects for 12 weeks this summer.  The GSoC program has
 been very successful because it gives students real open source
 experience and organizations can grow their development community.

 QEMU has participated for several years and I would like to organize
 our participation this year.  Luiz was QEMU organization administrator
 last year and contacted me because he will not have time this year.  I
 will prepare the application form for QEMU so that we will be
 considered for 2012.

 Umbrella organization
 ---------------------
 Like last year, we can provide a home for KVM kernel module and
 libvirt projects too if those organizations prefer not to apply to
 GSoC themselves.  Please let us know so we can work together!

 To maximise the spirit of collaboration between libvirt & QEMU/KVM
 communities I think it would make sense for us to work together under
 the same GSoC Umbrella organization.

Excellent, there are many project ideas that could touch both
codebases.  Please feel free to add project ideas to the wiki:

http://wiki.qemu.org/Google_Summer_of_Code_2012

Stefan

Re: [PATCH 1/4] kvm tool: Stop init if check_extensions failed

2012-02-10 Thread Pekka Enberg

On Fri, Feb 10, 2012 at 11:55 AM, Yang Bai hamo...@gmail.com wrote:
 If kvm__check_extensions found that some of the required
 KVM extention is not supported by OS, we should stop the
 init and free all allocated resources.

 Signed-off-by: Yang Bai hamo...@gmail.com

Applied all four patches, thanks!
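
The behaviour described in the quoted commit message (stop the init when a required extension is missing, and free what was already allocated) is commonly written with goto-based unwinding. The following is only a sketch of that pattern under stated assumptions, not the kvm tool code: struct vm_ctx, vm_init, vm_exit and the bitmask form of check_extensions are hypothetical names invented for illustration.

```c
#include <stdlib.h>

/* Hypothetical VM context; the real kvm tool structures differ. */
struct vm_ctx {
	void *ram;	/* stands in for resources allocated during init */
};

/* Returns 0 when every required extension bit is present in 'supported'. */
static int check_extensions(unsigned int supported, unsigned int required)
{
	return (supported & required) == required ? 0 : -1;
}

/* Returns a context on success, or NULL with everything already released. */
static struct vm_ctx *vm_init(unsigned int supported, unsigned int required)
{
	struct vm_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;

	ctx->ram = malloc(4096);
	if (!ctx->ram)
		goto err_free_ctx;

	/* Stop the init instead of continuing with a half-usable VM. */
	if (check_extensions(supported, required) < 0)
		goto err_free_ram;

	return ctx;

err_free_ram:
	free(ctx->ram);
err_free_ctx:
	free(ctx);
	return NULL;
}

/* Normal teardown path for a successfully initialised context. */
static void vm_exit(struct vm_ctx *ctx)
{
	if (!ctx)
		return;
	free(ctx->ram);
	free(ctx);
}
```

The labels unwind in reverse order of allocation, so each failure point releases exactly what had been set up before it.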

Re: x86: kvmclock: abstract save/restore sched_clock_state

On Fri, Feb 10, 2012 at 03:32:11PM +0530, Amit Shah wrote:
 On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote:
 
  Stalls are probably caused by uninitialized percpu hv_clock, with
  following patch I don't see stalls. Although I might be just lucky.
  http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487
 
 Your commit does make things better, I don't see any stalls on the
 first resume.
 
 However, a subsequent s4 causes the stall to re-appear on resume, and
 this time there are no stall messages; the kernel just sits there
 spinning on something.  I've not found the solution to this one yet (I
 had a commit similar to Marcelo's in the works, which got me to the
 previous works-but-stalls behaviour).

I cannot reproduce it here. Suspend/resume are operating normally after
several iterations. Igor do you see anything similar?

Amit, can you please enable CONFIG_PRINTK_TIME=y and post a full dmesg 
(both during suspend and also the new kernel during resume).

Thanks.

  However there is/are a warning/s on suspend path and with following patch:
 
 I didn't see this.

This is unrelated.


Re: x86: kvmclock: abstract save/restore sched_clock_state

On Fri, Feb 10, 2012 at 10:32:16AM -0200, Marcelo Tosatti wrote:
 On Fri, Feb 10, 2012 at 03:32:11PM +0530, Amit Shah wrote:
  On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote:
  
   Stalls are probably caused by uninitialized percpu hv_clock, with
   following patch I don't see stalls. Although I might be just lucky.
   http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487
  
  Your commit does make things better, I don't see any stalls on the
  first resume.
  
  However, a subsequent s4 causes the stall to re-appear on resume, and
  this time there are no stall messages; the kernel just sits there
  spinning on something.  I've not found the solution to this one yet (I
  had a commit similar to Marcelo's in the works, which got me to the
  previous works-but-stalls behaviour).
 
 I cannot reproduce it here. Suspend/resume are operating normally after
 several iterations. Igor do you see anything similar?
 
 Amit, can you please enable CONFIG_PRINTK_TIME=y and post a full dmesg 
 (both during suspend and also the new kernel during resume).

Also is it reproducible with UP guest?


Re: x86: kvmclock: abstract save/restore sched_clock_state

2012-02-10 Thread Igor Mammedov

On 02/10/2012 01:33 PM, Marcelo Tosatti wrote:
 On Fri, Feb 10, 2012 at 10:32:16AM -0200, Marcelo Tosatti wrote:
  On Fri, Feb 10, 2012 at 03:32:11PM +0530, Amit Shah wrote:
   On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote:

    Stalls are probably caused by uninitialized percpu hv_clock, with
    following patch I don't see stalls. Although I might be just lucky.
    http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487

   Your commit does make things better, I don't see any stalls on the
   first resume.

   However, a subsequent s4 causes the stall to re-appear on resume, and
   this time there are no stall messages; the kernel just sits there
   spinning on something. I've not found the solution to this one yet (I
   had a commit similar to Marcelo's in the works, which got me to the
   previous works-but-stalls behaviour).

  I cannot reproduce it here. Suspend/resume are operating normally after
  several iterations. Igor do you see anything similar?

I wasn't able to reproduce it either but I haven't tried with Amit's config
yet.

  Amit, can you please enable CONFIG_PRINTK_TIME=y and post a full dmesg
  (both during suspend and also the new kernel during resume).

 Also is it reproducible with UP guest?

Another thing is to try smp guest without kvmclock and see if it helps.
It might be just something else.

--
Thanks,
Igor

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-10 Thread Takuya Yoshikawa

Avi Kivity a...@redhat.com wrote:

 On 02/09/2012 04:23 PM, Avi Kivity wrote:
   BTW do we really need fast slot creation/destruction?
 
  Not really, but it's good to have infrastructure that copes with
  different workloads.  If the patches keep the code simple I think it's a
  good thing to have.

My patch is pretty simple.

 To qualify - taking several tens of milliseconds is out of the question
 as some workloads grind to a halt.  But it doesn't need to be incredibly
 fast.

I think the problem is not how long it takes to create/invalidate a slot
but following faults and shadow pages reconstruction.

Similar to mmu shrinker problem we discussed before.

E.g. users will not expect sudden latency induced by pci hotplug.

Takuya

Re: x86: kvmclock: abstract save/restore sched_clock_state

2012-02-10 Thread Amit Shah

On (Fri) 10 Feb 2012 [10:32:16], Marcelo Tosatti wrote:
 On Fri, Feb 10, 2012 at 03:32:11PM +0530, Amit Shah wrote:
  On (Thu) 09 Feb 2012 [16:13:29], Igor Mammedov wrote:
  
   Stalls are probably caused by uninitialized percpu hv_clock, with
   following patch I don't see stalls. Although I might be just lucky.
   http://git.kernel.org/?p=virt/kvm/kvm.git;a=commit;h=e2971ac7e1d186af059e088d305496c5cb47d487
  
  Your commit does make things better, I don't see any stalls on the
  first resume.
  
  However, a subsequent s4 causes the stall to re-appear on resume, and
  this time there are no stall messages; the kernel just sits there
  spinning on something.  I've not found the solution to this one yet (I
  had a commit similar to Marcelo's in the works, which got me to the
  previous works-but-stalls behaviour).
 
 I cannot reproduce it here. Suspend/resume are operating normally after
 several iterations. Igor do you see anything similar?
 
 Amit, can you please enable CONFIG_PRINTK_TIME=y and post a full dmesg 
 (both during suspend and also the new kernel during resume).

In my case, I run a ping to the host (10.0.2.2) while the s4
suspend/resume operations are performed.

Complete dmesg, for all 3 invocations of the guest.  First one boots,
starts ping, starts s4.  Second one starts s4 after confirming ping is
working fine.  Third one just stays there, spinning.

$ ./x86_64-softmmu/qemu-system-x86_64 \
    -kernel ~/src/linux/arch/x86/boot/bzImage \
    -append 'root=/dev/vda1 console=tty0 console=ttyS0 no_console_suspend' \
    -drive file=/guests/f14-suspend.qcow2,if=none,id=dr0 \
    -device virtio-blk-pci,drive=dr0 \
    -net nic,model=virtio -net user -serial stdio -enable-kvm -m 512 \
    -cpu host -smp 4
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.3.0-rc2+ (a...@amit.redhat.com) (gcc version 
4.6.2 20111027 (Red Hat 4.6.2-1) (GCC) ) #295 SMP PREEMPT Fri Feb 10 18:39:48 
IST 2012
[0.00] Command line: root=/dev/vda1 console=tty0 console=ttyS0 
no_console_suspend
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009dc00 (usable)
[0.00]  BIOS-e820: 0009dc00 - 000a (reserved)
[0.00]  BIOS-e820: 000f - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 1fffd000 (usable)
[0.00]  BIOS-e820: 1fffd000 - 2000 (reserved)
[0.00]  BIOS-e820: feffc000 - ff00 (reserved)
[0.00]  BIOS-e820: fffc - 0001 (reserved)
[0.00] NX (Execute Disable) protection: active
[0.00] DMI 2.4 present.
[0.00] No AGP bridge found
[0.00] last_pfn = 0x1fffd max_arch_pfn = 0x4
[0.00] x86 PAT enabled: cpu 0, old 0x70406, new 0x7010600070106
[0.00] init_memory_mapping: -1fffd000
[0.00] RAMDISK: 1fa2e000 - 1fff
[0.00] ACPI: RSDP 000fd3f0 00014 (v00 BOCHS )
[0.00] ACPI: RSDT 1fffd660 00034 (v01 BOCHS  BXPCRSDT 0001 
BXPC 0001)
[0.00] ACPI: FACP 1f80 00074 (v01 BOCHS  BXPCFACP 0001 
BXPC 0001)
[0.00] ACPI: DSDT 1fffd9b0 02589 (v01   BXPC   BXDSDT 0001 
INTL 20100528)
[0.00] ACPI: FACS 1f40 00040
[0.00] ACPI: SSDT 1fffd7e0 001C1 (v01 BOCHS  BXPCSSDT 0001 
BXPC 0001)
[0.00] ACPI: APIC 1fffd6e0 0008A (v01 BOCHS  BXPCAPIC 0001 
BXPC 0001)
[0.00] ACPI: HPET 1fffd6a0 00038 (v01 BOCHS  BXPCHPET 0001 
BXPC 0001)
[0.00] kvm-clock: Using msrs 4b564d01 and 4b564d00
[0.00] kvm-clock: cpu 0, msr 0:18509c1, boot clock
[0.00] Zone PFN ranges:
[0.00]   DMA  0x0010 - 0x1000
[0.00]   DMA320x1000 - 0x0010
[0.00]   Normal   empty
[0.00] Movable zone start PFN for each node
[0.00] Early memory PFN ranges
[0.00] 0: 0x0010 - 0x009d
[0.00] 0: 0x0100 - 0x0001fffd
[0.00] ACPI: PM-Timer IO Port: 0xb008
[0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[0.00] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
[0.00] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
[0.00] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
[0.00] ACPI: IOAPIC (id[0x04] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 4, version 17, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[0.00] Using ACPI (MADT) for SMP configuration information

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

2012-02-10 Thread Takuya Yoshikawa

Avi Kivity a...@redhat.com wrote:

  2. When we create(and shift?) a memory slot, we call kvm_arch_flush_shadow()
  to clear all mmio sptes, again not restricted to that slot.
 
  /*
   * If the new memory slot is created, we need to clear all
   * mmio sptes.
   */
  if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
  kvm_arch_flush_shadow(kvm);
 
 This is pretty rare outside the previous scenario (memory/pci hotplug).

Is this condition correct?

When npages != 0 and old.npages == 0, the slot is being newly created, do we
really need to flush shadow pages?

This should be
if (npages && old.npages && (old.base_gfn != base_gfn))
No?

Takuya
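
To make the difference between the two conditions concrete, here is a small sketch that evaluates both for slot creation, move and deletion. It is a userspace model, not the kvm_main.c code: struct slot_view, flush_existing and flush_proposed are hypothetical names, and it assumes that base_gfn is mem->guest_phys_addr shifted right by PAGE_SHIFT and that old.npages == 0 means the slot did not exist before.

```c
#include <stdbool.h>

typedef unsigned long long gfn_t;

/* Hypothetical reduced view of a slot: size in pages and first guest frame. */
struct slot_view {
	unsigned long npages;
	gfn_t base_gfn;
};

/*
 * Condition quoted from the existing code: flush whenever a non-empty slot
 * ends up at a different base frame than the old one.
 */
static bool flush_existing(const struct slot_view *old,
			   const struct slot_view *new_slot)
{
	return new_slot->npages && old->base_gfn != new_slot->base_gfn;
}

/*
 * Condition proposed in the message: additionally require that the old
 * slot existed, so that pure creation does not flush.
 */
static bool flush_proposed(const struct slot_view *old,
			   const struct slot_view *new_slot)
{
	return new_slot->npages && old->npages &&
	       old->base_gfn != new_slot->base_gfn;
}
```

With an empty old slot and a new slot at a non-zero base frame, the existing condition flushes and the proposed one does not; for a move of an existing slot both flush, and for a deletion (npages == 0) neither does.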

Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-10 Thread Roopa Prabhu




On 2/9/12 9:36 AM, John Fastabend john.r.fastab...@intel.com wrote:

 On 2/8/2012 8:36 PM, Stephen Hemminger wrote:
 On Wed, 08 Feb 2012 19:22:06 -0800
 John Fastabend john.r.fastab...@intel.com wrote:
 
 Propagate software FDB table into hardware uc, mc lists when
 the NETIF_F_HW_FDB is set.
 
 This resolves the case below where an embedded switch is used
 in hardware to do inter-VF or VF-PF switching. This patch
 pushes the FDB entry (specifically the MAC address) into the
 embedded switch with dev_add_uc and dev_add_mc so the switch
 learns about the software bridge.
 
 
    veth0  veth2
      |      |
    ------------
    |  bridge0 |   <---- software bridging
    ------------
         /
        /
    ethx.y  ethx
      VF     PF
       \      \   <---- propagate FDB entries to HW
        \      \
    --------------------
    |  Embedded Bridge |   <---- hardware offloaded switching
    --------------------
 
 This is only an RFC; a couple more changes are needed.
 
 (1) Optimize HW FDB set/del to only walk list if an FDB offloaded
 device is attached. Or decide it doesn't matter from unlikely()
 path.
 
 (2) Is it good enough to just call dev_uc_{add|del} or
 dev_mc_{add|del}? Or do some devices really need a new netdev
 callback to do this operation correctly. I think it should be
 good enough as is.
 
 (3) wrapped list walk in rcu_read_lock() just in case maybe every
 case is already inside rcu_read_lock()/unlock().
 
 Also this is in response to this thread regarding the macvlan and
 exposing rx filters posting now to see if folks think this is the
 right idea and if it will resolve at least the bridge case.
 
 http://lists.openwall.net/netdev/2011/11/08/135
 
 Signed-off-by: John Fastabend john.r.fastab...@intel.com
 ---
 
  include/linux/netdev_features.h |2 ++
  net/bridge/br_fdb.c |   34 ++
  2 files changed, 36 insertions(+), 0 deletions(-)
 
 diff --git a/include/linux/netdev_features.h
 b/include/linux/netdev_features.h
 index 77f5202..5936fae 100644
 
 Rather than yet another device feature, I would rather use netlink_notifier
 callback. The notifier is more general and generic without messing with
 internals
 of bridge.
 
 
 But the device features makes it easy for user space to learn that the device
 supports this sort of offload. Now if all SR-IOV devices support this then it
 doesn't matter but I thought there were SR-IOV devices that didn't do any
 switching? I'll dig through the SR-IOV drivers to check there are not too
 many of them.

Correct. Our 802.1Qbh sriov device (enic) does not do local switching.

 
 By netlink_notifier do you mean adding a notifier_block and using
 atomic_notifier_call_chain()
 probably in rtnl_notify()? Then drivers could register with the notifier chain
 with
 atomic_notifier_chain_register() and receive the events correctly. Or did I
 miss
 some notifier chain that already exists?
 
 Thanks,
 John
 


virtio-blk performance regression and qemu-kvm

2012-02-10 Thread Dongsu Park

Hi,

Recently I observed a performance regression in virtio-blk: the I/O
bandwidth differs considerably between qemu-kvm 0.14.1 and 1.0.
I want to share the benchmark results and ask what the reason
might be.

1. Test condition

 - On host, ramdisk-backed block device (/dev/ram0)
 - qemu-kvm is configured with virtio-blk driver for /dev/ram0,
   which is detected as /dev/vdb inside the guest VM.
 - Host System: Ubuntu 11.10 / Kernel 3.2
 - Guest System: Debian 6.0 / Kernel 3.0.6
 - Host I/O scheduler : deadline
 - testing tool : fio

2. Raw performance on the host

 If we test I/O with fio on /dev/ram0 on the host,

 - Sequential read (on the host)
  # fio -name iops -rw=read -size=1G -iodepth 1 \
   -filename /dev/ram0 -ioengine libaio -direct=1 -bs=4096

 - Sequential write (on the host)
  # fio -name iops -rw=write -size=1G -iodepth 1 \
   -filename /dev/ram0 -ioengine libaio -direct=1 -bs=4096

 Result:

  read   1691,6 MByte/s
  write   898,9 MByte/s

 No wonder, it's extremely fast.

3. Comparison with different qemu-kvm versions

 Now I'm running benchmarks with both qemu-kvm 0.14.1 and 1.0.

 - Sequential read (Running inside guest)
   # fio -name iops -rw=read -size=1G -iodepth 1 \
-filename /dev/vdb -ioengine libaio -direct=1 -bs=4096

 - Sequential write (Running inside guest)
   # fio -name iops -rw=write -size=1G -iodepth 1 \
-filename /dev/vdb -ioengine libaio -direct=1 -bs=4096

 For each one, I tested 3 times to get the average.

 Result:

  seqread with qemu-kvm 0.14.1   67,0 MByte/s
  seqread with qemu-kvm 1.0  30,9 MByte/s

  seqwrite with qemu-kvm 0.14.1  65,8 MByte/s
  seqwrite with qemu-kvm 1.0 30,5 MByte/s

 So the newest stable version of qemu-kvm shows only about half the
 bandwidth of the older version 0.14.1.

The question is, why is it so much slower?
How can we improve the performance, except for downgrading to 0.14.1?

I know there have been already several discussions on this issue,
for example, benchmark and trace on virtio-blk latency [1],
or in-kernel accelerator vhost-blk [2].
I'm going to continue testing with those ones, too.
But does anyone have a better idea or know about recent updates?

Regards,
Dongsu

[1] http://www.linux-kvm.org/page/Virtio/Block/Latency
[2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/76893


Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-10 Thread jamal

Hi John,

I went backwards to summarize at the top after going through your email.

TL;DR version 0.1: 
you provide a good use case where it makes sense to do things in the
kernel. IMO, you could make the same argument if your embedded switch
could do ACLs, IPv4 forwarding etc. And the kernel bloats.
I am always bigoted to move all policy control to user space instead of
bloating in the kernel.

 
On Thu, 2012-02-09 at 20:14 -0800, John Fastabend wrote:

  
  Hi Jamal,
  
  The user space app in this case would listen for FDB updates to the SW
  bridge and then mirror them at the embedded NIC. In this case it seems
  easier to just add a notifier chain and let the kernel keep these in
  sync. Otherwise we need a daemon in user space to replicate these.
  

A user space daemon if you need to ensure synchronization. Thats what i
meant when i said there was a disadvantage over the simple case when
the goal is always to synchronize.

  On the other hand if you could make the same RTM_NEWNEIGH, RTM_DELNEIGH,
  and RTM_GETNEIGH work for the bridge, embedded bridge, and macvlan you
  would have one common interface to drive these. But the bridge already
  has this protocol/msgtype so that would require either some demux or
  new protocol/msgtype pairs to be created. 
  

The bridge is very netlink friendly these days. Given the rest of the
network stack (*NEIGH* you mention above) talks netlink to user space
it should be workable. 

  Let me think on it. I'm tempted by the simplicity of adding notifier
  hooks though.

If something is missing bridge-side it may need to be added (as Per
Stephen's comment) - i just took it one further indicating those
notifiers need to also netlink-speak


 Actually because the bridge is adding/removing fdb entries dynamically
 maybe its best this gets done in kernel. Here's the example case,

[..]

 
 With the flow by letters above hope this is not too difficult to follow.

 (A) veth0 a virtual device transmits packet destined for ethx.y
 (B) SW bridge receives frames and updates FDB flooding to C
 (C) eth0 the PF in this case sends the frame to the HW backed by the
 embedded bridge

Following so far.
Can you have more than one PF per embedded switch? Or is the intent here
purely to do VMs/VF separation?

 (D) The HW embedded switch has a static entry for ethx.y and forwards
 the frame to the VF or if its a broadcast frame also floods it to
 the wire and ethx.y

nod.

 (E) ethx.y receives the frame and generates a response to the dest mac of
 veth0

nod.
Since you said in #D the entries in the switch are static, I am assuming
at this point neither ethx.y nor veth0 exist in the embedded FDB.

 Now here is the potential issue,
 
 (G) The frame transmitted from ethx.y with the destination address of
 veth0 but the embedded switch is not a learning switch. If the FDB
 update is done in user space its possible (likely?) that the FDB
 entry for veth0 has not been added to the embedded switch yet. 

Ok, got it - so the catch here is the switch is not capable of learning.
I think this depends on where learning is done. Your intent is to
use the S/W bridge as something that does the learning for you i.e in
the kernel. This makes the s/w bridge part of MUST-have-for-this-to-run.
And that maybe the case for your use case.

What if I dont wanna run the S/W bridge at all?
I've been making a point that with a simple knob (Stephen doesn't like to
add such a knob), the SW bridge could defer learning to user space. 
[This way you can add a lot of richness e.g on ACLs such as restricting
what MAC addresses etc are allowed to talk to which ones etc.].
But if I bypass the s/w bridge altogether and learn in user space
or have a static config in which i populate the embedded switch, i dont
see the issue.

 Now
 we either have to flood the frame which is not horrible but not
 ideal or worse if the embedded switch does not support flooding send
 it to the wire and veth0 never receives it. 

If it is a switch it has to flood, no? Otherwise it sounds broken.

 If the SW bridge pushes
 the FDB update down into the embedded switch the address is for
 sure in the embedded switches forwarding tables and the switching
 works as expected.

Yes, there is a small gap between the s/w bridge learning and the
synchronization happening to the embedded nic switch. That gap gets
larger if you defer learning to user space. But like you said earlier,
during that gap packets are flooded - and do you care if the
synchronization doesnt happen immediately?

 So to handle this case correctly its probably best IMHO to use a notifier
 hook. Having a RTM_GETNEIGH for the embedded switch implemented though
 would be nice for dumping the FDB of the embedded switch and SET/DEL
 could be used to configure the FDB when its not being driven by the SW
 switch. Of course we should try to be minimalists here.

Do you need to have a different *NEIGH* than what we already have
really?

The

Re: [PATCH 2/4 V13] Add functions to check if the host has stopped the vm

On Wed, Feb 08, 2012 at 10:07:44AM -0500, Eric B Munson wrote:
 When a host stops or suspends a VM it will set a flag to show this.  The
 watchdog will use these functions to determine if a softlockup is real, or the
 result of a suspended VM.
 
 Signed-off-by: Eric B Munson emun...@mgebm.net
 asm-generic changes Acked-by: Arnd Bergmann a...@arndb.de
 Cc: mi...@redhat.com
 Cc: h...@zytor.com
 Cc: ry...@linux.vnet.ibm.com
 Cc: aligu...@us.ibm.com
 Cc: mtosa...@redhat.com
 Cc: kvm@vger.kernel.org
 Cc: linux-a...@vger.kernel.org
 Cc: x...@kernel.org
 Cc: linux-ker...@vger.kernel.org
 ---
 Changes from V11:
  Re-add the missing asm-generic stub for check_and_clear_guest_stopped()
 Changes from V6:
  Use __this_cpu_and when clearing the PVCLOCK_GUEST_STOPPED flag
 Changes from V5:
  Collapse generic stubs into this patch
  check_and_clear_guest_stopped() takes no args and uses __get_cpu_var()
  Include individual definitions in ia64, s390, and powerpc
 
  arch/ia64/include/asm/kvm_para.h|5 +
  arch/powerpc/include/asm/kvm_para.h |5 +
  arch/s390/include/asm/kvm_para.h|5 +
  arch/x86/include/asm/kvm_para.h |8 
  arch/x86/kernel/kvmclock.c  |   21 +
  include/asm-generic/kvm_para.h  |   14 ++
  6 files changed, 58 insertions(+), 0 deletions(-)
  create mode 100644 include/asm-generic/kvm_para.h
 
 diff --git a/arch/ia64/include/asm/kvm_para.h 
 b/arch/ia64/include/asm/kvm_para.h
 index 1588aee..2019cb9 100644
 --- a/arch/ia64/include/asm/kvm_para.h
 +++ b/arch/ia64/include/asm/kvm_para.h
 @@ -26,6 +26,11 @@ static inline unsigned int kvm_arch_para_features(void)
   return 0;
  }
  
 +static inline bool kvm_check_and_clear_guest_paused(void)
 +{
 + return false;
 +}
 +
  #endif
  
  #endif
 diff --git a/arch/powerpc/include/asm/kvm_para.h 
 b/arch/powerpc/include/asm/kvm_para.h
 index 50533f9..1f80293 100644
 --- a/arch/powerpc/include/asm/kvm_para.h
 +++ b/arch/powerpc/include/asm/kvm_para.h
 @@ -169,6 +169,11 @@ static inline unsigned int kvm_arch_para_features(void)
   return r;
  }
  
 +static inline bool kvm_check_and_clear_guest_paused(void)
 +{
 + return false;
 +}
 +
  #endif /* __KERNEL__ */
  
  #endif /* __POWERPC_KVM_PARA_H__ */
 diff --git a/arch/s390/include/asm/kvm_para.h 
 b/arch/s390/include/asm/kvm_para.h
 index 6964db2..a988329 100644
 --- a/arch/s390/include/asm/kvm_para.h
 +++ b/arch/s390/include/asm/kvm_para.h
 @@ -149,6 +149,11 @@ static inline unsigned int kvm_arch_para_features(void)
   return 0;
  }
  
 +static inline bool kvm_check_and_clear_guest_paused(void)
 +{
 + return false;
 +}
 +
  #endif
  
  #endif /* __S390_KVM_PARA_H */
 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
 index 734c376..99c4bbe 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -95,6 +95,14 @@ struct kvm_vcpu_pv_apf_data {
  extern void kvmclock_init(void);
  extern int kvm_register_clock(char *txt);
  
 +#ifdef CONFIG_KVM_CLOCK
 +bool kvm_check_and_clear_guest_paused(void);
 +#else
 +static inline bool kvm_check_and_clear_guest_paused(void)
 +{
 + return false;
 +}
 +#endif /* CONFIG_KVMCLOCK */
  
  /* This instruction is vmcall.  On non-VT architectures, it will generate a
   * trap that we will then rewrite to the appropriate instruction.
 diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
 index 44842d7..bdf6423 100644
 --- a/arch/x86/kernel/kvmclock.c
 +++ b/arch/x86/kernel/kvmclock.c
 @@ -22,6 +22,7 @@
  #include <asm/msr.h>
  #include <asm/apic.h>
  #include <linux/percpu.h>
 +#include <linux/hardirq.h>
  
  #include <asm/x86_init.h>
  #include <asm/reboot.h>
 @@ -114,6 +115,26 @@ static void kvm_get_preset_lpj(void)
   preset_lpj = lpj;
  }
  
 +bool kvm_check_and_clear_guest_paused(void)
 +{
 + bool ret = false;
 + struct pvclock_vcpu_time_info *src;
 +
 + /*
 +  * per_cpu() is safe here because this function is only called from
 +  * timer functions where preemption is already disabled.
 +  */
 + WARN_ON(!in_atomic());
 + src = &__get_cpu_var(hv_clock);
 + if ((src->flags & PVCLOCK_GUEST_STOPPED) != 0) {
 + __this_cpu_and(hv_clock.flags, ~PVCLOCK_GUEST_STOPPED);
 + ret = true;
 + }
 +
 + return ret;
 +}
 +EXPORT_SYMBOL_GPL(kvm_check_and_clear_guest_paused);
 +
  static struct clocksource kvm_clock = {
   .name = "kvm-clock",
   .read = kvm_clock_get_cycles,
 diff --git a/include/asm-generic/kvm_para.h b/include/asm-generic/kvm_para.h
 new file mode 100644
 index 000..05ef7e7
 --- /dev/null
 +++ b/include/asm-generic/kvm_para.h
 @@ -0,0 +1,14 @@
 +#ifndef _ASM_GENERIC_KVM_PARA_H
 +#define _ASM_GENERIC_KVM_PARA_H
 +
 +
 +/*
 + * This function is used by architectures that support kvm to avoid issuing
 + * false soft lockup messages.
 + */
 +static inline bool kvm_check_and_clear_guest_paused(void)
 +{
 + return false;
 +}
 +

Re: [RFC PATCH v0 1/2] net: bridge: propagate FDB table into hardware

2012-02-10 Thread Stephen Hemminger

On Fri, 10 Feb 2012 10:18:31 -0500
jamal h...@cyberus.ca wrote:

 Hi John,
 
 I went backwards to summarize at the top after going through your email.
 
 TL;DR version 0.1: 
 you provide a good use case where it makes sense to do things in the
 kernel. IMO, you could make the same argument if your embedded switch
 could do ACLs, IPv4 forwarding etc. And the kernel bloats.
 I am always bigoted to move all policy control to user space instead of
 bloating in the kernel.
 
  
 On Thu, 2012-02-09 at 20:14 -0800, John Fastabend wrote:
 
   
   Hi Jamal,
   
   The user space app in this case would listen for FDB updates to the SW
   bridge and then mirror them at the embedded NIC. In this case it seems
   easier to just add a notifier chain and let the kernel keep these in
   sync. Otherwise we need a daemon in user space to replicate these.
   
 
 A user space daemon if you need to ensure synchronization. Thats what i
 meant when i said there was a disadvantage over the simple case when
 the goal is always to synchronize.
 
   On the other hand if you could make the same RTM_NEWNEIGH, RTM_DELNEIGH,
   and RTM_GETNEIGH work for the bridge, embedded bridge, and macvlan you
   would have one common interface to drive these. But the bridge already
   has this protocol/msgtype so that would require either some demux or
   new protocol/msgtype pairs to be created. 
   
 
 The bridge is very netlink friendly these days. Given the rest of the
 network stack (*NEIGH* you mention above) talks netlink to user space
 it should be workable. 
 
   Let me think on it. I'm tempted by the simplicity of adding notifier
   hooks though.
 
 If something is missing bridge-side it may need to be added (as Per
 Stephen's comment) - i just took it one further indicating those
 notifiers need to also netlink-speak
 
 
  Actually because the bridge is adding/removing fdb entries dynamically
  maybe its best this gets done in kernel. Here's the example case,
 
 [..]
 
  
  With the flow by letters above hope this is not too difficult to follow.
 
  (A) veth0 a virtual device transmits packet destined for ethx.y
  (B) SW bridge receives frames and updates FDB flooding to C
  (C) eth0 the PF in this case sends the frame to the HW backed by the
  embedded bridge
 
 Following so far.
 Can you have more than one PF per embedded switch? Or is the intent here
 purely to do VMs/VF separation?
 
  (D) The HW embedded switch has a static entry for ethx.y and forwards
  the frame to the VF or if its a broadcast frame also floods it to
  the wire and ethx.y
 
 nod.
 
  (E) ethx.y receives the frame and generates a response to the dest mac of
  veth0
 
 nod.
 Since you said in #D the entries in the switch are static, I am assuming
 at this point neither ethx.y nor veth0 exist in the embedded FDB.
 
  Now here is the potential issue,
  
  (G) The frame transmitted from ethx.y with the destination address of
  veth0 but the embedded switch is not a learning switch. If the FDB
  update is done in user space its possible (likely?) that the FDB
  entry for veth0 has not been added to the embedded switch yet. 
 
 Ok, got it - so the catch here is the switch is not capable of learning.
 I think this depends on where learning is done. Your intent is to
 use the S/W bridge as something that does the learning for you i.e in
 the kernel. This makes the s/w bridge part of MUST-have-for-this-to-run.
 And that maybe the case for your use case.
 
 What if I dont wanna run the S/W bridge at all?
 I've been making a point that with a simple knob (Stephen doesn't like to
 add such a knob), the SW bridge could defer learning to user space. 
 [This way you can add a lot of richness e.g on ACLs such as restricting
 what MAC addresses etc are allowed to talk to which ones etc.].
 But if I bypass the s/w bridge altogether and learn in user space
 or have a static config in which i populate the embedded switch, i dont
 see the issue.
 
  Now
  we either have to flood the frame which is not horrible but not
  ideal or worse if the embedded switch does not support flooding send
  it to the wire and veth0 never receives it. 
 
 If it is a switch it has to flood, no? Otherwise it sounds broken.
 
  If the SW bridge pushes
  the FDB update down into the embedded switch the address is for
  sure in the embedded switches forwarding tables and the switching
  works as expected.
 
 Yes, there is a small gap between the s/w bridge learning and the
 synchronization happening to the embedded nic switch. That gap gets
 larger if you defer learning to user space. But like you said earlier,
 during that gap packets are flooded - and do you care if the
 synchronization doesnt happen immediately?
 
  So to handle this case correctly its probably best IMHO to use a notifier
  hook. Having a RTM_GETNEIGH for the embedded switch implemented though
  would be nice for dumping the FDB of the embedded switch and SET/DEL
  could be used to

Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

On Thu, Feb 09, 2012 at 04:25:36PM +0200, Avi Kivity wrote:
 On 02/08/2012 08:45 PM, Marcelo Tosatti wrote:
   BTW do we really need fast slot creation/destruction?
 
  At the moment yes. Boot a RHEL/Fedora installation disk (or any other
  guest which uses SYSLINUX splash screen) and you will see.
 
 Another workload that suffers is Windows XP clearing the screen during boot.
 
   That
  particular case is a limitation of cirrus in QEMU, ideally it should be
  optimized there.
 
 Why do you say that?

There is no fundamental need to create/destroy the 0xa0000 VGA memory
slot repeatedly.

But you are right that the aim should be decent performance
nevertheless.



Re: [RFC] need to improve slot creation/destruction? -- Re: [RFC][PATCH] srcu: Implement call_srcu()

On Fri, Feb 10, 2012 at 10:08:12PM +0900, Takuya Yoshikawa wrote:
 Avi Kivity a...@redhat.com wrote:
 
  On 02/09/2012 04:23 PM, Avi Kivity wrote:
BTW do we really need fast slot creation/destruction?
  
   Not really, but it's good to have infrastructure that copes with
   different workloads.  If the patches keep the code simple I think it's a
   good thing to have.
 
 My patch is pretty simple.
 
  To qualify - taking several tens of milliseconds is out of the question
  as some workloads grind to a halt.  But it doesn't need to be incredibly
  fast.
 
 I think the problem is not how long it takes to create/invalidate a slot
 but following faults and shadow pages reconstruction.

The problem which is immediately visible is kvm_set_mem performance.

   Similar to mmu shrinker problem we discussed before.
 
 E.g. users will not expect sudden latency induced by pci hotplug.
 
   Takuya

[PATCH v3] KVM: Allow host IRQ sharing for assigned PCI 2.3 devices

PCI 2.3 allows IRQ sources to be disabled generically at device level.
This enables us to share legacy IRQs of such devices with other host
devices when passing them to a guest.

The new IRQ sharing feature introduced here is optional, user space has
to request it explicitly. Moreover, user space can inform us about its
view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the
interrupt and signaling it if the guest masked it via the virtualized
PCI config space.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Changes in v3:
 - rebased over current kvm.git (no code conflict, just api.txt)

 Documentation/virtual/kvm/api.txt |   31 ++
 arch/x86/kvm/x86.c|1 +
 include/linux/kvm.h   |6 +
 include/linux/kvm_host.h  |2 +
 virt/kvm/assigned-dev.c   |  208 +++-
 5 files changed, 219 insertions(+), 29 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 59a3826..5ce0e29 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1169,6 +1169,14 @@ following flags are specified:
 
 /* Depends on KVM_CAP_IOMMU */
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
+/* The following two depend on KVM_CAP_PCI_2_3 */
+#define KVM_DEV_ASSIGN_PCI_2_3         (1 << 1)
+#define KVM_DEV_ASSIGN_MASK_INTX       (1 << 2)
+
+If KVM_DEV_ASSIGN_PCI_2_3 is set, the kernel will manage legacy INTx interrupts
+via the PCI-2.3-compliant device-level mask, thus enable IRQ sharing with other
+assigned devices or host devices. KVM_DEV_ASSIGN_MASK_INTX specifies the
+guest's view on the INTx mask, see KVM_ASSIGN_SET_INTX_MASK for details.
 
 The KVM_DEV_ASSIGN_ENABLE_IOMMU flag is a mandatory option to ensure
 isolation of the device.  Usages not specifying this flag are deprecated.
@@ -1441,6 +1449,29 @@ The num_dirty field is a performance hint for KVM to 
determine whether it
 should skip processing the bitmap and just invalidate everything.  It must
 be set to the number of set bits in the bitmap.
 
+4.60 KVM_ASSIGN_SET_INTX_MASK
+
+Capability: KVM_CAP_PCI_2_3
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_assigned_pci_dev (in)
+Returns: 0 on success, -1 on error
+
+Informs the kernel about the guest's view on the INTx mask. As long as the
+guest masks the legacy INTx, the kernel will refrain from unmasking it at
+hardware level and will not assert the guest's IRQ line. User space is still
+responsible for applying this state to the assigned device's real config space
+by setting or clearing the Interrupt Disable bit 10 in the Command register.
+
+To avoid that the kernel overwrites the state user space wants to set,
+KVM_ASSIGN_SET_INTX_MASK has to be called prior to updating the config space.
+Moreover, user space has to write back its own view on the Interrupt Disable
+bit whenever modifying the Command word.
+
+See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
+by assigned_dev_id. In the flags field, only KVM_DEV_ASSIGN_MASK_INTX is
+evaluated.
+
 4.62 KVM_CREATE_SPAPR_TCE
 
 Capability: KVM_CAP_SPAPR_TCE
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2bd77a3..1f11435 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2099,6 +2099,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
case KVM_CAP_GET_TSC_KHZ:
+   case KVM_CAP_PCI_2_3:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index acbe429..6c322a9 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -588,6 +588,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_TSC_DEADLINE_TIMER 72
 #define KVM_CAP_S390_UCONTROL 73
 #define KVM_CAP_SYNC_REGS 74
+#define KVM_CAP_PCI_2_3 75
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -784,6 +785,9 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_TSC_CONTROL */
 #define KVM_SET_TSC_KHZ   _IO(KVMIO,  0xa2)
 #define KVM_GET_TSC_KHZ   _IO(KVMIO,  0xa3)
+/* Available with KVM_CAP_PCI_2_3 */
+#define KVM_ASSIGN_SET_INTX_MASK  _IOW(KVMIO,  0xa4, \
+  struct kvm_assigned_pci_dev)
 
 /*
  * ioctls for vcpu fds
@@ -857,6 +861,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_ONE_REG  _IOW(KVMIO,  0xac, struct kvm_one_reg)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU    (1 << 0)
+#define KVM_DEV_ASSIGN_PCI_2_3         (1 << 1)
+#define KVM_DEV_ASSIGN_MASK_INTX       (1 << 2)
 
 struct kvm_assigned_pci_dev {
__u32 assigned_dev_id;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9698080..d1d68f4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -547,6 +547,7 @@ struct kvm_assigned_dev_kernel {
unsigned int entries_nr;
int host_irq;
bool host_irq_disabled;
+   bool pci_2_3;
struct msix_entry *host_msix_entries;

[PATCH v2 8/8] kvmvapic: Use optionrom helpers

Use OPTION_ROM_START/END from the common header file, add comment to
init code.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 pc-bios/optionrom/kvmvapic.S |   18 --
 1 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
index 856c1e5..aa17a40 100644
--- a/pc-bios/optionrom/kvmvapic.S
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -9,12 +9,10 @@
 # option) any later version.  See the COPYING file in the top-level directory.
 #
 
-   .text 0
-   .code16
-.global _start
-_start:
-   .short 0xaa55
-   .byte (_end - _start) / 512
+#include "optionrom.h"
+
+OPTION_ROM_START
+
# clear vapic area: firmware load using rep insb may cause
# stale tpr/isr/irr data to corrupt the vapic area.
push %es
@@ -26,8 +24,11 @@ _start:
cld
rep stosw
pop %es
+
+   # announce presence to the hypervisor
mov $vapic_base, %ax
out %ax, $0x7e
+
lret
 
.code32
@@ -331,7 +332,4 @@ up_set_tpr_poll_irq:
 vapic:
 . = . + vapic_size
 
-.byte 0  # reserve space for signature
-.align 512, 0
-
-_end:
+OPTION_ROM_END
-- 
1.7.3.4


[PATCH v2 2/8] Allow to use pause_all_vcpus from VCPU context

In order to perform critical manipulations on the VM state in the
context of a VCPU, specifically code patching, stopping and resuming of
all VCPUs may be necessary. resume_all_vcpus is already compatible, now
enable pause_all_vcpus for this use case by stopping the calling context
before starting to wait for the whole gang.

CC: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 cpus.c |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/cpus.c b/cpus.c
index d0c8340..5adfc6b 100644
--- a/cpus.c
+++ b/cpus.c
@@ -870,6 +870,18 @@ void pause_all_vcpus(void)
         penv = (CPUState *)penv->next_cpu;
     }
 
+    if (!qemu_thread_is_self(&io_thread)) {
+        cpu_stop_current();
+        if (!kvm_enabled()) {
+            while (penv) {
+                penv->stop = 0;
+                penv->stopped = 1;
+                penv = (CPUState *)penv->next_cpu;
+            }
+            return;
+        }
+    }
+
     while (!all_vcpus_paused()) {
         qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex);
         penv = first_cpu;
-- 
1.7.3.4


[PATCH v2 5/8] kvmvapic: Introduce TPR access optimization for Windows guests

This enables acceleration for MMIO-based TPR registers accesses of
32-bit Windows guest systems. It is mostly useful with KVM enabled,
either on older Intel CPUs (without flexpriority feature, can also be
manually disabled for testing) or any current AMD processor.

The approach introduced here is derived from the original version of
qemu-kvm. It was refactored, documented, and extended by support for
user space APIC emulation, both with and without KVM acceleration. The
VMState format was kept compatible, so was the ABI to the option ROM
that implements the guest-side para-virtualized driver service. This
enables seamless migration from qemu-kvm to upstream or, one day,
between KVM and TCG mode.

The basic concept goes like this:
 - VAPIC PV interface consisting of I/O port 0x7e and (for KVM in-kernel
   irqchip) a vmcall hypercall is registered
 - VAPIC option ROM is loaded into guest
 - option ROM activates TPR MMIO access reporting via port 0x7e
 - TPR accesses are trapped and patched in the guest to call into option
   ROM instead, VAPIC support is enabled
 - option ROM TPR helpers track state in memory and invoke hypercall to
   poll for pending IRQs if required

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 Makefile.target|3 +-
 hw/apic.c  |  126 -
 hw/apic_common.c   |   64 +-
 hw/apic_internal.h |   27 ++
 hw/kvm/apic.c  |   32 +++
 hw/kvmvapic.c  |  774 
 6 files changed, 1012 insertions(+), 14 deletions(-)
 create mode 100644 hw/kvmvapic.c

diff --git a/Makefile.target b/Makefile.target
index 68481a3..ec7eff8 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -230,7 +230,8 @@ obj-y += device-hotplug.o
 
 # Hardware support
 obj-i386-y += mc146818rtc.o pc.o
-obj-i386-y += sga.o apic_common.o apic.o ioapic_common.o ioapic.o piix_pci.o
+obj-i386-y += apic_common.o apic.o kvmvapic.o
+obj-i386-y += sga.o ioapic_common.o ioapic.o piix_pci.o
 obj-i386-y += vmport.o
 obj-i386-y += pci-hotplug.o smbios.o wdt_ib700.o
 obj-i386-y += debugcon.o multiboot.o
diff --git a/hw/apic.c b/hw/apic.c
index 086c544..2ebf3ca 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -35,6 +35,10 @@
 #define MSI_ADDR_DEST_ID_SHIFT 12
 #defineMSI_ADDR_DEST_ID_MASK   0x000
 
+#define SYNC_FROM_VAPIC 0x1
+#define SYNC_TO_VAPIC   0x2
+#define SYNC_ISR_IRR_TO_VAPIC   0x4
+
 static APICCommonState *local_apics[MAX_APICS + 1];
 
 static void apic_set_irq(APICCommonState *s, int vector_num, int trigger_mode);
@@ -78,6 +82,70 @@ static inline int get_bit(uint32_t *tab, int index)
     return !!(tab[i] & mask);
 }
 
+/* return -1 if no bit is set */
+static int get_highest_priority_int(uint32_t *tab)
+{
+    int i;
+    for (i = 7; i >= 0; i--) {
+        if (tab[i] != 0) {
+            return i * 32 + fls_bit(tab[i]);
+        }
+    }
+    return -1;
+}
+
+static void apic_sync_vapic(APICCommonState *s, int sync_type)
+{
+    VAPICState vapic_state;
+    size_t length;
+    off_t start;
+    int vector;
+
+    if (!s->vapic_paddr) {
+        return;
+    }
+    if (sync_type & SYNC_FROM_VAPIC) {
+        cpu_physical_memory_rw(s->vapic_paddr, (void *)&vapic_state,
+                               sizeof(vapic_state), 0);
+        s->tpr = vapic_state.tpr;
+    }
+    if (sync_type & (SYNC_TO_VAPIC | SYNC_ISR_IRR_TO_VAPIC)) {
+        start = offsetof(VAPICState, isr);
+        length = offsetof(VAPICState, enabled) - offsetof(VAPICState, isr);
+
+        if (sync_type & SYNC_TO_VAPIC) {
+            assert(qemu_cpu_is_self(s->cpu_env));
+
+            vapic_state.tpr = s->tpr;
+            vapic_state.enabled = 1;
+            start = 0;
+            length = sizeof(VAPICState);
+        }
+
+        vector = get_highest_priority_int(s->isr);
+        if (vector < 0) {
+            vector = 0;
+        }
+        vapic_state.isr = vector & 0xf0;
+
+        vapic_state.zero = 0;
+
+        vector = get_highest_priority_int(s->irr);
+        if (vector < 0) {
+            vector = 0;
+        }
+        vapic_state.irr = vector & 0xff;
+
+        cpu_physical_memory_write_rom(s->vapic_paddr + start,
+                                      ((void *)&vapic_state) + start, length);
+    }
+}
+
+static void apic_vapic_base_update(APICCommonState *s)
+{
+    apic_sync_vapic(s, SYNC_TO_VAPIC);
+}
+
 static void apic_local_deliver(APICCommonState *s, int vector)
 {
     uint32_t lvt = s->lvt[vector];
@@ -239,20 +307,17 @@ static void apic_set_base(APICCommonState *s, uint64_t val)
 
 static void apic_set_tpr(APICCommonState *s, uint8_t val)
 {
-    s->tpr = (val & 0x0f) << 4;
-    apic_update_irq(s);
+    /* Updates from cr8 are ignored while the VAPIC is active */
+    if (!s->vapic_paddr) {
+        s->tpr = val << 4;
+        apic_update_irq(s);
+    }
 }
 
-/* return -1 if no bit is set */
-static int get_highest_priority_int(uint32_t *tab)
+static uint8_t apic_get_tpr(APICCommonState *s)
 {
-int i;
-

[PATCH v2 3/8] target-i386: Add infrastructure for reporting TPR MMIO accesses

This will allow the APIC core to file a TPR access report. Depending on
the accelerator and kernel irqchip mode, it will either be delivered
right away or queued for later reporting.

In TCG mode, we can restart the triggering instruction and can therefore
forward the event directly. KVM does not allow us to restart, so we
postpone the delivery of events recorded in the user space APIC until
the current instruction is completed.

Note that KVM without in-kernel irqchip will report the address after
the instruction that triggered a write access. In contrast, read
accesses will return the precise information.
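
The two delivery paths described above can be modelled roughly as follows.
This is a simplified sketch and not the code from this patch: struct
vcpu_model, deliver_report() and process_pending_report() are hypothetical
stand-ins for the CPU state, the APIC hook and the asynchronous event
processing.

```c
#include <stdbool.h>
#include <stdint.h>

enum { TPR_ACCESS_READ = 0, TPR_ACCESS_WRITE = 1 };

/* Hypothetical, heavily simplified stand-in for a VCPU. */
struct vcpu_model {
    bool can_restart;     /* true: TCG-like, false: KVM-like */
    bool report_pending;  /* a report is queued for later delivery */
    uint64_t pending_ip;
    int pending_access;
    int delivered;        /* number of reports handed to the APIC model */
    uint64_t last_ip;     /* ip of the most recently delivered report */
};

/* Stand-in for the APIC side that consumes a report. */
static void deliver_report(struct vcpu_model *c, uint64_t ip, int access)
{
    (void)access;
    c->delivered++;
    c->last_ip = ip;
}

/* Called when a TPR MMIO access is detected. */
static void report_tpr_access(struct vcpu_model *c, uint64_t ip, int access)
{
    if (c->can_restart) {
        deliver_report(c, ip, access);   /* forward right away */
    } else {
        c->pending_ip = ip;              /* remember, deliver later */
        c->pending_access = access;
        c->report_pending = true;
    }
}

/* Called once the current instruction has completed. */
static void process_pending_report(struct vcpu_model *c)
{
    if (c->report_pending) {
        c->report_pending = false;
        deliver_report(c, c->pending_ip, c->pending_access);
    }
}
```

In the model, the KVM-like path never calls into the APIC from inside the
access itself, which is the property the deferred delivery is meant to give.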

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 cpu-all.h            |    3 ++-
 hw/apic.h            |    2 ++
 hw/apic_common.c     |    4
 target-i386/cpu.h    |    9 +
 target-i386/helper.c |   19 +++
 target-i386/kvm.c    |   24 ++--
 6 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index e2c3c49..80e6d42 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -375,8 +375,9 @@ DECLARE_TLS(CPUState *,cpu_single_env);
 #define CPU_INTERRUPT_TGT_INT_0   0x0100
 #define CPU_INTERRUPT_TGT_INT_1   0x0400
 #define CPU_INTERRUPT_TGT_INT_2   0x0800
+#define CPU_INTERRUPT_TGT_INT_3   0x2000
 
-/* First unused bit: 0x2000.  */
+/* First unused bit: 0x4000.  */
 
 /* The set of all bits that should be masked when single-stepping.  */
 #define CPU_INTERRUPT_SSTEP_MASK \
diff --git a/hw/apic.h b/hw/apic.h
index a62d83b..45598bd 100644
--- a/hw/apic.h
+++ b/hw/apic.h
@@ -18,6 +18,8 @@ void cpu_set_apic_tpr(DeviceState *s, uint8_t val);
 uint8_t cpu_get_apic_tpr(DeviceState *s);
 void apic_init_reset(DeviceState *s);
 void apic_sipi(DeviceState *s);
+void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip,
+   int access);
 
 /* pc.c */
 int cpu_is_bsp(CPUState *env);
diff --git a/hw/apic_common.c b/hw/apic_common.c
index 8373d79..588531b 100644
--- a/hw/apic_common.c
+++ b/hw/apic_common.c
@@ -68,6 +68,10 @@ uint8_t cpu_get_apic_tpr(DeviceState *d)
     return s ? s->tpr >> 4 : 0;
 }
 
+void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, int access)
+{
+}
+
 void apic_report_irq_delivered(int delivered)
 {
 apic_irq_delivered += delivered;
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 37dde79..92e9c87 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -482,6 +482,7 @@
 #define CPU_INTERRUPT_VIRQ  CPU_INTERRUPT_TGT_INT_0
 #define CPU_INTERRUPT_INIT  CPU_INTERRUPT_TGT_INT_1
 #define CPU_INTERRUPT_SIPI  CPU_INTERRUPT_TGT_INT_2
+#define CPU_INTERRUPT_TPR   CPU_INTERRUPT_TGT_INT_3
 
 
 enum {
@@ -772,6 +773,9 @@ typedef struct CPUX86State {
 XMMReg ymmh_regs[CPU_NB_REGS];
 
 uint64_t xcr0;
+
+target_ulong tpr_access_ip;
+int tpr_access_type;
 } CPUX86State;
 
 CPUX86State *cpu_x86_init(const char *cpu_model);
@@ -1064,4 +1068,9 @@ void svm_check_intercept(CPUState *env1, uint32_t type);
 
 uint32_t cpu_cc_compute_all(CPUState *env1, int op);
 
+#define TPR_ACCESS_READ     0
+#define TPR_ACCESS_WRITE    1
+
+void cpu_report_tpr_access(CPUState *env, int access);
+
 #endif /* CPU_I386_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 2586aff..eca20cd 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1189,6 +1189,25 @@ void cpu_x86_inject_mce(Monitor *mon, CPUState *cenv, int bank,
 }
 }
 }
+
+void cpu_report_tpr_access(CPUState *env, int access)
+{
+    TranslationBlock *tb;
+
+    if (kvm_enabled()) {
+        cpu_synchronize_state(env);
+
+        env->tpr_access_ip = env->eip;
+        env->tpr_access_type = access;
+
+        cpu_interrupt(env, CPU_INTERRUPT_TPR);
+    } else {
+        tb = tb_find_pc(env->mem_io_pc);
+        cpu_restore_state(tb, env, env->mem_io_pc);
+
+        apic_handle_tpr_access_report(env->apic_state, env->eip, access);
+    }
+}
 #endif /* !CONFIG_USER_ONLY */
 
 static void mce_init(CPUX86State *cenv)
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 981192d..fa77f9d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1635,8 +1635,10 @@ void kvm_arch_pre_run(CPUState *env, struct kvm_run *run)
 }
 
     if (!kvm_irqchip_in_kernel()) {
-        /* Force the VCPU out of its inner loop to process the INIT request */
-        if (env->interrupt_request & CPU_INTERRUPT_INIT) {
+        /* Force the VCPU out of its inner loop to process any INIT requests
+         * or pending TPR access reports. */
+        if (env->interrupt_request &
+            (CPU_INTERRUPT_INIT | CPU_INTERRUPT_TPR)) {
             env->exit_request = 1;
         }
 
@@ -1730,6 +1732,11 @@ int kvm_arch_process_async_events(CPUState *env)
         kvm_cpu_synchronize_state(env);
         do_cpu_sipi(env);
     }
+    if (env->interrupt_request & CPU_INTERRUPT_TPR) {
+        env->interrupt_request &= ~CPU_INTERRUPT_TPR;
+        apic_handle_tpr_access_report(env->apic_state,

[PATCH v2 0/8] uq/master: TPR access optimization for Windows guests

Here is v2 of the TPR access optimization. Changes:
 - plug race between patching and running VCPUs accessing the same TPR
   instruction by stopping VCPUs during patch process
 - implemented the forward/backward check in evaluate_tpr_instruction via
   a table but kept patch_instruction as is (too many variations for a
   table-driven approach)
 - dropped smp_cpus == 1 special case from get_kpcr_number
 - fixed comment why R/W ROM alias has to be page-aligned

The series is also available at

git://git.kiszka.org/qemu-kvm.git queues/kvm-tpr

Please review/apply.

CC: Paolo Bonzini pbonz...@redhat.com

Jan Kiszka (8):
  kvm: Set cpu_single_env only once
  Allow to use pause_all_vcpus from VCPU context
  target-i386: Add infrastructure for reporting TPR MMIO accesses
  kvmvapic: Add option ROM
  kvmvapic: Introduce TPR access optimization for Windows guests
  kvmvapic: Simplify mp/up_set_tpr
  optionsrom: Reserve space for checksum
  kvmvapic: Use optionrom helpers

 .gitignore                    |    1 +
 Makefile                      |    2 +-
 Makefile.target               |    3 +-
 cpu-all.h                     |    3 +-
 cpus.c                        |   13 +
 hw/apic.c                     |  126 ++-
 hw/apic.h                     |    2 +
 hw/apic_common.c              |   68 -
 hw/apic_internal.h            |   27 ++
 hw/kvm/apic.c                 |   32 ++
 hw/kvmvapic.c                 |  774 +
 kvm-all.c                     |    5 -
 pc-bios/optionrom/Makefile    |    2 +-
 pc-bios/optionrom/kvmvapic.S  |  335 ++
 pc-bios/optionrom/optionrom.h |    3 +-
 target-i386/cpu.h             |    9 +
 target-i386/helper.c          |   19 +
 target-i386/kvm.c             |   24 ++-
 18 files changed, 1423 insertions(+), 25 deletions(-)
 create mode 100644 hw/kvmvapic.c
 create mode 100644 pc-bios/optionrom/kvmvapic.S

-- 
1.7.3.4


[PATCH v2 6/8] kvmvapic: Simplify mp/up_set_tpr

The CH register is only written, never read. So we can remove these
operations and, in case of up_set_tpr, also the ECX push/pop.
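
Dropping the ECX push is also why the stack offset of the TPR argument in
up_set_tpr moves from 20 to 16. The arithmetic can be sketched as below; the
helper name is hypothetical and the sketch assumes 4-byte pushes in 32-bit
mode plus one slot for the return address.

```c
#include <stddef.h>

enum { STACK_SLOT = 4 }; /* bytes per push in 32-bit mode */

/* Offset from %esp to the first stack argument after `saved` pushes,
 * counting one extra slot for the return address. */
static size_t stack_arg_offset(size_t saved)
{
    return (saved + 1) * STACK_SLOT;
}
```

Before the change four values are saved (flags, EAX, ECX, EBX), which gives
stack_arg_offset(4) == 20; without ECX it is stack_arg_offset(3) == 16.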

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 pc-bios/optionrom/kvmvapic.S |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
index e1d8f18..856c1e5 100644
--- a/pc-bios/optionrom/kvmvapic.S
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -202,7 +202,6 @@ mp_isr_is_bigger:
mov %bh, %bl
 mp_tpr_is_bigger:
/* %bl = ppr */
-   mov %bl, %ch   /* ch = ppr */
rol $8, %ebx
/* now: %bl = irr, %bh = ppr */
cmp %bh, %bl
@@ -276,7 +275,6 @@ up_set_tpr_eax:
 up_set_tpr:
pushf
push %eax
-   push %ecx
push %ebx
reenable_vtpr
 
@@ -284,7 +282,7 @@ up_set_tpr_failed:
mov vapic, %eax ; fixup
 
mov %eax, %ebx
-   mov 20(%esp), %bl
+   mov 16(%esp), %bl
 
/* %ebx = new vapic (%bl = tpr, %bh = isr, %b3 = irr) */
 
@@ -298,7 +296,6 @@ up_isr_is_bigger:
mov %bh, %bl
 up_tpr_is_bigger:
/* %bl = ppr */
-   mov %bl, %ch   /* ch = ppr */
rol $8, %ebx
/* now: %bl = irr, %bh = ppr */
cmp %bh, %bl
@@ -306,7 +303,6 @@ up_tpr_is_bigger:
 
 up_set_tpr_out:
pop %ebx
-   pop %ecx
pop %eax
popf
ret $4
-- 
1.7.3.4


[PATCH v2 4/8] kvmvapic: Add option ROM

This imports and builds the original VAPIC option ROM of qemu-kvm.
Its interaction with QEMU is described in the commit that introduces the
corresponding device model.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 .gitignore                   |    1 +
 Makefile                     |    2 +-
 pc-bios/optionrom/Makefile   |    2 +-
 pc-bios/optionrom/kvmvapic.S |  341 ++
 4 files changed, 344 insertions(+), 2 deletions(-)
 create mode 100644 pc-bios/optionrom/kvmvapic.S

diff --git a/.gitignore b/.gitignore
index f5aab2c..d3b78c3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -75,6 +75,7 @@ pc-bios/vgabios-pq/status
 pc-bios/optionrom/linuxboot.bin
 pc-bios/optionrom/multiboot.bin
 pc-bios/optionrom/multiboot.raw
+pc-bios/optionrom/kvmvapic.bin
 .stgit-*
 cscope.*
 tags
diff --git a/Makefile b/Makefile
index 47acf3d..c2ef135 100644
--- a/Makefile
+++ b/Makefile
@@ -255,7 +255,7 @@ pxe-e1000.rom pxe-eepro100.rom pxe-ne2k_pci.rom \
 pxe-pcnet.rom pxe-rtl8139.rom pxe-virtio.rom \
 bamboo.dtb petalogix-s3adsp1800.dtb petalogix-ml605.dtb \
 mpc8544ds.dtb \
-multiboot.bin linuxboot.bin \
+multiboot.bin linuxboot.bin kvmvapic.bin \
 s390-zipl.rom \
 spapr-rtas.bin slof.bin \
 palcode-clipper
diff --git a/pc-bios/optionrom/Makefile b/pc-bios/optionrom/Makefile
index 2caf7e6..f6b4027 100644
--- a/pc-bios/optionrom/Makefile
+++ b/pc-bios/optionrom/Makefile
@@ -14,7 +14,7 @@ CFLAGS += -I$(SRC_PATH)
 CFLAGS += $(call cc-option, $(CFLAGS), -fno-stack-protector)
 QEMU_CFLAGS = $(CFLAGS)
 
-build-all: multiboot.bin linuxboot.bin
+build-all: multiboot.bin linuxboot.bin kvmvapic.bin
 
 # suppress auto-removal of intermediate files
 .SECONDARY:
diff --git a/pc-bios/optionrom/kvmvapic.S b/pc-bios/optionrom/kvmvapic.S
new file mode 100644
index 0000000..e1d8f18
--- /dev/null
+++ b/pc-bios/optionrom/kvmvapic.S
@@ -0,0 +1,341 @@
+#
+# Local APIC acceleration for Windows XP and related guests
+#
+# Copyright 2011 Red Hat, Inc. and/or its affiliates
+#
+# Author: Avi Kivity a...@redhat.com
+#
+# This work is licensed under the terms of the GNU GPL, version 2, or (at your
+# option) any later version.  See the COPYING file in the top-level directory.
+#
+
+   .text 0
+   .code16
+.global _start
+_start:
+   .short 0xaa55
+   .byte (_end - _start) / 512
+   # clear vapic area: firmware load using rep insb may cause
+   # stale tpr/isr/irr data to corrupt the vapic area.
+   push %es
+   push %cs
+   pop %es
+   xor %ax, %ax
+   mov $vapic_size/2, %cx
+   lea vapic, %di
+   cld
+   rep stosw
+   pop %es
+   mov $vapic_base, %ax
+   out %ax, $0x7e
+   lret
+
+   .code32
+vapic_size = 2*4096
+
+.macro fixup delta=-4
+777:
+   .text 1
+   .long 777b + \delta  - vapic_base
+   .text 0
+.endm
+
+.macro reenable_vtpr
+   out %al, $0x7e
+.endm
+
+.text 1
+   fixup_start = .
+.text 0
+
+.align 16
+
+vapic_base:
+   .ascii "kvm aPiC"
+
+   /* relocation data */
+   .long vapic_base; fixup
+   .long fixup_start   ; fixup
+   .long fixup_end ; fixup
+
+   .long vapic ; fixup
+   .long vapic_size
+vcpu_shift:
+   .long 0
+real_tpr:
+   .long 0
+   .long up_set_tpr; fixup
+   .long up_set_tpr_eax; fixup
+   .long up_get_tpr_eax; fixup
+   .long up_get_tpr_ecx; fixup
+   .long up_get_tpr_edx; fixup
+   .long up_get_tpr_ebx; fixup
+   .long 0 /* esp. won't work. */
+   .long up_get_tpr_ebp; fixup
+   .long up_get_tpr_esi; fixup
+   .long up_get_tpr_edi; fixup
+   .long up_get_tpr_stack  ; fixup
+   .long mp_set_tpr; fixup
+   .long mp_set_tpr_eax; fixup
+   .long mp_get_tpr_eax; fixup
+   .long mp_get_tpr_ecx; fixup
+   .long mp_get_tpr_edx; fixup
+   .long mp_get_tpr_ebx; fixup
+   .long 0 /* esp. won't work. */
+   .long mp_get_tpr_ebp; fixup
+   .long mp_get_tpr_esi; fixup
+   .long mp_get_tpr_edi; fixup
+   .long mp_get_tpr_stack  ; fixup
+
+.macro kvm_hypercall
+   .byte 0x0f, 0x01, 0xc1
+.endm
+
+kvm_hypercall_vapic_poll_irq = 1
+
+pcr_cpu = 0x51
+
+.align 64
+
+mp_get_tpr_eax:
+   pushf
+   cli
+   reenable_vtpr
+   push %ecx
+
+   fs/movzbl pcr_cpu, %eax
+
+   mov vcpu_shift, %ecx; fixup
+   shl %cl, %eax
+   testb $1, vapic+4(%eax) ; fixup delta=-5
+   jz mp_get_tpr_bad
+   movzbl vapic(%eax), %eax ; fixup
+
+mp_get_tpr_out:
+   pop %ecx
+   popf
+   ret
+
+mp_get_tpr_bad:
+   mov real_tpr, %eax  ; fixup
+   mov (%eax), %eax
+   jmp mp_get_tpr_out
+
+mp_get_tpr_ebx:
+   mov %eax, %ebx
+   call mp_get_tpr_eax
+   xchg %eax, %ebx
+   ret
+
+mp_get_tpr_ecx:
+   mov %eax, %ecx
+   call mp_get_tpr_eax
+   xchg %eax, %ecx
+   ret
+
+mp_get_tpr_edx:
+   mov %eax, %edx

[PATCH v2 1/8] kvm: Set cpu_single_env only once

As we have thread-local cpu_single_env now and KVM uses exactly one
thread per VCPU, we can drop the cpu_single_env updates from the loop
and initialize this variable only once during setup.
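
The pattern can be illustrated with a small model. This is a sketch only: it
uses C11 _Thread_local instead of QEMU's own TLS macros, and the names
current_cpu, vcpu_thread_setup() and vcpu_exec_once() are hypothetical.

```c
#include <stddef.h>

struct cpu_state {
    int index;
};

/* One instance per thread, so no other thread can observe or clobber it. */
static _Thread_local struct cpu_state *current_cpu;

/* Runs once at the start of the VCPU thread. */
static void vcpu_thread_setup(struct cpu_state *env)
{
    current_cpu = env;
}

/* Stand-in for one pass through the execution loop: the pointer is neither
 * set nor cleared here any more, it is simply expected to be valid. */
static int vcpu_exec_once(struct cpu_state *env)
{
    return current_cpu == env ? 0 : -1;
}
```

Because each thread owns exactly one VCPU, the value assigned during setup
stays correct for the lifetime of the thread.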

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 cpus.c    |    1 +
 kvm-all.c |    5 -
 2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/cpus.c b/cpus.c
index f45a438..d0c8340 100644
--- a/cpus.c
+++ b/cpus.c
@@ -714,6 +714,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     qemu_mutex_lock(&qemu_global_mutex);
     qemu_thread_get_self(env->thread);
     env->thread_id = qemu_get_thread_id();
+    cpu_single_env = env;
 
     r = kvm_init_vcpu(env);
     if (r < 0) {
diff --git a/kvm-all.c b/kvm-all.c
index c4babda..e2cbc03 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1118,8 +1118,6 @@ int kvm_cpu_exec(CPUState *env)
         return EXCP_HLT;
     }
 
-    cpu_single_env = env;
-
     do {
         if (env->kvm_vcpu_dirty) {
             kvm_arch_put_registers(env, KVM_PUT_RUNTIME_STATE);
@@ -1136,13 +1134,11 @@ int kvm_cpu_exec(CPUState *env)
              */
             qemu_cpu_kick_self();
         }
-        cpu_single_env = NULL;
         qemu_mutex_unlock_iothread();
 
         run_ret = kvm_vcpu_ioctl(env, KVM_RUN, 0);
 
         qemu_mutex_lock_iothread();
-        cpu_single_env = env;
         kvm_arch_post_run(env, run);
 
         kvm_flush_coalesced_mmio_buffer();
@@ -1206,7 +1202,6 @@ int kvm_cpu_exec(CPUState *env)
     }
 
     env->exit_request = 0;
-    cpu_single_env = NULL;
     return ret;
 }
 
-- 
1.7.3.4


[PATCH v2 7/8] optionsrom: Reserve space for checksum

Always add a byte before the final 512-byte alignment to reserve the
space for the ROM checksum.
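
To see why the extra byte matters, consider a payload that is already an
exact multiple of 512 bytes: without the reserved byte there is no free
location left for the checksum. The sketch below models this. It assumes the
usual option ROM rule that all bytes must sum to zero modulo 256 and that the
checksum is stored in the last byte; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { ROM_BLOCK = 512 };

/* Size after appending one reserved byte and padding to a 512-byte multiple. */
static size_t rom_padded_size(size_t payload)
{
    return (payload + 1 + ROM_BLOCK - 1) / ROM_BLOCK * ROM_BLOCK;
}

/* Sum of all bytes modulo 256. */
static uint8_t rom_sum(const uint8_t *rom, size_t size)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < size; i++) {
        sum = (uint8_t)(sum + rom[i]);
    }
    return sum;
}

/* Store in the last byte the value that makes the whole image sum to zero. */
static void rom_sign(uint8_t *rom, size_t size)
{
    rom[size - 1] = 0;
    rom[size - 1] = (uint8_t)(0u - rom_sum(rom, size));
}
```

A 512-byte payload grows to 1024 bytes, while a 511-byte payload still fits
into a single 512-byte block.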

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 pc-bios/optionrom/optionrom.h |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/pc-bios/optionrom/optionrom.h b/pc-bios/optionrom/optionrom.h
index aa783de..3daf7da 100644
--- a/pc-bios/optionrom/optionrom.h
+++ b/pc-bios/optionrom/optionrom.h
@@ -124,7 +124,8 @@
    movw %ax, %ds;
 
 #define OPTION_ROM_END \
-.align 512, 0; \
+   .byte   0;  \
+   .align  512, 0; \
 _end:
 
 #define BOOT_ROM_END   \
-- 
1.7.3.4


Re: [PATCH v3 1/3] KVM: PPC: epapr: Factor out the epapr init

2012-02-10 Thread Scott Wood

On 02/10/2012 04:02 AM, Liu Yu wrote:
> from the kvm guest paravirt init code.
>
> Signed-off-by: Liu Yu yu@freescale.com
> ---
> v3:
> apply the epapr init for all ppc platform
>
>  arch/powerpc/Kconfig                    |    4 +++
>  arch/powerpc/include/asm/epapr_hcalls.h |    8 +
>  arch/powerpc/kernel/Makefile            |    1 +
>  arch/powerpc/kernel/epapr_para.c        |   46 +++
>  arch/powerpc/kernel/kvm.c               |   13 +++--
>  arch/powerpc/kvm/Kconfig                |    1 +
>  6 files changed, 64 insertions(+), 9 deletions(-)
>  create mode 100644 arch/powerpc/kernel/epapr_para.c
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 47682b6..00bd508 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -196,6 +196,10 @@ config EPAPR_BOOT
>  	  Used to allow a board to specify it wants an ePAPR compliant wrapper.
>  	default n
>
> +config EPAPR_PARA
> +	bool
> +	default n

EPAPR_PARAVIRT

>  config DEFAULT_UIMAGE
>  	bool
>  	help
> diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h
> index f3b0c2c..c4b86e4 100644
> --- a/arch/powerpc/include/asm/epapr_hcalls.h
> +++ b/arch/powerpc/include/asm/epapr_hcalls.h
> @@ -148,6 +148,14 @@
>  #define EV_HCALL_CLOBBERS2 EV_HCALL_CLOBBERS3, "r5"
>  #define EV_HCALL_CLOBBERS1 EV_HCALL_CLOBBERS2, "r4"
>
> +extern u32 *epapr_hcall_insts;
> +extern int epapr_hcall_insts_len;
> +
> +static inline void epapr_get_hcall_insts(u32 **instp, int *lenp)
> +{
> +	*instp = epapr_hcall_insts;
> +	*lenp = epapr_hcall_insts_len;
> +}

Why do we need a function for this?  Why is the public interface
anything other than invoke a hypercall?

> +static int __init epapr_para_init(void)
> +{
> +	struct device_node *hyper_node;
> +	u32 *insts;
> +	int len;
> +
> +	hyper_node = of_find_node_by_path("/hypervisor");
> +	if (!hyper_node)
> +		return -ENODEV;
> +
> +	insts = (u32*)of_get_property(hyper_node, "hcall-instructions", &len);

Do not cast away that const.

> @@ -535,18 +536,12 @@ EXPORT_SYMBOL_GPL(kvm_hypercall);
>  static int kvm_para_setup(void)
>  {
>  	extern u32 kvm_hypercall_start;
> -	struct device_node *hyper_node;
>  	u32 *insts;
>  	int len, i;
>
> -	hyper_node = of_find_node_by_path("/hypervisor");
> -	if (!hyper_node)
> -		return -1;
> -
> -	insts = (u32*)of_get_property(hyper_node, "hcall-instructions", &len);
> -	if (len % 4)
> -		return -1;
> -	if (len > (4 * 4))
> +	insts = epapr_hcall_insts;
> +	len = epapr_hcall_insts_len;
> +	if (insts == NULL)
>  		return -1;
>
>  	for (i = 0; i < (len / 4); i++)

Why are you still doing the patching inside kvm.c?

-Scott


Re: [PATCH v3 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest

2012-02-10 Thread Scott Wood

On 02/10/2012 04:02 AM, Liu Yu wrote:
> +_GLOBAL(epapr_ev_idle)
> +epapr_ev_idle:
> +#ifdef CONFIG_E500
> + rlwinm  r3,r1,0,0,31-THREAD_SHIFT   /* current thread_info */
> + lwz r4,TI_LOCAL_FLAGS(r3)   /* set napping bit */
> + ori r4,r4,_TLF_NAPPING  /* so when we take an exception */
> + stw r4,TI_LOCAL_FLAGS(r3)   /* it will return to our caller */
> +#endif
> + wrteei  1

On what hardware would you not need to use _TLF_NAPPING?

-Scott


Re: WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()

On Fri, 2012-02-10 at 15:36 +0530, Srivatsa S. Bhat wrote:
> > [   32.448626] ------------[ cut here ]------------
> > [   32.449160] WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()
> > [   32.449621] Pid: 1, comm: init_stage2 Not tainted 3.2.0+ #14
> > [   32.449621] Call Trace:
> > [   32.449621]  <IRQ>  [<ffffffff81041a44>] ? native_smp_send_reschedule+0x25/0x43
> > [   32.449621]  [<ffffffff810735b2>] warn_slowpath_common+0x7b/0x93
> > [   32.449621]  [<ffffffff810962cc>] ? tick_nohz_handler+0xc9/0xc9
> > [   32.449621]  [<ffffffff81073675>] warn_slowpath_null+0x15/0x18
> > [   32.449621]  [<ffffffff81041a44>] native_smp_send_reschedule+0x25/0x43
> > [   32.449621]  [<ffffffff81067a00>] smp_send_reschedule+0xa/0xc
> > [   32.449621]  [<ffffffff8106f25e>] scheduler_tick+0x21a/0x242
> > [   32.449621]  [<ffffffff8107da10>] update_process_times+0x62/0x73
> > [   32.449621]  [<ffffffff81096336>] tick_sched_timer+0x6a/0x8a
> > [   32.449621]  [<ffffffff8108c5eb>] __run_hrtimer.clone.26+0x55/0xcb
> > [   32.449621]  [<ffffffff8108cd77>] hrtimer_interrupt+0xcb/0x19b
> > [   32.449621]  [<ffffffff810428a8>] smp_apic_timer_interrupt+0x72/0x85
> > [   32.449621]  [<ffffffff8165a8de>] apic_timer_interrupt+0x6e/0x80
> > [   32.449621]  <EOI>  [<ffffffff8165928e>] ? _raw_spin_unlock_irqrestore+0x3a/0x3e
> > [   32.449621]  [<ffffffff81042f4e>] ? arch_local_irq_restore+0x6/0xd
> > [   32.449621]  [<ffffffff810430c4>] default_send_IPI_mask_allbutself_phys+0x78/0x88
> > [   32.449621]  [<ffffffff8106c3c4>] ? __migrate_task+0xf1/0xf1
> > [   32.449621]  [<ffffffff81045445>] physflat_send_IPI_allbutself+0x12/0x14
> > [   32.449621]  [<ffffffff81041aaf>] native_stop_other_cpus+0x4d/0xa8
> > [   32.449621]  [<ffffffff810411c6>] native_machine_shutdown+0x56/0x6d
> > [   32.449621]  [<ffffffff81048499>] kvm_shutdown+0x1a/0x1c
> > [   32.449621]  [<ffffffff810411f9>] machine_shutdown+0xa/0xc
> > [   32.449621]  [<ffffffff81041265>] native_machine_restart+0x20/0x32
> > [   32.449621]  [<ffffffff81041297>] machine_restart+0xa/0xc
> > [   32.449621]  [<ffffffff81081d53>] kernel_restart+0x49/0x4d
> > [   32.449621]  [<ffffffff81081f26>] sys_reboot+0x14b/0x18a
> > [   32.449621]  [<ffffffff81089937>] ? remove_wait_queue+0x4c/0x51
> > [   32.449621]  [<ffffffff8107637f>] ? do_wait+0x1a4/0x1e7
> > [   32.449621]  [<ffffffff8107735a>] ? sys_wait4+0xa8/0xbc
> > [   32.449621]  [<ffffffff8107522b>] ? clear_tsk_thread_flag+0xf/0xf
> > [   32.449621]  [<ffffffff81659a25>] ? async_page_fault+0x25/0x30
> > [   32.449621]  [<ffffffff81659e92>] system_call_fastpath+0x16/0x1b
> > [   32.449621] ---[ end trace d0f03651493fd3d6 ]---

OK, so a 'modern' kernel does it slightly different and I've no idea
what exactly goes wrong in your vintage version. But I can see the
current stuff going at it all wrong.

What seems to happen is that native_nmi_stop_other_cpus() NMI broadcasts
for smp_stop_nmi_callback()->stop_this_cpu(). Which without any
serialization whatsoever marks all remote CPUs offline and calls halt
with IRQs disabled - dead.

While we're waiting for this all to complete, the scheduler tries to
no_hz load-balance and kick a cpu it thinks is still around and we get
the above splat because the NMI just marked it offline without telling
anybody about it.

Now, arguably you don't want to go through the whole hotplug crap to
shut down your machine, esp not on panic, but clearing the online state
without telling anybody about it is bound to lead to these things.

No immediate solution comes to mind...


Re: WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()

On Fri, 2012-02-10 at 19:58 +0100, Peter Zijlstra wrote:
> OK, so a 'modern' kernel does it slightly different and I've no idea
> what exactly goes wrong in your vintage version. But I can see the
> current stuff going at it all wrong.
>
> What seems to happen is that native_nmi_stop_other_cpus() NMI broadcasts
> for smp_stop_nmi_callback()->stop_this_cpu(). Which without any
> serialization whatsoever marks all remote CPUs offline and calls halt
> with IRQs disabled - dead.
>
> While we're waiting for this all to complete, the scheduler tries to
> no_hz load-balance and kick a cpu it thinks is still around and we get
> the above splat because the NMI just marked it offline without telling
> anybody about it.
>
> Now, arguably you don't want to go through the whole hotplug crap to
> shut down your machine, esp not on panic, but clearing the online state
> without telling anybody about it is bound to lead to these things.
>
> No immediate solution comes to mind...

Don, any reason you wait for the NMI broadcast to complete with IRQs
enabled? If you disable IRQs before the broadcast the interrupt can't
happen and should side-step this particular problem.

It's not like we have 'latency' issues on this path :-)


Re: WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()

2012-02-10 Thread Don Zickus

On Fri, Feb 10, 2012 at 08:03:53PM +0100, Peter Zijlstra wrote:
> On Fri, 2012-02-10 at 19:58 +0100, Peter Zijlstra wrote:
> > OK, so a 'modern' kernel does it slightly different and I've no idea
> > what exactly goes wrong in your vintage version. But I can see the
> > current stuff going at it all wrong.
> >
> > What seems to happen is that native_nmi_stop_other_cpus() NMI broadcasts
> > for smp_stop_nmi_callback()->stop_this_cpu(). Which without any
> > serialization whatsoever marks all remote CPUs offline and calls halt
> > with IRQs disabled - dead.
> >
> > While we're waiting for this all to complete, the scheduler tries to
> > no_hz load-balance and kick a cpu it thinks is still around and we get
> > the above splat because the NMI just marked it offline without telling
> > anybody about it.
> >
> > Now, arguably you don't want to go through the whole hotplug crap to
> > shut down your machine, esp not on panic, but clearing the online state
> > without telling anybody about it is bound to lead to these things.
> >
> > No immediate solution comes to mind...
>
> Don, any reason you wait for the NMI broadcast to complete with IRQs
> enabled? If you disable IRQs before the broadcast the interrupt can't
> happen and should side-step this particular problem.

Well I believe the old way had the same problem using the REBOOT_IRQ as
opposed to NMI.  I also don't know how to shutdown interrupts system wide
without just broadcasting an IRQ to locally disable interrupts.

 
> It's not like we have 'latency' issues on this path :-)

Heh.  Oddly I was writing the changelog for a patch that kinda changes
this path to sorta revert back to the old way of using a REBOOT_IRQ with
an NMI follow-on when the IRQ fails.

Originally, I wanted to make sure the cpus were shutdown immediately so we
can serialize the panic path hence the original change.

I also ran into the same problem you did and hacked up another patch that
checked a global atomic variable that let the system know we were shutting
down and not to do the WARN_ON (the global is already created for the NMI
case now).

I'll try to post that soon once I finish my long winded changelog.

Though it kinda addresses your issue, I'm not sure it does it in a way
that will satisfy you.  But I look forward to the discussion. :-)

Cheers,
Don

Re: WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()

On Fri, 2012-02-10 at 15:02 -0500, Don Zickus wrote:
> I also ran into the same problem you did and hacked up another patch that
> checked a global atomic variable that let the system know we were shutting
> down and not to do the WARN_ON (the global is already created for the NMI
> case now).

system_state seems like that thing.. 


Re: WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()

2012-02-10 Thread Don Zickus

On Fri, Feb 10, 2012 at 09:18:41PM +0100, Peter Zijlstra wrote:
> On Fri, 2012-02-10 at 15:02 -0500, Don Zickus wrote:
> > I also ran into the same problem you did and hacked up another patch that
> > checked a global atomic variable that let the system know we were shutting
> > down and not to do the WARN_ON (the global is already created for the NMI
> > case now).
>
> system_state seems like that thing..

except it doesn't seem to have a PANIC state, though we could add one I
suppose.

The thing is even if you reverted my changes:

e58d429 x86, reboot: Fix typo in nmi reboot path
bda6263 x86, NMI: Add knob to disable using NMI IPIs to stop cpus
3603a25 x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus

I think you still run into the same problem because the reschedule code
changed.

So my second patch which I will eventually post will just skip the WARN_ON
if the system is going down.  Not sure if that is the proper way to address
this problem or change all of the stop_this_cpu code to use a different
bitmask than the cpu_online bitmask (but then you run the risk of a stuck
IPI I guess if the cpu is halted without notifying anyone).
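
A minimal model of the first option (skip the warning while going down) might
look like the following. This is only an illustration and not the patch
itself; every name in it is a hypothetical stand-in for the corresponding
kernel state.

```c
#include <stdbool.h>

enum { NR_CPUS_MODEL = 4 };

static bool cpu_online_model[NR_CPUS_MODEL]; /* stand-in for cpu_online */
static bool shutting_down;                   /* stand-in for the global flag */
static int warnings;                         /* counts would-be WARN_ON hits */
static int ipis_sent;                        /* counts reschedule IPIs */

/* Model of a reschedule kick: an offline target only triggers the warning
 * when the system is not already on its way down. */
static void send_reschedule_model(int cpu)
{
    if (!cpu_online_model[cpu]) {
        if (!shutting_down) {
            warnings++;
        }
        return;
    }
    ipis_sent++;
}
```

The kick to an offline cpu is still dropped in both cases; only the noise is
suppressed once the shutdown flag is set.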

Cheers,
Don
 

Re: WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()

On Fri, 2012-02-10 at 15:31 -0500, Don Zickus wrote:
> So my second patch which I will eventually post will just skip the WARN_ON
> if the system is going down.  Not sure if that is the proper way to address
> this problem or change all of the stop_this_cpu code to use a different
> bitmask than the cpu_online bitmask (but then you run the risk of a stuck
> IPI I guess if the cpu is halted without notifying anyone).

Yeah, the async hard kill of all cpus is bound to make problems.. what
I'm wondering is, why is this in the normal shutdown path and not
specific to a hard panic?

Trying to make this work is just not going to be pretty, and in the
panic case we really don't care much.


Re: x86: kvmclock: abstract save/restore sched_clock_state

2012-02-10 Thread Igor Mammedov


I have to revoke my ack and NAK this patch. The patch itself goes in the
right direction, but the clock restore happens too late.

With the patch I used to hunt down the overflow for cpu hotplug
(maybe it would be better to have it in the kernel, which would help to detect
 the overflow problem without spending a lot of time debugging it?):

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 42eb330..0081e10 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -43,7 +43,10 @@ void pvclock_set_flags(u8 flags)

 static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
 {
-   u64 delta = native_read_tsc() - shadow->tsc_timestamp;
+   u64 tsc = native_read_tsc();
+   u64 shadow_timestamp = shadow->tsc_timestamp;
+   u64 delta = tsc - shadow_timestamp;
+   BUG_ON(tsc < shadow_timestamp);
    return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
                               shadow->tsc_shift);
 }
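
The condition the BUG_ON catches is plain unsigned wraparound: if the TSC
value read is smaller than the shadow timestamp, the 64-bit subtraction
yields a huge delta instead of a negative one. A standalone sketch (the
function names are hypothetical and not part of the kernel):

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned difference, as computed for the clock delta. */
static uint64_t tsc_delta(uint64_t tsc, uint64_t shadow_timestamp)
{
    return tsc - shadow_timestamp;
}

/* True when the subtraction above wraps around. */
static bool tsc_delta_wrapped(uint64_t tsc, uint64_t shadow_timestamp)
{
    return tsc < shadow_timestamp;
}
```

For example, tsc_delta(90, 100) is UINT64_MAX - 9 rather than -10, and such a
value is then scaled into a bogus nanosecond offset.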

I got the following backtrace:

#29 0xffffffff81004925 in oops_end (flags=70, regs=<optimized out>, signr=11)
    at arch/x86/kernel/dumpstack.c:246
#30 0xffffffff81004a77 in die (str=0xffffffff8152ca7f "invalid opcode", regs=0xffff8800149859e8, err=0)
    at arch/x86/kernel/dumpstack.c:305
#31 0xffffffff81001fed in do_trap (trapnr=6, signr=4, str=0xffffffff8152ca7f "invalid opcode",
    regs=0xffff8800149859e8, error_code=0, info=0xffff880014985948) at arch/x86/kernel/traps.c:168
#32 0xffffffff810021f7 in do_invalid_op (regs=0xffff8800149859e8, error_code=0)
    at arch/x86/kernel/traps.c:209
#33 <signal handler called>
#34 pvclock_get_nsec_offset (shadow=0xffff880014985a98) at arch/x86/kernel/pvclock.c:51
#35 pvclock_clocksource_read (src=0xffff88001f1d1ac0) at arch/x86/kernel/pvclock.c:107
#36 0xffffffff8101aa97 in kvm_clock_read () at arch/x86/kernel/kvmclock.c:79
#37 0xffffffff810087cc in paravirt_sched_clock ()
    at /home/imammedov/fc15/linux-2.6/arch/x86/include/asm/paravirt.h:230
#38 sched_clock () at arch/x86/kernel/tsc.c:71
#39 0xffffffff8104f025 in sched_clock_local (scd=0xffff88001f1d2600) at kernel/sched/clock.c:147
#40 0xffffffff8104f1b7 in sched_clock_cpu (cpu=0) at kernel/sched/clock.c:232
#41 0xffffffff8104f1e5 in local_clock () at kernel/sched/clock.c:316
#42 0xffffffff810699d2 in lockstat_clock () at kernel/lockdep.c:158
#43 __lock_acquire (lock=0xffffffff816131b8, subclass=<optimized out>, trylock=0, read=0, check=2,
    hardirqs_off=<optimized out>, nest_lock=0x0, ip=18446744071578924098, references=0)
    at kernel/lockdep.c:3098
#44 0xffffffff8106a924 in lock_acquire (lock=0xffffffff816131b8, subclass=0, trylock=0, read=0, check=2,
    nest_lock=0x0, ip=18446744071578924098) at kernel/lockdep.c:3555
#45 0xffffffff8136b1c3 in __raw_spin_lock (lock=0xffffffff816131a0)
    at include/linux/spinlock_api_smp.h:143
#46 _raw_spin_lock (lock=0xffffffff816131a0) at kernel/spinlock.c:137
#47 0xffffffff81013442 in prepare_set () at arch/x86/kernel/cpu/mtrr/generic.c:676
#48 0xffffffff8101368c in generic_set_all () at arch/x86/kernel/cpu/mtrr/generic.c:723
#49 0xffffffff8101258b in mtrr_bp_restore () at arch/x86/kernel/cpu/mtrr/main.c:738
#50 0xffffffff812cabfa in __restore_processor_state (ctxt=<optimized out>)
    at arch/x86/power/cpu.c:227
#51 restore_processor_state () at arch/x86/power/cpu.c:233
#52 0xffffffff8105a1b6 in create_image (platform_mode=0) at kernel/power/hibernate.c:291
#53 hibernation_snapshot (platform_mode=0) at kernel/power/hibernate.c:363
#54 0xffffffff8105a825 in hibernate () at kernel/power/hibernate.c:629
#55 0xffffffff81058868 in state_store (kobj=<optimized out>, attr=<optimized out>,
    buf=0xffff88001366e000 <Address 0xffff88001366e000 out of bounds>, n=5)
    at kernel/power/main.c:284
#56 0xffffffff811e6cc3 in kobj_attr_store (kobj=<optimized out>, attr=<optimized out>,
    buf=<optimized out>, count=<optimized out>) at lib/kobject.c:699
#57 0xffffffff811372c2 in flush_write_buffer (count=<optimized out>, buffer=<optimized out>,
    dentry=<optimized out>) at fs/sysfs/file.c:202
#58 sysfs_write_file (file=<optimized out>, buf=0x7f9ce710d000 <Address 0x7f9ce710d000 out of bounds>,
    count=<optimized out>, ppos=0xffff880014985f58) at fs/sysfs/file.c:236
#59 0xffffffff810ebfbe in vfs_write (file=0xffff88001cb48000,
    buf=0x7f9ce710d000 <Address 0x7f9ce710d000 out of bounds>,
    count=<optimized out>, pos=0xffff880014985f58) at fs/read_write.c:435
#60 0xffffffff810ec175 in sys_write (fd=<optimized out>,
    buf=0x7f9ce710d000 <Address 0x7f9ce710d000 out of bounds>,
    count=<optimized out>) at fs/read_write.c:487

The backtrace shows that the clock is accessed right before
x86_platform.restore_sched_clock_state is called.

Moving x86_platform.restore_sched_clock_state before mtrr_bp_restore solves
the issue. With this change the BUG_ON did not trigger in 20 save/restore
cycles; without it, the BUG_ON triggers on the 2nd or 3rd cycle.
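
For readers following along, the arithmetic behind the BUG_ON can be shown with a minimal userspace sketch. This is not kernel code: delta_unchecked() and delta_checked() are hypothetical names, and plain uint64_t stands in for the kernel's u64. It only illustrates that an unsigned subtraction wraps to a huge value when the TSC reading is smaller than a stale shadow timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the unguarded subtraction: wraps around if tsc < shadow_timestamp. */
uint64_t delta_unchecked(uint64_t tsc, uint64_t shadow_timestamp)
{
	return tsc - shadow_timestamp;
}

/*
 * Hypothetical guarded variant: reports a stale shadow timestamp instead of
 * returning a wrapped value. Returns true and stores the delta on success.
 */
bool delta_checked(uint64_t tsc, uint64_t shadow_timestamp, uint64_t *delta)
{
	assert(delta != NULL);

	if (tsc < shadow_timestamp)
		return false;	/* the condition the BUG_ON above catches */

	*delta = tsc - shadow_timestamp;
	return true;
}
```

With a stale timestamp the unchecked delta is close to 2^64, which is what turns a small ordering problem in the restore path into a visible clock overflow.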


BTW Amit,
your config doesn't have CONFIG_KVM_GUEST set, which causes primary cpu clock 
to be

Re: WARNING: at arch/x86/kernel/smp.c:119 native_smp_send_reschedule+0x25/0x43()

2012-02-10 Thread Don Zickus

On Fri, Feb 10, 2012 at 09:36:03PM +0100, Peter Zijlstra wrote:
> On Fri, 2012-02-10 at 15:31 -0500, Don Zickus wrote:
> > So my second patch which I will eventually post will just skip the WARN_ON
> > if the system is going down.  Not sure if that is the proper way to address
> > this problem or change all of the stop_this_cpu code to use a different
> > bitmask than the cpu_online bitmask (but then you run the risk of a stuck
> > IPI I guess if the cpu is halted without notifying anyone).
>
> Yeah, the async hard kill of all cpus is bound to make problems.. what
> I'm wondering is, why is this in the normal shutdown path and not
> specific to a hard panic?

I didn't write the original code, I just changed it from REBOOT_IRQ to
NMI and left all the stop_this_cpu stuff alone.

 
> Trying to make this work is just not going to be pretty, and in the
> panic case we really don't care much.

Sure.

Cheers,
Don

[PATCH RFC v2 1/3] KVM: Guard mmu_notifier specific code with CONFIG_MMU_NOTIFIER

2012-02-10 Thread Marc Zyngier

In order to avoid compilation failure when KVM is not compiled in,
guard the mmu_notifier specific sections with both CONFIG_MMU_NOTIFIER
and KVM_ARCH_WANT_MMU_NOTIFIER, like it is being done in the rest of
the KVM code.

Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
---
 include/linux/kvm_host.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 900c763..a596b47 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -287,7 +287,7 @@ struct kvm {
        struct hlist_head irq_ack_notifier_list;
 #endif
 
-#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
        struct mmu_notifier mmu_notifier;
        unsigned long mmu_notifier_seq;
        long mmu_notifier_count;
@@ -695,7 +695,7 @@ struct kvm_stats_debugfs_item {
 extern struct kvm_stats_debugfs_item debugfs_entries[];
 extern struct dentry *kvm_debugfs_dir;
 
-#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
 static inline int mmu_notifier_retry(struct kvm_vcpu *vcpu, unsigned long mmu_seq)
 {
        if (unlikely(vcpu->kvm->mmu_notifier_count))
-- 
1.7.3.4
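
As an illustration of the double guard used in this patch, here is a minimal userspace sketch (not kernel code). The two macro names come from the patch; struct vm_sketch and vm_sketch_notifier_retry() are hypothetical stand-ins, and the stub in the #else branch is an assumption added so that the sketch compiles whichever way the macros are set.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the Kconfig symbol and the arch opt-in macro from the patch. */
#define CONFIG_MMU_NOTIFIER 1
#define KVM_ARCH_WANT_MMU_NOTIFIER

/* Hypothetical miniature of a VM structure; not the kernel's struct kvm. */
struct vm_sketch {
	int id;
#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
	unsigned long mmu_notifier_seq;
	long mmu_notifier_count;
#endif
};

#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
/* Returns 1 when an invalidation is in flight or happened since seq. */
int vm_sketch_notifier_retry(const struct vm_sketch *vm, unsigned long seq)
{
	assert(vm != NULL);

	if (vm->mmu_notifier_count)
		return 1;
	if (vm->mmu_notifier_seq != seq)
		return 1;
	return 0;
}
#else
/* Assumed fallback: without notifiers there is never anything to retry. */
int vm_sketch_notifier_retry(const struct vm_sketch *vm, unsigned long seq)
{
	(void)vm;
	(void)seq;
	return 0;
}
#endif
```

The point of testing both macros is that the fields and the helper only exist when the notifier infrastructure is configured in and the architecture has asked for it; removing either #define above switches the sketch to the stub.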


[PATCH RFC v2 2/3] ARM: KVM: mark the end of the HYP mode code with __kvm_hyp_code_end

2012-02-10 Thread Marc Zyngier

Use __kvm_hyp_code_end to mark the end of the main HYP code instead of
__kvm_vcpu_run_end. It's a bit cleaner as we're about to add more code
to that section.

Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
---
 arch/arm/include/asm/kvm_asm.h |    3 ++-
 arch/arm/kvm/arm.c             |    4 ++--
 arch/arm/kvm/interrupts.S      |    8 +++++---
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 89c318ea..5ee7bd3 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -45,7 +45,8 @@ extern char __kvm_hyp_vector[];
 extern char __kvm_hyp_vector_end[];
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
-extern char __kvm_vcpu_run_end[];
+
+extern char __kvm_hyp_code_end[];
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 14ccc4d..602e087 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -636,9 +636,9 @@ static int init_hyp_mode(void)
         * Map the world-switch code
         */
        err = create_hyp_mappings(kvm_hyp_pgd,
-                                 __kvm_vcpu_run, __kvm_vcpu_run_end);
+                                 __kvm_vcpu_run, __kvm_hyp_code_end);
        if (err) {
-               kvm_err(err, "Cannot map world-switch code");
+               kvm_err(err, "Cannot map hyp mode code");
                goto out_free_mappings;
        }
 
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index fbc26ca..8b7e5e9 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -351,11 +351,13 @@ return_to_ioctl:
 THUMB( orr     lr, lr, #1)
        mov     pc, lr
 
-       .ltorg
 
-__kvm_vcpu_run_end:
-       .globl __kvm_vcpu_run_end
+
+       .ltorg
 
+__kvm_hyp_code_end:
+       .globl  __kvm_hyp_code_end
+
 
 @
 @  Hypervisor exception vector and handlers
-- 
1.7.3.4


[PATCH RFC v2 3/3] ARM: KVM: Add support for MMU notifiers

2012-02-10 Thread Marc Zyngier

Add the necessary infrastructure to handle MMU notifiers on KVM/ARM.
As we don't have shadow page tables, the implementation is actually very
simple. The only supported operation is kvm_unmap_hva(), where we remove
the HVA from the 2nd stage translation. All other hooks are NOPs.

Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
---
The aging ops are left unused for the moment, until I actually understand what
they are used for and whether they apply to the ARM architecture.

From v1:
- Fixed the brown paper bug of invalidating the hva instead of the ipa

 arch/arm/include/asm/kvm_asm.h  |    2 +
 arch/arm/include/asm/kvm_host.h |   19 
 arch/arm/kvm/Kconfig            |    1 +
 arch/arm/kvm/interrupts.S       |   18 
 arch/arm/kvm/mmu.c              |   44 --
 5 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 5ee7bd3..18be9bb 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -36,6 +36,7 @@ asm(".equ SMCHYP_HVBAR_W, 0xfffffff0");
 #endif /* __ASSEMBLY__ */
 
 #ifndef __ASSEMBLY__
+struct kvm;
 struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
@@ -46,6 +47,7 @@ extern char __kvm_hyp_vector_end[];
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 extern char __kvm_hyp_code_end[];
 #endif
 
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 555a6f1..1c0c68b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -109,4 +109,23 @@ struct kvm_vm_stat {
 struct kvm_vcpu_stat {
 };
 
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+struct kvm;
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+
+/* We do not have shadow page tables, hence the empty hooks */
+static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+       return 0;
+}
+
+static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+       return 0;
+}
+
+static inline void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
+{
+}
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index ccabbb3..7ce9173 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM_ARM_HOST
        depends on KVM
        depends on MMU
        depends on CPU_V7 || ARM_VIRT_EXT
+       select  MMU_NOTIFIER
        ---help---
          Provides host support for ARM processors.
 
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 8b7e5e9..8822fb3 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -351,7 +351,25 @@ return_to_ioctl:
 THUMB( orr     lr, lr, #1)
        mov     pc, lr
 
+ENTRY(__kvm_tlb_flush_vmid)
+       hvc     #0                      @ Switch to Hyp mode
+       push    {r2, r3}
 
+       ldrd    r2, r3, [r0, #KVM_VTTBR]
+       mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
+       isb
+       mcr     p15, 0, r0, c8, c7, 0   @ TLBIALL
+       dsb
+       isb
+       mov     r2, #0
+       mov     r3, #0
+       mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
+       isb
+
+       pop     {r2, r3}
+       hvc     #0                      @ Back to SVC
+       mov     pc, lr
+ENDPROC(__kvm_tlb_flush_vmid)
 
        .ltorg
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index baeb8a1..3f8d83b 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -245,12 +245,12 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
        kvm->arch.pgd = NULL;
 }
 
-static int __user_mem_abort(struct kvm *kvm, phys_addr_t addr, pfn_t pfn)
+static int stage2_set_pte(struct kvm *kvm, phys_addr_t addr, pte_t new_pte)
 {
        pgd_t *pgd;
        pud_t *pud;
        pmd_t *pmd;
-       pte_t *pte, new_pte;
+       pte_t *pte;
 
        /* Create 2nd stage page table mapping - Level 1 */
        pgd = kvm->arch.pgd + pgd_index(addr);
@@ -279,12 +279,18 @@ static int __user_mem_abort(struct kvm *kvm, phys_addr_t addr, pfn_t pfn)
                pte = pte_offset_kernel(pmd, addr);
 
        /* Create 2nd stage page table mapping - Level 3 */
-       new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
        set_pte_ext(pte, new_pte, 0);
 
        return 0;
 }
 
+static int __user_mem_abort(struct kvm *kvm, phys_addr_t addr, pfn_t pfn)
+{
+       pte_t new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
+
+       return stage2_set_pte(kvm, addr, new_pte);
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
                          gfn_t gfn, struct kvm_memory_slot *memslot)
 {
@@ -510,3 +516,35 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
        return user_mem_abort(vcpu, fault_ipa, gfn, memslot);
 }
+
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
+{
+       static const pte_t null_pte;
+       struct kvm_memslots *slots;
+       struct kvm_memory_slot *memslot;
+       int

Re: [PATCH RFC v2 2/3] ARM: KVM: mark the end of the HYP mode code with __kvm_hyp_code_end

2012-02-10 Thread Christoffer Dall

On Fri, Feb 10, 2012 at 2:22 PM, Marc Zyngier <marc.zyng...@arm.com> wrote:
> Use __kvm_hyp_code_end to mark the end of the main HYP code instead of
> __kvm_vcpu_run_end. It's a bit cleaner as we're about to add more code
> to that section.

This is good, but shouldn't we rename the beginning of the section as
well (something like __kvm_hyp_code_start)?

Then we can include your new snippet and also
__kvm_flush_vm_context in that section and perform a single mapping in
the init code, which is much nicer.



Re: [PATCH RFC v2 3/3] ARM: KVM: Add support for MMU notifiers

2012-02-10 Thread Christoffer Dall

On Fri, Feb 10, 2012 at 2:22 PM, Marc Zyngier <marc.zyng...@arm.com> wrote:
> Add the necessary infrastructure to handle MMU notifiers on KVM/ARM.
> As we don't have shadow page tables, the implementation is actually very
> simple. The only supported operation is kvm_unmap_hva(), where we remove
> the HVA from the 2nd stage translation. All other hooks are NOPs.
>
> Signed-off-by: Marc Zyngier <marc.zyng...@arm.com>
> ---
> The aging ops are left unused for the moment, until I actually understand what
> they are used for and whether they apply to the ARM architecture.
>
> From v1:
> - Fixed the brown paper bug of invalidating the hva instead of the ipa

> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 8b7e5e9..8822fb3 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -351,7 +351,25 @@ return_to_ioctl:
>  THUMB( orr     lr, lr, #1)
>         mov     pc, lr
>

I would prefer moving this to the top of the file, before all the
macros, so that the coherency between the world-switch to the guest and
the return path is clearer (see the v6 staging branch, where I already
added a function).

What I want to avoid is this:

__switch:
   do some switch stuff
__return:
   do some return stuff

cache_fun_1:
   foo

cache_fun_2:
   foo

cache_fun_3:
   foo

cache_fun_4:
   foo

vector:
   b __return


> +ENTRY(__kvm_tlb_flush_vmid)
> +       hvc     #0                      @ Switch to Hyp mode
> +       push    {r2, r3}
>
> +       ldrd    r2, r3, [r0, #KVM_VTTBR]
> +       mcrr    p15, 6, r2, r3, c2      @ Write VTTBR
> +       isb
> +       mcr     p15, 0, r0, c8, c7, 0   @ TLBIALL
> +       dsb
> +       isb
> +       mov     r2, #0
> +       mov     r3, #0
> +       mcrr    p15, 6, r2, r3, c2      @ Back to VMID #0
> +       isb
> +
> +       pop     {r2, r3}
> +       hvc     #0                      @ Back to SVC
> +       mov     pc, lr
> +ENDPROC(__kvm_tlb_flush_vmid)
>
>         .ltorg


[PATCH v3 1/3] KVM: PPC: epapr: Factor out the epapr init

from the kvm guest paravirt init code.

Signed-off-by: Liu Yu <yu@freescale.com>
---
v3:
apply the epapr init for all ppc platform

 arch/powerpc/Kconfig                    |    4 +++
 arch/powerpc/include/asm/epapr_hcalls.h |    8 +
 arch/powerpc/kernel/Makefile            |    1 +
 arch/powerpc/kernel/epapr_para.c        |   46 +++
 arch/powerpc/kernel/kvm.c               |   13 +++--
 arch/powerpc/kvm/Kconfig                |    1 +
 6 files changed, 64 insertions(+), 9 deletions(-)
 create mode 100644 arch/powerpc/kernel/epapr_para.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 47682b6..00bd508 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -196,6 +196,10 @@ config EPAPR_BOOT
          Used to allow a board to specify it wants an ePAPR compliant wrapper.
        default n
 
+config EPAPR_PARA
+       bool
+       default n
+
 config DEFAULT_UIMAGE
        bool
        help
diff --git a/arch/powerpc/include/asm/epapr_hcalls.h b/arch/powerpc/include/asm/epapr_hcalls.h
index f3b0c2c..c4b86e4 100644
--- a/arch/powerpc/include/asm/epapr_hcalls.h
+++ b/arch/powerpc/include/asm/epapr_hcalls.h
@@ -148,6 +148,14 @@
 #define EV_HCALL_CLOBBERS2 EV_HCALL_CLOBBERS3, "r5"
 #define EV_HCALL_CLOBBERS1 EV_HCALL_CLOBBERS2, "r4"
 
+extern u32 *epapr_hcall_insts;
+extern int epapr_hcall_insts_len;
+
+static inline void epapr_get_hcall_insts(u32 **instp, int *lenp)
+{
+       *instp = epapr_hcall_insts;
+       *lenp = epapr_hcall_insts_len;
+}
 
 /*
  * We use uintptr_t to define a register because it's guaranteed to be a
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index ce4f7f1..1e41c76 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -134,6 +134,7 @@ ifneq ($(CONFIG_XMON)$(CONFIG_KEXEC),)
 obj-y  += ppc_save_regs.o
 endif
 
+obj-$(CONFIG_EPAPR_PARA)   += epapr_para.o
 obj-$(CONFIG_KVM_GUEST)+= kvm.o kvm_emul.o
 
 # Disable GCOV in odd or sensitive code
diff --git a/arch/powerpc/kernel/epapr_para.c b/arch/powerpc/kernel/epapr_para.c
new file mode 100644
index 0000000..7e1561a
--- /dev/null
+++ b/arch/powerpc/kernel/epapr_para.c
@@ -0,0 +1,46 @@
+/*
+ * ePAPR para-virtualization support.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) 2012 Freescale Semiconductor, Inc.
+ */
+
+#include <linux/of.h>
+#include <asm/epapr_hcalls.h>
+#include <asm/cacheflush.h>
+
+u32 *epapr_hcall_insts;
+int epapr_hcall_insts_len;
+
+static int __init epapr_para_init(void)
+{
+       struct device_node *hyper_node;
+       u32 *insts;
+       int len;
+
+       hyper_node = of_find_node_by_path("/hypervisor");
+       if (!hyper_node)
+               return -ENODEV;
+
+       insts = (u32*)of_get_property(hyper_node, "hcall-instructions", &len);
+       if (!(len % 4) && (len <= (4 * 4))) {
+               epapr_hcall_insts = insts;
+               epapr_hcall_insts_len = len;
+       }
+
+       return 0;
+}
+
+early_initcall(epapr_para_init);
diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index b06bdae..2e03ab8 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -28,6 +28,7 @@
 #include <asm/sections.h>
 #include <asm/cacheflush.h>
 #include <asm/disassemble.h>
+#include <asm/epapr_hcalls.h>
 
 #define KVM_MAGIC_PAGE (-4096L)
 #define magic_var(x) KVM_MAGIC_PAGE + offsetof(struct kvm_vcpu_arch_shared, x)
@@ -535,18 +536,12 @@ EXPORT_SYMBOL_GPL(kvm_hypercall);
 static int kvm_para_setup(void)
 {
        extern u32 kvm_hypercall_start;
-       struct device_node *hyper_node;
        u32 *insts;
        int len, i;
 
-       hyper_node = of_find_node_by_path("/hypervisor");
-       if (!hyper_node)
-               return -1;
-
-       insts = (u32*)of_get_property(hyper_node, "hcall-instructions", &len);
-       if (len % 4)
-               return -1;
-       if (len > (4 * 4))
+       insts = epapr_hcall_insts;
+       len = epapr_hcall_insts_len;
+       if (insts == NULL)
                return -1;
 
        for (i = 0; i < (len / 4); i++)
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 78133de..cd1ee68 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -20,6 +20,7 @@ config KVM
        bool
        select PREEMPT_NOTIFIERS
        select ANON_INODES
+       select EPAPR_PARA
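
The length check on the "hcall-instructions" property in this patch can be illustrated with a small userspace sketch. This is not the kernel code: hcall_insts_valid() and hcall_insts_copy() are hypothetical helpers, uint32_t stands in for u32, and the NULL and non-positive-length checks are assumptions added here rather than something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HCALL_INSN_BYTES 4	/* one 32-bit instruction */
#define HCALL_MAX_INSNS  4	/* the patch accepts at most 4 * 4 bytes */

/* Returns 1 when the property looks usable, 0 otherwise. */
int hcall_insts_valid(const uint32_t *insts, int len)
{
	if (insts == NULL || len <= 0)
		return 0;	/* assumption: treat a missing property as invalid */
	if (len % HCALL_INSN_BYTES)
		return 0;	/* must be a whole number of instructions */
	if (len > HCALL_INSN_BYTES * HCALL_MAX_INSNS)
		return 0;	/* must fit in the four-instruction slot */
	return 1;
}

/*
 * Copies len / 4 instruction words into dst, in the spirit of the
 * patching loop in kvm_para_setup(). Returns the number of words copied,
 * or -1 when the input is rejected or dst is too small.
 */
int hcall_insts_copy(uint32_t *dst, size_t dst_words,
		     const uint32_t *insts, int len)
{
	int i, words;

	assert(dst != NULL);

	if (!hcall_insts_valid(insts, len))
		return -1;

	words = len / HCALL_INSN_BYTES;
	if ((size_t)words > dst_words)
		return -1;

	for (i = 0; i < words; i++)
		dst[i] = insts[i];

	return words;
}
```

Doing the validation once and keeping only the pointer and length is what lets the consumer reduce its own check to a NULL test, as kvm_para_setup() does after this patch.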

[PATCH v3 3/3] KVM: PPC: epapr: install ev_idle hcall for e500 guest