Re: [PATCH 3/6] KVM MMU: optimize/cleanup for marking parent unsync

2010-04-13 Thread Xiao Guangrong


Xiao Guangrong wrote:
> 
> Actually, the original code has a bug; see this code segment in mmu_parent_walk():
> 
> | if (!sp->multimapped && sp->parent_pte) {
> | ..
> | return;
> | }
> | hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
> | for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
> | ..
> | }
> 
> So, if sp->parent_pte == NULL, it's unsafe...

Marcelo, please ignore this, it's not a bug, just my mistake, sorry... (When
sp->parent_pte is NULL and the page is not multimapped, the parent_ptes list
is empty, so the loop body is never entered.)


[PATCH] KVM test: Fixes on autotest subtest

2010-04-13 Thread Lucas Meneghel Rodrigues
 * Make it possible to run any generic control file inside
the guest. The reason is that we might want to try
any arbitrary combination of tests instead of just one
single test. Because of this, the test_name parameter is
no longer necessary.
 * Also, if the autotest run times out, try to bring back
logs for any tests that might have passed, as well as
an assessment of the test statuses at the time the
control file run timed out.

Signed-off-by: Lucas Meneghel Rodrigues 
---
 client/tests/kvm/kvm_test_utils.py |  109 
 client/tests/kvm/tests/autotest.py |4 +-
 client/tests/kvm/tests_base.cfg.sample |   14 
 3 files changed, 55 insertions(+), 72 deletions(-)

diff --git a/client/tests/kvm/kvm_test_utils.py 
b/client/tests/kvm/kvm_test_utils.py
index f512044..699601e 100644
--- a/client/tests/kvm/kvm_test_utils.py
+++ b/client/tests/kvm/kvm_test_utils.py
@@ -241,15 +241,14 @@ def get_memory_info(lvms):
 return meminfo
 
 
-def run_autotest(vm, session, control_path, timeout, test_name, outputdir):
+def run_autotest(vm, session, control_path, timeout, outputdir):
 """
 Run an autotest control file inside a guest (linux only utility).
 
 @param vm: VM object.
 @param session: A shell session on the VM provided.
-@param control: An autotest control file.
-@param timeout: Timeout under which the autotest test must complete.
-@param test_name: Autotest client test name.
+@param control_path: A path to an autotest control file.
+@param timeout: Timeout under which the autotest control file must 
complete.
 @param outputdir: Path on host where we should copy the guest autotest
 results to.
 """
@@ -305,22 +304,49 @@ def run_autotest(vm, session, control_path, timeout, 
test_name, outputdir):
 logging.error("Uncompress output:\n%s" % output)
 raise error.TestFail("Could not extract %s on guest" % basename)
 
+
+def get_results():
+"""
+Copy autotest results present on the guest back to the host.
+"""
+logging.info("Trying to copy autotest results from guest")
+guest_results_dir = os.path.join(outputdir, "guest_autotest_results")
+if not os.path.exists(guest_results_dir):
+os.mkdir(guest_results_dir)
+if not vm.copy_files_from("%s/results/default/*" % autotest_path,
+  guest_results_dir):
+logging.error("Could not copy autotest results from guest")
+
+
+def get_results_summary():
+"""
+Get the status of the tests that were executed on the host and close
+the session where autotest was being executed.
+"""
+output = session.get_command_output("cat results/*/status")
+results = scan_results.parse_results(output)
+# Report test results
+logging.info("Results (test, status, duration, info):")
+for result in results:
+logging.info(str(result))
+session.close()
+return results
+
+
 if not os.path.isfile(control_path):
 raise error.TestError("Invalid path to autotest control file: %s" %
   control_path)
 
-tarred_autotest_path = "/tmp/autotest.tar.bz2"
-tarred_test_path = "/tmp/%s.tar.bz2" % test_name
+compressed_autotest_path = "/tmp/autotest.tar.bz2"
 
 # To avoid problems, let's make the test use the current AUTODIR
 # (autotest client path) location
 autotest_path = os.environ['AUTODIR']
-tests_path = os.path.join(autotest_path, 'tests')
-test_path = os.path.join(tests_path, test_name)
 
 # tar the contents of bindir/autotest
-cmd = "tar cvjf %s %s/*" % (tarred_autotest_path, autotest_path)
-cmd += " --exclude=%s/tests" % autotest_path
+cmd = "tar cvjf %s %s/*" % (compressed_autotest_path, autotest_path)
+# Until we have nested virtualization, we don't need the kvm test :)
+cmd += " --exclude=%s/tests/kvm" % autotest_path
 cmd += " --exclude=%s/results" % autotest_path
 cmd += " --exclude=%s/tmp" % autotest_path
 cmd += " --exclude=%s/control" % autotest_path
@@ -329,34 +355,18 @@ def run_autotest(vm, session, control_path, timeout, 
test_name, outputdir):
 cmd += " --exclude=*.git"
 utils.run(cmd)
 
-# tar the contents of bindir/autotest/tests/
-cmd = "tar cvjf %s %s/*" % (tarred_test_path, test_path)
-cmd += " --exclude=*.pyc"
-cmd += " --exclude=*.svn"
-cmd += " --exclude=*.git"
-utils.run(cmd)
-
 # Copy autotest.tar.bz2
-copy_if_size_differs(vm, tarred_autotest_path, tarred_autotest_path)
-
-# Copy .tar.bz2
-copy_if_size_differs(vm, tarred_test_path, tarred_test_path)
+copy_if_size_differs(vm, compressed_autotest_path, 
compressed_autotest_path)
 
 # Extract autotest.tar.bz2
-extract(vm, tarred_autotest_path, "/")
-
-# mkdir autotest/tests
-session.get_command_output("mkdir -p %s" % t

Re: [PATCH 3/6] KVM MMU: optimize/cleanup for marking parent unsync

2010-04-13 Thread Xiao Guangrong


Marcelo Tosatti wrote:

>>> I'd prefer to not touch it.
>> This patch avoids walking all parents, and I think that overhead is really
>> unnecessary. Are there other tricks in this codepath that I haven't
>> noticed? :-)
> 
> My point is that there is no point in optimizing something unless it's
> performance sensitive.

Hi Marcelo,

I think 'optimizing' means not only performance but also smaller code (maybe
'cleanup' is more suitable) and simpler logic (doing less work). I'm not sure
whether this patch improves system performance noticeably, but it simplifies
the code logic and reduces code size, and it doesn't harm other code or
system performance, right? :-)

Actually, the original code has a bug; see this code segment in mmu_parent_walk():

|   if (!sp->multimapped && sp->parent_pte) {
|   ..
|   return;
|   }
|   hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
|   for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
|   ..
|   }

So, if sp->parent_pte == NULL, it's unsafe...

> And as I recall, mmu_unsync_walk was much more
> performance-sensitive than parent walking. Actually, gfn_to_memslot
> seems more important, since it's also noticeable on EPT/NPT hosts.

Yeah, I also noticed these and I'm looking into that code.

Thanks,
Xiao


RE: VM performance issue in KVM guests.

2010-04-13 Thread Zhang, Xiantao
Avi Kivity wrote:
> On 04/13/2010 03:50 AM, Zhang, Xiantao wrote:
>> Avi Kivity wrote:
>> 
>>> On 04/12/2010 05:04 AM, Zhang, Xiantao wrote:
>>> 
 
> What was the performance hit?  What was your I/O setup (image
> format, using aio?) 
> 
> 
 The issue only happens when the vcpu number is over-committed (e.g.
 vcpu/pcpu > 2) and the physical cpus are saturated. For example, when
 running webbench in a Windows guest in this case, its performance drops by
 80%. In our experiment, we are using an image file through virtio,
 and I think aio should be used by default as well.
 
 
>>> Is this on a machine that does pause-loop exits?  The current
>>> handing of PLE is very suboptimal.  With proper directed yield we
>>> should be much better there. 
>>> 
>>> Without PLE, we need paravirtualized spinlocks, no way around it.
>>> 
>> PLE can mitigate the issue to some extent, and a pv solution should
>> be helpful also.  But for Windows guests running on machines without
>> PLE, we still need to enhance the host side to resolve the issue.
>> 
> 
> Well, was this on a machine with PLE or without PLE?

I am saying the machine has no PLE feature support. Even with PLE support,
there is still some performance loss due to PLE's cost.
 
>>> Spin loops need to be addressed first, they are known to kill
>>> performance in overcommit situations.
>>> 
>> Even in the overcommit case, if the vcpu threads of one qemu are not
>> scheduled or pulled to the same logical processor, the performance
>> drop is tolerable, as in Xen's case today. But KVM has to suffer
>> additional performance loss, since the host's scheduler actively
>> pulls these vcpu threads together.
>> 
>> 
> Can you quantify this loss?  Give examples of what happens?

For example, one machine is configured with 2 pCPUs and there are two Windows
guests running on the machine, each configured with 2 vcpus and running one
webbench server.
If we use the host's default scheduler, webbench's performance is very bad,
but if we pin each guest's vCPU0 to pCPU0 and vCPU1 to pCPU1, we see a 5-10X
performance improvement with the same CPU utilization.
In addition, kvm's scalability is also impacted on large systems: in some
performance experiments, kvm's performance begins to drop when vCPUs are
overcommitted and pCPUs are saturated, but once the wake_up_affine feature is
switched off in the scheduler, kvm's performance keeps rising in this case.
Xiantao
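
To make the spin-loop problem above concrete: a vCPU that spins on a lock
held by a preempted vCPU burns its whole timeslice without making progress.
A minimal, self-contained sketch of the pattern (illustrative only, not
guest or kernel code):

#include <stdatomic.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;

static void spin_lock(void)
{
    /* If the lock holder's vCPU is descheduled by the host, every
     * waiter spins here until its own timeslice expires.  PLE hardware
     * detects long runs of PAUSE and exits to the hypervisor, which
     * can then yield to the lock holder instead. */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        __builtin_ia32_pause();   /* the PAUSE loop that PLE catches */
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}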



kvmtrace and debugging kvm

2010-04-13 Thread Manish Regmi
Hi,
  I am trying to use kvmtrace, but it looks like it's no longer supported; in
kvm_main.c it returns -EOPNOTSUPP.
kvmtrace -V -D test -o mykvmtest
does not seem to do anything.

Is it no longer supported? Is there a better way of debugging kvm?
Thank you.

---
regards
Manish Regmi

http://manish-cs.blogspot.com
http://ext2read.sf.net


Re: [PATCH 2/6] KVM MMU: fix kvm_mmu_zap_page() and its calling path

2010-04-13 Thread Xiao Guangrong


Marcelo Tosatti wrote:
> On Tue, Apr 13, 2010 at 09:34:14AM +0800, Xiao Guangrong wrote:
>>
>> Marcelo Tosatti wrote:
>>
 @@ -1483,8 +1483,8 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
for_each_sp(pages, sp, parents, i) {
kvm_mmu_zap_page(kvm, sp);
mmu_pages_clear_parents(&parents);
 +  zapped++;
}
 -  zapped += pages.nr;
kvm_mmu_pages_init(parent, &parents, &pages);
}
>>> Don't see why this is needed? The for_each_sp loop uses pvec.nr.
>> I think mmu_zap_unsync_children() should return the number of zapped pages,
>> so that we can adjust the number of free pages in kvm_mmu_change_mmu_pages();
>> but pages.nr not only includes the unsync/zapped pages but also their parents.
> 
> Oh I see. I think it's safer to check for list_empty than to rely on
> proper accounting there, like __kvm_mmu_free_some_pages does.

Do you mean that we'd better add WARN_ON(list_empty()) code in 
kvm_mmu_zap_page()?

Thanks,
Xiao


Re: [Autotest] [PATCH] KVM test: Memory ballooning test for KVM guest

2010-04-13 Thread Lucas Meneghel Rodrigues
On Tue, Apr 13, 2010 at 1:24 AM, pradeep  wrote:
>
> Lucas
>
> Any comments??

No, pradeep; Sudhir's comments are very reasonable. Please implement
the changes he suggested and re-send the patch. I will do some testing
on my local systems and then commit it upstream when done.

>
> --sp
>
> pradeep wrote:
>
>> sudhir kumar wrote:
>>
>>
>>> On Fri, Apr 9, 2010 at 2:40 PM, pradeep  wrote:
>>>
>>>
 Hi Lucas

 Thanks for your comments.
 Please find the patch, with suggested changes.

 Thanks
 Pradeep



 Signed-off-by: Pradeep Kumar Surisetty 
 ---
 diff -uprN autotest-old/client/tests/kvm/tests/balloon_check.py
 autotest/client/tests/kvm/tests/balloon_check.py
 --- autotest-old/client/tests/kvm/tests/balloon_check.py        1969-12-31
 19:00:00.0 -0500
 +++ autotest/client/tests/kvm/tests/balloon_check.py    2010-04-09
 12:33:34.0 -0400
 @@ -0,0 +1,47 @@
 +import re, string, logging, random, time
 +from autotest_lib.client.common_lib import error
 +import kvm_test_utils, kvm_utils
 +
 +def run_balloon_check(test, params, env):
 +    """
 +    Check Memory ballooning:
 +    1) Boot a guest
 +    2) Increase and decrease the memory of guest using balloon command 
 from
 monitor


>>> Better replace this description by "Change the guest memory between X
>>> and Y values"
>>> Also instead of using 0.6 and 0.95 below, better use two variables and
>>> take their value from config file. This will give the user a
>>> flexibility to narrow or widen the ballooning range.
>>>
>>>
>> Thanks for your suggestions. I don't think the user needs flexibility
>> here.
>> If ballooning doesn't work for one set of values, it will not work for any
>> other.
>> And here we are choosing between 60% and 95% of the actual value, which is
>> reasonable.
>>
>>
>>>
>>>
 +    3) check memory info
 +
 +    @param test: kvm test object
 +    @param params: Dictionary with the test parameters
 +    @param env: Dictionary with test environment.
 +    """
 +
 +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
 +    session = kvm_test_utils.wait_for_login(vm)
 +    fail = 0
 +
 +    # Check memory size
 +    logging.info("Memory size check")
 +    expected_mem = int(params.get("mem"))
 +    actual_mem = vm.get_memory_size()
 +    if actual_mem != expected_mem:
 +        logging.error("Memory size mismatch:")
 +        logging.error("Assigned to VM: %s" % expected_mem)
 +        logging.error("Reported by OS: %s" % actual_mem)
 +
 +    #change memory to random size between 60% to 95% of actual memory
 +    percent = random.uniform(0.6, 0.95)
 +    new_mem = int(percent*expected_mem)
 +    vm.send_monitor_cmd("balloon %s" %new_mem)


>>> You may want to check if the command passed/failed. Older versions
>>> might not support ballooning.
>>>
>>>
>>
>> sure,  i will make changes here.
>>
>>>
>>>
 +    time.sleep(20)


>>> why 20 second sleep and why the magic number?
>>>
>>>
>> As soon as the balloon command is issued, it takes some time for memory
>> ballooning to complete.
>> If we check "info balloon" as soon as we run ballooning, it will give
>> weird values.
>> I just chose 20 sec; it's not a huge time from a testing perspective.
>>
>>>
>>>
 +    status, output = vm.send_monitor_cmd("info balloon")


>>> You might want to put this check before changing the memory.
>>>
>>>
>>>
>> sure, will make changes here too.
>>
 +    if status != 0:
 +        logging.error("qemu monitor command failed: info balloon")
 +
 +    balloon_cmd_mem = int(re.findall("\d+",output)[0])


>>> A better variable name I can think of is "ballooned_mem"
>>>
>>>
>>
>> will  change it...
>>
>>>
>>>
 +    if balloon_cmd_mem != new_mem:
 +        logging.error("memory ballooning failed while changing memory to
 %s" %balloon_cmd_mem)
 +        fail += 1
 +
 +    #Checking for test result
 +    if fail != 0:


>>> In case you are running multiple iterations and the 2nd iteration
>>> fails you will always miss this condition.
>>>
>>>
>>
>>
>>>
>>>
 +        raise error.TestFail("Memory ballooning test failed ")
 +    session.close()
 diff -uprN autotest-old/client/tests/kvm/tests_base.cfg.sample
 autotest/client/tests/kvm/tests_base.cfg.sample
 --- autotest-old/client/tests/kvm/tests_base.cfg.sample 2010-04-09
 12:32:50.0 -0400
 +++ autotest/client/tests/kvm/tests_base.cfg.sample     2010-04-09
 12:53:27.0 -0400
 @@ -185,6 +185,10 @@ variants:
                 drift_threshold = 10
                 drift_threshold_single = 3

 +    - balloon_check:  install setup unattended_install boot
 +        type = balloon_check
 +        extra_params += " -b

Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.

2010-04-13 Thread Mohammed Gamal
On Tue, Apr 13, 2010 at 9:08 PM, jvrao  wrote:
> jvrao wrote:
>> Alexander Graf wrote:
>>> On 12.04.2010, at 13:58, Jamie Lokier wrote:
>>>
 Mohammed Gamal wrote:
> On Mon, Apr 12, 2010 at 12:29 AM, Jamie Lokier  
> wrote:
>> Javier Guerra Giraldez wrote:
>>> On Sat, Apr 10, 2010 at 7:42 AM, Mohammed Gamal  
>>> wrote:
 On Sat, Apr 10, 2010 at 2:12 PM, Jamie Lokier  
 wrote:
> To throw a spanner in, the most widely supported filesystem across
> operating systems is probably NFS, version 2 :-)
 Remember that Windows usage on a VM is not some rare use case, and
 it'd be a little bit of a pain from a user's perspective to have to
 install a third party NFS client for every VM they use. Having
 something supported on the VM out of the box is a better option IMO.
>>> I don't think virtio-CIFS has any more support out of the box (on any
>>> system) than virtio-9P.
>> It doesn't, but at least network-CIFS tends to work ok and is the
>> method of choice for Windows VMs - when you can set up Samba on the
>> host (which as previously noted you cannot always do non-disruptively
>> with current Sambas).
>>
>> -- Jamie
>>
> I think having support for both 9p and CIFS would be the best option.
> In that case the user will have the option to use either one,
> depending on how their guests support these filesystems. In that case
> I'd prefer to work on CIFS support while the 9p effort can still go
> on. I don't think both efforts are mutually exclusive.
>
> What do the rest of you guys think?
 I only noted NFS because most old OSes do not support CIFS or 9P -
 especially all the old unixes.

 I don't think old versions of MS-DOS and Windows (95, 98, ME, Nt4?)
 even support current CIFS.  They need extra server settings to work
 - such as setting passwords on the server to non-encrypted and other 
 quirks.

 Meanwhile Windows Vista/2008/7 works better with SMB2, not CIFS, to
 properly see symlinks and hard links.

 So there is no really nice out of the box file service which works
 easily with all guest OSes.

 I'm guessing that out of all the filesystems, CIFS is the most widely
 supported in recent OSes (released in the last 10 years).  But I'm not
 really sure what the state of CIFS is for non-Windows, non-Linux,
 non-BSD guests.
>>> So what? If you want to have direct host fs access, install guest drivers. 
>>> If you can't, set up networking and use CIFS or NFS or whatever.
>>>
 I'm not sure why 9P is being pursued.  Does anything much support it,
 or do all OSes except quite recent Linux need a custom driver for 9P?
 Even Linux only got the first commit in the kernel 5 years ago, so
 probably it was only about 3 years ago that it will have begun
 appearing in stable distros, if at all.  Filesystem passthrough to
 Linux guests installed in the last couple of years is a useful
 feature, and I know that for many people that is their only use of
 KVM, but compared with CIFS' broad support it seems like quite a
 narrow goal.
>>> The goal is to have something simple and fast. We can fine-tune 9P to align
>>> with the Linux VFS structures, making it very low overhead (and little
>>> headache too). For Windows guests, nothing prevents us from exposing yet
>>> another 9P flavor. That again would align well with Windows's VFS and
>>> be slim and fast there.
>>>
>>> The biggest problem I see with CIFS is that it's a huge beast. There are a 
>>> lot of corner cases where it just doesn't fit in. See my previous mail for 
>>> more details.
>>>
>> As Alex mentioned, 9P was chosen for its sheer simplicity and easy
>> adaptability. NFS and CIFS do not give that flexibility. As we mentioned
>> in the patch series, we are already seeing better numbers with 9P. Looking
>> ahead, 9P can embed KVM/QEMU knowledge to share physical resources like
>> memory/cache between the host and the guest.
>>
>> I think looking into the Windows side of a 9P client would be a great option
>> too. The current patch on the mailing list supports the .U (unix) protocol,
>> and we will be introducing the .L (Linux) variant as we move forward.
>
> Hi Mohammed,
> Please let us know once you decide on where your interest lies.
> Will be glad to have you on VirtFS (9P) though. :)
>
>
> - JV
>

It seems the community is keener on getting 9P support merged than on
getting CIFS supported, and they have made good points to support their
argument. I'm not sure whether this work could fit in as a GSoC project,
or whether there is enough remaining work to take it in that direction.
But I'd be glad to volunteer anyway :)

Regards,
Mohammed

Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.

2010-04-13 Thread jvrao
jvrao wrote:
> Alexander Graf wrote:
>> On 12.04.2010, at 13:58, Jamie Lokier wrote:
>>
>>> Mohammed Gamal wrote:
 On Mon, Apr 12, 2010 at 12:29 AM, Jamie Lokier  wrote:
> Javier Guerra Giraldez wrote:
>> On Sat, Apr 10, 2010 at 7:42 AM, Mohammed Gamal  
>> wrote:
>>> On Sat, Apr 10, 2010 at 2:12 PM, Jamie Lokier  
>>> wrote:
 To throw a spanner in, the most widely supported filesystem across
 operating systems is probably NFS, version 2 :-)
>>> Remember that Windows usage on a VM is not some rare use case, and
>>> it'd be a little bit of a pain from a user's perspective to have to
>>> install a third party NFS client for every VM they use. Having
>>> something supported on the VM out of the box is a better option IMO.
>> I don't think virtio-CIFS has any more support out of the box (on any
>> system) than virtio-9P.
> It doesn't, but at least network-CIFS tends to work ok and is the
> method of choice for Windows VMs - when you can set up Samba on the
> host (which as previously noted you cannot always do non-disruptively
> with current Sambas).
>
> -- Jamie
>
 I think having support for both 9p and CIFS would be the best option.
 In that case the user will have the option to use either one,
 depending on how their guests support these filesystems. In that case
 I'd prefer to work on CIFS support while the 9p effort can still go
 on. I don't think both efforts are mutually exclusive.

 What do the rest of you guys think?
>>> I only noted NFS because most old OSes do not support CIFS or 9P -
>>> especially all the old unixes.
>>>
>>> I don't think old versions of MS-DOS and Windows (95, 98, ME, Nt4?)
>>> even support current CIFS.  They need extra server settings to work
>>> - such as setting passwords on the server to non-encrypted and other quirks.
>>>
>>> Meanwhile Windows Vista/2008/7 works better with SMB2, not CIFS, to
>>> properly see symlinks and hard links.
>>>
>>> So there is no really nice out of the box file service which works
>>> easily with all guest OSes.
>>>
>>> I'm guessing that out of all the filesystems, CIFS is the most widely
>>> supported in recent OSes (released in the last 10 years).  But I'm not
>>> really sure what the state of CIFS is for non-Windows, non-Linux,
>>> non-BSD guests.
>> So what? If you want to have direct host fs access, install guest drivers. 
>> If you can't, set up networking and use CIFS or NFS or whatever.
>>
>>> I'm not sure why 9P is being pursued.  Does anything much support it,
>>> or do all OSes except quite recent Linux need a custom driver for 9P?
>>> Even Linux only got the first commit in the kernel 5 years ago, so
>>> probably it was only about 3 years ago that it will have begun
>>> appearing in stable distros, if at all.  Filesystem passthrough to
>>> Linux guests installed in the last couple of years is a useful
>>> feature, and I know that for many people that is their only use of
>>> KVM, but compared with CIFS' broad support it seems like quite a
>>> narrow goal.
>> The goal is to have something simple and fast. We can fine-tune 9P to align
>> with the Linux VFS structures, making it very low overhead (and little
>> headache too). For Windows guests, nothing prevents us from exposing yet
>> another 9P flavor. That again would align well with Windows's VFS and be
>> slim and fast there.
>>
>> The biggest problem I see with CIFS is that it's a huge beast. There are a 
>> lot of corner cases where it just doesn't fit in. See my previous mail for 
>> more details.
>>
> As Alex mentioned, 9P was chosen for its sheer simplicity and easy
> adaptability. NFS and CIFS do not give that flexibility. As we mentioned
> in the patch series, we are already seeing better numbers with 9P. Looking
> ahead, 9P can embed KVM/QEMU knowledge to share physical resources like
> memory/cache between the host and the guest.
> 
> I think looking into the Windows side of a 9P client would be a great option
> too. The current patch on the mailing list supports the .U (unix) protocol,
> and we will be introducing the .L (Linux) variant as we move forward.

Hi Mohammed,
Please let us know once you decide on where your interest lies.
Will be glad to have you on VirtFS (9P) though. :)


- JV

> 
> - JV
> 
> 
>> Alex
>>
>>
>>
> 
> 
> 
> 




Re: [PATCH] vhost: fix sparse warnings

2010-04-13 Thread Michael S. Tsirkin
On Tue, Apr 13, 2010 at 02:11:25PM -0400, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> 

Thanks, applied.



Re: [PATCH] vhost: fix sparse warnings

2010-04-13 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 


Index: linux-2.6/drivers/vhost/net.c
===
--- linux-2.6.orig/drivers/vhost/net.c  2010-04-05 21:13:24.196004388 +0200
+++ linux-2.6/drivers/vhost/net.c   2010-04-05 21:13:32.726004109 +0200
@@ -641,7 +641,7 @@ static struct miscdevice vhost_net_misc
&vhost_net_fops,
 };
 
-int vhost_net_init(void)
+static int vhost_net_init(void)
 {
int r = vhost_init();
if (r)
@@ -658,7 +658,7 @@ err_init:
 }
 module_init(vhost_net_init);
 
-void vhost_net_exit(void)
+static void vhost_net_exit(void)
 {
misc_deregister(&vhost_net_misc);
vhost_cleanup();
Index: linux-2.6/drivers/vhost/vhost.c
===
--- linux-2.6.orig/drivers/vhost/vhost.c2010-04-05 21:12:58.806004249 
+0200
+++ linux-2.6/drivers/vhost/vhost.c 2010-04-05 21:14:44.681010674 +0200
@@ -710,7 +710,7 @@ int vhost_log_write(struct vhost_virtque
return 0;
 }
 
-int translate_desc(struct vhost_dev *dev, u64 addr, u32 len,
+static int translate_desc(struct vhost_dev *dev, u64 addr, u32 len,
   struct iovec iov[], int iov_size)
 {
const struct vhost_memory_region *reg;
@@ -736,7 +736,7 @@ int translate_desc(struct vhost_dev *dev
_iov = iov + ret;
size = reg->memory_size - addr + reg->guest_phys_addr;
_iov->iov_len = min((u64)len, size);
-   _iov->iov_base = (void *)(unsigned long)
+   _iov->iov_base = (void __user *)(unsigned long)
(reg->userspace_addr + addr - reg->guest_phys_addr);
s += size;
addr += size;
@@ -990,7 +990,7 @@ void vhost_discard_vq_desc(struct vhost_
  * want to notify the guest, using eventfd. */
 int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
 {
-   struct vring_used_elem *used;
+   struct vring_used_elem __user *used;
 
/* The virtqueue contains a ring of used buffers.  Get a pointer to the
 * next entry in that used ring. */
@@ -1014,7 +1014,8 @@ int vhost_add_used(struct vhost_virtqueu
smp_wmb();
/* Log used ring entry write. */
log_write(vq->log_base,
- vq->log_addr + ((void *)used - (void *)vq->used),
+ vq->log_addr +
+  ((void __user *)used - (void __user *)vq->used),
  sizeof *used);
/* Log used index update. */
log_write(vq->log_base,
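
For context on what these annotations buy: __user marks a pointer as living
in the user address space, so sparse can warn about direct dereferences and
about mixing address spaces; to the compiler proper it expands to nothing,
so the generated code is unchanged. A sketch of the definition, from memory
of include/linux/compiler.h of that era (treat as approximate):

#ifdef __CHECKER__
# define __user __attribute__((noderef, address_space(1)))
#else
# define __user
#endif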


Re: [PATCH] [qemu-kvm] fix CPUID vendor override

2010-04-13 Thread Marcelo Tosatti
On Sun, Apr 11, 2010 at 10:30:30PM +0200, Andre Przywara wrote:
> the meaning of vendor_override is actually the opposite of how it
> is currently used :-(
> This fix reverts the workaround 4dad7ff3 and replaces it with the
> correct version.
> Fix it to allow KVM to export the non-native CPUID vendor if
> explicitly requested by the user.
> 
> Signed-off-by: Andre Przywara 
> ---
>  target-i386/cpuid.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> I already sent appropriate patches for upstream qemu git HEAD and
> qemu stable. This fix is especially for qemu-kvm,
> as it differs from upstream QEMU.
> 
> Regards,
> Andre.

Applied for both stable and master, thanks.



Re: [PATCH] svm: implement NEXTRIPsave SVM feature

2010-04-13 Thread Marcelo Tosatti
On Sun, Apr 11, 2010 at 11:07:28PM +0200, Andre Przywara wrote:
> On SVM we set the instruction length of skipped instructions
> to hard-coded, well-known values, which could be wrong when (bogus,
> but valid) prefixes (REX, segment override) are used.
> Newer AMD processors (Fam10h 45nm and better, a.k.a. Phenom II or
> Athlon II) have an explicit NEXTRIP field in the VMCB containing the
> desired information.
> Since it is cheap to do so, we use this field to override the guessed
> value on newer processors.
> A fix for older CPUs would be rather expensive, as it would require
> fetching and partially decoding the instruction. As the problem is not
> a security issue and needs special, handcrafted code to trigger
> (no compiler will ever generate such code), I omit a fix for older
> CPUs.
> If someone is interested, I have both a patch for these CPUs as well as
> demo code triggering this issue: It segfaults under KVM, but runs
> perfectly on native Linux.
> 
> Signed-off-by: Andre Przywara 
> ---
>  arch/x86/include/asm/svm.h |4 +++-
>  arch/x86/kvm/svm.c |   13 -
>  2 files changed, 11 insertions(+), 6 deletions(-)

Applied, thanks.
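
For readers who want the shape of the change: a hedged sketch of the
NEXTRIP-based skip (the next_rip field name follows the AMD APM; the helper
guessed_insn_len() is hypothetical and stands in for the existing hard-coded
length logic):

static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
{
    struct vcpu_svm *svm = to_svm(vcpu);

    if (svm->vmcb->control.next_rip != 0)
        /* Fam10h 45nm and newer report the next RIP exactly */
        kvm_rip_write(vcpu, svm->vmcb->control.next_rip);
    else
        /* older CPUs: fall back to the guessed instruction length
         * (hypothetical helper for the hard-coded values) */
        kvm_rip_write(vcpu, kvm_rip_read(vcpu) + guessed_insn_len(svm));
}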



Re: [PATCHv2] KVM: move DR register access handling into generic code

2010-04-13 Thread Marcelo Tosatti
On Tue, Apr 13, 2010 at 10:05:23AM +0300, Gleb Natapov wrote:
> Currently both SVM and VMX have their own DR handling code. Move it to
> x86.c.
> 
> Changelog:
>  v1->v2
>   - kvm_set_dr() always return 1 in a case of error
> 
> Signed-off-by: Gleb Natapov 

Applied, thanks.



Re: [PATCHv2] KVM: move DR register access handling into generic code

2010-04-13 Thread Jan Kiszka
Gleb Natapov wrote:
> Currently both SVM and VMX have their own DR handling code. Move it to
> x86.c.
> 
> Changelog:
>  v1->v2
>   - kvm_set_dr() always return 1 in a case of error
> 
> Signed-off-by: Gleb Natapov 

Acked-by: Jan Kiszka 

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0c49c88..5d5e0a9 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -496,8 +496,7 @@ struct kvm_x86_ops {
>   void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
>   void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
>   void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
> - int (*get_dr)(struct kvm_vcpu *vcpu, int dr, unsigned long *dest);
> - int (*set_dr)(struct kvm_vcpu *vcpu, int dr, unsigned long value);
> + void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value);
>   void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
>   unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
>   void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
> @@ -602,6 +601,8 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long 
> cr0);
>  void kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
>  void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
>  void kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8);
> +int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
> +int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val);
>  unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
>  void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
>  void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index d04c7ad..c773a46 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1303,70 +1303,11 @@ static void new_asid(struct vcpu_svm *svm, struct 
> svm_cpu_data *sd)
>   svm->vmcb->control.asid = sd->next_asid++;
>  }
>  
> -static int svm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *dest)
> +static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
>  {
>   struct vcpu_svm *svm = to_svm(vcpu);
>  
> - switch (dr) {
> - case 0 ... 3:
> - *dest = vcpu->arch.db[dr];
> - break;
> - case 4:
> - if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
> - return EMULATE_FAIL; /* will re-inject UD */
> - /* fall through */
> - case 6:
> - if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
> - *dest = vcpu->arch.dr6;
> - else
> - *dest = svm->vmcb->save.dr6;
> - break;
> - case 5:
> - if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
> - return EMULATE_FAIL; /* will re-inject UD */
> - /* fall through */
> - case 7:
> - if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
> - *dest = vcpu->arch.dr7;
> - else
> - *dest = svm->vmcb->save.dr7;
> - break;
> - }
> -
> - return EMULATE_DONE;
> -}
> -
> -static int svm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long value)
> -{
> - struct vcpu_svm *svm = to_svm(vcpu);
> -
> - switch (dr) {
> - case 0 ... 3:
> - vcpu->arch.db[dr] = value;
> - if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP))
> - vcpu->arch.eff_db[dr] = value;
> - break;
> - case 4:
> - if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
> - return EMULATE_FAIL; /* will re-inject UD */
> - /* fall through */
> - case 6:
> - vcpu->arch.dr6 = (value & DR6_VOLATILE) | DR6_FIXED_1;
> - break;
> - case 5:
> - if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
> - return EMULATE_FAIL; /* will re-inject UD */
> - /* fall through */
> - case 7:
> - vcpu->arch.dr7 = (value & DR7_VOLATILE) | DR7_FIXED_1;
> - if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) {
> - svm->vmcb->save.dr7 = vcpu->arch.dr7;
> - vcpu->arch.switch_db_regs = (value & DR7_BP_EN_MASK);
> - }
> - break;
> - }
> -
> - return EMULATE_DONE;
> + svm->vmcb->save.dr7 = value;
>  }
>  
>  static int pf_interception(struct vcpu_svm *svm)
> @@ -3298,8 +3239,7 @@ static struct kvm_x86_ops svm_x86_ops = {
>   .set_idt = svm_set_idt,
>   .get_gdt = svm_get_gdt,
>   .set_gdt = svm_set_gdt,
> - .get_dr = svm_get_dr,
> - .set_dr = svm_set_dr,
> + .set_dr7 = svm_set_dr7,
>   .cache_reg = svm_cache_reg,
>   .get_rflags = svm_get_rflags,
>   .set_rflags = svm_set_rflags,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 41e63bb..53c42f0 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3082,19 +3082,9 

Re: [PATCH 2/6] KVM MMU: fix kvm_mmu_zap_page() and its calling path

2010-04-13 Thread Marcelo Tosatti
On Tue, Apr 13, 2010 at 09:34:14AM +0800, Xiao Guangrong wrote:
> 
> 
> Marcelo Tosatti wrote:
> 
> >> @@ -1483,8 +1483,8 @@ static int mmu_zap_unsync_children(struct kvm *kvm,
> >>for_each_sp(pages, sp, parents, i) {
> >>kvm_mmu_zap_page(kvm, sp);
> >>mmu_pages_clear_parents(&parents);
> >> +  zapped++;
> >>}
> >> -  zapped += pages.nr;
> >>kvm_mmu_pages_init(parent, &parents, &pages);
> >>}
> > 
> > Don't see why this is needed? The for_each_sp loop uses pvec.nr.
> 
> I think mmu_zap_unsync_children() should return the number of zapped pages,
> so that we can adjust the number of free pages in kvm_mmu_change_mmu_pages();
> but pages.nr not only includes the unsync/zapped pages but also their parents.

Oh I see. I think it's safer to check for list_empty than to rely on
proper accounting there, like __kvm_mmu_free_some_pages does.
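
For reference, the pattern being pointed at, quoted from memory of
__kvm_mmu_free_some_pages() in kernels of that era (treat as approximate,
not authoritative): the loop is guarded by list_empty() rather than trusting
a counter alone.

void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
{
    while (vcpu->kvm->arch.n_free_mmu_pages < KVM_REFILL_PAGES &&
           !list_empty(&vcpu->kvm->arch.active_mmu_pages)) {
        struct kvm_mmu_page *sp;

        sp = container_of(vcpu->kvm->arch.active_mmu_pages.prev,
                          struct kvm_mmu_page, link);
        kvm_mmu_zap_page(vcpu->kvm, sp);
        ++vcpu->kvm->stat.mmu_recycled;
    }
}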




Re: [PATCH 3/6] KVM MMU: optimize/cleanup for marking parent unsync

2010-04-13 Thread Marcelo Tosatti
On Tue, Apr 13, 2010 at 09:53:07AM +0800, Xiao Guangrong wrote:
> 
> 
> Marcelo Tosatti wrote:
> 
> > Xiao,
> > 
> > Did you actually see this codepath as being performance sensitive? 
> 
> Actually, I have not run benchmarks to compare the performance before and
> after this patch.
> 
> > 
> > I'd prefer to not touch it.
> 
> This patch avoids walking all parents, and I think that overhead is really
> unnecessary. Are there other tricks in this codepath that I haven't
> noticed? :-)

My point is that there is no point in optimizing something unless it's
performance sensitive. And as I recall, mmu_unsync_walk was much more
performance-sensitive than parent walking. Actually, gfn_to_memslot
seems more important, since it's also noticeable on EPT/NPT hosts.



Re: [PATCH] vhost-net: fix vq_memory_access_ok error checking

2010-04-13 Thread Michael S. Tsirkin
On Tue, Apr 13, 2010 at 06:01:21PM +0300, Michael S. Tsirkin wrote:
> On Wed, Apr 07, 2010 at 09:59:10AM -0400, Jeff Dike wrote:
> > vq_memory_access_ok needs to check whether mem == NULL
> > 
> > Signed-off-by: Jeff Dike 
> > Signed-off-by: Michael S. Tsirkin 
> 
> This was already queued by me, you do not need to
> fill Dave's inbox with vhost patches.

This was sent in error, please ignore.
Sorry about the noise.



Re: [PATCH] vhost-net: fix vq_memory_access_ok error checking

2010-04-13 Thread Michael S. Tsirkin
On Wed, Apr 07, 2010 at 09:59:10AM -0400, Jeff Dike wrote:
> vq_memory_access_ok needs to check whether mem == NULL
> 
> Signed-off-by: Jeff Dike 
> Signed-off-by: Michael S. Tsirkin 

This was already queued by me, you do not need to
fill Dave's inbox with vhost patches.

> ---
>  drivers/vhost/vhost.c |4 
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 7bd7a1e..b8e1127 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -235,6 +235,10 @@ static int vq_memory_access_ok(void __user *log_base, 
> struct vhost_memory *mem,
>  int log_all)
>  {
>   int i;
> +
> +if (!mem)
> +return 0;
> +
>   for (i = 0; i < mem->nregions; ++i) {
>   struct vhost_memory_region *m = mem->regions + i;
>   unsigned long a = m->userspace_addr;
> -- 
> 1.7.0.2.280.gc6f05


[PATCH] qemu-kvm: emulator tests: fix msr test

2010-04-13 Thread Naphtali Sprei

use correct 64-bit mode inline assembly constraints
use a canonical-form address when writing to the MSR_KERNEL_GS_BASE MSR

Signed-off-by: Naphtali Sprei 
---
 kvm/user/test/x86/msr.c |   14 --
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/kvm/user/test/x86/msr.c b/kvm/user/test/x86/msr.c
index 92102fa..0d6f286 100644
--- a/kvm/user/test/x86/msr.c
+++ b/kvm/user/test/x86/msr.c
@@ -17,23 +17,25 @@ static void report(const char *name, int passed)
 
 static void wrmsr(unsigned index, unsigned long long value)
 {
-   asm volatile ("wrmsr" : : "c"(index), "A"(value));
+   unsigned a = value, d = value >> 32;
+
+   asm volatile("wrmsr" : : "a"(a), "d"(d), "c"(index));
 }
 
 static unsigned long long rdmsr(unsigned index)
 {
-   unsigned long long value;
-
-   asm volatile ("rdmsr" : "=A"(value) : "c"(index));
+   unsigned a, d;
 
-   return value;
+   asm volatile("rdmsr" : "=a"(a), "=d"(d) : "c"(index));
+   return ((unsigned long long)d << 32) | a;
 }
+
 #endif
 
 static void test_kernel_gs_base(void)
 {
 #ifdef __x86_64__
-   unsigned long long v1 = 0x123456789abcdef, v2;
+   unsigned long long v1 = 0x123456789abc, v2;
 
wrmsr(MSR_KERNEL_GS_BASE, v1);
v2 = rdmsr(MSR_KERNEL_GS_BASE);
-- 
1.6.3.3
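
Two notes on why both hunks matter. First, the "A" constraint names the
edx:eax register pair only in 32-bit code; in 64-bit code it can pick just
%rax or %rdx, so the 64-bit value must be split into two 32-bit halves by
hand, as the patch does. Second, MSR_KERNEL_GS_BASE holds a virtual address
and writing a non-canonical value raises #GP; the old test value
0x123456789abcdef is non-canonical on CPUs with 48-bit virtual addresses.
A self-contained sketch of the canonical check (my illustration, not code
from the test):

#include <assert.h>

/* On x86-64 with 48-bit virtual addresses, a value is canonical iff
 * bits 63..47 are all zero or all one. */
static int is_canonical(unsigned long long addr)
{
    unsigned long long top = addr >> 47;
    return top == 0 || top == 0x1ffffULL;
}

int main(void)
{
    assert(!is_canonical(0x123456789abcdefULL)); /* old value: faults */
    assert(is_canonical(0x123456789abcULL));     /* new value: fine   */
    return 0;
}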



KVM call minutes for Apr 13

2010-04-13 Thread Chris Wright
No topics, short call


[PATCH] KVM: cleanup: limit the number of pages per memory slot

2010-04-13 Thread Takuya Yoshikawa
This is based on my last patch:
  "fix the handling of dirty bitmaps to avoid overflows"

Sorry, this resulted in reverting part of that.

===

This patch limits the number of pages per memory slot to free us
from extra care about type issues.

Signed-off-by: Takuya Yoshikawa 
---
 include/linux/kvm_host.h |6 ++
 virt/kvm/kvm_main.c  |   11 ++-
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 169d077..5583063 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -105,6 +105,12 @@ struct kvm_vcpu {
struct kvm_vcpu_arch arch;
 };
 
+/*
+ * Some of the bitops functions do not support too long bitmaps.
+ * This number must be determined not to exceed such limits.
+ */
+#define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)
+
 struct kvm_memory_slot {
gfn_t base_gfn;
unsigned long npages;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6fe79c4..799ff1f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -553,6 +553,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
npages = mem->memory_size >> PAGE_SHIFT;
 
+   r = -EINVAL;
+   if (npages > KVM_MEM_MAX_NR_PAGES)
+   goto out;
+
if (!npages)
mem->flags &= ~KVM_MEM_LOG_DIRTY_PAGES;
 
@@ -1183,13 +1187,10 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
memslot = gfn_to_memslot_unaliased(kvm, gfn);
if (memslot && memslot->dirty_bitmap) {
unsigned long rel_gfn = gfn - memslot->base_gfn;
-   unsigned long *p = memslot->dirty_bitmap +
-   rel_gfn / BITS_PER_LONG;
-   int offset = rel_gfn % BITS_PER_LONG;
 
/* avoid RMW */
-   if (!generic_test_le_bit(offset, p))
-   generic___set_le_bit(offset, p);
+   if (!generic_test_le_bit(rel_gfn, memslot->dirty_bitmap))
+   generic___set_le_bit(rel_gfn, memslot->dirty_bitmap);
}
 }
 
-- 
1.6.3.3



Re: [PATCH 3/6] KVM MMU: optimize/cleanup for marking parent unsync

2010-04-13 Thread Avi Kivity

On 04/13/2010 04:53 AM, Xiao Guangrong wrote:



I'd prefer to not touch it.

This patch avoids walking all parents, and I think that overhead is really
unnecessary. Are there other tricks in this codepath that I haven't
noticed? :-)

There is also the advantage of code size reduction.

--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 3/6] Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap.

2010-04-13 Thread Yoshiaki Tamura

Avi Kivity wrote:

On 04/13/2010 11:01 AM, Yoshiaki Tamura wrote:

Avi Kivity wrote:

On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote:

Is it necessary to update migration and vga bitmaps?

We can simply update the master bitmap, and update the migration and vga
bitmaps only when they need it. That can be done in a different patch.



Let me explain the role of the master bitmap here.

In the previous discussion, the master bitmap represented that at least one
dirty type is actually dirty. While implementing this approach, I thought
there was one downside: upon resetting the bitmap, it needs to check whether
all dirty types are already cleared before unsetting the master bitmap, and
this might hurt performance in general.


The way I see it, the master bitmap is a buffer between the guest and
all the other client bitmaps (migration and vga). So, a bit can be set
in vga and cleared in master. Let's look at the following scenario:

1. guest touches address 0xa (page 0xa0)
2. qemu updates master bitmap bit 0xa0
3. vga timer fires
4. vga syncs the dirty bitmap for addresses 0xa-0xc
4.1 vga_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0]
4.2 migration_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0]
4.3 master_bitmap[0xa0-0xc0] = 0
5. vga draws based on vga_bitmap

the nice thing is that step 4.2 can be omitted if migration is not
active.

so, client bitmaps are always updated when the client requests them,
master bitmap is only a buffer.
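
A minimal, self-contained sketch of the buffering scheme in this scenario
(all names are illustrative, not QEMU's actual API): the hot guest-write
path touches a single bit in the master bitmap, and the rare, sequential
client sync ORs the buffer into each active client bitmap before clearing
it:

#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define NR_PAGES      (1UL << 20)
#define BITMAP_LONGS  (NR_PAGES / BITS_PER_LONG)

static unsigned long master_bitmap[BITMAP_LONGS];
static unsigned long vga_bitmap[BITMAP_LONGS];
static unsigned long migration_bitmap[BITMAP_LONGS];

/* Guest write path (hot): set one bit in the master bitmap only. */
static void mark_page_dirty(unsigned long pfn)
{
    master_bitmap[pfn / BITS_PER_LONG] |= 1UL << (pfn % BITS_PER_LONG);
}

/* Client sync (rare, sequential): steps 4.1-4.3 of the scenario. */
static void sync_dirty_log(unsigned long first, unsigned long last,
                           int migration_active)
{
    unsigned long i;

    for (i = first / BITS_PER_LONG; i <= last / BITS_PER_LONG; i++) {
        vga_bitmap[i] |= master_bitmap[i];            /* 4.1 */
        if (migration_active)                         /* 4.2, skippable */
            migration_bitmap[i] |= master_bitmap[i];
        master_bitmap[i] = 0;                         /* 4.3 */
    }
}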


Thanks. I think I'm finally getting your point.
At 4.2 and 4.3, is syncing to the clients necessary?


Yes. We want to clear the master bitmap, otherwise on the next iteration
we will get dirty bits that may be spurious. But if we clear the dirty
bitmap, we must copy it to all client bitmaps, not just vga, otherwise
we lose data.


Is simply doing OR at get_dirty, as in this patch, not enough?


If you don't clear the master bitmap, then you will get the same bits at
the next iteration.


OK.
The master is just a cache/buffer, and upon get_dirty, we copy it to the client
bitmaps with a for loop, clear the master, and finally check the actual client
bitmap. If we're going to allocate client bitmaps dynamically, we need to check
whether they are null too.



In this patch, the master bitmap represents 'all types are dirty', similar
to the existing 0xff. With this approach, resetting the master bitmap can
be done without checking the other types; set_dirty_flags actually
carries the burden in this case, though. Anyway, IIUC somebody would be
unhappy depending on the role of the master bitmap.


Yes, the problem is that set_dirty_flags is the common case and also
uses random access, so we'd want to make it touch only a single bit. Client
access is rare, and also sequential, and therefore efficient.


I see.
So set_dirty_flags would be like:

1. set the master first, regardless of dirty type.
2. if the dirty type was CODE, also set the client.


There really should be two APIs, one for general dirtiness and one for
code dirtiness. As Anthony said, the code dirty bitmap is completely
separate from the other uses.

Consider starting by separating the two uses, it may clear up the code
and make things easier later on.


That really makes sense. I'm a little hesitant because the code might look a
bit duplicated.



In summary, I'll prepare two patch sets.

1. v3 of this patch set.
- Change FLAGS value to (1,2,4,8), and add IDX (0,1,2,3)
- Use ffsl to convert FLAGS to IDX.
- Add help function which takes IDX.
- Change the behavior of set_dirty_flags as above.
- Change dirty bitmap access to a loop.
- Add brace after if ()
- Move some macros to qemu-common.h.

2. Allocate vga and migration dirty bitmap dynamically.

Please modify or add items if I'm missing any.


I would add separation of CODE and the other dirty bitmaps.


I'm not sure at this moment whether I should start with the separation or do
it later, but I will consider it.



Re: [PATCH] get rid of mmu_only parameter in emulator_write_emulated()

2010-04-13 Thread Avi Kivity

On 04/13/2010 10:40 AM, Gleb Natapov wrote:

The code path executed in case of cmpxchg crossing page boundary is not
touched by the patch as far as I can see. In this case
emulator_write_emulated() is executed with mmu_only false and
kvm_mmu_pte_write() is called from emulator_write_phys().

   


Yes. Patch is correct.

--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 3/6] Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap.

2010-04-13 Thread Avi Kivity

On 04/13/2010 11:01 AM, Yoshiaki Tamura wrote:

Avi Kivity wrote:

On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote:

Is it necessary to update migration and vga bitmaps?

We can simply update the master bitmap, and update the migration and vga
bitmaps only when they need it. That can be done in a different patch.



Let me explain the role of the master bitmap here.

In the previous discussion, the master bitmap represented that at least one
dirty type is actually dirty. While implementing this approach, I thought
there was one downside: upon resetting the bitmap, it needs to check whether
all dirty types are already cleared before unsetting the master bitmap, and
this might hurt performance in general.


The way I see it, the master bitmap is a buffer between the guest and
all the other client bitmaps (migration and vga). So, a bit can be set
in vga and cleared in master. Let's look at the following scenario:

1. guest touches address 0xa (page 0xa0)
2. qemu updates master bitmap bit 0xa0
3. vga timer fires
4. vga syncs the dirty bitmap for addresses 0xa-0xc
4.1 vga_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0]
4.2 migration_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0]
4.3 master_bitmap[0xa0-0xc0] = 0
5. vga draws based on vga_bitmap

the nice thing is that step 4.2 can be omitted if migration is not 
active.


so, client bitmaps are always updated when the client requests them,
master bitmap is only a buffer.


Thanks.  I think I'm finally getting your point.
At 4.2 and 4.3, is syncing to the clients necessary?


Yes.  We want to clear the master bitmap, otherwise on the next 
iteration we will get dirty bits that may be spurious.  But if we clear 
the dirty bitmap, we must copy it to all client bitmaps, not just vga, 
otherwise we lose data.



Is simply doing OR at get_dirty, as in this patch, not enough?


If you don't clear the master bitmap, then you will get the same bits at 
the next iteration.





In this patch, the master bitmap represents 'all types are dirty', similar
to the existing 0xff. With this approach, resetting the master bitmap can
be done without checking the other types; set_dirty_flags actually
carries the burden in this case, though. Anyway, IIUC somebody would be
unhappy depending on the role of the master bitmap.


Yes, the problem is that set_dirty_flags is the common case and also
uses random access, so we'd want to make it touch only a single bit. Client
access is rare, and also sequential, and therefore efficient.


I see.
So set_dirty_flags would be like:

1. set the master first, regardless of dirty type.
2. if the dirty type was CODE, also set the client.


There really should be two APIs, one for general dirtiness and one for
code dirtiness.  As Anthony said, the code dirty bitmap is completely
separate from the other uses.


Consider starting by separating the two uses, it may clear up the code 
and make things easier later on.




In summary, I'll prepare two patch sets.

1. v3 of this patch set.
- Change FLAGS value to (1,2,4,8), and add IDX (0,1,2,3)
- Use ffsl to convert FLAGS to IDX.
- Add help function which takes IDX.
- Change the behavior of set_dirty_flags as above.
- Change dirty bitmap access to a loop.
- Add brace after if ()
- Move some macros to qemu-common.h.

2. Allocate vga and migration dirty bitmap dynamically.

Please modify or add items if I'm missing any.


I would add separation of CODE and the other dirty bitmaps.

--
error compiling committee.c: too many arguments to function



Re: [PATCH v2 3/6] Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap.

2010-04-13 Thread Yoshiaki Tamura

Avi Kivity wrote:

On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote:

Is it necessary to update migration and vga bitmaps?

We can simply update the master bitmap, and update the migration and vga
bitmaps only when they need it. That can be done in a different patch.



Let me explain the role of the master bitmap here.

In the previous discussion, the master bitmap represented that at least one
dirty type is actually dirty. While implementing this approach, I thought
there was one downside: upon resetting the bitmap, it needs to check whether
all dirty types are already cleared before unsetting the master bitmap, and
this might hurt performance in general.


The way I see it, the master bitmap is a buffer between the guest and
all the other client bitmaps (migration and vga). So, a bit can be set
in vga and cleared in master. Let's look at the following scenario:

1. guest touches address 0xa (page 0xa0)
2. qemu updates master bitmap bit 0xa0
3. vga timer fires
4. vga syncs the dirty bitmap for addresses 0xa-0xc
4.1 vga_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0]
4.2 migration_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0]
4.3 master_bitmap[0xa0-0xc0] = 0
5. vga draws based on vga_bitmap

the nice thing is that step 4.2 can be omitted if migration is not active.

so, client bitmaps are always updated when the client requests them;
the master bitmap is only a buffer.


Thanks.  I think I'm finally getting your point.
At 4.2 and 4.3, is syncing to the client necessary?
Isn't simply doing OR at get_dirty, like in this patch, enough?


In this patch, the master bitmap represents that all types are dirty,
similar to the existing 0xff. With this approach, resetting the master
bitmap can be done without checking the other types; set_dirty_flags is
actually taking the burden in this case, though. Anyway, IIUC, somebody
would be unhappy depending on the role of the master bitmap.


Yes, the problem is that set_dirty_flags is the common case and also
uses random access, so we'd want to make it touch only a single bit.
Client access is rare, and also sequential, and therefore efficient.


I see.
So set_dirty_flags would be like:

1. Set the master bitmap first, regardless of dirty type.
2. If the dirty type is CODE, also set the client bitmap.


Note that we should only allocate the migration and vga bitmaps when
migration or vga is active.


Right. May I do it in a different patch?


Sure, more patches is usually better.
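
A minimal sketch of the deferred allocation idea (hypothetical names; the vga bitmap would be handled the same way):

#include <stdlib.h>

static unsigned long *migration_bitmap; /* NULL while migration is inactive */

/* Hypothetical: called when migration starts; until then, sync code
 * simply skips this client. */
static int migration_bitmap_alloc(size_t bitmap_words)
{
    if (!migration_bitmap)
        migration_bitmap = calloc(bitmap_words, sizeof(*migration_bitmap));
    return migration_bitmap ? 0 : -1;
}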


I think this is about optimization. In this patch, I focused on not
breaking the existing functionality.


Yes, that's the best approach. But we do need to see how all the
incremental steps end up.


I totally agree.  Thanks for helping.

In summary, I'll prepare two patch sets.

1. v3 of this patch set.
- Change FLAGS value to (1,2,4,8), and add IDX (0,1,2,3)
- Use ffsl to convert FLAGS to IDX.
- Add helper functions which take IDX.
- Change the behavior of set_dirty_flags as above.
- Change dirty bitmap access to a loop.
- Add braces after if ()
- Move some macros to qemu-common.h.

2. Allocate vga and migration dirty bitmap dynamically.

Please modify or add items if I am missing any.
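
As an illustration of the FLAGS-to-IDX conversion in item 1, a standalone sketch (hypothetical flag names; ffsl() is a GNU extension, hence _GNU_SOURCE):

#define _GNU_SOURCE
#include <string.h>     /* ffsl() */

enum {
    VGA_DIRTY_FLAG       = 1,   /* IDX 0 */
    CODE_DIRTY_FLAG      = 2,   /* IDX 1 */
    MIGRATION_DIRTY_FLAG = 4,   /* IDX 2 */
    MASTER_DIRTY_FLAG    = 8    /* IDX 3 */
};

/* With flags restricted to single power-of-two values (1,2,4,8),
 * ffsl() returns the 1-based position of the set bit, so subtracting
 * one yields the bitmap index (0,1,2,3). */
static inline int dirty_flag_to_idx(long flag)
{
    return ffsl(flag) - 1;
}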


Re: [PATCH] get rid of mmu_only parameter in emulator_write_emulated()

2010-04-13 Thread Gleb Natapov
On Tue, Apr 13, 2010 at 10:35:53AM +0300, Avi Kivity wrote:
> On 04/13/2010 10:26 AM, Gleb Natapov wrote:
> >On Tue, Apr 13, 2010 at 10:24:40AM +0300, Avi Kivity wrote:
> >>On 04/13/2010 10:21 AM, Gleb Natapov wrote:
> >>>Maybe I am missing something here, but it seems we can call
> >>>kvm_mmu_pte_write() directly from emulator_cmpxchg_emulated()
> >>>instead of passing mmu_only down to emulator_write_emulated_onepage()
> >>>and call it there.
> >>>
> >>>
> >>>@@ -3460,7 +3444,9 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
> >>>   if (!exchanged)
> >>>   return X86EMUL_CMPXCHG_FAILED;
> >>>
> >>>-  return __emulator_write_emulated(addr, new, bytes, vcpu, true);
> >>>+  kvm_mmu_pte_write(vcpu, gpa, new, bytes, 1);
> >>>+
> >>>+  return X86EMUL_CONTINUE;
> >>>
> >>The written range might cross a page boundary, which
> >>kvm_mmu_pte_write() is not prepared to handle.
> >>
> >Don't we emulate exchange as write in this case?
> >
> > if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
> > goto emul_write;
> 
> We do, but that's unrelated.  We still have to invalidate potential
> > ptes on both pages.
> 
The code path executed in case of cmpxchg crossing page boundary is not
touched by the patch as far as I can see. In this case
emulator_write_emulated() is executed with mmu_only false and
kvm_mmu_pte_write() is called from emulator_write_phys().
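
A worked example of the check quoted above, assuming 4 KiB pages: an 8-byte write at gpa 0x1ffd ends on page 0x2000 but starts on page 0x1000, so the cmpxchg falls back to emul_write.

#include <stdio.h>

#define PAGE_MASK (~0xfffUL)    /* assuming 4 KiB pages */

int main(void)
{
    unsigned long gpa = 0x1ffd, bytes = 8;

    /* last byte is on a different page than the first */
    if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
        printf("crosses a page boundary, emulate as write\n");
    return 0;
}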

--
Gleb.


Re: [PATCH] get rid of mmu_only parameter in emulator_write_emulated()

2010-04-13 Thread Avi Kivity

On 04/13/2010 10:26 AM, Gleb Natapov wrote:

On Tue, Apr 13, 2010 at 10:24:40AM +0300, Avi Kivity wrote:

On 04/13/2010 10:21 AM, Gleb Natapov wrote:

Maybe I am missing something here, but it seems we can call
kvm_mmu_pte_write() directly from emulator_cmpxchg_emulated()
instead of passing mmu_only down to emulator_write_emulated_onepage()
and call it there.


@@ -3460,7 +3444,9 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
if (!exchanged)
return X86EMUL_CMPXCHG_FAILED;

-   return __emulator_write_emulated(addr, new, bytes, vcpu, true);
+   kvm_mmu_pte_write(vcpu, gpa, new, bytes, 1);
+
+   return X86EMUL_CONTINUE;

The written range might cross a page boundary, which
kvm_mmu_pte_write() is not prepared to handle.

Don't we emulate exchange as write in this case?

if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
goto emul_write;

We do, but that's unrelated.  We still have to invalidate potential ptes 
on both pages.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



Re: [PATCH] get rid of mmu_only parameter in emulator_write_emulated()

2010-04-13 Thread Gleb Natapov
On Tue, Apr 13, 2010 at 10:24:40AM +0300, Avi Kivity wrote:
> On 04/13/2010 10:21 AM, Gleb Natapov wrote:
> >Maybe I am missing something here, but it seems we can call
> >kvm_mmu_pte_write() directly from emulator_cmpxchg_emulated()
> >instead of passing mmu_only down to emulator_write_emulated_onepage()
> >and call it there.
> >
> >
> >@@ -3460,7 +3444,9 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
> > if (!exchanged)
> > return X86EMUL_CMPXCHG_FAILED;
> >
> >-return __emulator_write_emulated(addr, new, bytes, vcpu, true);
> >+kvm_mmu_pte_write(vcpu, gpa, new, bytes, 1);
> >+
> >+return X86EMUL_CONTINUE;
> >
> 
> The written range might cross a page boundary, which
> kvm_mmu_pte_write() is not prepared to handle.
> 
Don't we emulate exchange as write in this case?

if (((gpa + bytes - 1) & PAGE_MASK) != (gpa & PAGE_MASK))
goto emul_write;

--
Gleb.


Re: [PATCH] get rid of mmu_only parameter in emulator_write_emulated()

2010-04-13 Thread Avi Kivity

On 04/13/2010 10:21 AM, Gleb Natapov wrote:

Maybe I am missing something here, but it seems we can call
kvm_mmu_pte_write() directly from emulator_cmpxchg_emulated()
instead of passing mmu_only down to emulator_write_emulated_onepage()
and call it there.


@@ -3460,7 +3444,9 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
if (!exchanged)
return X86EMUL_CMPXCHG_FAILED;

-   return __emulator_write_emulated(addr, new, bytes, vcpu, true);
+   kvm_mmu_pte_write(vcpu, gpa, new, bytes, 1);
+
+   return X86EMUL_CONTINUE;


The written range might cross a page boundary, which kvm_mmu_pte_write() 
is not prepared to handle.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[PATCH] get rid of mmu_only parameter in emulator_write_emulated()

2010-04-13 Thread Gleb Natapov
Maybe I am missing something here, but it seems we can call
kvm_mmu_pte_write() directly from emulator_cmpxchg_emulated()
instead of passing mmu_only down to emulator_write_emulated_onepage()
and call it there.

Signed-off-by: Gleb Natapov 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a4e55ae..8ab30e1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3327,8 +3327,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
 static int emulator_write_emulated_onepage(unsigned long addr,
   const void *val,
   unsigned int bytes,
-  struct kvm_vcpu *vcpu,
-  bool mmu_only)
+  struct kvm_vcpu *vcpu)
 {
gpa_t gpa;
u32 error_code;
@@ -3344,10 +3343,6 @@ static int emulator_write_emulated_onepage(unsigned long addr,
if ((gpa & PAGE_MASK) == APIC_DEFAULT_PHYS_BASE)
goto mmio;
 
-   if (mmu_only) {
-   kvm_mmu_pte_write(vcpu, gpa, val, bytes, 1);
-   return X86EMUL_CONTINUE;
-   }
if (emulator_write_phys(vcpu, gpa, val, bytes))
return X86EMUL_CONTINUE;
 
@@ -3368,35 +3363,24 @@ mmio:
return X86EMUL_CONTINUE;
 }
 
-int __emulator_write_emulated(unsigned long addr,
-  const void *val,
-  unsigned int bytes,
-  struct kvm_vcpu *vcpu,
-  bool mmu_only)
+int emulator_write_emulated(unsigned long addr,
+   const void *val,
+   unsigned int bytes,
+   struct kvm_vcpu *vcpu)
 {
/* Crossing a page boundary? */
if (((addr + bytes - 1) ^ addr) & PAGE_MASK) {
int rc, now;
 
now = -addr & ~PAGE_MASK;
-   rc = emulator_write_emulated_onepage(addr, val, now, vcpu,
-mmu_only);
+   rc = emulator_write_emulated_onepage(addr, val, now, vcpu);
if (rc != X86EMUL_CONTINUE)
return rc;
addr += now;
val += now;
bytes -= now;
}
-   return emulator_write_emulated_onepage(addr, val, bytes, vcpu,
-  mmu_only);
-}
-
-int emulator_write_emulated(unsigned long addr,
-  const void *val,
-  unsigned int bytes,
-  struct kvm_vcpu *vcpu)
-{
-   return __emulator_write_emulated(addr, val, bytes, vcpu, false);
+   return emulator_write_emulated_onepage(addr, val, bytes, vcpu);
 }
 EXPORT_SYMBOL_GPL(emulator_write_emulated);
 
@@ -3460,7 +3444,9 @@ static int emulator_cmpxchg_emulated(unsigned long addr,
if (!exchanged)
return X86EMUL_CMPXCHG_FAILED;
 
-   return __emulator_write_emulated(addr, new, bytes, vcpu, true);
+   kvm_mmu_pte_write(vcpu, gpa, new, bytes, 1);
+
+   return X86EMUL_CONTINUE;
 
 emul_write:
printk_once(KERN_WARNING "kvm: emulating exchange as write\n");
@@ -4174,7 +4160,7 @@ int kvm_fix_hypercall(struct kvm_vcpu *vcpu)
 
kvm_x86_ops->patch_hypercall(vcpu, instruction);
 
-   return __emulator_write_emulated(rip, instruction, 3, vcpu, false);
+   return emulator_write_emulated(rip, instruction, 3, vcpu);
 }
 
 void realmode_lgdt(struct kvm_vcpu *vcpu, u16 limit, unsigned long base)
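
A worked example of the page split in emulator_write_emulated() above, assuming 4 KiB pages: -addr & ~PAGE_MASK is the number of bytes from addr up to the next page boundary.

#include <stdio.h>

#define PAGE_MASK (~0xfffUL)    /* assuming 4 KiB pages */

int main(void)
{
    unsigned long addr = 0x1ffd;
    unsigned long now = -addr & ~PAGE_MASK;     /* 0x2000 - 0x1ffd = 3 */

    /* the first onepage call writes 3 bytes; the second starts at 0x2000 */
    printf("now = %lu, second chunk at %#lx\n", now, addr + now);
    return 0;
}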
--
Gleb.


Re: [PATCH] KVM: fix the handling of dirty bitmaps to avoid overflows

2010-04-13 Thread Avi Kivity

On 04/13/2010 10:03 AM, Takuya Yoshikawa wrote:

It's better to limit memory slots to something that can be handled by
everything, then. 2^31 pages is plenty. Return -EINVAL if the slot is
too large.



I agree with that, so should we keep this patch pending and fix it like that?
  -- or should I make a new patch based on this one?


We need a new patch to block oversize memory slots.  The current patch 
can come on top (but now page numbers fit inside an int, so it is just a 
cleanup, not a bugfix).
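
A minimal sketch of such a guard (hypothetical, not an actual submitted patch): refuse slots whose page count does not fit in a 32-bit int at slot-creation time.

#include <errno.h>

#define PAGE_SHIFT 12   /* assuming 4 KiB pages */

/* Hypothetical check when a memory slot is created. */
static int check_slot_size(unsigned long long memory_size)
{
    unsigned long long npages = memory_size >> PAGE_SHIFT;

    if (npages >= (1ULL << 31))     /* page numbers must fit in an int */
        return -EINVAL;
    return 0;
}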


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.



[PATCHv2] KVM: move DR register access handling into generic code

2010-04-13 Thread Gleb Natapov
Currently both SVM and VMX have their own DR handling code. Move it to
x86.c.

Changelog:
 v1->v2
  - kvm_set_dr() always returns 1 in case of error

Signed-off-by: Gleb Natapov 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0c49c88..5d5e0a9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -496,8 +496,7 @@ struct kvm_x86_ops {
void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
-   int (*get_dr)(struct kvm_vcpu *vcpu, int dr, unsigned long *dest);
-   int (*set_dr)(struct kvm_vcpu *vcpu, int dr, unsigned long value);
+   void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value);
void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags);
@@ -602,6 +601,8 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
 void kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3);
 void kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
 void kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8);
+int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val);
+int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val);
 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
 void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d04c7ad..c773a46 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1303,70 +1303,11 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
svm->vmcb->control.asid = sd->next_asid++;
 }
 
-static int svm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *dest)
+static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
-   switch (dr) {
-   case 0 ... 3:
-   *dest = vcpu->arch.db[dr];
-   break;
-   case 4:
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
-   return EMULATE_FAIL; /* will re-inject UD */
-   /* fall through */
-   case 6:
-   if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
-   *dest = vcpu->arch.dr6;
-   else
-   *dest = svm->vmcb->save.dr6;
-   break;
-   case 5:
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
-   return EMULATE_FAIL; /* will re-inject UD */
-   /* fall through */
-   case 7:
-   if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)
-   *dest = vcpu->arch.dr7;
-   else
-   *dest = svm->vmcb->save.dr7;
-   break;
-   }
-
-   return EMULATE_DONE;
-}
-
-static int svm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long value)
-{
-   struct vcpu_svm *svm = to_svm(vcpu);
-
-   switch (dr) {
-   case 0 ... 3:
-   vcpu->arch.db[dr] = value;
-   if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP))
-   vcpu->arch.eff_db[dr] = value;
-   break;
-   case 4:
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
-   return EMULATE_FAIL; /* will re-inject UD */
-   /* fall through */
-   case 6:
-   vcpu->arch.dr6 = (value & DR6_VOLATILE) | DR6_FIXED_1;
-   break;
-   case 5:
-   if (kvm_read_cr4_bits(vcpu, X86_CR4_DE))
-   return EMULATE_FAIL; /* will re-inject UD */
-   /* fall through */
-   case 7:
-   vcpu->arch.dr7 = (value & DR7_VOLATILE) | DR7_FIXED_1;
-   if (!(vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP)) {
-   svm->vmcb->save.dr7 = vcpu->arch.dr7;
-   vcpu->arch.switch_db_regs = (value & DR7_BP_EN_MASK);
-   }
-   break;
-   }
-
-   return EMULATE_DONE;
+   svm->vmcb->save.dr7 = value;
 }
 
 static int pf_interception(struct vcpu_svm *svm)
@@ -3298,8 +3239,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.set_idt = svm_set_idt,
.get_gdt = svm_get_gdt,
.set_gdt = svm_set_gdt,
-   .get_dr = svm_get_dr,
-   .set_dr = svm_set_dr,
+   .set_dr7 = svm_set_dr7,
.cache_reg = svm_cache_reg,
.get_rflags = svm_get_rflags,
.set_rflags = svm_set_rflags,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 41e63bb..53c42f0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3082,19 +3082,9 @@ static int handle_cr(struct kvm_vcpu *vcpu)
return 0;
 }
 
-static int check_dr_alias(struct kvm_vcpu *vcpu)
-{
-   if (kvm_read_cr4
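
For context, the DR4/DR5 rule these deleted helpers implement, as a standalone sketch (illustrative only, not the generic x86.c code the patch adds): with CR4.DE set, DR4/DR5 accesses raise #UD; otherwise they alias DR6/DR7.

/* Illustrative sketch of the DR4/DR5 aliasing rule. */
static int canonicalize_dr(int dr, int cr4_de)
{
    if (dr == 4 || dr == 5) {
        if (cr4_de)
            return -1;      /* caller should re-inject #UD */
        dr += 2;            /* DR4 aliases DR6, DR5 aliases DR7 */
    }
    return dr;
}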

Re: [PATCH] KVM: fix the handling of dirty bitmaps to avoid overflows

2010-04-13 Thread Takuya Yoshikawa




BTW, just out of curiosity, are there currently any cases in which we
use such a huge number of pages?

ALIGN(memslot->npages, BITS_PER_LONG) / 8;

More than 2^30 pages needs really big memory!
-- Are we assuming some special case, like a "short" int size?


No, int is 32 bits, but memslot->npages is not under our control.

Note that you don't actually need all those pages to create a large
memory slot.



If so, we may have to care about a lot of things from now on, because
common functions like __set_bit() don't support such long buffers.


It's better to limit memory slots to something that can be handled by
everything, then. 2^31 pages is plenty. Return -EINVAL if the slot is
too large.


I agree with that, so should we keep this patch pending and fix it like that?
  -- or should I make a new patch based on this one?
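
For a sense of scale, a worked example of the bitmap arithmetic above, assuming 4 KiB pages: a slot of 2^31 pages covers 8 TiB of guest memory, and its dirty bitmap alone occupies 256 MiB.

#include <stdio.h>

#define BITS_PER_LONG 64
#define ALIGN(x, a) (((x) + (a) - 1) & ~((unsigned long)(a) - 1))

int main(void)
{
    unsigned long npages = 1UL << 31;   /* 8 TiB at 4 KiB per page */
    unsigned long bytes = ALIGN(npages, BITS_PER_LONG) / 8;

    printf("dirty bitmap: %lu MiB\n", bytes >> 20);     /* 256 */
    return 0;
}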





