date:20080116

Re: [kvm-devel] [PATCH 0 of 2] A couple ifdefs

2008-01-16 Thread Avi Kivity

Hollis Blanchard wrote:
> These small ifdefs are necessary for integration of the PowerPC port.
>
>   

Only patch 2 of 2 made it.

-- 
error compiling committee.c: too many arguments to function

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Re: [kvm-devel] [PATCH 2 of 2] Use CONFIG_PREEMPT_NOTIFIERS around struct preempt_notifier

2008-01-16 Thread Avi Kivity

Hollis Blanchard wrote:
> # HG changeset patch
> # User Hollis Blanchard <[EMAIL PROTECTED]>
> # Date 1200434370 21600
> # Node ID 9878c9cec5f831ff5e9b97539aabc5fa3d934501
> # Parent  931a81e1002110be0e8bf5b335bf199d43534c2c
> This allows kvm_host.h to be #included even when struct preempt_notifier is
> undefined.
>
>   

Don't you actually need preempt notifiers?  They are useful if you have 
state that is only needed from userspace, but is expensive to switch.  
For x86, this is the syscall msrs (which define the syscall entry 
point), the fpu (which is not used in the kernel), and a few other bits 
(which I'm too lazy to look up and are esoteric anyway).

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [PATCH 1 of 6] Move IO handling code to a separate file

2008-01-16 Thread Avi Kivity

Hollis Blanchard wrote:
> # HG changeset patch
> # User Hollis Blanchard <[EMAIL PROTECTED]>
> # Date 1200436754 21600
> # Node ID c6e8bf3f9f7c9705a0ad29f44fa148fe80a365ff
> # Parent  f22e390c06b78ffbcec4738112309f66267e3582
> This will allow other architectures to share it, since main.c is x86-only.
>
>   

Applied patches 1-4.  Can we not avoid the duplication in 5?

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

2008-01-16 Thread Avi Kivity

Glauber de Oliveira Costa wrote:

> +static void kvm_write_guest_time(struct kvm_vcpu *v)
> +{
> + struct timespec ts, wc_ts;
> + int wc_args[3]; /* version, wc_sec, wc_nsec */
> + unsigned long flags;
> + struct kvm_vcpu_arch *vcpu = &v->arch;
> + struct xen_shared_info *shared_kaddr;
> +
> + if ((!vcpu->shared_page))
> + return;
> +
> + /* Keep irq disabled to prevent changes to the clock */
> + local_irq_save(flags);
> + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
> +   &vcpu->hv_clock.tsc_timestamp);
> + wc_ts = current_kernel_time();
> + ktime_get_ts(&ts);
> + local_irq_restore(flags);
> +
> + /* With all the info we got, fill in the values */
> + wc_args[1] = wc_ts.tv_sec;
> + wc_args[2] = wc_ts.tv_nsec;
> +
> + vcpu->hv_clock.system_time = ts.tv_nsec +
> +  (NSEC_PER_SEC * (u64)ts.tv_sec);
> + /*
> +  * The interface expects us to write an even number signaling that the
> +  * update is finished. Since the guest won't see the intermediate states,
> +  * we just write "2" at the end
> +  */
> + wc_args[0] = 2;
> + vcpu->hv_clock.version = 2;
> +
> + preempt_disable();
> +
> + shared_kaddr = kmap_atomic(vcpu->shared_page, KM_USER0);
> +
> + /*
> +  * We could write everything at once, but it can break future
> +  * implementations. We're just a tiny and lonely clock, so let's
> +  * write only what matters here
> +  */
> + memcpy(&shared_kaddr->wc_version, wc_args, sizeof(wc_args));
>   


We want to avoid updating the wall clock all the time.  As far as I 
understand, the wall clock is just a base which doesn't change.  To get the 
real wall clock, you read the shared_info wall clock and add the current 
system time.  This means that you avoid writing to a shared global 
(which is expensive in cache lines).  The shared_info wall clock is only 
updated if the host clock is moved (other than in the way you expect 
it to).
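
To make the arithmetic concrete, here is a minimal sketch, assuming the
base is written once and the system time is a nanosecond count since that
base was taken.  The demo_* names are hypothetical and are not taken from
the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Hypothetical base record: wall-clock time at system time zero. */
struct demo_wall_clock {
    uint32_t wc_sec;
    uint32_t wc_nsec;
};

/* Hypothetical result type for the combined time. */
struct demo_timespec {
    uint64_t sec;
    uint32_t nsec;
};

/* Current wall time = unchanging base + current system time (ns). */
static struct demo_timespec demo_wall_time(struct demo_wall_clock base,
                                           uint64_t system_time_ns)
{
    struct demo_timespec now;
    uint64_t total_ns;

    assert(base.wc_nsec < DEMO_NSEC_PER_SEC);

    /* Add the nanosecond parts first, then carry into seconds. */
    total_ns = (uint64_t)base.wc_nsec + system_time_ns;
    now.sec = (uint64_t)base.wc_sec + total_ns / DEMO_NSEC_PER_SEC;
    now.nsec = (uint32_t)(total_ns % DEMO_NSEC_PER_SEC);
    return now;
}
```

Only the system time has to be refreshed regularly in such a scheme; the
base stays untouched unless the host clock is stepped.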

Also, when you write to the shared clock, you must respect the protocol 
since it can be read concurrently:

- increment the version
- smp_wmb()
- copy the goodies
- smp_wmb()
- increment the version again

[I think this is the protocol, but better read the sources to double-check]
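
The steps above can be sketched in portable C11 as follows.  This is only
an illustration under stated assumptions: the demo_* names and the
structure layout are hypothetical, release fences stand in for the
kernel's smp_wmb(), and a single writer is assumed.  The reader side is
my guess at the matching half of the protocol (retry while the version is
odd or has changed), so check it against the real sources as well.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared record; not the layout used by the patch. */
struct demo_time_info {
    _Atomic uint32_t version;        /* odd while an update is in flight */
    _Atomic uint64_t tsc_timestamp;
    _Atomic uint64_t system_time;
};

/* Writer: version++, barrier, copy the payload, barrier, version++. */
static void demo_publish(struct demo_time_info *dst, uint64_t tsc, uint64_t ns)
{
    uint32_t v = atomic_load_explicit(&dst->version, memory_order_relaxed);

    assert((v & 1) == 0);            /* single writer: must start even */

    atomic_store_explicit(&dst->version, v + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* stand-in for smp_wmb() */
    atomic_store_explicit(&dst->tsc_timestamp, tsc, memory_order_relaxed);
    atomic_store_explicit(&dst->system_time, ns, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* stand-in for smp_wmb() */
    atomic_store_explicit(&dst->version, v + 2, memory_order_relaxed);
}

/* Reader: retry if an update was in progress or completed meanwhile. */
static void demo_read(struct demo_time_info *src, uint64_t *tsc, uint64_t *ns)
{
    uint32_t before, after;

    do {
        before = atomic_load_explicit(&src->version, memory_order_acquire);
        *tsc = atomic_load_explicit(&src->tsc_timestamp, memory_order_relaxed);
        *ns = atomic_load_explicit(&src->system_time, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        after = atomic_load_explicit(&src->version, memory_order_relaxed);
    } while ((before & 1) || before != after);
}
```

Writing a constant "2" as the version skips the odd intermediate state, so
a concurrent reader has no way to notice that it copied a half-updated
record.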

>   }
> diff --git a/include/asm-x86/kvm_host.h b/include/asm-x86/kvm_host.h
> index d6db0de..9a66b90 100644
> --- a/include/asm-x86/kvm_host.h
> +++ b/include/asm-x86/kvm_host.h
> @@ -261,6 +261,10 @@ struct kvm_vcpu_arch {
>   /* emulate context */
>  
>   struct x86_emulate_ctxt emulate_ctxt;
> +
> + struct xen_vcpu_time_info hv_clock;
> + gpa_t shared_info;
> + struct page *shared_page;
>  };
>   

shared_{info,page} is too generic a name for just a clock.

>  
> +/* xen binary-compatible interfaces. See xen headers for details */
> +struct xen_vcpu_time_info {
> + uint32_t version;
> + uint32_t pad0;
> + uint64_t tsc_timestamp;
> + uint64_t system_time;
> + uint32_t tsc_to_system_mul;
> + int8_t   tsc_shift;
> + int8_t   pad1[3];
> +};
> +
> +struct xen_vcpu_info {
> + uint8_t  pad[32];
> + struct xen_vcpu_time_info time;
> +};
>   

Please drop xen_vcpu_info...

> +
> +#define XEN_MAX_VIRT_CPUS 32
> +
> +struct xen_shared_info {
> + struct xen_vcpu_info vcpu_info[XEN_MAX_VIRT_CPUS];
> +
> + unsigned long evt[2];
> +
> + uint32_t wc_version;  /* Version counter: see vcpu_time_info_t. */
> + uint32_t wc_sec;  /* Secs  00:00:00 UTC, Jan 1, 1970.  */
> + uint32_t wc_nsec; /* Nsecs 00:00:00 UTC, Jan 1, 1970.  */
> +
> + unsigned long pad[12];
> +};
>   

... and everything non-time-related in here.  Yes, it means we need two 
msrs (for wall clock and system time), but it also means we don't impose 
any layout upon the guest, and do not (for example) restrict the number 
of vcpus.  We could easily put the vcpu clock in a per_cpu() area.


-- 
error compiling committee.c: too many arguments to function



Re: [kvm-devel] [PATCH 1/3] put kvm_para.h include outside KERNEL

2008-01-16 Thread Avi Kivity

Glauber de Oliveira Costa wrote:
> kvm_para.h potentially contains definitions that are to be used by
> kvm-userspace, so it should not be included inside the __KERNEL__ block.
> To protect its own data structures, kvm_para.h already includes its own
> __KERNEL__ block.
>
>   

Applied this one, thanks.

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [RFC] fix VMX TSC synchronicity

2008-01-16 Thread Avi Kivity

[fixing gmane emails, urgfhsz]

Andi Kleen wrote:
> Avi Kivity  writes:
>
>   
>> Thanks; that's reassuring to know that it will work (at least on Intel).
>> 
>
> Actually there are modern Intel systems which still have unstable TSCs;
> e.g. IBM Summit multi node systems and some others. So you should
> still handle that case.
>   

I really don't see any way we could.  If the guest assumes tscs are 
synchronous, and they really are not, there's nothing we can do.

[well, we could trap and emulate rdtsc, but performance would tank]

You might taskset guests into a single node on such systems, which is a 
good idea anyway.

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

2008-01-16 Thread Gerd Hoffmann

  Hi,

> We want to avoid updating wall clock all the time.  As far as I 
> understand, wall clock is just a base which doesn't change.

Yep, it is.  Got that wrong first in xenner, with the result that guest
time ran at double speed ;)

>> +/* xen binary-compatible interfaces. See xen headers for details */
>> +struct xen_vcpu_time_info {
>> +uint32_t version;
>> +uint32_t pad0;
>> +uint64_t tsc_timestamp;
>> +uint64_t system_time;
>> +uint32_t tsc_to_system_mul;
>> +int8_t   tsc_shift;
>> +int8_t   pad1[3];
>> +};

>> +struct xen_vcpu_info {
>> +uint8_t  pad[32];
>> +struct xen_vcpu_time_info time;
>> +};
>>   
> 
> Please drop xen_vcpu_info...

Oh, yeah.  No point in assembling the whole xen shared info page.  Just
xen_vcpu_time_info is enough; it will work just fine for xenner.

cheers,
  Gerd


Re: [kvm-devel] [PATCH 0 of 2] A couple ifdefs

2008-01-16 Thread Christian Ehrhardt

Avi Kivity wrote:
> Hollis Blanchard wrote:
>> These small ifdefs are necessary for integration of the PowerPC port.
>>
>>   
> 
> Only patch 2 of 2 made it.
> 
As Hollis should be asleep right now, I am resending 1/2 as it arrived on
kvm-powerpc-devel (I hope my mail app keeps the format this time).

-- 

Grüsse / regards, 
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization

original mail
---

# HG changeset patch
# User Hollis Blanchard <[EMAIL PROTECTED]>
# Date 1200434310 21600
# Node ID 7fa5947a2da8c0c7424ebdcfaebcae624d6cf015
# Parent  ee0c227fe3f6632f4b1b5fde3f7e05c8ea0a4378

Signed-off-by: Hollis Blanchard <[EMAIL PROTECTED]>
Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>

---
2 files changed, 7 insertions(+)
arch/x86/kvm/Kconfig |    5 +++++
virt/kvm/kvm_main.c  |    2 ++


diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -33,9 +33,13 @@ config KVM

  If unsure, say N.

+config KVM_HAS_PIO
+   bool
+
 config KVM_INTEL
tristate "KVM for Intel processors support"
depends on KVM
+   select KVM_HAS_PIO
---help---
  Provides support for KVM on Intel processors equipped with the VT
  extensions.
@@ -43,6 +47,7 @@ config KVM_AMD
 config KVM_AMD
tristate "KVM for AMD processors support"
depends on KVM
+   select KVM_HAS_PIO
---help---
  Provides support for KVM on AMD processors equipped with the AMD-V
  (SVM) extensions.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -677,8 +677,10 @@ static int kvm_vcpu_fault(struct vm_area

if (vmf->pgoff == 0)
page = virt_to_page(vcpu->run);
+#ifdef CONFIG_KVM_HAS_PIO
else if (vmf->pgoff == KVM_PIO_PAGE_OFFSET)
page = virt_to_page(vcpu->arch.pio_data);
+#endif /* CONFIG_KVM_HAS_PIO */
else
return VM_FAULT_SIGBUS;
get_page(page);




Re: [kvm-devel] [PATCH 2/3] kvmclock - the host part.

2008-01-16 Thread Avi Kivity

Gerd Hoffmann wrote:
>   Hi,
>
>   
>> We want to avoid updating wall clock all the time.  As far as I 
>> understand, wall clock is just a base which doesn't change.
>> 
>
> Yep, it is.  Got that wrong first in xenner, with the result that guest
> time ran at double speed ;)
>
>   
>>> +/* xen binary-compatible interfaces. See xen headers for details */
>>> +struct xen_vcpu_time_info {
>>> +   uint32_t version;
>>> +   uint32_t pad0;
>>> +   uint64_t tsc_timestamp;
>>> +   uint64_t system_time;
>>> +   uint32_t tsc_to_system_mul;
>>> +   int8_t   tsc_shift;
>>> +   int8_t   pad1[3];
>>> +};
>>>   
>
>   
>>> +struct xen_vcpu_info {
>>> +   uint8_t  pad[32];
>>> +   struct xen_vcpu_time_info time;
>>> +};
>>>   
>>>   
>> Please drop xen_vcpu_info...
>> 
>
> Oh, yeah.  No point in assembling the whole xen shared info page.  Just
> xen_vcpu_time_info is enough; it will work just fine for xenner.
>
>   

We should also not use the xen_ namespace, that can only cause conflicts.

-- 
error compiling committee.c: too many arguments to function



[kvm-devel] [PATCH] add more regs to kvm_show_regs for powerpc

2008-01-16 Thread Christian Ehrhardt

Subject: [PATCH] add more regs to kvm_show_regs for powerpc 
From: Christian Ehrhardt <[EMAIL PROTECTED]>

This adds some registers useful for guest debugging to the powerpc code for
kvm_show_regs in libkvm.

Signed-off-by: Christian Ehrhardt <[EMAIL PROTECTED]>

 libkvm-powerpc.c |    4 ++++
 1 files changed, 4 insertions(+)

diff --git a/libkvm/libkvm-powerpc.c b/libkvm/libkvm-powerpc.c
--- a/libkvm/libkvm-powerpc.c
+++ b/libkvm/libkvm-powerpc.c
@@ -67,6 +67,10 @@ void kvm_show_regs(kvm_context_t kvm, in
if (kvm_get_regs(kvm, vcpu, &regs))
return;

+   fprintf(stderr,"guest vcpu #%d\n", vcpu);
+   fprintf(stderr,"pc:   %08x msr:  %08x\n", regs.pc, regs.msr);
+   fprintf(stderr,"lr:   %08x ctr:  %08x\n", regs.lr, regs.ctr);
+   fprintf(stderr,"srr0: %08x srr1: %08x\n", regs.srr0, regs.srr1);
for (i=0; i<32; i+=4)
{
fprintf(stderr, "gpr%02d: %08x %08x %08x %08x\n", i,


Re: [kvm-devel] Hacking QEMU/KVM to use unused graphics adapters

2008-01-16 Thread Anthony de Almeida Lopes

I just read the ChangeLogs from kvm-47 to kvm-59 but I didn't notice 
anything about PCI pass-through or any VGA work. I'm curious how 
things are going and what method was selected to accomplish this 
functionality.
 - Tony

Dor Laor wrote:
> It's still out-of-tree.
> Not for long :)
>
> Anthony de Almeida Lopes wrote:
>> Muli Ben-Yehuda wrote:
>>  
>>> On Thu, Oct 11, 2007 at 10:40:47AM +0200, Laurent Vivier wrote:
>>>
>>>  
>>>>> There is work in progress for pci pass through capability. Besides
>>>>> PCI it also required to have pv dma or 1-1 mapping between the
>>>>> guest and the host.  Both will be released in the following
>>>>> month. NIC pass through works but I'm not sure about the features
>>>>> required from VGA pass through.  Dor.
>>>>>
>>>> Perhaps if we use host IOMMU we don't need pv DMA ?
>>>>
>>> Indeed, an IOMMU can provide the 1-1 mapping Dor mentioned above (or
>>> you can have both PV DMA and an IOMMU).
>>>
>>>  
>>>> How do you say to host to not manage a PCI devices and let the guest
>>>> managing it ?
>>>>
>>> If the host driver is modular, it might be enough to just not load (or
>>> unload) it.
>>>
>>> Cheers,
>>> Muli
>>>   
>>
>> Thank you for your responses. I was curious, Dor, where could I take 
>> a look at this code?
>> I checked both sets of recent git logs and nothing popped out at me 
>> as being related. Is it still out-of-tree?
>>
>> Thanks again,
>>  - Tony
>>
>>
>>   
>
>



Re: [kvm-devel] [RFC] fix VMX TSC synchronicity

2008-01-16 Thread Andi Kleen

On Wed, Jan 16, 2008 at 10:46:11AM +0200, Avi Kivity wrote:
> [fixing gmane emails, urgfhsz]
> 
> Andi Kleen wrote:
> >Avi Kivity  writes:
> >
> >  
> >>Thanks; that's reassuring to know that it will work (at least on Intel).
> >>
> >
> >Actually there are modern Intel systems which still have unstable TSCs;
> >e.g. IBM Summit multi node systems and some others. So you should
> >still handle that case.
> >  
> 
> I really don't see any way we could.  If the guest assumes tscs are 
> synchronous, and they really are not, there's nothing we can do.

Linux checks a couple of things: e.g. if there are no deep C states
and if there are no clustered nodes in the APIC etc.

It might be reasonable to check the clock source of the kernel
and if it's not TSC force one of these in the emulated firmware
environment

> You might taskset guests into a single node on such systems, which is a
> good idea anyway.

Ah pushing the problem to the user. An easy, but typically wrong, solution.

-Andi


Re: [kvm-devel] [RFC] fix VMX TSC synchronicity

2008-01-16 Thread Andi Kleen

On Wed, Jan 16, 2008 at 03:38:45PM +0200, Avi Kivity wrote:
> Andi Kleen wrote:
> >Linux checks a couple of things: e.g. if there are no deep C states
> >and if there are no clustered nodes in the APIC etc.
> >
> >It might be reasonable to check the clock source of the kernel
> >and if it's not TSC force one of these in the emulated firmware
> >environment
> >
> >  
> 
> The problems are with older guests which assume the tsc is okay.  Newer 
> guests check the tsc and conclude that it isn't usable.

If the guest would get it wrong when running natively on the host,
I guess it would be reasonable to require an option that forces the
TSC off. Disabling the TSC bit unfortunately won't work for 64-bit
guests, but it probably will for most 32-bit guests.

But non-broken guests can only do that if the guest has the same
visibility into the firmware state as the host. For the easy cases Linux
will check it anyway because of the standard TSC synchronicity check, but
there are cases where the TSCs only drift apart slowly over a longer time.

[I finally fixed the clocksource watchdog now to catch this case, but
it will be only in .25]

I think it would be better to fake at least some of the usual
firmware cues for bad TSC if the host does not use it.

> 
> >>You might taskset guests into a single node on such systems, which is a
> >>good idea anyway.
> >>
> >
> >Ah pushing the problem to the user. An easy, but typically wrong, solution.
> >  
> 
> If you have other suggestions I'll be happy to hear them.  I don't like 
> this either.

Check if host is using TSC source and if not force a clustered
APIC mode (only works for 64bit unfortunately) or fake a C3 state
in ACPI and on AMD clear the synchronous TSC bit.
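
As a rough userspace illustration of the first step (the real check would
presumably live in the host kernel or in KVM itself), the active host
clocksource can be read from sysfs, normally at
/sys/devices/system/clocksource/clocksource0/current_clocksource.  The
helper name below is hypothetical, and the path is passed in as a
parameter so that the function can be exercised against any file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the file at 'path' names "tsc" as the
 * current clocksource, 0 if it names something else, and -1 if the file
 * cannot be read.
 */
static int demo_clocksource_is_tsc(const char *path)
{
    char name[64];
    FILE *f;

    assert(path != NULL);

    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(name, sizeof(name), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* Strip the trailing newline that sysfs appends. */
    name[strcspn(name, "\n")] = '\0';
    return strcmp(name, "tsc") == 0;
}
```

A result of 0 would be the cue to present the guest with one of the
"bad TSC" hints listed above.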

-Andi


Re: [kvm-devel] [RFC] fix VMX TSC synchronicity

2008-01-16 Thread Avi Kivity

Andi Kleen wrote:
> Check if host is using TSC source and if not force a clustered
> APIC mode (only works for 64bit unfortunately) or fake a C3 state
> in ACPI and on AMD clear the synchronous TSC bit.
>   

Yes, I got similar suggestions from Thomas.  But it looks like older 
guests will need a boot option.

-- 
error compiling committee.c: too many arguments to function


Re: [kvm-devel] [Qemu-devel] Re: [RFC][PATCH] Modify loop device to be able to manage partitions of the image disk

2008-01-16 Thread Anthony Liguori

Laurent Vivier wrote:
> On Tuesday, 15 January 2008 at 23:54 +, Daniel P. Berrange wrote:
>   
>> On Wed, Jan 16, 2008 at 12:40:06AM +0100, Laurent Vivier wrote:
>> 
>>> On Tuesday, 15 January 2008 at 18:27 +, Daniel P. Berrange wrote:
>>>   
>>>> On Tue, Jan 15, 2008 at 07:22:53PM +0100, Laurent Vivier wrote:
>>>>
>>>>> As it should be useful to be able to mount partition from a 
>>>>> disk image, (and as I need a break in my bug hunting) I've 
>>>>> modified the loop driver to mount raw disk image.
>>>>>
>>>>> To not break original loop device, as we have to change minor 
>>>>> numbers to manage partitions, a new parameter is added to the module:
>>>>>
>>>> I don't see the point in modifying the loop device driver when you
>>>> can already access the partitions with existing device mapper
>>>> functionality & tools.
>>>>
>>> There are two reasons:
>>>
>>> 1- I didn't know kpartx (thank you for the tip)
>>>
>>> but using loop device, you will be able to use all partition tables
>>> known by the kernel (acorn,  atari,  efi,  karma,  mac, osf, sun,
>>> ultrix, amiga, ibm, ldm, msdos, sgi, sysv68), whereas kpartx can use
>>> only partition tables it knows (bsd, dasd, dos, mac, sun, efi, sun,
>>> unixware).
>>>   
>> This is an argument for extending kpartx to cope with the other
>> partition tables :-)  I have 50/50 split between VMs using files
>> 
>
> Good try... but IMHO, I think it is better to let the kernel decode the
> partition table...
>
>   
>> vs VMs using LVM volumes - the loop driver patches only help you
>> access partitions within a file based image, whereas kpartx can
>> access the partitions within any block device, so can support 
>> files (via existing loop device) & LVM vols & nested partitions.
>> 
>
> I think you're wrong (but you seem to know the subject better than me,
> so ...): you should be able to use the modified loop device on the
> logical volume to decode partition table.
>
>   
>>> 2- I'd like to mount qcow2 or others disk image formats, so perhaps it's
>>> easier to modify loop device driver (but perhaps you know another magic
>>> tool ?)
>>>   
>> There has been some work in this area wrt to Xen - the DM-Userspace project
>> had some working code providing a device mapper target calling out to a 
>> userspace daemon to handle non-raw file formats like qcow. I don't
>> know what the state of it is now wrt to upstream kernel / device-mapper,
>> or even whether it is more than just 'proof of concept', but the project
>> page is here with some info:
>>
>>   http://wiki.xensource.com/xenwiki/DmUserspace

FWIW, I still think a userspace block device is the Right Way to support 
this sort of thing.  dm-userspace turned out to be difficult as device 
mapper has some rather strict requirements about alignment that some 
formats (like qcow) cannot satisfy.

The loop driver is a terrible base to start from as it does not preserve 
data integrity.

Regards,

Anthony Liguori

>> It seems a very good idea, but what I don't like:
>> - it seems very complex (like IBM guys like ;-) )
>> - it is one and a half year old
>>
>> To be honest, if something good already exists, I take it...
>>
>> Laurent
>> 



Re: [kvm-devel] [Qemu-devel] Re: [RFC][PATCH] Modify loop device to be able to manage partitions of the image disk

2008-01-16 Thread Laurent Vivier

On Wednesday, 16 January 2008 at 08:57 -0600, Anthony Liguori wrote:
> > On Tuesday, 15 January 2008 at 23:54 +, Daniel P. Berrange wrote:
[...]
> >>> 2- I'd like to mount qcow2 or others disk image formats, so perhaps it's
> >>> easier to modify loop device driver (but perhaps you know another magic
> >>> tool ?)
> >>>   
> >> There has been some work in this area wrt to Xen - the DM-Userspace project
> >> had some working code providing a device mapper target calling out to a 
> >> userspace daemon to handle non-raw file formats like qcow. I don't
> >> know what the state of it is now wrt to upstream kernel / device-mapper,
> >> or even whether it is more than just 'proof of concept', but the project
> >> page is here with some info:
> >>
> >>   http://wiki.xensource.com/xenwiki/DmUserspace
> 
> FWIW, I still think a userspace block device is the Right Way to support 

I agree with you, it was my first idea too, but it introduces complexity
to manage communications between the kernel part of the driver and the
userspace daemon: I don't like complexity.

> these sort of things.  dm-userspace turned out to be difficult as device 
> mapper has some rather strict requirements about alignment that some 
> formats (like qcow) cannot satisfy.
> 
> The loop driver is a terrible base to start from as it does not preserve 
> data integrity.
[...]

But everyone already uses loop as it is currently, so why not add
support for more disk image formats?
Why do you say it doesn't preserve data integrity?

Regards,
Laurent
-- 
- [EMAIL PROTECTED]  --
  "Perfection is achieved not when there is nothing left to
add, but when there is nothing left to take away." Saint Exupéry



[kvm-devel] Wiki downtime

2008-01-16 Thread Avi Kivity

Due to Qumranet relocating to new premises, the kvm wiki will be down 
tomorrow for at least a few hours.

-- 
error compiling committee.c: too many arguments to function



Re: [kvm-devel] [PATCH 2 of 2] Use CONFIG_PREEMPT_NOTIFIERS around struct preempt_notifier

2008-01-16 Thread Hollis Blanchard

On Wed, 2008-01-16 at 10:08 +0200, Avi Kivity wrote:
> Hollis Blanchard wrote:
> > # HG changeset patch
> > # User Hollis Blanchard <[EMAIL PROTECTED]>
> > # Date 1200434370 21600
> > # Node ID 9878c9cec5f831ff5e9b97539aabc5fa3d934501
> > # Parent  931a81e1002110be0e8bf5b335bf199d43534c2c
> > This allows kvm_host.h to be #included even when struct preempt_notifier is
> > undefined.
> 
> Don't you actually need preempt notifiers?  They are useful if you have 
> state that is only needed from userspace, but is expensive to switch.  
> For x86, this is the syscall msrs (which define the syscall entry 
> point), the fpu (which is not used in the kernel), and a few other bits 
> (which I'm too lazy to look up and are esoteric anyway).

Yes, I do. However, if you #include kvm_host.h *without*
CONFIG_VIRTUALIZATION=y, CONFIG_PREEMPT_NOTIFIERS is not set and the
structure is undefined.

It is Linux policy to be able to unconditionally include headers, and
indeed I already hit this problem when I added that #include to
arch/powerpc/kernel/asm-offsets.c.
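
A minimal sketch of the pattern, assuming a header that must compile
whether or not the option is set.  Everything named demo_* is
hypothetical, and the one-member struct preempt_notifier here is only a
stand-in for the real kernel type.

```c
#include <assert.h>
#include <stddef.h>

/* Uncomment to model a kernel built with preempt notifiers enabled. */
/* #define CONFIG_PREEMPT_NOTIFIERS 1 */

#ifdef CONFIG_PREEMPT_NOTIFIERS
/* Stand-in for the real type, which only exists when the option is set. */
struct preempt_notifier {
    void *ops;
};
#endif

/* Hypothetical vcpu structure that stays includable in both configs. */
struct demo_vcpu {
    int vcpu_id;
#ifdef CONFIG_PREEMPT_NOTIFIERS
    struct preempt_notifier preempt_notifier;
#endif
    int cpu;
};

/* Code that ignores the optional member builds unchanged either way. */
static int demo_vcpu_id(const struct demo_vcpu *vcpu)
{
    assert(vcpu != NULL);
    return vcpu->vcpu_id;
}
```

A file such as asm-offsets.c then only sees the members that exist in its
configuration, and the #include itself never fails.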

-- 
Hollis Blanchard
IBM Linux Technology Center


[kvm-devel] [PATCH 0/2] kvm clock - xen compatible by accident

2008-01-16 Thread Glauber de Oliveira Costa

I think I misunderstood what you guys wanted to achieve with "xen
compatible", but now I get it. It's something that's kvm-specific, 
but happens to be able to communicate with xen guests, provided they 
do a kvm-aware initialization.

So, here are the two patches for it, using two msrs and non-xen data structures.

Userspace is the same, so I'm only sending these ones.




[kvm-devel] [PATCH 1/2] kvmclock - the host part.

2008-01-16 Thread Glauber de Oliveira Costa

This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writing to it periodically.

The area is binary compatible with xen, as we use the same shadow_info 
structure.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
---
 arch/x86/kvm/x86.c |   98 +++-
 include/asm-x86/kvm_host.h |6 +++
 include/asm-x86/kvm_para.h |   24 +++
 include/linux/kvm.h|1 +
 4 files changed, 128 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8a90403..fd69aa1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -19,6 +19,7 @@
 #include "irq.h"
 #include "mmu.h"
 
+#include 
 #include 
 #include 
 #include 
@@ -412,7 +413,7 @@ static u32 msrs_to_save[] = {
 #ifdef CONFIG_X86_64
MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
 #endif
-   MSR_IA32_TIME_STAMP_COUNTER,
+   MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME,
 };
 
 static unsigned num_msrs_to_save;
@@ -467,6 +468,73 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
return kvm_set_msr(vcpu, index, *data);
 }
 
+static void kvm_write_wall_clock(struct kvm_vcpu *v, gpa_t wall_clock)
+{
+   int version = 1;
+   struct wall_clock wc;
+   unsigned long flags;
+   struct timespec wc_ts;
+
+   local_irq_save(flags);
+   kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
+ &v->arch.hv_clock.tsc_timestamp);
+   wc_ts = current_kernel_time();
+   local_irq_restore(flags);
+
+   down_write(&current->mm->mmap_sem);
+   kvm_write_guest(v->kvm, wall_clock, &version, sizeof(version));
+   up_write(&current->mm->mmap_sem);
+
+   /* With all the info we got, fill in the values */
+   wc.wc_sec = wc_ts.tv_sec;
+   wc.wc_nsec = wc_ts.tv_nsec;
+   wc.wc_version = ++version;
+
+   down_write(&current->mm->mmap_sem);
+   kvm_write_guest(v->kvm, wall_clock, &wc, sizeof(wc));
+   up_write(&current->mm->mmap_sem);
+}
+static void kvm_write_guest_time(struct kvm_vcpu *v)
+{
+   struct timespec ts;
+   unsigned long flags;
+   struct kvm_vcpu_arch *vcpu = &v->arch;
+   void *shared_kaddr;
+
+   if ((!vcpu->time_page))
+   return;
+
+   /* Keep irq disabled to prevent changes to the clock */
+   local_irq_save(flags);
+   kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
+ &vcpu->hv_clock.tsc_timestamp);
+   ktime_get_ts(&ts);
+   local_irq_restore(flags);
+
+   /* With all the info we got, fill in the values */
+
+   vcpu->hv_clock.system_time = ts.tv_nsec +
+(NSEC_PER_SEC * (u64)ts.tv_sec);
+   /*
+* The interface expects us to write an even number signaling that the
+* update is finished. Since the guest won't see the intermediate states,
+* we just write "2" at the end
+*/
+   vcpu->hv_clock.version = 2;
+
+   preempt_disable();
+
+   shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0);
+
+   memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock,
+   sizeof(vcpu->hv_clock));
+
+   kunmap_atomic(shared_kaddr, KM_USER0);
+   preempt_enable();
+
+   mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT);
+}
+
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
@@ -494,6 +562,25 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
case MSR_IA32_MISC_ENABLE:
vcpu->arch.ia32_misc_enable_msr = data;
break;
+   case MSR_KVM_WALL_CLOCK:
+   vcpu->arch.wall_clock = data;
+   kvm_write_wall_clock(vcpu, data);
+   break;
+   case MSR_KVM_SYSTEM_TIME: {
+   vcpu->arch.time = data & PAGE_MASK;
+   vcpu->arch.time_offset = data & ~PAGE_MASK;
+
+   vcpu->arch.hv_clock.tsc_to_system_mul =
+   clocksource_khz2mult(tsc_khz, 22);
+   vcpu->arch.hv_clock.tsc_shift = 22;
+
+   down_write(&current->mm->mmap_sem);
+   vcpu->arch.time_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
+   up_write(&current->mm->mmap_sem);
+   if (is_error_page(vcpu->arch.time_page))
+   vcpu->arch.time_page = NULL;
+   break;
+   }
default:
pr_unimpl(vcpu, "unhandled wrmsr: 0x%x data %llx\n", msr, data);
return 1;
@@ -553,6 +640,13 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
data = vcpu->arch.shadow_efer;
break;
 #endif
+   case MSR_KVM_WALL_CLOCK:
+   data = vcpu->arch.wall_clock;
+   break;
+   case MSR_KVM_SYSTEM_TIME:
+   data = vcpu->arch.time;
+

[kvm-devel] [PATCH 2/2] kvmclock implementation, the guest part.

2008-01-16 Thread Glauber de Oliveira Costa

This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.

We do use it, however, for fine-grained adjustment.

Other than that, time comes from the host.
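
To illustrate the fine-grained adjustment mentioned above: a TSC delta can be
converted to nanoseconds with a fixed-point multiply and shift. The following
is a minimal user-space sketch, not code from the patch. khz_to_mult() and
tsc_delta_to_ns() are hypothetical names, the multiplier computation is
modeled on the kernel's clocksource_khz2mult(), and the 2 GHz TSC frequency
used in the note below is an assumption.

```c
#include <stdint.h>

#define KVM_SCALE 22 /* same shift the patch uses */

/*
 * Hypothetical helper modeled on clocksource_khz2mult():
 * returns mult such that ns = (cycles * mult) >> shift.
 */
static uint32_t khz_to_mult(uint32_t khz, unsigned int shift)
{
    uint64_t tmp = (uint64_t)1000000 << shift;

    tmp += khz / 2; /* round to nearest */
    return (uint32_t)(tmp / khz);
}

/*
 * Scale a TSC delta to nanoseconds. Like the expression in the patch,
 * the 64-bit product can overflow for very large deltas.
 */
static uint64_t tsc_delta_to_ns(uint64_t delta, uint32_t mult)
{
    return (delta * mult) >> KVM_SCALE;
}
```

With an assumed 2 GHz TSC (khz = 2000000), khz_to_mult() yields 2097152, and
a delta of 2000 cycles converts to 1000 ns.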

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
---
 arch/x86/Kconfig|   10 +++
 arch/x86/kernel/Makefile_32 |1 +
 arch/x86/kernel/kvmclock.c  |  154 +++
 arch/x86/kernel/setup_32.c  |5 ++
 4 files changed, 170 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/kvmclock.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index ab2df55..968315e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -350,6 +350,16 @@ config VMI
  at the moment), by linking the kernel to a GPL-ed ROM module
  provided by the hypervisor.
 
+config KVM_CLOCK
+   bool "KVM paravirtualized clock"
+   select PARAVIRT
+   help
+ Turning on this option will allow you to run a paravirtualized clock
+ when running over the KVM hypervisor. Instead of relying on a PIT
+ (or probably other) emulation by the underlying device model, the host
+ provides the guest with timing infrastructure, as time of day, and
+ timer expiration.
+
 source "arch/x86/lguest/Kconfig"
 
 endif
diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32
index a7bc93c..f6332b6 100644
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -44,6 +44,7 @@ obj-$(CONFIG_K8_NB)   += k8.o
 obj-$(CONFIG_MGEODE_LX)+= geode_32.o mfgpt_32.o
 
 obj-$(CONFIG_VMI)  += vmi_32.o vmiclock_32.o
+obj-$(CONFIG_KVM_CLOCK)+= kvmclock.o
 obj-$(CONFIG_PARAVIRT) += paravirt_32.o
 obj-y  += pcspeaker.o
 
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
new file mode 100644
index 000..56be828
--- /dev/null
+++ b/arch/x86/kernel/kvmclock.c
@@ -0,0 +1,154 @@
+/*  KVM paravirtual clock driver. A clocksource implementation
+Copyright (C) 2008 Glauber de Oliveira Costa, Red Hat Inc.
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+*/
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define KVM_SCALE 22
+
+static int kvmclock = 1;
+
+static int parse_no_kvmclock(char *arg)
+{
+   kvmclock = 0;
+   return 0;
+}
+early_param("no-kvmclock", parse_no_kvmclock);
+
+struct shared_info shared_info __attribute__((__aligned__(PAGE_SIZE)));
+
+/* The hypervisor will put information about time periodically here */
+static struct kvm_vcpu_time_info hv_clock[NR_CPUS];
+#define get_clock(cpu, field) hv_clock[cpu].field
+
+static inline u64 kvm_get_delta(u64 last_tsc)
+{
+   int cpu = smp_processor_id();
+   u64 delta = native_read_tsc() - last_tsc;
+   return (delta * get_clock(cpu, tsc_to_system_mul)) >> KVM_SCALE;
+}
+
+static struct wall_clock wall_clock;
+/*
+ * The wallclock is the time of day when we booted. Since then, some time may
+ * have elapsed since the hypervisor wrote the data. So we try to account for
+ * that. Even if the tsc is not accurate, it gives us a more accurate timing
+ * than not adjusting at all
+ */
+unsigned long kvm_get_wallclock(void)
+{
+   u32 wc_sec, wc_nsec;
+   u64 delta, last_tsc;
+   struct timespec ts;
+   int version, nsec, cpu = smp_processor_id();
+
+   native_write_msr(MSR_KVM_WALL_CLOCK, __pa(&wall_clock));
+   do {
+   version = wall_clock.wc_version;
+   rmb();
+   wc_sec = wall_clock.wc_sec;
+   wc_nsec = wall_clock.wc_nsec;
+   last_tsc = get_clock(cpu, tsc_timestamp);
+   rmb();
+   } while ((wall_clock.wc_version != version) || (version & 1));
+
+   delta = kvm_get_delta(last_tsc);
+   delta += wc_nsec;
+   nsec = do_div(delta, NSEC_PER_SEC);
+   set_normalized_timespec(&ts, wc_sec + delta, nsec);
+   /*
+* Of all mechanisms of time adjustment I've tested, this one
+* was the champion!
+*/
+   return ts.tv_sec + 1;
+}
+
+int kvm_set_wallclock(unsigned long now)
+{
+   return 0;
+}
+
+/*
+ * This is our read_clock function. The host puts an tsc

Re: [kvm-devel] [PATCH] fix cpuid function 4

2008-01-16 Thread Alexander Graf

Dan Kenigsberg wrote:
> On Tue, Jan 15, 2008 at 08:57:45AM +0100, Alexander Graf wrote:
>   
>> Dan Kenigsberg wrote:
>> 
>>> On Mon, Jan 14, 2008 at 02:49:31PM +0100, Alexander Graf wrote:
>>>   
>>>   
 Hi,

 Currently CPUID function 4 is broken. This function's values rely on the
 value of ECX.
 To solve the issue cleanly, there is already a new API for cpuid
 settings, which is not used yet.
 Using the current interface, the function 4 can be easily passed
 through, by giving multiple function 4 outputs and increasing the
 index-identifier on the fly. This does not break compatibility.

 This fix is really important for Mac OS X, as it requires cache
 information. Please also see my previous patches for Mac OS X (or rather
 core duo target) compatibility.

 Regards,

 Alex

>>>   
>>>   
 diff --git a/kernel/x86.c b/kernel/x86.c
 index b55c177..73312e9 100644
 --- a/kernel/x86.c
 +++ b/kernel/x86.c
 @@ -783,7 +783,7 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu 
 *vcpu,
struct kvm_cpuid *cpuid,
struct kvm_cpuid_entry __user *entries)
  {
 -  int r, i;
 +  int r, i, n = 0;
struct kvm_cpuid_entry *cpuid_entries;

r = -E2BIG;
 @@ -803,8 +803,17 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu 
 *vcpu,
vcpu->arch.cpuid_entries[i].ebx = cpuid_entries[i].ebx;
vcpu->arch.cpuid_entries[i].ecx = cpuid_entries[i].ecx;
vcpu->arch.cpuid_entries[i].edx = cpuid_entries[i].edx;
 -  vcpu->arch.cpuid_entries[i].index = 0;
 -  vcpu->arch.cpuid_entries[i].flags = 0;
 +switch(vcpu->arch.cpuid_entries[i].function) {
 +case 4:
 +vcpu->arch.cpuid_entries[i].index = n;
 +vcpu->arch.cpuid_entries[i].flags = 
 KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
 +n++;
 +break;
 +default:
 +vcpu->arch.cpuid_entries[i].index = 0;
 +vcpu->arch.cpuid_entries[i].flags = 0;
 +break;
 +}

>>> I will not mention the whitespace damage here :-). Instead, I'd ask you
>>>   
>>>   
>> Oh well, after having been into qemu source, I just got used to use
>> spaces instead of tabs ;-).
>>
>> 
>>> to review, comment, and even try, the patch that I posted here not long
>>> ago, exposing all safe host cpuid functions to guests.
>>>   
>>>   
>> Sure.
>> Basically your patch targets at a completely different use case than
>> mine though. You want to expose the host features on the virtual CPU,
>> whereas my goal is to have a virtual Core Duo/Solo CPU, even if your
>> host CPU is actually an SVM capable one.
>>
>> So my CoreDuo CPU definition still fails to populate a proper CPUID
>> function 4. With the -cpu host option, Linux works (as it's bright
>> enough to know that some values are just plain wrong), but Darwin
>> crashes. I am not exactly sure why it is, but I guess it's due to the
>> function 4 values exposing a 2-core CPU, which kvm simply doesn't emulate.
>> 
>
> What I wanted to say is that the fact that the usermode support is not
> used, is not IMHO a good-enough reason to change the kernel:
> kvm_vcpu_ioctl_set_cpuid() was meant to be a stupid function, to be used
> only with old usermode. I hate to teach it the true complex logic of Intel's
> CPUID.
>
>   

The funny part is, you don't have to. Every complex case I know of so far is
simply repetitive. If userspace just sends x cpuid values and the
kernel takes x, where's the problem?

Of course a fully descriptive approach is way better, but I see
no real reason not to use a stupid interface.
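
The "userspace sends x values, kernel takes x" idea can be sketched as
follows. This is a simplified user-space model, not the kvm code:
struct cpuid_entry_model, FLAG_SIGNIFICANT_INDEX, assign_indices() and
find_entry() are hypothetical names. Only the indexing rule (consecutive
function 4 entries get index 0, 1, 2, ...) is taken from the patch quoted
above.

```c
#include <stddef.h>
#include <stdint.h>

#define FLAG_SIGNIFICANT_INDEX 1u /* hypothetical stand-in for the kvm flag */

struct cpuid_entry_model {
    uint32_t function;
    uint32_t index;
    uint32_t flags;
    uint32_t eax, ebx, ecx, edx;
};

/* Number function 4 entries in arrival order; all others get index 0. */
static void assign_indices(struct cpuid_entry_model *e, size_t n)
{
    uint32_t next = 0;

    for (size_t i = 0; i < n; i++) {
        if (e[i].function == 4) {
            e[i].index = next++;
            e[i].flags = FLAG_SIGNIFICANT_INDEX;
        } else {
            e[i].index = 0;
            e[i].flags = 0;
        }
    }
}

/* Look up an entry; ECX only matters when the entry is index-significant. */
static const struct cpuid_entry_model *
find_entry(const struct cpuid_entry_model *e, size_t n,
           uint32_t function, uint32_t ecx)
{
    for (size_t i = 0; i < n; i++) {
        if (e[i].function != function)
            continue;
        if ((e[i].flags & FLAG_SIGNIFICANT_INDEX) && e[i].index != ecx)
            continue;
        return &e[i];
    }
    return NULL; /* no matching leaf/subleaf */
}
```

In this model a guest query for function 4 with ECX = 1 returns the second
function 4 entry that userspace supplied, while other functions ignore ECX.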

> What I would like to see is something that uses the cpuid2 API, and not
> circumvent it... For this to happen, I need a deep review of my code.
>   

I have to admit that I am really bad at reviewing, so don't expect
anything glorious from me.

> How about the (untested) attached kvm-cpuid.patch, on top of the attached
> cpuid-user patch?
>   

Is there any real difference between this kvm-cpuid.patch and the one I
sent?

What I was really wondering about is, why do you fetch the cpuid
information about the host from the kernel module? CPUID does not get
intercepted and can be easily triggered from userspace.
All the fancy processing of capabilities could be done in userspace as
well (except for features that'd need to be implemented in the kernel,
like MTRR) and this might even reduce the code, and in any case the
amount of code changes in the kernel.

Furthermore most people probably don't even want their host cpu to be
the default one. It renders migration near

Re: [kvm-devel] [PATCH] mmu notifiers #v2

2008-01-16 Thread Rik van Riel

On Sun, 13 Jan 2008 17:24:18 +0100
Andrea Arcangeli <[EMAIL PROTECTED]> wrote:

> In my basic initial patch I only track the tlb flushes which should be
> the minimum required to have a nice linux-VM controlled swapping
> behavior of the KVM gphysical memory. 

I have a vaguely related question on KVM swapping.

Do page accesses inside KVM guests get propagated to the host
OS, so Linux can choose a reasonable page for eviction, or is
the pageout of KVM guest pages essentially random?

-- 
All rights reversed.


Re: [kvm-devel] [PATCH] mmu notifiers #v2

2008-01-16 Thread Izik Eidus

Rik van Riel wrote:
> On Sun, 13 Jan 2008 17:24:18 +0100
> Andrea Arcangeli <[EMAIL PROTECTED]> wrote:
>
>   
>> In my basic initial patch I only track the tlb flushes which should be
>> the minimum required to have a nice linux-VM controlled swapping
>> behavior of the KVM gphysical memory. 
>> 
>
> I have a vaguely related question on KVM swapping.
>
> Do page accesses inside KVM guests get propagated to the host
> OS, so Linux can choose a reasonable page for eviction, or is
> the pageout of KVM guest pages essentially random?
>
>   
Right now, when kvm removes a pte from the shadow cache, it marks the page
that the pte pointed to as accessed.
That was a good solution until the mmu notifiers, because the pages were
pinned and couldn't be swapped to disk.
So now it will have to do something more sophisticated, or at least mark
as accessed every page pointed to by a pte
that gets inserted into the shadow cache.



Re: [kvm-devel] mmu notifiers

2008-01-16 Thread Christoph Lameter

On Wed, 16 Jan 2008, Avi Kivity wrote:

> Yes, that was poorly phrased.  The page and its page struct may be reallocated
> for other purposes.

It's better to say "reused". Otherwise one may think that an allocation of 
page structs is needed.


Re: [kvm-devel] [PATCH 1/2] kvmclock - the host part.

2008-01-16 Thread Anthony Liguori

Glauber de Oliveira Costa wrote:
> This is the host part of kvm clocksource implementation. As it does
> not include clockevents, it is a fairly simple implementation. We
> only have to register a per-vcpu area, and start writing to it periodically.
>
> The area is binary compatible with xen, as we use the same shadow_info 
> structure.
>
> Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
> ---
>  arch/x86/kvm/x86.c |   98 
> +++-
>  include/asm-x86/kvm_host.h |6 +++
>  include/asm-x86/kvm_para.h |   24 +++
>  include/linux/kvm.h|1 +
>  4 files changed, 128 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 8a90403..fd69aa1 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -19,6 +19,7 @@
>  #include "irq.h"
>  #include "mmu.h"
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -412,7 +413,7 @@ static u32 msrs_to_save[] = {
>  #ifdef CONFIG_X86_64
>   MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
>  #endif
> - MSR_IA32_TIME_STAMP_COUNTER,
> + MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME,
>  };
>  
>  static unsigned num_msrs_to_save;
> @@ -467,6 +468,73 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
>   return kvm_set_msr(vcpu, index, *data);
>  }
>  
> +static void kvm_write_wall_clock(struct kvm_vcpu *v, gpa_t wall_clock)
> +{
> + int version = 1;
> + struct wall_clock wc;
> + unsigned long flags;
> + struct timespec wc_ts;
> +
> + local_irq_save(flags);
> + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
> +   &v->arch.hv_clock.tsc_timestamp);
> + wc_ts = current_kernel_time();
> + local_irq_restore(flags);
> +
> + down_write(&current->mm->mmap_sem);
> + kvm_write_guest(v->kvm, wall_clock, &version, sizeof(version));
> + up_write(&current->mm->mmap_sem);
> +
> + /* With all the info we got, fill in the values */
> + wc.wc_sec = wc_ts.tv_sec;
> + wc.wc_nsec = wc_ts.tv_nsec;
> + wc.wc_version = ++version;
> +
> + down_write(&current->mm->mmap_sem);
> + kvm_write_guest(v->kvm, wall_clock, &wc, sizeof(wc));
> + up_write(&current->mm->mmap_sem);
>   

Can we get a comment explaining why we only write the version field and 
then immediately increment the version and write the whole struct?  It's 
not at all obvious to me why the first write is needed.
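
One plausible reading, offered here as an assumption rather than the patch
author's stated rationale, is a seqlock-style convention: an odd version
marks an update in progress and an even version marks a complete one, so a
reader can detect a torn snapshot and retry. A minimal single-threaded
user-space model of that convention follows. struct wall_clock_model,
publish() and try_read() are hypothetical names, and the memory barriers a
real guest/host pair would need (the guest code uses rmb()) are omitted.

```c
#include <stdint.h>

struct wall_clock_model {
    uint32_t version;
    uint32_t sec;
    uint32_t nsec;
};

/* Writer: odd version first, then the payload, then an even version. */
static void publish(struct wall_clock_model *shared,
                    uint32_t sec, uint32_t nsec)
{
    shared->version = 1; /* odd: update in progress */
    shared->sec = sec;
    shared->nsec = nsec;
    shared->version = 2; /* even: update complete */
}

/*
 * Reader: one attempt. Returns 1 and fills the outputs if the snapshot
 * is consistent, 0 if the caller should retry.
 */
static int try_read(const struct wall_clock_model *shared,
                    uint32_t *sec, uint32_t *nsec)
{
    uint32_t v = shared->version;

    if (v & 1)
        return 0; /* writer is mid-update */
    *sec = shared->sec;
    *nsec = shared->nsec;
    return shared->version == v; /* unchanged version: snapshot is valid */
}
```

A reader would call try_read() in a loop until it returns 1, which mirrors
the do/while loop in the guest-side kvm_get_wallclock().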

> +}
> +static void kvm_write_guest_time(struct kvm_vcpu *v)
> +{
> + struct timespec ts;
> + unsigned long flags;
> + struct kvm_vcpu_arch *vcpu = &v->arch;
> + void *shared_kaddr;
> +
> + if ((!vcpu->time_page))
> + return;
> +
> + /* Keep irq disabled to prevent changes to the clock */
> + local_irq_save(flags);
> + kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
> +   &vcpu->hv_clock.tsc_timestamp);
> + ktime_get_ts(&ts);
> + local_irq_restore(flags);
> +
> + /* With all the info we got, fill in the values */
> +
> + vcpu->hv_clock.system_time = ts.tv_nsec +
> +  (NSEC_PER_SEC * (u64)ts.tv_sec);
> + /*
> +  * The interface expects us to write an even number signaling that the
> +  * update is finished. Since the guest won't see the intermediate states,
> +  * we just write "2" at the end
> +  */
> + vcpu->hv_clock.version = 2;
> +
> + preempt_disable();
> +
> + shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0);
> +
> + memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock,
> + sizeof(vcpu->hv_clock));
> +
> + kunmap_atomic(shared_kaddr, KM_USER0);
>   

Instead of doing a kmap/memcpy, I think it would be better to store the 
GPA of the time page and do a kvm_write_guest().  Otherwise, you're 
pinning this page in memory.

Regards,

Anthony Liguori

> + preempt_enable();
> +
> + mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT);
> +}
> +
>  
>  int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>  {
> @@ -494,6 +562,25 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>   case MSR_IA32_MISC_ENABLE:
>   vcpu->arch.ia32_misc_enable_msr = data;
>   break;
> + case MSR_KVM_WALL_CLOCK:
> + vcpu->arch.wall_clock = data;
> + kvm_write_wall_clock(vcpu, data);
> + break;
> + case MSR_KVM_SYSTEM_TIME: {
> + vcpu->arch.time = data & PAGE_MASK;
> + vcpu->arch.time_offset = data & ~PAGE_MASK;
> +
> + vcpu->arch.hv_clock.tsc_to_system_mul =
> + clocksource_khz2mult(tsc_khz, 22);
> + vcpu->arch.hv_clock.tsc_shift = 22;
> +
> + down_write(&current->mm->mmap_sem);
> + vcpu->arch.time_page = gfn_to_page(vcpu->kvm, data >> PAGE_SHIFT);
> + up_write(&current->mm->mmap_sem);
> + if (is_error_page(vcpu

Re: [kvm-devel] [PATCH] fix cpuid function 4

2008-01-16 Thread Dan Kenigsberg

On Wed, Jan 16, 2008 at 06:34:08PM +0100, Alexander Graf wrote:
> Dan Kenigsberg wrote:
> > On Tue, Jan 15, 2008 at 08:57:45AM +0100, Alexander Graf wrote:
> >   
> >> Dan Kenigsberg wrote:
> >> 
> >>> On Mon, Jan 14, 2008 at 02:49:31PM +0100, Alexander Graf wrote:
> >>>   
> >>>   
>  Hi,
> 
>  Currently CPUID function 4 is broken. This function's values rely on the
>  value of ECX.
>  To solve the issue cleanly, there is already a new API for cpuid
>  settings, which is not used yet.
>  Using the current interface, the function 4 can be easily passed
>  through, by giving multiple function 4 outputs and increasing the
>  index-identifier on the fly. This does not break compatibility.
> 
>  This fix is really important for Mac OS X, as it requires cache
>  information. Please also see my previous patches for Mac OS X (or rather
>  core duo target) compatibility.
> 
>  Regards,
> 
>  Alex
>  
>  
> >>>   
> >>>   
>  diff --git a/kernel/x86.c b/kernel/x86.c
>  index b55c177..73312e9 100644
>  --- a/kernel/x86.c
>  +++ b/kernel/x86.c
>  @@ -783,7 +783,7 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu 
>  *vcpu,
>   struct kvm_cpuid *cpuid,
>   struct kvm_cpuid_entry __user 
>  *entries)
>   {
>  -int r, i;
>  +int r, i, n = 0;
>   struct kvm_cpuid_entry *cpuid_entries;
>   
>   r = -E2BIG;
>  @@ -803,8 +803,17 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu 
>  *vcpu,
>   vcpu->arch.cpuid_entries[i].ebx = cpuid_entries[i].ebx;
>   vcpu->arch.cpuid_entries[i].ecx = cpuid_entries[i].ecx;
>   vcpu->arch.cpuid_entries[i].edx = cpuid_entries[i].edx;
>  -vcpu->arch.cpuid_entries[i].index = 0;
>  -vcpu->arch.cpuid_entries[i].flags = 0;
>  +switch(vcpu->arch.cpuid_entries[i].function) {
>  +case 4:
>  +vcpu->arch.cpuid_entries[i].index = n;
>  +vcpu->arch.cpuid_entries[i].flags = 
>  KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
>  +n++;
>  +break;
>  +default:
>  +vcpu->arch.cpuid_entries[i].index = 0;
>  +vcpu->arch.cpuid_entries[i].flags = 0;
>  +break;
>  +}
>  
>  
> >>> I will not mention the whitespace damage here :-). Instead, I'd ask you
> >>>   
> >>>   
> >> Oh well, after having been into qemu source, I just got used to use
> >> spaces instead of tabs ;-).
> >>
> >> 
> >>> to review, comment, and even try, the patch that I posted here not long
> >>> ago, exposing all safe host cpuid functions to guests.
> >>>   
> >>>   
> >> Sure.
> >> Basically your patch targets at a completely different use case than
> >> mine though. You want to expose the host features on the virtual CPU,
> >> whereas my goal is to have a virtual Core Duo/Solo CPU, even if your
> >> host CPU is actually an SVM capable one.
> >>
> >> So my CoreDuo CPU definition still fails to populate a proper CPUID
> >> function 4. With the -cpu host option, Linux works (as it's bright
> >> enough to know that some values are just plain wrong), but Darwin
> >> crashes. I am not exactly sure why it is, but I guess it's due to the
> >> function 4 values exposing a 2-core CPU, which kvm simply doesn't emulate.
> >> 
> >
> > What I wanted to say is that the fact that the usermode support is not
> > used, is not IMHO a good-enough reason to change the kernel:
> > kvm_vcpu_ioctl_set_cpuid() was meant to be a stupid function, to be used
> > only with old usermode. I hate to teach it the true complex logic of Intel's
> > CPUID.
> >
> >   
> 
> The funny part is, you don't have to. Every complex case I know of so far is
> simply repetitive. If userspace just sends x cpuid values and the
> kernel takes x, where's the problem?
> 
> Of course a fully descriptive approach is way better, but I see
> no real reason not to use a stupid interface.

The only reason is that a smarter interface exists, and I want it to be used,
not hacked around.

> > What I would like to see is something that uses the cpuid2 API, and not
> > circumvent it... For this to happen, I need a deep review of my code.
> >   
> 
> I have to admit that I am really bad at reviewing, so don't expect
> anything glorious from me.

Anything beyond silence would be glorious.

> > How about the (untested) attached kvm-cpuid.patch, on top of the attached
> > cpuid-user patch?
> >   
> 
> Is there any real difference between this kvm-cpuid.patch and the one I
> sent?

There is none. I just wanted to recruit you

[kvm-devel] RFC: qemu acpi hotplug

2008-01-16 Thread Glauber de Oliveira Costa

When it's closer to inclusion, I'll also post it to the main qemu list. 
But right now, I'm just aiming at a first round of review of this draft.


The attached patch is enough to make the DEVICE_CHECK and EJECT 
notifications reach the kernel. As far as I understand, some userspace black 
magic that keeps changing its scroll is needed to really put the 
processors logically off/on after the notify (the acpi code itself will 
never call cpu_up/down).


Just let me know what you think.
From c45432c0cec8241dbcd6ed6cf38c953b17a6f826 Mon Sep 17 00:00:00 2001
From: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Date: Wed, 16 Jan 2008 18:43:11 -0200
Subject: [PATCH] RFC: qemu cpu hotplug

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
---
 bios/acpi-dsdt.dsl|   87 +-
 bios/rombios32.c  |2 +
 qemu/hw/acpi.c|  125 +
 qemu/hw/pc.c  |4 +-
 qemu/monitor.c|9 
 qemu/pc-bios/bios.bin |  Bin
 6 files changed, 214 insertions(+), 13 deletions(-)

diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl
index df255ce..497b866 100755
--- a/bios/acpi-dsdt.dsl
+++ b/bios/acpi-dsdt.dsl
@@ -27,18 +27,35 @@ DefinitionBlock (
 {
 Scope (_PR)
 {
-Processor (CPU0, 0x00, 0xb010, 0x06) {}
-Processor (CPU1, 0x01, 0xb010, 0x06) {}
-Processor (CPU2, 0x02, 0xb010, 0x06) {}
-Processor (CPU3, 0x03, 0xb010, 0x06) {}
-Processor (CPU4, 0x04, 0xb010, 0x06) {}
-Processor (CPU5, 0x05, 0xb010, 0x06) {}
-Processor (CPU6, 0x06, 0xb010, 0x06) {}
-Processor (CPU7, 0x07, 0xb010, 0x06) {}
-Processor (CPU8, 0x08, 0xb010, 0x06) {}
-Processor (CPU9, 0x09, 0xb010, 0x06) {}
-Processor (CPUA, 0x0a, 0xb010, 0x06) {}
-Processor (CPUB, 0x0b, 0xb010, 0x06) {}
+	OperationRegion( PRO, SystemIO, 0xaf00, 0x02)
+	Field (PRO, ByteAcc, NoLock, WriteAsZeros)
+	{
+		PR0U, 1,
+		PR1U, 1,
+		PR2U, 1,
+		PR3U, 1,
+		PR4U, 1,
+		PADU, 3,
+
+		PR0D, 1,
+		PR1D, 1,
+		PR2D, 1,
+		PR3D, 1,
+		PR4D, 1,
+		PADD, 3,
+	}
+Processor (CPU0, 0x00, 0xb010, 0x06) { Method (_STA) { Return(0x1)} }
+Processor (CPU1, 0x01, 0xb010, 0x06) { Method (_STA) { Return(0x1)} }
+Processor (CPU2, 0x02, 0xb010, 0x06) { Method (_STA) { Return(0x1)} }
+Processor (CPU3, 0x03, 0xb010, 0x06) { Method (_STA) { Return(0x1)} }
+Processor (CPU4, 0x04, 0xb010, 0x06) { Method (_STA) { Return(0x1)} } 
+Processor (CPU5, 0x05, 0xb010, 0x06) { Method (_STA) { Return(0x1)} }
+Processor (CPU6, 0x06, 0xb010, 0x06) { Method (_STA) { Return(0x1)} }
+Processor (CPU7, 0x07, 0xb010, 0x06) { Method (_STA) { Return(0x1)} }
+Processor (CPU8, 0x08, 0xb010, 0x06) { Method (_STA) { Return(0x1)} }
+Processor (CPU9, 0x09, 0xb010, 0x06) { Method (_STA) { Return(0x1)} }
+Processor (CPUA, 0x0a, 0xb010, 0x06) { Method (_STA) { Return(0x1)} }
+Processor (CPUB, 0x0b, 0xb010, 0x06) { Method (_STA) { Return(0x1)} }
 Processor (CPUC, 0x0c, 0xb010, 0x06) {}
 Processor (CPUD, 0x0d, 0xb010, 0x06) {}
 Processor (CPUE, 0x0e, 0xb010, 0x06) {}
@@ -559,6 +576,51 @@ DefinitionBlock (
 }
 }
 }
+Scope(\_GPE)
+{
+   Method(_L00) {  
+	  Return(0x01)
+   }
+   Method(_L01) { 
+ If (\_PR.PR1U) {
+	  Notify(\_PR.CPU1, 1)
+	 }
+	 If (\_PR.PR1D){
+	  Notify(\_PR.CPU1, 3) 
+	 }
+	 Return(0x01)
+   }
+
+   Method(_L02) { 
+ If (\_PR.PR2U) {
+	  Notify(\_PR.CPU2, 1)
+	 }
+	 If (\_PR.PR2D){
+	  Notify(\_PR.CPU2, 3) 
+	 }
+	 Return(0x01)
+   }
+
+   Method(_L03) { 
+ If (\_PR.PR3U) {
+	  Notify(\_PR.CPU3, 1)
+	 }
+	 If (\_PR.PR3D){
+	  Notify(\_PR.CPU3, 3) 
+	 }
+	 Return(0x01)
+   }
+
+   Method(_L04) { 
+ If (\_PR.PR4U) {
+	  Notify(\_PR.CPU4, 1)
+	 }
+	 IF (\_PR.PR4D) {
+	  Notify(\_PR.CPU4, 3) 
+	 }
+	 Return(0x01)
+   }
+}
 
 /* S5 = power off state */
 Name (_S5, Package (4) {
@@ -567,4 +629,5 @@ DefinitionBlock (
 0x00, // reserved
 0x00, // reserved
 })
+
 }
diff --git a/bios/rombios32.c b/bios/rombios32.c
index 967c119..4580462 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -1329,6 +1329,8 @@ void acpi_bios_init(void)
 fadt->pm_tmr_len = 4;
 fadt->plvl2_lat = cpu_to_le16(0x0fff); // C2 state not supported
 fadt->plvl3_lat = cpu_to_le16(0x0fff); // C3 state not supported
+fadt->gpe0_blk = cpu_to_le32(0xafe0);
+fadt->gpe0_blk_len = 4;
 /* WBINVD + PROC_C1 + SLP_BUTTON + FIX_RTC */
 fadt->flags = cpu_to_le32((1 << 0) | (1 << 2) | (1 << 5) | (1 << 6));
 acpi_build_table_header((struct acpi_table_header *)fadt, "FACP", 
diff --git a/qemu/hw/acpi.c b/qemu/hw/acpi.c
index b97b37d..6e1af9e 100644
--- a/qemu/hw/acpi.c
+++ b/qemu/hw/

Re: [kvm-devel] Hacking QEMU/KVM to use unused graphics adapters

2008-01-16 Thread Dor Laor


On Wed, 2008-01-16 at 13:05 +0100, Anthony de Almeida Lopes wrote:
> I just read the ChangeLogs from kvm-47 to kvm-59 but I didn't notice 
> anything that about PCI pass-through or any VGA work. I'm curious how 
> things are going and what method was selected to accomplish this 
> functionality?
>  - Tony
> 

First, we only had plans for plain PCI pass-through, not for VGA devices,
which have some possible BIOS unification issues.
Second, we have it working (all the code was also sent to the list), but
there's quite an effort to be done in order to merge it into mainline.
We do want it to happen, but we have some other issues on our plate.
Nevertheless, if anyone wants to push it on,
we'll be happy to assist.
Regards,
Dor

> Dor Laor wrote:
> > It's still out-of -tree.
> > Not for long :)
> >
> > Anthony de Almeida Lopes wrote:
> >> Muli Ben-Yehuda wrote:
> >>  
> >>> On Thu, Oct 11, 2007 at 10:40:47AM +0200, Laurent Vivier wrote:
> >>>
> >>>  
> > There is work in progress for pci pass through capability. Besides
> > PCI it also required to have pv dma or 1-1 mapping between the
> > guest and the host.  Both will be released in the following
> > month. NIC pass through works but I'm not sure about the features
> > required from VGA pass through.  Dor.
> >   
>  Perhaps if we use host IOMMU we don't need pv DMA ?
>    
> >>> Indeed, an IOMMU can provide the 1-1 mapping Dor mentioned above (or
> >>> you can have both PV DMA and an IOMMU).
> >>>
> >>>  
>  How do you say to host to not manage a PCI devices and let the guest
>  managing it ?
>    
> >>> If the host driver is modular, it might be enough to just not load (or
> >>> unload) it.
> >>>
> >>> Cheers,
> >>> Muli
> >>>   
> >>
> >> Thank you for your responses. I was curious, Dor, where could I take 
> >> a look at this code?
> >> I checked both sets of recent git logs and nothing popped out at me 
> >> as being related. Is it still out-of-tree?
> >>
> >> Thanks again,
> >>  - Tony
> >>
> >>
> >>   
> >
> >
> 



Re: [kvm-devel] [PATCH] KVM virtio balloon driver

2008-01-16 Thread Dor Laor


On Tue, 2008-01-15 at 17:01 -0200, Marcelo Tosatti wrote:

> OK, that's simpler. How about this:
> 

It sure is simpler :)

> [PATCH] Virtio balloon driver
> 
> Add a balloon driver for KVM, host<->guest communication is performed
> via virtio.
> 
> Signed-off-by: Marcelo Tosatti <[EMAIL PROTECTED]>

[snip]

> +static void free_page_array(struct balloon_buf *buf, unsigned int npages)
> +{
> + struct page *page;
> + u32 *pfn = (u32 *)&buf->data;
> + int i;
> +
> + for (i=0; i<npages; i++) {
> + page = pfn_to_page(*pfn);
> + list_del_init(&page->lru);
> +  __free_page(page);
> + pfn++;

In add_page_array below you update balloon_size & totalram_pages;
the same update is needed here too.

> + }
> +}
> +
> +static void add_page_array(struct virtballoon *v, struct balloon_buf *buf,
> +unsigned int npages)
> +{
> + struct page *page;
> + u32 *pfn = (u32 *)&buf->data;
> + int i;
> +
> + for (i=0; i<npages; i++) {
> + page = pfn_to_page(*pfn);
> + v->balloon_size++;
> + totalram_pages--;
> + list_add(&page->lru, &v->balloon_plist);
> + pfn++;
> + }
> +}
> +
> +static void inflate_done(struct virtballoon *v, struct balloon_buf *buf,
> +  unsigned int npages)
> +{
> + u8 status = buf->hdr.status;
> +
> + /* inflate OK */
> + if (!status)
> + add_page_array(v, buf, npages);
> + else 
> + free_page_array(buf, npages);
> +}
> +
> +static void deflate_done(struct virtballoon *v, struct balloon_buf *buf,
> +  unsigned int npages)
> +{
> + u8 status = buf->hdr.status;
> +
> + /* deflate OK, return pages to the system */
> + if (!status) {
> + free_page_array(buf, npages);

If the update is done above, there is no need for it below.

> + totalram_pages += npages;
> + v->balloon_size -= npages;
> + }
> + return;
> +}
> +

[snip]

> +static void balloon_config_changed(struct virtio_device *vdev)
> +{
> + struct virtballoon *v = vdev->priv;
> + u32 target_nrpages;
> +

A check should be added to see if rmmod_wait is active.
If it is, then don't allow the monitor to inflate the balloon, since
we would like to remove the module.
Best regards,
Dor

> + __virtio_config_val(v->vdev, 0, &target_nrpages);
> + atomic_set(&v->target_nrpages, target_nrpages);
> + wake_up(&v->balloon_wait);
> + dprintk(&vdev->dev, "%s\n", __func__);
> +}
> +
> +static struct virtio_driver virtio_balloon = {
> + .driver.name =  KBUILD_MODNAME,
> + .driver.owner = THIS_MODULE,
> + .id_table = id_table,
> + .probe =balloon_probe,
> + .remove =   __devexit_p(balloon_remove),
> + .config_changed = balloon_config_changed,
> +};
> +
> +module_param(kvm_balloon_debug, int, 0);
> +
> +static int __init kvm_balloon_init(void)
> +{
> + return register_virtio_driver(&virtio_balloon);
> +}
> +
> +static void __exit kvm_balloon_exit(void)
> +{
> + struct virtballoon *v;
> + 
> + list_for_each_entry(v, &balloon_devices, list) { 
> + while (v->balloon_size) {
> + DEFINE_WAIT(wait);
> +
> + atomic_add(v->balloon_size, &v->target_nrpages);
> + wake_up(&v->balloon_wait);
> + prepare_to_wait(&v->rmmod_wait, &wait,
> + TASK_INTERRUPTIBLE);
> + schedule_timeout(HZ*10);
> + finish_wait(&v->rmmod_wait, &wait);
> + }
> + }
> +
> + unregister_virtio_driver(&virtio_balloon);
> +}
> +
> +module_init(kvm_balloon_init);
> +module_exit(kvm_balloon_exit);
> Index: linux-2.6-nv/drivers/virtio/virtio_pci.c
> ===
> --- linux-2.6-nv.orig/drivers/virtio/virtio_pci.c
> +++ linux-2.6-nv/drivers/virtio/virtio_pci.c
> @@ -67,6 +67,7 @@ static struct pci_device_id virtio_pci_i
>   { 0x1AF4, 0x1000, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, /* Dummy entry */
>   { 0x1AF4, 0x1001, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, /* Dummy entry */
>   { 0x1AF4, 0x1002, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, /* Dummy entry */
> + { 0x1AF4, 0x1003, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, /* Balloon */
>   { 0 },
>  };
>  
> Index: linux-2.6-nv/include/linux/virtio_balloon.h
> ===
> --- /dev/null
> +++ linux-2.6-nv/include/linux/virtio_balloon.h
> @@ -0,0 +1,20 @@
> +#ifndef _LINUX_VIRTIO_BALLOON_H
> +#define _LINUX_VIRTIO_BALLOON_H
> +#include 
> +
> +#define VIRTIO_ID_BALLOON 3
> +
> +#define CMD_BALLOON_INFLATE 0x1
> +#define CMD_BALLOON_DEFLATE 0x2
> +
> +struct virtio_balloon_hdr {
> + __u8 cmd;
> + __u8 status;
> +};
> +
> +struct virtio_balloon_config
> +{
> + __u32 target_nrpages;
> +};
> +
> +#endif /* _LINUX_VIRTIO_BALLOON_H */
> _

Re: [kvm-devel] [PATCH 1/2] kvmclock - the host part.

2008-01-16 Thread Glauber de Oliveira Costa

Anthony Liguori wrote:
> Glauber de Oliveira Costa wrote:
>> This is the host part of kvm clocksource implementation. As it does
>> not include clockevents, it is a fairly simple implementation. We
>> only have to register a per-vcpu area, and start writing to it 
>> periodically.
>>
>> The area is binary compatible with xen, as we use the same shadow_info 
>> structure.
>>
>> Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
>> ---
>>  arch/x86/kvm/x86.c |   98 
>> +++-
>>  include/asm-x86/kvm_host.h |6 +++
>>  include/asm-x86/kvm_para.h |   24 +++
>>  include/linux/kvm.h|1 +
>>  4 files changed, 128 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 8a90403..fd69aa1 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -19,6 +19,7 @@
>>  #include "irq.h"
>>  #include "mmu.h"
>>  
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -412,7 +413,7 @@ static u32 msrs_to_save[] = {
>>  #ifdef CONFIG_X86_64
>>  MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
>>  #endif
>> -MSR_IA32_TIME_STAMP_COUNTER,
>> +MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME,
>>  };
>>  
>>  static unsigned num_msrs_to_save;
>> @@ -467,6 +468,73 @@ static int do_set_msr(struct kvm_vcpu *vcpu, 
>> unsigned index, u64 *data)
>>  return kvm_set_msr(vcpu, index, *data);
>>  }
>>  
>> +static void kvm_write_wall_clock(struct kvm_vcpu *v, gpa_t wall_clock)
>> +{
>> +int version = 1;
>> +struct wall_clock wc;
>> +unsigned long flags;
>> +struct timespec wc_ts;
>> +
>> +local_irq_save(flags);
>> +kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
>> +  &v->arch.hv_clock.tsc_timestamp);
>> +wc_ts = current_kernel_time();
>> +local_irq_restore(flags);
>> +
>> +down_write(&current->mm->mmap_sem);
>> +kvm_write_guest(v->kvm, wall_clock, &version, sizeof(version));
>> +up_write(&current->mm->mmap_sem);
>> +
>> +/* With all the info we got, fill in the values */
>> +wc.wc_sec = wc_ts.tv_sec;
>> +wc.wc_nsec = wc_ts.tv_nsec;
>> +wc.wc_version = ++version;
>> +
>> +down_write(&current->mm->mmap_sem);
>> +kvm_write_guest(v->kvm, wall_clock, &wc, sizeof(wc));
>> +up_write(&current->mm->mmap_sem);
>>   
> 
> Can we get a comment explaining why we only write the version field and 
> then immediately increment the version and write the whole struct?  It's 
> not at all obvious why the first write is needed to me.
If the comment is the only pending thing, can we add the comment in a 
later commit?

>> +}
>> +static void kvm_write_guest_time(struct kvm_vcpu *v)
>> +{
>> +struct timespec ts;
>> +unsigned long flags;
>> +struct kvm_vcpu_arch *vcpu = &v->arch;
>> +void *shared_kaddr;
>> +
>> +if ((!vcpu->time_page))
>> +return;
>> +
>> +/* Keep irq disabled to prevent changes to the clock */
>> +local_irq_save(flags);
>> +kvm_get_msr(v, MSR_IA32_TIME_STAMP_COUNTER,
>> +  &vcpu->hv_clock.tsc_timestamp);
>> +ktime_get_ts(&ts);
>> +local_irq_restore(flags);
>> +
>> +/* With all the info we got, fill in the values */
>> +
>> +vcpu->hv_clock.system_time = ts.tv_nsec +
>> + (NSEC_PER_SEC * (u64)ts.tv_sec);
>> +/*
>> + * The interface expects us to write an even number signaling 
>> that the
>> + * update is finished. Since the guest won't see the intermediate 
>> states,
>> + * we just write "2" at the end
>> + */
>> +vcpu->hv_clock.version = 2;
>> +
>> +preempt_disable();
>> +
>> +shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0);
>> +
>> +memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock,
>> +sizeof(vcpu->hv_clock));
>> +
>> +kunmap_atomic(shared_kaddr, KM_USER0);
>>   
> 
> Instead of doing a kmap/memcpy, I think it would be better to store the 
> GPA of the time page and do a kvm_write_guest().  Otherwise, you're 
> pinning this page in memory.
This function ends up being called from various contexts, some with the 
mmap_sem held and some without. kvm_write_guest needs it held, so it 
would turn the code into spaghetti. Using the kmap was Avi's 
suggestion to get around it, which I personally liked: we only grab the 
semaphore when the MSR is registered.



[kvm-devel] [PATCH] KVM simplified virtio balloon driver

2008-01-16 Thread Rusty Russell

After discussions with Anthony Liguori, it seems that the virtio
balloon can be made even simpler.  Here's my attempt.

Since the balloon requires Guest cooperation anyway, there seems
little reason to force it to tell the Host when it wants to reuse a
page.  It can simply fault it in.

Moreover, the target is best expressed in balloon size, since there is
no portable way of getting the total RAM in the system.  The host can
do the math.

Tested with a (fairly hacky) lguest patch.

Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
---
 drivers/virtio/Kconfig  |   10 +
 drivers/virtio/Makefile |1
 drivers/virtio/virtio_balloon.c |  230 
 include/linux/virtio_balloon.h  |   13 ++
 4 files changed, 254 insertions(+)

diff -r c4762959de25 drivers/virtio/Kconfig
--- a/drivers/virtio/Kconfig    Thu Jan 17 10:31:37 2008 +1100
+++ b/drivers/virtio/Kconfig    Thu Jan 17 12:28:23 2008 +1100
@@ -23,3 +23,13 @@ config VIRTIO_PCI
 
  If unsure, say M.
 
+config VIRTIO_BALLOON
+   tristate "Virtio balloon driver (EXPERIMENTAL)"
+   select VIRTIO
+   select VIRTIO_RING
+   ---help---
+This driver supports increasing and decreasing the amount 
+of memory within a KVM guest.
+
+If unsure, say M.
+
diff -r c4762959de25 drivers/virtio/Makefile
--- a/drivers/virtio/Makefile   Thu Jan 17 10:31:37 2008 +1100
+++ b/drivers/virtio/Makefile   Thu Jan 17 12:28:23 2008 +1100
@@ -1,3 +1,4 @@ obj-$(CONFIG_VIRTIO) += virtio.o
 obj-$(CONFIG_VIRTIO) += virtio.o
 obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
+obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
diff -r c4762959de25 drivers/virtio/virtio_balloon.c
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/drivers/virtio/virtio_balloon.c   Thu Jan 17 12:28:23 2008 +1100
@@ -0,0 +1,235 @@
+/* Virtio balloon implementation, inspired by Dor Loar and Marcelo
+ * Tosatti's implementations.
+ *
+ *  Copyright 2008 Rusty Russell IBM Corporation
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#define DEBUG
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct virtio_balloon
+{
+   struct virtio_device *vdev;
+   struct virtqueue *vq;
+
+   /* Where the ballooning thread waits for config to change. */
+   wait_queue_head_t config_change;
+
+   /* The thread servicing the balloon. */
+   struct task_struct *thread;
+
+   /* Waiting for host to ack the pages we released. */
+   struct completion acked;
+
+   /* The pages we've told the Host we're not using. */
+   unsigned int num_pages;
+   struct list_head pages;
+
+   /* The array of pfns we tell the Host about. */
+   unsigned int num_pfns;
+   u32 pfns[256];
+};
+
+static struct virtio_device_id id_table[] = {
+   { VIRTIO_ID_BALLOON, VIRTIO_DEV_ANY_ID},
+   { 0 },
+};
+
+static void leak_balloon(struct virtio_balloon *vb, unsigned int num)
+{
+   struct page *page;
+   unsigned int i;
+
+   /* Simply free pages, and usage will fault them back in. */
+   for (i = 0; i < num; i++) {
+   page = list_first_entry(&vb->pages, struct page, lru);
+   list_del(&page->lru);
+   __free_page(page);
+   vb->num_pages--;
+   totalram_pages++;
+   }
+}
+
+static void balloon_ack(struct virtqueue *vq)
+{
+   struct virtio_balloon *vb;
+   unsigned int len;
+
+   vb = vq->vq_ops->get_buf(vq, &len);
+   if (vb)
+   complete(&vb->acked);
+}
+
+static void fill_balloon(struct virtio_balloon *vb, unsigned int num)
+{
+   struct scatterlist sg;
+
+   /* We can only do one array worth at a time. */
+   num = min(num, ARRAY_SIZE(vb->pfns));
+
+   for (vb->num_pfns = 0; vb->num_pfns < num; vb->num_pfns++) {
+   struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY);
+   if (!page) {
+   if (printk_ratelimit())
+   dev_printk(KERN_INFO, &vb->vdev->dev,
+  "Out of puff! Can't get %u pages\n",
+  num);
+   /* Sleep for at least 1/5 of a second before retry. */
+
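
The archived copy of the patch is cut off at this point. As a rough
illustration of the design described in the cover text (the host states a
target balloon size, the guest fills or leaks towards it), here is a small
user-space model. It is only a sketch: balloon_target(), balloon_step() and
struct balloon_model are hypothetical names, the batch limit simply mirrors
the pfns[256] array, and the assumption that filling decrements
totalram_pages is inferred from leak_balloon() doing the opposite.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_PAGES 256u	/* mirrors the pfns[256] array in the patch */

/* Hypothetical host-side math: how many pages the guest should give up. */
static unsigned int balloon_target(unsigned int guest_total_pages,
				   unsigned int desired_pages)
{
	if (desired_pages >= guest_total_pages)
		return 0;
	return guest_total_pages - desired_pages;
}

struct balloon_model {
	unsigned int num_pages;		/* pages currently in the balloon */
	unsigned int totalram_pages;	/* what "free" would report */
};

/* Move one step towards target; returns the signed change in balloon size. */
static int balloon_step(struct balloon_model *b, unsigned int target)
{
	unsigned int n;

	assert(b != NULL);
	if (b->num_pages < target) {
		/* Fill: at most one array worth of pages per step. */
		n = target - b->num_pages;
		if (n > BATCH_PAGES)
			n = BATCH_PAGES;
		if (n > b->totalram_pages)
			n = b->totalram_pages;
		b->num_pages += n;
		b->totalram_pages -= n;
		return (int)n;
	}
	/* Leak: simply hand the pages back, no host round trip needed. */
	n = b->num_pages - target;
	b->num_pages -= n;
	b->totalram_pages += n;
	return -(int)n;
}
```

With 1000 guest pages and a desired size of 600, the target is 400 pages,
reached in two fill steps (256, then 144). Lowering the target to 100 leaks
300 pages in a single step.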

Re: [kvm-devel] [PATCH] KVM simplified virtio balloon driver

2008-01-16 Thread Anthony Liguori

Rusty Russell wrote:
> After discussions with Anthony Liguori, it seems that the virtio
> balloon can be made even simpler.  Here's my attempt.
>
> Since the balloon requires Guest cooperation anyway, there seems
> little reason to force it to tell the Host when it wants to reuse a
> page.  It can simply fault it in.
>
> Moreover, the target is best expressed in balloon size, since there is
> no portable way of getting the total RAM in the system.  The host can
> do the math.
>
> Tested with a (fairly hacky) lguest patch.
>
> Signed-off-by: Rusty Russell <[EMAIL PROTECTED]>
> ---
>  drivers/virtio/Kconfig  |   10 +
>  drivers/virtio/Makefile |1
>  drivers/virtio/virtio_balloon.c |  230 
> 
>  include/linux/virtio_balloon.h  |   13 ++
>  4 files changed, 254 insertions(+)
>
> diff -r c4762959de25 drivers/virtio/Kconfig
> --- a/drivers/virtio/Kconfig  Thu Jan 17 10:31:37 2008 +1100
> +++ b/drivers/virtio/Kconfig  Thu Jan 17 12:28:23 2008 +1100
> @@ -23,3 +23,13 @@ config VIRTIO_PCI
>  
> If unsure, say M.
>  
> +config VIRTIO_BALLOON
> + tristate "Virtio balloon driver (EXPERIMENTAL)"
> + select VIRTIO
> + select VIRTIO_RING
> + ---help---
> +  This driver supports increasing and decreasing the amount 
> +  of memory within a KVM guest.
> +
> +  If unsure, say M.
> +
> diff -r c4762959de25 drivers/virtio/Makefile
> --- a/drivers/virtio/Makefile Thu Jan 17 10:31:37 2008 +1100
> +++ b/drivers/virtio/Makefile Thu Jan 17 12:28:23 2008 +1100
> @@ -1,3 +1,4 @@ obj-$(CONFIG_VIRTIO) += virtio.o
>  obj-$(CONFIG_VIRTIO) += virtio.o
>  obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
>  obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
> +obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> diff -r c4762959de25 drivers/virtio/virtio_balloon.c
> --- /dev/null Thu Jan 01 00:00:00 1970 +
> +++ b/drivers/virtio/virtio_balloon.c Thu Jan 17 12:28:23 2008 +1100
> @@ -0,0 +1,235 @@
> +/* Virtio balloon implementation, inspired by Dor Loar and Marcelo
> + * Tosatti's implementations.
> + *
> + *  Copyright 2008 Rusty Russell IBM Corporation
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, write to the Free Software
> + *  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  
> USA
> + */
> +#define DEBUG
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct virtio_balloon
> +{
> + struct virtio_device *vdev;
> + struct virtqueue *vq;
> +
> + /* Where the ballooning thread waits for config to change. */
> + wait_queue_head_t config_change;
> +
> + /* The thread servicing the balloon. */
> + struct task_struct *thread;
> +
> + /* Waiting for host to ack the pages we released. */
> + struct completion acked;
> +
> + /* The pages we've told the Host we're not using. */
> + unsigned int num_pages;
> + struct list_head pages;
> +
> + /* The array of pfns we tell the Host about. */
> + unsigned int num_pfns;
> + u32 pfns[256];
> +};
> +
> +static struct virtio_device_id id_table[] = {
> + { VIRTIO_ID_BALLOON, VIRTIO_DEV_ANY_ID},
>   

Could use a space after VIRTIO_DEV_ANY_ID

> + { 0 },
> +};
> +
> +static void leak_balloon(struct virtio_balloon *vb, unsigned int num)
> +{
> + struct page *page;
> + unsigned int i;
> +
> + /* Simply free pages, and usage will fault them back in. */
> + for (i = 0; i < num; i++) {
> + page = list_first_entry(&vb->pages, struct page, lru);
> + list_del(&page->lru);
> + __free_page(page);
> + vb->num_pages--;
> + totalram_pages++;
>   

Do we really want to modify totalram_pages in this driver?  The only 
other place that I see that modifies it is in mm/memory_hotplug and it 
also modifies other things (like num_physpages).  The cmm driver doesn't 
touch totalram_pages.

It would be very useful too to write vb->num_pages into the config space 
whenever it was updated.  This way, the host can easily keep track of 
where the guest is at in terms of ballooning.

Regards,

Anthony Liguori

> + }
> +}
> +
> +static void balloon_ack(struct virtqueue *vq)
> +{
> + struct virtio_balloon *vb;
> + unsigned int len;
> +
> + vb = vq->vq_ops->get_buf(vq, &len);
> + if (vb)
> + complete(&vb->acked);
> +}
> +
> +static void f

Re: [kvm-devel] [PATCH] KVM simplified virtio balloon driver

2008-01-16 Thread Rusty Russell

On Thursday 17 January 2008 13:14:58 Anthony Liguori wrote:
> Rusty Russell wrote:
> > +static struct virtio_device_id id_table[] = {
> > +   { VIRTIO_ID_BALLOON, VIRTIO_DEV_ANY_ID},
>
> Could use a space after VIRTIO_DEV_ANY_ID

Thanks, fixed.

> > +   __free_page(page);
> > +   vb->num_pages--;
> > +   totalram_pages++;
>
> Do we really want to modify totalram_pages in this driver?  The only
> other place that I see that modifies it is in mm/memory_hotplug and it
> also modifies other things (like num_physpages).  The cmm driver doesn't
> touch totalram_pages.

I don't think there's a standard here; they're all ad hoc (e.g. no locking).
Modifying totalram_pages has the nice effect of showing up in "free" in the 
guest.

We should probably not modify num_physpages, because some places seem to use 
it as an address space limit.  But we should probably fix all those 
networking size heuristics to use totalram_pages instead of num_physpages.

> It would be very useful too to write vb->num_pages into the config space
> whenever it was updated.  This way, the host can easily keep track of
> where the guest is at in terms of ballooning.

OTOH it's currently pretty obvious (and usually fatal) if the guest has 
trouble meeting the balloon requirements.  A serious host needs a way of 
detecting stress in the guest anyway, which this doesn't offer until it's too 
late...

Rusty.


Re: [kvm-devel] [PATCH] KVM simplified virtio balloon driver

2008-01-16 Thread Anthony Liguori

Rusty Russell wrote:
> On Thursday 17 January 2008 13:14:58 Anthony Liguori wrote:
>   
>> Rusty Russell wrote:
>> 
>>> +static struct virtio_device_id id_table[] = {
>>> +   { VIRTIO_ID_BALLOON, VIRTIO_DEV_ANY_ID},
>>>   
>> Could use a space after VIRTIO_DEV_ANY_ID
>> 
>
> Thanks, fixed.
>
>   
>>> +   __free_page(page);
>>> +   vb->num_pages--;
>>> +   totalram_pages++;
>>>   
>> Do we really want to modify totalram_pages in this driver?  The only
>> other place that I see that modifies it is in mm/memory_hotplug and it
>> also modifies other things (like num_physpages).  The cmm driver doesn't
>> touch totalram_pages.
>> 
>
> I don't think there's a standard here, they're all ad-hoc (eg. no locking)  
> Modifying totalram_pages has the nice effect of showing up in "free" in the 
> guest.
>
> We should probably not modify num_physpages, because some places seem to use 
> it as an address space limit.  But we should probably fix all those 
> networking size heuristics to use totalram_pages instead of num_physpages.
>
>   
>> It would be very useful too to write vb->num_pages into the config space
>> whenever it was updated.  This way, the host can easily keep track of
>> where the guest is at in terms of ballooning.
>> 
>
> OTOH it's currently pretty obvious (and usually fatal) if the guest has 
> trouble meeting the balloon requirements.  A serious host needs a way of 
> detecting stress in the guest anyway, which this doesn't offer until it's too 
> late...
>   

The question I'm interested in answering though is not if but when.  I 
would like to know when the guest has reached its target.

And while we do get the madvise call outs, it's possible that pages have 
been faulted in since then.

Regards,

Anthony Liguori

> Rusty.
>   



Re: [kvm-devel] [PATCH] KVM simplified virtio balloon driver

2008-01-16 Thread Rusty Russell

On Thursday 17 January 2008 15:01:46 Anthony Liguori wrote:
> Rusty Russell wrote:
> > OTOH it's currently pretty obvious (and usually fatal) if the guest has
> > trouble meeting the balloon requirements.  A serious host needs a way of
> > detecting stress in the guest anyway, which this doesn't offer until it's
> > too late...
>
> The question I'm interested in answering though is not if but when.  I
> would like to know when the guest has reached it's target.

I'm saying that it will be very quick in all but the "too much squeeze" case.

> And while we do get the madvise call outs, it's possible that pages have
> been faulted in since then.

But that's exactly what the balloon number *doesn't* tell you.  It can tell 
you that it's released pages back to be used by the OS, but not whether the 
OS has used them.

I think this number is good for debugging the balloon driver, but for anything 
else it's a false friend.

Rusty.
PS.  Please cut down mails when you reply.


[kvm-devel] [PATCH] Export three symbols out.

2008-01-16 Thread Zhang, Xiantao

Hi, Avi/Tony 
This patch exports three symbols for module use.  Comments,
please! :) 
Thanks
Xiantao

From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
Date: Thu, 17 Jan 2008 14:03:04 +0800
Subject: [PATCH] kvm: ia64 : Export some symbols out for module use.

Export empty_zero_page, ia64_sal_cache_flush, ia64_sal_freq_base
in this patch.
Signed-off-by: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
---
 arch/ia64/kernel/ia64_ksyms.c |3 +++
 arch/ia64/kernel/sal.c|   14 ++
 include/asm-ia64/sal.h|   14 +++---
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/arch/ia64/kernel/ia64_ksyms.c b/arch/ia64/kernel/ia64_ksyms.c
index c3b4412..43d227f 100644
--- a/arch/ia64/kernel/ia64_ksyms.c
+++ b/arch/ia64/kernel/ia64_ksyms.c
@@ -12,6 +12,9 @@ EXPORT_SYMBOL(memset);
 EXPORT_SYMBOL(memcpy);
 EXPORT_SYMBOL(strlen);
 
+#include
+EXPORT_SYMBOL(empty_zero_page);
+
 #include 
 EXPORT_SYMBOL(ip_fast_csum);   /* hand-coded assembly */
 EXPORT_SYMBOL(csum_ipv6_magic);
diff --git a/arch/ia64/kernel/sal.c b/arch/ia64/kernel/sal.c
index 27c2ef4..67c1d34 100644
--- a/arch/ia64/kernel/sal.c
+++ b/arch/ia64/kernel/sal.c
@@ -284,6 +284,7 @@ ia64_sal_cache_flush (u64 cache_type)
SAL_CALL(isrv, SAL_CACHE_FLUSH, cache_type, 0, 0, 0, 0, 0, 0);
return isrv.status;
 }
+EXPORT_SYMBOL(ia64_sal_cache_flush);
 
 void __init
 ia64_sal_init (struct ia64_sal_systab *systab)
@@ -372,3 +373,16 @@ ia64_sal_oemcall_reentrant(struct ia64_sal_retval *isrvp, u64 oemfunc,
return 0;
 }
 EXPORT_SYMBOL(ia64_sal_oemcall_reentrant);
+
+long
+ia64_sal_freq_base (unsigned long which, unsigned long *ticks_per_second,
+   unsigned long *drift_info)
+{
+   struct ia64_sal_retval isrv;
+
+   SAL_CALL(isrv, SAL_FREQ_BASE, which, 0, 0, 0, 0, 0, 0);
+   *ticks_per_second = isrv.v0;
+   *drift_info = isrv.v1;
+   return isrv.status;
+}
+EXPORT_SYMBOL(ia64_sal_freq_base);
diff --git a/include/asm-ia64/sal.h b/include/asm-ia64/sal.h
index 1f5412d..2251118 100644
--- a/include/asm-ia64/sal.h
+++ b/include/asm-ia64/sal.h
@@ -649,17 +649,6 @@ typedef struct err_rec {
  * Now define a couple of inline functions for improved type checking
  * and convenience.
  */
-static inline long
-ia64_sal_freq_base (unsigned long which, unsigned long *ticks_per_second,
-   unsigned long *drift_info)
-{
-   struct ia64_sal_retval isrv;
-
-   SAL_CALL(isrv, SAL_FREQ_BASE, which, 0, 0, 0, 0, 0, 0);
-   *ticks_per_second = isrv.v0;
-   *drift_info = isrv.v1;
-   return isrv.status;
-}
 
 extern s64 ia64_sal_cache_flush (u64 cache_type);
 extern void __init check_sal_cache_flush (void);
@@ -841,6 +830,9 @@ extern int ia64_sal_oemcall_nolock(struct ia64_sal_retval *, u64, u64, u64,
   u64, u64, u64, u64, u64);
 extern int ia64_sal_oemcall_reentrant(struct ia64_sal_retval *, u64, u64, u64,
   u64, u64, u64, u64, u64);
+extern long
+ia64_sal_freq_base (unsigned long which, unsigned long *ticks_per_second,
+   unsigned long *drift_info);
 #ifdef CONFIG_HOTPLUG_CPU
 /*
  * System Abstraction Layer Specification
-- 
1.5.2


0001-kvm-ia64-Export-some-symbols-out-for-module-use.patch
Description: 0001-kvm-ia64-Export-some-symbols-out-for-module-use.patch

Re: [kvm-devel] [PATCH] fix cpuid function 4

2008-01-16 Thread Alexander Graf


On Jan 16, 2008, at 9:12 PM, Dan Kenigsberg wrote:

> On Wed, Jan 16, 2008 at 06:34:08PM +0100, Alexander Graf wrote:
>> Dan Kenigsberg wrote:
>>> On Tue, Jan 15, 2008 at 08:57:45AM +0100, Alexander Graf wrote:
>>>
>>>> Dan Kenigsberg wrote:
>>>>
>>>>> On Mon, Jan 14, 2008 at 02:49:31PM +0100, Alexander Graf wrote:
>>>>>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Currently CPUID function 4 is broken. This function's values rely on the
>>>>>> value of ECX.
>>>>>> To solve the issue cleanly, there is already a new API for cpuid
>>>>>> settings, which is not used yet.
>>>>>> Using the current interface, the function 4 can be easily passed
>>>>>> through, by giving multiple function 4 outputs and increasing the
>>>>>> index-identifier on the fly. This does not break compatibility.
>>>>>>
>>>>>> This fix is really important for Mac OS X, as it requires cache
>>>>>> information. Please also see my previous patches for Mac OS X (or rather
>>>>>> core duo target) compatibility.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>> diff --git a/kernel/x86.c b/kernel/x86.c
>>>>>> index b55c177..73312e9 100644
>>>>>> --- a/kernel/x86.c
>>>>>> +++ b/kernel/x86.c
>>>>>> @@ -783,7 +783,7 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
>>>>>>  struct kvm_cpuid *cpuid,
>>>>>>  struct kvm_cpuid_entry __user *entries)
>>>>>> {
>>>>>> -int r, i;
>>>>>> +int r, i, n = 0;
>>>>>>  struct kvm_cpuid_entry *cpuid_entries;
>>>>>>
>>>>>>  r = -E2BIG;
>>>>>> @@ -803,8 +803,17 @@ static int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
>>>>>>  vcpu->arch.cpuid_entries[i].ebx = cpuid_entries[i].ebx;
>>>>>>  vcpu->arch.cpuid_entries[i].ecx = cpuid_entries[i].ecx;
>>>>>>  vcpu->arch.cpuid_entries[i].edx = cpuid_entries[i].edx;
>>>>>> -vcpu->arch.cpuid_entries[i].index = 0;
>>>>>> -vcpu->arch.cpuid_entries[i].flags = 0;
>>>>>> +switch(vcpu->arch.cpuid_entries[i].function) {
>>>>>> +case 4:
>>>>>> +vcpu->arch.cpuid_entries[i].index = n;
>>>>>> +vcpu->arch.cpuid_entries[i].flags = KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
>>>>>> +n++;
>>>>>> +break;
>>>>>> +default:
>>>>>> +vcpu->arch.cpuid_entries[i].index = 0;
>>>>>> +vcpu->arch.cpuid_entries[i].flags = 0;
>>>>>> +break;
>>>>>> +}
>>>>>>
>>>>>>
>>>>> I will not mention the whitespace damage here :-). Instead, I'd ask you
>>>>>
>>>>>
>>>> Oh well, after having been into qemu source, I just got used to use
>>>> spaces instead of tabs ;-).
>>>>
>>>>
>>>>> to review, comment, and even try, the patch that I posted here not long
>>>>> ago, exposing all safe host cpuid functions to guests.
>>>>>
>>>>>
>>>> Sure.
>>>> Basically your patch targets at a completely different use case than
>>>> mine though. You want to expose the host features on the virtual CPU,
>>>> whereas my goal is to have a virtual Core Duo/Solo CPU, even if your
>>>> host CPU is actually an SVM capable one.
>>>>
>>>> So my CoreDuo CPU definition still fails to populate a proper CPUID
>>>> function 4. With the -cpu host option, Linux works (as it's bright
>>>> enough to know that some values are just plain wrong), but Darwin
>>>> crashes. I am not exactly sure why it is, but I guess it's due to the
>>>> function 4 values exposing a 2-core CPU, which kvm simply doesn't emulate.
>>>>
>>>
>>> What I wanted to say is that the fact that the usermode support is  
>>> not
>>> used, is not IMHO a good-enough reason to change the kernel:
>>> kvm_vcpu_ioctl_set_cpuid() was meant to be a stupid function, to be  
>>> used
>>> only with old usermode. I hate to teach it the true complex logic  
>>> of Intel's
>>> CPUID.
>>>
>>>
>>
>> The funny part is, you don't have to. Every complex I know of so  
>> far is
>> simply repetitive. If the userspace just sends x cpuid values and the
>> kernel takes x, where's the problem?
>>
>> Of course having a full descriptionary approach is way better, but  
>> I see
>> no real need to not use a stupid interface.
>
> The only reason is that a smarter interface exists, and I want it to  
> be used,
> not hacked around.
>

This is a valid complaint. Still, one wouldn't have needed the smart  
interface in the first place. Now that it is in, one should of course  
use it.
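
To make the disagreement concrete: CPUID function 4 returns a different
answer for each value of ECX, so a flat table needs one entry per index plus
a marker saying the index matters. The following is a stand-alone sketch of
that lookup, not the kvm code. struct cpuid_entry_sketch,
SKETCH_FLAG_SIGNIFICANT_INDEX, assign_indices() and find_entry() are
hypothetical names chosen for the example; the index assignment follows the
"count up for each function 4 entry" approach from the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FLAG_SIGNIFICANT_INDEX 1u	/* hypothetical flag value */

struct cpuid_entry_sketch {
	uint32_t function;
	uint32_t index;
	uint32_t flags;
	uint32_t eax, ebx, ecx, edx;
};

/* Give each function-4 entry the next index, as the quoted patch does. */
static void assign_indices(struct cpuid_entry_sketch *e, size_t n)
{
	uint32_t next = 0;
	size_t i;

	assert(e != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (e[i].function == 4) {
			e[i].index = next++;
			e[i].flags = SKETCH_FLAG_SIGNIFICANT_INDEX;
		} else {
			e[i].index = 0;
			e[i].flags = 0;
		}
	}
}

/* Look up by function; ECX only matters for index-significant entries. */
static const struct cpuid_entry_sketch *
find_entry(const struct cpuid_entry_sketch *e, size_t n,
	   uint32_t function, uint32_t ecx)
{
	size_t i;

	assert(e != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (e[i].function != function)
			continue;
		if ((e[i].flags & SKETCH_FLAG_SIGNIFICANT_INDEX) &&
		    e[i].index != ecx)
			continue;
		return &e[i];
	}
	return NULL;
}
```

In this model userspace just passes several function 4 entries in order, and
a guest query with ECX=1 gets the second of them, while functions without a
significant index ignore ECX entirely.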

>>> What I would like to see is something that uses the cpuid2 API,  
>>> and not
>>> circumvent it... For this to happen, I need a deep review of my  
>>> code.
>>>
>>
>> I have to admit that I am really bad at reviewing, so don't expect
>> anything glorious from me.
>
> Anything beyond silence would be glorious.
>

Let's break it and get cpuid2 support in libkvm upst

Re: [kvm-devel] [PATCH 1/2] kvmclock - the host part.

2008-01-16 Thread Gerd Hoffmann

Glauber de Oliveira Costa wrote:
> This is the host part of kvm clocksource implementation. As it does
> not include clockevents, it is a fairly simple implementation. We
> only have to register a per-vcpu area, and start writing to it periodically.
> 
> The area is binary compatible with xen, as we use the same shadow_info 
> structure.

comment needs an update too ;)

> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> - MSR_IA32_TIME_STAMP_COUNTER,
> + MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME,

+ MSR_KVM_WALL_CLOCK

Looks good otherwise.

cheers,
  Gerd
