Re: gva_to_gpa function internals

2015-12-01 Thread Yacine HEBBAL
In fact, my tool walks through the paging data structures (entry by entry)
using the function "kvm_read_guest" (sorry, I don't have my machine with
me right now to post my code :-( ).

For example, to read the PDPTEs, I do something like this:

u64 pdpte[4];  /* destination buffer; name assumed, the original call omitted it */

/* PAE: four 8-byte PDPTEs, starting at the table address held in cr3 */
for (i = 0; i < 32; i = i + 8)
{
  kvm_read_guest(kvm, cr3 + i, &pdpte[i / 8], 8);
}

I use the same logic for PDEs and PTEs (of course, masking the flag
bits to walk from one level to the next).


I hope this explains a little more.
I'll post more code tomorrow to give more details.
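
For reference, here is a minimal, self-contained sketch of such a walk in
plain C. It is not the tool's code and not KVM code: pae_walk, read_entry,
read_gpa_fn, flat_mem and flat_read are hypothetical names, and the reader
callback merely has the same shape as kvm_read_guest. It assumes 32-bit PAE
paging and a host with the same byte order as the guest. It also checks the
PS bit in the PDE, since a 2 MiB mapping has no PTE level at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_PRESENT 0x1ULL
#define PDE_PS        (1ULL << 7)            /* PDE maps a 2 MiB page */
#define ENTRY_ADDR    0x000FFFFFFFFFF000ULL  /* bits 51:12 of an entry */

/* Hypothetical reader: copy len bytes of guest-physical memory at gpa
 * into dst, return 0 on success (same shape as kvm_read_guest). */
typedef int (*read_gpa_fn)(void *ctx, uint64_t gpa, void *dst, size_t len);

/* Demo backend: guest-physical memory is one flat host buffer. */
struct flat_mem { uint8_t *base; size_t size; };

static int flat_read(void *ctx, uint64_t gpa, void *dst, size_t len)
{
    struct flat_mem *m = ctx;

    if (gpa > m->size || len > m->size - gpa)
        return -1;
    memcpy(dst, m->base + gpa, len);
    return 0;
}

/* Read entry idx of the table at guest-physical address table.
 * Returns 0 if the entry was read and is present, -1 otherwise. */
static int read_entry(read_gpa_fn rd, void *ctx, uint64_t table,
                      unsigned idx, uint64_t *e)
{
    if (rd(ctx, table + 8ULL * idx, e, sizeof(*e)))
        return -1;
    return (*e & ENTRY_PRESENT) ? 0 : -1;
}

/* Translate gva to gpa: cr3 -> PDPTE -> PDE -> (PTE) -> page. */
static int pae_walk(read_gpa_fn rd, void *ctx, uint64_t cr3,
                    uint32_t gva, uint64_t *gpa)
{
    uint64_t e;

    assert(rd != NULL && gpa != NULL);

    /* PDPT: 4 entries, base is cr3 with the low 5 bits cleared. */
    if (read_entry(rd, ctx, cr3 & ~0x1FULL, gva >> 30, &e))
        return -1;
    /* Page directory: index is gva bits 29:21. */
    if (read_entry(rd, ctx, e & ENTRY_ADDR, (gva >> 21) & 0x1FF, &e))
        return -1;
    if (e & PDE_PS) {
        /* 2 MiB page: the PDE is the last level, offset is bits 20:0. */
        *gpa = (e & ENTRY_ADDR & ~0x1FFFFFULL) | (gva & 0x1FFFFF);
        return 0;
    }
    /* Page table: index is gva bits 20:12. */
    if (read_entry(rd, ctx, e & ENTRY_ADDR, (gva >> 12) & 0x1FF, &e))
        return -1;
    *gpa = (e & ENTRY_ADDR) | (gva & 0xFFF);
    return 0;
}
```

Inside KVM the callback would presumably wrap kvm_read_guest. A walk that
ignores the PS bit would read ordinary page contents as if they were a page
table, which is one way to end up with PTE values that look invalid.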

On 01/12/2015 22:31, Paolo Bonzini wrote:


> On 01/12/2015 19:30, Yacine HEBBAL wrote:
>
>> Hi all,
>> I'm trying to build some tools on top of kvm in order to debug, monitor and
>> reverse engineer the guest OS (ubuntu 12.04, 32 bits)
>> One of my tools walks through (and prints) the guest paging data structures
>> as following: cr3 -> pdpte -> pde -> pte -> page (PAE paging, 32 bits)
>>
>> According to my logs some accessed kernel PTEs are not present (pte =
>> 9090909090909090) in all processes address spaces (even from init process
>> cr3), however when I use the function kvm_read_guest_virt_helper on their
>> corresponding virtual addresses (GVAs), I get a correct content (content
>> correctness checked using system.map file).
>> Just after calling kvm_read_guest_virt_helper, I check again the PTE
>> corresponding to the read gva, I see that they are unmapped (invalid, always
>> 9090909090909090)
>>
>> I investigated a little the code of kvm_read_guest_virt_helper, this
>> function calls vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, ...) which in turn
>> calls other functions until FNAME(walk_addr_generic) which seems to do the
>> translation.
>> walk_addr_generic seems to do the translation starting from cr3 of the
>> current process (in line: mmu->get_cr3(vcpu);) and works fine regardless of
>> the identity of the current process (i.e. current cr3).
>>
>> So how the function gva_to_gpa is able to the read correctly any GVA that my
>> tool sees invalid (unmapped) in the paging structures, knowing that my tool
>> is able to read and display correctly a content of (thousands) many other GVAs ?
>> I would be very thankful for any feedback :)
>
> Unfortunately that's impossible to know without knowing your tool.  How
> does it read guest memory?
>
> Paolo


--
Hebbal Yacine
PhD student
Tel +33 6 45 42 10 96

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gva_to_gpa function internals

2015-12-01 Thread Paolo Bonzini


On 01/12/2015 19:30, Yacine HEBBAL wrote:
> Hi all,
> I'm trying to build some tools on top of kvm in order to debug, monitor and
> reverse engineer the guest OS (ubuntu 12.04, 32 bits)
> One of my tools walks through (and prints) the guest paging data structures
> as following: cr3 -> pdpte -> pde -> pte -> page (PAE paging, 32 bits)
> 
> According to my logs some accessed kernel PTEs are not present (pte =
> 9090909090909090) in all processes address spaces (even from init process
> cr3), however when I use the function kvm_read_guest_virt_helper on their
> corresponding virtual addresses (GVAs), I get a correct content (content
> correctness checked using system.map file).
> Just after calling kvm_read_guest_virt_helper, I check again the PTE
> corresponding to the read gva, I see that they are unmapped (invalid, always
> 9090909090909090)
> 
> I investigated a little the code of kvm_read_guest_virt_helper, this
> function calls vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, ...) which in turn
> calls other functions until FNAME(walk_addr_generic) which seems to do the
> translation.
> walk_addr_generic seems to do the translation starting from cr3 of the
> current process (in line: mmu->get_cr3(vcpu);) and works fine regardless of
> the identity of the current process (i.e. current cr3).
> 
> So how the function gva_to_gpa is able to the read correctly any GVA that my
> tool sees invalid (unmapped) in the paging structures, knowing that my tool
> is able to read and display correctly a content of (thousands) many other 
> GVAs ?
> I would be very thankful for any feedback :)

Unfortunately that's impossible to know without knowing your tool.  How
does it read guest memory?

Paolo


Re: [PATCH] kvm: remove unused variable 'vcpu_book3s'

2015-12-01 Thread Daniel Axtens
"Geyslan G. Bem"  writes:

> The vcpu_book3s struct is assigned but never used. So remove it.

Just out of interest, how did you find this? Compiler warning? Static
analysis? Manual inspection?

Thanks in advance!

Regards,
Daniel

>
> Signed-off-by: Geyslan G. Bem 
> ---
>  arch/powerpc/kvm/book3s_64_mmu.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu.c 
> b/arch/powerpc/kvm/book3s_64_mmu.c
> index 774a253..9bf7031 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu.c
> @@ -377,15 +377,12 @@ no_seg_found:
>  
>  static void kvmppc_mmu_book3s_64_slbmte(struct kvm_vcpu *vcpu, u64 rs, u64 
> rb)
>  {
> - struct kvmppc_vcpu_book3s *vcpu_book3s;
>   u64 esid, esid_1t;
>   int slb_nr;
>   struct kvmppc_slb *slbe;
>  
>   dprintk("KVM MMU: slbmte(0x%llx, 0x%llx)\n", rs, rb);
>  
> - vcpu_book3s = to_book3s(vcpu);
> -
>   esid = GET_ESID(rb);
>   esid_1t = GET_ESID_1T(rb);
>   slb_nr = rb & 0xfff;
> -- 
> 2.6.2
>
> ___
> Linuxppc-dev mailing list
> linuxppc-...@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev




[PATCH] kvm: remove unused variable 'vcpu_book3s'

2015-12-01 Thread Geyslan G. Bem
The vcpu_book3s struct is assigned but never used. So remove it.

Signed-off-by: Geyslan G. Bem 
---
 arch/powerpc/kvm/book3s_64_mmu.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu.c b/arch/powerpc/kvm/book3s_64_mmu.c
index 774a253..9bf7031 100644
--- a/arch/powerpc/kvm/book3s_64_mmu.c
+++ b/arch/powerpc/kvm/book3s_64_mmu.c
@@ -377,15 +377,12 @@ no_seg_found:
 
 static void kvmppc_mmu_book3s_64_slbmte(struct kvm_vcpu *vcpu, u64 rs, u64 rb)
 {
-	struct kvmppc_vcpu_book3s *vcpu_book3s;
 	u64 esid, esid_1t;
 	int slb_nr;
 	struct kvmppc_slb *slbe;
 
 	dprintk("KVM MMU: slbmte(0x%llx, 0x%llx)\n", rs, rb);
 
-	vcpu_book3s = to_book3s(vcpu);
-
 	esid = GET_ESID(rb);
 	esid_1t = GET_ESID_1T(rb);
 	slb_nr = rb & 0xfff;
-- 
2.6.2



Re: [PATCH] kvm: remove unused variable 'vcpu_book3s'

2015-12-01 Thread Geyslan G. Bem
2015-12-01 21:34 GMT-03:00 Daniel Axtens :
> "Geyslan G. Bem"  writes:
>
>> The vcpu_book3s struct is assigned but never used. So remove it.
>
> Just out of interest, how did you find this? Compiler warning? Static
> analysis? Manual inspection?

Sorry, I should have done the patch self contained. I caught it
through static analysis (cppcheck).

>
> Thanks in advance!

You're welcome.

>
> Regards,
> Daniel
>
>>
>> Signed-off-by: Geyslan G. Bem 
>> ---
>>  arch/powerpc/kvm/book3s_64_mmu.c | 3 ---
>>  1 file changed, 3 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_64_mmu.c 
>> b/arch/powerpc/kvm/book3s_64_mmu.c
>> index 774a253..9bf7031 100644
>> --- a/arch/powerpc/kvm/book3s_64_mmu.c
>> +++ b/arch/powerpc/kvm/book3s_64_mmu.c
>> @@ -377,15 +377,12 @@ no_seg_found:
>>
>>  static void kvmppc_mmu_book3s_64_slbmte(struct kvm_vcpu *vcpu, u64 rs, u64 
>> rb)
>>  {
>> - struct kvmppc_vcpu_book3s *vcpu_book3s;
>>   u64 esid, esid_1t;
>>   int slb_nr;
>>   struct kvmppc_slb *slbe;
>>
>>   dprintk("KVM MMU: slbmte(0x%llx, 0x%llx)\n", rs, rb);
>>
>> - vcpu_book3s = to_book3s(vcpu);
>> -
>>   esid = GET_ESID(rb);
>>   esid_1t = GET_ESID_1T(rb);
>>   slb_nr = rb & 0xfff;
>> --
>> 2.6.2
>>
>> ___
>> Linuxppc-dev mailing list
>> linuxppc-...@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-dev



-- 
Regards,

Geyslan G. Bem
hackingbits.com


Re: [PATCH v4 18/21] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-01 Thread Shannon Zhao


On 2015/12/2 0:57, Marc Zyngier wrote:
> On 01/12/15 16:26, Shannon Zhao wrote:
>>
>>
>> On 2015/12/1 23:41, Marc Zyngier wrote:
>>>> The reason is that when guest clear the overflow register, it will trap
>>>> to kvm and call kvm_pmu_sync_hwstate() as you see above. At this moment,
>>>> the overflow register is still overflowed (that is some bit is still 1).
>>>> So We need to use some flag to mark we already inject this interrupt.
>>>> And if during guest handling the overflow, there is a new overflow
>>>> happening, the pmu->irq_pending will be set true by
>>>> kvm_pmu_perf_overflow(), then it needs to inject this new interrupt, right?
>>> I don't think so. This is a level interrupt, so the level should stay
>>> high as long as the guest hasn't cleared all possible sources for that
>>> interrupt.
>>>
>>> For your example, the guest writes to PMOVSCLR to clear the overflow
>>> caused by a given counter. If the status is now 0, the interrupt line
>>> drops. If the status is still non zero, the line stays high. And I
>>> believe that writing a 1 to PMOVSSET would actually trigger an
>>> interrupt, or keep it high if it has already high.
>>>
>> Right, writing 1 to PMOVSSET will trigger an interrupt.
>>
>>> In essence, do not try to maintain side state. I've been bitten.
>>
>> So on VM entry, it check if PMOVSSET is zero. If not, call 
>> kvm_vgic_inject_irq to set the level high. If so, set the level low.
>> On VM exit, it seems there is nothing to do.
> 
> It is even simpler than that:
> 
> - When you get an overflow, you inject an interrupt with the level set to 1.
> - When the overflow register gets cleared, you inject the same interrupt
> with the level set to 0.
> 
> I don't think you need to do anything else, and the world switch should
> be left untouched.
> 

On 2015/7/17 23:28, Christoffer Dall wrote:
>> > +   kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
>> > +  pmu->irq_num, 1);
> what context is this overflow handler function?  kvm_vgic_inject_irq
> grabs a mutex, so it can sleep...
>
> from a quick glance at the perf core code, it looks like this is in
> interrupt context, so that call to kvm_vgic_inject_irq looks bad.
>

But as Christoffer said before, it's not good to call
kvm_vgic_inject_irq directly in interrupt context. So if we just kick
the vcpu here and call kvm_vgic_inject_irq on VM entry, is this fine?

Thanks,
-- 
Shannon



Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-01 Thread Alexander Duyck
On Tue, Dec 1, 2015 at 7:28 AM, Michael S. Tsirkin  wrote:
> On Tue, Dec 01, 2015 at 11:04:31PM +0800, Lan, Tianyu wrote:
>>
>>
>> On 12/1/2015 12:07 AM, Alexander Duyck wrote:
>> >They can only be corrected if the underlying assumptions are correct
>> >and they aren't.  Your solution would have never worked correctly.
>> >The problem is you assume you can keep the device running when you are
>> >migrating and you simply cannot.  At some point you will always have
>> >to stop the device in order to complete the migration, and you cannot
>> >stop it before you have stopped your page tracking mechanism.  So
>> >unless the platform has an IOMMU that is somehow taking part in the
>> >dirty page tracking you will not be able to stop the guest and then
>> >the device, it will have to be the device and then the guest.
>> >
>> >>>Doing suspend and resume() may help to do migration easily but some
>> >>>devices requires low service down time. Especially network and I got
>> >>>that some cloud company promised less than 500ms network service downtime.
>> >Honestly focusing on the downtime is getting the cart ahead of the
>> >horse.  First you need to be able to do this without corrupting system
>> >memory and regardless of the state of the device.  You haven't even
>> >gotten to that state yet.  Last I knew the device had to be up in
>> >order for your migration to even work.
>>
>> I think the issue is that the content of rx package delivered to stack maybe
>> changed during migration because the piece of memory won't be migrated to
>> new machine. This may confuse applications or stack. Current dummy write
>> solution can ensure the content of package won't change after doing dummy
>> write while the content maybe not received data if migration happens before
>> that point. We can recheck the content via checksum or crc in the protocol
>> after dummy write to ensure the content is what VF received. I think stack
>> has already done such checks and the package will be abandoned if failed to
>> pass through the check.
>
>
> Most people nowdays rely on hardware checksums so I don't think this can
> fly.

Correct.  The checksum/crc approach will not work since it is possible
for a checksum to even be mangled in the case of some features such as
LRO or GRO.

>> Another way is to tell all memory driver are using to Qemu and let Qemu to
>> migrate these memory after stopping VCPU and the device. This seems safe but
>> implementation maybe complex.
>
> Not really 100% safe.  See below.
>
> I think hiding these details behind dma_* API does have
> some appeal. In any case, it gives us a good
> terminology as it covers what most drivers do.

That was kind of my thought.  If we were to build our own
dma_mark_clean() type function that will mark the DMA region dirty on
sync or unmap then that is half the battle right there as we would be
able to at least keep the regions consistent after they have left the
driver.

> There are several components to this:
> - dma_map_* needs to prevent page from
>   being migrated while device is running.
>   For example, expose some kind of bitmap from guest
>   to host, set bit there while page is mapped.
>   What happens if we stop the guest and some
>   bits are still set? See dma_alloc_coherent below
>   for some ideas.

Yeah, I could see something like this working.  Maybe we could do
something like what was done for the NX bit and make use of the upper
order bits beyond the limits of the memory range to mark pages as
non-migratable?

I'm curious.  What we have with a DMA mapped region is essentially
shared memory between the guest and the device.  How would we resolve
something like this with IVSHMEM, or are we blocked there as well in
terms of migration?

> - dma_unmap_* needs to mark page as dirty
>   This can be done by writing into a page.
>
> - dma_sync_* needs to mark page as dirty
>   This is trickier as we can not change the data.
>   One solution is using atomics.
>   For example:
> int x = ACCESS_ONCE(*p);
> cmpxchg(p, x, x);
>   Seems to do a write without changing page
>   contents.

Like I said we can probably kill 2 birds with one stone by just
implementing our own dma_mark_clean() for x86 virtualized
environments.

I'd say we could take your solution one step further and just use 0
instead of bothering to read the value.  After all it won't write the
area if the value at the offset is not 0.  The only downside is that
this is a locked operation so we will take a pretty serious
performance penalty when this is active.  As such my preference would
be to hide the code behind some static key that we could then switch
on in the event of a VM being migrated.
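
To make the two variants concrete, here is a small user-space sketch using
C11 atomics rather than the kernel's cmpxchg(). The function names are
hypothetical. Note the assumption it rests on: in C terms a failed
compare-exchange stores nothing, so whether the page still gets dirtied
depends on the locked instruction issuing a write cycle even when the
compare fails. That is my understanding of x86 CMPXCHG, but it should be
checked against the architecture manual before relying on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Variant from the quoted mail: read the value, then swap it with itself. */
static void touch_same_value(_Atomic int *p)
{
    int x;

    assert(p != NULL);
    x = atomic_load(p);
    /* Stores x only if *p still equals x; contents never change. */
    atomic_compare_exchange_strong(p, &x, x);
}

/* Variant suggested above: skip the read and always compare against 0. */
static void touch_assume_zero(_Atomic int *p)
{
    int expected = 0;

    assert(p != NULL);
    /* If *p == 0 this stores 0; otherwise *p is left as it was and its
     * current value is loaded into expected, which we ignore. */
    atomic_compare_exchange_strong(p, &expected, 0);
}
```

Either way the data seen by the driver is unchanged; only the cost of the
locked operation differs from a plain load.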

> - dma_alloc_coherent memory (e.g. device rings)
>   must be migrated after device stopped modifying it.
>   Just stopping the VCPU is not enough:
>   you must make sure device is not changing it.
>
>   Or maybe the device has some kind of ring flush operation,
>   if there was a 

Re: [PATCH 00/11] KVM: x86: track guest page access

2015-12-01 Thread Xiao Guangrong



On 12/01/2015 06:17 PM, Paolo Bonzini wrote:
> On 30/11/2015 19:26, Xiao Guangrong wrote:
>> This patchset introduces the feature which allows us to track page
>> access in guest. Currently, only write access tracking is implemented
>> in this version.
>>
>> Four APIs are introduced:
>> - kvm_page_track_add_page(kvm, gfn, mode), single guest page @gfn is
>>    added into the track pool of the guest instance represented by @kvm,
>>    @mode specifies which kind of access on the @gfn is tracked
>>
>> - kvm_page_track_remove_page(kvm, gfn, mode), is the opposite operation
>>    of kvm_page_track_add_page() which removes @gfn from the tracking pool.
>>    @gfn is no longer tracked after its last user is gone
>>
>> - kvm_page_track_register_notifier(kvm, n), register a notifier so that
>>    the event triggered by page tracking will be received, at that time,
>>    the callback of n->track_write() will be called
>>
>> - kvm_page_track_unregister_notifier(kvm, n), does the opposite operation
>>    of kvm_page_track_register_notifier(), which unlinks the notifier and
>>    stops receiving the tracked event
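
As a rough illustration of how these four calls fit together, here is a toy
user-space model in C. It is only a sketch under assumptions: every name in
it (track_pool, track_add_page, guest_write, count_writes, ...) is
hypothetical, it models write tracking only and drops the @mode argument,
and it is not the code from the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_GFNS 64

struct track_notifier {
    void (*track_write)(uint64_t gfn, uint64_t new_val, void *opaque);
    void *opaque;
    struct track_notifier *next;
};

struct track_pool {
    unsigned users[NR_GFNS];       /* per-gfn count of tracking users */
    struct track_notifier *head;   /* registered notifiers */
};

static void track_add_page(struct track_pool *t, uint64_t gfn)
{
    assert(gfn < NR_GFNS);
    t->users[gfn]++;
}

/* The gfn stops being tracked once its last user is gone. */
static void track_remove_page(struct track_pool *t, uint64_t gfn)
{
    assert(gfn < NR_GFNS && t->users[gfn] > 0);
    t->users[gfn]--;
}

static void track_register_notifier(struct track_pool *t,
                                    struct track_notifier *n)
{
    n->next = t->head;
    t->head = n;
}

static void track_unregister_notifier(struct track_pool *t,
                                      struct track_notifier *n)
{
    struct track_notifier **pp;

    for (pp = &t->head; *pp; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            return;
        }
    }
}

/* Model of a guest write: returns 1 if the page is tracked (all
 * notifiers are then called), 0 if the write goes through untracked. */
static int guest_write(struct track_pool *t, uint64_t gfn, uint64_t val)
{
    struct track_notifier *n;

    assert(gfn < NR_GFNS);
    if (t->users[gfn] == 0)
        return 0;
    for (n = t->head; n; n = n->next)
        n->track_write(gfn, val, n->opaque);
    return 1;
}

/* Demo callback: counts tracked writes in the int behind opaque. */
static void count_writes(uint64_t gfn, uint64_t new_val, void *opaque)
{
    (void)gfn;
    (void)new_val;
    ++*(int *)opaque;
}
```

The per-gfn user count is what lets several users track the same page
without stepping on each other.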

>> The first user of page track is non-leaf shadow page tables as they are
>> always write protected. It also gains performance improvement because
>> page track speeds up page fault handler for the tracked pages. The
>> performance result of kernel building is as follows:
>>
>>         before     after
>> real    461.63    455.48
>> user   4529.55   4557.88
>> sys    1995.39   1922.57
>
> For KVM-GT, as far as I know Andrea Arcangeli is working on extending
> userfaultfd to tracking write faults only.  Perhaps KVM-GT can do
> something similar, where KVM gets the write tracking functionality for
> free through the MMU notifiers.  Any thoughts on this?


Userfaultfd is excellent and can indeed notify on write events;
however, it is not suitable for the shadow page use case.

For performance: the shadow GPU is performance critical and needs to be
switched frequently, so it is not good to handle it in userspace. Also, a
Windows guest has many GPU tables and updates them frequently, which means
we need to write-protect a huge number of pages at single-page granularity.
I am afraid userfaultfd cannot handle this case efficiently.

For functionality: userfaultfd cannot meet the needs of shadow pages
because:
- the page is kept read-only, so userfaultfd cannot fix the fault and let
  the vcpu make progress (a write access causes a writable gup).

- the access needs to be emulated; however, userfaultfd/the kernel cannot
  emulate the access, as it is triggered by the guest and the instruction
  info is stored in the VMCS, so only KVM can emulate it.

- the shadow page needs to be notified after the emulation is finished, as
  it must know the new data written to the page to update its page hierarchy
  (some hardware lacks the 'retry' ability, so the shadow page table needs
   to reflect the guest table at all times).



> Applying your technique to non-leaf shadow pages actually makes this
> series quite interesting. :)  Shadow paging is still in use for nested
> EPT, so it's always a good idea to speed it up.


Yes. Very glad to see you like it. :)




KVM with PCI forwarding really slow after 4.1

2015-12-01 Thread Michael Büsch
Hi,

I use "-device pci-assign,host=00:1a.0" to forward a USB host chip to a
Win7 32 bit inside of qemu/kvm. That used to work pretty well, but it broke
horribly somewhere after 4.1. With recent kernels the virtual machine
boots, but is _very_ slow. It takes hours to boot.
If PCI forwarding is disabled, everything is fine.

qemu throws this warning on startup:
qemu-system-i386: -device pci-assign,host=00:1a.0: PCI region 0 at address 
0xf253a000 has size 0x400, which is not a multiple of 4K.  You might experience 
some performance hit due to that.

_But_ it also shows that warning for 4.1 and earlier kernels that work pretty 
fast.

I tried to bisect the problem, but I ran into some kernels that
don't even boot on my machine (the skipped ones). So it's a bit hard to
make progress.

Here is my git bisect log that narrows it down to under 100 commits.
Does anyone have a clue what could cause this?

(The log can be replayed with git bisect replay on Linus' tree).




# bad: [8005c49d9aea74d382f474ce11afbbc7d7130bec] Linux 4.4-rc1
# good: [b953c0d234bc72e8489d3bf51a276c5c4ec85345] Linux 4.1
git bisect start 'v4.4-rc1' 'v4.1'
# bad: [dd5cdb48edfd34401799056a9acf61078d773f90] Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect bad dd5cdb48edfd34401799056a9acf61078d773f90
# bad: [23908db413eccd77084b09c9b0a4451dfb0524c0] Merge tag 'staging-4.2-rc1' 
of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad 23908db413eccd77084b09c9b0a4451dfb0524c0
# bad: [14738e03312ff1137109d68bcbf103c738af0f4a] Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect bad 14738e03312ff1137109d68bcbf103c738af0f4a
# good: [5a602e157a9d91d5ce98d07c404097edba8ec9f3] Merge tag 'spi-v4.2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
git bisect good 5a602e157a9d91d5ce98d07c404097edba8ec9f3
# good: [a4244b0cf58d56c171874e85228ba5deffeb017a] net/ethtool: Add current 
supported tunable options
git bisect good a4244b0cf58d56c171874e85228ba5deffeb017a
# bad: [98ec21a01896751b673b6c731ca8881daa8b2c6d] Merge branch 
'sched-hrtimers-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 98ec21a01896751b673b6c731ca8881daa8b2c6d
# good: [4b1f2af6752a4cc9acc1c22ddf3842478965f113] Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
git bisect good 4b1f2af6752a4cc9acc1c22ddf3842478965f113
# good: [08d183e3c1f650b4db1d07d764502116861542fa] Merge tag 'powerpc-4.2-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
git bisect good 08d183e3c1f650b4db1d07d764502116861542fa
# skip: [05fe125fa3237de2ec5bada80031e694de78909c] Merge tag 'kvm-arm-for-4.2' 
of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
git bisect skip 05fe125fa3237de2ec5bada80031e694de78909c
# skip: [edc90b7dc4ceef62ef0ad9cc6c3f5dc770e83ad2] KVM: MMU: fix SMAP 
virtualization
git bisect skip edc90b7dc4ceef62ef0ad9cc6c3f5dc770e83ad2
# skip: [910a6aae4e2e45855efc4a268e43eed2d8445575] KVM: MTRR: exactly define 
the size of variable MTRRs
git bisect skip 910a6aae4e2e45855efc4a268e43eed2d8445575
# skip: [822bf4833ecc8ea63c69f3ed894c13b4509c9e85] arm64: defconfig: enable 
memtest
git bisect skip 822bf4833ecc8ea63c69f3ed894c13b4509c9e85


-- 
Michael




Re: [PATCH v4 18/21] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-01 Thread Marc Zyngier
On 01/12/15 16:26, Shannon Zhao wrote:
> 
> 
> On 2015/12/1 23:41, Marc Zyngier wrote:
>>> The reason is that when guest clear the overflow register, it will trap
>>> to kvm and call kvm_pmu_sync_hwstate() as you see above. At this moment,
>>> the overflow register is still overflowed (that is some bit is still 1).
>>> So We need to use some flag to mark we already inject this interrupt.
>>> And if during guest handling the overflow, there is a new overflow
>>> happening, the pmu->irq_pending will be set true by
>>> kvm_pmu_perf_overflow(), then it needs to inject this new interrupt, right?
>> I don't think so. This is a level interrupt, so the level should stay
>> high as long as the guest hasn't cleared all possible sources for that
>> interrupt.
>>
>> For your example, the guest writes to PMOVSCLR to clear the overflow
>> caused by a given counter. If the status is now 0, the interrupt line
>> drops. If the status is still non zero, the line stays high. And I
>> believe that writing a 1 to PMOVSSET would actually trigger an
>> interrupt, or keep it high if it has already high.
>>
> Right, writing 1 to PMOVSSET will trigger an interrupt.
> 
>> In essence, do not try to maintain side state. I've been bitten.
> 
> So on VM entry, it check if PMOVSSET is zero. If not, call 
> kvm_vgic_inject_irq to set the level high. If so, set the level low.
> On VM exit, it seems there is nothing to do.

It is even simpler than that:

- When you get an overflow, you inject an interrupt with the level set to 1.
- When the overflow register gets cleared, you inject the same interrupt
with the level set to 0.

I don't think you need to do anything else, and the world switch should
be left untouched.
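
As a toy model of that rule (plain C, not the KVM code): the line level is
derived from the overflow status alone, with no extra pending flag. All
names are hypothetical, set_level() stands in for the real injection call,
and the interrupt-enable mask is ignored for brevity.

```c
#include <assert.h>
#include <stdint.h>

struct pmu_model {
    uint32_t ovs;      /* overflow status, one bit per counter */
    int      level;    /* current level of the interrupt line */
    int      injects;  /* how many times the level actually changed */
};

/* Stand-in for injecting the interrupt with a given level. */
static void set_level(struct pmu_model *p, int level)
{
    if (p->level != level) {
        p->level = level;
        p->injects++;
    }
}

/* Counter idx overflowed: set its status bit and raise the line. */
static void pmu_counter_overflow(struct pmu_model *p, unsigned idx)
{
    assert(idx < 32);
    p->ovs |= UINT32_C(1) << idx;
    set_level(p, 1);
}

/* Guest clears status bits: the line drops only when nothing is left. */
static void pmu_write_ovsclr(struct pmu_model *p, uint32_t mask)
{
    p->ovs &= ~mask;
    set_level(p, p->ovs != 0);
}
```

A second overflow while the first is still being handled needs no special
case here: the status stays non-zero, so the line simply stays high.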

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [Qemu-devel] [for-2.6 PATCH 1/3] target-i386: Define structs for layout of xsave area

2015-12-01 Thread Richard Henderson

On 11/30/2015 03:18 AM, Paolo Bonzini wrote:
> Because this is always little endian, I would write it as uint8_t[16][16].


Maybe.  That isn't altogether handy for TCG, since we'll be wanting to bswap 
these buffers (probably in uint64_t chunks).



r~


Re: [Qemu-devel] [for-2.6 PATCH 1/3] target-i386: Define structs for layout of xsave area

2015-12-01 Thread Eduardo Habkost
On Tue, Dec 01, 2015 at 09:09:47AM -0800, Richard Henderson wrote:
> On 11/30/2015 03:18 AM, Paolo Bonzini wrote:
> >Because this is always little endian, I would write it as uint8_t[16][16].
> 
> Maybe.  That isn't altogether handy for TCG, since we'll be wanting to bswap
> these buffers (probably in uint64_t chunks).

X86XSaveArea will be used only when loading/saving state using
xsave, not for executing regular instructions. In X86CPU, the
data is already stored as XMMReg unions (the one with the
XMM_[BWDQ] helpers).

-- 
Eduardo


Re: [Qemu-devel] [for-2.6 PATCH 1/3] target-i386: Define structs for layout of xsave area

2015-12-01 Thread Richard Henderson

On 12/01/2015 09:15 AM, Eduardo Habkost wrote:
> On Tue, Dec 01, 2015 at 09:09:47AM -0800, Richard Henderson wrote:
>> On 11/30/2015 03:18 AM, Paolo Bonzini wrote:
>>> Because this is always little endian, I would write it as uint8_t[16][16].
>>
>> Maybe.  That isn't altogether handy for TCG, since we'll be wanting to bswap
>> these buffers (probably in uint64_t chunks).
>
> X86XSaveArea will be used only when loading/saving state using
> xsave, not for executing regular instructions.

... like the regular instruction xsave?

https://patchwork.ozlabs.org/patch/493318/

> In X86CPU, the
> data is already stored as XMMReg unions (the one with the
> XMM_[BWDQ] helpers).


Of course.  But those unions are arranged to be in big-endian format on 
big-endian hosts.  So we need to swap the data back to little-endian format for 
storage into guest memory.



r~


Re: [Qemu-devel] [for-2.6 PATCH 1/3] target-i386: Define structs for layout of xsave area

2015-12-01 Thread Paolo Bonzini


On 01/12/2015 18:20, Richard Henderson wrote:
>>
>> X86XSaveArea will be used only when loading/saving state using
>> xsave, not for executing regular instructions.
> 
> ... like the regular instruction xsave?
> 
> https://patchwork.ozlabs.org/patch/493318/

Right, but that's a helper anyway.

>> In X86CPU, the
>> data is already stored as XMMReg unions (the one with the
>> XMM_[BWDQ] helpers).
> 
> Of course.  But those unions are arranged to be in big-endian format on
> big-endian hosts.  So we need to swap the data back to little-endian
> format for storage into guest memory.

Yes, you can use byte moves with XMM_B (more obvious), or stq_le_p with
XMM_Q (faster I guess---though the compiler might optimize the former on
little-endian hosts).  Either works with an uint8_t[] destination.

Paolo


Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-01 Thread Michael S. Tsirkin
On Tue, Dec 01, 2015 at 09:04:32AM -0800, Alexander Duyck wrote:
> On Tue, Dec 1, 2015 at 7:28 AM, Michael S. Tsirkin  wrote:
> > On Tue, Dec 01, 2015 at 11:04:31PM +0800, Lan, Tianyu wrote:
> >>
> >>
> >> On 12/1/2015 12:07 AM, Alexander Duyck wrote:
> >> >They can only be corrected if the underlying assumptions are correct
> >> >and they aren't.  Your solution would have never worked correctly.
> >> >The problem is you assume you can keep the device running when you are
> >> >migrating and you simply cannot.  At some point you will always have
> >> >to stop the device in order to complete the migration, and you cannot
> >> >stop it before you have stopped your page tracking mechanism.  So
> >> >unless the platform has an IOMMU that is somehow taking part in the
> >> >dirty page tracking you will not be able to stop the guest and then
> >> >the device, it will have to be the device and then the guest.
> >> >
> >> >>>Doing suspend and resume() may help to do migration easily but some
> >> >>>devices requires low service down time. Especially network and I got
> >> >>>that some cloud company promised less than 500ms network service 
> >> >>>downtime.
> >> >Honestly focusing on the downtime is getting the cart ahead of the
> >> >horse.  First you need to be able to do this without corrupting system
> >> >memory and regardless of the state of the device.  You haven't even
> >> >gotten to that state yet.  Last I knew the device had to be up in
> >> >order for your migration to even work.
> >>
> >> I think the issue is that the content of rx package delivered to stack 
> >> maybe
> >> changed during migration because the piece of memory won't be migrated to
> >> new machine. This may confuse applications or stack. Current dummy write
> >> solution can ensure the content of package won't change after doing dummy
> >> write while the content maybe not received data if migration happens before
> >> that point. We can recheck the content via checksum or crc in the protocol
> >> after dummy write to ensure the content is what VF received. I think stack
> >> has already done such checks and the package will be abandoned if failed to
> >> pass through the check.
> >
> >
> > Most people nowdays rely on hardware checksums so I don't think this can
> > fly.
> 
> Correct.  The checksum/crc approach will not work since it is possible
> for a checksum to even be mangled in the case of some features such as
> LRO or GRO.
> 
> >> Another way is to tell all memory driver are using to Qemu and let Qemu to
> >> migrate these memory after stopping VCPU and the device. This seems safe 
> >> but
> >> implementation maybe complex.
> >
> > Not really 100% safe.  See below.
> >
> > I think hiding these details behind dma_* API does have
> > some appeal. In any case, it gives us a good
> > terminology as it covers what most drivers do.
> 
> That was kind of my thought.  If we were to build our own
> dma_mark_clean() type function that will mark the DMA region dirty on
> sync or unmap then that is half the battle right there as we would be
> able to at least keep the regions consistent after they have left the
> driver.
> 
> > There are several components to this:
> > - dma_map_* needs to prevent page from
> >   being migrated while device is running.
> >   For example, expose some kind of bitmap from guest
> >   to host, set bit there while page is mapped.
> >   What happens if we stop the guest and some
> >   bits are still set? See dma_alloc_coherent below
> >   for some ideas.
> 
> Yeah, I could see something like this working.  Maybe we could do
> something like what was done for the NX bit and make use of the upper
> order bits beyond the limits of the memory range to mark pages as
> non-migratable?
> 
> I'm curious.  What we have with a DMA mapped region is essentially
> shared memory between the guest and the device.  How would we resolve
> something like this with IVSHMEM, or are we blocked there as well in
> terms of migration?

I have some ideas. Will post later.

> > - dma_unmap_* needs to mark page as dirty
> >   This can be done by writing into a page.
> >
> > - dma_sync_* needs to mark page as dirty
> >   This is trickier as we can not change the data.
> >   One solution is using atomics.
> >   For example:
> > int x = ACCESS_ONCE(*p);
> > cmpxchg(p, x, x);
> >   Seems to do a write without changing page
> >   contents.
> 
> Like I said we can probably kill 2 birds with one stone by just
> implementing our own dma_mark_clean() for x86 virtualized
> environments.
> 
> I'd say we could take your solution one step further and just use 0
> instead of bothering to read the value.  After all it won't write the
> area if the value at the offset is not 0.

Really almost any atomic that has no side effect will do.
atomic or with 0
atomic and with 

It's just that cmpxchg already happens to have a portable
wrapper.
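
As a user-space illustration of the idea only, the following sketch uses C11
atomics instead of the kernel's cmpxchg(). The function names are
hypothetical. Note the assumption: the C standard does not promise that these
operations issue a store to the page (a compiler may lower an idempotent
read-modify-write to something else), so this shows the "value is unchanged"
property, not a guaranteed dirtying mechanism.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Variant 1: compare-and-swap the current value with itself. */
static void touch_cas_same(_Atomic uint32_t *p)
{
    uint32_t x = atomic_load(p);
    /* If *p changed in between, the exchange simply fails; either way
     * we never alter the contents of *p. */
    atomic_compare_exchange_strong(p, &x, x);
}

/* Variant 2: skip the read, compare against 0 and "store" 0. */
static void touch_cas_zero(_Atomic uint32_t *p)
{
    uint32_t expected = 0;
    atomic_compare_exchange_strong(p, &expected, 0);
}

/* Variant 3: atomic OR with 0, another side-effect-free atomic. */
static void touch_or_zero(_Atomic uint32_t *p)
{
    atomic_fetch_or(p, 0);
}
```

All three leave the data as it was, which is the property dma_sync_* needs.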

> The only downside is that
> this is a locked operation so we will take a 

Re: [PATCH v2 00/21] arm64: KVM: world switch in C

2015-12-01 Thread Marc Zyngier
On 01/12/15 12:00, Christoffer Dall wrote:
> On Tue, Dec 01, 2015 at 09:58:23AM +, Marc Zyngier wrote:
>> On 30/11/15 20:33, Christoffer Dall wrote:
>>> On Fri, Nov 27, 2015 at 06:49:54PM +, Marc Zyngier wrote:
>>>> Once upon a time, the KVM/arm64 world switch was a nice, clean, lean
>>>> and mean piece of hand-crafted assembly code. Over time, features have
>>>> crept in, the code has become harder to maintain, and the smallest
>>>> change is a pain to introduce. The VHE patches are a prime example of
>>>> why this doesn't work anymore.
>>>>
>>>> This series rewrites most of the existing assembly code in C, but keeps
>>>> the existing code structure in place (most function names will look
>>>> familiar to the reader). The biggest change is that we don't have to
>>>> deal with a static register allocation (the compiler does it for us),
>>>> we can easily follow structure and pointers, and only the lowest level
>>>> is still in assembly code. Oh, and a negative diffstat.
>>>>
>>>> There is still a healthy dose of inline assembly (system register
>>>> accessors, runtime code patching), but I've tried not to make it too
>>>> invasive. The generated code, while not exactly brilliant, doesn't
>>>> look too shabby. I do expect a small performance degradation, but I
>>>> believe this is something we can improve over time (my initial
>>>> measurements don't show any obvious regression though).
>>>
>>> I ran this through my experimental setup on m400 and got this:
>>
>> [...]
>>
>>> What this tells me is that we do take a noticeable hit on the
>>> world-switch path, which shows up in the TCP_RR and hackbench workloads,
>>> which have a high precision in their output.
>>>
>>> Note that the memcached number is well within its variability between
>>> individual benchmark runs, where it varies to 12% of its average in over
>>> 80% of the executions.
>>>
>>> I don't think this is a showstopper though, but we could consider
>>> looking more closely at a breakdown of the world-switch path and verify
>>> if/where we are really taking a hit.
>>
>> Thanks for doing so, very interesting. As a data point, what compiler
>> are you using? I'd expect some variability based on the compiler version...
>>
> I used the following (compiling natively on the m400):
> 
> gcc version 4.8.2 (Ubuntu/Linaro 4.8.2-19ubuntu1)

For what it is worth, I've run hackbench on my Seattle B0 (8xA57 2GHz),
with a 4 vcpu VM and got the following results (10 runs per kernel
version, same configuration):

v4.4-rc3-wsinc: Average 31.750
32.459
32.124
32.435
31.940
31.085
31.804
31.862
30.985
31.450
31.359

v4.4-rc3: Average 31.954
31.806
31.598
32.697
31.472
31.410
32.562
31.938
31.932
31.672
32.459

This is with GCC as produced by Linaro:
aarch64-linux-gnu-gcc (Linaro GCC 5.1-2015.08) 5.1.1 20150608

It could well be that your compiler generates worse code than the one I
use, or that the code it outputs is badly tuned for XGene. I guess I
need to unearth my Mustang to find out...
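
For anyone who wants to re-derive the averages from the raw runs above, a
small sketch follows. The helper names mean() and rel_change_pct() are made up
for this example; the arrays are the ten hackbench times listed for each
kernel. On these numbers the wsinc kernel comes out roughly 0.6% lower than
plain v4.4-rc3.

```c
#include <assert.h>
#include <stddef.h>

/* The ten hackbench times listed above for each kernel. */
static const double wsinc[10] = {
    32.459, 32.124, 32.435, 31.940, 31.085,
    31.804, 31.862, 30.985, 31.450, 31.359,
};
static const double base[10] = {
    31.806, 31.598, 32.697, 31.472, 31.410,
    32.562, 31.938, 31.932, 31.672, 32.459,
};

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;

    assert(v && n > 0);
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum / (double)n;
}

/* Relative change of a with respect to b, in percent
 * (negative means a is lower than b). */
static double rel_change_pct(double a, double b)
{
    return (a - b) / b * 100.0;
}
```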

M.
-- 
Jazz is not dead. It just smells funny...


Re: KVM with PCI forwarding really slow after 4.1

2015-12-01 Thread Paolo Bonzini


On 01/12/2015 18:09, Michael Büsch wrote:
> Hi,
> 
> I use "-device pci-assign,host=00:1a.0" to forward a USB host chip
> to a Win7 32 bit inside of qemu/kvm. That used to work pretty well,
> but it broke horribly somewhere after 4.1. With recent kernels the
> virtual machine boots, but is _very_ slow. It takes hours to boot. 
> If PCI forwarding is disabled, everything is fine.

This has been reported already, I'm going to look at it this week.

Paolo


gva_to_gpa function internals

2015-12-01 Thread Yacine HEBBAL
Hi all,
I'm trying to build some tools on top of KVM in order to debug, monitor and
reverse engineer the guest OS (Ubuntu 12.04, 32-bit).
One of my tools walks through (and prints) the guest paging data structures
as follows: cr3 -> pdpte -> pde -> pte -> page (PAE paging, 32-bit).

According to my logs, some accessed kernel PTEs are not present (pte =
9090909090909090) in the address spaces of all processes (even when starting
from the init process's cr3). However, when I call kvm_read_guest_virt_helper
on the corresponding guest virtual addresses (GVAs), I get the correct content
(checked against the system.map file).
Right after calling kvm_read_guest_virt_helper, I check the PTE for the GVA I
just read again, and it is still unmapped (invalid, always 9090909090909090).

I looked into the code of kvm_read_guest_virt_helper. This function calls
vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, ...), which in turn calls other
functions down to FNAME(walk_addr_generic), which seems to do the translation.
walk_addr_generic seems to start the translation from the cr3 of the current
process (in the line: mmu->get_cr3(vcpu);) and works fine regardless of which
process is current (i.e. whatever the current cr3 is).

So how is gva_to_gpa able to correctly read any GVA that my tool sees as
invalid (unmapped) in the paging structures, given that my tool correctly
reads and displays the content of many (thousands of) other GVAs?
I would be very thankful for any feedback :)
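
For reference, the walk described above (cr3 -> pdpte -> pde -> pte) can be
sketched in plain C as follows. This is a simplified illustration under stated
assumptions, not KVM's walk_addr_generic: read_gpa_fn is a hypothetical
callback standing in for an 8-byte kvm_read_guest() call, and toy_mem/toy_read
exist only so the sketch can run in user space. The bit positions follow the
x86 PAE format; in particular, a PDE with the PS bit set maps a 2 MiB page and
has no PTE level beneath it, which is one case a hand-written walker may need
to check for.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PG_PRESENT  (1ULL << 0)
#define PG_PS       (1ULL << 7)            /* PDE maps a 2 MiB page */
#define PG_ADDR_4K  0x000FFFFFFFFFF000ULL  /* entry bits 51:12 */
#define PG_ADDR_2M  0x000FFFFFFFE00000ULL  /* entry bits 51:21 */

/* Hypothetical callback standing in for an 8-byte kvm_read_guest();
 * returns 0 on success. */
typedef int (*read_gpa_fn)(void *ctx, uint64_t gpa, uint64_t *val);

/* Translate a 32-bit GVA under PAE paging. Returns 0 and sets *gpa on
 * success, -1 if an entry is not present or cannot be read. */
static int pae_walk(read_gpa_fn rd, void *ctx, uint32_t cr3,
                    uint32_t gva, uint64_t *gpa)
{
    uint64_t pdpte, pde, pte;
    uint64_t pdpt = cr3 & 0xFFFFFFE0u;     /* CR3 bits 31:5 */

    assert(rd && gpa);
    /* GVA bits 31:30 select one of the four PDPTEs. */
    if (rd(ctx, pdpt + 8 * (gva >> 30), &pdpte) || !(pdpte & PG_PRESENT))
        return -1;
    /* GVA bits 29:21 select the PDE. */
    if (rd(ctx, (pdpte & PG_ADDR_4K) + 8 * ((gva >> 21) & 0x1FF), &pde) ||
        !(pde & PG_PRESENT))
        return -1;
    if (pde & PG_PS) {                     /* 2 MiB page: stop here */
        *gpa = (pde & PG_ADDR_2M) | (gva & 0x1FFFFF);
        return 0;
    }
    /* GVA bits 20:12 select the PTE. */
    if (rd(ctx, (pde & PG_ADDR_4K) + 8 * ((gva >> 12) & 0x1FF), &pte) ||
        !(pte & PG_PRESENT))
        return -1;
    *gpa = (pte & PG_ADDR_4K) | (gva & 0xFFF);
    return 0;
}

/* Toy flat "guest memory" so the sketch can be exercised in user space. */
struct toy_mem {
    const uint8_t *buf;
    uint64_t size;
};

static int toy_read(void *ctx, uint64_t gpa, uint64_t *val)
{
    const struct toy_mem *m = ctx;

    if (gpa > m->size || m->size - gpa < sizeof(*val))
        return -1;
    memcpy(val, m->buf + gpa, sizeof(*val));
    return 0;
}
```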



[PATCH v2 0/5] Add virtio transport for AF_VSOCK

2015-12-01 Thread Stefan Hajnoczi
v2:
 * Rebased onto Linux v4.4-rc2
 * vhost: Refuse to assign reserved CIDs
 * vhost: Refuse guest CID if already in use
 * vhost: Only accept correctly addressed packets (no spoofing!)
 * vhost: Support flexible rx/tx descriptor layout
 * vhost: Add missing total_tx_buf decrement
 * virtio_transport: Fix total_tx_buf accounting
 * virtio_transport: Add virtio_transport global mutex to prevent races
 * common: Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
 * common: Fix peer_buf_alloc inheritance on child socket

This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
AF_VSOCK is designed for communication between virtual machines and
hypervisors.  It is currently only implemented for VMware's VMCI transport.

This series implements the proposed virtio-vsock device specification from
here:
http://comments.gmane.org/gmane.comp.emulators.virtio.devel/855

Most of the work was done by Asias He and Gerd Hoffmann a while back.  I have
picked up the series again.

The QEMU userspace changes are here:
https://github.com/stefanha/qemu/commits/vsock

Why virtio-vsock?
-----------------
Guest<->host communication is currently done over the virtio-serial device.
This makes it hard to port sockets API-based applications and is limited to
static ports.

virtio-vsock uses the sockets API so that applications can rely on familiar
SOCK_STREAM and SOCK_DGRAM semantics.  Applications on the host can easily
connect to guest agents because the sockets API allows multiple connections to
a listen socket (unlike virtio-serial).  This simplifies the guest<->host
communication and eliminates the need for extra processes on the host to
arbitrate virtio-serial ports.

Overview
--------
This series adds 3 pieces:

1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko

2. virtio_transport.ko - guest driver

3. drivers/vhost/vsock.ko - host driver

Howto
-----
The following kernel options are needed:
  CONFIG_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS_COMMON=y
  CONFIG_VHOST_VSOCK=m

Launch QEMU as follows:
  # qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3

Guest and host can communicate via AF_VSOCK sockets.  The host's CID (address)
is 2 and the guest is automatically assigned a CID (use VMADDR_CID_ANY (-1) to
bind to it).
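
As a rough illustration of the guest side, the sketch below binds a
SOCK_STREAM listener with VMADDR_CID_ANY using the standard struct sockaddr_vm
from <linux/vm_sockets.h>. The helper names vsock_addr() and vsock_listen()
are made up for this example, and the port number used with them is
arbitrary. It assumes a kernel with AF_VSOCK and a transport loaded; without
one, socket() or bind() is expected to fail and the helper returns -1.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/vm_sockets.h>

/* Build an AF_VSOCK address for the given CID and port. */
static struct sockaddr_vm vsock_addr(unsigned int cid, unsigned int port)
{
    struct sockaddr_vm sa;

    memset(&sa, 0, sizeof(sa));
    sa.svm_family = AF_VSOCK;
    sa.svm_cid = cid;
    sa.svm_port = port;
    return sa;
}

/* Guest side: listen on `port` on whatever CID the guest was assigned.
 * Returns the listening fd, or -1 on error. */
static int vsock_listen(unsigned int port)
{
    struct sockaddr_vm sa = vsock_addr(VMADDR_CID_ANY, port);
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A host-side client would fill the same structure with the guest's CID (3 in
the QEMU command line above) and call connect() instead of bind()/listen().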

Status
------
There are a few design changes I'd like to make to the virtio-vsock device:

1. The 3-way handshake isn't necessary over a reliable transport (virtqueue).
   Spoofing packets is also impossible so the security aspects of the 3-way
   handshake (including syn cookie) add nothing.  The next version will have a
   single operation to establish a connection.

2. Credit-based flow control doesn't work for SOCK_DGRAM since multiple clients
   can transmit to the same listen socket.  There is no way for the clients to
   coordinate buffer space with each other fairly.  The next version will drop
   credit-based flow control for SOCK_DGRAM and only rely on best-effort
   delivery.  SOCK_STREAM still has guaranteed delivery.

3. In the next version only the host will be able to establish connections
   (i.e. to connect to a guest agent).  This is for security reasons since
   there is currently no ability to provide host services only to certain
   guests.  This also matches how AF_VSOCK works on modern VMware hypervisors.
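
For readers unfamiliar with the credit-based flow control mentioned in item 2,
here is a minimal sketch of the usual bookkeeping. The field names mirror
tx_cnt, peer_fwd_cnt and peer_buf_alloc from struct virtio_transport in this
series, but struct credit_state, credit_get() and credit_put() are
hypothetical and are not taken from the patches; treat this as an assumption
about the general scheme rather than the actual helper code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the per-socket state. */
struct credit_state {
    uint32_t tx_cnt;         /* total bytes we have sent */
    uint32_t peer_fwd_cnt;   /* total bytes the peer reports consuming */
    uint32_t peer_buf_alloc; /* receive buffer size the peer advertised */
};

/* Reserve up to `wanted` bytes of the peer's free buffer space and
 * return how many bytes may actually be sent now. */
static uint32_t credit_get(struct credit_state *s, uint32_t wanted)
{
    /* Unsigned subtraction keeps working when the counters wrap. */
    uint32_t in_flight = s->tx_cnt - s->peer_fwd_cnt;
    uint32_t avail = in_flight < s->peer_buf_alloc ?
                     s->peer_buf_alloc - in_flight : 0;
    uint32_t granted = wanted < avail ? wanted : avail;

    s->tx_cnt += granted;
    return granted;
}

/* Give back credit that was reserved but not used (e.g. on error). */
static void credit_put(struct credit_state *s, uint32_t unused)
{
    s->tx_cnt -= unused;
}
```

With several SOCK_DGRAM senders targeting one receiver, each sender would
track its own view of the peer's buffer, which is why the scheme cannot share
the space fairly in that case.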

Asias He (5):
  VSOCK: Introduce vsock_find_unbound_socket and
vsock_bind_dgram_generic
  VSOCK: Introduce virtio-vsock-common.ko
  VSOCK: Introduce virtio-vsock.ko
  VSOCK: Introduce vhost-vsock.ko
  VSOCK: Add Makefile and Kconfig

 drivers/vhost/Kconfig   |4 +
 drivers/vhost/Kconfig.vsock |7 +
 drivers/vhost/Makefile  |4 +
 drivers/vhost/vsock.c   |  631 +++
 drivers/vhost/vsock.h   |4 +
 include/linux/virtio_vsock.h|  209 +
 include/net/af_vsock.h  |2 +
 include/uapi/linux/virtio_ids.h |1 +
 include/uapi/linux/virtio_vsock.h   |   89 +++
 net/vmw_vsock/Kconfig   |   18 +
 net/vmw_vsock/Makefile  |2 +
 net/vmw_vsock/af_vsock.c|   70 ++
 net/vmw_vsock/virtio_transport.c|  466 +++
 net/vmw_vsock/virtio_transport_common.c | 1272 +++
 14 files changed, 2779 insertions(+)
 create mode 100644 drivers/vhost/Kconfig.vsock
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport.c
 create mode 100644 

[PATCH v2 3/5] VSOCK: Introduce virtio-vsock.ko

2015-12-01 Thread Stefan Hajnoczi
From: Asias He 

VM sockets virtio transport implementation. This module runs in guest
kernel.

Signed-off-by: Asias He 
Signed-off-by: Stefan Hajnoczi 
---
v2:
 * Fix total_tx_buf accounting
 * Add virtio_transport global mutex to prevent races
---
 net/vmw_vsock/virtio_transport.c | 466 +++
 1 file changed, 466 insertions(+)
 create mode 100644 net/vmw_vsock/virtio_transport.c

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
new file mode 100644
index 000..df65dca
--- /dev/null
+++ b/net/vmw_vsock/virtio_transport.c
@@ -0,0 +1,466 @@
+/*
+ * virtio transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He 
+ * Stefan Hajnoczi 
+ *
+ * Some of the code is take from Gerd Hoffmann 's
+ * early virtio-vsock proof-of-concept bits.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct workqueue_struct *virtio_vsock_workqueue;
+static struct virtio_vsock *the_virtio_vsock;
+static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock);
+
+struct virtio_vsock {
+   /* Virtio device */
+   struct virtio_device *vdev;
+   /* Virtio virtqueue */
+   struct virtqueue *vqs[VSOCK_VQ_MAX];
+   /* Wait queue for send pkt */
+   wait_queue_head_t queue_wait;
+   /* Work item to send pkt */
+   struct work_struct tx_work;
+   /* Work item to recv pkt */
+   struct work_struct rx_work;
+   /* Mutex to protect send pkt*/
+   struct mutex tx_lock;
+   /* Mutex to protect recv pkt*/
+   struct mutex rx_lock;
+   /* Number of recv buffers */
+   int rx_buf_nr;
+   /* Number of max recv buffers */
+   int rx_buf_max_nr;
+   /* Used for global tx buf limitation */
+   u32 total_tx_buf;
+   /* Guest context id, just like guest ip address */
+   u32 guest_cid;
+};
+
+static struct virtio_vsock *virtio_vsock_get(void)
+{
+   return the_virtio_vsock;
+}
+
+static u32 virtio_transport_get_local_cid(void)
+{
+   struct virtio_vsock *vsock = virtio_vsock_get();
+
+   return vsock->guest_cid;
+}
+
+static int
+virtio_transport_send_pkt(struct vsock_sock *vsk,
+ struct virtio_vsock_pkt_info *info)
+{
+   u32 src_cid, src_port, dst_cid, dst_port;
+   int ret, in_sg = 0, out_sg = 0;
+   struct virtio_transport *trans;
+   struct virtio_vsock_pkt *pkt;
+   struct virtio_vsock *vsock;
+   struct scatterlist hdr, buf, *sgs[2];
+   struct virtqueue *vq;
+   u32 pkt_len = info->pkt_len;
+   DEFINE_WAIT(wait);
+
+   vsock = virtio_vsock_get();
+   if (!vsock)
+   return -ENODEV;
+
+   src_cid = virtio_transport_get_local_cid();
+   src_port = vsk->local_addr.svm_port;
+   if (!info->remote_cid) {
+   dst_cid = vsk->remote_addr.svm_cid;
+   dst_port = vsk->remote_addr.svm_port;
+   } else {
+   dst_cid = info->remote_cid;
+   dst_port = info->remote_port;
+   }
+
+   trans = vsk->trans;
+   vq = vsock->vqs[VSOCK_VQ_TX];
+
+   if (pkt_len > VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE)
+   pkt_len = VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE;
+   pkt_len = virtio_transport_get_credit(trans, pkt_len);
+   /* Do not send zero length OP_RW pkt*/
+   if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW)
+   return pkt_len;
+
+   /* Respect global tx buf limitation */
+   mutex_lock(&vsock->tx_lock);
+   while (pkt_len + vsock->total_tx_buf > VIRTIO_VSOCK_MAX_TX_BUF_SIZE) {
+   prepare_to_wait_exclusive(&vsock->queue_wait, &wait,
+ TASK_UNINTERRUPTIBLE);
+   mutex_unlock(&vsock->tx_lock);
+   schedule();
+   mutex_lock(&vsock->tx_lock);
+   finish_wait(&vsock->queue_wait, &wait);
+   }
+   vsock->total_tx_buf += pkt_len;
+   mutex_unlock(&vsock->tx_lock);
+
+   pkt = virtio_transport_alloc_pkt(vsk, info, pkt_len,
+src_cid, src_port,
+dst_cid, dst_port);
+   if (!pkt) {
+   mutex_lock(&vsock->tx_lock);
+   vsock->total_tx_buf -= pkt_len;
+   mutex_unlock(&vsock->tx_lock);
+   virtio_transport_put_credit(trans, pkt_len);
+   return -ENOMEM;
+   }
+
+   pr_debug("%s:info->pkt_len= %d\n", __func__, info->pkt_len);
+
+   /* Will be released in virtio_transport_send_pkt_work */
+   sock_hold(&trans->vsk->sk);
+   virtio_transport_inc_tx_pkt(pkt);
+
+   /* Put pkt in the virtqueue */
+   sg_init_one(&hdr, &pkt->hdr, 

[PATCH v2 4/5] VSOCK: Introduce vhost-vsock.ko

2015-12-01 Thread Stefan Hajnoczi
From: Asias He 

VM sockets vhost transport implementation. This module runs in host
kernel.

Signed-off-by: Asias He 
Signed-off-by: Stefan Hajnoczi 
---
v2:
 * Add missing total_tx_buf decrement
 * Support flexible rx/tx descriptor layout
 * Refuse to assign reserved CIDs
 * Refuse guest CID if already in use
 * Only accept correctly addressed packets
---
 drivers/vhost/vsock.c | 631 ++
 drivers/vhost/vsock.h |   4 +
 2 files changed, 635 insertions(+)
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
new file mode 100644
index 000..65b1cf8
--- /dev/null
+++ b/drivers/vhost/vsock.c
@@ -0,0 +1,631 @@
+/*
+ * vhost transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He 
+ * Stefan Hajnoczi 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include "vhost.h"
+#include "vsock.h"
+
+#define VHOST_VSOCK_DEFAULT_HOST_CID   2
+
+static int vhost_transport_socket_init(struct vsock_sock *vsk,
+  struct vsock_sock *psk);
+
+enum {
+   VHOST_VSOCK_FEATURES = VHOST_FEATURES,
+};
+
+/* Used to track all the vhost_vsock instances on the system. */
+static LIST_HEAD(vhost_vsock_list);
+static DEFINE_MUTEX(vhost_vsock_mutex);
+
+struct vhost_vsock_virtqueue {
+   struct vhost_virtqueue vq;
+};
+
+struct vhost_vsock {
+   /* Vhost device */
+   struct vhost_dev dev;
+   /* Vhost vsock virtqueue*/
+   struct vhost_vsock_virtqueue vqs[VSOCK_VQ_MAX];
+   /* Link to global vhost_vsock_list*/
+   struct list_head list;
+   /* Head for pkt from host to guest */
+   struct list_head send_pkt_list;
+   /* Work item to send pkt */
+   struct vhost_work send_pkt_work;
+   /* Wait queue for send pkt */
+   wait_queue_head_t queue_wait;
+   /* Used for global tx buf limitation */
+   u32 total_tx_buf;
+   /* Guest contex id this vhost_vsock instance handles */
+   u32 guest_cid;
+};
+
+static u32 vhost_transport_get_local_cid(void)
+{
+   u32 cid = VHOST_VSOCK_DEFAULT_HOST_CID;
+   return cid;
+}
+
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+{
+   struct vhost_vsock *vsock;
+
+   mutex_lock(&vhost_vsock_mutex);
+   list_for_each_entry(vsock, &vhost_vsock_list, list) {
+   if (vsock->guest_cid == guest_cid) {
+   mutex_unlock(&vhost_vsock_mutex);
+   return vsock;
+   }
+   }
+   mutex_unlock(&vhost_vsock_mutex);
+
+   return NULL;
+}
+
+static void
+vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
+   struct vhost_virtqueue *vq)
+{
+   bool added = false;
+
+   mutex_lock(&vq->mutex);
+   vhost_disable_notify(&vsock->dev, vq);
+   for (;;) {
+   struct virtio_vsock_pkt *pkt;
+   struct iov_iter iov_iter;
+   unsigned out, in;
+   struct sock *sk;
+   size_t nbytes;
+   size_t len;
+   int head;
+
+   if (list_empty(&vsock->send_pkt_list)) {
+   vhost_enable_notify(&vsock->dev, vq);
+   break;
+   }
+
+   head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+&out, &in, NULL, NULL);
+   pr_debug("%s: head = %d\n", __func__, head);
+   if (head < 0)
+   break;
+
+   if (head == vq->num) {
+   if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
+   vhost_disable_notify(&vsock->dev, vq);
+   continue;
+   }
+   break;
+   }
+
+   pkt = list_first_entry(&vsock->send_pkt_list,
+  struct virtio_vsock_pkt, list);
+   list_del_init(&pkt->list);
+
+   if (out) {
+   virtio_transport_free_pkt(pkt);
+   vq_err(vq, "Expected 0 output buffers, got %u\n", out);
+   break;
+   }
+
+   len = iov_length(&vq->iov[out], in);
+   iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
+
+   nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
+   if (nbytes != sizeof(pkt->hdr)) {
+   virtio_transport_free_pkt(pkt);
+   vq_err(vq, "Faulted on copying pkt hdr\n");
+   break;
+   }
+
+   nbytes = copy_to_iter(pkt->buf, pkt->len, &iov_iter);
+   if (nbytes != pkt->len) {
+   virtio_transport_free_pkt(pkt);
+

[PATCH v2 2/5] VSOCK: Introduce virtio-vsock-common.ko

2015-12-01 Thread Stefan Hajnoczi
From: Asias He 

This module contains the common code and header files for the following
virtio-vsock and virtio-vhost kernel modules.

Signed-off-by: Asias He 
Signed-off-by: Stefan Hajnoczi 
---
v2:
 * Fix peer_buf_alloc inheritance on child socket
 * Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
---
 include/linux/virtio_vsock.h|  209 +
 include/uapi/linux/virtio_ids.h |1 +
 include/uapi/linux/virtio_vsock.h   |   89 +++
 net/vmw_vsock/virtio_transport_common.c | 1272 +++
 4 files changed, 1571 insertions(+)
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport_common.c

diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
new file mode 100644
index 000..a5f3ecc
--- /dev/null
+++ b/include/linux/virtio_vsock.h
@@ -0,0 +1,209 @@
+/*
+ * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
+ * anyone can use the definitions to implement compatible drivers/servers:
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Copyright (C) Red Hat, Inc., 2013-2015
+ * Copyright (C) Asias He , 2013
+ * Copyright (C) Stefan Hajnoczi , 2015
+ */
+
+#ifndef _LINUX_VIRTIO_VSOCK_H
+#define _LINUX_VIRTIO_VSOCK_H
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE  128
+#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE   (1024 * 4)
+#define VIRTIO_VSOCK_MAX_BUF_SIZE  0xUL
+#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE  (1024 * 64)
+#define VIRTIO_VSOCK_MAX_TX_BUF_SIZE   (1024 * 1024 * 16)
+#define VIRTIO_VSOCK_MAX_DGRAM_SIZE(1024 * 64)
+
+struct vsock_transport_recv_notify_data;
+struct vsock_transport_send_notify_data;
+struct sockaddr_vm;
+struct vsock_sock;
+
+enum {
+   VSOCK_VQ_CTRL   = 0,
+   VSOCK_VQ_RX = 1, /* for host to guest data */
+   VSOCK_VQ_TX = 2, /* for guest to host data */
+   VSOCK_VQ_MAX= 3,
+};
+
+/* virtio transport socket state */
+struct virtio_transport {
+   struct virtio_transport_pkt_ops *ops;
+   struct vsock_sock *vsk;
+
+   u32 buf_size;
+   u32 buf_size_min;
+   u32 buf_size_max;
+
+   struct mutex tx_lock;
+   struct mutex rx_lock;
+
+   struct list_head rx_queue;
+   u32 rx_bytes;
+
+   /* Protected by trans->tx_lock */
+   u32 tx_cnt;
+   u32 buf_alloc;
+   u32 peer_fwd_cnt;
+   u32 peer_buf_alloc;
+   /* Protected by trans->rx_lock */
+   u32 fwd_cnt;
+
+   /* Protected by sk_lock */
+   u16 dgram_id;
+   struct list_head incomplete_dgrams; /* dgram fragments */
+};
+
+struct virtio_vsock_pkt {
+   struct virtio_vsock_hdr hdr;
+   struct virtio_transport *trans;
+   struct work_struct work;
+   struct list_head list;
+   void *buf;
+   u32 len;
+   u32 off;
+};
+
+struct virtio_vsock_pkt_info {
+   u32 remote_cid, remote_port;
+   struct msghdr *msg;
+   u32 pkt_len;
+   u16 type;
+   u16 op;

[PATCH v2 5/5] VSOCK: Add Makefile and Kconfig

2015-12-01 Thread Stefan Hajnoczi
From: Asias He 

Enable virtio-vsock and vhost-vsock.

Signed-off-by: Asias He 
Signed-off-by: Stefan Hajnoczi 
---
 drivers/vhost/Kconfig   |  4 
 drivers/vhost/Kconfig.vsock |  7 +++
 drivers/vhost/Makefile  |  4 
 net/vmw_vsock/Kconfig   | 18 ++
 net/vmw_vsock/Makefile  |  2 ++
 5 files changed, 35 insertions(+)
 create mode 100644 drivers/vhost/Kconfig.vsock

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 533eaf0..81449bf 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -47,3 +47,7 @@ config VHOST_CROSS_ENDIAN_LEGACY
  adds some overhead, it is disabled by default.
 
  If unsure, say "N".
+
+if STAGING
+source "drivers/vhost/Kconfig.vsock"
+endif
diff --git a/drivers/vhost/Kconfig.vsock b/drivers/vhost/Kconfig.vsock
new file mode 100644
index 000..3491865
--- /dev/null
+++ b/drivers/vhost/Kconfig.vsock
@@ -0,0 +1,7 @@
+config VHOST_VSOCK
+   tristate "vhost virtio-vsock driver"
+   depends on VSOCKETS && EVENTFD
+   select VIRTIO_VSOCKETS_COMMON
+   default n
+   ---help---
+   Say M here to enable the vhost-vsock for virtio-vsock guests
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index e0441c3..6b012b9 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -4,5 +4,9 @@ vhost_net-y := net.o
 obj-$(CONFIG_VHOST_SCSI) += vhost_scsi.o
 vhost_scsi-y := scsi.o
 
+obj-$(CONFIG_VHOST_VSOCK) += vhost_vsock.o
+vhost_vsock-y := vsock.o
+
 obj-$(CONFIG_VHOST_RING) += vringh.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 14810ab..74e0bc8 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -26,3 +26,21 @@ config VMWARE_VMCI_VSOCKETS
 
  To compile this driver as a module, choose M here: the module
  will be called vmw_vsock_vmci_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS
+   tristate "virtio transport for Virtual Sockets"
+   depends on VSOCKETS && VIRTIO
+   select VIRTIO_VSOCKETS_COMMON
+   help
+ This module implements a virtio transport for Virtual Sockets.
+
+ Enable this transport if your Virtual Machine runs on Qemu/KVM.
+
+ To compile this driver as a module, choose M here: the module
+ will be called virtio_vsock_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS_COMMON
+   tristate
+   ---help---
+ This option is selected by any driver which needs to access
+ the virtio_vsock.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 2ce52d7..cf4c294 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -1,5 +1,7 @@
 obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS) += virtio_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += virtio_transport_common.o
 
 vsock-y += af_vsock.o vsock_addr.o
 
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v9 3/5] nvdimm acpi: build ACPI NFIT table

2015-12-01 Thread Xiao Guangrong
NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)

Currently, we only support PMEM mode. Each device has 3 structures:
- SPA structure, which defines the PMEM region info

- MEM DEV structure, which has the @handle used to associate it with the
  specified ACPI NVDIMM device that we will introduce in a later patch.
  We can also safely ignore the memory device's interleave, since the
  real nvdimm hardware access is hidden behind the host

- DCR structure, which defines the vendor ID used to associate it with the
  specified vendor nvdimm driver. Since we only implement PMEM mode this
  time, Command window and Data window are not needed

The NVDIMM functionality is controlled by the 'nvdimm' parameter, which
is introduced for the machine. Here is an example that enables it:
-machine pc,nvdimm -m 8G,maxmem=100G,slots=100  -object \
memory-backend-file,id=mem1,share,mem-path=/tmp/nvdimm1,size=10G -device \
nvdimm,memdev=mem1,id=nv1

It is disabled by default

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Xiao Guangrong 
---
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 hw/acpi/Makefile.objs  |   1 +
 hw/acpi/nvdimm.c   | 382 +
 hw/i386/acpi-build.c   |  12 ++
 hw/i386/pc.c   |  19 ++
 include/hw/i386/pc.h   |   2 +
 include/hw/mem/nvdimm.h|   3 +
 qemu-options.hx|   5 +-
 9 files changed, 425 insertions(+), 1 deletion(-)
 create mode 100644 hw/acpi/nvdimm.c

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 4c79d3b..53fb517 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -47,6 +47,7 @@ CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
 CONFIG_NVDIMM=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index e42d2fc..766c27c 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -47,6 +47,7 @@ CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
 CONFIG_NVDIMM=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 7d3230c..095597f 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -2,6 +2,7 @@ common-obj-$(CONFIG_ACPI_X86) += core.o piix4.o pcihp.o
 common-obj-$(CONFIG_ACPI_X86_ICH) += ich9.o tco.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
 common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
+common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI) += acpi_interface.o
 common-obj-$(CONFIG_ACPI) += bios-linker-loader.o
 common-obj-$(CONFIG_ACPI) += aml-build.o
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
new file mode 100644
index 000..98c004d
--- /dev/null
+++ b/hw/acpi/nvdimm.c
@@ -0,0 +1,382 @@
+/*
+ * NVDIMM ACPI Implementation
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong 
+ *
+ * NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)
+ * and the DSM specification can be found at:
+ *   http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/mem/nvdimm.h"
+
+static int nvdimm_plugged_device_list(Object *obj, void *opaque)
+{
+GSList **list = opaque;
+
+if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
+DeviceState *dev = DEVICE(obj);
+
+if (dev->realized) { /* only realized NVDIMMs matter */
+*list = g_slist_append(*list, DEVICE(obj));
+}
+}
+
+object_child_foreach(obj, nvdimm_plugged_device_list, opaque);
+return 0;
+}
+
+/*
+ * inquire plugged NVDIMM devices and link them into the list which is
+ * returned to the caller.
+ *
+ * Note: it is the caller's responsibility to free the list to avoid
+ * memory leak.
+ */
+static GSList *nvdimm_get_plugged_device_list(void)
+{
+GSList *list = NULL;
+
+object_child_foreach(qdev_get_machine(), 

[PATCH v9 5/5] nvdimm: add maintain info

2015-12-01 Thread Xiao Guangrong
Add NVDIMM maintainer

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Xiao Guangrong 
---
 MAINTAINERS | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index bb1f3e4..7e82340 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -940,6 +940,13 @@ M: Jiri Pirko 
 S: Maintained
 F: hw/net/rocker/
 
+NVDIMM
+M: Xiao Guangrong 
+S: Maintained
+F: hw/acpi/nvdimm.c
+F: hw/mem/nvdimm.c
+F: include/hw/mem/nvdimm.h
+
 Subsystems
 --
 Audio
-- 
1.8.3.1



[PATCH v2 1/5] VSOCK: Introduce vsock_find_unbound_socket and vsock_bind_dgram_generic

2015-12-01 Thread Stefan Hajnoczi
From: Asias He 

Signed-off-by: Asias He 
Signed-off-by: Stefan Hajnoczi 
---
 include/net/af_vsock.h   |  2 ++
 net/vmw_vsock/af_vsock.c | 70 
 2 files changed, 72 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index e9eb2d6..a0c8fa2 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -175,8 +175,10 @@ void vsock_insert_connected(struct vsock_sock *vsk);
 void vsock_remove_bound(struct vsock_sock *vsk);
 void vsock_remove_connected(struct vsock_sock *vsk);
 struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
+struct sock *vsock_find_unbound_socket(struct sockaddr_vm *addr);
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 struct sockaddr_vm *dst);
 void vsock_for_each_connected_socket(void (*fn)(struct sock *sk));
+int vsock_bind_dgram_generic(struct vsock_sock *vsk, struct sockaddr_vm *addr);
 
 #endif /* __AF_VSOCK_H__ */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 7fd1220..77247a2 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -223,6 +223,17 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr)
return NULL;
 }
 
+static struct sock *__vsock_find_unbound_socket(struct sockaddr_vm *addr)
+{
+   struct vsock_sock *vsk;
+
+   list_for_each_entry(vsk, vsock_unbound_sockets, bound_table)
+   if (addr->svm_port == vsk->local_addr.svm_port)
+   return sk_vsock(vsk);
+
+   return NULL;
+}
+
 static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,
  struct sockaddr_vm *dst)
 {
@@ -298,6 +309,21 @@ struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr)
 }
 EXPORT_SYMBOL_GPL(vsock_find_bound_socket);
 
+struct sock *vsock_find_unbound_socket(struct sockaddr_vm *addr)
+{
+   struct sock *sk;
+
+   spin_lock_bh(&vsock_table_lock);
+   sk = __vsock_find_unbound_socket(addr);
+   if (sk)
+   sock_hold(sk);
+
+   spin_unlock_bh(&vsock_table_lock);
+
+   return sk;
+}
+EXPORT_SYMBOL_GPL(vsock_find_unbound_socket);
+
 struct sock *vsock_find_connected_socket(struct sockaddr_vm *src,
 struct sockaddr_vm *dst)
 {
@@ -532,6 +558,50 @@ static int __vsock_bind_stream(struct vsock_sock *vsk,
return 0;
 }
 
+int vsock_bind_dgram_generic(struct vsock_sock *vsk, struct sockaddr_vm *addr)
+{
+   static u32 port = LAST_RESERVED_PORT + 1;
+   struct sockaddr_vm new_addr;
+
+   vsock_addr_init(&new_addr, addr->svm_cid, addr->svm_port);
+
+   if (addr->svm_port == VMADDR_PORT_ANY) {
+   bool found = false;
+   unsigned int i;
+
+   for (i = 0; i < MAX_PORT_RETRIES; i++) {
+   if (port <= LAST_RESERVED_PORT)
+   port = LAST_RESERVED_PORT + 1;
+
+   new_addr.svm_port = port++;
+
+   if (!__vsock_find_unbound_socket(&new_addr)) {
+   found = true;
+   break;
+   }
+   }
+
+   if (!found)
+   return -EADDRNOTAVAIL;
+   } else {
+   /* If port is in reserved range, ensure caller
+* has necessary privileges.
+*/
+   if (addr->svm_port <= LAST_RESERVED_PORT &&
+   !capable(CAP_NET_BIND_SERVICE)) {
+   return -EACCES;
+   }
+
+   if (__vsock_find_unbound_socket(&new_addr))
+   return -EADDRINUSE;
+   }
+
+   vsock_addr_init(&vsk->local_addr, new_addr.svm_cid, new_addr.svm_port);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_bind_dgram_generic);
+
 static int __vsock_bind_dgram(struct vsock_sock *vsk,
  struct sockaddr_vm *addr)
 {
-- 
2.5.0
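
The port selection in vsock_bind_dgram_generic() above can be exercised outside the kernel with a small user-space model. This is only a sketch under stated assumptions: every sketch_* name is hypothetical, the constants are illustrative stand-ins for VMADDR_PORT_ANY, LAST_RESERVED_PORT and MAX_PORT_RETRIES, the list of unbound sockets is replaced by a plain array of used ports, and the capable(CAP_NET_BIND_SERVICE) check is reduced to a boolean.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel constants (assumed values). */
#define SKETCH_PORT_ANY      UINT32_MAX
#define SKETCH_LAST_RESERVED 1023u
#define SKETCH_MAX_RETRIES   24u

enum sketch_bind_status {
    SKETCH_BIND_OK = 0,
    SKETCH_BIND_IN_USE,   /* models -EADDRINUSE */
    SKETCH_BIND_NO_PORT,  /* models -EADDRNOTAVAIL */
    SKETCH_BIND_NO_PERM,  /* models -EACCES */
};

struct sketch_port_table {
    const uint32_t *used; /* ports already taken */
    size_t n_used;
    uint32_t next;        /* rolling cursor, like the static `port` in the patch */
};

static bool sketch_port_in_use(const struct sketch_port_table *t, uint32_t port)
{
    for (size_t i = 0; i < t->n_used; i++) {
        if (t->used[i] == port)
            return true;
    }
    return false;
}

static enum sketch_bind_status
sketch_bind_dgram(struct sketch_port_table *t, uint32_t requested,
                  bool privileged, uint32_t *out)
{
    if (requested == SKETCH_PORT_ANY) {
        /* Try a bounded number of candidates above the reserved range. */
        for (unsigned int i = 0; i < SKETCH_MAX_RETRIES; i++) {
            if (t->next <= SKETCH_LAST_RESERVED)
                t->next = SKETCH_LAST_RESERVED + 1;

            uint32_t candidate = t->next++;

            if (!sketch_port_in_use(t, candidate)) {
                *out = candidate;
                return SKETCH_BIND_OK;
            }
        }
        return SKETCH_BIND_NO_PORT;
    }

    /* Explicit port: reserved ports need privilege, and the port must be free. */
    if (requested <= SKETCH_LAST_RESERVED && !privileged)
        return SKETCH_BIND_NO_PERM;
    if (sketch_port_in_use(t, requested))
        return SKETCH_BIND_IN_USE;

    *out = requested;
    return SKETCH_BIND_OK;
}
```

The model keeps the two branches of the patch separate: automatic selection never hands out a reserved port, while an explicit request is only checked for privilege and collision.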



[PATCH v9 1/5] nvdimm: implement NVDIMM device abstract

2015-12-01 Thread Xiao Guangrong
Introduce "nvdimm" device which is based on pc-dimm device type

Currently, nothing is specific to nvdimm except that hotplug is disabled

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Xiao Guangrong 
---
 default-configs/i386-softmmu.mak   |  1 +
 default-configs/x86_64-softmmu.mak |  1 +
 hw/acpi/memory_hotplug.c   |  5 +
 hw/mem/Makefile.objs   |  1 +
 hw/mem/nvdimm.c| 46 ++
 include/hw/mem/nvdimm.h| 29 
 6 files changed, 83 insertions(+)
 create mode 100644 hw/mem/nvdimm.c
 create mode 100644 include/hw/mem/nvdimm.h

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 43c96d1..4c79d3b 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -46,6 +46,7 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index dfb8095..e42d2fc 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -46,6 +46,7 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index e4b9a01..298e868 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -231,6 +231,11 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
  DeviceState *dev, Error **errp)
 {
 MemStatus *mdev;
+DeviceClass *dc = DEVICE_GET_CLASS(dev);
+
+if (!dc->hotpluggable) {
+return;
+}
 
 mdev = acpi_memory_slot_status(mem_st, dev, errp);
 if (!mdev) {
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index b000fb4..f12f8b9 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1 +1,2 @@
 common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
+common-obj-$(CONFIG_NVDIMM) += nvdimm.o
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
new file mode 100644
index 000..4fd397f
--- /dev/null
+++ b/hw/mem/nvdimm.c
@@ -0,0 +1,46 @@
+/*
+ * Non-Volatile Dual In-line Memory Module Virtualization Implementation
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong 
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "hw/mem/nvdimm.h"
+
+static void nvdimm_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+/* nvdimm hotplug has not been supported yet. */
+dc->hotpluggable = false;
+}
+
+static TypeInfo nvdimm_info = {
+.name  = TYPE_NVDIMM,
+.parent= TYPE_PC_DIMM,
+.class_init= nvdimm_class_init,
+};
+
+static void nvdimm_register_types(void)
+{
+type_register_static(&nvdimm_info);
+}
+
+type_init(nvdimm_register_types)
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
new file mode 100644
index 000..dbfa8d6
--- /dev/null
+++ b/include/hw/mem/nvdimm.h
@@ -0,0 +1,29 @@
+/*
+ * Non-Volatile Dual In-line Memory Module Virtualization Implementation
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong 
+ *
+ * NVDIMM specifications and some documents can be found at:
+ * NVDIMM ACPI device and NFIT are introduced in ACPI 6:
+ *  http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
+ * NVDIMM Namespace specification:
+ *  http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
+ * DSM Interface Example:
+ *  http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+ * Driver Writer's Guide:
+ *  http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_NVDIMM_H
+#define QEMU_NVDIMM_H
+
+#include "hw/mem/pc-dimm.h"
+
+#define TYPE_NVDIMM  "nvdimm"
+#endif
-- 
1.8.3.1


[PATCH v9 0/5] implement vNVDIMM

2015-12-01 Thread Xiao Guangrong
This patchset can be found at:
  https://github.com/xiaogr/qemu.git nvdimm-v9

It is based on pci branch on Michael's tree and the top commit is:
commit 0c73277af7 (vhost-user-test: fix crash with glib < 2.36).

Changelog in v9:
- the changes address Michael's comments:
  1) move the control parameter to -machine; it is off by default and
     can be enabled with, for example, -machine pc,nvdimm
  2) introduce a macro to define "NCAL"
  3) abstract the function, nvdimm_build_device_dsm(), to clean up the
 code
  4) adjust the code style of dsm method
  5) add spec reference in the code comment

other:
  pick up Stefan's Reviewed-by
  
Changelog in v8:
We split the long patch series into smaller parts. As you can see, this
is the first part, which enables NVDIMM without label data support.

The command line has changed because some simplifying patches have not
been included in this series. You should specify the file size exactly,
using parameters as follows:
   memory-backend-file,id=mem1,share,mem-path=/tmp/nvdimm1,size=10G \
   -device nvdimm,memdev=mem1,id=nv1

Changelog in v7:
- changes from Vladimir Sementsov-Ogievskiy's comments:
  1) let gethugepagesize() report the failure if fstat fails instead of
     returning the normal page size
  2) rename  open_file_path to open_ram_file_path
  3) better log the error message by using error_setg_errno
  4) update commit in the commit log to explain hugepage detection on
 Windows

- changes from Eduardo Habkost's comments:
  1) use 'Error**' to collect error message for qemu_file_get_page_size()
  2) move gethugepagesize() replacement to the same patch to make it
 better for review
  3) introduce qemu_get_file_size to unify the code with raw_getlength()

- changes from Stefan's comments:
  1) check the memory region is large enough to contain DSM output
 buffer

- changes from Eric Blake's comments:
  1) update the shell command in the commit log to generate the patch
 which drops 'pc-dimm' prefix
  
- others:
  pick up Reviewed-by from Stefan, Vladimir Sementsov-Ogievskiy, and
  Eric Blake.

Changelog in v6:
- changes from Stefan's comments:
  1) fix the struct naming style to use CamelCase
  2) fix offset + length overflow when reading/writing label data
  3) compile hw/acpi/nvdimm.c per target so that TARGET_PAGE_SIZE can
     be used to replace getpagesize()

Changelog in v5:
- changes from Michael's comments:
  1) prefix nvdimm_ to everything in NVDIMM source files
  2) make parsing _DSM Arg3 more clear
  3) comment style fix
  5) drop single used definition
  6) fix loss of the dirty dsm buffer caused by memory writes on the host
  7) check that the dsm buffer is big enough to contain the input data
  8) use build_append_int_noprefix to store single value to GArray

- changes from Michael's and Igor's comments:
  1) introduce 'nvdimm-support' parameter to control nvdimm
 enablement and it is disabled for 2.4 and its earlier versions
 to make live migration compatible
  2) only reserve 1 RAM page and 4 bytes IO Port for NVDIMM ACPI
 virtualization

- changes from Stefan's comments:
  1) do endian adjustment for the buffer length

- changes from Bharata B Rao's comments:
  1) fix compile on ppc

- others:
  1) the buffer length is obtained directly from the IO read rather than
     from dsm memory
  2) fix loss of dirty label data caused by memory writes on the host

Changelog in v4:
- changes from Michael's comments:
  1) show the message, "Memory is not allocated from HugeTlbfs", if file
 based memory is not allocated from hugetlbfs.
  2) introduce function, acpi_get_nvdimm_state(), to get NVDIMMState
 from Machine.
  3) statically define UUID and make its operation more clear
  4) use GArray to build device structures to avoid potential buffer
 overflow
  4) improve comments in the code
  5) improve code style

- changes from Igor's comments:
  1) add NVDIMM ACPI spec document
  2) use serialized method to avoid Mutex
  3) move NVDIMM ACPI's code to hw/acpi/nvdimm.c
  4) introduce a common ASL method used by _DSM for all devices to reduce
 ACPI size
  5) handle UUID in ACPI AML code. BTW, I'd keep handling the revision in
     QEMU, as that makes it easier to upgrade QEMU to support Rev2 in the
     future

- changes from Stefan's comments:
  1) copy input data from DSM memory to local buffer to avoid potential
 issues as DSM memory is visible to guest. Output data is handled
 in a similar way

- changes from Dan's comments:
  1) drop static namespace as Linux has already supported label-less
 nvdimm devices

- changes from Vladimir's comments:
  1) print a better message, "failed to get file size for %s, can't create
     backend on it", if any file operation fails to obtain the file size

- others:
  create a git repo on github.com for better review/test

Also, thanks to Eric Blake for the review on the QAPI side.

Thank you all for reviewing this patchset.

Changelog in v3:
There is huge change in this version, thank Igor, 

[PATCH v9 2/5] acpi: support specified oem table id for build_header

2015-12-01 Thread Xiao Guangrong
Let build_header() support specified OEM table id so that we can build
multiple SSDT later

If the oem table id is not specified (aka, NULL), we use the default id
instead as the previous behavior
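
To make the two cases concrete, here is a minimal stand-alone sketch of how the 8-byte OEM table id field ends up being filled. It is not the QEMU code itself: sketch_fill_oem_table_id and SKETCH_APPNAME4 are hypothetical names, and the value "BXPC" is an assumed stand-in for ACPI_BUILD_APPNAME4.

```c
#include <stdint.h>
#include <string.h>

/* Assumed stand-in for ACPI_BUILD_APPNAME4; the real value lives in QEMU. */
#define SKETCH_APPNAME4 "BXPC"

/*
 * Fill the 8-byte OEM table id field of an ACPI table header.
 * sig must point to a 4-character table signature.
 */
static void sketch_fill_oem_table_id(uint8_t out[8], const char *sig,
                                     const char *oem_table_id)
{
    if (oem_table_id) {
        /*
         * strncpy zero-pads ids shorter than 8 bytes and writes no
         * terminator for ids of 8 bytes or more, which suits a
         * fixed-width field.
         */
        strncpy((char *)out, oem_table_id, 8);
    } else {
        /* Default id: 4-byte application name followed by the signature. */
        memcpy(out, SKETCH_APPNAME4, 4);
        memcpy(out + 4, sig, 4);
    }
}
```

With a NULL id the result is the old default (application name plus signature); with an explicit id such as "NVDIMM" the field holds that string padded with zero bytes.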

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Xiao Guangrong 
---
 hw/acpi/aml-build.c | 15 +++
 hw/arm/virt-acpi-build.c| 13 +++--
 hw/i386/acpi-build.c| 20 ++--
 include/hw/acpi/aml-build.h |  3 ++-
 4 files changed, 30 insertions(+), 21 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index a00a0ab..92873bb 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1137,14 +1137,21 @@ Aml *aml_unicode(const char *str)
 
 void
 build_header(GArray *linker, GArray *table_data,
- AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
+ AcpiTableHeader *h, const char *sig, int len, uint8_t rev,
+ const char *oem_table_id)
 {
 memcpy(&h->signature, sig, 4);
 h->length = cpu_to_le32(len);
 h->revision = rev;
 memcpy(h->oem_id, ACPI_BUILD_APPNAME6, 6);
-memcpy(h->oem_table_id, ACPI_BUILD_APPNAME4, 4);
-memcpy(h->oem_table_id + 4, sig, 4);
+
+if (oem_table_id) {
+strncpy((char *)h->oem_table_id, oem_table_id, sizeof(h->oem_table_id));
+} else {
+memcpy(h->oem_table_id, ACPI_BUILD_APPNAME4, 4);
+memcpy(h->oem_table_id + 4, sig, 4);
+}
+
 h->oem_revision = cpu_to_le32(1);
 memcpy(h->asl_compiler_id, ACPI_BUILD_APPNAME4, 4);
 h->asl_compiler_revision = cpu_to_le32(1);
@@ -1211,5 +1218,5 @@ build_rsdt(GArray *table_data, GArray *linker, GArray *table_offsets)
sizeof(uint32_t));
 }
 build_header(linker, table_data,
- (void *)rsdt, "RSDT", rsdt_len, 1);
+ (void *)rsdt, "RSDT", rsdt_len, 1, NULL);
 }
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 3c2c5d6..da17779 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -381,7 +381,8 @@ build_spcr(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info)
 spcr->pci_device_id = 0xffff;  /* PCI Device ID: not a PCI device */
 spcr->pci_vendor_id = 0xffff;  /* PCI Vendor ID: not a PCI device */
 
-build_header(linker, table_data, (void *)spcr, "SPCR", sizeof(*spcr), 2);
+build_header(linker, table_data, (void *)spcr, "SPCR", sizeof(*spcr), 2,
+ NULL);
 }
 
 static void
@@ -400,7 +401,7 @@ build_mcfg(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info)
 mcfg->allocation[0].end_bus_number = (memmap[VIRT_PCIE_ECAM].size
   / PCIE_MMCFG_SIZE_MIN) - 1;
 
-build_header(linker, table_data, (void *)mcfg, "MCFG", len, 1);
+build_header(linker, table_data, (void *)mcfg, "MCFG", len, 1, NULL);
 }
 
 /* GTDT */
@@ -426,7 +427,7 @@ build_gtdt(GArray *table_data, GArray *linker)
 
 build_header(linker, table_data,
  (void *)(table_data->data + gtdt_start), "GTDT",
- table_data->len - gtdt_start, 2);
+ table_data->len - gtdt_start, 2, NULL);
 }
 
 /* MADT */
@@ -488,7 +489,7 @@ build_madt(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info,
 
 build_header(linker, table_data,
  (void *)(table_data->data + madt_start), "APIC",
- table_data->len - madt_start, 3);
+ table_data->len - madt_start, 3, NULL);
 }
 
 /* FADT */
@@ -513,7 +514,7 @@ build_fadt(GArray *table_data, GArray *linker, unsigned dsdt)
sizeof fadt->dsdt);
 
 build_header(linker, table_data,
- (void *)fadt, "FACP", sizeof(*fadt), 5);
+ (void *)fadt, "FACP", sizeof(*fadt), 5, NULL);
 }
 
 /* DSDT */
@@ -546,7 +547,7 @@ build_dsdt(GArray *table_data, GArray *linker, VirtGuestInfo *guest_info)
 g_array_append_vals(table_data, dsdt->buf->data, dsdt->buf->len);
 build_header(linker, table_data,
 (void *)(table_data->data + table_data->len - dsdt->buf->len),
-"DSDT", dsdt->buf->len, 2);
+"DSDT", dsdt->buf->len, 2, NULL);
 free_aml_allocator();
 }
 
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 95e0c65..215b58c 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -361,7 +361,7 @@ build_fadt(GArray *table_data, GArray *linker, AcpiPmInfo *pm,
 fadt_setup(fadt, pm);
 
 build_header(linker, table_data,
- (void *)fadt, "FACP", sizeof(*fadt), 1);
+ (void *)fadt, "FACP", sizeof(*fadt), 1, NULL);
 }
 
 static void
@@ -431,7 +431,7 @@ build_madt(GArray *table_data, GArray *linker, AcpiCpuInfo *cpu,
 
 build_header(linker, table_data,
  (void *)(table_data->data + madt_start), "APIC",
- table_data->len - madt_start, 1);
+ table_data->len - 

[PATCH v9 4/5] nvdimm acpi: build ACPI nvdimm devices

2015-12-01 Thread Xiao Guangrong
NVDIMM devices are defined in ACPI 6.0: 9.20 NVDIMM Devices

There is a root device under \_SB, and the specified NVDIMM devices are
under the root device. Each NVDIMM device has an _ADR which returns its
handle, used to associate it with the MEMDEV structure in NFIT

Currently, we do not support any function on _DSM, which means that
NVDIMM label data is not supported yet
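
As a rough illustration of what the generated AML amounts to, the following plain C sketch models two pieces of the patch: the per-slot device name built from the "NV%02X" format, and the status byte the common _DSM method returns. The sketch_* names are hypothetical, and masking the slot to 8 bits is an assumption added here so the name always stays four characters long.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Model of the common _DSM method: function 0 asks which functions are
 * supported (none, so a 0 byte is returned); every other function gets
 * a 1 byte, meaning "not supported".
 */
static uint8_t sketch_dsm_status(uint32_t function)
{
    return function == 0 ? 0 : 1;
}

/*
 * Build the ACPI device name for a slot, following the "NV%02X" format
 * used by the patch. The mask is an assumption of this sketch.
 */
static void sketch_nvdimm_name(char out[5], unsigned int slot)
{
    snprintf(out, 5, "NV%02X", slot & 0xFFu);
}
```

For example, slot 10 maps to the device name NV0A, and any _DSM function other than 0 reports "not supported".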

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Xiao Guangrong 
---
 hw/acpi/nvdimm.c | 106 +++
 1 file changed, 106 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 98c004d..d2fad01 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -367,6 +367,111 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
 g_array_free(structures, true);
 }
 
+#define NVDIMM_COMMON_DSM  "NCAL"
+
+static void nvdimm_build_common_dsm(Aml *dev)
+{
+Aml *method, *ifctx, *function;
+uint8_t byte_list[1];
+
+method = aml_method(NVDIMM_COMMON_DSM, 4);
+function = aml_arg(2);
+
+/*
+ * function 0 is called to inquire what functions are supported by
+ * OSPM
+ */
+ifctx = aml_if(aml_equal(function, aml_int(0)));
+byte_list[0] = 0 /* No function Supported */;
+aml_append(ifctx, aml_return(aml_buffer(1, byte_list)));
+aml_append(method, ifctx);
+
+/* No function is supported yet. */
+byte_list[0] = 1 /* Not Supported */;
+aml_append(method, aml_return(aml_buffer(1, byte_list)));
+
+aml_append(dev, method);
+}
+
+static void nvdimm_build_device_dsm(Aml *dev)
+{
+Aml *method;
+
+method = aml_method("_DSM", 4);
+aml_append(method, aml_return(aml_call4(NVDIMM_COMMON_DSM, aml_arg(0),
+  aml_arg(1), aml_arg(2), aml_arg(3))));
+aml_append(dev, method);
+}
+
+static void nvdimm_build_nvdimm_devices(GSList *device_list, Aml *root_dev)
+{
+for (; device_list; device_list = device_list->next) {
+DeviceState *dev = device_list->data;
+int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
+   NULL);
+uint32_t handle = nvdimm_slot_to_handle(slot);
+Aml *nvdimm_dev;
+
+nvdimm_dev = aml_device("NV%02X", slot);
+
+/*
+ * ACPI 6.0: 9.20 NVDIMM Devices:
+ *
+ * _ADR object that is used to supply OSPM with unique address
+ * of the NVDIMM device. This is done by returning the NFIT Device
+ * handle that is used to identify the associated entries in ACPI
+ * table NFIT or _FIT.
+ */
+aml_append(nvdimm_dev, aml_name_decl("_ADR", aml_int(handle)));
+
+nvdimm_build_device_dsm(nvdimm_dev);
+aml_append(root_dev, nvdimm_dev);
+}
+}
+
+static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
+  GArray *table_data, GArray *linker)
+{
+Aml *ssdt, *sb_scope, *dev;
+
+acpi_add_table(table_offsets, table_data);
+
+ssdt = init_aml_allocator();
+acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
+
+sb_scope = aml_scope("\\_SB");
+
+dev = aml_device("NVDR");
+
+/*
+ * ACPI 6.0: 9.20 NVDIMM Devices:
+ *
+ * The ACPI Name Space device uses _HID of ACPI0012 to identify the root
+ * NVDIMM interface device. Platform firmware is required to contain one
+ * such device in _SB scope if NVDIMMs support is exposed by platform to
+ * OSPM.
+ * For each NVDIMM present or intended to be supported by platform,
+ * platform firmware also exposes an ACPI Namespace Device under the
+ * root device.
+ */
+aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
+
+nvdimm_build_common_dsm(dev);
+nvdimm_build_device_dsm(dev);
+
+nvdimm_build_nvdimm_devices(device_list, dev);
+
+aml_append(sb_scope, dev);
+
+aml_append(ssdt, sb_scope);
+/* copy AML table into ACPI tables blob and patch header there */
+g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
+build_header(linker, table_data,
+(void *)(table_data->data + table_data->len - ssdt->buf->len),
+"SSDT", ssdt->buf->len, 1, "NVDIMM");
+free_aml_allocator();
+}
+
 void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
GArray *linker)
 {
@@ -378,5 +483,6 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
 return;
 }
 nvdimm_build_nfit(device_list, table_offsets, table_data, linker);
+nvdimm_build_ssdt(device_list, table_offsets, table_data, linker);
 g_slist_free(device_list);
 }
-- 
1.8.3.1



Re: [PATCH net-next 3/3] vhost_net: basic polling support

2015-12-01 Thread Jason Wang


On 12/01/2015 10:43 PM, Michael S. Tsirkin wrote:
> On Tue, Dec 01, 2015 at 01:17:49PM +0800, Jason Wang wrote:
>>
>> On 11/30/2015 06:44 PM, Michael S. Tsirkin wrote:
>>> On Wed, Nov 25, 2015 at 03:11:29PM +0800, Jason Wang wrote:
> This patch tries to poll for new added tx buffer or socket receive
> queue for a while at the end of tx/rx processing. The maximum time
> spent on polling were specified through a new kind of vring ioctl.
>
> Signed-off-by: Jason Wang 
>>> One further enhancement would be to actually poll
>>> the underlying device. This should be reasonably
>>> straight-forward with macvtap (especially in the
>>> passthrough mode).
>>>
>>>
>> Yes, it is. I have some patches to do this by replacing
>> skb_queue_empty() with sk_busy_loop() but for tap.
> We probably don't want to do this unconditionally, though.
>
>> Tests does not show
>> any improvement but some regression.
> Did you add code to call sk_mark_napi_id on tap then?
> sk_busy_loop won't do anything useful without.

Yes I did. Probably something wrong elsewhere.
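
For readers following the thread, the general idea from the quoted patch description (poll for a bounded time at the end of tx/rx processing before giving up) can be sketched in user space as below. This is not the vhost_net code: all sketch_* names are hypothetical, and the clock and "has work" checks are abstract callbacks, with a deterministic fake provided so the loop can be tried out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef bool (*sketch_has_work_fn)(void *ctx);
typedef uint64_t (*sketch_clock_fn)(void *ctx);

/*
 * Spin until work shows up or the time budget is used up.
 * Returns true if work arrived within the budget.
 */
static bool sketch_busy_poll(sketch_has_work_fn has_work, sketch_clock_fn now,
                             void *ctx, uint64_t budget)
{
    uint64_t start = now(ctx);

    while (!has_work(ctx)) {
        if (now(ctx) - start >= budget)
            return false; /* budget exhausted: fall back to notifications */
    }
    return true;
}

/* Deterministic fake: the clock advances by one tick per reading. */
struct sketch_fake {
    uint64_t tick;    /* current time */
    uint64_t work_at; /* time at which work becomes available */
};

static uint64_t sketch_fake_now(void *ctx)
{
    struct sketch_fake *f = ctx;

    return f->tick++;
}

static bool sketch_fake_has_work(void *ctx)
{
    struct sketch_fake *f = ctx;

    return f->tick >= f->work_at;
}
```

A budget of zero makes the loop give up after the first check, which matches the behaviour of having polling switched off.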

>
>>  Maybe it's better to test macvtap.
> Same thing ...
>



Re: [PATCH v2 09/21] arm64: KVM: Implement guest entry

2015-12-01 Thread Marc Zyngier
On 01/12/15 15:29, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:50:03PM +, Marc Zyngier wrote:
>> Contrary to the previous patch, the guest entry is fairly different
>> from its assembly counterpart, mostly because it is only concerned
>> with saving/restoring the GP registers, and nothing else.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp/Makefile |   1 +
>>  arch/arm64/kvm/hyp/entry.S  | 155 
>> 
>>  arch/arm64/kvm/hyp/hyp.h|   2 +
>>  3 files changed, 158 insertions(+)
>>  create mode 100644 arch/arm64/kvm/hyp/entry.S
>>
>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>> index ec14cac..1e1ff06 100644
>> --- a/arch/arm64/kvm/hyp/Makefile
>> +++ b/arch/arm64/kvm/hyp/Makefile
>> @@ -7,3 +7,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
>> +obj-$(CONFIG_KVM_ARM_HOST) += entry.o
>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>> new file mode 100644
>> index 000..2c4449a
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp/entry.S
>> @@ -0,0 +1,155 @@
>> +/*
>> + * Copyright (C) 2015 - ARM Ltd
>> + * Author: Marc Zyngier 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#include 
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define CPU_GP_REG_OFFSET(x)(CPU_GP_REGS + x)
>> +#define CPU_XREG_OFFSET(x)  CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
>> +
>> +.text
>> +.pushsection.hyp.text, "ax"
>> +
>> +.macro save_common_regs ctxt
>> +stp x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)]
>> +stp x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)]
>> +stp x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)]
>> +stp x25, x26, [\ctxt, #CPU_XREG_OFFSET(25)]
>> +stp x27, x28, [\ctxt, #CPU_XREG_OFFSET(27)]
>> +stp x29, lr,  [\ctxt, #CPU_XREG_OFFSET(29)]
>> +.endm
>> +
>> +.macro restore_common_regs ctxt
>> +ldp x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)]
>> +ldp x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)]
>> +ldp x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)]
>> +ldp x25, x26, [\ctxt, #CPU_XREG_OFFSET(25)]
>> +ldp x27, x28, [\ctxt, #CPU_XREG_OFFSET(27)]
>> +ldp x29, lr,  [\ctxt, #CPU_XREG_OFFSET(29)]
>> +.endm
>> +
>> +.macro save_host_regs reg
>> +save_common_regs \reg
>> +.endm
>> +
>> +.macro restore_host_regs reg
>> +restore_common_regs \reg
>> +.endm
>> +
>> +.macro save_guest_regs
>> +// x0 is the vcpu address
>> +// x1 is the return code, do not corrupt!
>> +// x2 is the cpu context
> 
> this is confusing because the caller says x2 is free, so are these the
> inputs or invariants preserved in the function, or?
> 
> note that you'll avoid this kind of confusion by inlining this stuff in
> __guest_exit.

Indeed. I might just do that.

>> +// x3 is a tmp register
>> +// Guest's x0-x3 are on the stack
>> +
>> +add x2, x0, #VCPU_CONTEXT
>> +
>> +// Compute base to save registers
> 
> misleading comment?

Of course. Isn't that the very purpose of a comment? I'm confused... ;-)

>> +stp x4, x5,   [x2, #CPU_XREG_OFFSET(4)]
>> +stp x6, x7,   [x2, #CPU_XREG_OFFSET(6)]
>> +stp x8, x9,   [x2, #CPU_XREG_OFFSET(8)]
>> +stp x10, x11, [x2, #CPU_XREG_OFFSET(10)]
>> +stp x12, x13, [x2, #CPU_XREG_OFFSET(12)]
>> +stp x14, x15, [x2, #CPU_XREG_OFFSET(14)]
>> +stp x16, x17, [x2, #CPU_XREG_OFFSET(16)]
>> +str x18,  [x2, #CPU_XREG_OFFSET(18)]
>> +
>> +pop x6, x7  // x2, x3
>> +pop x4, x5  // x0, x1
> 
> hard to review when I haven't seen the code that calls this, but I'll
> assume we store things in register order on the stack.

Indeed. I've basically lifted that code from the previous version, so
some things may have stuck...

>> +
>> +stp x4, x5, [x2, #CPU_XREG_OFFSET(0)]
>> +stp x6, x7, [x2, #CPU_XREG_OFFSET(2)]
>> +
>> +save_common_regs x2
>> +.endm
>> +
>> +.macro restore_guest_regs
>> +// Assume vcpu in x0, clobbers everything else
> 
> nit: clobbers everything (x0 gets nuked too)
> 
>> +
>> +add x2, x0, #VCPU_CONTEXT
>> +
>> 

Re: [Qemu-devel] [for-2.6 PATCH 1/3] target-i386: Define structs for layout of xsave area

2015-12-01 Thread Eduardo Habkost
On Tue, Dec 01, 2015 at 06:27:17PM +0100, Paolo Bonzini wrote:
> On 01/12/2015 18:20, Richard Henderson wrote:
> >>
> >> X86XSaveArea will be used only when loading/saving state using
> >> xsave, not for executing regular instructions.
> > 
> > ... like the regular instruction xsave?
> > 
> > https://patchwork.ozlabs.org/patch/493318/
> 
> Right, but that's a helper anyway.
> 
> >> In X86CPU, the
> >> data is already stored as XMMReg unions (the one with the
> >> XMM_[BWDQ] helpers).
> > 
> > Of course.  But those unions are arranged to be in big-endian format on
> > big-endian hosts.  So we need to swap the data back to little-endian
> > format for storage into guest memory.
> 
> Yes, you can use byte moves with XMM_B (more obvious), or stq_le_p with
> XMM_Q (faster I guess---though the compiler might optimize the former on
> little-endian hosts).  Either works with an uint8_t[] destination.

stq_le_p() (more exactly, stq_p()) is exactly what is already
done by kvm_{get,put}_xsave(), using uint8_t pointers.

BTW, if we are going to implement xsave in TCG, the
X86CPU<->xsave translation logic in kvm_{get,put}_xsave() could
be moved to generic code and reused by TCG instead of being
reimplemented.
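
To make the byte-order point concrete, here is a minimal sketch of storing a
64-bit value into a uint8_t[] buffer in little-endian order regardless of host
endianness. The helper names store_u64_le() and load_u64_le() are hypothetical
and only illustrate the idea; they are not QEMU's stq_le_p()/ldq_le_p().

```c
#include <stdint.h>

/* Hypothetical helper (not QEMU's stq_le_p): store val into dst[0..7]
 * with the least significant byte first, whatever the host byte order. */
static void store_u64_le(uint8_t *dst, uint64_t val)
{
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(val >> (8 * i));
    }
}

/* Hypothetical inverse: rebuild the value from little-endian bytes. */
static uint64_t load_u64_le(const uint8_t *src)
{
    uint64_t val = 0;

    for (int i = 0; i < 8; i++) {
        val |= (uint64_t)src[i] << (8 * i);
    }
    return val;
}
```

Because the shifts operate on the value rather than on its in-memory
representation, the same code gives the same buffer contents on big-endian and
little-endian hosts.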

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 08/21] arm64: KVM: Implement debug save/restore

2015-12-01 Thread Marc Zyngier
On 01/12/15 15:41, Christoffer Dall wrote:
> On Tue, Dec 01, 2015 at 03:01:16PM +, Marc Zyngier wrote:
>> On 01/12/15 14:47, Christoffer Dall wrote:
>>> On Tue, Dec 01, 2015 at 01:06:31PM +, Marc Zyngier wrote:
 On 01/12/15 12:56, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:50:02PM +, Marc Zyngier wrote:
>> Implement the debug save restore as a direct translation of
>> the assembly code version.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp/Makefile   |   1 +
>>  arch/arm64/kvm/hyp/debug-sr.c | 130 
>> ++
>>  arch/arm64/kvm/hyp/hyp.h  |   9 +++
>>  3 files changed, 140 insertions(+)
>>  create mode 100644 arch/arm64/kvm/hyp/debug-sr.c
>>
>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>> index ec94200..ec14cac 100644
>> --- a/arch/arm64/kvm/hyp/Makefile
>> +++ b/arch/arm64/kvm/hyp/Makefile
>> @@ -6,3 +6,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
>> +obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
>> diff --git a/arch/arm64/kvm/hyp/debug-sr.c 
>> b/arch/arm64/kvm/hyp/debug-sr.c
>> new file mode 100644
>> index 000..a0b2b99
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp/debug-sr.c
>> @@ -0,0 +1,130 @@
>> +/*
>> + * Copyright (C) 2015 - ARM Ltd
>> + * Author: Marc Zyngier 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +#include "hyp.h"
>> +
>> +#define read_debug(r,n) read_sysreg(r##n##_el1)
>> +#define write_debug(v,r,n)  write_sysreg(v, r##n##_el1)
>> +
>> +#define save_debug(ptr,reg,nr)                          \
>> +    switch (nr) {                                       \
>> +    case 15:    ptr[15] = read_debug(reg, 15);          \
>> +    case 14:    ptr[14] = read_debug(reg, 14);          \
>> +    case 13:    ptr[13] = read_debug(reg, 13);          \
>> +    case 12:    ptr[12] = read_debug(reg, 12);          \
>> +    case 11:    ptr[11] = read_debug(reg, 11);          \
>> +    case 10:    ptr[10] = read_debug(reg, 10);          \
>> +    case 9:     ptr[9] = read_debug(reg, 9);            \
>> +    case 8:     ptr[8] = read_debug(reg, 8);            \
>> +    case 7:     ptr[7] = read_debug(reg, 7);            \
>> +    case 6:     ptr[6] = read_debug(reg, 6);            \
>> +    case 5:     ptr[5] = read_debug(reg, 5);            \
>> +    case 4:     ptr[4] = read_debug(reg, 4);            \
>> +    case 3:     ptr[3] = read_debug(reg, 3);            \
>> +    case 2:     ptr[2] = read_debug(reg, 2);            \
>> +    case 1:     ptr[1] = read_debug(reg, 1);            \
>> +    default:    ptr[0] = read_debug(reg, 0);            \
>> +    }
>> +
>> +#define restore_debug(ptr,reg,nr)                       \
>> +    switch (nr) {                                       \
>> +    case 15:    write_debug(ptr[15], reg, 15);          \
>> +    case 14:    write_debug(ptr[14], reg, 14);          \
>> +    case 13:    write_debug(ptr[13], reg, 13);          \
>> +    case 12:    write_debug(ptr[12], reg, 12);          \
>> +    case 11:    write_debug(ptr[11], reg, 11);          \
>> +    case 10:    write_debug(ptr[10], reg, 10);          \
>> +    case 9:     write_debug(ptr[9], reg, 9);

Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-01 Thread Alexander Duyck
On Tue, Dec 1, 2015 at 9:37 AM, Michael S. Tsirkin  wrote:
> On Tue, Dec 01, 2015 at 09:04:32AM -0800, Alexander Duyck wrote:
>> On Tue, Dec 1, 2015 at 7:28 AM, Michael S. Tsirkin  wrote:

>> > There are several components to this:
>> > - dma_map_* needs to prevent page from
>> >   being migrated while device is running.
>> >   For example, expose some kind of bitmap from guest
>> >   to host, set bit there while page is mapped.
>> >   What happens if we stop the guest and some
>> >   bits are still set? See dma_alloc_coherent below
>> >   for some ideas.
>>
>> Yeah, I could see something like this working.  Maybe we could do
>> something like what was done for the NX bit and make use of the upper
>> order bits beyond the limits of the memory range to mark pages as
>> non-migratable?
>>
>> I'm curious.  What we have with a DMA mapped region is essentially
>> shared memory between the guest and the device.  How would we resolve
>> something like this with IVSHMEM, or are we blocked there as well in
>> terms of migration?
>
> I have some ideas. Will post later.

I look forward to it.

>> > - dma_unmap_* needs to mark page as dirty
>> >   This can be done by writing into a page.
>> >
>> > - dma_sync_* needs to mark page as dirty
>> >   This is trickier as we can not change the data.
>> >   One solution is using atomics.
>> >   For example:
>> > int x = ACCESS_ONCE(*p);
>> > cmpxchg(p, x, x);
>> >   Seems to do a write without changing page
>> >   contents.
>>
>> Like I said we can probably kill 2 birds with one stone by just
>> implementing our own dma_mark_clean() for x86 virtualized
>> environments.
>>
>> I'd say we could take your solution one step further and just use 0
>> instead of bothering to read the value.  After all it won't write the
>> area if the value at the offset is not 0.
>
> Really almost any atomic that has no side effect will do.
> atomic or with 0
> atomic and with 
>
> It's just that cmpxchg already happens to have a portable
> wrapper.

I was originally thinking maybe an atomic_add with 0 would be the way
to go.  Either way though we still are using a locked prefix and
having to dirty a cache line per page which is going to come at some
cost.
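
For reference, the "atomic with no side effect" options discussed above can be
sketched in portable C11. This is only a userspace illustration of the idea
that the value stays unchanged; the function names are hypothetical, the kernel
would use its own cmpxchg()/atomic primitives, and whether a given compiler and
CPU actually emit a store (and therefore dirty the page) for each variant is an
assumption that would need checking on the target.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Variant 1: compare-and-swap a location with the value it already holds. */
static void touch_cmpxchg(_Atomic uint32_t *p)
{
    uint32_t x = atomic_load(p);

    /* If *p still equals x, x is written back; otherwise x is refreshed
     * with the current value and *p is left alone. Either way the
     * contents of *p do not change. */
    atomic_compare_exchange_strong(p, &x, x);
}

/* Variant 2: atomic add of zero. */
static void touch_add_zero(_Atomic uint32_t *p)
{
    atomic_fetch_add(p, 0);
}

/* Variant 3: atomic OR with zero. */
static void touch_or_zero(_Atomic uint32_t *p)
{
    atomic_fetch_or(p, 0);
}
```

All three are read-modify-write operations whose result equals the old value,
which is the property the dma_sync_* case needs.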

>> > - dma_alloc_coherent memory (e.g. device rings)
>> >   must be migrated after device stopped modifying it.
>> >   Just stopping the VCPU is not enough:
>> >   you must make sure device is not changing it.
>> >
>> >   Or maybe the device has some kind of ring flush operation,
>> >   if there was a reasonably portable way to do this
>> >   (e.g. a flush capability could maybe be added to SRIOV)
>> >   then hypervisor could do this.
>>
>> This is where things start to get messy. I was suggesting the
>> suspend/resume to resolve this bit, but it might also be possible to
>> deal with this by clearing the bus master
>> enable bit for the VF.  If I am not mistaken that should disable MSI-X
>> interrupts and halt any DMA.  That should work as long as you have
>> some mechanism that is tracking the pages in use for DMA.
>
> A bigger issue is recovering afterwards.

Agreed.

>> >   In case you need to resume on source, you
>> >   really need to follow the same path
>> >   as on destination, preferably detecting
>> >   device reset and restoring the device
>> >   state.
>>
>> The problem with detecting the reset is that you would likely have to
>> be polling to do something like that.
>
> We could send some event to the guest to notify it about this
> through a new or existing channel.
>
> Or we could make it possible for userspace to trigger this,
> then notify guest through the guest agent.

The first thing that comes to mind would be to use something like PCIe
Advanced Error Reporting, however I don't know if we can put a
requirement on the system supporting the q35 machine type or not in
order to support migration.

>>  I believe the fm10k driver
>> already has code like that in place where it will detect a reset as a
>> part of its watchdog, however the response time is something like 2
>> seconds for that.  That was one of the reasons I preferred something
>> like hot-plug as that should be functioning as soon as the guest is up
>> and it is a mechanism that operates outside of the VF drivers.
>
> That's pretty minor.
> A bigger issue is making sure guest does not crash
> when the device is suddenly reset under its legs.

I know the ixgbevf driver should already have logic to address some of
that.  If you look through the code there should be logic there for
surprise removal support in ixgbevf.  The only issue is that unlike
fm10k it will not restore itself after a resume or slot_reset call.


Re: [Qemu-devel] [for-2.6 PATCH 1/3] target-i386: Define structs for layout of xsave area

2015-12-01 Thread Richard Henderson

On 12/01/2015 10:34 AM, Eduardo Habkost wrote:

> BTW, if we are going to implement xsave in TCG, the
> X86CPU<->xsave translation logic in kvm_{get,put}_xsave() could
> be moved to generic code and reused by TCG instead of being
> reimplemented.


That's not trivial.

In particular, stq_p isn't what the tcg helper needs to use, but rather 
cpu_stq_data_ra.  Given the differing parameters, we'd have to resort to some 
sort of macro-ization.  It's probably easiest to simply keep the two 
implementations separate.



r~



Re: [PATCH v2 10/21] arm64: KVM: Add patchable function selector

2015-12-01 Thread Marc Zyngier
On 01/12/15 15:39, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:50:04PM +, Marc Zyngier wrote:
>> KVM so far relies on code patching, and is likely to use it more
>> in the future. The main issue is that our alternative system works
>> at the instruction level, while we'd like to have alternatives at
>> the function level.
>>
>> In order to cope with this, add the "hyp_alternate_select" macro that
>> outputs a brief sequence of code that in turn can be patched, allowing
>> al alternative function to be selected.
> 
> s/al/an/ ?
> 
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp/hyp.h | 16 
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>> index 7ac8e11..f0427ee 100644
>> --- a/arch/arm64/kvm/hyp/hyp.h
>> +++ b/arch/arm64/kvm/hyp/hyp.h
>> @@ -27,6 +27,22 @@
>>  
>>  #define kern_hyp_va(v) (typeof(v))((unsigned long)v & HYP_PAGE_OFFSET_MASK)
>>  
>> +/*
>> + * Generates patchable code sequences that are used to switch between
>> + * two implementations of a function, depending on the availability of
>> + * a feature.
>> + */
> 
> This looks right to me, but I'm a bit unclear what the types of these are
> and how to use them.
> 
> Are orig and alt function pointers and cond is a CONFIG_FOO ?  fname is
> a symbol, which is defined as a prototype somewhere and then implemented
> here, or?
> 
> Perhaps a Usage: part of the docs would be helpful.

How about:

@fname: a symbol name that will be defined as a function returning a
function pointer whose type will match @orig and @alt
@orig: A pointer to the default function, as returned by @fname when
@cond doesn't hold
@alt: A pointer to the alternate function, as returned by @fname when
@cond holds
@cond: a CPU feature (as described in asm/cpufeature.h)

> 
>> +#define hyp_alternate_select(fname, orig, alt, cond)            \
>> +typeof(orig) * __hyp_text fname(void)                           \
>> +{                                                               \
>> +    typeof(alt) *val = orig;                                    \
>> +    asm volatile(ALTERNATIVE("nop           \n",                \
>> +                             "mov   %0, %1  \n",                \
>> +                             cond)                              \
>> +                 : "+r" (val) : "r" (alt));                     \
>> +    return val;                                                 \
>> +}
>> +
>>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>>  
>> -- 
>> 2.1.4
>>
> 
> I haven't thought much about how all of this is implemented, but from my
> point of view the ideal situation would be something like:
> 
> void foo(int a, int b)
> {
>   ALTERNATIVE_IF_NOT CONFIG_BAR
>   foo_legacy(a, b);
>   ALTERNATIVE_ELSE
>   foo_new(a, b);
>   ALTERNATIVE_END
> }
> 
> I realize this may be impossible because the C code could implement all
> sort of fun stuff around the actual function calls, but would there be
> some way to annotate the functions and find the actual branch statement
> and change the target?

The main issue is that C doesn't give you any access to the branch
function itself, except for the asm-goto statements. It also makes it
very hard to preserve the return type. For your idea to work, we'd need
some support in the compiler itself. I'm sure that it is doable, just
not by me! ;-)

This is why I've ended up creating something that returns a function
*pointer*, because that's something that exists in the language (no new
concept). I simply made sure I could return it at minimal cost.
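
To illustrate the shape of that approach without any code patching, here is a
plain C sketch of a selector that returns a function pointer. Everything in it
is hypothetical: ALTERNATE_SELECT, op_default, op_alternate and the
feature_present flag are made-up names, and the runtime "if" merely stands in
for the nop/mov sequence that the real macro patches at boot.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a CPU feature test. In the patch the choice
 * is made by patching an instruction, not by reading a variable. */
static bool feature_present;

typedef int (*op_fn)(int);

static int op_default(int x)   { return x + 1; }
static int op_alternate(int x) { return x * 2; }

/* Defines fname() as a function returning orig, or alt when cond holds. */
#define ALTERNATE_SELECT(fname, orig, alt, cond)    \
static op_fn fname(void)                            \
{                                                   \
    op_fn val = (orig);                             \
    if (cond)                                       \
        val = (alt);                                \
    return val;                                     \
}

ALTERNATE_SELECT(select_op, op_default, op_alternate, feature_present)

/* Caller side: fetch the pointer, then call through it. */
static int run_op(int x)
{
    return select_op()(x);
}
```

The caller keeps an ordinary, type-checked call through a function pointer,
which is why no new language concept is needed.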

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v2 00/21] arm64: KVM: world switch in C

2015-12-01 Thread Christoffer Dall
On Tue, Dec 01, 2015 at 05:51:46PM +, Marc Zyngier wrote:
> On 01/12/15 12:00, Christoffer Dall wrote:
> > On Tue, Dec 01, 2015 at 09:58:23AM +, Marc Zyngier wrote:
> >> On 30/11/15 20:33, Christoffer Dall wrote:
> >>> On Fri, Nov 27, 2015 at 06:49:54PM +, Marc Zyngier wrote:
> >>>> Once upon a time, the KVM/arm64 world switch was a nice, clean, lean
> >>>> and mean piece of hand-crafted assembly code. Over time, features have
> >>>> crept in, the code has become harder to maintain, and the smallest
> >>>> change is a pain to introduce. The VHE patches are a prime example of
> >>>> why this doesn't work anymore.
> >>>>
> >>>> This series rewrites most of the existing assembly code in C, but keeps
> >>>> the existing code structure in place (most function names will look
> >>>> familiar to the reader). The biggest change is that we don't have to
> >>>> deal with a static register allocation (the compiler does it for us),
> >>>> we can easily follow structure and pointers, and only the lowest level
> >>>> is still in assembly code. Oh, and a negative diffstat.
> >>>>
> >>>> There is still a healthy dose of inline assembly (system register
> >>>> accessors, runtime code patching), but I've tried not to make it too
> >>>> invasive. The generated code, while not exactly brilliant, doesn't
> >>>> look too shabby. I do expect a small performance degradation, but I
> >>>> believe this is something we can improve over time (my initial
> >>>> measurements don't show any obvious regression though).
> >>>
> >>> I ran this through my experimental setup on m400 and got this:
> >>
> >> [...]
> >>
> >>> What this tells me is that we do take a noticeable hit on the
> >>> world-switch path, which shows up in the TCP_RR and hackbench workloads,
> >>> which have a high precision in their output.
> >>>
> >>> Note that the memcached number is well within its variability between
> >>> individual benchmark runs, where it varies to 12% of its average in over
> >>> 80% of the executions.
> >>>
> >>> I don't think this is a showstopper though, but we could consider
> >>> looking more closely at a breakdown of the world-switch path and verify
> >>> if/where we are really taking a hit.
> >>
> >> Thanks for doing so, very interesting. As a data point, what compiler
> >> are you using? I'd expect some variability based on the compiler version...
> >>
> > I used the following (compiling natively on the m400):
> > 
> > gcc version 4.8.2 (Ubuntu/Linaro 4.8.2-19ubuntu1)
> 
> For what it is worth, I've run hackbench on my Seattle B0 (8xA57 2GHz),
> with a 4 vcpu VM and got the following results (10 runs per kernel
> version, same configuration):
> 
> v4.4-rc3-wsinc: Average 31.750
> 32.459
> 32.124
> 32.435
> 31.940
> 31.085
> 31.804
> 31.862
> 30.985
> 31.450
> 31.359
> 
> v4.4-rc3: Average 31.954
> 31.806
> 31.598
> 32.697
> 31.472
> 31.410
> 32.562
> 31.938
> 31.932
> 31.672
> 32.459
> 
> This is with GCC as produced by Linaro:
> aarch64-linux-gnu-gcc (Linaro GCC 5.1-2015.08) 5.1.1 20150608
> 
> It could well be that your compiler generates worse code than the one I
> use, or that the code it outputs is badly tuned for XGene. I guess I
> need to unearth my Mustang to find out...
> 
Worth investigating I suppose.  At any rate, the conclusion stays the
same; we should proceed with these patches.
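
As a quick check of the arithmetic behind the two hackbench averages quoted
above, the following sketch recomputes the means and their relative
difference. It assumes the figures are run times where lower is better; the
array and function names are made up for the illustration.

```c
#include <stddef.h>

/* The ten hackbench results quoted above for each kernel. */
static const double wsinc[] = {
    32.459, 32.124, 32.435, 31.940, 31.085,
    31.804, 31.862, 30.985, 31.450, 31.359,
};
static const double base[] = {
    31.806, 31.598, 32.697, 31.472, 31.410,
    32.562, 31.938, 31.932, 31.672, 32.459,
};

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Difference of a relative to b, in percent (negative: a is lower). */
static double rel_diff_percent(double a, double b)
{
    return (a - b) / b * 100.0;
}
```

The means come out at about 31.750 and 31.955, so the two kernels differ by
roughly 0.6%, which is smaller than the spread between individual runs.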

-Christoffer


Re: BUG ALERT: ARM32 KVM does not work in 4.4-rc3

2015-12-01 Thread Marc Zyngier
On 01/12/15 07:24, Pavel Fedin wrote:
> Hello!
> 
> My project involves ARM64, but from time to time i also test ARM32
> KVM. I have discovered that it stopped working in 4.4-rc3. The same
> virtual machine works perfectly under current kvmarm/next, but gets
> stuck at random point under 4.4-rc3 from linux-stable. I'm not sure
> that i have time to investigate this quickly, but i'll post some new
> information as soon as i get it

root@canarsie:~# uname -a
Linux canarsie 4.4.0-rc3 #5044 SMP PREEMPT Tue Dec 1 09:12:40 GMT 2015 armv7l GNU/Linux
root@canarsie:~# cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 4 (v7l)
BogoMIPS        : 48.00
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 4

processor       : 1
model name      : ARMv7 Processor rev 4 (v7l)
BogoMIPS        : 48.00
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 4

Hardware        : Generic DT based system
Revision        :
Serial          :

The same kernel is used both as a guest and a host with v4.4-rc3.

So until you bisect it to an exact commit and configuration, I declare
the alert over. ;-)

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: best way to create a snapshot of a running vm ?

2015-12-01 Thread Stefan Hajnoczi
On Mon, Nov 30, 2015 at 04:34:14PM +0100, Lentes, Bernd wrote:
> Stefan wrote:
> 
> > 
> > Hi Bernd,
> > qemu-img cannot be used on the disk image when the VM is running.
> > Please use virsh, it communicates with the running QEMU process and
> > ensures that the snapshot is crash-consistent.
> > 
> 
> Hi Stefan,
> 
> thanks for your answer.
> 
> I read that virsh uses qemu-img internally
> (http://serverfault.com/questions/692435/qemu-img-snapshot-on-live-vm).
> Is that true ?

It's false in the general case.

While the VM is running libvirt will use the QEMU monitor to communicate
with the QEMU process instead of using qemu-img.

While the VM is shut down libvirt can use qemu-img safely.

The reason why qemu-img isn't safe is that the image file might be
written to by the running VM at the same time as qemu-img reads/writes
it.  This can corrupt image files.

Stefan




Re: [PATCH v4 05/21] KVM: ARM64: Add reset and access handlers for PMSELR register

2015-12-01 Thread Marc Zyngier
On 01/12/15 01:51, Shannon Zhao wrote:
> Hi Marc,
> 
> On 2015/12/1 1:56, Marc Zyngier wrote:
>> Same remark here as the one I made earlier. I'm pretty sure we don't
>> call any CP15 reset because they are all shared with their 64bit
>> counterparts. The same thing goes for the whole series.
> Ok, I see. But within the 64bit reset function, it needs to update the
> 32bit register value, right? Since when accessing these 32bit registers,
> it uses the offset c9_PM.

It shouldn't, because the 64bit and 32bit registers share the same storage. From
your own patch:

+/* Performance Monitors */
+#define c9_PMCR         (PMCR_EL0 * 2)
+#define c9_PMOVSSET     (PMOVSSET_EL0 * 2)
+#define c9_PMOVSCLR     (PMOVSCLR_EL0 * 2)
+#define c9_PMCCNTR      (PMCCNTR_EL0 * 2)
+#define c9_PMSELR       (PMSELR_EL0 * 2)
+#define c9_PMCEID0      (PMCEID0_EL0 * 2)
+#define c9_PMCEID1      (PMCEID1_EL0 * 2)
+#define c9_PMXEVCNTR    (PMXEVCNTR_EL0 * 2)
+#define c9_PMXEVTYPER   (PMXEVTYPER_EL0 * 2)
+#define c9_PMCNTENSET   (PMCNTENSET_EL0 * 2)
+#define c9_PMCNTENCLR   (PMCNTENCLR_EL0 * 2)
+#define c9_PMINTENSET   (PMINTENSET_EL1 * 2)
+#define c9_PMINTENCLR   (PMINTENCLR_EL1 * 2)
+#define c9_PMUSERENR    (PMUSERENR_EL0 * 2)
+#define c9_PMSWINC      (PMSWINC_EL0 * 2)

These are indexes in the copro array:

struct kvm_cpu_context {
	struct kvm_regs gp_regs;
	union {
		u64 sys_regs[NR_SYS_REGS];
		u32 copro[NR_COPRO_REGS];
	};
};

which is in a union with the sys_reg array. So anything that affects one
affects the other because:
- there is only one state in the physical CPU, no matter which mode
you're in,
- the guest EL1 is either 32bit or 64bit, and never changes over time.
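
A reduced, standalone sketch of that aliasing is shown below. The toy_ names
and the two-entry register list are invented for the illustration and are not
the kernel's definitions; only the "index * 2" relationship and the union
layout are taken from the code quoted above.

```c
#include <stdint.h>

/* Hypothetical, shortened register list; the real enum is much longer. */
enum {
    TOY_PMCR_EL0,
    TOY_PMSELR_EL0,
    TOY_NR_SYS_REGS
};

#define TOY_NR_COPRO_REGS  (TOY_NR_SYS_REGS * 2)
#define toy_c9_PMSELR      (TOY_PMSELR_EL0 * 2)

struct toy_cpu_context {
    union {
        uint64_t sys_regs[TOY_NR_SYS_REGS];
        uint32_t copro[TOY_NR_COPRO_REGS];
    };
};

/* Reset through the 64-bit view, then read back through the 32-bit view. */
static uint32_t reset_64_read_32(struct toy_cpu_context *ctxt, uint64_t val)
{
    ctxt->sys_regs[TOY_PMSELR_EL0] = val;
    return ctxt->copro[toy_c9_PMSELR];
}

/* Write through the 32-bit view, then read back through the 64-bit view. */
static uint64_t write_32_read_64(struct toy_cpu_context *ctxt, uint32_t val)
{
    ctxt->sys_regs[TOY_PMSELR_EL0] = 0;
    ctxt->copro[toy_c9_PMSELR] = val;
    return ctxt->sys_regs[TOY_PMSELR_EL0];
}
```

On a little-endian host copro[reg * 2] overlays the low 32 bits of
sys_regs[reg], so a reset done through the 64-bit array is immediately visible
through the 32-bit index and no separate 32-bit reset is needed.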

Hope this helps,

M.
-- 
Jazz is not dead. It just smells funny...


RE: BUG ALERT: ARM32 KVM does not work in 4.4-rc3

2015-12-01 Thread Pavel Fedin
 Hello!

> The same kernel is used both as a guest and a host with v4.4-rc3.
> 
> So until you bisect it to an exact commit and configuration, I declare
> the alert over. ;-)

 By this time I have also tried it on another machine, and there it also works.
It looks like it is triggered only on some particular HW. I'll try to figure
this out.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [PATCH v2 00/21] arm64: KVM: world switch in C

2015-12-01 Thread Marc Zyngier
On 30/11/15 20:33, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:49:54PM +, Marc Zyngier wrote:
>> Once upon a time, the KVM/arm64 world switch was a nice, clean, lean
>> and mean piece of hand-crafted assembly code. Over time, features have
>> crept in, the code has become harder to maintain, and the smallest
>> change is a pain to introduce. The VHE patches are a prime example of
>> why this doesn't work anymore.
>>
>> This series rewrites most of the existing assembly code in C, but keeps
>> the existing code structure in place (most function names will look
>> familiar to the reader). The biggest change is that we don't have to
>> deal with a static register allocation (the compiler does it for us),
>> we can easily follow structure and pointers, and only the lowest level
>> is still in assembly code. Oh, and a negative diffstat.
>>
>> There is still a healthy dose of inline assembly (system register
>> accessors, runtime code patching), but I've tried not to make it too
>> invasive. The generated code, while not exactly brilliant, doesn't
>> look too shabby. I do expect a small performance degradation, but I
>> believe this is something we can improve over time (my initial
>> measurements don't show any obvious regression though).
> 
> I ran this through my experimental setup on m400 and got this:

[...]

> What this tells me is that we do take a noticeable hit on the
> world-switch path, which shows up in the TCP_RR and hackbench workloads,
> which have a high precision in their output.
> 
> Note that the memcached number is well within its variability between
> individual benchmark runs, where it varies to 12% of its average in over
> 80% of the executions.
> 
> I don't think this is a showstopper though, but we could consider
> looking more closely at a breakdown of the world-switch path and verify
> if/where we are really taking a hit.

Thanks for doing so, very interesting. As a data point, what compiler
are you using? I'd expect some variability based on the compiler version...

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


RE: BUG ALERT: ARM32 KVM does not work in 4.4-rc3

2015-12-01 Thread Pavel Fedin
 Hello!

> > My project involves ARM64, but from time to time i also test ARM32
> > KVM. I have discovered that it stopped working in 4.4-rc3. The same
> > virtual machine works perfectly under current kvmarm/next, but gets
> > stuck at random point under 4.4-rc3 from linux-stable. I'm not sure
> > that i have time to investigate this quickly, but i'll post some new
> > information as soon as i get it

[skip]

> So until you bisect it to an exact commit and configuration, I declare
> the alert over. ;-)

 The commit in question is e6fab54423450d699a09ec2b899473a541f61971
("ARM/arm64: KVM: test properly for a PTE's uncachedness").
Reverting it fixes the problem.
 Debugging in qemu shows that the CPU gets stuck at PC = 0x0C with LR = 0x10, so
I quickly guessed that it might have to do with caching, and my first guess was
correct. The guest crashes in this state very early; sometimes it cannot even
fully print "Uncompressing kernel".
 The machine that reproduces it is a custom Samsung out-of-tree board. I'll
investigate further in order to determine exactly how the commit causes harm.
I know that it passed review and testing, and I was involved too. It may,
however, be a fault in the board's code.

Cc'ed to others involved.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia



Re: [PATCH v1 0/7] KVM: Hyper-V SynIC timers

2015-12-01 Thread Wanpeng Li
2015-11-26 16:34 GMT+08:00 Andrey Smetanin :
>
>
> On 11/26/2015 08:28 AM, Wanpeng Li wrote:
>>
>> 2015-11-25 23:20 GMT+08:00 Andrey Smetanin :
>>>
>>> Per Hyper-V specification (and as required by Hyper-V-aware guests),
>>> SynIC provides 4 per-vCPU timers.  Each timer is programmed via a pair
>>> of MSRs, and signals expiration by delivering a special format message
>>> to the configured SynIC message slot and triggering the corresponding
>>> synthetic interrupt.
>>
>>
>> Could you post a link for this specification?
>
>
> Official link:
>
> http://download.microsoft.com/download/A/B/4/AB43A34E-BDD0-4FA6-BDEF-79EEF16E880B/Hypervisor%20Top%20Level%20Functional%20Specification%20v4.0.docx
>
> and there is a pdf variant(my own docx -> pdf conversion):
>
> https://www.dropbox.com/s/ehxictr5wgnedq7/Hypervisor%20Top%20Level%20Functional%20Specification%20v4.0.pdf?dl=0

Btw, is there any performance data for this feature?

Regards,
Wanpeng Li


Re: [PATCH 00/11] KVM: x86: track guest page access

2015-12-01 Thread Paolo Bonzini


On 30/11/2015 19:26, Xiao Guangrong wrote:
> This patchset introduces the feature which allows us to track page
> access in guest. Currently, only write access tracking is implemented
> in this version.
> 
> Four APIs are introduced:
> - kvm_page_track_add_page(kvm, gfn, mode): the single guest page @gfn is
>   added to the track pool of the guest instance represented by @kvm;
>   @mode specifies which kind of access to @gfn is tracked
>
> - kvm_page_track_remove_page(kvm, gfn, mode): the opposite operation
>   of kvm_page_track_add_page(), which removes @gfn from the tracking pool.
>   @gfn is no longer tracked after its last user is gone
>
> - kvm_page_track_register_notifier(kvm, n): registers a notifier so that
>   the events triggered by page tracking will be received; when one occurs,
>   the n->track_write() callback is called
>
> - kvm_page_track_unregister_notifier(kvm, n): the opposite operation
>   of kvm_page_track_register_notifier(), which unlinks the notifier and
>   stops receiving tracked events
>
> The first user of page tracking is non-leaf shadow page tables, as they are
> always write protected. It also brings a performance improvement because
> page tracking speeds up the page fault handler for the tracked pages. The
> results of a kernel build are as follows:
> 
>before   after
> real 461.63   real 455.48
> user 4529.55  user 4557.88
> sys 1995.39   sys 1922.57

For KVM-GT, as far as I know Andrea Arcangeli is working on extending
userfaultfd to track write faults only.  Perhaps KVM-GT can do
something similar, where KVM gets the write tracking functionality for
free through the MMU notifiers.  Any thoughts on this?

Applying your technique to non-leaf shadow pages actually makes this
series quite interesting. :)  Shadow paging is still in use for nested
EPT, so it's always a good idea to speed it up.
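
For readers following the thread, the four calls quoted above can be
modeled in user space roughly as follows. This is only a sketch under
stated assumptions: every name here (track_model, track_add_page,
track_guest_write, MODEL_MAX_GFN, ...) is hypothetical and is not the
patchset's code, the @mode argument is dropped because only write
tracking exists in this version, and the per-gfn user count is an
assumption taken from the "last user is gone" wording.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_GFN 64   /* hypothetical: a tiny guest, for illustration */

typedef uint64_t gfn_t;

/* Hypothetical notifier: track_write() runs on a write to a tracked gfn. */
struct track_notifier {
    void (*track_write)(gfn_t gfn, void *opaque);
    void *opaque;
    struct track_notifier *next;
};

struct track_model {
    unsigned int write_users[MODEL_MAX_GFN]; /* users tracking each gfn */
    struct track_notifier *notifiers;        /* registered notifiers */
};

/* Add one user of write tracking on @gfn. */
static void track_add_page(struct track_model *m, gfn_t gfn)
{
    assert(gfn < MODEL_MAX_GFN);
    m->write_users[gfn]++;
}

/* Drop one user; @gfn stops being tracked once the count reaches zero. */
static void track_remove_page(struct track_model *m, gfn_t gfn)
{
    assert(gfn < MODEL_MAX_GFN && m->write_users[gfn] > 0);
    m->write_users[gfn]--;
}

static void track_register_notifier(struct track_model *m,
                                    struct track_notifier *n)
{
    n->next = m->notifiers;
    m->notifiers = n;
}

static void track_unregister_notifier(struct track_model *m,
                                      struct track_notifier *n)
{
    struct track_notifier **p;

    for (p = &m->notifiers; *p; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            n->next = NULL;
            return;
        }
    }
}

/* Model of a guest write: returns 1 if @gfn was tracked, else 0. */
static int track_guest_write(struct track_model *m, gfn_t gfn)
{
    struct track_notifier *n;

    if (gfn >= MODEL_MAX_GFN || m->write_users[gfn] == 0)
        return 0;
    for (n = m->notifiers; n; n = n->next)
        n->track_write(gfn, n->opaque);
    return 1;
}

/* Example callback: counts the tracked writes it has seen. */
static void count_write(gfn_t gfn, void *opaque)
{
    (void)gfn;
    ++*(int *)opaque;
}
```

A caller would register a notifier with count_write as its callback,
add a gfn, and then see track_guest_write() return 1 for that gfn until
the last user removes it.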

Paolo


[PATCH 3/3] KVM: arm/arm64: Decouple virtual timer from vGIC

2015-12-01 Thread Pavel Fedin
Remove dependency on vgic_initialized() and use the newly introduced
infrastructure to send interrupts via the userspace if vGIC is not being
used.

Signed-off-by: Pavel Fedin 
---
 arch/arm/kvm/arm.c|  8 +---
 virt/kvm/arm/arch_timer.c | 23 +--
 2 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 6392a5b..e729068 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -468,13 +468,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
return ret;
}
 
-   /*
-* Enable the arch timers only if we have an in-kernel VGIC
-* and it has been properly initialized, since we cannot handle
-* interrupts from the virtual timer with a userspace gic.
-*/
-   if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
-   kvm_timer_enable(kvm);
+   kvm_timer_enable(kvm);
 
return 0;
 }
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 69bca18..90c91b0 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -128,15 +128,17 @@ static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, 
bool new_level)
int ret;
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
 
-   BUG_ON(!vgic_initialized(vcpu->kvm));
-
timer->irq.level = new_level;
trace_kvm_timer_update_irq(vcpu->vcpu_id, timer->map->virt_irq,
   timer->irq.level);
-   ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
-timer->map,
-timer->irq.level);
-   WARN_ON(ret);
+   if (irqchip_in_kernel(vcpu->kvm)) {
+   ret = kvm_vgic_inject_mapped_irq(vcpu->kvm, vcpu->vcpu_id,
+timer->map,
+timer->irq.level);
+   WARN_ON(ret);
+   } else {
+   vcpu->irq = &timer->irq;
+   }
 }
 
 /*
@@ -149,12 +151,12 @@ static void kvm_timer_update_state(struct kvm_vcpu *vcpu)
 
/*
 * If userspace modified the timer registers via SET_ONE_REG before
-* the vgic was initialized, we mustn't set the timer->irq.level value
+* the timer was initialized, we mustn't set the timer->irq.level value
 * because the guest would never see the interrupt.  Instead wait
 * until we call this function from kvm_timer_flush_hwstate.
 */
-   if (!vgic_initialized(vcpu->kvm))
-   return;
+   if (!vcpu->kvm->arch.timer.enabled)
+   return;
 
if (kvm_timer_should_fire(vcpu) != timer->irq.level)
kvm_timer_update_irq(vcpu, !timer->irq.level);
@@ -237,7 +239,8 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
* to ensure that hardware interrupts from the timer triggers a guest
* exit.
*/
-   if (timer->irq.level || kvm_vgic_map_is_active(vcpu, timer->map))
+   if (timer->irq.level || (irqchip_in_kernel(vcpu->kvm) &&
+kvm_vgic_map_is_active(vcpu, timer->map)))
phys_active = true;
else
phys_active = false;
-- 
2.4.4



[PATCH 2/3] KVM: Documentation: Document KVM_EXIT_IRQ

2015-12-01 Thread Pavel Fedin
Add documentation for the new exit code.

Signed-off-by: Pavel Fedin 
---
 Documentation/virtual/kvm/api.txt | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 092ee9f..d8aae4c 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3331,6 +3331,20 @@ the userspace IOAPIC should process the EOI and 
retrigger the interrupt if
 it is still asserted.  Vector is the LAPIC interrupt vector for which the
 EOI was received.
 
+   /* KVM_EXIT_IRQ */
+   struct kvm_irq_level irq;
+
+Indicates that an interrupt happens, to be processed by irqchip implemented in
+userspace. irq.irq specifies the raw IRQ number, and irq.status is to be
+interpreted according to interrupt type:
+ For level-triggered interrupts irq.status is set to new level of the line, and
+  the exit happens upon level change.
+ For edge-triggered interrupts irq.status is set to active level of the line
+  (low or high), and the exit happens when the line is pulsed.
+
+CPU-private interrupts (like per-CPU timers) belong to the vCPU where the exit
+happened.
+
/* Fix the size of the union. */
char padding[256];
};
-- 
2.4.4
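
As an illustration of the level/edge wording in the documentation hunk
above, a userspace irqchip might consume the exit roughly like this.
It is a sketch only: model_irqchip, model_handle_exit_irq and
MODEL_NR_IRQS are hypothetical names, and the value the text calls
irq.status is passed as "level" here because the code in patch 1/3
fills run->irq.level.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_IRQS 32   /* hypothetical size of the userspace irqchip */

enum model_trigger { MODEL_LEVEL = 0, MODEL_EDGE = 1 };

/* Hypothetical userspace interrupt controller state. */
struct model_irqchip {
    enum model_trigger trigger[MODEL_NR_IRQS];
    uint32_t line_level[MODEL_NR_IRQS]; /* current level, level-triggered */
    uint32_t pulses[MODEL_NR_IRQS];     /* latched pulses, edge-triggered */
};

/*
 * Handle one KVM_EXIT_IRQ-style exit.
 * Level-triggered: the exit reports the new level of the line.
 * Edge-triggered: each exit is one pulse, so it is latched.
 * Returns 0 on success, -1 if the IRQ number is out of range.
 */
static int model_handle_exit_irq(struct model_irqchip *chip,
                                 uint32_t irq, uint32_t level)
{
    if (irq >= MODEL_NR_IRQS)
        return -1;
    if (chip->trigger[irq] == MODEL_LEVEL)
        chip->line_level[irq] = level;
    else
        chip->pulses[irq]++;
    return 0;
}
```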



[PATCH 1/3] KVM: Introduce KVM_EXIT_IRQ

2015-12-01 Thread Pavel Fedin
This exit code means that this vCPU wants to inject an interrupt using
a userspace-emulated controller.

IRQs are signalled by adding pending interrupt descriptors to the vcpu
structure. For simplicity, we currently reserve only one pointer for a
single interrupt, which will be used by the ARM virtual timer code. This can
be extended in the future if necessary.

The interface is designed to be as arch-agnostic as possible.
Therefore, it has the IRQ number and level as parameters (encoded in
struct kvm_irq_level).

Signed-off-by: Pavel Fedin 
---
 arch/arm/kvm/arm.c   | 6 ++
 include/linux/kvm_host.h | 7 +++
 include/uapi/linux/kvm.h | 3 +++
 3 files changed, 16 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 66f90c1..6392a5b 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -585,6 +585,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
if (signal_pending(current)) {
ret = -EINTR;
run->exit_reason = KVM_EXIT_INTR;
+   } else if (vcpu->irq) {
+   ret = 0;
+   run->exit_reason = KVM_EXIT_IRQ;
+   run->irq.irq = vcpu->irq->irq;
+   run->irq.level = vcpu->irq->level;
+   vcpu->irq = NULL;
}
 
if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c923350..93f59c5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -281,6 +281,13 @@ struct kvm_vcpu {
} spin_loop;
 #endif
bool preempted;
+
+   /*
+* IRQ pending to the userspace on this CPU.
+* Currently we support only one slot, used only by ARM architecture.
+*/
+   const struct kvm_irq_level *irq;
+
struct kvm_vcpu_arch arch;
 };
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 03f3618..a717a9b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -184,6 +184,7 @@ struct kvm_s390_skeys {
 #define KVM_EXIT_SYSTEM_EVENT 24
 #define KVM_EXIT_S390_STSI25
 #define KVM_EXIT_IOAPIC_EOI   26
+#define KVM_EXIT_IRQ  27
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -338,6 +339,8 @@ struct kvm_run {
struct {
__u8 vector;
} eoi;
+   /* KVM_EXIT_IRQ */
+   struct kvm_irq_level irq;
/* Fix the size of the union. */
char padding[256];
};
-- 
2.4.4
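
The single-slot scheme in this patch can be pictured with a small
user-space model. This is a sketch under assumptions, not the kernel
code: the model_* types are cut-down stand-ins for kvm_vcpu, kvm_run
and kvm_irq_level, model_post_irq() is a hypothetical producer, and the
only value taken from the patch is the proposed exit code 27.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_EXIT_NONE 0
#define MODEL_EXIT_IRQ  27   /* value the patch proposes for KVM_EXIT_IRQ */

/* Stand-in for struct kvm_irq_level (only the fields used here). */
struct model_irq_level {
    uint32_t irq;
    uint32_t level;
};

/* Stand-in for the part of struct kvm_run touched by the patch. */
struct model_run {
    uint32_t exit_reason;
    struct model_irq_level irq;
};

/* Stand-in for struct kvm_vcpu: one slot for one pending interrupt. */
struct model_vcpu {
    const struct model_irq_level *irq;
};

/* Producer side (e.g. timer code): with one slot, a later post
 * replaces an earlier one that has not been delivered yet. */
static void model_post_irq(struct model_vcpu *vcpu,
                           const struct model_irq_level *irq)
{
    vcpu->irq = irq;
}

/* Run-loop side: returns 1 and fills @run if an exit is generated. */
static int model_check_exit(struct model_vcpu *vcpu, struct model_run *run)
{
    if (!vcpu->irq)
        return 0;
    run->exit_reason = MODEL_EXIT_IRQ;
    run->irq.irq = vcpu->irq->irq;
    run->irq.level = vcpu->irq->level;
    vcpu->irq = NULL;   /* the slot is free again */
    return 1;
}
```

The model makes the limitation in the commit message visible: two posts
before one exit leave only the second descriptor to be reported.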



[PATCH 0/3] Add support for handling IRQs in userspace

2015-12-01 Thread Pavel Fedin
This patch series introduces the ability to handle IRQs in userspace. This is
currently necessary for ARM KVM in order to be able to use the virtual CP15
timer without an in-kernel irqchip. This allows KVM to be used on machines with
either a broken vGIC or a custom interrupt controller, like the Raspberry Pi 2.

The API is designed to be as architecture-agnostic as possible.
Currently it actually supports only a single IRQ, but it can be easily
extended to accommodate more.

Pavel Fedin (3):
  KVM: Introduce KVM_EXIT_IRQ
  KVM: Documentation: Document KVM_EXIT_IRQ
  KVM: arm/arm64: Decouple virtual timer from vGIC

 Documentation/virtual/kvm/api.txt | 14 ++
 arch/arm/kvm/arm.c| 14 +++---
 include/linux/kvm_host.h  |  7 +++
 include/uapi/linux/kvm.h  |  3 +++
 virt/kvm/arm/arch_timer.c | 23 +--
 5 files changed, 44 insertions(+), 17 deletions(-)

-- 
2.4.4



Re: [PATCH v2 0/3] target-i386: Use C struct for xsave area layout, offsets & sizes

2015-12-01 Thread Paolo Bonzini
On 30/11/2015 18:34, Eduardo Habkost wrote:
> target-i386/cpu.c:ext_save_area uses magic numbers for the xsave
> area offsets and sizes, and target-i386/kvm.c:kvm_{put,get}_xsave()
> uses offset macros and bit manipulation to access the xsave area.
> This series changes both to use C structs for those operations.
> 
> I still need to figure out a way to write unit tests for the new
> code. Maybe I will just copy and paste the new and old functions,
> and test them locally (checking if they give the same results
> when translating blobs of random bytes).
> 
> Changes v1 -> v2:
> * Use uint8_t[8*n] instead of uint64_t[n] for register data
> * Keep the QEMU_BUILD_BUG_ON lines
> 
> v1 -> v2 diff below:
> 
>   diff --git a/target-i386/cpu.h b/target-i386/cpu.h
>   index 3d1d01e..41f55ef 100644
>   --- a/target-i386/cpu.h
>   +++ b/target-i386/cpu.h
>   @@ -818,7 +818,7 @@ typedef union X86LegacyXSaveArea {
>uint32_t mxcsr;
>uint32_t mxcsr_mask;
>FPReg fpregs[8];
>   -uint64_t xmm_regs[16][2];
>   +uint8_t xmm_regs[16][16];
>};
>uint8_t data[512];
>} X86LegacyXSaveArea;
>   @@ -831,7 +831,7 @@ typedef struct X86XSaveHeader {
> 
>/* Ext. save area 2: AVX State */
>typedef struct XSaveAVX {
>   -uint64_t ymmh[16][2];
>   +uint8_t ymmh[16][16];
>} XSaveAVX;
> 
>/* Ext. save area 3: BNDREG */
>   @@ -852,12 +852,12 @@ typedef struct XSaveOpmask {
> 
>/* Ext. save area 6: ZMM_Hi256 */
>typedef struct XSaveZMM_Hi256 {
>   -uint64_t zmm_hi256[16][4];
>   +uint8_t zmm_hi256[16][32];
>} XSaveZMM_Hi256;
> 
>/* Ext. save area 7: Hi16_ZMM */
>typedef struct XSaveHi16_ZMM {
>   -XMMReg hi16_zmm[16];
>   +uint8_t hi16_zmm[16][64];
>} XSaveHi16_ZMM;
> 
>typedef struct X86XSaveArea {
>   diff --git a/target-i386/kvm.c b/target-i386/kvm.c
>   index 5e7ec70..98249e4 100644
>   --- a/target-i386/kvm.c
>   +++ b/target-i386/kvm.c
>   @@ -1203,6 +1203,43 @@ static int kvm_put_fpu(X86CPU *cpu)
>return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_FPU, &fpu);
>}
> 
>   +#define XSAVE_FCW_FSW 0
>   +#define XSAVE_FTW_FOP 1
>   +#define XSAVE_CWD_RIP 2
>   +#define XSAVE_CWD_RDP 4
>   +#define XSAVE_MXCSR   6
>   +#define XSAVE_ST_SPACE8
>   +#define XSAVE_XMM_SPACE   40
>   +#define XSAVE_XSTATE_BV   128
>   +#define XSAVE_YMMH_SPACE  144
>   +#define XSAVE_BNDREGS 240
>   +#define XSAVE_BNDCSR  256
>   +#define XSAVE_OPMASK  272
>   +#define XSAVE_ZMM_Hi256   288
>   +#define XSAVE_Hi16_ZMM416
>   +
>   +#define XSAVE_BYTE_OFFSET(word_offset) \
>   +((word_offset)*sizeof(((struct kvm_xsave*)0)->region[0]))
>   +
>   +#define ASSERT_OFFSET(word_offset, field) \
>   +QEMU_BUILD_BUG_ON(XSAVE_BYTE_OFFSET(word_offset) != \
>   +  offsetof(X86XSaveArea, field))
>   +
>   +ASSERT_OFFSET(XSAVE_FCW_FSW, legacy.fcw);
>   +ASSERT_OFFSET(XSAVE_FTW_FOP, legacy.ftw);
>   +ASSERT_OFFSET(XSAVE_CWD_RIP, legacy.fpip);
>   +ASSERT_OFFSET(XSAVE_CWD_RDP, legacy.fpdp);
>   +ASSERT_OFFSET(XSAVE_MXCSR, legacy.mxcsr);
>   +ASSERT_OFFSET(XSAVE_ST_SPACE, legacy.fpregs);
>   +ASSERT_OFFSET(XSAVE_XMM_SPACE, legacy.xmm_regs);
>   +ASSERT_OFFSET(XSAVE_XSTATE_BV, header.xstate_bv);
>   +ASSERT_OFFSET(XSAVE_YMMH_SPACE, avx_state);
>   +ASSERT_OFFSET(XSAVE_BNDREGS, bndreg_state);
>   +ASSERT_OFFSET(XSAVE_BNDCSR, bndcsr_state);
>   +ASSERT_OFFSET(XSAVE_OPMASK, opmask_state);
>   +ASSERT_OFFSET(XSAVE_ZMM_Hi256, zmm_hi256_state);
>   +ASSERT_OFFSET(XSAVE_Hi16_ZMM, hi16_zmm_state);
>   +
>static int kvm_put_xsave(X86CPU *cpu)
>{
>CPUX86State *env = &cpu->env;
>   @@ -1239,17 +1276,17 @@ static int kvm_put_xsave(X86CPU *cpu)
>sizeof env->opmask_regs);
> 
>for (i = 0; i < CPU_NB_REGS; i++) {
>   -X86LegacyXSaveArea *legacy = &xsave->legacy;
>   -XSaveAVX *avx = &xsave->avx_state;
>   -XSaveZMM_Hi256 *zmm_hi256 = &xsave->zmm_hi256_state;
>   -stq_p(&legacy->xmm_regs[i][0], env->xmm_regs[i].XMM_Q(0));
>   -stq_p(&legacy->xmm_regs[i][1], env->xmm_regs[i].XMM_Q(1));
>   -stq_p(&avx->ymmh[i][0],env->xmm_regs[i].XMM_Q(2));
>   -stq_p(&avx->ymmh[i][1],env->xmm_regs[i].XMM_Q(3));
>   -stq_p(&zmm_hi256->zmm_hi256[i][0], env->xmm_regs[i].XMM_Q(4));
>   -stq_p(&zmm_hi256->zmm_hi256[i][1], env->xmm_regs[i].XMM_Q(5));
>   -stq_p(&zmm_hi256->zmm_hi256[i][2], env->xmm_regs[i].XMM_Q(6));
>   -stq_p(&zmm_hi256->zmm_hi256[i][3], env->xmm_regs[i].XMM_Q(7));
>   +uint8_t *xmm = xsave->legacy.xmm_regs[i];
>   +uint8_t *ymmh = xsave->avx_state.ymmh[i];
>   +uint8_t *zmmh = xsave->zmm_hi256_state.zmm_hi256[i];
>   +stq_p(xmm, env->xmm_regs[i].XMM_Q(0));
>   +stq_p(xmm+8,   env->xmm_regs[i].XMM_Q(1));
>   +stq_p(ymmh,env->xmm_regs[i].XMM_Q(2));
>   +stq_p(ymmh+8,  env->xmm_regs[i].XMM_Q(3));
>   +stq_p(zmmh,
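
(The quoted diff is cut off at this point in the archive.)

The two ideas in the v1 -> v2 diff above, checking word offsets against
a C struct at build time and storing 64-bit halves into uint8_t arrays,
can be tried out in isolation. The sketch below assumes a cut-down
layout: demo_legacy and demo_region are hypothetical stand-ins rather
than the real X86XSaveArea or struct kvm_xsave, C11 _Static_assert
replaces QEMU_BUILD_BUG_ON, and demo_store_le64() assumes a
little-endian store, which is what stq_p amounts to for an x86 target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down legacy area: 32 header bytes, 8 x 16-byte FP
 * registers, then 16 x 16-byte XMM registers stored as raw bytes. */
struct demo_legacy {
    uint8_t header[32];
    uint8_t fpregs[8][16];
    uint8_t xmm_regs[16][16];
};

/* Stand-in for a word-addressed buffer such as kvm_xsave.region[]. */
struct demo_region {
    uint32_t region[128];
};

/* Offsets expressed in 32-bit words, as in the quoted XSAVE_* macros. */
#define DEMO_ST_SPACE   8
#define DEMO_XMM_SPACE  40

#define DEMO_BYTE_OFFSET(word_offset) \
    ((word_offset) * sizeof(((struct demo_region *)0)->region[0]))

/* Fails the build if the word offset and the struct layout disagree. */
#define DEMO_ASSERT_OFFSET(word_offset, field)                        \
    _Static_assert(DEMO_BYTE_OFFSET(word_offset) ==                   \
                   offsetof(struct demo_legacy, field),               \
                   "layout mismatch: " #field)

DEMO_ASSERT_OFFSET(DEMO_ST_SPACE, fpregs);
DEMO_ASSERT_OFFSET(DEMO_XMM_SPACE, xmm_regs);

/* Store a 64-bit value least significant byte first. */
static void demo_store_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Load a 64-bit value stored least significant byte first. */
static uint64_t demo_load_le64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}
```

With byte arrays, the low and high halves of register i are written at
xmm_regs[i] and xmm_regs[i] + 8, with no alignment or host-endianness
assumption in the struct itself.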

Re: [PATCH v1 0/7] KVM: Hyper-V SynIC timers

2015-12-01 Thread Denis V. Lunev

On 12/01/2015 01:12 PM, Wanpeng Li wrote:
> 2015-11-26 16:34 GMT+08:00 Andrey Smetanin :
>> On 11/26/2015 08:28 AM, Wanpeng Li wrote:
>>> 2015-11-25 23:20 GMT+08:00 Andrey Smetanin :
>>>> Per Hyper-V specification (and as required by Hyper-V-aware guests),
>>>> SynIC provides 4 per-vCPU timers.  Each timer is programmed via a pair
>>>> of MSRs, and signals expiration by delivering a special format message
>>>> to the configured SynIC message slot and triggering the corresponding
>>>> synthetic interrupt.
>>>
>>> Could you post a link for this specification?
>>
>> Official link:
>>
>> http://download.microsoft.com/download/A/B/4/AB43A34E-BDD0-4FA6-BDEF-79EEF16E880B/Hypervisor%20Top%20Level%20Functional%20Specification%20v4.0.docx
>>
>> and there is a pdf variant(my own docx -> pdf conversion):
>>
>> https://www.dropbox.com/s/ehxictr5wgnedq7/Hypervisor%20Top%20Level%20Functional%20Specification%20v4.0.pdf?dl=0
>
> Btw, is there performance data for such feature?
>
> Regards,
> Wanpeng Li

not yet.

This is a requirement for any Hyper-V device emulation
to be activated by Windows. We are going to have
basic infrastructure merged and run performance tests
with all that stuff done.

Den


Re: [PATCH v2 0/3] target-i386: Use C struct for xsave area layout, offsets & sizes

2015-12-01 Thread Paolo Bonzini


On 30/11/2015 18:34, Eduardo Habkost wrote:
> target-i386/cpu.c:ext_save_area uses magic numbers for the xsave
> area offsets and sizes, and target-i386/kvm.c:kvm_{put,get}_xsave()
> uses offset macros and bit manipulation to access the xsave area.
> This series changes both to use C structs for those operations.
> 
> I still need to figure out a way to write unit tests for the new
> code. Maybe I will just copy and paste the new and old functions,
> and test them locally (checking if they give the same results
> when translating blobs of random bytes).

I think it's easier to use small guests (i.e. kvm-unit-tests) to test
this code.

Paolo


Re: [PATCH v4 18/21] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-01 Thread Shannon Zhao



On 2015/12/1 22:50, Marc Zyngier wrote:

On 01/12/15 14:35, Shannon Zhao wrote:



On 2015/12/1 2:22, Marc Zyngier wrote:

On Fri, 30 Oct 2015 14:22:00 +0800
Shannon Zhao  wrote:


From: Shannon Zhao 

When calling perf_event_create_kernel_counter to create perf_event,
assign an overflow handler. Then when perf event overflows, set
irq_pending and call kvm_vcpu_kick() to sync the interrupt.

Signed-off-by: Shannon Zhao 
---
  arch/arm/kvm/arm.c|  4 +++
  include/kvm/arm_pmu.h |  4 +++
  virt/kvm/arm/pmu.c| 76 ++-
  3 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 78b2869..9c0fec4 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -28,6 +28,7 @@
  #include 
  #include 
  #include 
+#include 

  #define CREATE_TRACE_POINTS
  #include "trace.h"
@@ -551,6 +552,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)

if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
local_irq_enable();
+   kvm_pmu_sync_hwstate(vcpu);


This is very weird. Are you only injecting interrupts when a signal is
pending? I don't understand how this works...


kvm_vgic_sync_hwstate(vcpu);
preempt_enable();
kvm_timer_sync_hwstate(vcpu);
@@ -598,6 +600,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
kvm_guest_exit();
trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));

+   kvm_pmu_post_sync_hwstate(vcpu);
+
kvm_vgic_sync_hwstate(vcpu);

preempt_enable();
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index acd025a..5e7f943 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -39,6 +39,8 @@ struct kvm_pmu {
  };

  #ifdef CONFIG_KVM_ARM_PMU
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
+void kvm_pmu_post_sync_hwstate(struct kvm_vcpu *vcpu);


Please follow the current terminology: _flush_ on VM entry, _sync_ on
VM exit.
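
To make the naming convention concrete, here is a toy model of one run
iteration. It is only a sketch: all model_* names are hypothetical, the
flush step follows the "inject virtual PMU IRQ if IRQ is pending"
description from the kernel-doc in this thread, and what the sync step
does here (latching an overflow seen while the guest ran) is an
assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-vCPU PMU state for the model. */
struct model_vcpu_pmu {
    int irq_pending;   /* software state: an IRQ should be injected */
    int injected;      /* number of IRQs injected so far */
    int hw_overflow;   /* "hardware" state observed while the guest ran */
    char trace[16];    /* order of calls, for inspection */
    size_t n;
};

static void model_trace(struct model_vcpu_pmu *v, char c)
{
    if (v->n + 1 < sizeof(v->trace)) {
        v->trace[v->n++] = c;
        v->trace[v->n] = '\0';
    }
}

/* flush: on VM entry, push software state towards the guest. */
static void model_pmu_flush_hwstate(struct model_vcpu_pmu *v)
{
    model_trace(v, 'F');
    if (v->irq_pending) {
        v->injected++;
        v->irq_pending = 0;
    }
}

/* Stand-in for running the guest; @overflow says a counter overflowed. */
static void model_guest_run(struct model_vcpu_pmu *v, int overflow)
{
    model_trace(v, 'G');
    v->hw_overflow = overflow;
}

/* sync: on VM exit, pull state back into software. */
static void model_pmu_sync_hwstate(struct model_vcpu_pmu *v)
{
    model_trace(v, 'S');
    if (v->hw_overflow) {
        v->irq_pending = 1;
        v->hw_overflow = 0;
    }
}

/* One loop iteration: flush before entry, sync after exit. */
static void model_run_once(struct model_vcpu_pmu *v, int overflow)
{
    model_pmu_flush_hwstate(v);
    model_guest_run(v, overflow);
    model_pmu_sync_hwstate(v);
}
```

In this model an overflow seen during one iteration is injected by the
flush step of the next one.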



Hi Marc,

Is the patch below the right way to do this?

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 78b2869..84008d1 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -28,6 +28,7 @@
  #include 
  #include 
  #include 
+#include 

  #define CREATE_TRACE_POINTS
  #include "trace.h"
@@ -531,6 +532,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
struct kvm_run *run)
  */
 kvm_timer_flush_hwstate(vcpu);

+   kvm_pmu_flush_hwstate(vcpu);
+
 /*
  * Preparing the interrupts to be injected also
  * involves poking the GIC, which must be done in a
@@ -554,6 +557,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
struct kvm_run *run)
 kvm_vgic_sync_hwstate(vcpu);
 preempt_enable();
 kvm_timer_sync_hwstate(vcpu);
+   kvm_pmu_sync_hwstate(vcpu);
 continue;
 }

@@ -604,6 +608,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
struct kvm_run *run)

 kvm_timer_sync_hwstate(vcpu);

+   kvm_pmu_sync_hwstate(vcpu);
+
 ret = handle_exit(vcpu, run, ret);
 }


yeah, that's more like it!



diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 47bbd43..edfe4e5 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -41,6 +41,8 @@ struct kvm_pmu {
  };

  #ifdef CONFIG_KVM_ARM_PMU
+void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
  unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32
select_idx);
  void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
  void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool
all_enable);
@@ -51,6 +53,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu
*vcpu, u32 data,
 u32 select_idx);
  void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val);
  #else
+void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {}
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) {}
  unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32
select_idx)
  {
 return 0;
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 15cac45..9aad2f7 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -21,6 +21,7 @@
  #include 
  #include 
  #include 
+#include 

  /**
   * kvm_pmu_get_counter_value - get PMU counter value
@@ -79,6 +80,78 @@ static void kvm_pmu_stop_counter(struct kvm_pmc *pmc)
  }

  /**
+ * kvm_pmu_flush_hwstate - flush pmu state to cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Inject virtual PMU IRQ if IRQ is pending for this cpu.
+ */
+void 

Re: [PATCH v2 0/3] target-i386: Use C struct for xsave area layout, offsets & sizes

2015-12-01 Thread Eduardo Habkost
On Tue, Dec 01, 2015 at 11:22:31AM +0100, Paolo Bonzini wrote:
> On 30/11/2015 18:34, Eduardo Habkost wrote:
> > target-i386/cpu.c:ext_save_area uses magic numbers for the xsave
> > area offsets and sizes, and target-i386/kvm.c:kvm_{put,get}_xsave()
> > uses offset macros and bit manipulation to access the xsave area.
> > This series changes both to use C structs for those operations.
> > 
> > I still need to figure out a way to write unit tests for the new
> > code. Maybe I will just copy and paste the new and old functions,
> > and test them locally (checking if they give the same results
> > when translating blobs of random bytes).
> > 
> > Changes v1 -> v2:
> > * Use uint8_t[8*n] instead of uint64_t[n] for register data
> > * Keep the QEMU_BUILD_BUG_ON lines
> > 
[...]
> > 
> > Eduardo Habkost (3):
> >   target-i386: Define structs for layout of xsave area
> >   target-i386: Use xsave structs for ext_save_area
> >   target-i386: kvm: Use X86XSaveArea struct for xsave save/load
> > 
> >  target-i386/cpu.c | 18 +++
> >  target-i386/cpu.h | 85 
> >  target-i386/kvm.c | 96 
> > +--
> >  3 files changed, 155 insertions(+), 44 deletions(-)
> > 
> 
> The patches are okay, are you going to rebase them on top of the PKRU
> patches?

I will probably redo the PKRU patches on top of this, to reduce
diff size.

-- 
Eduardo


Re: [PATCH v2 0/3] target-i386: Use C struct for xsave area layout, offsets & sizes

2015-12-01 Thread Paolo Bonzini


On 01/12/2015 16:25, Eduardo Habkost wrote:
> > I think it's easier to use small guests (i.e. kvm-unit-tests) to test
> > this code.
>
> I agree it's easier, but how likely is it to catch bugs in the
> save/load code? If the code corrupts a register, we need to
> trigger a save/load cycle at the exact moment the guest code is
> using that register. Do we have something that helps us
> repeatedly save/load CPU state while kvm-unit-tests is running?

A vmware magic port read should do that.  Put VMPORT_MAGIC in EAX and
VMPORT_CMD_GETVERSION in ECX, then do a 32-bit in from port 0x5658.
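
In guest code the sequence could look like the sketch below. Some
assumptions are involved: the numeric values of VMPORT_MAGIC and
VMPORT_CMD_GETVERSION are given as I recall them from QEMU's vmport
device and should be checked against the QEMU source, and struct
vmport_regs and vmport_getversion() are hypothetical helper names. The
IN instruction only makes sense inside a guest (for example a
kvm-unit-tests test); run from an ordinary host process it will fault.

```c
#include <assert.h>
#include <stdint.h>

#define VMPORT_PORT            0x5658u      /* ASCII "VX" */
#define VMPORT_MAGIC           0x564D5868u  /* ASCII "VMXh"; verify in QEMU */
#define VMPORT_CMD_GETVERSION  0x0Au        /* verify in QEMU */

/* Hypothetical container for the registers the port read may change. */
struct vmport_regs {
    uint32_t eax, ebx, ecx, edx;
};

/*
 * Guest-only helper: magic in EAX, command in ECX, then a 32-bit IN
 * from port 0x5658.  Returns 0 on x86, -1 on other architectures.
 */
static inline int vmport_getversion(struct vmport_regs *r)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t eax = VMPORT_MAGIC;
    uint32_t ebx = 0;
    uint32_t ecx = VMPORT_CMD_GETVERSION;
    uint32_t edx = VMPORT_PORT;

    __asm__ volatile("inl %%dx, %%eax"
                     : "+a"(eax), "+b"(ebx), "+c"(ecx), "+d"(edx)
                     :
                     : "memory");
    r->eax = eax;
    r->ebx = ebx;
    r->ecx = ecx;
    r->edx = edx;
    return 0;
#else
    (void)r;
    return -1;
#endif
}
```

A test could load known values into the vector registers, call
vmport_getversion() in a loop, and compare the registers afterwards.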

Paolo


Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-01 Thread Michael S. Tsirkin
On Tue, Dec 01, 2015 at 11:04:31PM +0800, Lan, Tianyu wrote:
> 
> 
> On 12/1/2015 12:07 AM, Alexander Duyck wrote:
> >They can only be corrected if the underlying assumptions are correct
> >and they aren't.  Your solution would have never worked correctly.
> >The problem is you assume you can keep the device running when you are
> >migrating and you simply cannot.  At some point you will always have
> >to stop the device in order to complete the migration, and you cannot
> >stop it before you have stopped your page tracking mechanism.  So
> >unless the platform has an IOMMU that is somehow taking part in the
> >dirty page tracking you will not be able to stop the guest and then
> >the device, it will have to be the device and then the guest.
> >
> >>>Doing suspend and resume() may make migration easy, but some
> >>>devices require low service downtime, especially network devices; I have heard
> >>>that some cloud companies promise less than 500ms of network service downtime.
> >Honestly focusing on the downtime is getting the cart ahead of the
> >horse.  First you need to be able to do this without corrupting system
> >memory and regardless of the state of the device.  You haven't even
> >gotten to that state yet.  Last I knew the device had to be up in
> >order for your migration to even work.
> 
> I think the issue is that the content of an rx packet delivered to the stack may
> change during migration because that piece of memory won't be migrated to the
> new machine. This may confuse applications or the stack. The current dummy write
> solution ensures that the content of the packet won't change after the dummy
> write, but the content may not be the received data if migration happens before
> that point. We can recheck the content via checksum or CRC in the protocol
> after the dummy write to ensure the content is what the VF received. I think the
> stack already does such checks, and the packet will be dropped if it fails
> the check.


Most people nowadays rely on hardware checksums so I don't think this can
fly.

> Another way is to tell QEMU about all the memory the driver is using and let QEMU
> migrate that memory after stopping the VCPU and the device. This seems safe, but
> the implementation may be complex.

Not really 100% safe.  See below.

I think hiding these details behind dma_* API does have
some appeal. In any case, it gives us a good
terminology as it covers what most drivers do.

There are several components to this:
- dma_map_* needs to prevent page from
  being migrated while device is running.
  For example, expose some kind of bitmap from guest
  to host, set bit there while page is mapped.
  What happens if we stop the guest and some
  bits are still set? See dma_alloc_coherent below
  for some ideas.


- dma_unmap_* needs to mark page as dirty
  This can be done by writing into a page.

- dma_sync_* needs to mark page as dirty
  This is trickier as we can not change the data.
  One solution is using atomics.
  For example:
int x = ACCESS_ONCE(*p);
cmpxchg(p, x, x);
  Seems to do a write without changing page
  contents.

- dma_alloc_coherent memory (e.g. device rings)
  must be migrated after device stopped modifying it.
  Just stopping the VCPU is not enough:
  you must make sure device is not changing it.

  Or maybe the device has some kind of ring flush operation,
  if there was a reasonably portable way to do this
  (e.g. a flush capability could maybe be added to SRIOV)
  then hypervisor could do this.

  With existing devices,
  either do it after device reset, or disable
  memory access in the IOMMU. Maybe both.

  In case you need to resume on source, you
  really need to follow the same path
  as on destination, preferably detecting
  device reset and restoring the device
  state.

  A similar approach could work for dma_map_ above.
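
The bitmap idea for dma_map_* and the "write without changing the
data" trick for dma_sync_* can be sketched in user space as below.
Note the substitutions: C11 atomics stand in for the kernel's
ACCESS_ONCE()/cmpxchg(), and model_pin_bitmap, model_dma_map() and
friends are hypothetical names, not an existing guest/host interface.
Whether a compare-and-swap that stores an unchanged value is enough to
mark a page dirty depends on the architecture and the dirty tracking in
use, so treat that part as an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define MODEL_NPAGES         256   /* hypothetical number of guest pages */
#define MODEL_BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical guest-to-host bitmap: bit set = page is DMA-mapped and
 * must not be migrated while the device is running. */
struct model_pin_bitmap {
    unsigned long bits[MODEL_NPAGES / MODEL_BITS_PER_LONG];
};

static void model_dma_map(struct model_pin_bitmap *b, size_t pfn)
{
    assert(pfn < MODEL_NPAGES);
    b->bits[pfn / MODEL_BITS_PER_LONG] |= 1UL << (pfn % MODEL_BITS_PER_LONG);
}

static void model_dma_unmap(struct model_pin_bitmap *b, size_t pfn)
{
    assert(pfn < MODEL_NPAGES);
    b->bits[pfn / MODEL_BITS_PER_LONG] &= ~(1UL << (pfn % MODEL_BITS_PER_LONG));
}

/* Non-zero if some page is still mapped (the "bits still set" case). */
static int model_any_mapped(const struct model_pin_bitmap *b)
{
    for (size_t i = 0; i < MODEL_NPAGES / MODEL_BITS_PER_LONG; i++) {
        if (b->bits[i])
            return 1;
    }
    return 0;
}

/* Store back the value just read.  If *p changed in between, the
 * compare fails and nothing is stored, so the contents stay intact
 * either way. */
static void model_touch_word(_Atomic int *p)
{
    int x = atomic_load_explicit(p, memory_order_relaxed);

    (void)atomic_compare_exchange_strong(p, &x, x);
}
```

The point of the compare-and-swap is that it never replaces data that
changed between the read and the write, which a plain "*p = *p" could.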

-- 
MST


Re: [PATCH v2 09/21] arm64: KVM: Implement guest entry

2015-12-01 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:03PM +, Marc Zyngier wrote:
> Contrary to the previous patch, the guest entry is fairly different
> from its assembly counterpart, mostly because it is only concerned
> with saving/restoring the GP registers, and nothing else.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile |   1 +
>  arch/arm64/kvm/hyp/entry.S  | 155 
> 
>  arch/arm64/kvm/hyp/hyp.h|   2 +
>  3 files changed, 158 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/entry.S
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index ec14cac..1e1ff06 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -7,3 +7,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
> +obj-$(CONFIG_KVM_ARM_HOST) += entry.o
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> new file mode 100644
> index 000..2c4449a
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -0,0 +1,155 @@
> +/*
> + * Copyright (C) 2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define CPU_GP_REG_OFFSET(x) (CPU_GP_REGS + x)
> +#define CPU_XREG_OFFSET(x)   CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x)
> +
> + .text
> + .pushsection .hyp.text, "ax"
> +
> +.macro save_common_regs ctxt
> + stp x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)]
> + stp x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)]
> + stp x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)]
> + stp x25, x26, [\ctxt, #CPU_XREG_OFFSET(25)]
> + stp x27, x28, [\ctxt, #CPU_XREG_OFFSET(27)]
> + stp x29, lr,  [\ctxt, #CPU_XREG_OFFSET(29)]
> +.endm
> +
> +.macro restore_common_regs ctxt
> + ldp x19, x20, [\ctxt, #CPU_XREG_OFFSET(19)]
> + ldp x21, x22, [\ctxt, #CPU_XREG_OFFSET(21)]
> + ldp x23, x24, [\ctxt, #CPU_XREG_OFFSET(23)]
> + ldp x25, x26, [\ctxt, #CPU_XREG_OFFSET(25)]
> + ldp x27, x28, [\ctxt, #CPU_XREG_OFFSET(27)]
> + ldp x29, lr,  [\ctxt, #CPU_XREG_OFFSET(29)]
> +.endm
> +
> +.macro save_host_regs reg
> + save_common_regs \reg
> +.endm
> +
> +.macro restore_host_regs reg
> + restore_common_regs \reg
> +.endm
> +
> +.macro save_guest_regs
> + // x0 is the vcpu address
> + // x1 is the return code, do not corrupt!
> + // x2 is the cpu context

this is confusing because the caller says x2 is free, so are these the
inputs or invariants preserved in the function, or?

note that you'll avoid this kind of confusion by inlining this stuff in
__guest_exit.

> + // x3 is a tmp register
> + // Guest's x0-x3 are on the stack
> +
> + add x2, x0, #VCPU_CONTEXT
> +
> + // Compute base to save registers

misleading comment?

> + stp x4, x5,   [x2, #CPU_XREG_OFFSET(4)]
> + stp x6, x7,   [x2, #CPU_XREG_OFFSET(6)]
> + stp x8, x9,   [x2, #CPU_XREG_OFFSET(8)]
> + stp x10, x11, [x2, #CPU_XREG_OFFSET(10)]
> + stp x12, x13, [x2, #CPU_XREG_OFFSET(12)]
> + stp x14, x15, [x2, #CPU_XREG_OFFSET(14)]
> + stp x16, x17, [x2, #CPU_XREG_OFFSET(16)]
> + str x18,  [x2, #CPU_XREG_OFFSET(18)]
> +
> + pop x6, x7  // x2, x3
> + pop x4, x5  // x0, x1

hard to review when I haven't seen the code that calls this, but I'll
assume we store things in register order on the stack.

> +
> + stp x4, x5, [x2, #CPU_XREG_OFFSET(0)]
> + stp x6, x7, [x2, #CPU_XREG_OFFSET(2)]
> +
> + save_common_regs x2
> +.endm
> +
> +.macro restore_guest_regs
> + // Assume vcpu in x0, clobbers everything else

nit: clobbers everything (x0 gets nuked too)

> +
> + add x2, x0, #VCPU_CONTEXT
> +
> + // Prepare x0-x3 for later restore
> + ldp x4, x5, [x2, #CPU_XREG_OFFSET(0)]
> + ldp x6, x7, [x2, #CPU_XREG_OFFSET(2)]
> + push x4, x5  // Push x0-x3 on the stack
> + push x6, x7

why do you need x2 and x3 later? can't you just make do with x0 and x1
and move the cpu context pointer to x1 ?

> +
> + // x4-x18
> + 

Re: [PATCH v2 0/3] target-i386: Use C struct for xsave area layout, offsets & sizes

2015-12-01 Thread Eduardo Habkost
On Tue, Dec 01, 2015 at 04:09:44PM +0100, Paolo Bonzini wrote:
> 
> 
> On 30/11/2015 18:34, Eduardo Habkost wrote:
> > target-i386/cpu.c:ext_save_area uses magic numbers for the xsave
> > area offsets and sizes, and target-i386/kvm.c:kvm_{put,get}_xsave()
> > uses offset macros and bit manipulation to access the xsave area.
> > This series changes both to use C structs for those operations.
> > 
> > I still need to figure out a way to write unit tests for the new
> > code. Maybe I will just copy and paste the new and old functions,
> > and test them locally (checking if they give the same results
> > when translating blobs of random bytes).
> 
> I think it's easier to use small guests (i.e. kvm-unit-tests) to test
> this code.

I agree it's easier, but how likely is it to catch bugs in the
save/load code? If the code corrupts a register, we need to
trigger a save/load cycle at the exact moment the guest code is
using that register. Do we have something that helps us
repeatedly save/load CPU state while kvm-unit-tests is running?
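
As a rough illustration of the local differential test mentioned in the
cover letter (this is not code from the series): the sketch below reads
fields from blobs of random bytes twice, once through offset macros and
once through a C struct, and counts disagreements. The layout, struct
demo_area, and every function and macro name here are made up for the
example; the real XSAVE area is larger and laid out differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 16-byte layout, NOT the real XSAVE area. */
struct demo_area {
	uint16_t fcw;
	uint16_t fsw;
	uint32_t pad;
	uint64_t rip;
};

/* "Old style" accessors: hand-maintained offsets. */
#define DEMO_FSW_OFFSET 2
#define DEMO_RIP_OFFSET 8

/* Catch layout drift at compile time (C11). */
_Static_assert(offsetof(struct demo_area, fsw) == DEMO_FSW_OFFSET, "fsw offset");
_Static_assert(offsetof(struct demo_area, rip) == DEMO_RIP_OFFSET, "rip offset");

static uint64_t get_rip_by_offset(const uint8_t *blob)
{
	uint64_t v;

	memcpy(&v, blob + DEMO_RIP_OFFSET, sizeof(v));
	return v;
}

static uint64_t get_rip_by_struct(const uint8_t *blob)
{
	struct demo_area a;

	memcpy(&a, blob, sizeof(a));
	return a.rip;
}

static uint16_t get_fsw_by_offset(const uint8_t *blob)
{
	uint16_t v;

	memcpy(&v, blob + DEMO_FSW_OFFSET, sizeof(v));
	return v;
}

static uint16_t get_fsw_by_struct(const uint8_t *blob)
{
	struct demo_area a;

	memcpy(&a, blob, sizeof(a));
	return a.fsw;
}

/* Feed both accessor families random blobs; return how often they differ. */
static int count_mismatches(unsigned int seed, int rounds)
{
	uint8_t blob[sizeof(struct demo_area)];
	int bad = 0;

	srand(seed);
	for (int r = 0; r < rounds; r++) {
		for (size_t i = 0; i < sizeof(blob); i++)
			blob[i] = (uint8_t)(rand() & 0xff);
		if (get_rip_by_offset(blob) != get_rip_by_struct(blob))
			bad++;
		if (get_fsw_by_offset(blob) != get_fsw_by_struct(blob))
			bad++;
	}
	return bad;
}
```

This only shows that two translations agree with each other on arbitrary
input. It says nothing about a save/load cycle racing with guest code,
which is the concern raised above.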

-- 
Eduardo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 10/21] arm64: KVM: Add patchable function selector

2015-12-01 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:04PM +, Marc Zyngier wrote:
> KVM so far relies on code patching, and is likely to use it more
> in the future. The main issue is that our alternative system works
> at the instruction level, while we'd like to have alternatives at
> the function level.
> 
> In order to cope with this, add the "hyp_alternate_select" macro that
> outputs a brief sequence of code that in turn can be patched, allowing
> al alternative function to be selected.

s/al/an/ ?

> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/hyp.h | 16 
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 7ac8e11..f0427ee 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -27,6 +27,22 @@
>  
>  #define kern_hyp_va(v) (typeof(v))((unsigned long)v & HYP_PAGE_OFFSET_MASK)
>  
> +/*
> + * Generates patchable code sequences that are used to switch between
> + * two implementations of a function, depending on the availability of
> + * a feature.
> + */

This looks right to me, but I'm a bit unclear on what the types involved
are and how to use it.

Are orig and alt function pointers, and is cond a CONFIG_FOO?  Is fname
a symbol that is declared as a prototype somewhere and then implemented
here?

Perhaps a Usage: part of the docs would be helpful.
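
A minimal userspace sketch of the calling pattern such a Usage: section
might describe, under loud assumptions: a plain runtime branch stands in
for the patched nop/mov pair, and cpu_has_feature, alternate_select and
the save_state_* functions are hypothetical names, not kernel symbols.
It relies on the GCC/Clang __typeof__ extension.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a CPU capability; the kernel would resolve
 * this once by patching the instruction stream instead of branching. */
static bool cpu_has_feature;

/*
 * Userspace analogue of a function selector: defines fname(), which
 * returns a pointer to either orig or alt. Both must have the same type.
 */
#define alternate_select(fname, orig, alt, cond)		\
static __typeof__(orig) *fname(void)				\
{								\
	return (cond) ? (alt) : (orig);				\
}

static int save_state_v2(int x) { return x + 2; }
static int save_state_v3(int x) { return x + 3; }

/* Defines: static int (*call_save_state(void))(int) */
alternate_select(call_save_state, save_state_v2, save_state_v3,
		 cpu_has_feature)

static int save_state(int x)
{
	/* First call picks the implementation, second call runs it. */
	return call_save_state()(x);
}
```

So in this reading orig and alt are function names of identical type,
fname is the name of the generated selector, and the caller writes
fname()(args).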


> +#define hyp_alternate_select(fname, orig, alt, cond) \
> +typeof(orig) * __hyp_text fname(void) \
> +{\
> + typeof(alt) *val = orig;\
> + asm volatile(ALTERNATIVE("nop   \n",\
> +  "mov   %0, %1  \n",\
> +  cond)  \
> +  : "+r" (val) : "r" (alt)); \
> + return val; \
> +}
> +
>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>  
> -- 
> 2.1.4
> 

I haven't thought much about how all of this is implemented, but from my
point of view the ideal situation would be something like:

void foo(int a, int b)
{
ALTERNATIVE_IF_NOT CONFIG_BAR
foo_legacy(a, b);
ALTERNATIVE_ELSE
foo_new(a, b);
ALTERNATIVE_END
}

I realize this may be impossible because the C code could implement all
sorts of fun stuff around the actual function calls, but would there be
some way to annotate the functions and find the actual branch statement
and change the target?

Apologies if this question is just outright ridiculous.

-Christoffer


Re: [PATCH v2 08/21] arm64: KVM: Implement debug save/restore

2015-12-01 Thread Christoffer Dall
On Tue, Dec 01, 2015 at 03:01:16PM +, Marc Zyngier wrote:
> On 01/12/15 14:47, Christoffer Dall wrote:
> > On Tue, Dec 01, 2015 at 01:06:31PM +, Marc Zyngier wrote:
> >> On 01/12/15 12:56, Christoffer Dall wrote:
> >>> On Fri, Nov 27, 2015 at 06:50:02PM +, Marc Zyngier wrote:
>  Implement the debug save restore as a direct translation of
>  the assembly code version.
> 
>  Signed-off-by: Marc Zyngier 
>  ---
>   arch/arm64/kvm/hyp/Makefile   |   1 +
>   arch/arm64/kvm/hyp/debug-sr.c | 130 
>  ++
>   arch/arm64/kvm/hyp/hyp.h  |   9 +++
>   3 files changed, 140 insertions(+)
>   create mode 100644 arch/arm64/kvm/hyp/debug-sr.c
> 
>  diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>  index ec94200..ec14cac 100644
>  --- a/arch/arm64/kvm/hyp/Makefile
>  +++ b/arch/arm64/kvm/hyp/Makefile
>  @@ -6,3 +6,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>   obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>   obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
>   obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
>  +obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
>  diff --git a/arch/arm64/kvm/hyp/debug-sr.c 
>  b/arch/arm64/kvm/hyp/debug-sr.c
>  new file mode 100644
>  index 000..a0b2b99
>  --- /dev/null
>  +++ b/arch/arm64/kvm/hyp/debug-sr.c
>  @@ -0,0 +1,130 @@
>  +/*
>  + * Copyright (C) 2015 - ARM Ltd
>  + * Author: Marc Zyngier 
>  + *
>  + * This program is free software; you can redistribute it and/or modify
>  + * it under the terms of the GNU General Public License version 2 as
>  + * published by the Free Software Foundation.
>  + *
>  + * This program is distributed in the hope that it will be useful,
>  + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>  + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>  + * GNU General Public License for more details.
>  + *
>  + * You should have received a copy of the GNU General Public License
>  + * along with this program.  If not, see .
>  + */
>  +
>  +#include 
>  +#include 
>  +
>  +#include 
>  +
>  +#include "hyp.h"
>  +
>  +#define read_debug(r,n) read_sysreg(r##n##_el1)
>  +#define write_debug(v,r,n)  write_sysreg(v, r##n##_el1)
>  +
>  +#define save_debug(ptr,reg,nr)  \
>  +switch (nr) {  \
>  +case 15: ptr[15] = read_debug(reg, 15);  \
>  +case 14: ptr[14] = read_debug(reg, 14);  \
>  +case 13: ptr[13] = read_debug(reg, 13);  \
>  +case 12: ptr[12] = read_debug(reg, 12);  \
>  +case 11: ptr[11] = read_debug(reg, 11);  \
>  +case 10: ptr[10] = read_debug(reg, 10);  \
>  +case 9: ptr[9] = read_debug(reg, 9);  \
>  +case 8: ptr[8] = read_debug(reg, 8);  \
>  +case 7: ptr[7] = read_debug(reg, 7);  \
>  +case 6: ptr[6] = read_debug(reg, 6);  \
>  +case 5: ptr[5] = read_debug(reg, 5);  \
>  +case 4: ptr[4] = read_debug(reg, 4);  \
>  +case 3: ptr[3] = read_debug(reg, 3);  \
>  +case 2: ptr[2] = read_debug(reg, 2);  \
>  +case 1: ptr[1] = read_debug(reg, 1);  \
>  +default: ptr[0] = read_debug(reg, 0);  \
>  +}
>  +
>  +#define restore_debug(ptr,reg,nr)  \
>  +switch (nr) {  \
>  +case 15: write_debug(ptr[15], reg, 15);  \
>  +case 14: write_debug(ptr[14], reg, 14);  \
>  +case 13: write_debug(ptr[13], reg, 13);  \
>  +case 12: write_debug(ptr[12], reg, 12);  \
>  +case 11: write_debug(ptr[11], reg, 11);  \
>  +case 10: write_debug(ptr[10], reg, 10);  \
>  +case 9: write_debug(ptr[9], reg, 9);  \
>  +case 8:
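
The save_debug/restore_debug macros quoted above rely on deliberate
switch fall-through: entering at case nr copies register nr and then
every lower-numbered one down to 0. A cut-down userspace sketch of the
same technique, assuming four registers and an array (fake_hw) in place
of real system register reads; all names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define NR_DEMO_REGS 4

/* Stand-in for the hardware registers; real code would use mrs/msr. */
static uint64_t fake_hw[NR_DEMO_REGS];

/* Save registers nr..0 into ptr[], highest first, via fall-through. */
static void save_regs(uint64_t *ptr, int nr)
{
	assert(nr >= 0 && nr < NR_DEMO_REGS);

	switch (nr) {
	case 3:
		ptr[3] = fake_hw[3];
		/* fall through */
	case 2:
		ptr[2] = fake_hw[2];
		/* fall through */
	case 1:
		ptr[1] = fake_hw[1];
		/* fall through */
	default:
		ptr[0] = fake_hw[0];
	}
}
```

A loop would be simpler here, but in the quoted macros the register
name is pasted together at compile time (r##n##_el1), so each access
needs a literal index, which the unrolled switch provides.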

Re: [PATCH v4 18/21] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-01 Thread Marc Zyngier
On 01/12/15 15:13, Shannon Zhao wrote:
> 
> 
> On 2015/12/1 22:50, Marc Zyngier wrote:
>> On 01/12/15 14:35, Shannon Zhao wrote:
>>>
>>>
>>> On 2015/12/1 2:22, Marc Zyngier wrote:
 On Fri, 30 Oct 2015 14:22:00 +0800
 Shannon Zhao  wrote:

> From: Shannon Zhao 
>
> When calling perf_event_create_kernel_counter to create perf_event,
> assign a overflow handler. Then when perf event overflows, set
> irq_pending and call kvm_vcpu_kick() to sync the interrupt.
>
> Signed-off-by: Shannon Zhao 
> ---
>   arch/arm/kvm/arm.c|  4 +++
>   include/kvm/arm_pmu.h |  4 +++
>   virt/kvm/arm/pmu.c| 76 
> ++-
>   3 files changed, 83 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 78b2869..9c0fec4 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -28,6 +28,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>
>   #define CREATE_TRACE_POINTS
>   #include "trace.h"
> @@ -551,6 +552,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
> struct kvm_run *run)
>
>   if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>   local_irq_enable();
> + kvm_pmu_sync_hwstate(vcpu);

 This is very weird. Are you only injecting interrupts when a signal is
 pending? I don't understand how this works...

>   kvm_vgic_sync_hwstate(vcpu);
>   preempt_enable();
>   kvm_timer_sync_hwstate(vcpu);
> @@ -598,6 +600,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
> struct kvm_run *run)
>   kvm_guest_exit();
>   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), 
> *vcpu_pc(vcpu));
>
> + kvm_pmu_post_sync_hwstate(vcpu);
> +
>   kvm_vgic_sync_hwstate(vcpu);
>
>   preempt_enable();
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index acd025a..5e7f943 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -39,6 +39,8 @@ struct kvm_pmu {
>   };
>
>   #ifdef CONFIG_KVM_ARM_PMU
> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
> +void kvm_pmu_post_sync_hwstate(struct kvm_vcpu *vcpu);

 Please follow the current terminology: _flush_ on VM entry, _sync_ on
 VM exit.
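
To make the naming convention concrete, here is a small sketch that
only records the order of calls around one guest run. The hook names
and the trace mechanism are hypothetical; the point is simply that a
flush hook runs before entry and a sync hook runs after exit.

```c
#include <assert.h>

enum step { PMU_FLUSH, GUEST_ENTER, GUEST_EXIT, PMU_SYNC };

#define TRACE_MAX 8
static enum step trace[TRACE_MAX];
static int trace_len;

static void record(enum step s)
{
	assert(trace_len < TRACE_MAX);
	trace[trace_len++] = s;
}

/* Hypothetical hooks named after the flush/sync convention. */
static void pmu_flush_hwstate(void) { record(PMU_FLUSH); }
static void pmu_sync_hwstate(void)  { record(PMU_SYNC); }

static void run_guest_once(void)
{
	trace_len = 0;

	pmu_flush_hwstate();	/* VM entry side: prepare state for the guest */
	record(GUEST_ENTER);
	record(GUEST_EXIT);
	pmu_sync_hwstate();	/* VM exit side: collect state from the run */
}
```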

>>>
>>> Hi Marc,
>>>
>>> Is below patch the right way for this?
>>>
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index 78b2869..84008d1 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -28,6 +28,7 @@
>>>   #include 
>>>   #include 
>>>   #include 
>>> +#include 
>>>
>>>   #define CREATE_TRACE_POINTS
>>>   #include "trace.h"
>>> @@ -531,6 +532,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
>>> struct kvm_run *run)
>>>   */
>>>  kvm_timer_flush_hwstate(vcpu);
>>>
>>> +   kvm_pmu_flush_hwstate(vcpu);
>>> +
>>>  /*
>>>   * Preparing the interrupts to be injected also
>>>   * involves poking the GIC, which must be done in a
>>> @@ -554,6 +557,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
>>> struct kvm_run *run)
>>>  kvm_vgic_sync_hwstate(vcpu);
>>>  preempt_enable();
>>>  kvm_timer_sync_hwstate(vcpu);
>>> +   kvm_pmu_sync_hwstate(vcpu);
>>>  continue;
>>>  }
>>>
>>> @@ -604,6 +608,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
>>> struct kvm_run *run)
>>>
>>>  kvm_timer_sync_hwstate(vcpu);
>>>
>>> +   kvm_pmu_sync_hwstate(vcpu);
>>> +
>>>  ret = handle_exit(vcpu, run, ret);
>>>  }
>>
>> yeah, that's more like it!
>>
>>>
>>> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
>>> index 47bbd43..edfe4e5 100644
>>> --- a/include/kvm/arm_pmu.h
>>> +++ b/include/kvm/arm_pmu.h
>>> @@ -41,6 +41,8 @@ struct kvm_pmu {
>>>   };
>>>
>>>   #ifdef CONFIG_KVM_ARM_PMU
>>> +void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
>>> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
>>>   unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32
>>> select_idx);
>>>   void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
>>>   void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool
>>> all_enable);
>>> @@ -51,6 +53,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu
>>> *vcpu, u32 data,
>>>  u32 select_idx);
>>>   void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val);
>>>   #else
>>> +void 

Re: [PATCH v2 07/21] arm64: KVM: Implement 32bit system register save/restore

2015-12-01 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:01PM +, Marc Zyngier wrote:
> Implement the 32bit system register save restore as a direct
> translation of the assembly code version.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/hyp.h   |  2 ++
>  arch/arm64/kvm/hyp/sysreg-sr.c | 41 +
>  2 files changed, 43 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 087d3a5..4639330 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -38,6 +38,8 @@ void __timer_restore_state(struct kvm_vcpu *vcpu);
>  
>  void __sysreg_save_state(struct kvm_cpu_context *ctxt);
>  void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
> +void __sysreg32_save_state(struct kvm_vcpu *vcpu);
> +void __sysreg32_restore_state(struct kvm_vcpu *vcpu);
>  
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> index add8fcb..3f81a4d 100644
> --- a/arch/arm64/kvm/hyp/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -88,3 +88,44 @@ void __hyp_text __sysreg_restore_state(struct 
> kvm_cpu_context *ctxt)
>   write_sysreg(ctxt->gp_regs.elr_el1, elr_el1);
>   write_sysreg(ctxt->gp_regs.spsr[KVM_SPSR_EL1], spsr_el1);
>  }
> +
> +void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
> +{
> + if (!(read_sysreg(hcr_el2) & HCR_RW)) {

nit: I would probably invert the check and return early.
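
Roughly what the inverted check could look like, as a standalone sketch
rather than a replacement hunk. The register value is passed in as a
plain argument, and struct demo_state and save_32bit_state are
hypothetical names; only HCR_RW (bit 31) mirrors the real definition.

```c
#include <stdbool.h>
#include <stdint.h>

#define HCR_RW (UINT64_C(1) << 31)	/* set when guest EL1 is AArch64 */

struct demo_state {
	uint64_t spsr_abt;
	bool saved;
};

static void save_32bit_state(uint64_t hcr, uint64_t hw_spsr_abt,
			     struct demo_state *st)
{
	/* 64-bit guest: there is no 32-bit state to save. */
	if (hcr & HCR_RW)
		return;

	st->spsr_abt = hw_spsr_abt;
	st->saved = true;
}
```

The body then loses one level of indentation, which is the whole
benefit.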

> + u64 *spsr = vcpu->arch.ctxt.gp_regs.spsr;
> + u64 *sysreg = vcpu->arch.ctxt.sys_regs;
> +
> + spsr[KVM_SPSR_ABT] = read_sysreg(spsr_abt);
> + spsr[KVM_SPSR_UND] = read_sysreg(spsr_und);
> + spsr[KVM_SPSR_IRQ] = read_sysreg(spsr_irq);
> + spsr[KVM_SPSR_FIQ] = read_sysreg(spsr_fiq);
> +
> + sysreg[DACR32_EL2] = read_sysreg(dacr32_el2);
> + sysreg[IFSR32_EL2] = read_sysreg(ifsr32_el2);
> +
> + if (!(read_sysreg(cptr_el2) & CPTR_EL2_TFP))
> + sysreg[FPEXC32_EL2] = read_sysreg(fpexc32_el2);
> +
> + if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
> + sysreg[DBGVCR32_EL2] = read_sysreg(dbgvcr32_el2);
> + }
> +}
> +
> +void __hyp_text __sysreg32_restore_state(struct kvm_vcpu *vcpu)
> +{
> + if (!(read_sysreg(hcr_el2) & HCR_RW)) {

same here

> + u64 *spsr = vcpu->arch.ctxt.gp_regs.spsr;
> + u64 *sysreg = vcpu->arch.ctxt.sys_regs;
> +
> + write_sysreg(spsr[KVM_SPSR_ABT], spsr_abt);
> + write_sysreg(spsr[KVM_SPSR_UND], spsr_und);
> + write_sysreg(spsr[KVM_SPSR_IRQ], spsr_irq);
> + write_sysreg(spsr[KVM_SPSR_FIQ], spsr_fiq);
> +  

nit: white space

> + write_sysreg(sysreg[DACR32_EL2], dacr32_el2);
> + write_sysreg(sysreg[IFSR32_EL2], ifsr32_el2);
> +
> + if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY)
> + write_sysreg(sysreg[DBGVCR32_EL2], dbgvcr32_el2);
> + }
> +}
> -- 
> 2.1.4
> 

Otherwise:

Reviewed-by: Christoffer Dall 


Re: [PATCH v2 06/21] arm64: KVM: Implement system register save/restore

2015-12-01 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:00PM +, Marc Zyngier wrote:
> Implement the system registe save restore as a direct translation of

nit: s/registe/register/

> the assembly code version.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile|  1 +
>  arch/arm64/kvm/hyp/hyp.h   |  3 ++
>  arch/arm64/kvm/hyp/sysreg-sr.c | 90 
> ++
>  3 files changed, 94 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/sysreg-sr.c
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index 455dc0a..ec94200 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -5,3 +5,4 @@
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
> +obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 86aa5a2..087d3a5 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -36,5 +36,8 @@ void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>  void __timer_save_state(struct kvm_vcpu *vcpu);
>  void __timer_restore_state(struct kvm_vcpu *vcpu);
>  
> +void __sysreg_save_state(struct kvm_cpu_context *ctxt);
> +void __sysreg_restore_state(struct kvm_cpu_context *ctxt);
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/sysreg-sr.c b/arch/arm64/kvm/hyp/sysreg-sr.c
> new file mode 100644
> index 000..add8fcb
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/sysreg-sr.c
> @@ -0,0 +1,90 @@
> +/*
> + * Copyright (C) 2012-2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +
> +/* ctxt is already in the HYP VA space */
> +void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
> +{
> + ctxt->sys_regs[MPIDR_EL1]   = read_sysreg(vmpidr_el2);
> + ctxt->sys_regs[CSSELR_EL1]  = read_sysreg(csselr_el1);
> + ctxt->sys_regs[SCTLR_EL1]   = read_sysreg(sctlr_el1);
> + ctxt->sys_regs[ACTLR_EL1]   = read_sysreg(actlr_el1);
> + ctxt->sys_regs[CPACR_EL1]   = read_sysreg(cpacr_el1);
> + ctxt->sys_regs[TTBR0_EL1]   = read_sysreg(ttbr0_el1);
> + ctxt->sys_regs[TTBR1_EL1]   = read_sysreg(ttbr1_el1);
> + ctxt->sys_regs[TCR_EL1] = read_sysreg(tcr_el1);
> + ctxt->sys_regs[ESR_EL1] = read_sysreg(esr_el1);
> + ctxt->sys_regs[AFSR0_EL1]   = read_sysreg(afsr0_el1);
> + ctxt->sys_regs[AFSR1_EL1]   = read_sysreg(afsr1_el1);
> + ctxt->sys_regs[FAR_EL1] = read_sysreg(far_el1);
> + ctxt->sys_regs[MAIR_EL1]= read_sysreg(mair_el1);
> + ctxt->sys_regs[VBAR_EL1]= read_sysreg(vbar_el1);
> + ctxt->sys_regs[CONTEXTIDR_EL1]  = read_sysreg(contextidr_el1);
> + ctxt->sys_regs[TPIDR_EL0]   = read_sysreg(tpidr_el0);
> + ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0);
> + ctxt->sys_regs[TPIDR_EL1]   = read_sysreg(tpidr_el1);
> + ctxt->sys_regs[AMAIR_EL1]   = read_sysreg(amair_el1);
> + ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg(cntkctl_el1);
> + ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1);
> + ctxt->sys_regs[MDSCR_EL1]   = read_sysreg(mdscr_el1);
> +
> + ctxt->gp_regs.regs.sp   = read_sysreg(sp_el0);
> + ctxt->gp_regs.regs.pc   = read_sysreg(elr_el2);
> + ctxt->gp_regs.regs.pstate   = read_sysreg(spsr_el2);
> + ctxt->gp_regs.sp_el1= read_sysreg(sp_el1);
> + ctxt->gp_regs.elr_el1   = read_sysreg(elr_el1);
> + ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg(spsr_el1);
> +}
> +
> +void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
> +{
> + write_sysreg(ctxt->sys_regs[MPIDR_EL1],   vmpidr_el2);
> + write_sysreg(ctxt->sys_regs[CSSELR_EL1],  csselr_el1);
> + write_sysreg(ctxt->sys_regs[SCTLR_EL1],   sctlr_el1);
> + write_sysreg(ctxt->sys_regs[ACTLR_EL1],   actlr_el1);
> + write_sysreg(ctxt->sys_regs[CPACR_EL1],   cpacr_el1);
> + write_sysreg(ctxt->sys_regs[TTBR0_EL1],   ttbr0_el1);
> + write_sysreg(ctxt->sys_regs[TTBR1_EL1],   ttbr1_el1);
> + write_sysreg(ctxt->sys_regs[TCR_EL1], tcr_el1);
> + write_sysreg(ctxt->sys_regs[ESR_EL1], 

Re: [PATCH v2 11/21] arm64: KVM: Implement the core world switch

2015-12-01 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:05PM +, Marc Zyngier wrote:
> Implement the core of the world switch in C. Not everything is there
> yet, and there is nothing to re-enter the world switch either.
> 
> But this already outlines the code structure well enough.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile |   1 +
>  arch/arm64/kvm/hyp/switch.c | 134 
> 
>  2 files changed, 135 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/switch.c
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index 1e1ff06..9c11b0f 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -8,3 +8,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += entry.o
> +obj-$(CONFIG_KVM_ARM_HOST) += switch.o
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> new file mode 100644
> index 000..d67ed9e
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -0,0 +1,134 @@
> +/*
> + * Copyright (C) 2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include "hyp.h"
> +
> +static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
> +{
> + u64 val;
> +
> + /*
> +  * We are about to set CPTR_EL2.TFP to trap all floating point
> +  * register accesses to EL2, however, the ARM ARM clearly states that
> +  * traps are only taken to EL2 if the operation would not otherwise
> +  * trap to EL1.  Therefore, always make sure that for 32-bit guests,
> +  * we set FPEXC.EN to prevent traps to EL1, when setting the TFP bit.
> +  */
> + val = vcpu->arch.hcr_el2;
> + if (!(val & HCR_RW)) {
> + write_sysreg(1 << 30, fpexc32_el2);
> + isb();
> + }
> + write_sysreg(val, hcr_el2);

> + write_sysreg(1 << 15, hstr_el2);

can you add a comment on this 1 << 15 stuff or use a define or something
to remind mushy-brained people like myself that this is about trapping
something we care about?
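
Something along these lines, perhaps. The macro names are hypothetical
(the patch uses bare shifts), and the comments reflect my understanding
that bit 15 of HSTR_EL2 is the T15 trap bit for AArch32 CP15 accesses
with CRn == 15, and that bit 30 of FPEXC is EN, as the comment in the
patch itself says.

```c
#include <stdint.h>

/* Hypothetical names for the magic numbers in __activate_traps(). */
#define HSTR_EL2_T(n)	(UINT64_C(1) << (n))	/* trap AArch32 CP15, CRn == n */
#define FPEXC32_EN	(UINT64_C(1) << 30)	/* FPEXC.EN */

static uint64_t hstr_trap_mask(void)
{
	/* Only the c15 space is trapped to EL2 in this sketch. */
	return HSTR_EL2_T(15);
}
```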

> + write_sysreg(CPTR_EL2_TTA | CPTR_EL2_TFP, cptr_el2);
> + write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
> +}
> +
> +static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
> +{
> + write_sysreg(HCR_RW, hcr_el2);
> + write_sysreg(0, hstr_el2);
> + write_sysreg(read_sysreg(mdcr_el2) & MDCR_EL2_HPMN_MASK, mdcr_el2);
> + write_sysreg(0, cptr_el2);
> +}
> +
> +static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
> +{
> + struct kvm *kvm = kern_hyp_va(vcpu->kvm);
> + write_sysreg(kvm->arch.vttbr, vttbr_el2);
> +}
> +
> +static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
> +{
> + write_sysreg(0, vttbr_el2);
> +}
> +
> +static hyp_alternate_select(__vgic_call_save_state,
> + __vgic_v2_save_state, __vgic_v3_save_state,
> + ARM64_HAS_SYSREG_GIC_CPUIF);
> +
> +static hyp_alternate_select(__vgic_call_restore_state,
> + __vgic_v2_restore_state, __vgic_v3_restore_state,
> + ARM64_HAS_SYSREG_GIC_CPUIF);
> +
> +static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
> +{
> + __vgic_call_save_state()(vcpu);
> + write_sysreg(read_sysreg(hcr_el2) & ~HCR_INT_OVERRIDE, hcr_el2);
> +}
> +
> +static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
> +{
> + u64 val;
> +
> + val = read_sysreg(hcr_el2);
> + val |=  HCR_INT_OVERRIDE;
> + val |= vcpu->arch.irq_lines;
> + write_sysreg(val, hcr_el2);
> +
> + __vgic_call_restore_state()(vcpu);
> +}
> +
> +int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_cpu_context *host_ctxt;
> + struct kvm_cpu_context *guest_ctxt;
> + u64 exit_code;
> +
> + vcpu = kern_hyp_va(vcpu);
> + write_sysreg(vcpu, tpidr_el2);
> +
> + host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> + guest_ctxt = &vcpu->arch.ctxt;
> +
> + __sysreg_save_state(host_ctxt);
> + __debug_cond_save_host_state(vcpu);
> +
> + __activate_traps(vcpu);
> + __activate_vm(vcpu);
> +
> + __vgic_restore_state(vcpu);
> + __timer_restore_state(vcpu);
> +
> + /*
> +  * We must restore the 32-bit state before the sysregs, thanks
> + 

Re: [PATCH v5 2/2] KVM: Make KVM_CAP_IRQFD dependent on KVM_CAP_IRQCHIP

2015-12-01 Thread Paolo Bonzini


On 30/11/2015 15:38, Cornelia Huck wrote:
> It obviously
> requires an irqchip; but if you need some configuration/enablement
> beforehand, you'll get different values depending on when you retrieve
> the cap. So does KVM_CAP_IRQFD mean "irqfds are available in principle"
> or "everything has been setup for usage of irqfds"? I'd assume the
> former.

It should be the former, yes.

Paolo


[PATCH v6] arm/arm64: KVM: Detect vGIC presence at runtime

2015-12-01 Thread Pavel Fedin
Before commit 662d9715840aef44dcb573b0f9fab9e8319c868a
("arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}") it was possible to
compile the kernel without vGIC and vTimer support. That commit's message
mentions the possibility of detecting vGIC support at runtime, but this
was never implemented.

This patch introduces a runtime check, restoring the lost functionality.
It again allows KVM to be used on hardware without vGIC. In this case the
interrupt controller has to be emulated in userspace.

An -ENODEV return code from the probe function means there is no GIC at
all. -ENXIO happens when, for example, there is a GIC node in the device
tree but it does not specify vGIC resources. Normally this means that the
vGIC hardware is defunct. Any other error code is still treated as a full
stop because it might indicate some really serious problem.
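
The error classification described above, pulled out into a standalone
sketch so it can be read (and tested) without the surrounding init code.
The function name handle_vgic_probe_result is hypothetical; in the patch
the same switch sits inline in init_hyp_mode().

```c
#include <errno.h>
#include <stdbool.h>

static bool vgic_present;

/*
 * Classify the probe result. Returns 0 if initialization should
 * continue (with or without a vGIC), or the negative errno that
 * should abort it.
 */
static int handle_vgic_probe_result(int err)
{
	switch (err) {
	case 0:
		vgic_present = true;
		return 0;
	case -ENODEV:	/* no GIC at all */
	case -ENXIO:	/* GIC node present, but no vGIC resources */
		vgic_present = false;
		return 0;
	default:	/* anything else is treated as fatal */
		return err;
	}
}
```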

This patch does not touch any virtual timer code, assuming that timer
hardware is actually in place. Normally this is true on the boards in
question; however, since vGIC is missing, it is impossible to correctly
utilize interrupts from the virtual timer. Since virtual timer handling is
being actively redeveloped, handling it in userspace is out of scope at
the moment. For now, the guest is advised to use some memory-mapped
timer which can be emulated in userspace.

Signed-off-by: Pavel Fedin 
---
v5 => v6:
- KVM_CAP_IRQFD patch also dropped, since it caused many problems on
  PowerPC and S390
- Rebased on top of 4.3-rc3

v4 => v5:
- Tested on top of kvmarm/next
- Dropped already applied part
- Fixed minor checkpatch issues

v3 => v4:
- Revert back to using switch on kvm_vgic_hyp_init() return code. I decided
  to leave 'vgic_present = false' statement because it helps to understand
  the code.

v2 => v3:
- Improved commit messages, added references to commits where the respective
  functionality was broken
- Explicitly specify that the solution currently affects only vGIC and has
  nothing to do with timer.
- Fixed code style according to previous notes
- Removed ARM64 save/restore patch introduced in v2 because it was already
  obsolete for linux-next
- Modify KVM_CAP_IRQFD handling in correct place

v1 => v2:
- Do not use defensive approach in patch 0001. Use correct conditions in
  callers instead
- Added ARM64-specific code, without which attempt to run a VM ends in a
  HYP crash because of unset vGIC save/restore function pointers
---
 arch/arm/kvm/arm.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e06fd29..66f90c1 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -61,6 +61,8 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
 static u8 kvm_next_vmid;
 static DEFINE_SPINLOCK(kvm_vmid_lock);
 
+static bool vgic_present;
+
 static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
 {
BUG_ON(preemptible());
@@ -132,7 +134,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm->arch.vmid_gen = 0;
 
/* The maximum number of VCPUs is limited by the host's GIC model */
-   kvm->arch.max_vcpus = kvm_vgic_get_max_vcpus();
+   kvm->arch.max_vcpus = vgic_present ?
+   kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS;
 
return ret;
 out_free_stage2_pgd:
@@ -172,6 +175,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
int r;
switch (ext) {
case KVM_CAP_IRQCHIP:
+   r = vgic_present;
+   break;
case KVM_CAP_IOEVENTFD:
case KVM_CAP_DEVICE_CTRL:
case KVM_CAP_USER_MEMORY:
@@ -913,6 +918,8 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
 
switch (dev_id) {
case KVM_ARM_DEVICE_VGIC_V2:
+   if (!vgic_present)
+   return -ENXIO;
return kvm_vgic_addr(kvm, type, &dev_addr->addr, true);
default:
return -ENODEV;
@@ -927,6 +934,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
switch (ioctl) {
case KVM_CREATE_IRQCHIP: {
+   if (!vgic_present)
+   return -ENXIO;
return kvm_vgic_create(kvm, KVM_DEV_TYPE_ARM_VGIC_V2);
}
case KVM_ARM_SET_DEVICE_ADDR: {
@@ -,8 +1120,17 @@ static int init_hyp_mode(void)
 * Init HYP view of VGIC
 */
err = kvm_vgic_hyp_init();
-   if (err)
+   switch (err) {
+   case 0:
+   vgic_present = true;
+   break;
+   case -ENODEV:
+   case -ENXIO:
+   vgic_present = false;
+   break;
+   default:
goto out_free_context;
+   }
 
/*
 * Init HYP architected timer support
-- 
2.4.4



Re: [PATCH v2 05/21] arm64: KVM: Implement timer save/restore

2015-12-01 Thread Marc Zyngier
On 30/11/15 19:59, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:49:59PM +, Marc Zyngier wrote:
>> Implement the timer save restore as a direct translation of
>> the assembly code version.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp/Makefile  |  1 +
>>  arch/arm64/kvm/hyp/hyp.h |  3 ++
>>  arch/arm64/kvm/hyp/timer-sr.c| 71 
>> 
>>  include/clocksource/arm_arch_timer.h |  6 +++
>>  4 files changed, 81 insertions(+)
>>  create mode 100644 arch/arm64/kvm/hyp/timer-sr.c
>>
>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>> index d1e38ce..455dc0a 100644
>> --- a/arch/arm64/kvm/hyp/Makefile
>> +++ b/arch/arm64/kvm/hyp/Makefile
>> @@ -4,3 +4,4 @@
>>  
>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>> +obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>> index a31cb6e..86aa5a2 100644
>> --- a/arch/arm64/kvm/hyp/hyp.h
>> +++ b/arch/arm64/kvm/hyp/hyp.h
>> @@ -33,5 +33,8 @@ void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>>  void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>>  void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>>  
>> +void __timer_save_state(struct kvm_vcpu *vcpu);
>> +void __timer_restore_state(struct kvm_vcpu *vcpu);
>> +
>>  #endif /* __ARM64_KVM_HYP_H__ */
>>  
>> diff --git a/arch/arm64/kvm/hyp/timer-sr.c b/arch/arm64/kvm/hyp/timer-sr.c
>> new file mode 100644
>> index 000..8e2209c
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp/timer-sr.c
>> @@ -0,0 +1,71 @@
>> +/*
>> + * Copyright (C) 2012-2015 - ARM Ltd
>> + * Author: Marc Zyngier 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +#include "hyp.h"
>> +
>> +/* vcpu is already in the HYP VA space */
>> +void __hyp_text __timer_save_state(struct kvm_vcpu *vcpu)
>> +{
>> +struct kvm *kvm = kern_hyp_va(vcpu->kvm);
>> +struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>> +
>> +if (kvm->arch.timer.enabled) {
>> +timer->cntv_ctl = read_sysreg(cntv_ctl_el0);
>> +isb();
>> +timer->cntv_cval = read_sysreg(cntv_cval_el0);
>> +}
>> +
>> +/* Disable the virtual timer */
>> +write_sysreg(0, cntv_ctl_el0);
>> +
>> +/* Allow physical timer/counter access for the host */
>> +write_sysreg((read_sysreg(cnthctl_el2) | CNTHCTL_EL1PCTEN |
>> +  CNTHCTL_EL1PCEN),
>> + cnthctl_el2);
> 
> nit: again I probably prefer reading cnthctl_el2 into a variable, modify
> the bits and write it back, but it's no big deal.

Sure.
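
For reference, the read/modify/write shape being asked for, sketched
against a fake register so it compiles in userspace. The accessor and
function names are hypothetical; the two bit definitions match the ones
added to arm_arch_timer.h by this patch, widened to 64 bits so that the
~ below does not clear the upper half of the value.

```c
#include <stdint.h>

#define CNTHCTL_EL1PCTEN	(UINT64_C(1) << 0)
#define CNTHCTL_EL1PCEN		(UINT64_C(1) << 1)

/* Stand-in for cnthctl_el2; real code uses read_sysreg/write_sysreg. */
static uint64_t fake_cnthctl_el2;

static uint64_t read_cnthctl(void)    { return fake_cnthctl_el2; }
static void write_cnthctl(uint64_t v) { fake_cnthctl_el2 = v; }

/* Save path: allow physical timer and counter access for the host. */
static void allow_host_timer_access(void)
{
	uint64_t val = read_cnthctl();

	val |= CNTHCTL_EL1PCTEN | CNTHCTL_EL1PCEN;
	write_cnthctl(val);
}

/* Restore path: counter access stays allowed, timer access does not. */
static void restrict_guest_timer_access(void)
{
	uint64_t val = read_cnthctl();

	val &= ~CNTHCTL_EL1PCEN;
	val |= CNTHCTL_EL1PCTEN;
	write_cnthctl(val);
}
```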

>> +
>> +/* Clear cntvoff for the host */
>> +write_sysreg(0, cntvoff_el2);
> 
> why do we do this when we've just disabled the timer?

Because the host does use CNTVCT_EL0 (see the VDSO code), and you don't
want time to go backward over there...
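
A tiny worked example of why: the virtual count is the physical count
minus CNTVOFF_EL2, so a guest offset left in place after exit makes the
host's next virtual counter read smaller than its previous one. The
function names and the numbers in the usage note are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* CNTVCT_EL0 as seen by software: physical count minus the offset. */
static uint64_t virtual_count(uint64_t cntpct, uint64_t cntvoff)
{
	return cntpct - cntvoff;
}

/*
 * The host read the virtual counter with offset 0 before entering the
 * guest. Would a read after exit, with cntvoff_after_exit still
 * programmed, appear to be earlier than that?
 */
static bool host_time_goes_backward(uint64_t pct_before, uint64_t pct_after,
				    uint64_t cntvoff_after_exit)
{
	uint64_t before = virtual_count(pct_before, 0);
	uint64_t after = virtual_count(pct_after, cntvoff_after_exit);

	return after < before;
}
```

With a physical count of 1000 before entry and 1500 after exit, a stale
guest offset of 800 gives the host 700, i.e. time went backward;
clearing the offset gives 1500 as expected.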

>> +}
>> +
>> +void __hyp_text __timer_restore_state(struct kvm_vcpu *vcpu)
>> +{
>> +struct kvm *kvm = kern_hyp_va(vcpu->kvm);
>> +struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>> +u64 val;
>> +
>> +/*
>> + * Disallow physical timer access for the guest
>> + * Physical counter access is allowed
>> + */
>> +val = read_sysreg(cnthctl_el2);
>> +val &= ~CNTHCTL_EL1PCEN;
>> +val |= CNTHCTL_EL1PCTEN;
>> +write_sysreg(val, cnthctl_el2);
>> +
>> +if (kvm->arch.timer.enabled) {
>> +write_sysreg(kvm->arch.timer.cntvoff, cntvoff_el2);
>> +write_sysreg(timer->cntv_cval, cntv_cval_el0);
>> +isb();
>> +write_sysreg(timer->cntv_ctl, cntv_ctl_el0);
>> +}
>> +}
>> diff --git a/include/clocksource/arm_arch_timer.h 
>> b/include/clocksource/arm_arch_timer.h
>> index 9916d0e..25d0914 100644
>> --- a/include/clocksource/arm_arch_timer.h
>> +++ b/include/clocksource/arm_arch_timer.h
>> @@ -23,6 +23,12 @@
>>  #define ARCH_TIMER_CTRL_IT_MASK (1 << 1)
>>  #define ARCH_TIMER_CTRL_IT_STAT (1 << 2)
>>  
>> +#define CNTHCTL_EL1PCTEN (1 << 0)
>> +#define CNTHCTL_EL1PCEN (1 << 1)
>> +#define CNTHCTL_EVNTEN  (1 << 2)
>> +#define CNTHCTL_EVNTDIR (1 << 3)
>> +#define CNTHCTL_EVNTI   (0xF << 4)
>> 

Re: [PATCH v4 05/21] KVM: ARM64: Add reset and access handlers for PMSELR register

2015-12-01 Thread Shannon Zhao


On 2015/12/1 16:49, Marc Zyngier wrote:
> On 01/12/15 01:51, Shannon Zhao wrote:
>> Hi Marc,
>>
>> On 2015/12/1 1:56, Marc Zyngier wrote:
>>> Same remark here as the one I made earlier. I'm pretty sure we don't
>>> call any CP15 reset because they are all shared with their 64bit
>>> counterparts. The same thing goes for the whole series.
>> Ok, I see. But within the 64bit reset function, it needs to update the
>> 32bit register value, right? Since when accessing these 32bit registers,
>> it uses the offset c9_PM.
> 
> It shouldn't, because the 64bit and 32bit share the same storage. From
> your own patch:
> 
> +/* Performance Monitors*/
> +#define c9_PMCR        (PMCR_EL0 * 2)
> +#define c9_PMOVSSET    (PMOVSSET_EL0 * 2)
> +#define c9_PMOVSCLR    (PMOVSCLR_EL0 * 2)
> +#define c9_PMCCNTR     (PMCCNTR_EL0 * 2)
> +#define c9_PMSELR      (PMSELR_EL0 * 2)
> +#define c9_PMCEID0     (PMCEID0_EL0 * 2)
> +#define c9_PMCEID1     (PMCEID1_EL0 * 2)
> +#define c9_PMXEVCNTR   (PMXEVCNTR_EL0 * 2)
> +#define c9_PMXEVTYPER  (PMXEVTYPER_EL0 * 2)
> +#define c9_PMCNTENSET  (PMCNTENSET_EL0 * 2)
> +#define c9_PMCNTENCLR  (PMCNTENCLR_EL0 * 2)
> +#define c9_PMINTENSET  (PMINTENSET_EL1 * 2)
> +#define c9_PMINTENCLR  (PMINTENCLR_EL1 * 2)
> +#define c9_PMUSERENR   (PMUSERENR_EL0 * 2)
> +#define c9_PMSWINC     (PMSWINC_EL0 * 2)
> 
> These are indexes in the copro array:
> 
> struct kvm_cpu_context {
>   struct kvm_regs gp_regs;
>   union {
>   u64 sys_regs[NR_SYS_REGS];
>   u32 copro[NR_COPRO_REGS];
>   };
> };
> 
> which is in a union with the sys_reg array. So anything that affects one
> affects the other because:
> - there is only one state in the physical CPU, no matter which mode
> you're in,
> - the guest EL1 is either 32bit or 64bit, and never changes over time.
> 
> Hope this helps,
> 
Ok, I see. Thanks for the explanation. :)
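
For anyone following along, the shared-storage point can be checked with a small userspace sketch. The struct below only imitates the union quoted above; the index value chosen for PMCR_EL0, the array sizes, and the helper names are assumptions made for this illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical index and sizes, picked only for this sketch. */
enum { PMCR_EL0 = 3, NR_SYS_REGS = 8 };
#define NR_COPRO_REGS	(NR_SYS_REGS * 2)

/* Same shape as the quoted definition: 32bit index = 64bit index * 2. */
#define c9_PMCR		(PMCR_EL0 * 2)

struct cpu_context_sketch {
	union {
		uint64_t sys_regs[NR_SYS_REGS];
		uint32_t copro[NR_COPRO_REGS];
	};
};

/* What a 64bit reset handler would do: write the sys_regs view. */
static void reset_pmcr_64(struct cpu_context_sketch *ctxt, uint64_t val)
{
	ctxt->sys_regs[PMCR_EL0] = val;
}

/*
 * What a 32bit accessor sees through the copro view. On a
 * little-endian host this is the low word of the 64bit value.
 */
static uint32_t read_pmcr_32(const struct cpu_context_sketch *ctxt)
{
	return ctxt->copro[c9_PMCR];
}
```

Because both arrays overlay the same bytes, a value written by reset_pmcr_64() is immediately visible through the copro view without a separate 32bit reset.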

-- 
Shannon

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 03/21] arm64: KVM: Implement vgic-v2 save/restore

2015-12-01 Thread Marc Zyngier
On 30/11/15 20:00, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:49:57PM +, Marc Zyngier wrote:
>> Implement the vgic-v2 save restore (mostly) as a direct translation
>> of the assembly code version.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/Makefile |  1 +
>>  arch/arm64/kvm/hyp/Makefile |  5 +++
>>  arch/arm64/kvm/hyp/hyp.h|  3 ++
>>  arch/arm64/kvm/hyp/vgic-v2-sr.c | 89 
>> +
>>  4 files changed, 98 insertions(+)
>>  create mode 100644 arch/arm64/kvm/hyp/Makefile
>>  create mode 100644 arch/arm64/kvm/hyp/vgic-v2-sr.c
>>
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index 1949fe5..d31e4e5 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -10,6 +10,7 @@ KVM=../../../virt/kvm
>>  ARM=../../../arch/arm/kvm
>>  
>>  obj-$(CONFIG_KVM_ARM_HOST) += kvm.o
>> +obj-$(CONFIG_KVM_ARM_HOST) += hyp/
>>  
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o 
>> $(KVM)/eventfd.o $(KVM)/vfio.o
>>  kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>> new file mode 100644
>> index 000..d8d5968
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp/Makefile
>> @@ -0,0 +1,5 @@
>> +#
>> +# Makefile for Kernel-based Virtual Machine module, HYP part
>> +#
>> +
>> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>> index dac843e..78f25c4 100644
>> --- a/arch/arm64/kvm/hyp/hyp.h
>> +++ b/arch/arm64/kvm/hyp/hyp.h
>> @@ -27,5 +27,8 @@
>>  
>>  #define kern_hyp_va(v) (typeof(v))((unsigned long)v & HYP_PAGE_OFFSET_MASK)
>>  
>> +void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>> +void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
> 
> should we call these flush/sync here now ?
> 
>> +
>>  #endif /* __ARM64_KVM_HYP_H__ */
>>  
>> diff --git a/arch/arm64/kvm/hyp/vgic-v2-sr.c 
>> b/arch/arm64/kvm/hyp/vgic-v2-sr.c
>> new file mode 100644
>> index 000..29a5c1d
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp/vgic-v2-sr.c
>> @@ -0,0 +1,89 @@
>> +/*
>> + * Copyright (C) 2012-2015 - ARM Ltd
>> + * Author: Marc Zyngier 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +#include "hyp.h"
>> +
>> +/* vcpu is already in the HYP VA space */
> 
> should we annotate hyp pointers similarly to __user or will that be
> confusing when VHE enters the scene ?

I looked at doing that. That's a possibility, and I don't think that
would be too bad as long as we have kern_hyp_va() doing the (potentially
NOP) conversion. The only issue is that this is only enforced with
sparse, not by a usual compilation.

Still, this is a valid use case, and I'll try to invest some time doing
that.
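
As a rough idea of what such an annotation could look like, here is a sketch modelled on how sparse address-space annotations such as __user are usually defined. The names __hyp, __hyp_force, HYP_MASK_SKETCH and the address-space number are all hypothetical, and the 48-bit mask is an assumption chosen for the example rather than the real HYP_PAGE_OFFSET_MASK.

```c
#include <stdint.h>

/*
 * Only sparse defines __CHECKER__, so a normal compile sees empty
 * macros and enforces nothing. That is the limitation mentioned above.
 */
#ifdef __CHECKER__
# define __hyp		__attribute__((noderef, address_space(9)))
# define __hyp_force	__attribute__((force))
#else
# define __hyp
# define __hyp_force
#endif

/* Hypothetical mask: keep the low 48 bits of a kernel VA. */
#define HYP_MASK_SKETCH	UINT64_C(0x0000ffffffffffff)

struct vcpu_sketch {
	int id;
};

/* Integer form of the conversion, convenient for testing. */
static uint64_t hyp_va_bits(uint64_t kern_va)
{
	return kern_va & HYP_MASK_SKETCH;
}

/* Pointer form: the result is tagged as living in the HYP VA space. */
static struct vcpu_sketch __hyp *to_hyp_va(struct vcpu_sketch *kern_ptr)
{
	uint64_t va = hyp_va_bits((uint64_t)(uintptr_t)kern_ptr);

	return (struct vcpu_sketch __hyp_force __hyp *)(uintptr_t)va;
}
```

With something along these lines, sparse would warn when a plain kernel pointer is passed where a __hyp pointer is expected, while a regular build compiles both forms identically.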

> 
>> +void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu)
>> +{
>> +struct kvm *kvm = kern_hyp_va(vcpu->kvm);
>> +struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
>> +struct vgic_dist *vgic = &kvm->arch.vgic;
>> +void __iomem *base = kern_hyp_va(vgic->vctrl_base);
>> +u32 __iomem *lr_base;
>> +u32 eisr0, eisr1, elrsr0, elrsr1;
>> +int i = 0, nr_lr;
>> +
>> +if (!base)
>> +return;
>> +
>> +nr_lr = vcpu->arch.vgic_cpu.nr_lr;
>> +cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR);
>> +cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR);
>> +eisr0  = readl_relaxed(base + GICH_EISR0);
>> +elrsr0 = readl_relaxed(base + GICH_ELRSR0);
>> +if (unlikely(nr_lr > 32)) {
>> +eisr1  = readl_relaxed(base + GICH_EISR1);
>> +elrsr1 = readl_relaxed(base + GICH_ELRSR1);
>> +} else {
>> +eisr1 = elrsr1 = 0;
>> +}
>> +#ifdef CONFIG_CPU_BIG_ENDIAN
>> +cpu_if->vgic_eisr  = ((u64)eisr0 << 32) | eisr1;
>> +cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
>> +#else
>> +cpu_if->vgic_eisr  = ((u64)eisr1 << 32) | eisr0;
>> +cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
>> +#endif
>> +cpu_if->vgic_apr= readl_relaxed(base + GICH_APR);
>> +
>> +writel_relaxed(0, base + GICH_HCR);
>> +
>> +lr_base = base + GICH_LR0;
>> +do {
>> +cpu_if->vgic_lr[i++] = readl_relaxed(lr_base++);
>> +} while (--nr_lr);
> 
> why not a simple 

Re: [PATCH v2 04/21] arm64: KVM: Implement vgic-v3 save/restore

2015-12-01 Thread Christoffer Dall
On Tue, Dec 01, 2015 at 12:44:26PM +0100, Christoffer Dall wrote:
> On Tue, Dec 01, 2015 at 11:32:20AM +, Marc Zyngier wrote:
> > On 30/11/15 19:50, Christoffer Dall wrote:
> > > On Fri, Nov 27, 2015 at 06:49:58PM +, Marc Zyngier wrote:
> > >> Implement the vgic-v3 save restore as a direct translation of
> > >> the assembly code version.
> > >>
> > >> Signed-off-by: Marc Zyngier 
> > >> ---
> > >>  arch/arm64/kvm/hyp/Makefile |   1 +
> > >>  arch/arm64/kvm/hyp/hyp.h|   3 +
> > >>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 222 
> > >> 
> > >>  3 files changed, 226 insertions(+)
> > >>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
> > >>
> > >> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> > >> index d8d5968..d1e38ce 100644
> > >> --- a/arch/arm64/kvm/hyp/Makefile
> > >> +++ b/arch/arm64/kvm/hyp/Makefile
> > >> @@ -3,3 +3,4 @@
> > >>  #
> > >>  
> > >>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> > >> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> > >> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> > >> index 78f25c4..a31cb6e 100644
> > >> --- a/arch/arm64/kvm/hyp/hyp.h
> > >> +++ b/arch/arm64/kvm/hyp/hyp.h
> > >> @@ -30,5 +30,8 @@
> > >>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
> > >>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
> > >>  
> > >> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
> > >> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
> > >> +
> > >>  #endif /* __ARM64_KVM_HYP_H__ */
> > >>  
> > >> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
> > >> b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> > >> new file mode 100644
> > >> index 000..b490db5
> > >> --- /dev/null
> > >> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> > >> @@ -0,0 +1,222 @@
> > >> +/*
> > >> + * Copyright (C) 2012-2015 - ARM Ltd
> > >> + * Author: Marc Zyngier 
> > >> + *
> > >> + * This program is free software; you can redistribute it and/or modify
> > >> + * it under the terms of the GNU General Public License version 2 as
> > >> + * published by the Free Software Foundation.
> > >> + *
> > >> + * This program is distributed in the hope that it will be useful,
> > >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > >> + * GNU General Public License for more details.
> > >> + *
> > >> + * You should have received a copy of the GNU General Public License
> > >> + * along with this program.  If not, see .
> > >> + */
> > >> +
> > >> +#include 
> > >> +#include 
> > >> +#include 
> > >> +
> > >> +#include 
> > >> +
> > >> +#include "hyp.h"
> > >> +
> > >> +/*
> > >> + * We store LRs in reverse order to let the CPU deal with streaming
> > >> + * access. Use this macro to make it look saner...
> > >> + */
> > >> +#define LR_OFFSET(n) (15 - n)
> > >> +
> > >> +#define read_gicreg(r) \
> > >> +({ \
> > >> +u64 reg; \
> > >> +asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
> > >> +reg; \
> > >> +})
> > >> +
> > >> +#define write_gicreg(v,r) \
> > >> +do { \
> > >> +u64 __val = (v); \
> > >> +asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val)); \
> > >> +} while (0)
> > > 
> > > remind me what the msr_s and mrs_s do compared to msr and mrs?
> > 
> > They do the same job, only for the system registers which are not in the
> > original ARMv8 architecture spec, and most likely not implemented by
> > old(er) compilers.
> > 
> > > are these the reason why we need separate macros to access the gic
> > > registers compared to 'normal' sysregs?
> > 
> > Indeed.
> > 
> > >> +
> > >> +/* vcpu is already in the HYP VA space */
> > >> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
> > >> +{
> > >> +struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
> > >> +u64 val;
> > >> +u32 nr_lr, nr_pri;
> > >> +
> > >> +/*
> > >> + * Make sure stores to the GIC via the memory mapped interface
> > >> + * are now visible to the system register interface.
> > >> + */
> > >> +dsb(st);
> > >> +
> > >> +cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
> > >> +cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
> > >> +cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
> > >> +cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
> > 

Re: [PATCH v2 04/21] arm64: KVM: Implement vgic-v3 save/restore

2015-12-01 Thread Marc Zyngier
On 01/12/15 11:50, Christoffer Dall wrote:
> On Tue, Dec 01, 2015 at 12:44:26PM +0100, Christoffer Dall wrote:
>> On Tue, Dec 01, 2015 at 11:32:20AM +, Marc Zyngier wrote:
>>> On 30/11/15 19:50, Christoffer Dall wrote:
 On Fri, Nov 27, 2015 at 06:49:58PM +, Marc Zyngier wrote:
> Implement the vgic-v3 save restore as a direct translation of
> the assembly code version.
>
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile |   1 +
>  arch/arm64/kvm/hyp/hyp.h|   3 +
>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 222 
> 
>  3 files changed, 226 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
>
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index d8d5968..d1e38ce 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -3,3 +3,4 @@
>  #
>  
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> index 78f25c4..a31cb6e 100644
> --- a/arch/arm64/kvm/hyp/hyp.h
> +++ b/arch/arm64/kvm/hyp/hyp.h
> @@ -30,5 +30,8 @@
>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>  
> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
> +
>  #endif /* __ARM64_KVM_HYP_H__ */
>  
> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
> b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> new file mode 100644
> index 000..b490db5
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> @@ -0,0 +1,222 @@
> +/*
> + * Copyright (C) 2012-2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +
> +/*
> + * We store LRs in reverse order to let the CPU deal with streaming
> + * access. Use this macro to make it look saner...
> + */
> +#define LR_OFFSET(n) (15 - n)
> +
> +#define read_gicreg(r) \
> + ({  \
> + u64 reg;\
> + asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
> + reg;\
> + })
> +
> +#define write_gicreg(v,r) \
> + do {\
> + u64 __val = (v);\
> + asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
> + } while (0)

 remind me what the msr_s and mrs_s do compared to msr and mrs?
>>>
>>> They do the same job, only for the system registers which are not in the
>>> original ARMv8 architecture spec, and most likely not implemented by
>>> old(er) compilers.
>>>
 are these the reason why we need separate macros to access the gic
 registers compared to 'normal' sysregs?
>>>
>>> Indeed.
>>>
> +
> +/* vcpu is already in the HYP VA space */
> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
> +{
> + struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
> + u64 val;
> + u32 nr_lr, nr_pri;
> +
> + /*
> +  * Make sure stores to the GIC via the memory mapped interface
> +  * are now visible to the system register interface.
> +  */
> + dsb(st);
> +
> + cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
> + cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
> + cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
> + cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
> +
> + write_gicreg(0, ICH_HCR_EL2);
> + val = read_gicreg(ICH_VTR_EL2);
> + nr_lr = val & 0xf;

 this is not technically nr_lr, it's max_lr or max_lr_idx or something
 like that.
>>>
>>> Let's go for max_lr_idx  then.
>>>
> + nr_pri = 
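
As a side note on the naming discussed above, the two fields can be decoded as in the following sketch. It assumes, as the review comments imply, that the low field of ICH_VTR_EL2 holds the highest list register index (count minus one) and that bits [31:29] hold the number of priority bits minus one. The 0xf mask follows the patch; the helper names are hypothetical.

```c
#include <stdint.h>

/* Highest implemented list register index (what the patch calls nr_lr). */
static uint32_t vtr_max_lr_idx(uint64_t vtr)
{
	return vtr & 0xf;
}

/* Actual number of list registers: highest index plus one. */
static uint32_t vtr_nr_lr(uint64_t vtr)
{
	return vtr_max_lr_idx(vtr) + 1;
}

/* Number of priority bits, i.e. the suggested nr_pri_bits. */
static uint32_t vtr_nr_pri_bits(uint64_t vtr)
{
	return ((uint32_t)vtr >> 29) + 1;
}
```

For a register value with 3 in the low field and 4 in bits [31:29], this gives a maximum index of 3, four list registers, and five priority bits, which is why max_lr_idx describes the masked value better than nr_lr does.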

Re: [PATCH v2 08/21] arm64: KVM: Implement debug save/restore

2015-12-01 Thread Marc Zyngier
On 01/12/15 12:56, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:50:02PM +, Marc Zyngier wrote:
>> Implement the debug save restore as a direct translation of
>> the assembly code version.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp/Makefile   |   1 +
>>  arch/arm64/kvm/hyp/debug-sr.c | 130 
>> ++
>>  arch/arm64/kvm/hyp/hyp.h  |   9 +++
>>  3 files changed, 140 insertions(+)
>>  create mode 100644 arch/arm64/kvm/hyp/debug-sr.c
>>
>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>> index ec94200..ec14cac 100644
>> --- a/arch/arm64/kvm/hyp/Makefile
>> +++ b/arch/arm64/kvm/hyp/Makefile
>> @@ -6,3 +6,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
>>  obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
>> +obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
>> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
>> new file mode 100644
>> index 000..a0b2b99
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp/debug-sr.c
>> @@ -0,0 +1,130 @@
>> +/*
>> + * Copyright (C) 2015 - ARM Ltd
>> + * Author: Marc Zyngier 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +#include "hyp.h"
>> +
>> +#define read_debug(r,n) read_sysreg(r##n##_el1)
>> +#define write_debug(v,r,n)  write_sysreg(v, r##n##_el1)
>> +
>> +#define save_debug(ptr,reg,nr)  \
>> +switch (nr) {   \
>> +case 15:ptr[15] = read_debug(reg, 15);  \
>> +case 14:ptr[14] = read_debug(reg, 14);  \
>> +case 13:ptr[13] = read_debug(reg, 13);  \
>> +case 12:ptr[12] = read_debug(reg, 12);  \
>> +case 11:ptr[11] = read_debug(reg, 11);  \
>> +case 10:ptr[10] = read_debug(reg, 10);  \
>> +case 9: ptr[9] = read_debug(reg, 9);\
>> +case 8: ptr[8] = read_debug(reg, 8);\
>> +case 7: ptr[7] = read_debug(reg, 7);\
>> +case 6: ptr[6] = read_debug(reg, 6);\
>> +case 5: ptr[5] = read_debug(reg, 5);\
>> +case 4: ptr[4] = read_debug(reg, 4);\
>> +case 3: ptr[3] = read_debug(reg, 3);\
>> +case 2: ptr[2] = read_debug(reg, 2);\
>> +case 1: ptr[1] = read_debug(reg, 1);\
>> +default:ptr[0] = read_debug(reg, 0);\
>> +}
>> +
>> +#define restore_debug(ptr,reg,nr)   \
>> +switch (nr) {   \
>> +case 15:write_debug(ptr[15], reg, 15);  \
>> +case 14:write_debug(ptr[14], reg, 14);  \
>> +case 13:write_debug(ptr[13], reg, 13);  \
>> +case 12:write_debug(ptr[12], reg, 12);  \
>> +case 11:write_debug(ptr[11], reg, 11);  \
>> +case 10:write_debug(ptr[10], reg, 10);  \
>> +case 9: write_debug(ptr[9], reg, 9);\
>> +case 8: write_debug(ptr[8], reg, 8);\
>> +case 7: write_debug(ptr[7], reg, 7);\
>> +case 6: write_debug(ptr[6], reg, 6);\
>> +case 5: write_debug(ptr[5], reg, 5);\
>> +case 4: write_debug(ptr[4], reg, 4);\
>> +case 3: write_debug(ptr[3], reg, 3);\
>> +case 2: write_debug(ptr[2], reg, 2);\
>> +case 1: write_debug(ptr[1], reg, 1);\
>> +default:write_debug(ptr[0], reg, 0);\
>> +}
>> +
>> +void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
>> +   struct kvm_guest_debug_arch *dbg,
>> +   struct 

Re: [PATCH v2 04/21] arm64: KVM: Implement vgic-v3 save/restore

2015-12-01 Thread Christoffer Dall
On Tue, Dec 01, 2015 at 11:32:20AM +, Marc Zyngier wrote:
> On 30/11/15 19:50, Christoffer Dall wrote:
> > On Fri, Nov 27, 2015 at 06:49:58PM +, Marc Zyngier wrote:
> >> Implement the vgic-v3 save restore as a direct translation of
> >> the assembly code version.
> >>
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  arch/arm64/kvm/hyp/Makefile |   1 +
> >>  arch/arm64/kvm/hyp/hyp.h|   3 +
> >>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 222 
> >> 
> >>  3 files changed, 226 insertions(+)
> >>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
> >>
> >> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> >> index d8d5968..d1e38ce 100644
> >> --- a/arch/arm64/kvm/hyp/Makefile
> >> +++ b/arch/arm64/kvm/hyp/Makefile
> >> @@ -3,3 +3,4 @@
> >>  #
> >>  
> >>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> >> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> >> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> >> index 78f25c4..a31cb6e 100644
> >> --- a/arch/arm64/kvm/hyp/hyp.h
> >> +++ b/arch/arm64/kvm/hyp/hyp.h
> >> @@ -30,5 +30,8 @@
> >>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
> >>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
> >>  
> >> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
> >> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
> >> +
> >>  #endif /* __ARM64_KVM_HYP_H__ */
> >>  
> >> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
> >> b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> >> new file mode 100644
> >> index 000..b490db5
> >> --- /dev/null
> >> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> >> @@ -0,0 +1,222 @@
> >> +/*
> >> + * Copyright (C) 2012-2015 - ARM Ltd
> >> + * Author: Marc Zyngier 
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License version 2 as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see .
> >> + */
> >> +
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +#include 
> >> +
> >> +#include "hyp.h"
> >> +
> >> +/*
> >> + * We store LRs in reverse order to let the CPU deal with streaming
> >> + * access. Use this macro to make it look saner...
> >> + */
> >> +#define LR_OFFSET(n)  (15 - n)
> >> +
> >> +#define read_gicreg(r) \
> >> +  ({  \
> >> +  u64 reg;\
> >> +  asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
> >> +  reg;\
> >> +  })
> >> +
> >> +#define write_gicreg(v,r) \
> >> +  do {\
> >> +  u64 __val = (v);\
> >> +  asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
> >> +  } while (0)
> > 
> > remind me what the msr_s and mrs_s do compared to msr and mrs?
> 
> They do the same job, only for the system registers which are not in the
> original ARMv8 architecture spec, and most likely not implemented by
> old(er) compilers.
> 
> > are these the reason why we need separate macros to access the gic
> > registers compared to 'normal' sysregs?
> 
> Indeed.
> 
> >> +
> >> +/* vcpu is already in the HYP VA space */
> >> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
> >> +{
> >> +  struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
> >> +  u64 val;
> >> +  u32 nr_lr, nr_pri;
> >> +
> >> +  /*
> >> +   * Make sure stores to the GIC via the memory mapped interface
> >> +   * are now visible to the system register interface.
> >> +   */
> >> +  dsb(st);
> >> +
> >> +  cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
> >> +  cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
> >> +  cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
> >> +  cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
> >> +
> >> +  write_gicreg(0, ICH_HCR_EL2);
> >> +  val = read_gicreg(ICH_VTR_EL2);
> >> +  nr_lr = val & 0xf;
> > 
> > this is not technically nr_lr, it's max_lr or max_lr_idx or something
> > like that.
> 
> Let's go for max_lr_idx  then.
> 
> >> +  nr_pri = ((u32)val >> 29) + 1;
> > 
> > nit: nr_pri_bits
> > 
> >> +
> >> +  switch (nr_lr) {
> >> +  case 15:
> >> +  cpu_if->vgic_lr[LR_OFFSET(15)] = read_gicreg(ICH_LR15_EL2);
> >> +  case 14:
> >> +  

Re: [PATCH v2 04/21] arm64: KVM: Implement vgic-v3 save/restore

2015-12-01 Thread Marc Zyngier
On 01/12/15 12:24, Christoffer Dall wrote:
> On Tue, Dec 01, 2015 at 11:57:16AM +, Marc Zyngier wrote:
>> On 01/12/15 11:50, Christoffer Dall wrote:
>>> On Tue, Dec 01, 2015 at 12:44:26PM +0100, Christoffer Dall wrote:
 On Tue, Dec 01, 2015 at 11:32:20AM +, Marc Zyngier wrote:
> On 30/11/15 19:50, Christoffer Dall wrote:
>> On Fri, Nov 27, 2015 at 06:49:58PM +, Marc Zyngier wrote:
>>> Implement the vgic-v3 save restore as a direct translation of
>>> the assembly code version.
>>>
>>> Signed-off-by: Marc Zyngier 
>>> ---
>>>  arch/arm64/kvm/hyp/Makefile |   1 +
>>>  arch/arm64/kvm/hyp/hyp.h|   3 +
>>>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 222 
>>> 
>>>  3 files changed, 226 insertions(+)
>>>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
>>>
>>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>>> index d8d5968..d1e38ce 100644
>>> --- a/arch/arm64/kvm/hyp/Makefile
>>> +++ b/arch/arm64/kvm/hyp/Makefile
>>> @@ -3,3 +3,4 @@
>>>  #
>>>  
>>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>>> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>>> index 78f25c4..a31cb6e 100644
>>> --- a/arch/arm64/kvm/hyp/hyp.h
>>> +++ b/arch/arm64/kvm/hyp/hyp.h
>>> @@ -30,5 +30,8 @@
>>>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>>>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>>>  
>>> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>>> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>>> +
>>>  #endif /* __ARM64_KVM_HYP_H__ */
>>>  
>>> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
>>> b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>>> new file mode 100644
>>> index 000..b490db5
>>> --- /dev/null
>>> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>>> @@ -0,0 +1,222 @@
>>> +/*
>>> + * Copyright (C) 2012-2015 - ARM Ltd
>>> + * Author: Marc Zyngier 
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2 as
>>> + * published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License
>>> + * along with this program.  If not, see 
>>> .
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include 
>>> +
>>> +#include "hyp.h"
>>> +
>>> +/*
>>> + * We store LRs in reverse order to let the CPU deal with streaming
>>> + * access. Use this macro to make it look saner...
>>> + */
>>> +#define LR_OFFSET(n)   (15 - n)
>>> +
>>> +#define read_gicreg(r) \
>>> +   ({ \
>>> +   u64 reg; \
>>> +   asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
>>> +   reg; \
>>> +   })
>>> +
>>> +#define write_gicreg(v,r) \
>>> +   do { \
>>> +   u64 __val = (v); \
>>> +   asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val)); \
>>> +   } while (0)
>>
>> remind me what the msr_s and mrs_s do compared to msr and mrs?
>
> They do the same job, only for the system registers which are not in the
> original ARMv8 architecture spec, and most likely not implemented by
> old(er) compilers.
>
>> are these the reason why we need separate macros to access the gic
>> registers compared to 'normal' sysregs?
>
> Indeed.
>
>>> +
>>> +/* vcpu is already in the HYP VA space */
>>> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>>> +{
>>> +   struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>>> +   u64 val;
>>> +   u32 nr_lr, nr_pri;
>>> +
>>> +   /*
>>> +* Make sure stores to the GIC via the memory mapped interface
>>> +* are now visible to the system register interface.
>>> +*/
>>> +   

Re: [PATCH v2 08/21] arm64: KVM: Implement debug save/restore

2015-12-01 Thread Christoffer Dall
On Fri, Nov 27, 2015 at 06:50:02PM +, Marc Zyngier wrote:
> Implement the debug save restore as a direct translation of
> the assembly code version.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/hyp/Makefile   |   1 +
>  arch/arm64/kvm/hyp/debug-sr.c | 130 
> ++
>  arch/arm64/kvm/hyp/hyp.h  |   9 +++
>  3 files changed, 140 insertions(+)
>  create mode 100644 arch/arm64/kvm/hyp/debug-sr.c
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index ec94200..ec14cac 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -6,3 +6,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
>  obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
> +obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> new file mode 100644
> index 000..a0b2b99
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> @@ -0,0 +1,130 @@
> +/*
> + * Copyright (C) 2015 - ARM Ltd
> + * Author: Marc Zyngier 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see .
> + */
> +
> +#include 
> +#include 
> +
> +#include 
> +
> +#include "hyp.h"
> +
> +#define read_debug(r,n)  read_sysreg(r##n##_el1)
> +#define write_debug(v,r,n)   write_sysreg(v, r##n##_el1)
> +
> +#define save_debug(ptr,reg,nr)   \
> + switch (nr) {   \
> + case 15:ptr[15] = read_debug(reg, 15);  \
> + case 14:ptr[14] = read_debug(reg, 14);  \
> + case 13:ptr[13] = read_debug(reg, 13);  \
> + case 12:ptr[12] = read_debug(reg, 12);  \
> + case 11:ptr[11] = read_debug(reg, 11);  \
> + case 10:ptr[10] = read_debug(reg, 10);  \
> + case 9: ptr[9] = read_debug(reg, 9);\
> + case 8: ptr[8] = read_debug(reg, 8);\
> + case 7: ptr[7] = read_debug(reg, 7);\
> + case 6: ptr[6] = read_debug(reg, 6);\
> + case 5: ptr[5] = read_debug(reg, 5);\
> + case 4: ptr[4] = read_debug(reg, 4);\
> + case 3: ptr[3] = read_debug(reg, 3);\
> + case 2: ptr[2] = read_debug(reg, 2);\
> + case 1: ptr[1] = read_debug(reg, 1);\
> + default:ptr[0] = read_debug(reg, 0);\
> + }
> +
> +#define restore_debug(ptr,reg,nr)\
> + switch (nr) {   \
> + case 15:write_debug(ptr[15], reg, 15);  \
> + case 14:write_debug(ptr[14], reg, 14);  \
> + case 13:write_debug(ptr[13], reg, 13);  \
> + case 12:write_debug(ptr[12], reg, 12);  \
> + case 11:write_debug(ptr[11], reg, 11);  \
> + case 10:write_debug(ptr[10], reg, 10);  \
> + case 9: write_debug(ptr[9], reg, 9);\
> + case 8: write_debug(ptr[8], reg, 8);\
> + case 7: write_debug(ptr[7], reg, 7);\
> + case 6: write_debug(ptr[6], reg, 6);\
> + case 5: write_debug(ptr[5], reg, 5);\
> + case 4: write_debug(ptr[4], reg, 4);\
> + case 3: write_debug(ptr[3], reg, 3);\
> + case 2: write_debug(ptr[2], reg, 2);\
> + case 1: write_debug(ptr[1], reg, 1);\
> + default:write_debug(ptr[0], reg, 0);\
> + }
> +
> +void __hyp_text __debug_save_state(struct kvm_vcpu *vcpu,
> +struct kvm_guest_debug_arch *dbg,
> +struct kvm_cpu_context *ctxt)
> +{
> + if (vcpu->arch.debug_flags & KVM_ARM64_DEBUG_DIRTY) {
> + u64 

[PATCH] KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

2015-12-01 Thread Pavel Fedin
This function takes stage-2 physical addresses (a.k.a. IPAs) as input, not
real physical addresses. This causes kvm_is_device_pfn() to return wrong
values, depending on how closely the guest and host memory maps match. This
results in completely broken KVM on some boards. The problem was
caught on Samsung proprietary hardware.

Cc: sta...@vger.kernel.org
Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")

Signed-off-by: Pavel Fedin 
---
 arch/arm/kvm/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7dace90..51ad98f 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
 
pte = pte_offset_kernel(pmd, addr);
do {
-   if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
+   if (!pte_none(*pte) &&
+   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
kvm_flush_dcache_pte(*pte);
} while (pte++, addr += PAGE_SIZE, addr != end);
 }
-- 
2.4.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
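
For readers following the fix above: the patched test looks only at the attribute bits stored in the stage-2 PTE itself, so it no longer depends on how an IPA happens to line up with host memory. A minimal sketch of that shape of check follows. The demo_* names and the DEMO_S2_DEVICE bit pattern are made up for illustration and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute bits, for illustration only; the real
 * PAGE_S2_DEVICE value is defined by the kernel headers. */
#define DEMO_S2_DEVICE	((uint64_t)0x0c3)

/* Same shape as the test in the patch: a PTE counts as a device
 * mapping only when every bit of the device pattern is set. */
static bool demo_pte_is_device(uint64_t pte)
{
	return (pte & DEMO_S2_DEVICE) == DEMO_S2_DEVICE;
}

/* A non-empty PTE is flushed only when it is not a device mapping. */
static bool demo_should_flush(uint64_t pte)
{
	return pte != 0 && !demo_pte_is_device(pte);
}
```

With these made-up values, a PTE that carries only some of the pattern bits is still flushed, while an empty PTE or one carrying the full pattern is skipped. No address is consulted at any point, which is the property the commit message asks for.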


RE: [PATCH v5 2/2] KVM: Make KVM_CAP_IRQFD dependent on KVM_CAP_IRQCHIP

2015-12-01 Thread Pavel Fedin
 Hello!

> >  b) I simply drop it as it is, because current qemu knows about the
> > dependency and does not try to use irqfd without irqchip, because
> > there's simply no use for them. But, well, perhaps there would be an
> > exception in vhost, i don't remember testing it.
> 
> Wouldn't an irqfd emulation cover vhost?

 I've just tested, and no, it does not cause any problems with qemu. It happens
to correctly detect that the whole thing is not running and falls back to not
using vhost. This is output from my qemu:
--- cut ---
2015-12-01T11:03:16.135724Z qemu-system-arm: Error binding guest notifier: 11
2015-12-01T11:03:16.135849Z qemu-system-arm: unable to start vhost net: 11: falling back on userspace virtio
--- cut ---

 So, the summary is: we just drop this patch and only N1 remains.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf 
> Of Cornelia Huck
> Sent: Monday, November 30, 2015 5:38 PM
> To: Pavel Fedin
> Cc: kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; 'Marc Zyngier'; 
> 'Christoffer Dall';
> 'Gleb Natapov'; 'Paolo Bonzini'
> Subject: Re: [PATCH v5 2/2] KVM: Make KVM_CAP_IRQFD dependent on 
> KVM_CAP_IRQCHIP
> 
> On Mon, 30 Nov 2015 15:41:20 +0300
> Pavel Fedin  wrote:
> 
> >  Hello!
> >
> > > >  Thank you for the note, i didn't know about irqchip-specific
> > > > capability codes. There's the same issue with PowerPC, now i
> > > > understand why there's no KVM_CAP_IRQCHIP for them. Because they have
> > > > KVM_CAP_IRQ_MPIC and KVM_CAP_IRQ_XICS, similar to S390.
> > > >  But isn't it just weird? I understand that perhaps we have some real
> > > > need to distinguish between different irqchip types, but shouldn't
> > > > the kernel also publish KVM_CAP_IRQCHIP, which stands just for "we
> > > > support some irqchip virtualization"?
> > > >  May be we should just add this for PowerPC and S390, to make things
> > > > less ambiguous?
> > >
> > > Note that we explicitly need to _enable_ the s390 cap (for
> > > compatibility). I'd need to recall the exact details but I came to the
> > > conclusion back then that I could not simply enable KVM_CAP_IRQCHIP for
> > > s390 (and current qemu would fail to enable the s390 cap if we started
> > > advertising KVM_CAP_IRQCHIP now).
> >
> >  OMG... I've looked at the code, what a mess...
> >  If i was implementing this, i'd simply introduce kvm_vm_enable_cap(s,
> > KVM_CAP_IRQCHIP, 0), which would be allowed to fail with -ENOSYS, so
> > that backwards compatibility is kept and an existing API is reused...
> > But, well, it's already impossible to unscramble an egg... :)
> >  Ok, i think in current situation we could choose one of these ways (both
> > are based on the fact that it's obvious that irqfd require IRQCHIP).
> >  a) I look for an alternate way to report KVM_CAP_IRQFD dynamically, and
> > maybe PowerPC and S390 follow this way.
> 
> The thing is: _when_ can you report KVM_CAP_IRQFD? It obviously
> requires an irqchip; but if you need some configuration/enablement
> beforehand, you'll get different values depending on when you retrieve
> the cap. So does KVM_CAP_IRQFD mean "irqfds are available in principle"
> or "everything has been setup for usage of irqfds"? I'd assume the
> former.
> 
> >  b) I simply drop it as it is, because current qemu knows about the
> > dependency and does not try to use irqfd without irqchip, because
> > there's simply no use for them. But, well, perhaps there would be an
> > exception in vhost, i don't remember testing it.
> 
> Wouldn't an irqfd emulation cover vhost?
> 
> >  So what shall we do?
> 


Re: [PATCH v2 00/21] arm64: KVM: world switch in C

2015-12-01 Thread Christoffer Dall
On Tue, Dec 01, 2015 at 09:58:23AM +, Marc Zyngier wrote:
> On 30/11/15 20:33, Christoffer Dall wrote:
> > On Fri, Nov 27, 2015 at 06:49:54PM +, Marc Zyngier wrote:
> >> Once upon a time, the KVM/arm64 world switch was a nice, clean, lean
> >> and mean piece of hand-crafted assembly code. Over time, features have
> >> crept in, the code has become harder to maintain, and the smallest
> >> change is a pain to introduce. The VHE patches are a prime example of
> >> why this doesn't work anymore.
> >>
> >> This series rewrites most of the existing assembly code in C, but keeps
> >> the existing code structure in place (most function names will look
> >> familiar to the reader). The biggest change is that we don't have to
> >> deal with a static register allocation (the compiler does it for us),
> >> we can easily follow structure and pointers, and only the lowest level
> >> is still in assembly code. Oh, and a negative diffstat.
> >>
> >> There is still a healthy dose of inline assembly (system register
> >> accessors, runtime code patching), but I've tried not to make it too
> >> invasive. The generated code, while not exactly brilliant, doesn't
> >> look too shabby. I do expect a small performance degradation, but I
> >> believe this is something we can improve over time (my initial
> >> measurements don't show any obvious regression though).
> > 
> > I ran this through my experimental setup on m400 and got this:
> 
> [...]
> 
> > What this tells me is that we do take a noticeable hit on the
> > world-switch path, which shows up in the TCP_RR and hackbench workloads,
> > which have a high precision in their output.
> > 
> > Note that the memcached number is well within its variability between
> > individual benchmark runs, where it varies to 12% of its average in over
> > 80% of the executions.
> > 
> > I don't think this is a showstopper though, but we could consider
> > looking more closely at a breakdown of the world-switch path and verify
> > if/where we are really taking a hit.
> 
> Thanks for doing so, very interesting. As a data point, what compiler
> are you using? I'd expect some variability based on the compiler version...
> 
I used the following (compiling natively on the m400):

gcc version 4.8.2 (Ubuntu/Linaro 4.8.2-19ubuntu1)


-Christoffer


Re: [PATCH v2 04/21] arm64: KVM: Implement vgic-v3 save/restore

2015-12-01 Thread Christoffer Dall
On Tue, Dec 01, 2015 at 11:57:16AM +, Marc Zyngier wrote:
> On 01/12/15 11:50, Christoffer Dall wrote:
> > On Tue, Dec 01, 2015 at 12:44:26PM +0100, Christoffer Dall wrote:
> >> On Tue, Dec 01, 2015 at 11:32:20AM +, Marc Zyngier wrote:
> >>> On 30/11/15 19:50, Christoffer Dall wrote:
>  On Fri, Nov 27, 2015 at 06:49:58PM +, Marc Zyngier wrote:
> > Implement the vgic-v3 save restore as a direct translation of
> > the assembly code version.
> >
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm64/kvm/hyp/Makefile |   1 +
> >  arch/arm64/kvm/hyp/hyp.h|   3 +
> >  arch/arm64/kvm/hyp/vgic-v3-sr.c | 222 
> > 
> >  3 files changed, 226 insertions(+)
> >  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
> >
> > diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> > index d8d5968..d1e38ce 100644
> > --- a/arch/arm64/kvm/hyp/Makefile
> > +++ b/arch/arm64/kvm/hyp/Makefile
> > @@ -3,3 +3,4 @@
> >  #
> >  
> >  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> > +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> > diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> > index 78f25c4..a31cb6e 100644
> > --- a/arch/arm64/kvm/hyp/hyp.h
> > +++ b/arch/arm64/kvm/hyp/hyp.h
> > @@ -30,5 +30,8 @@
> >  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
> >  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
> >  
> > +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
> > +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
> > +
> >  #endif /* __ARM64_KVM_HYP_H__ */
> >  
> > diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
> > b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> > new file mode 100644
> > index 000..b490db5
> > --- /dev/null
> > +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
> > @@ -0,0 +1,222 @@
> > +/*
> > + * Copyright (C) 2012-2015 - ARM Ltd
> > + * Author: Marc Zyngier 
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program.  If not, see 
> > .
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +
> > +#include "hyp.h"
> > +
> > +/*
> > + * We store LRs in reverse order to let the CPU deal with streaming
> > + * access. Use this macro to make it look saner...
> > + */
> > +#define LR_OFFSET(n)   (15 - n)
> > +
> > +#define read_gicreg(r) \
> > +   ({  \
> > +   u64 reg;\
> > +   asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
> > +   reg;\
> > +   })
> > +
> > +#define write_gicreg(v,r)  \
> > +   do {\
> > +   u64 __val = (v);\
> > +   asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
> > +   } while (0)
> 
>  remind me what the msr_s and mrs_s do compared to msr and mrs?
> >>>
> >>> They do the same job, only for the system registers which are not in the
> >>> original ARMv8 architecture spec, and most likely not implemented by
> >>> old(er) compilers.
> >>>
>  are these the reason why we need separate macros to access the gic
>  registers compared to 'normal' sysregs?
> >>>
> >>> Indeed.
> >>>
> > +
> > +/* vcpu is already in the HYP VA space */
> > +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
> > +{
> > +   struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
> > +   u64 val;
> > +   u32 nr_lr, nr_pri;
> > +
> > +   /*
> > +* Make sure stores to the GIC via the memory mapped interface
> > +* are now visible to the system register interface.
> > +*/
> > +   dsb(st);
> > +
> > +   

Re: [PATCH v2 04/21] arm64: KVM: Implement vgic-v3 save/restore

2015-12-01 Thread Marc Zyngier
On 30/11/15 19:50, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:49:58PM +, Marc Zyngier wrote:
>> Implement the vgic-v3 save restore as a direct translation of
>> the assembly code version.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp/Makefile |   1 +
>>  arch/arm64/kvm/hyp/hyp.h|   3 +
>>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 222 
>> 
>>  3 files changed, 226 insertions(+)
>>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
>>
>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>> index d8d5968..d1e38ce 100644
>> --- a/arch/arm64/kvm/hyp/Makefile
>> +++ b/arch/arm64/kvm/hyp/Makefile
>> @@ -3,3 +3,4 @@
>>  #
>>  
>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>> index 78f25c4..a31cb6e 100644
>> --- a/arch/arm64/kvm/hyp/hyp.h
>> +++ b/arch/arm64/kvm/hyp/hyp.h
>> @@ -30,5 +30,8 @@
>>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>>  
>> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>> +
>>  #endif /* __ARM64_KVM_HYP_H__ */
>>  
>> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c 
>> b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>> new file mode 100644
>> index 000..b490db5
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>> @@ -0,0 +1,222 @@
>> +/*
>> + * Copyright (C) 2012-2015 - ARM Ltd
>> + * Author: Marc Zyngier 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +#include "hyp.h"
>> +
>> +/*
>> + * We store LRs in reverse order to let the CPU deal with streaming
>> + * access. Use this macro to make it look saner...
>> + */
>> +#define LR_OFFSET(n)(15 - n)
>> +
>> +#define read_gicreg(r)  \
>> +({  \
>> +u64 reg;\
>> +asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
>> +reg;\
>> +})
>> +
>> +#define write_gicreg(v,r)   \
>> +do {\
>> +u64 __val = (v);\
>> +asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
>> +} while (0)
> 
> remind me what the msr_s and mrs_s do compared to msr and mrs?

They do the same job, only for the system registers which are not in the
original ARMv8 architecture spec, and most likely not implemented by
old(er) compilers.

> are these the reason why we need separate macros to access the gic
> registers compared to 'normal' sysregs?

Indeed.

>> +
>> +/* vcpu is already in the HYP VA space */
>> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>> +{
>> +struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>> +u64 val;
>> +u32 nr_lr, nr_pri;
>> +
>> +/*
>> + * Make sure stores to the GIC via the memory mapped interface
>> + * are now visible to the system register interface.
>> + */
>> +dsb(st);
>> +
>> +cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
>> +cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
>> +cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
>> +cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>> +
>> +write_gicreg(0, ICH_HCR_EL2);
>> +val = read_gicreg(ICH_VTR_EL2);
>> +nr_lr = val & 0xf;
> 
> this is not technically nr_lr, it's max_lr or max_lr_idx or something
> like that.

Let's go for max_lr_idx  then.

>> +nr_pri = ((u32)val >> 29) + 1;
> 
> nit: nr_pri_bits
> 
>> +
>> +switch (nr_lr) {
>> +case 15:
>> +cpu_if->vgic_lr[LR_OFFSET(15)] = read_gicreg(ICH_LR15_EL2);
>> +case 14:
>> +cpu_if->vgic_lr[LR_OFFSET(14)] = read_gicreg(ICH_LR14_EL2);
>> +case 13:
>> +cpu_if->vgic_lr[LR_OFFSET(13)] = read_gicreg(ICH_LR13_EL2);
>> +case 12:
>> +cpu_if->vgic_lr[LR_OFFSET(12)] = read_gicreg(ICH_LR12_EL2);
>> +case 11:
>> +
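
The naming point raised in the review above can be made concrete with a small sketch. It reuses the two field extractions from the quoted patch (val & 0xf and ((u32)val >> 29) + 1) and the LR_OFFSET() macro, with its argument parenthesised here. The vtr_* helper names are hypothetical and were chosen only to match the names suggested in the thread.

```c
#include <stdint.h>

/* Reverse-order slot used by the patch: LR n lives at index 15 - n. */
#define LR_OFFSET(n)	(15 - (n))

/* Highest implemented list register index: the low four bits of the
 * ICH_VTR_EL2 value, extracted the same way as in the patch. */
static inline uint32_t vtr_max_lr_idx(uint64_t vtr)
{
	return vtr & 0xf;
}

/* The register count is one more than the highest index, which is why
 * "nr_lr" was considered a misleading name for the raw field. */
static inline uint32_t vtr_nr_lrs(uint64_t vtr)
{
	return vtr_max_lr_idx(vtr) + 1;
}

/* Number of priority bits: top three bits of the low word, plus one. */
static inline uint32_t vtr_nr_pri_bits(uint64_t vtr)
{
	return ((uint32_t)vtr >> 29) + 1;
}
```

For an example value of 0x90000003, the highest index is 3, so four list registers exist, LR3 is stored at slot LR_OFFSET(3) = 12, and five priority bits are reported.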

Re: [PATCH v2 02/21] arm64: KVM: Add a HYP-specific header file

2015-12-01 Thread Marc Zyngier
On 30/11/15 20:00, Christoffer Dall wrote:
> On Fri, Nov 27, 2015 at 06:49:56PM +, Marc Zyngier wrote:
>> In order to expose the various EL2 services that are private to
>> the hypervisor, add a new hyp.h file.
>>
>> So far, it only contains mundane things such as section annotation
>> and VA manipulation.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  arch/arm64/kvm/hyp/hyp.h | 31 +++
>>  1 file changed, 31 insertions(+)
>>  create mode 100644 arch/arm64/kvm/hyp/hyp.h
>>
>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>> new file mode 100644
>> index 000..dac843e
>> --- /dev/null
>> +++ b/arch/arm64/kvm/hyp/hyp.h
>> @@ -0,0 +1,31 @@
>> +/*
>> + * Copyright (C) 2015 - ARM Ltd
>> + * Author: Marc Zyngier 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#ifndef __ARM64_KVM_HYP_H__
>> +#define __ARM64_KVM_HYP_H__
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define __hyp_text __section(.hyp.text) notrace
> 
> why notrace?

Because you'd end up with calls to mcount in each function prologue, and
that doesn't really work well for stuff that is not executed in the kernel
address space.
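
A minimal sketch of the mechanism, assuming a GCC-compatible compiler: the no_instrument_function attribute asks the compiler not to emit the profiling call at function entry when building with -pg or -finstrument-functions. The demo_notrace and demo_mask_low names are illustrative only, and the sketch leaves out the section placement that __hyp_text also performs.

```c
/* Illustrative stand-in for the kernel's notrace annotation; assumes a
 * compiler that understands GCC-style attributes. */
#define demo_notrace __attribute__((no_instrument_function))

/* Built with -pg, an ordinary function would start with a call to the
 * profiling hook; this one is left uninstrumented. */
static demo_notrace unsigned long demo_mask_low(unsigned long v)
{
	return v & 0xffUL;
}
```

The function behaves the same either way; only the compiler-generated prologue differs, which is what matters for code that runs outside the kernel address space.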

>> +
>> +#define kern_hyp_va(v) (typeof(v))((unsigned long)v & HYP_PAGE_OFFSET_MASK)
> 
> should you have parentheses around 'v' ?

Yup.
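
To illustrate why the parentheses matter, here is a small sketch with made-up names (DEMO_MASK, demo_va_bare, demo_va_paren) and a made-up mask value. It is not the kernel macro. It only shows how a cast applied to a bare macro argument binds to the first operand of a compound expression.

```c
/* Illustrative mask only; HYP_PAGE_OFFSET_MASK is defined by the kernel. */
#define DEMO_MASK	0xffUL

/* Argument used bare, as in the version under review. */
#define demo_va_bare(v)		((unsigned long)v & DEMO_MASK)

/* Argument parenthesised, as agreed in the reply. */
#define demo_va_paren(v)	((unsigned long)(v) & DEMO_MASK)

/* Passing a conditional expression shows the difference: in the bare
 * form the cast applies to "sel" alone and the mask applies only to
 * "b", so "a" comes back unmasked whenever sel is non-zero. */
static unsigned long demo_pick_bare(int sel, unsigned long a, unsigned long b)
{
	return demo_va_bare(sel ? a : b);
}

static unsigned long demo_pick_paren(int sel, unsigned long a, unsigned long b)
{
	return demo_va_paren(sel ? a : b);
}
```

With sel = 1 and a = 0x1234, the bare form yields 0x1234 while the parenthesised form yields 0x34. A similar surprise applies to an argument such as p + 1, where the cast would bind to p alone.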

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v2 02/21] arm64: KVM: Add a HYP-specific header file

2015-12-01 Thread Christoffer Dall
On Tue, Dec 01, 2015 at 11:41:08AM +, Marc Zyngier wrote:
> On 30/11/15 20:00, Christoffer Dall wrote:
> > On Fri, Nov 27, 2015 at 06:49:56PM +, Marc Zyngier wrote:
> >> In order to expose the various EL2 services that are private to
> >> the hypervisor, add a new hyp.h file.
> >>
> >> So far, it only contains mundane things such as section annotation
> >> and VA manipulation.
> >>
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  arch/arm64/kvm/hyp/hyp.h | 31 +++
> >>  1 file changed, 31 insertions(+)
> >>  create mode 100644 arch/arm64/kvm/hyp/hyp.h
> >>
> >> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
> >> new file mode 100644
> >> index 000..dac843e
> >> --- /dev/null
> >> +++ b/arch/arm64/kvm/hyp/hyp.h
> >> @@ -0,0 +1,31 @@
> >> +/*
> >> + * Copyright (C) 2015 - ARM Ltd
> >> + * Author: Marc Zyngier 
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License version 2 as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see .
> >> + */
> >> +
> >> +#ifndef __ARM64_KVM_HYP_H__
> >> +#define __ARM64_KVM_HYP_H__
> >> +
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +#define __hyp_text __section(.hyp.text) notrace
> > 
> > why notrace?
> 
> Because you'd end up with calls to mcount in each function prologue, and
> that doesn't really work well for stuff that is not executed in the kernel
> address space.

right, makes good sense.

> 
> >> +
> >> +#define kern_hyp_va(v) (typeof(v))((unsigned long)v & 
> >> HYP_PAGE_OFFSET_MASK)
> > 
> > > should you have parentheses around 'v' ?
> 
> Yup.
> 
> Thanks,
> 
>   M.
> -- 
> Jazz is not dead. It just smells funny...


Re: [PATCH v2 04/21] arm64: KVM: Implement vgic-v3 save/restore

2015-12-01 Thread Marc Zyngier
On 01/12/15 11:44, Christoffer Dall wrote:
> On Tue, Dec 01, 2015 at 11:32:20AM +, Marc Zyngier wrote:
>> On 30/11/15 19:50, Christoffer Dall wrote:
>>> On Fri, Nov 27, 2015 at 06:49:58PM +, Marc Zyngier wrote:
>>>> Implement the vgic-v3 save restore as a direct translation of
>>>> the assembly code version.
>>>>
>>>> Signed-off-by: Marc Zyngier
>>>> ---
>>>>  arch/arm64/kvm/hyp/Makefile |   1 +
>>>>  arch/arm64/kvm/hyp/hyp.h|   3 +
>>>>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 222
>>>>  3 files changed, 226 insertions(+)
>>>>  create mode 100644 arch/arm64/kvm/hyp/vgic-v3-sr.c
>>>>
>>>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>>>> index d8d5968..d1e38ce 100644
>>>> --- a/arch/arm64/kvm/hyp/Makefile
>>>> +++ b/arch/arm64/kvm/hyp/Makefile
>>>> @@ -3,3 +3,4 @@
>>>>  #
>>>>
>>>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>>>> +obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>>>> diff --git a/arch/arm64/kvm/hyp/hyp.h b/arch/arm64/kvm/hyp/hyp.h
>>>> index 78f25c4..a31cb6e 100644
>>>> --- a/arch/arm64/kvm/hyp/hyp.h
>>>> +++ b/arch/arm64/kvm/hyp/hyp.h
>>>> @@ -30,5 +30,8 @@
>>>>  void __vgic_v2_save_state(struct kvm_vcpu *vcpu);
>>>>  void __vgic_v2_restore_state(struct kvm_vcpu *vcpu);
>>>>
>>>> +void __vgic_v3_save_state(struct kvm_vcpu *vcpu);
>>>> +void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
>>>> +
>>>>  #endif /* __ARM64_KVM_HYP_H__ */
>>>>
>>>> diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>>>> new file mode 100644
>>>> index 000..b490db5
>>>> --- /dev/null
>>>> +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
>>>> @@ -0,0 +1,222 @@
>>>> +/*
>>>> + * Copyright (C) 2012-2015 - ARM Ltd
>>>> + * Author: Marc Zyngier
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License version 2 as
>>>> + * published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License
>>>> + * along with this program.  If not, see .
>>>> + */
>>>> +
>>>> +#include 
>>>> +#include 
>>>> +#include 
>>>> +
>>>> +#include 
>>>> +
>>>> +#include "hyp.h"
>>>> +
>>>> +/*
>>>> + * We store LRs in reverse order to let the CPU deal with streaming
>>>> + * access. Use this macro to make it look saner...
>>>> + */
>>>> +#define LR_OFFSET(n)  (15 - n)
>>>> +
>>>> +#define read_gicreg(r)  \
>>>> +  ({  \
>>>> +  u64 reg;\
>>>> +  asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
>>>> +  reg;\
>>>> +  })
>>>> +
>>>> +#define write_gicreg(v,r) \
>>>> +  do {\
>>>> +  u64 __val = (v);\
>>>> +  asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
>>>> +  } while (0)
>>>
>>> remind me what the msr_s and mrs_s do compared to msr and mrs?
>>
>> They do the same job, only for the system registers which are not in the
>> original ARMv8 architecture spec, and most likely not implemented by
>> old(er) compilers.
>>
>>> are these the reason why we need separate macros to access the gic
>>> registers compared to 'normal' sysregs?
>>
>> Indeed.
>>
>>>> +
>>>> +/* vcpu is already in the HYP VA space */
>>>> +void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +  struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
>>>> +  u64 val;
>>>> +  u32 nr_lr, nr_pri;
>>>> +
>>>> +  /*
>>>> +   * Make sure stores to the GIC via the memory mapped interface
>>>> +   * are now visible to the system register interface.
>>>> +   */
>>>> +  dsb(st);
>>>> +
>>>> +  cpu_if->vgic_vmcr  = read_gicreg(ICH_VMCR_EL2);
>>>> +  cpu_if->vgic_misr  = read_gicreg(ICH_MISR_EL2);
>>>> +  cpu_if->vgic_eisr  = read_gicreg(ICH_EISR_EL2);
>>>> +  cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2);
>>>> +
>>>> +  write_gicreg(0, ICH_HCR_EL2);
>>>> +  val = read_gicreg(ICH_VTR_EL2);
>>>> +  nr_lr = val & 0xf;
>>>
>>> this is not technically nr_lr, it's max_lr or max_lr_idx or something
>>> like that.
>>
>> Let's go for max_lr_idx  then.
>>
>>>> +  nr_pri = ((u32)val >> 29) + 1;
>>>
>>> nit: nr_pri_bits
>>>
>>>> +
>>>> +  switch (nr_lr) {
>>>> +  case 15:
>>>> +  cpu_if->vgic_lr[LR_OFFSET(15)] = read_gicreg(ICH_LR15_EL2);
>>>> +

RE: BUG ALERT: ARM32 KVM does not work in 4.4-rc3

2015-12-01 Thread Pavel Fedin
Hello!

> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf 
> Of Pavel Fedin
> Sent: Tuesday, December 01, 2015 1:03 PM
> To: 'Marc Zyngier'; kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org
> Cc: 'Ard Biesheuvel'; christoffer.d...@linaro.org
> Subject: RE: BUG ALERT: ARM32 KVM does not work in 4.4-rc3
> 
>  Hello!
> 
> > > My project involves ARM64, but from time to time i also test ARM32
> > > KVM. I have discovered that it stopped working in 4.4-rc3. The same
> > > virtual machine works perfectly under current kvmarm/next, but gets
> > > stuck at random point under 4.4-rc3 from linux-stable. I'm not sure
> > > that i have time to investigate this quickly, but i'll post some new
> > > information as soon as i get it
> 
> [skip]
> 
> > So until you bisect it to an exact commit and configuration, I declare
> > the alert over. ;-)
> 
>  The commit in question is e6fab54423450d699a09ec2b899473a541f61971
> ("ARM/arm64: KVM: test properly for a PTE's uncachedness").
> Reverting it fixes the problem.
>  Study in qemu shows that the CPU gets stuck at PC = 0x0C with LR = 0x10. So
> i quickly decided that it might have to do with caching, and my first hit
> was correct. The guest crashes in this state very early, sometimes it even
> cannot fully print "Uncompressing kernel".
>  The machine which reproduces it is custom Samsung's out-of-tree board. I'll
> investigate it further in order to determine how exactly the commit could
> harm. I know that it passed reviews and testing, and i was involved too.
> Perhaps it's board's code fault, however.
> 
> Cc'ed to others involved.

 The problem seems to be triggered by ARCH_SPARSEMEM_ENABLE. My top-secret
machine uses it, while more widespread things like vexpress and Exynos don't.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [PATCH v2 08/21] arm64: KVM: Implement debug save/restore

2015-12-01 Thread Christoffer Dall
On Tue, Dec 01, 2015 at 01:06:31PM +, Marc Zyngier wrote:
> On 01/12/15 12:56, Christoffer Dall wrote:
> > On Fri, Nov 27, 2015 at 06:50:02PM +, Marc Zyngier wrote:
> >> Implement the debug save restore as a direct translation of
> >> the assembly code version.
> >>
> >> Signed-off-by: Marc Zyngier 
> >> ---
> >>  arch/arm64/kvm/hyp/Makefile   |   1 +
> >>  arch/arm64/kvm/hyp/debug-sr.c | 130 
> >> ++
> >>  arch/arm64/kvm/hyp/hyp.h  |   9 +++
> >>  3 files changed, 140 insertions(+)
> >>  create mode 100644 arch/arm64/kvm/hyp/debug-sr.c
> >>
> >> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> >> index ec94200..ec14cac 100644
> >> --- a/arch/arm64/kvm/hyp/Makefile
> >> +++ b/arch/arm64/kvm/hyp/Makefile
> >> @@ -6,3 +6,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> >>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> >>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
> >>  obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
> >> +obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
> >> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
> >> new file mode 100644
> >> index 000..a0b2b99
> >> --- /dev/null
> >> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> >> @@ -0,0 +1,130 @@
> >> +/*
> >> + * Copyright (C) 2015 - ARM Ltd
> >> + * Author: Marc Zyngier 
> >> + *
> >> + * This program is free software; you can redistribute it and/or modify
> >> + * it under the terms of the GNU General Public License version 2 as
> >> + * published by the Free Software Foundation.
> >> + *
> >> + * This program is distributed in the hope that it will be useful,
> >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> >> + * GNU General Public License for more details.
> >> + *
> >> + * You should have received a copy of the GNU General Public License
> >> + * along with this program.  If not, see .
> >> + */
> >> +
> >> +#include 
> >> +#include 
> >> +
> >> +#include 
> >> +
> >> +#include "hyp.h"
> >> +
> >> +#define read_debug(r,n)   read_sysreg(r##n##_el1)
> >> +#define write_debug(v,r,n)write_sysreg(v, r##n##_el1)
> >> +
> >> +#define save_debug(ptr,reg,nr)   \
> >> +  switch (nr) {   \
> >> +  case 15:ptr[15] = read_debug(reg, 15);  \
> >> +  case 14:ptr[14] = read_debug(reg, 14);  \
> >> +  case 13:ptr[13] = read_debug(reg, 13);  \
> >> +  case 12:ptr[12] = read_debug(reg, 12);  \
> >> +  case 11:ptr[11] = read_debug(reg, 11);  \
> >> +  case 10:ptr[10] = read_debug(reg, 10);  \
> >> +  case 9: ptr[9] = read_debug(reg, 9);\
> >> +  case 8: ptr[8] = read_debug(reg, 8);\
> >> +  case 7: ptr[7] = read_debug(reg, 7);\
> >> +  case 6: ptr[6] = read_debug(reg, 6);\
> >> +  case 5: ptr[5] = read_debug(reg, 5);\
> >> +  case 4: ptr[4] = read_debug(reg, 4);\
> >> +  case 3: ptr[3] = read_debug(reg, 3);\
> >> +  case 2: ptr[2] = read_debug(reg, 2);\
> >> +  case 1: ptr[1] = read_debug(reg, 1);\
> >> +  default:ptr[0] = read_debug(reg, 0);\
> >> +  }
> >> +
> >> +#define restore_debug(ptr,reg,nr) \
> >> +  switch (nr) {   \
> >> +  case 15:write_debug(ptr[15], reg, 15);  \
> >> +  case 14:write_debug(ptr[14], reg, 14);  \
> >> +  case 13:write_debug(ptr[13], reg, 13);  \
> >> +  case 12:write_debug(ptr[12], reg, 12);  \
> >> +  case 11:write_debug(ptr[11], reg, 11);  \
> >> +  case 10:write_debug(ptr[10], reg, 10);  \
> >> +  case 9: write_debug(ptr[9], reg, 9);\
> >> +  case 8: write_debug(ptr[8], reg, 8);\
> >> +  case 7: write_debug(ptr[7], reg, 7);\
> >> +  case 6: write_debug(ptr[6], reg, 6);\
> >> +  case 5: write_debug(ptr[5], reg, 5);\
> >> +  case 4: write_debug(ptr[4], reg, 4);\
> >> +  case 3: write_debug(ptr[3], reg, 3);\
> >> +  case 2: write_debug(ptr[2], reg, 2);\
> >> +  case 1: write_debug(ptr[1], reg, 1);\
> >> +  default:write_debug(ptr[0], reg, 0);\
> >> +  }
> >> +
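
The save_debug()/restore_debug() macros quoted above rely on switch fall-through: entering at case N handles register N and then every lower-numbered one. A loop cannot be used directly because the register name is pasted into the accessor at compile time (r##n##_el1). The sketch below shows the same control flow on a plain array. The demo_* names and the four-register size are assumptions made for illustration.

```c
#include <stdint.h>

#define DEMO_NR_REGS	4

/* Stand-in for the hardware registers; the quoted code reads system
 * registers through an accessor instead of indexing an array. */
static uint64_t demo_hw_regs[DEMO_NR_REGS];

/* Save registers max_idx down to 0. Each case deliberately falls
 * through to the next, so entering at case N copies N + 1 registers.
 * Any value above 3 lands on default and copies register 0 only. */
static void demo_save_regs(uint64_t *dst, unsigned int max_idx)
{
	switch (max_idx) {
	case 3:  dst[3] = demo_hw_regs[3]; /* fall through */
	case 2:  dst[2] = demo_hw_regs[2]; /* fall through */
	case 1:  dst[1] = demo_hw_regs[1]; /* fall through */
	default: dst[0] = demo_hw_regs[0];
	}
}
```

Calling demo_save_regs(dst, 2) copies entries 2, 1 and 0 and leaves dst[3] untouched, which matches how the quoted macros skip registers above the implemented maximum.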

Re: [PATCH v2 08/21] arm64: KVM: Implement debug save/restore

2015-12-01 Thread Christoffer Dall
On Tue, Dec 01, 2015 at 03:47:37PM +0100, Christoffer Dall wrote:
> On Tue, Dec 01, 2015 at 01:06:31PM +, Marc Zyngier wrote:
> > On 01/12/15 12:56, Christoffer Dall wrote:
> > > On Fri, Nov 27, 2015 at 06:50:02PM +, Marc Zyngier wrote:
> > >> Implement the debug save restore as a direct translation of
> > >> the assembly code version.
> > >>
> > >> Signed-off-by: Marc Zyngier 
> > >> ---
> > >>  arch/arm64/kvm/hyp/Makefile   |   1 +
> > >>  arch/arm64/kvm/hyp/debug-sr.c | 130 
> > >> ++
> > >>  arch/arm64/kvm/hyp/hyp.h  |   9 +++
> > >>  3 files changed, 140 insertions(+)
> > >>  create mode 100644 arch/arm64/kvm/hyp/debug-sr.c
> > >>
> > >> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> > >> index ec94200..ec14cac 100644
> > >> --- a/arch/arm64/kvm/hyp/Makefile
> > >> +++ b/arch/arm64/kvm/hyp/Makefile
> > >> @@ -6,3 +6,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
> > >>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
> > >>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
> > >>  obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
> > >> +obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
> > >> diff --git a/arch/arm64/kvm/hyp/debug-sr.c 
> > >> b/arch/arm64/kvm/hyp/debug-sr.c
> > >> new file mode 100644
> > >> index 000..a0b2b99
> > >> --- /dev/null
> > >> +++ b/arch/arm64/kvm/hyp/debug-sr.c
> > >> @@ -0,0 +1,130 @@
> > >> +/*
> > >> + * Copyright (C) 2015 - ARM Ltd
> > >> + * Author: Marc Zyngier 
> > >> + *
> > >> + * This program is free software; you can redistribute it and/or modify
> > >> + * it under the terms of the GNU General Public License version 2 as
> > >> + * published by the Free Software Foundation.
> > >> + *
> > >> + * This program is distributed in the hope that it will be useful,
> > >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > >> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > >> + * GNU General Public License for more details.
> > >> + *
> > >> + * You should have received a copy of the GNU General Public License
> > >> + * along with this program.  If not, see .
> > >> + */
> > >> +
> > >> +#include 
> > >> +#include 
> > >> +
> > >> +#include 
> > >> +
> > >> +#include "hyp.h"
> > >> +
> > >> +#define read_debug(r,n) read_sysreg(r##n##_el1)
> > >> +#define write_debug(v,r,n)  write_sysreg(v, r##n##_el1)
> > >> +
> > >> +#define save_debug(ptr,reg,nr) \
> > >> +  switch (nr) { \
> > >> +  case 15: ptr[15] = read_debug(reg, 15); \
> > >> +  case 14: ptr[14] = read_debug(reg, 14); \
> > >> +  case 13: ptr[13] = read_debug(reg, 13); \
> > >> +  case 12: ptr[12] = read_debug(reg, 12); \
> > >> +  case 11: ptr[11] = read_debug(reg, 11); \
> > >> +  case 10: ptr[10] = read_debug(reg, 10); \
> > >> +  case 9: ptr[9] = read_debug(reg, 9); \
> > >> +  case 8: ptr[8] = read_debug(reg, 8); \
> > >> +  case 7: ptr[7] = read_debug(reg, 7); \
> > >> +  case 6: ptr[6] = read_debug(reg, 6); \
> > >> +  case 5: ptr[5] = read_debug(reg, 5); \
> > >> +  case 4: ptr[4] = read_debug(reg, 4); \
> > >> +  case 3: ptr[3] = read_debug(reg, 3); \
> > >> +  case 2: ptr[2] = read_debug(reg, 2); \
> > >> +  case 1: ptr[1] = read_debug(reg, 1); \
> > >> +  default: ptr[0] = read_debug(reg, 0); \
> > >> +  }
> > >> +
> > >> +#define restore_debug(ptr,reg,nr) \
> > >> +  switch (nr) { \
> > >> +  case 15: write_debug(ptr[15], reg, 15); \
> > >> +  case 14: write_debug(ptr[14], reg, 14); \
> > >> +  case 13: write_debug(ptr[13], reg, 13); \
> > >> +  case 12: write_debug(ptr[12], reg, 12); \
> > >> +  case 11: write_debug(ptr[11], reg, 11); \
> > >> +  case 10: write_debug(ptr[10], reg, 10); \
> > >> +  case 9: write_debug(ptr[9], reg, 9); \
> > >> +  case 8: write_debug(ptr[8], reg, 8);

Re: [PATCH 00/11] KVM: x86: track guest page access

2015-12-01 Thread Andrea Arcangeli
On Tue, Dec 01, 2015 at 11:17:30AM +0100, Paolo Bonzini wrote:
> 
> 
> On 30/11/2015 19:26, Xiao Guangrong wrote:
> > This patchset introduces the feature which allows us to track page
> > access in guest. Currently, only write access tracking is implemented
> > in this version.
> > 
> > Four APIs are introduces:
> > - kvm_page_track_add_page(kvm, gfn, mode), single guest page @gfn is
> >   added into the track pool of the guest instance represented by @kvm,
> >   @mode specifies which kind of access on the @gfn is tracked
> >   
> > - kvm_page_track_remove_page(kvm, gfn, mode), is the opposed operation
> >   of kvm_page_track_add_page() which removes @gfn from the tracking pool.
> >   gfn is no tracked after its last user is gone
> > 
> > - kvm_page_track_register_notifier(kvm, n), register a notifier so that
> >   the event triggered by page tracking will be received, at that time,
> >   the callback of n->track_write() will be called
> > 
> > - kvm_page_track_unregister_notifier(kvm, n), does the opposed operation
> >   of kvm_page_track_register_notifier(), which unlinks the notifier and
> >   stops receiving the tracked event
> > 
> > The first user of page track is non-leaf shadow page tables as they are
> > always write protected. It also gains performance improvement because
> > page track speeds up page fault handler for the tracked pages. The
> > performance result of kernel building is as followings:
> > 
> >before   after
> > real 461.63   real 455.48
> > user 4529.55  user 4557.88
> > sys 1995.39   sys 1922.57
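
A toy userspace model of the four calls listed above may make the intended semantics easier to follow. This is only a sketch under stated assumptions, not the KVM implementation: every toy_* name, struct toy_kvm, MAX_GFN and the single-notifier slot are hypothetical, and only write tracking is modelled (as in the cover letter). The per-gfn counter reflects the statement that a gfn stops being tracked once its last user is gone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GFN 64	/* toy guest size, an assumption for this sketch */

struct track_notifier {
	void (*track_write)(uint64_t gfn, void *opaque);
	void *opaque;
};

struct toy_kvm {
	unsigned int write_track_count[MAX_GFN];	/* users tracking each gfn */
	struct track_notifier *notifier;		/* one notifier, for brevity */
};

/* Add one user of write tracking on @gfn. */
static void toy_track_add_page(struct toy_kvm *kvm, uint64_t gfn)
{
	if (gfn < MAX_GFN)
		kvm->write_track_count[gfn]++;
}

/* Drop one user; the gfn is untracked once the count reaches zero. */
static void toy_track_remove_page(struct toy_kvm *kvm, uint64_t gfn)
{
	if (gfn < MAX_GFN && kvm->write_track_count[gfn] > 0)
		kvm->write_track_count[gfn]--;
}

static void toy_track_register_notifier(struct toy_kvm *kvm,
					struct track_notifier *n)
{
	kvm->notifier = n;
}

static void toy_track_unregister_notifier(struct toy_kvm *kvm,
					  struct track_notifier *n)
{
	if (kvm->notifier == n)
		kvm->notifier = NULL;
}

/* Model a guest write; returns 1 if the notifier callback fired. */
static int toy_guest_write(struct toy_kvm *kvm, uint64_t gfn)
{
	if (gfn >= MAX_GFN || kvm->write_track_count[gfn] == 0 || !kvm->notifier)
		return 0;
	kvm->notifier->track_write(gfn, kvm->notifier->opaque);
	return 1;
}

/* Example callback: counts the tracked writes it has seen. */
static void count_write(uint64_t gfn, void *opaque)
{
	(void)gfn;
	(*(int *)opaque)++;
}
```

With two users on the same gfn, the callback keeps firing after the first removal and stops only after the second one.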
> 
> For KVM-GT, as far as I know Andrea Arcangeli is working on extending
> userfaultfd to tracking write faults only.  Perhaps KVM-GT can do

I was a bit busy lately with the KSMscale design change and with fixing
a purely theoretical THP issue, but the userfaultfd write tracking has
already become available here:

http://www.spinics.net/lists/linux-mm/msg97422.html

I'll be merging it soon in my tree after a thoughtful review.

> something similar, where KVM gets the write tracking functionality for
> free through the MMU notifiers.  Any thoughts on this?
> 
> Applying your technique to non-leaf shadow pages actually makes this
> series quite interesting. :)  Shadow paging is still in use for nested
> EPT, so it's always a good idea to speed it up.

I don't have the full picture of how userfaultfd write tracking could
also fit in the leaf/non-leaf shadow pagetable write tracking yet but
it's good to think about it.

In the userfaultfd case the write notification would always arrive
first through the uffd and would be received by the qemu userfault
thread. That thread would then invoke the uffd memory protect ioctl (to
handle the write fault in userland and wake up the thread stuck in
handle_userfault()), and that ioctl would also flush the secondary MMU
TLB through the MMU notifier and get rid of the readonly spte (or
update it to read-write with change_pte in the best case).

Thanks,
Andrea
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH V2 0/3] IXGBE/VFIO: Add live migration support for SRIOV NIC

2015-12-01 Thread Lan, Tianyu



On 12/1/2015 12:07 AM, Alexander Duyck wrote:

> They can only be corrected if the underlying assumptions are correct
> and they aren't.  Your solution would have never worked correctly.
> The problem is you assume you can keep the device running when you are
> migrating and you simply cannot.  At some point you will always have
> to stop the device in order to complete the migration, and you cannot
> stop it before you have stopped your page tracking mechanism.  So
> unless the platform has an IOMMU that is somehow taking part in the
> dirty page tracking you will not be able to stop the guest and then
> the device, it will have to be the device and then the guest.


>> Doing suspend and resume() may help to do migration easily but some
>> devices requires low service down time. Especially network and I got
>> that some cloud company promised less than 500ms network service downtime.

> Honestly focusing on the downtime is getting the cart ahead of the
> horse.  First you need to be able to do this without corrupting system
> memory and regardless of the state of the device.  You haven't even
> gotten to that state yet.  Last I knew the device had to be up in
> order for your migration to even work.


I think the issue is that the content of an rx packet delivered to the
stack may change during migration, because that piece of memory won't be
migrated to the new machine. This may confuse applications or the stack.
The current dummy write solution ensures that the content of the packet
won't change after the dummy write, but the content may not be the
received data if migration happens before that point. We can recheck the
content via the checksum or CRC in the protocol after the dummy write to
ensure the content is what the VF received. I think the stack already
does such checks, and the packet will be dropped if it fails the check.
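
As a rough illustration of the recheck idea, the sketch below computes an Internet-style 16-bit one's-complement checksum (in the manner of RFC 1071) over a buffer and compares it with the value that travelled with the data. It is not taken from the ixgbe driver or the network stack; inet_checksum and payload_is_intact are hypothetical helper names, and the assumption is simply that a payload which changed underneath the receiver no longer matches its checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement checksum over @len bytes, big-endian words.
 * Fine for packet-sized buffers; the 32-bit accumulator is folded at
 * the end. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)data[i] << 8;
	while (sum >> 16)		/* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Returns 1 if the payload still matches the checksum that came with it. */
static int payload_is_intact(const uint8_t *payload, size_t len,
			     uint16_t expected)
{
	return inet_checksum(payload, len) == expected;
}
```

A buffer that is modified after the checksum was taken (for example because the page holding it was not migrated) fails payload_is_intact() and would be dropped.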


Another way is to tell Qemu about all the memory the driver is using and
let Qemu migrate that memory after stopping the VCPU and the device. This
seems safe, but the implementation may be complex.



Re: [PATCH v4 18/21] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-01 Thread Shannon Zhao


On 2015/12/1 2:22, Marc Zyngier wrote:
> On Fri, 30 Oct 2015 14:22:00 +0800
> Shannon Zhao  wrote:
> 
>> From: Shannon Zhao 
>>
>> When calling perf_event_create_kernel_counter to create perf_event,
>> assign a overflow handler. Then when perf event overflows, set
>> irq_pending and call kvm_vcpu_kick() to sync the interrupt.
>>
>> Signed-off-by: Shannon Zhao 
>> ---
>>  arch/arm/kvm/arm.c|  4 +++
>>  include/kvm/arm_pmu.h |  4 +++
>>  virt/kvm/arm/pmu.c| 76 
>> ++-
>>  3 files changed, 83 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 78b2869..9c0fec4 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -28,6 +28,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #define CREATE_TRACE_POINTS
>>  #include "trace.h"
>> @@ -551,6 +552,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
>> struct kvm_run *run)
>>  
>>  if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>  local_irq_enable();
>> +kvm_pmu_sync_hwstate(vcpu);
> 
> This is very weird. Are you only injecting interrupts when a signal is
> pending? I don't understand how this works...
> 
>>  kvm_vgic_sync_hwstate(vcpu);
>>  preempt_enable();
>>  kvm_timer_sync_hwstate(vcpu);
>> @@ -598,6 +600,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
>> struct kvm_run *run)
>>  kvm_guest_exit();
>>  trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>  
>> +kvm_pmu_post_sync_hwstate(vcpu);
>> +
>>  kvm_vgic_sync_hwstate(vcpu);
>>  
>>  preempt_enable();
>> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
>> index acd025a..5e7f943 100644
>> --- a/include/kvm/arm_pmu.h
>> +++ b/include/kvm/arm_pmu.h
>> @@ -39,6 +39,8 @@ struct kvm_pmu {
>>  };
>>  
>>  #ifdef CONFIG_KVM_ARM_PMU
>> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
>> +void kvm_pmu_post_sync_hwstate(struct kvm_vcpu *vcpu);
> 
> Please follow the current terminology: _flush_ on VM entry, _sync_ on
> VM exit.
> 

Hi Marc,

Is the patch below the right way to do this?

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 78b2869..84008d1 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 

 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -531,6 +532,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
struct kvm_run *run)
 */
kvm_timer_flush_hwstate(vcpu);

+   kvm_pmu_flush_hwstate(vcpu);
+
/*
 * Preparing the interrupts to be injected also
 * involves poking the GIC, which must be done in a
@@ -554,6 +557,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
struct kvm_run *run)
kvm_vgic_sync_hwstate(vcpu);
preempt_enable();
kvm_timer_sync_hwstate(vcpu);
+   kvm_pmu_sync_hwstate(vcpu);
continue;
}

@@ -604,6 +608,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
struct kvm_run *run)

kvm_timer_sync_hwstate(vcpu);

+   kvm_pmu_sync_hwstate(vcpu);
+
ret = handle_exit(vcpu, run, ret);
}

diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 47bbd43..edfe4e5 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -41,6 +41,8 @@ struct kvm_pmu {
 };

 #ifdef CONFIG_KVM_ARM_PMU
+void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
 unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32
select_idx);
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool
all_enable);
@@ -51,6 +53,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu
*vcpu, u32 data,
u32 select_idx);
 void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val);
 #else
+void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {}
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) {}
 unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32
select_idx)
 {
return 0;
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 15cac45..9aad2f7 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 

 /**
  * kvm_pmu_get_counter_value - get PMU counter value
@@ -79,6 +80,78 @@ static void kvm_pmu_stop_counter(struct kvm_pmc *pmc)
 }

 /**
+ * kvm_pmu_flush_hwstate - flush pmu state to cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Inject virtual PMU IRQ if IRQ is pending for this cpu.
+ */
+void kvm_pmu_flush_hwstate(struct 

Re: [PATCH net-next 3/3] vhost_net: basic polling support

2015-12-01 Thread Michael S. Tsirkin
On Tue, Dec 01, 2015 at 01:17:49PM +0800, Jason Wang wrote:
> 
> 
> On 11/30/2015 06:44 PM, Michael S. Tsirkin wrote:
> > On Wed, Nov 25, 2015 at 03:11:29PM +0800, Jason Wang wrote:
> >> > This patch tries to poll for new added tx buffer or socket receive
> >> > queue for a while at the end of tx/rx processing. The maximum time
> >> > spent on polling were specified through a new kind of vring ioctl.
> >> > 
> >> > Signed-off-by: Jason Wang 
> > One further enhancement would be to actually poll
> > the underlying device. This should be reasonably
> > straight-forward with macvtap (especially in the
> > passthrough mode).
> >
> >
> 
> Yes, it is. I have some patches to do this by replacing
> skb_queue_empty() with sk_busy_loop() but for tap.

We probably don't want to do this unconditionally, though.

> Tests does not show
> any improvement but some regression.

Did you add code to call sk_mark_napi_id on tap then?
sk_busy_loop won't do anything useful without.

>  Maybe it's better to test macvtap.

Same thing ...
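
For readers following the polling discussion, here is a minimal sketch of a time-bounded busy poll: spin until work shows up or a deadline passes. It only illustrates the general idea from the patch description (poll for a while at the end of tx/rx processing, with a configurable maximum time) and is not the vhost_net code; busy_poll, the callback types and the fake clock are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*clock_fn)(void *ctx);	/* current time in microseconds */
typedef int (*has_work_fn)(void *ctx);		/* non-zero when work is queued */

/* Spin until @has_work reports work or @timeout_us elapses.
 * Returns 1 if work was found, 0 if the poll timed out. */
static int busy_poll(uint64_t timeout_us, clock_fn now, has_work_fn has_work,
		     void *ctx)
{
	uint64_t deadline = now(ctx) + timeout_us;

	do {
		if (has_work(ctx))
			return 1;
	} while (now(ctx) < deadline);
	return 0;
}

/* Fake environment so the loop can be exercised deterministically. */
struct fake_env {
	uint64_t t;		/* fake clock, advances by 1 per reading */
	uint64_t work_at;	/* time at which work becomes available */
};

static uint64_t fake_now(void *ctx)
{
	struct fake_env *e = ctx;
	return e->t++;
}

static int fake_has_work(void *ctx)
{
	struct fake_env *e = ctx;
	return e->t >= e->work_at;
}
```

With a timeout of 0 the loop checks for work exactly once, so polling can be disabled without a separate code path.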

-- 
MST


Re: [PATCH v4 18/21] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-01 Thread Marc Zyngier
On 01/12/15 14:35, Shannon Zhao wrote:
> 
> 
> On 2015/12/1 2:22, Marc Zyngier wrote:
>> On Fri, 30 Oct 2015 14:22:00 +0800
>> Shannon Zhao  wrote:
>>
>>> From: Shannon Zhao 
>>>
>>> When calling perf_event_create_kernel_counter to create perf_event,
>>> assign a overflow handler. Then when perf event overflows, set
>>> irq_pending and call kvm_vcpu_kick() to sync the interrupt.
>>>
>>> Signed-off-by: Shannon Zhao 
>>> ---
>>>  arch/arm/kvm/arm.c|  4 +++
>>>  include/kvm/arm_pmu.h |  4 +++
>>>  virt/kvm/arm/pmu.c| 76 
>>> ++-
>>>  3 files changed, 83 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index 78b2869..9c0fec4 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -28,6 +28,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  
>>>  #define CREATE_TRACE_POINTS
>>>  #include "trace.h"
>>> @@ -551,6 +552,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
>>> struct kvm_run *run)
>>>  
>>> if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
>>> local_irq_enable();
>>> +   kvm_pmu_sync_hwstate(vcpu);
>>
>> This is very weird. Are you only injecting interrupts when a signal is
>> pending? I don't understand how this works...
>>
>>> kvm_vgic_sync_hwstate(vcpu);
>>> preempt_enable();
>>> kvm_timer_sync_hwstate(vcpu);
>>> @@ -598,6 +600,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
>>> struct kvm_run *run)
>>> kvm_guest_exit();
>>> trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
>>>  
>>> +   kvm_pmu_post_sync_hwstate(vcpu);
>>> +
>>> kvm_vgic_sync_hwstate(vcpu);
>>>  
>>> preempt_enable();
>>> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
>>> index acd025a..5e7f943 100644
>>> --- a/include/kvm/arm_pmu.h
>>> +++ b/include/kvm/arm_pmu.h
>>> @@ -39,6 +39,8 @@ struct kvm_pmu {
>>>  };
>>>  
>>>  #ifdef CONFIG_KVM_ARM_PMU
>>> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
>>> +void kvm_pmu_post_sync_hwstate(struct kvm_vcpu *vcpu);
>>
>> Please follow the current terminology: _flush_ on VM entry, _sync_ on
>> VM exit.
>>
> 
> Hi Marc,
> 
> Is below patch the right way for this?
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 78b2869..84008d1 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #define CREATE_TRACE_POINTS
>  #include "trace.h"
> @@ -531,6 +532,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
> struct kvm_run *run)
>  */
> kvm_timer_flush_hwstate(vcpu);
> 
> +   kvm_pmu_flush_hwstate(vcpu);
> +
> /*
>  * Preparing the interrupts to be injected also
>  * involves poking the GIC, which must be done in a
> @@ -554,6 +557,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
> struct kvm_run *run)
> kvm_vgic_sync_hwstate(vcpu);
> preempt_enable();
> kvm_timer_sync_hwstate(vcpu);
> +   kvm_pmu_sync_hwstate(vcpu);
> continue;
> }
> 
> @@ -604,6 +608,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
> struct kvm_run *run)
> 
> kvm_timer_sync_hwstate(vcpu);
> 
> +   kvm_pmu_sync_hwstate(vcpu);
> +
> ret = handle_exit(vcpu, run, ret);
> }

yeah, that's more like it!
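
The ordering agreed on above (flush on the way into the guest, sync on the way out, including the early bail-out path) can be sketched in userspace as follows. This is a simplified model and not the kvm_arch_vcpu_ioctl_run() code: the toy_* functions and call_log are hypothetical, and bail_out stands in for the "ret <= 0 || need_new_vmid_gen()" test in the quoted hunk.

```c
#include <assert.h>
#include <string.h>

static char call_log[128];

/* Append a call name to the log so the ordering can be inspected. */
static void log_call(const char *name)
{
	strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
	strncat(call_log, " ", sizeof(call_log) - strlen(call_log) - 1);
}

static void toy_pmu_flush_hwstate(void) { log_call("pmu_flush"); }
static void toy_enter_guest(void)       { log_call("guest"); }
static void toy_pmu_sync_hwstate(void)  { log_call("pmu_sync"); }

/* One iteration of a simplified run loop. */
static void toy_run_once(int bail_out)
{
	toy_pmu_flush_hwstate();	/* flush: before VM entry */

	if (bail_out) {
		/* Entry abandoned: still pair the flush with a sync. */
		toy_pmu_sync_hwstate();
		return;
	}

	toy_enter_guest();
	toy_pmu_sync_hwstate();		/* sync: after VM exit */
}
```

Either way through the loop, every flush is matched by exactly one sync, which is the property the revised hunks try to preserve.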

> 
> diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
> index 47bbd43..edfe4e5 100644
> --- a/include/kvm/arm_pmu.h
> +++ b/include/kvm/arm_pmu.h
> @@ -41,6 +41,8 @@ struct kvm_pmu {
>  };
> 
>  #ifdef CONFIG_KVM_ARM_PMU
> +void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu);
>  unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32
> select_idx);
>  void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u32 val);
>  void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u32 val, bool
> all_enable);
> @@ -51,6 +53,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu
> *vcpu, u32 data,
> u32 select_idx);
>  void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u32 val);
>  #else
> +void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {}
> +void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) {}
>  unsigned long kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u32
> select_idx)
>  {
> return 0;
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 15cac45..9aad2f7 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  /**
>   * 

Re: [RFC PATCH V2 00/10] Qemu: Add live migration support for SRIOV NIC

2015-12-01 Thread Michael S. Tsirkin
On Tue, Dec 01, 2015 at 02:26:57PM +0800, Lan, Tianyu wrote:
> 
> 
> On 11/30/2015 4:01 PM, Michael S. Tsirkin wrote:
> >It is still not very clear what it is you are trying to achieve, and
> >whether your patchset achieves it.  You merely say "adding live
> >migration" but it seems pretty clear this isn't about being able to
> >migrate a guest transparently, since you are adding a host/guest
> >handshake.
> >
> >This isn't about functionality either: I think that on KVM, it isn't
> >hard to live migrate if you can do a host/guest handshake, even today,
> >with no kernel changes:
> >1. before migration, expose a pv nic to guest (can be done directly on
> >   boot)
> >2. use e.g. a serial connection to move IP from an assigned device to pv nic
> >3. maybe move the mac as well
> >4. eject the assigned device
> >5. detect eject on host (QEMU generates a DEVICE_DELETED event when this
> >happens) and start migration
> >
> 
> This looks like the bonding driver solution

Why does it? Unlike bonding, this doesn't touch data path or
any kernel code. Just run a script from guest agent.

> which put pv nic and VF
> in one bonded interface under active-backup mode. The bonding driver
> will switch from VF to PV nic automatically when VF is unplugged during
> migration. This is the only available solution for VF NIC migration.

It really isn't. For one, there is also teaming.

> But
> it requires guest OS to do specific configurations inside and rely on
> bonding driver which blocks it work on Windows.
> From performance side,
> putting VF and virtio NIC under bonded interface will affect their
> performance even when not do migration. These factors block to use VF
> NIC passthough in some user cases(Especially in the cloud) which require
> migration.

That's really up to the guest. You don't need to do bonding,
you can just move the IP and mac from userspace, that's
possible on most OSes.

Or write something in guest kernel that is more lightweight if you are
so inclined. What we are discussing here is the host-guest interface,
not the in-guest interface.

> Current solution we proposed changes NIC driver and Qemu. Guest Os
> doesn't need to do special thing for migration.
> It's easy to deploy


Except of course these patches don't even work properly yet.

And when they do, even minor changes in host side NIC hardware across
migration will break guests in hard to predict ways.

> and
> all changes are in the NIC driver, NIC vendor can implement migration
> support just in the their driver.

Kernel code and hypervisor code are not easier to develop and deploy than
a userspace script.  If that is all the motivation there is, that's a
pretty small return on investment.

-- 
MST


Re: [PATCH v2 08/21] arm64: KVM: Implement debug save/restore

2015-12-01 Thread Marc Zyngier
On 01/12/15 14:47, Christoffer Dall wrote:
> On Tue, Dec 01, 2015 at 01:06:31PM +, Marc Zyngier wrote:
>> On 01/12/15 12:56, Christoffer Dall wrote:
>>> On Fri, Nov 27, 2015 at 06:50:02PM +, Marc Zyngier wrote:
>>>> Implement the debug save restore as a direct translation of
>>>> the assembly code version.
>>>>
>>>> Signed-off-by: Marc Zyngier 
>>>> ---
>>>>  arch/arm64/kvm/hyp/Makefile   |   1 +
>>>>  arch/arm64/kvm/hyp/debug-sr.c | 130 
>>>> ++
>>>>  arch/arm64/kvm/hyp/hyp.h  |   9 +++
>>>>  3 files changed, 140 insertions(+)
>>>>  create mode 100644 arch/arm64/kvm/hyp/debug-sr.c
>>>>
>>>> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
>>>> index ec94200..ec14cac 100644
>>>> --- a/arch/arm64/kvm/hyp/Makefile
>>>> +++ b/arch/arm64/kvm/hyp/Makefile
>>>> @@ -6,3 +6,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o
>>>>  obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o
>>>>  obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o
>>>>  obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
>>>> +obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o
>>>> diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c
>>>> new file mode 100644
>>>> index 000..a0b2b99
>>>> --- /dev/null
>>>> +++ b/arch/arm64/kvm/hyp/debug-sr.c
>>>> @@ -0,0 +1,130 @@
>>>> +/*
>>>> + * Copyright (C) 2015 - ARM Ltd
>>>> + * Author: Marc Zyngier 
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify
>>>> + * it under the terms of the GNU General Public License version 2 as
>>>> + * published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope that it will be useful,
>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> + * GNU General Public License for more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License
>>>> + * along with this program.  If not, see .
>>>> + */
>>>> +
>>>> +#include 
>>>> +#include 
>>>> +
>>>> +#include 
>>>> +
>>>> +#include "hyp.h"
>>>> +
>>>> +#define read_debug(r,n) read_sysreg(r##n##_el1)
>>>> +#define write_debug(v,r,n) write_sysreg(v, r##n##_el1)
>>>> +
>>>> +#define save_debug(ptr,reg,nr) \
>>>> +  switch (nr) { \
>>>> +  case 15: ptr[15] = read_debug(reg, 15); \
>>>> +  case 14: ptr[14] = read_debug(reg, 14); \
>>>> +  case 13: ptr[13] = read_debug(reg, 13); \
>>>> +  case 12: ptr[12] = read_debug(reg, 12); \
>>>> +  case 11: ptr[11] = read_debug(reg, 11); \
>>>> +  case 10: ptr[10] = read_debug(reg, 10); \
>>>> +  case 9: ptr[9] = read_debug(reg, 9); \
>>>> +  case 8: ptr[8] = read_debug(reg, 8); \
>>>> +  case 7: ptr[7] = read_debug(reg, 7); \
>>>> +  case 6: ptr[6] = read_debug(reg, 6); \
>>>> +  case 5: ptr[5] = read_debug(reg, 5); \
>>>> +  case 4: ptr[4] = read_debug(reg, 4); \
>>>> +  case 3: ptr[3] = read_debug(reg, 3); \
>>>> +  case 2: ptr[2] = read_debug(reg, 2); \
>>>> +  case 1: ptr[1] = read_debug(reg, 1); \
>>>> +  default: ptr[0] = read_debug(reg, 0); \
>>>> +  }
>>>> +
>>>> +#define restore_debug(ptr,reg,nr) \
>>>> +  switch (nr) { \
>>>> +  case 15: write_debug(ptr[15], reg, 15); \
>>>> +  case 14: write_debug(ptr[14], reg, 14); \
>>>> +  case 13: write_debug(ptr[13], reg, 13); \
>>>> +  case 12: write_debug(ptr[12], reg, 12); \
>>>> +  case 11: write_debug(ptr[11], reg, 11); \
>>>> +  case 10: write_debug(ptr[10], reg, 10); \
>>>> +  case 9: write_debug(ptr[9], reg, 9); \
>>>> +  case 8: write_debug(ptr[8], reg, 8); \
>>>> +  case 7: write_debug(ptr[7], reg, 7); \
>>>> +  case 6: write_debug(ptr[6], reg, 6); \
>>>> +  case 5: write_debug(ptr[5], reg, 5); \
>>>> +  case 4: write_debug(ptr[4], reg, 4); \
>>>> +  case 3: write_debug(ptr[3], reg, 3); \
>>>> +  case 2: write_debug(ptr[2], reg, 2); \
>>>> +  case 1: write_debug(ptr[1], reg, 1); \
>>>> +  default: write_debug(ptr[0],

Re: [PATCH 00/11] KVM: x86: track guest page access

2015-12-01 Thread Paolo Bonzini


On 01/12/2015 16:02, Andrea Arcangeli wrote:
> > Applying your technique to non-leaf shadow pages actually makes this
> > series quite interesting. :)  Shadow paging is still in use for nested
> > EPT, so it's always a good idea to speed it up.
> 
> I don't have the full picture of how userfaultfd write tracking could
> also fit in the leaf/non-leaf shadow pagetable write tracking yet but
> it's good to think about it.

It's unrelated.

Xiao wrote this series for KVM-GT.  I'm suggesting that he uses
userfaultfd write tracking (or similar techniques---but anyway
implemented out of KVM) for KVM-GT.  The benefit is that KVM-GT is then
unrelated to KVM, similar to legacy KVM device assignment vs. VFIO.

However, he also applied this new API to shadow pagetable write
tracking.  He gets measurable (~2%) performance improvement.  We can
look separately at how to get a similar performance improvement, even if
KVM-GT will not use the new page tracking API.

Paolo


Re: [PATCH v2 08/21] arm64: KVM: Implement debug save/restore

2015-12-01 Thread Alex Bennée

Marc Zyngier  writes:

> On 01/12/15 12:56, Christoffer Dall wrote:
>> On Fri, Nov 27, 2015 at 06:50:02PM +, Marc Zyngier wrote:
>>> Implement the debug save restore as a direct translation of
>>> the assembly code version.
>>>
>>> Signed-off-by: Marc Zyngier 
>>> ---
>>>  arch/arm64/kvm/hyp/Makefile   |   1 +
>>>  arch/arm64/kvm/hyp/debug-sr.c | 130 
>>> ++
>>>  arch/arm64/kvm/hyp/hyp.h  |   9 +++
>>>  3 files changed, 140 insertions(+)
>>>  create mode 100644 arch/arm64/kvm/hyp/debug-sr.c

>>> +void __hyp_text __debug_cond_save_host_state(struct kvm_vcpu *vcpu)
>>> +{
>>> +   if ((vcpu->arch.ctxt.sys_regs[MDSCR_EL1] & DBG_MDSCR_KDE) ||
>>> +   (vcpu->arch.ctxt.sys_regs[MDSCR_EL1] & DBG_MDSCR_KDE))

I've just noticed I'm seeing double here. Did a DBG_MDSCR_MDE get
mistransliterated here?

>>> +   vcpu->arch.debug_flags |= KVM_ARM64_DEBUG_DIRTY;
>>> +
>>> +   __debug_save_state(vcpu, &vcpu->arch.host_debug_state,
>>> +  kern_hyp_va(vcpu->arch.host_cpu_context));
>>
>> doesn't the assmebly code jump across saving this state neither bits are
>> set where this always saves the state?
>
> It doesn't. The save/restore functions are guarded by tests on
> KVM_ARM64_DEBUG_DIRTY, just like we have skip_debug_state on all actions
> involving the save/restore in the assembly version.
>
>> in any case, I feel some context is lost when this is moved away from
>> assembly and understanding this patch would be easier if the semantics
>> of these two _cond functions were documented.
>
> I can migrate the existing comments if you think that helps.
>
> Thanks,
>
>   M.
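
To make the effect of the duplicated test concrete, the sketch below compares the condition as posted with a variant that checks MDE in the second arm. It assumes, as suggested above, that DBG_MDSCR_MDE was the intended second bit; the bit positions follow the MDSCR_EL1 layout (KDE is bit 13, MDE is bit 15), and the two helper function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* MDSCR_EL1 bits: KDE is bit 13, MDE is bit 15. */
#define DBG_MDSCR_KDE	(UINT64_C(1) << 13)
#define DBG_MDSCR_MDE	(UINT64_C(1) << 15)

/* Condition as posted: both arms test KDE, so MDE alone is missed. */
static int needs_save_as_posted(uint64_t mdscr)
{
	return (mdscr & DBG_MDSCR_KDE) || (mdscr & DBG_MDSCR_KDE);
}

/* Variant assuming the second arm was meant to test MDE. */
static int needs_save_with_mde(uint64_t mdscr)
{
	return (mdscr & DBG_MDSCR_KDE) || (mdscr & DBG_MDSCR_MDE);
}
```

The two versions only disagree when MDE is set and KDE is clear, which is the case the posted condition would not mark as dirty.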


--
Alex Bennée


Re: [PATCH v2 08/21] arm64: KVM: Implement debug save/restore

2015-12-01 Thread Marc Zyngier
On 01/12/15 13:19, Alex Bennée wrote:
> 
> Marc Zyngier  writes:
> 
>> On 01/12/15 12:56, Christoffer Dall wrote:
>>> On Fri, Nov 27, 2015 at 06:50:02PM +, Marc Zyngier wrote:
>>>> Implement the debug save restore as a direct translation of
>>>> the assembly code version.
>>>>
>>>> Signed-off-by: Marc Zyngier 
>>>> ---
>>>>  arch/arm64/kvm/hyp/Makefile   |   1 +
>>>>  arch/arm64/kvm/hyp/debug-sr.c | 130 
>>>> ++
>>>>  arch/arm64/kvm/hyp/hyp.h  |   9 +++
>>>>  3 files changed, 140 insertions(+)
>>>>  create mode 100644 arch/arm64/kvm/hyp/debug-sr.c
> 
>>>> +void __hyp_text __debug_cond_save_host_state(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +  if ((vcpu->arch.ctxt.sys_regs[MDSCR_EL1] & DBG_MDSCR_KDE) ||
>>>> +  (vcpu->arch.ctxt.sys_regs[MDSCR_EL1] & DBG_MDSCR_KDE))
> 
> I've just noticed I'm seeing double here. Did a DBG_MDSCR_MDE can
> transliterated here?

Quite probably! Guess there is a small hole in your test suite! ;-)

Thanks for noticing this.

M.
-- 
Jazz is not dead. It just smells funny...

