Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Russell King - ARM Linux
On Mon, Jan 09, 2017 at 04:01:38PM +0100, Christoffer Dall wrote:
> Nobody is arguing with the need for documentation, and I think I
> understan the reason for needing to support a common ABI with a common
> set of functions regardless of the logic in place in Hyp mode.
> 
> We just have to agree on a sane ABI and I'll be happy to help
> write/review the API docs and clean things up.

Let me make this crystal clear - this is where I am with this problem.

I've identified many issues with the ABI, and that there are several
differnet chunks of code which are a problem.

Although I've proposed a change to the hyp-stub ABI, which has since
been found to be insufficient, right now I have no better suggestions
to make.  I'm operating in an information vaccuum here, and it is not
yet clear to me what changes to the hyp-stub and KVM ABI would be
acceptable.  I don't know what environment the KVM hypervisor operates
in yet, eg whether it can see the kernel, whether it can see the IDMAP
region, etc.

Why would the IDMAP region be important?  The hyp-stub doesn't setup
any page tables, so it seems to operate without any MMU translation.
That rather rules out passing virtual address pointers through a
common hyp-mode interface.

However, it seems that the KVM hypervisor does setup MMU translations,
making virtual addresses acceptable but physical/IDMAP addresses not.

So, right now I've no idea what a replacement ABI would look like.

Hence, the lack of _current_ documentation is hampering the situation.

Now, you've said to me privately that you think my demands for
documentation are unwarranted.  I've been trying to gain enough
understanding that I can move forward, with my requests for
documentation being treated as "we'll do that later."  Well, given
that, it leads me to only one possible outcome.

Enough is enough.  Since there's no movement on the documentation
front, and because you're now saying that my demands for documentation
are unreasonable, I've reached the end of the road.  There's nothing
more that I'm willing to do on this problem - at least not until there's
a change of heart wrt documentation.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Christoffer Dall
On Mon, Jan 09, 2017 at 02:42:35PM +, Russell King - ARM Linux wrote:
> On Mon, Jan 09, 2017 at 02:05:00PM +, Russell King - ARM Linux wrote:
> > So, although Marc produced a patch which updates the KVM hypervisor for
> > the GET_VECTORS change, through reading the code today, it's become clear
> > that much more is needed, so I'm yet again banging on about documentation.
> > It's only become clear to me today that the KVM stub calling convention
> > for the host kernel is:
> > 
> > entry:
> > r0 = function pointer
> > r1 = 32-bit function argument 0
> > r2 = 32-bit function argument 1
> > r3 = 32-bit function argument 2
> > no further arguments are supported
> > --- or ---
> > r0 = -1 (or 0 post Marc's patch) for get_vectors
> > exit:
> > r0 = vectors (if get_vectors call was made)
> > otherwise, who knows...
> 
> Hang on, even this is nowhere near the full picture.
> 
> static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
>unsigned long hyp_stack_ptr,
>unsigned long vector_ptr)
> {
> /*
>  * Call initialization code, and switch to the full blown HYP
>  * code. The init code doesn't need to preserve these
>  * registers as r0-r3 are already callee saved according to
>  * the AAPCS.
>  * Note that we slightly misuse the prototype by casting the
>  * stack pointer to a void *.
> 
>  * The PGDs are always passed as the third argument, in order
>  * to be passed into r2-r3 to the init code (yes, this is
>  * compliant with the PCS!).
>  */
> 
> kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
> }
> 
> This results in a completely different calling convention -
> 
>   r0 = hyp_stack_ptr
>   r1 = vector_ptr
>   r2,r3 = pgd_ptr
> 
> Which clearly doesn't fit the KVM hypervisor's calling requirements...
> and, looking deeper at this:
> 
> /* Switch from the HYP stub to our own HYP init vector */
> __hyp_set_vectors(kvm_get_idmap_vector());
> 
> pgd_ptr = kvm_mmu_get_httbr();
> stack_page = __this_cpu_read(kvm_arm_hyp_stack_page);
> hyp_stack_ptr = stack_page + PAGE_SIZE;
> vector_ptr = (unsigned long)kvm_ksym_ref(__kvm_hyp_vector);
> 
> __cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);
> 
> So we actually have _another_ hypervisor stub to care about - should
> anything go wrong between __hyp_set_vectors() and __cpu_init_hyp_mode(),
> we will be hitting the __do_hyp_init assembly code with maybe a get
> vectors or soft reboot call, which, reading the code, would be bad
> news.
> 
> Since this code is run at several different times - CPU hotplug (when
> the system will be quiescent) and also cpuidle PM (when the system is
> not quiescent).  With kdump/kexec, I think this could be racy.
> Certainly if anything were to go wrong between the two with a kdump
> kernel in place, we'd be making HVC calls to the KVM init stub and
> expecting them to work.
> 
Indeed it looks like interrupts are enabled during cpu_init_hyp_mode,
and if it's possible to be preempted there or if kdump can be initiated
from interrupt context, this could go wrong, so you're probably right
that we need to support a common hyp-ABI for the kernel hyp stub, the
KVM stub (a.k.a. the trampoline code), and KVM's hyp layer itself.

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC 00/55] Nested Virtualization on KVM/ARM

2017-01-09 Thread David Hildenbrand



Even though this work is not complete (see limitations below), I'd appreciate
early feedback on this RFC. Specifically, I'm interested in:
- Is it better to have a kernel config or to make it configurable at runtime?


x86 and s390x have a kernel module parameter (nested) that can only be
changed when loading the module and should default to false. So the
admin explicitly has to enable it. Maybe going the same path makes
sense.

--

David
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Christoffer Dall
On Mon, Jan 09, 2017 at 02:05:00PM +, Russell King - ARM Linux wrote:
> On Mon, Jan 09, 2017 at 02:26:36PM +0100, Christoffer Dall wrote:
> > On Mon, Jan 09, 2017 at 12:26:39PM +, Russell King - ARM Linux wrote:
> > > On Tue, Jan 03, 2017 at 10:51:49AM +0100, Christoffer Dall wrote:
> > > > Hi Russell,
> > > > 
> > > > On Thu, Dec 15, 2016 at 06:57:18PM +, Russell King - ARM Linux 
> > > > wrote:
> > > > > What's also coming clear is that there's very few people who 
> > > > > understand
> > > > > all the interactions here, and the whole thing seems to be an 
> > > > > undocumented
> > > > > mess.  
> > > > 
> > > > I think the hyp stub has just served a very limited purpose so far, and
> > > > therefore is a somewhat immature implementation.  Now we've discovered a
> > > > need to clean it up, and we're all for that.  Again, I don't think the
> > > > problem is any larger than that, we just need to fix it, and it seems to
> > > > me everyone is willing to work on that.
> > > 
> > > What I want to see is some documentation of the hyp-stub, so that there
> > > can be some element of confidence that changes there are properly
> > > coordinated.  As I said in a follow up email:
> > > 
> > > | Either we need more people to have an understanding (so if one of them
> > > | gets run over by a bus, we're not left floundering around) or we need
> > > | it to be documented - even if it's just a simple comment "the ABI in
> > > | this file is shared with XYZ, if you change the ABI here, also update
> > > | XYZ too."
> > > 
> > > > Marc even offered to work on your suggestion to support the general
> > > > hyp ABI commands in KVM.
> > > 
> > > ... which is pointless, because it's a duplication of the effort I've
> > > already put in.  My patches already do the:
> > > 
> > > #define HVC_GET_VECTORS 0
> > > #define HVC_SET_VECTORS 1
> > > #define HVC_SOFT_RESTART 2
> > > 
> > > thing which ARM64 does, passing the arguments in via the appropriate
> > > registers.  However, such a change is a major revision of hyp-stub's
> > > ABI, which completely changes the way it works.
> > 
> > Sorry, I'm afraid I might have been unclear.  What I meant with "general
> > hyp ABI commands in KVM" was, that if there's a need for KVM to support
> > the operations (using a unified and documented ABI) that the hyp stub
> > supports, then we could add that in KVM as well.  I thought your patches
> > added the functionality for the hyp stub, and Marc would add whichever
> > remaining pieces in the KVM side.
> 
> The view over Christmas was "we only need to ensure the GET_VECTORS call
> continues to work", which is what Marc's patch does.  I've come to
> realise (through no help of the documentation situation) that that is
> far from the full story, because of the way kdump works.
> 
> Let me refresh...
> 
> In normal kexec(), kernel_restart_prepare() is called, which calls the
> reboot_notifier_list.  KVM uses this via kvm_reboot() and
> kvm_arch_hardware_disable() to call cpu_hyp_reset(), and in turn
> __kvm_hyp_reset in HYP mode.  That sets the stub vectors back to the
> hyp-stub implementation.  So, normal kexec should work fine.
> 
> However, kernel_restart_prepare() is _not_ called in the kdump case.

Thanks for this, it was slightly unclear to me in which way kdump
different from the other cases, but makes sense now.

> So, with my two patches in place, we will issue a HVC call with the
> arguments that I've given to whatever hypervisor is in place at the
> time, and as I've pointed out, with the KVM hypervisor, we will try to
> branch to address 2 or 3 - who knows what effect that'll have.
> 
> So, although Marc produced a patch which updates the KVM hypervisor for
> the GET_VECTORS change, through reading the code today, it's become clear
> that much more is needed, so I'm yet again banging on about documentation.
> It's only become clear to me today that the KVM stub calling convention
> for the host kernel is:

Just for clarity: I understood that Marc said he was willing to write
the code to support the common ABI that we agree on in KVM.  The fact
that he sent another small patch was a separate contribution, but that's
how I understood. it.

> 
> entry:
>   r0 = function pointer
>   r1 = 32-bit function argument 0
>   r2 = 32-bit function argument 1
>   r3 = 32-bit function argument 2
>   no further arguments are supported
>   --- or ---
>   r0 = -1 (or 0 post Marc's patch) for get_vectors
> exit:
>   r0 = vectors (if get_vectors call was made)
>   otherwise, who knows...
> 
> I specify "32-bit" there because they're shifted by one register, which,
> if a 64-bit argument is passed with EABI, the arguments will no longer be
> appropriately aligned... so it's an important detail to be aware of with
> the current KVM hypervisor interface.
> 
> What I want to do here is to fix this kexec issue completely, not in a
> piecemeal fashion - I'm not interested in fixing one small problem, 

Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Russell King - ARM Linux
On Mon, Jan 09, 2017 at 02:05:00PM +, Russell King - ARM Linux wrote:
> So, although Marc produced a patch which updates the KVM hypervisor for
> the GET_VECTORS change, through reading the code today, it's become clear
> that much more is needed, so I'm yet again banging on about documentation.
> It's only become clear to me today that the KVM stub calling convention
> for the host kernel is:
> 
> entry:
>   r0 = function pointer
>   r1 = 32-bit function argument 0
>   r2 = 32-bit function argument 1
>   r3 = 32-bit function argument 2
>   no further arguments are supported
>   --- or ---
>   r0 = -1 (or 0 post Marc's patch) for get_vectors
> exit:
>   r0 = vectors (if get_vectors call was made)
>   otherwise, who knows...

Hang on, even this is nowhere near the full picture.

static inline void __cpu_init_hyp_mode(phys_addr_t pgd_ptr,
   unsigned long hyp_stack_ptr,
   unsigned long vector_ptr)
{
/*
 * Call initialization code, and switch to the full blown HYP
 * code. The init code doesn't need to preserve these
 * registers as r0-r3 are already callee saved according to
 * the AAPCS.
 * Note that we slightly misuse the prototype by casting the
 * stack pointer to a void *.

 * The PGDs are always passed as the third argument, in order
 * to be passed into r2-r3 to the init code (yes, this is
 * compliant with the PCS!).
 */

kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
}

This results in a completely different calling convention -

r0 = hyp_stack_ptr
r1 = vector_ptr
r2,r3 = pgd_ptr

Which clearly doesn't fit the KVM hypervisor's calling requirements...
and, looking deeper at this:

/* Switch from the HYP stub to our own HYP init vector */
__hyp_set_vectors(kvm_get_idmap_vector());

pgd_ptr = kvm_mmu_get_httbr();
stack_page = __this_cpu_read(kvm_arm_hyp_stack_page);
hyp_stack_ptr = stack_page + PAGE_SIZE;
vector_ptr = (unsigned long)kvm_ksym_ref(__kvm_hyp_vector);

__cpu_init_hyp_mode(pgd_ptr, hyp_stack_ptr, vector_ptr);

So we actually have _another_ hypervisor stub to care about - should
anything go wrong between __hyp_set_vectors() and __cpu_init_hyp_mode(),
we will be hitting the __do_hyp_init assembly code with maybe a get
vectors or soft reboot call, which, reading the code, would be bad
news.

Since this code is run at several different times - CPU hotplug (when
the system will be quiescent) and also cpuidle PM (when the system is
not quiescent).  With kdump/kexec, I think this could be racy.
Certainly if anything were to go wrong between the two with a kdump
kernel in place, we'd be making HVC calls to the KVM init stub and
expecting them to work.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Catalin Marinas
On Mon, Jan 09, 2017 at 12:54:31PM +, Russell King - ARM Linux wrote:
> So, we need KVM's stub to be (a) better documented so this stuff is
> obvious, and (b) updated so that kdump stands a chance of working even
> if the KVM stub is still in place at the point the host kernel panics.
> 
> Another reason why documentation is important here is that we need to
> make it clear to alternative hypervisors that the host kernel may issue
> a HVC call at any moment due to a crash with particular arguments, and
> that the host kernel expects a certain behaviour in that case, and that
> the hypervisor does not crash.

The only hypervisor (apart from the hyp-stub) built and deployed
together with the kernel is KVM. On ARM, to be able to enable KVM, the
host kernel must be booted in Hyp mode and install the stub before
dropping to SVC.

With Xen (or a different Type-1 hypervisor), the "host" kernel (dom0 for
Xen, a.k.a. control domain) is booted in SVC mode directly, so hyp-stub
is not installed and is_hyp_mode_available() returns false.

> For example, how will Xen behave - is introducing these changes going
> to cause a regression with Xen?  Does anyone even know the answer to
> that?  From what I can see, it seems we'll end up calling Xen's
> hypervisor with a random r12 value (which it uses as a reason code)
> but without the 0xea1 immediate constant in the HVC instruction.
> Beyond that, I've no idea.

Any HVC calls from the control domain kernel must comply with the ABI
offered by the corresponding hypervisor and has nothing to do with the
hyp-stub ABI. Routing hyp-stub ABI HVC calls to an unaware hypervisor
like Xen as part of kdump/kexec is a kernel bug and would probably
result in the kernel being killed. I haven't tried but kexec in a host
kernel under Xen should work just like kexec in any other guest kernel.

-- 
Catalin
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Russell King - ARM Linux
On Mon, Jan 09, 2017 at 02:05:00PM +, Russell King - ARM Linux wrote:
> So, although Marc produced a patch which updates the KVM hypervisor for
> the GET_VECTORS change, through reading the code today, it's become clear
> that much more is needed, so I'm yet again banging on about documentation.
> It's only become clear to me today that the KVM stub calling convention
> for the host kernel is:
> 
> entry:
>   r0 = function pointer
>   r1 = 32-bit function argument 0
>   r2 = 32-bit function argument 1
>   r3 = 32-bit function argument 2
>   no further arguments are supported
>   --- or ---
>   r0 = -1 (or 0 post Marc's patch) for get_vectors
> exit:
>   r0 = vectors (if get_vectors call was made)
>   otherwise, who knows...
> 
> I specify "32-bit" there because they're shifted by one register, which,
> if a 64-bit argument is passed with EABI, the arguments will no longer be
> appropriately aligned... so it's an important detail to be aware of with
> the current KVM hypervisor interface.
> 
> What I want to do here is to fix this kexec issue completely, not in a
> piecemeal fashion - I'm not interested in fixing one small problem, then
> coming back to it in a few months time to fix another problem.  That's a
> waste of time (well, unless you're into job creation.)  I've always been
> for "if you're going to do the job, damn well do the job properly".  So
> I'm not going to accept anything short of fixing _both_ kexec and kdump
> together.
> 
> So, given that the hyp-stub has this ABI after my patches:
> 
> entry:
>   r0 = argument (0 = get vectors, 1 = set vectors, 2 = call function)
>   r1 = vectors for r0 = 1
>   r3 = function pointer (with bit 0 already set for thumb functions)
>for r0 = 2
> exit:
>   r0 = -1 for invalid calls
>   r0 = current vectors address (for r0 = 0 on entry)
>   is not expected to return for r0 = 2 on entry
>   otherwise registers preserved preserved
> 
> which is clearly incompatible with the current KVM stub, can we come up
> with a common ABI that is satisfactory to both.
> 
> The above are probably the very first time anyone has written out the
> ABI of these things, and as can be seen, it's still something of a mess.

For completeness, this is the existing hyp-stub ABI:

entry:
r0 = -1 => get_vectors
r0 != -1 => set_vectors (to the value in r0)
exit:
r0 = current vector address

And that's it.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Russell King - ARM Linux
On Mon, Jan 09, 2017 at 02:26:36PM +0100, Christoffer Dall wrote:
> On Mon, Jan 09, 2017 at 12:26:39PM +, Russell King - ARM Linux wrote:
> > On Tue, Jan 03, 2017 at 10:51:49AM +0100, Christoffer Dall wrote:
> > > Hi Russell,
> > > 
> > > On Thu, Dec 15, 2016 at 06:57:18PM +, Russell King - ARM Linux wrote:
> > > > What's also coming clear is that there's very few people who understand
> > > > all the interactions here, and the whole thing seems to be an 
> > > > undocumented
> > > > mess.  
> > > 
> > > I think the hyp stub has just served a very limited purpose so far, and
> > > therefore is a somewhat immature implementation.  Now we've discovered a
> > > need to clean it up, and we're all for that.  Again, I don't think the
> > > problem is any larger than that, we just need to fix it, and it seems to
> > > me everyone is willing to work on that.
> > 
> > What I want to see is some documentation of the hyp-stub, so that there
> > can be some element of confidence that changes there are properly
> > coordinated.  As I said in a follow up email:
> > 
> > | Either we need more people to have an understanding (so if one of them
> > | gets run over by a bus, we're not left floundering around) or we need
> > | it to be documented - even if it's just a simple comment "the ABI in
> > | this file is shared with XYZ, if you change the ABI here, also update
> > | XYZ too."
> > 
> > > Marc even offered to work on your suggestion to support the general
> > > hyp ABI commands in KVM.
> > 
> > ... which is pointless, because it's a duplication of the effort I've
> > already put in.  My patches already do the:
> > 
> > #define HVC_GET_VECTORS 0
> > #define HVC_SET_VECTORS 1
> > #define HVC_SOFT_RESTART 2
> > 
> > thing which ARM64 does, passing the arguments in via the appropriate
> > registers.  However, such a change is a major revision of hyp-stub's
> > ABI, which completely changes the way it works.
> 
> Sorry, I'm afraid I might have been unclear.  What I meant with "general
> hyp ABI commands in KVM" was, that if there's a need for KVM to support
> the operations (using a unified and documented ABI) that the hyp stub
> supports, then we could add that in KVM as well.  I thought your patches
> added the functionality for the hyp stub, and Marc would add whichever
> remaining pieces in the KVM side.

The view over Christmas was "we only need to ensure the GET_VECTORS call
continues to work", which is what Marc's patch does.  I've come to
realise (through no help of the documentation situation) that that is
far from the full story, because of the way kdump works.

Let me refresh...

In normal kexec(), kernel_restart_prepare() is called, which calls the
reboot_notifier_list.  KVM uses this via kvm_reboot() and
kvm_arch_hardware_disable() to call cpu_hyp_reset(), and in turn
__kvm_hyp_reset in HYP mode.  That sets the stub vectors back to the
hyp-stub implementation.  So, normal kexec should work fine.

However, kernel_restart_prepare() is _not_ called in the kdump case.
So, with my two patches in place, we will issue a HVC call with the
arguments that I've given to whatever hypervisor is in place at the
time, and as I've pointed out, with the KVM hypervisor, we will try to
branch to address 2 or 3 - who knows what effect that'll have.

So, although Marc produced a patch which updates the KVM hypervisor for
the GET_VECTORS change, through reading the code today, it's become clear
that much more is needed, so I'm yet again banging on about documentation.
It's only become clear to me today that the KVM stub calling convention
for the host kernel is:

entry:
r0 = function pointer
r1 = 32-bit function argument 0
r2 = 32-bit function argument 1
r3 = 32-bit function argument 2
no further arguments are supported
--- or ---
r0 = -1 (or 0 post Marc's patch) for get_vectors
exit:
r0 = vectors (if get_vectors call was made)
otherwise, who knows...

I specify "32-bit" there because they're shifted by one register, which,
if a 64-bit argument is passed with EABI, the arguments will no longer be
appropriately aligned... so it's an important detail to be aware of with
the current KVM hypervisor interface.

What I want to do here is to fix this kexec issue completely, not in a
piecemeal fashion - I'm not interested in fixing one small problem, then
coming back to it in a few months time to fix another problem.  That's a
waste of time (well, unless you're into job creation.)  I've always been
for "if you're going to do the job, damn well do the job properly".  So
I'm not going to accept anything short of fixing _both_ kexec and kdump
together.

So, given that the hyp-stub has this ABI after my patches:

entry:
r0 = argument (0 = get vectors, 1 = set vectors, 2 = call function)
r1 = vectors for r0 = 1
r3 = function pointer (with bit 0 already set for thumb functions)
 for r0 = 2
exit:
r0 = -1 for invalid calls
  

Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Marc Zyngier
On 09/01/17 13:20, Russell King - ARM Linux wrote:
> On Mon, Jan 09, 2017 at 01:14:31PM +, Marc Zyngier wrote:
>> On 09/01/17 12:54, Russell King - ARM Linux wrote:
>>> I don't think this is sufficient - the kdump case for ARM will still be
>>> broken after these patches.
>>>
>>> The new soft-restart ABI introduced by my patch 2 passes in:
>>>
>>> r0 = HVC_SOFT_RESTART
>>> r1 = non-zero
>>> r2 = undefined
>>> r3 = function pointer
>>>
>>> and the assumption is that r3 will be preserved if the HVC call does
>>> nothing - which probably isn't a safe assumption.
>>>
>>> With these arguments passed to the KVM stub (which may be in place at
>>> the point of a kdump), we end up executing this code:
>>>
>>> push{lr}
>>>
>>> mov lr, r0
>>> mov r0, r1
>>> mov r1, r2
>>> mov r2, r3
>>>
>>> THUMB(  orr lr, #1)
>>> blx lr  @ Call the HYP function
>>>
>>> This will result in an attempt to branch to address 2 or 3, which isn't
>>> sane - a panic in the host kernel (eg due to a NULL pointer deref with
>>> panic_on_oops enabled) will then cause kdump to try to execute code from
>>> a stupid address.
>>>
>>> So, we need KVM's stub to be (a) better documented so this stuff is
>>> obvious, and (b) updated so that kdump stands a chance of working even
>>> if the KVM stub is still in place at the point the host kernel panics.
>>>
>>> Another reason why documentation is important here is that we need to
>>> make it clear to alternative hypervisors that the host kernel may issue
>>> a HVC call at any moment due to a crash with particular arguments, and
>>> that the host kernel expects a certain behaviour in that case, and that
>>> the hypervisor does not crash.
>>>
>>> For example, how will Xen behave - is introducing these changes going
>>> to cause a regression with Xen?  Does anyone even know the answer to
>>> that?  From what I can see, it seems we'll end up calling Xen's
>>> hypervisor with a random r12 value (which it uses as a reason code)
>>> but without the 0xea1 immediate constant in the HVC instruction.
>>> Beyond that, I've no idea.
>>
>> I fail to see why you would issue a hyp stub hypercall if you're booted
>> under *any* hypervisor. The only way you can have a valid hyp stub is
>> because the kernel was booted at EL2/HYP. If you're running under Xen,
>> KVM or whatever other hypervisor, you're booted at EL1/SVC, you only
>> have the hypercall API exposed by that hypervisor, and nothing else. You
>> can't even install the stub the first place.
>>
>> So any code path that tries to tear down KVM would better check that the
>> kernel was entered at HYP. If it doesn't, it is broken.
> 
> Let me refresh your memory.  This thread is about kexec/kdump on 32-bit
> ARM of the _host_ kernel, which will be entered in HYP mode.
>
> I've never used Xen, so I've no idea about it.  I'm merely pointing out
> the possibilities as I see them.
> 
> Let me repeat the on-going theme here.  Documentation.  Documentation.
> Documentation.  Documentation.  Documentation.  Documentation.
> Documentation.  Documentation.  Documentation.  Documentation.
> Documentation.  Documentation.  Documentation.  Documentation.
> Documentation.  Documentation.  Documentation.  Documentation.
> Documentation.  Documentation.  Documentation.  Documentation.
> Documentation.  Documentation.  Documentation.  Documentation.
> 
> Is the message getting through yet?  Can you see the waste of time
> the lack of documentation is having yet - not only my time, but your
> time replying to these emails.

I'm all for more documentation. Once we agree on what the API is going
to be (as you're about to change it), I'll be happy to write something up.

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Christoffer Dall
On Mon, Jan 09, 2017 at 12:26:39PM +, Russell King - ARM Linux wrote:
> On Tue, Jan 03, 2017 at 10:51:49AM +0100, Christoffer Dall wrote:
> > Hi Russell,
> > 
> > On Thu, Dec 15, 2016 at 06:57:18PM +, Russell King - ARM Linux wrote:
> > > What's also coming clear is that there's very few people who understand
> > > all the interactions here, and the whole thing seems to be an undocumented
> > > mess.  
> > 
> > I think the hyp stub has just served a very limited purpose so far, and
> > therefore is a somewhat immature implementation.  Now we've discovered a
> > need to clean it up, and we're all for that.  Again, I don't think the
> > problem is any larger than that, we just need to fix it, and it seems to
> > me everyone is willing to work on that.
> 
> What I want to see is some documentation of the hyp-stub, so that there
> can be some element of confidence that changes there are properly
> coordinated.  As I said in a follow up email:
> 
> | Either we need more people to have an understanding (so if one of them
> | gets run over by a bus, we're not left floundering around) or we need
> | it to be documented - even if it's just a simple comment "the ABI in
> | this file is shared with XYZ, if you change the ABI here, also update
> | XYZ too."
> 
> > Marc even offered to work on your suggestion to support the general
> > hyp ABI commands in KVM.
> 
> ... which is pointless, because it's a duplication of the effort I've
> already put in.  My patches already do the:
> 
> #define HVC_GET_VECTORS 0
> #define HVC_SET_VECTORS 1
> #define HVC_SOFT_RESTART 2
> 
> thing which ARM64 does, passing the arguments in via the appropriate
> registers.  However, such a change is a major revision of hyp-stub's
> ABI, which completely changes the way it works.

Sorry, I'm afraid I might have been unclear.  What I meant with "general
hyp ABI commands in KVM" was, that if there's a need for KVM to support
the operations (using a unified and documented ABI) that the hyp stub
supports, then we could add that in KVM as well.  I thought your patches
added the functionality for the hyp stub, and Marc would add whichever
remaining pieces in the KVM side.


[...]

> 
> Longer term, I'd like to see the existing hypervisor documentation in
> Documentation/virtual/kvm/hypercalls.txt updated with the ARM details.
> According to that document, KVM effectively only exists on PPC and x86
> at the present time...
> 

I'm afraid I don't think this is right place to document this behavior.

There's a difference between an internal ABI between code running in two
CPU modes but both part of the same kernel, and a hypervisor running
a guest OS on top.

I believe that Documentation/virtual/kvm/hypercalls.txt documents the
latter case (i.e. guest hypercalls supported by the KVM host
hypervisor), not the former case, and these things should not be
combined.

I would suggest adding something like
Documentation/virtual/kvm/arm/hyp-abi.txt instead.

-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Russell King - ARM Linux
On Mon, Jan 09, 2017 at 01:14:31PM +, Marc Zyngier wrote:
> On 09/01/17 12:54, Russell King - ARM Linux wrote:
> > I don't think this is sufficient - the kdump case for ARM will still be
> > broken after these patches.
> > 
> > The new soft-restart ABI introduced by my patch 2 passes in:
> > 
> > r0 = HVC_SOFT_RESTART
> > r1 = non-zero
> > r2 = undefined
> > r3 = function pointer
> > 
> > and the assumption is that r3 will be preserved if the HVC call does
> > nothing - which probably isn't a safe assumption.
> > 
> > With these arguments passed to the KVM stub (which may be in place at
> > the point of a kdump), we end up executing this code:
> > 
> > push{lr}
> > 
> > mov lr, r0
> > mov r0, r1
> > mov r1, r2
> > mov r2, r3
> > 
> > THUMB(  orr lr, #1)
> > blx lr  @ Call the HYP function
> > 
> > This will result in an attempt to branch to address 2 or 3, which isn't
> > sane - a panic in the host kernel (eg due to a NULL pointer deref with
> > panic_on_oops enabled) will then cause kdump to try to execute code from
> > a stupid address.
> > 
> > So, we need KVM's stub to be (a) better documented so this stuff is
> > obvious, and (b) updated so that kdump stands a chance of working even
> > if the KVM stub is still in place at the point the host kernel panics.
> > 
> > Another reason why documentation is important here is that we need to
> > make it clear to alternative hypervisors that the host kernel may issue
> > a HVC call at any moment due to a crash with particular arguments, and
> > that the host kernel expects a certain behaviour in that case, and that
> > the hypervisor does not crash.
> > 
> > For example, how will Xen behave - is introducing these changes going
> > to cause a regression with Xen?  Does anyone even know the answer to
> > that?  From what I can see, it seems we'll end up calling Xen's
> > hypervisor with a random r12 value (which it uses as a reason code)
> > but without the 0xea1 immediate constant in the HVC instruction.
> > Beyond that, I've no idea.
> 
> I fail to see why you would issue a hyp stub hypercall if you're booted
> under *any* hypervisor. The only way you can have a valid hyp stub is
> because the kernel was booted at EL2/HYP. If you're running under Xen,
> KVM or whatever other hypervisor, you're booted at EL1/SVC, you only
> have the hypercall API exposed by that hypervisor, and nothing else. You
> can't even install the stub the first place.
> 
> So any code path that tries to tear down KVM would better check that the
> kernel was entered at HYP. If it doesn't, it is broken.

Let me refresh your memory.  This thread is about kexec/kdump on 32-bit
ARM of the _host_ kernel, which will be entered in HYP mode.

I've never used Xen, so I've no idea about it.  I'm merely pointing out
the possibilities as I see them.

Let me repeat the on-going theme here.  Documentation.  Documentation.
Documentation.  Documentation.  Documentation.  Documentation.
Documentation.  Documentation.  Documentation.  Documentation.
Documentation.  Documentation.  Documentation.  Documentation.
Documentation.  Documentation.  Documentation.  Documentation.
Documentation.  Documentation.  Documentation.  Documentation.
Documentation.  Documentation.  Documentation.  Documentation.

Is the message getting through yet?  Can you see the waste of time
the lack of documentation is having yet - not only my time, but your
time replying to these emails.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Marc Zyngier
On 09/01/17 12:54, Russell King - ARM Linux wrote:
> On Thu, Dec 15, 2016 at 11:18:48AM +, Marc Zyngier wrote:
>> On 14/12/16 10:46, Russell King wrote:
>>> Improve the hyp-stub ABI to allow it to do more than just get/set the
>>> vectors.  We follow the example in ARM64, where r0 is used as an opcode
>>> with the other registers as an argument.
>>>
>>> Signed-off-by: Russell King 
>>> ---
>>>  arch/arm/kernel/hyp-stub.S | 27 ++-
>>>  1 file changed, 22 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S
>>> index 15d073ae5da2..f3e9ba5fb642 100644
>>> --- a/arch/arm/kernel/hyp-stub.S
>>> +++ b/arch/arm/kernel/hyp-stub.S
>>> @@ -22,6 +22,9 @@
>>>  #include 
>>>  #include 
>>>  
>>> +#define HVC_GET_VECTORS 0
>>> +#define HVC_SET_VECTORS 1
>>> +
>>>  #ifndef ZIMAGE
>>>  /*
>>>   * For the kernel proper, we need to find out the CPU boot mode long after
>>> @@ -202,9 +205,19 @@ ARM_BE8(orrr7, r7, #(1 << 25)) @ HSCTLR.EE
>>>  ENDPROC(__hyp_stub_install_secondary)
>>>  
>>>  __hyp_stub_do_trap:
>>> -   cmp r0, #-1
>>> -   mrceq   p15, 4, r0, c12, c0, 0  @ get HVBAR
>>> -   mcrne   p15, 4, r0, c12, c0, 0  @ set HVBAR
>>> +   teq r0, #HVC_GET_VECTORS
>>> +   bne 1f
>>> +   mrc p15, 4, r0, c12, c0, 0  @ get HVBAR
>>> +   b   __hyp_stub_exit
>>> +
>>> +1: teq r0, #HVC_SET_VECTORS
>>> +   bne 1f
>>> +   mcr p15, 4, r1, c12, c0, 0  @ set HVBAR
>>> +   b   __hyp_stub_exit
>>> +
>>> +1: mov r0, #-1
>>> +
>>> +__hyp_stub_exit:
>>> __ERET
>>>  ENDPROC(__hyp_stub_do_trap)
>>>  
>>> @@ -231,10 +244,14 @@ ENDPROC(__hyp_stub_do_trap)
>>>   * initialisation entry point.
>>>   */
>>>  ENTRY(__hyp_get_vectors)
>>> -   mov r0, #-1
>>> +   mov r0, #HVC_GET_VECTORS
>>
>> This breaks the KVM implementation of __hyp_get_vectors, easily fixed
>> with the following patchlet:
>>
>> diff --git a/arch/arm/include/asm/virt.h b/arch/arm/include/asm/virt.h
>> index a2e75b8..0fe637e 100644
>> --- a/arch/arm/include/asm/virt.h
>> +++ b/arch/arm/include/asm/virt.h
>> @@ -89,6 +89,14 @@ extern char __hyp_text_start[];
>>  extern char __hyp_text_end[];
>>  #endif
>>  
>> +#else
>> +
>> +/* Only assembly code should need those */
>> +
>> +#define HVC_GET_VECTORS 0
>> +#define HVC_SET_VECTORS 1
>> +#define HVC_SOFT_RESTART 2
>> +
>>  #endif /* __ASSEMBLY__ */
>>  
>>  #endif /* ! VIRT_H */
>> diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S
>> index ebc26f8..1c6888f 100644
>> --- a/arch/arm/kernel/hyp-stub.S
>> +++ b/arch/arm/kernel/hyp-stub.S
>> @@ -22,10 +22,6 @@
>>  #include 
>>  #include 
>>  
>> -#define HVC_GET_VECTORS 0
>> -#define HVC_SET_VECTORS 1
>> -#define HVC_SOFT_RESTART 2
>> -
>>  #ifndef ZIMAGE
>>  /*
>>   * For the kernel proper, we need to find out the CPU boot mode long after
>> diff --git a/arch/arm/kvm/hyp/hyp-entry.S b/arch/arm/kvm/hyp/hyp-entry.S
>> index 96beb53..1f8db7d 100644
>> --- a/arch/arm/kvm/hyp/hyp-entry.S
>> +++ b/arch/arm/kvm/hyp/hyp-entry.S
>> @@ -127,7 +127,7 @@ hyp_hvc:
>>  pop {r0, r1, r2}
>>  
>>  /* Check for __hyp_get_vectors */
>> -cmp r0, #-1
>> +cmp r0, #HVC_GET_VECTORS
>>  mrceq   p15, 4, r0, c12, c0, 0  @ get HVBAR
>>  beq 1f
> 
> I don't think this is sufficient - the kdump case for ARM will still be
> broken after these patches.
> 
> The new soft-restart ABI introduced by my patch 2 passes in:
> 
> r0 = HVC_SOFT_RESTART
> r1 = non-zero
> r2 = undefined
> r3 = function pointer
> 
> and the assumption is that r3 will be preserved if the HVC call does
> nothing - which probably isn't a safe assumption.
> 
> With these arguments passed to the KVM stub (which may be in place at
> the point of a kdump), we end up executing this code:
> 
> push{lr}
> 
> mov lr, r0
> mov r0, r1
> mov r1, r2
> mov r2, r3
> 
> THUMB(  orr lr, #1)
> blx lr  @ Call the HYP function
> 
> This will result in an attempt to branch to address 2 or 3, which isn't
> sane - a panic in the host kernel (eg due to a NULL pointer deref with
> panic_on_oops enabled) will then cause kdump to try to execute code from
> a stupid address.
> 
> So, we need KVM's stub to be (a) better documented so this stuff is
> obvious, and (b) updated so that kdump stands a chance of working even
> if the KVM stub is still in place at the point the host kernel panics.
> 
> Another reason why documentation is important here is that we need to
> make it clear to alternative hypervisors that the host kernel may issue
> a HVC call at any moment due to a crash with particular arguments, and
> that the host kernel expects a certain behaviour in that case, and that
> the hypervisor does not crash.
> 
> For example, how will Xen behave - is introducing these changes going
> to cause a regression with Xen?  Does anyone even know the answer to
> 

Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Russell King - ARM Linux
On Thu, Dec 15, 2016 at 11:18:48AM +, Marc Zyngier wrote:
> On 14/12/16 10:46, Russell King wrote:
> > Improve the hyp-stub ABI to allow it to do more than just get/set the
> > vectors.  We follow the example in ARM64, where r0 is used as an opcode
> > with the other registers as an argument.
> > 
> > Signed-off-by: Russell King 
> > ---
> >  arch/arm/kernel/hyp-stub.S | 27 ++-
> >  1 file changed, 22 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S
> > index 15d073ae5da2..f3e9ba5fb642 100644
> > --- a/arch/arm/kernel/hyp-stub.S
> > +++ b/arch/arm/kernel/hyp-stub.S
> > @@ -22,6 +22,9 @@
> >  #include 
> >  #include 
> >  
> > +#define HVC_GET_VECTORS 0
> > +#define HVC_SET_VECTORS 1
> > +
> >  #ifndef ZIMAGE
> >  /*
> >   * For the kernel proper, we need to find out the CPU boot mode long after
> > @@ -202,9 +205,19 @@ ARM_BE8(orrr7, r7, #(1 << 25)) @ HSCTLR.EE
> >  ENDPROC(__hyp_stub_install_secondary)
> >  
> >  __hyp_stub_do_trap:
> > -   cmp r0, #-1
> > -   mrceq   p15, 4, r0, c12, c0, 0  @ get HVBAR
> > -   mcrne   p15, 4, r0, c12, c0, 0  @ set HVBAR
> > +   teq r0, #HVC_GET_VECTORS
> > +   bne 1f
> > +   mrc p15, 4, r0, c12, c0, 0  @ get HVBAR
> > +   b   __hyp_stub_exit
> > +
> > +1: teq r0, #HVC_SET_VECTORS
> > +   bne 1f
> > +   mcr p15, 4, r1, c12, c0, 0  @ set HVBAR
> > +   b   __hyp_stub_exit
> > +
> > +1: mov r0, #-1
> > +
> > +__hyp_stub_exit:
> > __ERET
> >  ENDPROC(__hyp_stub_do_trap)
> >  
> > @@ -231,10 +244,14 @@ ENDPROC(__hyp_stub_do_trap)
> >   * initialisation entry point.
> >   */
> >  ENTRY(__hyp_get_vectors)
> > -   mov r0, #-1
> > +   mov r0, #HVC_GET_VECTORS
> 
> This breaks the KVM implementation of __hyp_get_vectors, easily fixed
> with the following patchlet:
> 
> diff --git a/arch/arm/include/asm/virt.h b/arch/arm/include/asm/virt.h
> index a2e75b8..0fe637e 100644
> --- a/arch/arm/include/asm/virt.h
> +++ b/arch/arm/include/asm/virt.h
> @@ -89,6 +89,14 @@ extern char __hyp_text_start[];
>  extern char __hyp_text_end[];
>  #endif
>  
> +#else
> +
> +/* Only assembly code should need those */
> +
> +#define HVC_GET_VECTORS 0
> +#define HVC_SET_VECTORS 1
> +#define HVC_SOFT_RESTART 2
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* ! VIRT_H */
> diff --git a/arch/arm/kernel/hyp-stub.S b/arch/arm/kernel/hyp-stub.S
> index ebc26f8..1c6888f 100644
> --- a/arch/arm/kernel/hyp-stub.S
> +++ b/arch/arm/kernel/hyp-stub.S
> @@ -22,10 +22,6 @@
>  #include 
>  #include 
>  
> -#define HVC_GET_VECTORS 0
> -#define HVC_SET_VECTORS 1
> -#define HVC_SOFT_RESTART 2
> -
>  #ifndef ZIMAGE
>  /*
>   * For the kernel proper, we need to find out the CPU boot mode long after
> diff --git a/arch/arm/kvm/hyp/hyp-entry.S b/arch/arm/kvm/hyp/hyp-entry.S
> index 96beb53..1f8db7d 100644
> --- a/arch/arm/kvm/hyp/hyp-entry.S
> +++ b/arch/arm/kvm/hyp/hyp-entry.S
> @@ -127,7 +127,7 @@ hyp_hvc:
>   pop {r0, r1, r2}
>  
>   /* Check for __hyp_get_vectors */
> - cmp r0, #-1
> + cmp r0, #HVC_GET_VECTORS
>   mrceq   p15, 4, r0, c12, c0, 0  @ get HVBAR
>   beq 1f

I don't think this is sufficient - the kdump case for ARM will still be
broken after these patches.

The new soft-restart ABI introduced by my patch 2 passes in:

r0 = HVC_SOFT_RESTART
r1 = non-zero
r2 = undefined
r3 = function pointer

and the assumption is that r3 will be preserved if the HVC call does
nothing - which probably isn't a safe assumption.

With these arguments passed to the KVM stub (which may be in place at
the point of a kdump), we end up executing this code:

push{lr}

mov lr, r0
mov r0, r1
mov r1, r2
mov r2, r3

THUMB(  orr lr, #1)
blx lr  @ Call the HYP function

This will result in an attempt to branch to address 2 or 3, which isn't
sane - a panic in the host kernel (eg due to a NULL pointer deref with
panic_on_oops enabled) will then cause kdump to try to execute code from
a stupid address.

So, we need KVM's stub to be (a) better documented so this stuff is
obvious, and (b) updated so that kdump stands a chance of working even
if the KVM stub is still in place at the point the host kernel panics.

Another reason why documentation is important here is that we need to
make it clear to alternative hypervisors that the host kernel may issue
a HVC call at any moment due to a crash with particular arguments, and
that the host kernel expects a certain behaviour in that case, and that
the hypervisor does not crash.

For example, how will Xen behave - is introducing these changes going
to cause a regression with Xen?  Does anyone even know the answer to
that?  From what I can see, it seems we'll end up calling Xen's
hypervisor with a random r12 value (which it uses as a reason code)
but without the 0xea1 immediate constant in the HVC 

Re: [PATCH 1/2] ARM: hyp-stub: improve ABI

2017-01-09 Thread Russell King - ARM Linux
On Tue, Jan 03, 2017 at 10:51:49AM +0100, Christoffer Dall wrote:
> Hi Russell,
> 
> On Thu, Dec 15, 2016 at 06:57:18PM +, Russell King - ARM Linux wrote:
> > What's also coming clear is that there's very few people who understand
> > all the interactions here, and the whole thing seems to be an undocumented
> > mess.  
> 
> I think the hyp stub has just served a very limited purpose so far, and
> therefore is a somewhat immature implementation.  Now we've discovered a
> need to clean it up, and we're all for that.  Again, I don't think the
> problem is any larger than that, we just need to fix it, and it seems to
> me everyone is willing to work on that.

What I want to see is some documentation of the hyp-stub, so that there
can be some element of confidence that changes there are properly
coordinated.  As I said in a follow up email:

| Either we need more people to have an understanding (so if one of them
| gets run over by a bus, we're not left floundering around) or we need
| it to be documented - even if it's just a simple comment "the ABI in
| this file is shared with XYZ, if you change the ABI here, also update
| XYZ too."

> Marc even offered to work on your suggestion to support the general
> hyp ABI commands in KVM.

... which is pointless, because it's a duplication of the effort I've
already put in.  My patches already do the:

#define HVC_GET_VECTORS 0
#define HVC_SET_VECTORS 1
#define HVC_SOFT_RESTART 2

thing which ARM64 does, passing the arguments in via the appropriate
registers.  However, such a change is a major revision of hyp-stub's
ABI, which completely changes the way it works.

The issue here is that there seems to be only one person, Marc, who
understands that the KVM hypervisor also needs to be updated - everyone
else I've talked to was of the opinion that hyp-stub's ABI is limited
to hyp-stub.S.  That's worse than "I don't know" because it leads to
exactly the situation that my patches created: the two ABIs going out
of step.

What I'm asking for is a patch to add some sort of documentation ASAP
which ensures that this important non-obvious information isn't limited
to just one person.  As I mentioned in the quote above, it can be as
simple as merely pointing each file at its counterparts which have to
be updated at the same time.

Longer term, I'd like to see the existing hypervisor documentation in
Documentation/virtual/kvm/hypercalls.txt updated with the ARM details.
According to that document, KVM effectively only exists on PPC and x86
at the present time...

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC 8/8] KVM: arm/arm64: Emulate the EL1 phys timer register access

2017-01-09 Thread Christoffer Dall
On Mon, Dec 26, 2016 at 12:12:06PM -0500, Jintack Lim wrote:
> Emulate read and write operations to CNTP_TVAL, CNTP_CVAL and CNTP_CTL.
> Now the VM is able to use the EL1 physical timer.
> 
> Signed-off-by: Jintack Lim 
> ---
>  arch/arm64/kvm/sys_regs.c| 35 ---
>  include/kvm/arm_arch_timer.h |  3 +++
>  virt/kvm/arm/arch_timer.c|  4 ++--
>  3 files changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index fd9e747..7cef94f 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -824,7 +824,15 @@ static bool access_cntp_tval(struct kvm_vcpu *vcpu,
>   struct sys_reg_params *p,
>   const struct sys_reg_desc *r)
>  {
> - kvm_inject_undefined(vcpu);
> + struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> + cycle_t now = kvm_phys_timer_read();
> +
> + if (p->is_write) {
> + ptimer->cnt_cval = p->regval + now;
> + kvm_timer_emulate(vcpu, ptimer);

Hmm, do we really need those calls here?

I guess I'm a little confused about exactly what the kvm_timer_emulate()
function is supposed to do, and it feels to me like these handlers
should just record what the guest is asking the kernel to do and the
logic of handling the additional timer should be moved into the run path
as much as possible.

Thanks,
-Christoffer

> + } else
> + p->regval = ptimer->cnt_cval - now;
> +
>   return true;
>  }
>  
> @@ -832,7 +840,21 @@ static bool access_cntp_ctl(struct kvm_vcpu *vcpu,
>   struct sys_reg_params *p,
>   const struct sys_reg_desc *r)
>  {
> - kvm_inject_undefined(vcpu);
> + struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> +
> + if (p->is_write) {
> + /* ISTATUS bit is read-only */
> + ptimer->cnt_ctl = p->regval & ~ARCH_TIMER_CTRL_IT_STAT;
> + kvm_timer_emulate(vcpu, ptimer);
> + } else {
> + cycle_t now = kvm_phys_timer_read();
> +
> + p->regval = ptimer->cnt_ctl;
> + /* Set ISTATUS bit if it's expired */
> + if (ptimer->cnt_cval <= now)
> + p->regval |= ARCH_TIMER_CTRL_IT_STAT;
> + }
> +
>   return true;
>  }
>  
> @@ -840,7 +862,14 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
>   struct sys_reg_params *p,
>   const struct sys_reg_desc *r)
>  {
> - kvm_inject_undefined(vcpu);
> + struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> +
> + if (p->is_write) {
> + ptimer->cnt_cval = p->regval;
> + kvm_timer_emulate(vcpu, ptimer);
> + } else
> + p->regval = ptimer->cnt_cval;
> +
>   return true;
>  }
>  
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 04ed9c1..776579b 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -75,6 +75,9 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu,
>  struct arch_timer_context *timer_ctx);
>  void kvm_timer_schedule(struct kvm_vcpu *vcpu);
>  void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
> +void kvm_timer_emulate(struct kvm_vcpu *vcpu, struct arch_timer_context 
> *timer);
> +
> +cycle_t kvm_phys_timer_read(void);
>  
>  void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
>  
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index be8d953..7a161f8 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -39,7 +39,7 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
>   vcpu_vtimer(vcpu)->active_cleared_last = false;
>  }
>  
> -static cycle_t kvm_phys_timer_read(void)
> +cycle_t kvm_phys_timer_read(void)
>  {
>   return timecounter->cc->read(timecounter->cc);
>  }
> @@ -258,7 +258,7 @@ static int kvm_timer_update_state(struct kvm_vcpu *vcpu)
>   * Schedule the background timer for the emulated timer. The background timer
>   * runs whenever vcpu is runnable and the timer is not expired.
>   */
> -static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
> +void kvm_timer_emulate(struct kvm_vcpu *vcpu,
>  struct arch_timer_context *timer_ctx)
>  {
>   struct arch_timer_cpu *timer = >arch.timer_cpu;
> -- 
> 1.9.1
> 
> 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC 6/8] KVM: arm/arm64: Update the physical timer interrupt level

2017-01-09 Thread Christoffer Dall
On Mon, Dec 26, 2016 at 12:12:04PM -0500, Jintack Lim wrote:
> Now that we maintain the EL1 physical timer register states of the VM,
> update the physical timer interrupt level along with the virtual one.
> 
> Note that the emulated EL1 physical timer is not mapped to any hardware
> timer, so we let vgic know that.
> 
> With this commit, VMs are able to get the physical timer interrupts
> while they are runnable. But they won't get interrupts once vcpus go to
> sleep since we don't have code to wake vcpus up on the emulated physical
> timer expiration yet.

I'm not sure I understand.  It looks to me like it's the other way
around and that this code supports waking up a sleeping VCPU sooner if
it has a physical timer scheduled, but doesn't yet support causing an
early exit from running the VPCU base on programing the physical timer?


Thanks,
-Christoffer

> 
> Signed-off-by: Jintack Lim 
> ---
>  arch/arm/kvm/arm.c|  3 +-
>  virt/kvm/arm/arch_timer.c | 76 
> ++-
>  2 files changed, 64 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 37d1623..d2dfa32 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -295,7 +295,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  
>  int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
>  {
> - return kvm_timer_should_fire(vcpu, vcpu_vtimer(vcpu));
> + return kvm_timer_should_fire(vcpu, vcpu_vtimer(vcpu)) ||
> + kvm_timer_should_fire(vcpu, vcpu_ptimer(vcpu));
>  }
>  
>  void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index ed80864..aa7e243 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -91,7 +91,8 @@ static void kvm_timer_inject_irq_work(struct work_struct 
> *work)
>   vcpu = container_of(work, struct kvm_vcpu, arch.timer_cpu.expired);
>   vcpu->arch.timer_cpu.armed = false;
>  
> - WARN_ON(!kvm_timer_should_fire(vcpu, vcpu_vtimer(vcpu)));
> + WARN_ON(!kvm_timer_should_fire(vcpu, vcpu_vtimer(vcpu)) &&
> + !kvm_timer_should_fire(vcpu, vcpu_ptimer(vcpu)));
>  
>   /*
>* If the vcpu is blocked we want to wake it up so that it will see
> @@ -130,6 +131,33 @@ static u64 kvm_timer_compute_delta(struct kvm_vcpu *vcpu,
>   return 0;
>  }
>  
> +static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx)
> +{
> + return !(timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> + (timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_ENABLE);
> +}
> +
> +/*
> + * Returns minimal timer expiration time in ns among guest timers.
> + * Note that it will return inf time if none of timers can fire.
> + */
> +static u64 kvm_timer_min_block(struct kvm_vcpu *vcpu)
> +{
> + u64 min_virt = ULLONG_MAX, min_phys = ULLONG_MAX;
> + struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> + struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> +
> + if (kvm_timer_irq_can_fire(vtimer))
> + min_virt = kvm_timer_compute_delta(vcpu, vtimer);
> +
> + if (kvm_timer_irq_can_fire(ptimer))
> + min_phys = kvm_timer_compute_delta(vcpu, ptimer);
> +
> + WARN_ON((min_virt == ULLONG_MAX) && (min_phys == ULLONG_MAX));
> +
> + return min(min_virt, min_phys);
> +}
> +
>  static enum hrtimer_restart kvm_timer_expire(struct hrtimer *hrt)
>  {
>   struct arch_timer_cpu *timer;
> @@ -144,7 +172,7 @@ static enum hrtimer_restart kvm_timer_expire(struct 
> hrtimer *hrt)
>* PoV (NTP on the host may have forced it to expire
>* early). If we should have slept longer, restart it.
>*/
> - ns = kvm_timer_compute_delta(vcpu, vcpu_vtimer(vcpu));
> + ns = kvm_timer_min_block(vcpu);
>   if (unlikely(ns)) {
>   hrtimer_forward_now(hrt, ns_to_ktime(ns));
>   return HRTIMER_RESTART;
> @@ -154,12 +182,6 @@ static enum hrtimer_restart kvm_timer_expire(struct 
> hrtimer *hrt)
>   return HRTIMER_NORESTART;
>  }
>  
> -static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx)
> -{
> - return !(timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> - (timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_ENABLE);
> -}
> -
>  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu,
>  struct arch_timer_context *timer_ctx)
>  {
> @@ -191,6 +213,21 @@ static void kvm_timer_update_mapped_irq(struct kvm_vcpu 
> *vcpu, bool new_level,
>   WARN_ON(ret);
>  }
>  
> +static void kvm_timer_update_irq(struct kvm_vcpu *vcpu, bool new_level,
> +  struct arch_timer_context *timer)
> +{
> + int ret;
> +
> + BUG_ON(!vgic_initialized(vcpu->kvm));
> +
> + timer->irq.level = new_level;
> + trace_kvm_timer_update_irq(vcpu->vcpu_id, timer->irq.irq,
> +timer->irq.level);
> + ret = kvm_vgic_inject_irq(vcpu->kvm, 

Re: [RFC 7/8] KVM: arm/arm64: Set up a background timer for the physical timer emulation

2017-01-09 Thread Christoffer Dall
On Mon, Dec 26, 2016 at 12:12:05PM -0500, Jintack Lim wrote:
> Set a background timer for the EL1 physical timer emulation while VMs are
> running, so that VMs get interrupts for the physical timer in a timely
> manner.
> 
> We still use just one background timer. When a VM is runnable, we use
> the background timer for the physical timer emulation.  When the VM is
> about to be blocked, we use the background timer to wake up the vcpu at
> the earliest timer expiration among timers the VM is using.
> 
> As a result, the assumption that the background timer is not armed while
> VMs are running does not hold any more. So, remove BUG_ON()s and
> WARN_ON()s accordingly.
> 
> Signed-off-by: Jintack Lim 
> ---
>  virt/kvm/arm/arch_timer.c | 42 +++---
>  1 file changed, 31 insertions(+), 11 deletions(-)
> 
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index aa7e243..be8d953 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -91,9 +91,6 @@ static void kvm_timer_inject_irq_work(struct work_struct 
> *work)
>   vcpu = container_of(work, struct kvm_vcpu, arch.timer_cpu.expired);
>   vcpu->arch.timer_cpu.armed = false;
>  
> - WARN_ON(!kvm_timer_should_fire(vcpu, vcpu_vtimer(vcpu)) &&
> - !kvm_timer_should_fire(vcpu, vcpu_ptimer(vcpu)));
> -

This seems misplaced and has been addressed here:
https://lists.cs.columbia.edu/pipermail/kvmarm/2017-January/022933.html

When you respin you can benefit from basing on that (assuming it gets
acked and goes int).

>   /*
>* If the vcpu is blocked we want to wake it up so that it will see
>* the timer has expired when entering the guest.
> @@ -139,7 +136,6 @@ static bool kvm_timer_irq_can_fire(struct 
> arch_timer_context *timer_ctx)
>  
>  /*
>   * Returns minimal timer expiration time in ns among guest timers.
> - * Note that it will return inf time if none of timers can fire.
>   */
>  static u64 kvm_timer_min_block(struct kvm_vcpu *vcpu)
>  {
> @@ -153,7 +149,9 @@ static u64 kvm_timer_min_block(struct kvm_vcpu *vcpu)
>   if (kvm_timer_irq_can_fire(ptimer))
>   min_phys = kvm_timer_compute_delta(vcpu, ptimer);
>  
> - WARN_ON((min_virt == ULLONG_MAX) && (min_phys == ULLONG_MAX));
> + /* If none of timers can fire, then return 0 */
> + if ((min_virt == ULLONG_MAX) && (min_phys == ULLONG_MAX))
> + return 0;

Why didn't you have this semantics in the previous patch?

>  
>   return min(min_virt, min_phys);
>  }
> @@ -257,6 +255,26 @@ static int kvm_timer_update_state(struct kvm_vcpu *vcpu)
>  }
>  
>  /*
> + * Schedule the background timer for the emulated timer. The background timer
> + * runs whenever vcpu is runnable and the timer is not expired.
> + */
> +static void kvm_timer_emulate(struct kvm_vcpu *vcpu,
> +struct arch_timer_context *timer_ctx)
> +{
> + struct arch_timer_cpu *timer = >arch.timer_cpu;
> +
> + if (kvm_timer_should_fire(vcpu, timer_ctx))
> + return;
> +
> + if (!kvm_timer_irq_can_fire(timer_ctx))
> + return;
> +
> + /*  The timer has not yet expired, schedule a background timer */
> + timer_disarm(timer);
> + timer_arm(timer, kvm_timer_compute_delta(vcpu, timer_ctx));

I'm wondering what the effect of this thing really is.  Isn't the soft
timer programmed in timer_arm() based on Linux's own timekeeping
schedule, such that the physical timer will be programmed to the next
tick, regardless of what you program here, so all you have to do is
check if you need to inject the phys timer on entry to the VM?

On the other hand, if this can cause Linux to program the phys timer to
expire sooner, then I guess it makes sense.  Thinking about it, would
that be the case on a tickless system?

> +}
> +
> +/*
>   * Schedule the background timer before calling kvm_vcpu_block, so that this
>   * thread is removed from its waitqueue and made runnable when there's a 
> timer
>   * interrupt to handle.
> @@ -267,8 +285,6 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>   struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>   struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  
> - BUG_ON(timer_is_armed(timer));
> -
>   /*
>* No need to schedule a background timer if any guest timer has
>* already expired, because kvm_vcpu_block will return before putting
> @@ -290,13 +306,21 @@ void kvm_timer_schedule(struct kvm_vcpu *vcpu)
>* The guest timers have not yet expired, schedule a background timer.
>* Pick smaller expiration time between phys and virt timer.
>*/
> + timer_disarm(timer);
>   timer_arm(timer, kvm_timer_min_block(vcpu));
>  }
>  
>  void kvm_timer_unschedule(struct kvm_vcpu *vcpu)
>  {
>   struct arch_timer_cpu *timer = >arch.timer_cpu;
> +
>   timer_disarm(timer);
> +
> + /*
> +  * Now we return from the 

Re: [RFC 4/8] KVM: arm/arm64: Initialize the emulated EL1 physical timer

2017-01-09 Thread Christoffer Dall
On Mon, Dec 26, 2016 at 12:12:02PM -0500, Jintack Lim wrote:
> Initialize the emulated EL1 physical timer with the default irq number.
> 
> Signed-off-by: Jintack Lim 
> ---
>  arch/arm/kvm/reset.c |  9 -
>  arch/arm64/kvm/reset.c   |  9 -
>  include/kvm/arm_arch_timer.h |  3 ++-
>  virt/kvm/arm/arch_timer.c| 12 ++--
>  4 files changed, 28 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
> index 4b5e802..1da8b2d 100644
> --- a/arch/arm/kvm/reset.c
> +++ b/arch/arm/kvm/reset.c
> @@ -37,6 +37,11 @@
>   .usr_regs.ARM_cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
>  };
>  
> +static const struct kvm_irq_level cortexa_ptimer_irq = {
> + { .irq = 30 },
> + .level = 1,
> +};
> +
>  static const struct kvm_irq_level cortexa_vtimer_irq = {
>   { .irq = 27 },
>   .level = 1,
> @@ -58,6 +63,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  {
>   struct kvm_regs *reset_regs;
>   const struct kvm_irq_level *cpu_vtimer_irq;
> + const struct kvm_irq_level *cpu_ptimer_irq;
>  
>   switch (vcpu->arch.target) {
>   case KVM_ARM_TARGET_CORTEX_A7:
> @@ -65,6 +71,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>   reset_regs = _regs_reset;
>   vcpu->arch.midr = read_cpuid_id();
>   cpu_vtimer_irq = _vtimer_irq;
> + cpu_ptimer_irq = _ptimer_irq;
>   break;
>   default:
>   return -ENODEV;
> @@ -77,5 +84,5 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>   kvm_reset_coprocs(vcpu);
>  
>   /* Reset arch_timer context */
> - return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
> + return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq, cpu_ptimer_irq);
>  }
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 5bc4608..74322c2 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -46,6 +46,11 @@
>   COMPAT_PSR_I_BIT | COMPAT_PSR_F_BIT),
>  };
>  
> +static const struct kvm_irq_level default_ptimer_irq = {
> + .irq= 30,
> + .level  = 1,
> +};
> +
>  static const struct kvm_irq_level default_vtimer_irq = {
>   .irq= 27,
>   .level  = 1,
> @@ -110,6 +115,7 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, 
> long ext)
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>  {
>   const struct kvm_irq_level *cpu_vtimer_irq;
> + const struct kvm_irq_level *cpu_ptimer_irq;
>   const struct kvm_regs *cpu_reset;
>  
>   switch (vcpu->arch.target) {
> @@ -123,6 +129,7 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>   }
>  
>   cpu_vtimer_irq = _vtimer_irq;
> + cpu_ptimer_irq = _ptimer_irq;
>   break;
>   }
>  
> @@ -136,5 +143,5 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
>   kvm_pmu_vcpu_reset(vcpu);
>  
>   /* Reset timer */
> - return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
> + return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq, cpu_ptimer_irq);
>  }
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index d21652a..04ed9c1 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -61,7 +61,8 @@ struct arch_timer_cpu {
>  int kvm_timer_enable(struct kvm_vcpu *vcpu);
>  void kvm_timer_init(struct kvm *kvm);
>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> -  const struct kvm_irq_level *irq);
> +  const struct kvm_irq_level *virt_irq,
> +  const struct kvm_irq_level *phys_irq);
>  void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
>  void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu);
>  void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 3bd6063..ed80864 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -339,9 +339,11 @@ void kvm_timer_sync_hwstate(struct kvm_vcpu *vcpu)
>  }
>  
>  int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
> -  const struct kvm_irq_level *irq)
> +  const struct kvm_irq_level *virt_irq,
> +  const struct kvm_irq_level *phys_irq)
>  {
>   struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> + struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>  
>   /*
>* The vcpu timer irq number cannot be determined in
> @@ -349,7 +351,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>* kvm_vcpu_set_target(). To handle this, we determine
>* vcpu timer irq number when the vcpu is reset.
>*/
> - vtimer->irq.irq = irq->irq;
> + vtimer->irq.irq = virt_irq->irq;
> + ptimer->irq.irq = phys_irq->irq;
>  
>   /*
>* The bits in CNTV_CTL are architecturally reset to UNKNOWN for ARMv8
> @@ -358,6 +361,7 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>* the ARMv7 architecture.

Re: [RFC 2/8] KVM: arm/arm64: Decouple kvm timer functions from virtual timer

2017-01-09 Thread Christoffer Dall
On Mon, Dec 26, 2016 at 12:12:00PM -0500, Jintack Lim wrote:
> Now that we have a separate structure for timer context, make functions
> general so that they can work on any timer context, not just the virtual
> timer context.  This does not change the virtual timer functionality.
> 
> Signed-off-by: Jintack Lim 
> ---
>  arch/arm/kvm/arm.c   |  2 +-
>  include/kvm/arm_arch_timer.h |  3 +-
>  virt/kvm/arm/arch_timer.c| 65 
> +---
>  3 files changed, 40 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 19b5f5c..37d1623 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -295,7 +295,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  
>  int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
>  {
> - return kvm_timer_should_fire(vcpu);
> + return kvm_timer_should_fire(vcpu, vcpu_vtimer(vcpu));
>  }
>  
>  void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 7dabe56..cf84145 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -69,7 +69,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
>  u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>  
> -bool kvm_timer_should_fire(struct kvm_vcpu *vcpu);
> +bool kvm_timer_should_fire(struct kvm_vcpu *vcpu,
> +struct arch_timer_context *timer_ctx);
>  void kvm_timer_schedule(struct kvm_vcpu *vcpu);
>  void kvm_timer_unschedule(struct kvm_vcpu *vcpu);
>  
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 30a64df..3bd6063 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -91,7 +91,7 @@ static void kvm_timer_inject_irq_work(struct work_struct 
> *work)
>   vcpu = container_of(work, struct kvm_vcpu, arch.timer_cpu.expired);
>   vcpu->arch.timer_cpu.armed = false;
>  
> - WARN_ON(!kvm_timer_should_fire(vcpu));
> + WARN_ON(!kvm_timer_should_fire(vcpu, vcpu_vtimer(vcpu)));
>  
>   /*
>* If the vcpu is blocked we want to wake it up so that it will see
> @@ -100,12 +100,22 @@ static void kvm_timer_inject_irq_work(struct 
> work_struct *work)
>   kvm_vcpu_kick(vcpu);
>  }
>  
> -static u64 kvm_timer_compute_delta(struct kvm_vcpu *vcpu)
> +static u64 kvm_timer_cntvoff(struct kvm_vcpu *vcpu,
> +  struct arch_timer_context *timer_ctx)
> +{
> + if (timer_ctx == vcpu_vtimer(vcpu))
> + return vcpu->kvm->arch.timer.cntvoff;
> +
> + return 0;
> +}
> +
> +static u64 kvm_timer_compute_delta(struct kvm_vcpu *vcpu,
> +struct arch_timer_context *timer_ctx)
>  {
>   cycle_t cval, now;
>  
> - cval = vcpu_vtimer(vcpu)->cnt_cval;
> - now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
> + cval = timer_ctx->cnt_cval;
> + now = kvm_phys_timer_read() - kvm_timer_cntvoff(vcpu, timer_ctx);
>  
>   if (now < cval) {
>   u64 ns;
> @@ -134,7 +144,7 @@ static enum hrtimer_restart kvm_timer_expire(struct 
> hrtimer *hrt)
>* PoV (NTP on the host may have forced it to expire
>* early). If we should have slept longer, restart it.
>*/
> - ns = kvm_timer_compute_delta(vcpu);
> + ns = kvm_timer_compute_delta(vcpu, vcpu_vtimer(vcpu));
>   if (unlikely(ns)) {
>   hrtimer_forward_now(hrt, ns_to_ktime(ns));
>   return HRTIMER_RESTART;
> @@ -144,42 +154,40 @@ static enum hrtimer_restart kvm_timer_expire(struct 
> hrtimer *hrt)
>   return HRTIMER_NORESTART;
>  }
>  
> -static bool kvm_timer_irq_can_fire(struct kvm_vcpu *vcpu)
> +static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx)
>  {
> - struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> -
> - return !(vtimer->cnt_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> - (vtimer->cnt_ctl & ARCH_TIMER_CTRL_ENABLE);
> + return !(timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
> + (timer_ctx->cnt_ctl & ARCH_TIMER_CTRL_ENABLE);
>  }
>  
> -bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> +bool kvm_timer_should_fire(struct kvm_vcpu *vcpu,
> +struct arch_timer_context *timer_ctx)
>  {
> - struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>   cycle_t cval, now;
>  
> - if (!kvm_timer_irq_can_fire(vcpu))
> + if (!kvm_timer_irq_can_fire(timer_ctx))
>   return false;
>  
> - cval = vtimer->cnt_cval;
> - now = kvm_phys_timer_read() - vcpu->kvm->arch.timer.cntvoff;
> + cval = timer_ctx->cnt_cval;
> + now = kvm_phys_timer_read() - kvm_timer_cntvoff(vcpu, timer_ctx);
>  

I'm wondering if we should move the cntvoff field to the
arch_timer_context struct so that all these functions can just take
the timer_ctx pointer without a vcpu 

Re: [RFC PATCH 1/7] arm64: Use physical counter for in-kernel reads

2017-01-09 Thread Christoffer Dall
On Fri, Jan 06, 2017 at 03:16:24PM +, Marc Zyngier wrote:
> On 06/01/17 10:53, Christoffer Dall wrote:
> > On Fri, Jan 06, 2017 at 10:38:49AM +, Marc Zyngier wrote:
> >> On 06/01/17 10:00, Christoffer Dall wrote:
> >>> Hi Marc,
> >>>
> >>> On Thu, Jan 05, 2017 at 06:11:14PM +, Marc Zyngier wrote:
>  [adding the arm64 maintainers, plus Mark as arch timer maintainer]
> >>>
> >>> Right, sorry, I should have done that already.
> >>>
> 
>  On 10/12/16 20:47, Christoffer Dall wrote:
> > Using the physical counter allows KVM to retain the offset between the
> > virtual and physical counter as long as it is actively running a VCPU.
> >
> > As soon as a VCPU is released, another thread is scheduled or we start
> > running userspace applications, we reset the offset to 0, so that VDSO
> > operations can still read the virtual counter and get the same view of
> > time as the kernel.
> >
> > This opens up potential improvements for KVM performance.
> >
> > VHE kernels or kernels using the virtual timer are unaffected by this.
> >
> > Signed-off-by: Christoffer Dall 
> > ---
> >  arch/arm64/include/asm/arch_timer.h  | 6 --
> >  drivers/clocksource/arm_arch_timer.c | 2 +-
> >  2 files changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/arch_timer.h 
> > b/arch/arm64/include/asm/arch_timer.h
> > index eaa5bbe..cec2549 100644
> > --- a/arch/arm64/include/asm/arch_timer.h
> > +++ b/arch/arm64/include/asm/arch_timer.h
> > @@ -139,11 +139,13 @@ static inline void arch_timer_set_cntkctl(u32 
> > cntkctl)
> >  
> >  static inline u64 arch_counter_get_cntpct(void)
> >  {
> > +   u64 cval;
> > /*
> >  * AArch64 kernel and user space mandate the use of CNTVCT.
> >  */
> > -   BUG();
> > -   return 0;
> > +   isb();
> > +   asm volatile("mrs %0, cntpct_el0" : "=r" (cval));
> > +   return cval;
> >  }
> >  
> >  static inline u64 arch_counter_get_cntvct(void)
> > diff --git a/drivers/clocksource/arm_arch_timer.c 
> > b/drivers/clocksource/arm_arch_timer.c
> > index 73c487d..a5b0789 100644
> > --- a/drivers/clocksource/arm_arch_timer.c
> > +++ b/drivers/clocksource/arm_arch_timer.c
> > @@ -597,7 +597,7 @@ static void __init arch_counter_register(unsigned 
> > type)
> >  
> > /* Register the CP15 based counter if we have one */
> > if (type & ARCH_CP15_TIMER) {
> > -   if (IS_ENABLED(CONFIG_ARM64) || arch_timer_uses_ppi == 
> > VIRT_PPI)
> > +   if (arch_timer_uses_ppi == VIRT_PPI || 
> > is_kernel_in_hyp_mode())
> 
>  Why do we have this is_kernel_in_hyp_mode clause? I can't think of any
>  reason for a VHE kernel to use the virtual counter at all...
> 
> >>>
> >>> Good question.  I think I just didn't want to change behavior from the
> >>> existing functionality mre than necessary.
> >>>
> >>> Note that on a VHE kernel this will be the EL2 virtual counter, not the
> >>> EL1 virtual counter, due to the register redirection.  Are the virtual
> >>> and physical EL2 counters always equivalent on a VHE system?
> >>
> >> Yes, they are. CNTVOFF_EL2 is ignored in that case, and you get an extra
> >> interrupt for the new EL2 virtual timer as well.
> >>
> > 
> > ok, in that case I suppose I can just check for arch_timer_uses_ppi ==
> > VIRT_PPI and be done with it.
> 
> I wonder what we should do for get_cycles(), which is hardwired to the
> virtual counter at the moment. If we decide to use the physical counter
> on systems identified as hosts, we may have to revise this as well (I
> feel the need for an alternative... ;-).
> 

The physical counter is always available right?  So why not just use the
physical counter?

I guess that comes down to two questions:

First, can get_cycles() be called from whereever in any context etc.?
In other words, is it ever called between vcpu_load() and vcpu_put(),
because if not, it doesn't matter which one we use, but that may be a
slightly brittle solution.

Second, what should the semantics of get_cycles be?  Is it important
it's synchronized with the counter backing the timer being used, or
should it rather represents a wall-clock cycle measure, or should it
indeed represent virtual cycles spent if we ever decide to start
tweaking the offset depending on stolen cycles?

Perhaps I'm getting ahead of myself and we should just make sure it's
aligned with the timer in use, in which case we only have to decide if
we should implement that by (a) setting a function pointer, (b) using an
alternative, or (c) using a static key :)

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v2] KVM: arm/arm64: Fix occasional warning from the timer work function

2017-01-09 Thread Christoffer Dall
When a VCPU blocks (WFI) and has programmed the vtimer, we program a
soft timer to expire in the future to wake up the vcpu thread when
appropriate.  Because such as wake up involves a vcpu kick, and the
timer expire function can get called from interrupt context, and the
kick may sleep, we have to schedule the kick in the work function.

The work function currently has a warning that gets raised if it turns
out that the timer shouldn't fire when it's run, which was added because
the idea was that in that case the work should never have been cancelled.

However, it turns out that this whole thing is racy and we can get
spurious warnings.  The problem is that we clear the armed flag in the
work function, which may run in parallel with the
kvm_timer_unschedule->timer_disarm() call.  This results in a possible
situation where the timer_disarm() call does not call
cancel_work_sync(), which effectively synchronizes the completion of the
work function with running the VCPU.  As a result, the VCPU thread
proceeds before the work function completees, causing changes to the
timer state such that kvm_timer_should_fire(vcpu) returns false in the
work function.

All we do in the work function is to kick the VCPU, and an occasional
rare extra kick never harmed anyone.  Since the race above is extremely
rare, we don't bother checking if the race happens but simply remove the
check and the clearing of the armed flag from the work function.

Reported-by: Matthias Brugger 
Signed-off-by: Christoffer Dall 
---
Changes since v1:
 - Don't add a second call to cancel_work_sync, but avoid clearing
   the armed flag to let the timer_disarm() function call
   cancel_work_sync on every unschedule call.
 - Note that I chose to remove the warning, despite it shouldn't really
   happen anymore, because I don't see the value in the warning and
   it does a bit of potentially unnecessary checking in a potentially
   hot path.

 virt/kvm/arm/arch_timer.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index a2dbbcc..a7fe606 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -89,9 +89,6 @@ static void kvm_timer_inject_irq_work(struct work_struct 
*work)
struct kvm_vcpu *vcpu;
 
vcpu = container_of(work, struct kvm_vcpu, arch.timer_cpu.expired);
-   vcpu->arch.timer_cpu.armed = false;
-
-   WARN_ON(!kvm_timer_should_fire(vcpu));
 
/*
 * If the vcpu is blocked we want to wake it up so that it will see
-- 
2.9.0

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC] arm64: KVM: Honor guest device memory type mappings

2017-01-09 Thread Marc Zyngier
[+ Christoffer, Will, Catalin - In the future, please Cc the relevant
maintainers]

On 09/12/16 20:59, Caraman Mihai Claudiu-B02008 wrote:
> From: Mihai Caraman 
> 
> Stage 2 non-Gathering and non-Reordering memory attributes have precedence 
> over stage 1
> Gathering and Reordering attributes. Honor guest device memory type mappings 
> using GRE
> memory type at stage 2.

This feels incredibly scary. One you start enabling gathering and
reordering, there is nothing that prevents a write to a device from a
given guest to be merged or reordered with a write from another guest to
the same device (think of the GIC). This would only require one of the
guests to use GRE attributes itself.

So as it stands, my feeling is that it is a big NO.

> This is a preamble for a bigger change that will honor guest normal memory 
> mappings for
> targets connected to HN-F interfaces.

Normal memory mappings for devices? Unless you can prove that this is a
valid architectural configuration, this is unlikely to happen. And I'm
not keen on taking any such change before seeing the whole thing.

> 
> Signed-off-by: Mihai Caraman 
> ---
>  arch/arm64/include/asm/memory.h   | 2 +-
>  arch/arm64/include/asm/pgtable-prot.h | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index b71086d..d4834ba 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -134,7 +134,7 @@
>   * Memory types for Stage-2 translation
>   */
>  #define MT_S2_NORMAL 0xf
> -#define MT_S2_DEVICE_nGnRE   0x1
> +#define MT_S2_DEVICE_GRE 0x3
>  
>  #ifdef CONFIG_ARM64_4K_PAGES
>  #define IOREMAP_MAX_ORDER(PUD_SHIFT)
> diff --git a/arch/arm64/include/asm/pgtable-prot.h 
> b/arch/arm64/include/asm/pgtable-prot.h
> index 2142c77..11b43f7 100644
> --- a/arch/arm64/include/asm/pgtable-prot.h
> +++ b/arch/arm64/include/asm/pgtable-prot.h
> @@ -61,7 +61,7 @@
>  #define PAGE_HYP_DEVICE  __pgprot(PROT_DEVICE_nGnRE | PTE_HYP)
>  
>  #define PAGE_S2  __pgprot(PROT_DEFAULT | 
> PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY)
> -#define PAGE_S2_DEVICE   __pgprot(PROT_DEFAULT | 
> PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_UXN)
> +#define PAGE_S2_DEVICE   __pgprot(PROT_DEFAULT | 
> PTE_S2_MEMATTR(MT_S2_DEVICE_GRE) | PTE_S2_RDONLY | PTE_UXN)
>  
>  #define PAGE_NONE__pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | 
> PTE_PROT_NONE | PTE_PXN | PTE_UXN)
>  #define PAGE_SHARED  __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | 
> PTE_PXN | PTE_UXN | PTE_WRITE)
> 

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm