Re: [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls

2019-01-25 Thread Benjamin Herrenschmidt
On Tue, 2019-01-22 at 16:23 +1100, Paul Mackerras wrote:
> 
> Which ones of these could be implemented in QEMU?  Are there any that
> can't possibly be implemented in QEMU because they need to do things
> that require calling internal interfaces that userspace doesn't have
> access to?

Implementing them in qemu doesn't make a lot of sense. Qemu doesn't
have access to most of the XIVE HW state. There's a XIVE model for full
emulation but when using the real thing, almost none of it is visible.
A lot of those hcalls effectively turn into OPAL calls.

> How often do we expect each of these hypercalls to be called?

It depends, I can't tell for AIX. For Linux, not often with one
exception: H_INT_ESB which will be used whenever you EOI an emulated
LSI.

 .../...

> Why do we need to provide real-mode versions of these hypercall
> handlers?  I thought these hypercalls would only get called
> infrequently, and in any case certainly much less frequently than once
> per interrupt delivered.  If they are infrequent, then let's leave out
> the real-mode version and just handle them in book3s_hv.c.

Agreed with the exception maybe of H_INT_ESB

> > @@ -5153,6 +5169,19 @@ static unsigned int default_hcall_list[] = {
> >H_IPOLL,
> >H_XIRR,
> >H_XIRR_X,
> > +#endif
> > +#ifdef CONFIG_KVM_XIVE
> > + H_INT_GET_SOURCE_INFO,
> > + H_INT_SET_SOURCE_CONFIG,
> > + H_INT_GET_SOURCE_CONFIG,
> > + H_INT_GET_QUEUE_INFO,
> > + H_INT_SET_QUEUE_CONFIG,
> > + H_INT_GET_QUEUE_CONFIG,
> > + H_INT_SET_OS_REPORTING_LINE,
> > + H_INT_GET_OS_REPORTING_LINE,
> > + H_INT_ESB,
> > + H_INT_SYNC,
> > + H_INT_RESET,
> >   #endif
> 
> The policy is not to add new hcalls to default_hcall_list[].  Is there
> a strong reason for adding them here?
> 
> Paul.



Re: [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls

2019-01-25 Thread Benjamin Herrenschmidt
On Wed, 2019-01-23 at 21:26 +1100, Paul Mackerras wrote:
> If H_INT_ESB is only used for LSIs, then is a guest going to be using
> it at all?  

*emulated* LSIs, ie LSIs coming from emulated devices. It will depends
in practice of what kind of emulated device you put in your guest. We
need that because under the hood, we send a XIVE MSI, so we need to be
notified of the EOI so we can resend if the emulated LSI is still
asserted. 

> My understanding was that with XIVE, only a small number
> of interrupts that are to do with system management functions are
> LSIs; all of the interrupts relating to PCI-e devices are MSIs.  So do
> we actually have a real high-frequency use case for LSIs in a guest?
> 
> For now I would prefer that you remove all the real-mode hcall
> handlers.  We can add them later if we get performance data showing
> that they are needed.
> 
> Regarding whether or not to have a given hcall handler in the kernel
> at all - if there is for example an hcall which is just called once
> on guest startup, and its function is just to provide information to
> the guest, and QEMU has that information, then why not have that hcall
> implemented by QEMU?  Are any of the hcalls like that?
> 
> For example, if H_INT_GET_SOURCE_INFO was implemented in QEMU, could
> we then remove the VC_BASE thing from the xive device?

Ben.



Re: [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls

2019-01-23 Thread Cédric Le Goater
On 1/23/19 11:26 AM, Paul Mackerras wrote:
> On Wed, Jan 23, 2019 at 09:48:31AM +0100, Cédric Le Goater wrote:
>> On 1/23/19 7:44 AM, Benjamin Herrenschmidt wrote:
>>> On Tue, 2019-01-22 at 16:23 +1100, Paul Mackerras wrote:
 Why do we need to provide real-mode versions of these hypercall
 handlers?  I thought these hypercalls would only get called
 infrequently, and in any case certainly much less frequently than once
 per interrupt delivered.  If they are infrequent, then let's leave out
 the real-mode version and just handle them in book3s_hv.c.
>>>
>>> Agreed with the exception maybe of H_INT_ESB
>>
>> ok. 
>>
>> Some of these hcalls are really simple and only getting local info from 
>> the host (h_int_get_*). I thought handling the hcall ASAP was a preferred 
>> practice, even if the hcall is not called frequently. Isn't it ?
> 
> If we are going to handle a given hcall in the kernel at all, then we
> have to have a virtual mode handler.  If we have a real-mode handler
> as well then we in general incur a certain amount of code duplication
> with consequent maintenance costs and possibility of bugs.  So we
> generally only have real-mode handlers for the hcalls where it is
> critical to minimize the latency.  From what Ben is saying that would
> only be H_INT_ESB, and maybe not even that.

ok. and yes, even the H_INT_ESB is questionable as this is really a rare
configuration. 

> If H_INT_ESB is only used for LSIs, then is a guest going to be using
> it at all?  My understanding was that with XIVE, only a small number
> of interrupts that are to do with system management functions are
> LSIs; all of the interrupts relating to PCI-e devices are MSIs.  So do
> we actually have a real high-frequency use case for LSIs in a guest?

The guest should be using a rtl8139 or a e1000 NIC under QEMU/KVM, which 
is not the common scenario. 

> For now I would prefer that you remove all the real-mode hcall
> handlers.  We can add them later if we get performance data showing
> that they are needed.

ok. I will.

> Regarding whether or not to have a given hcall handler in the kernel
> at all - if there is for example an hcall which is just called once
> on guest startup, and its function is just to provide information to
> the guest, and QEMU has that information, then why not have that hcall
> implemented by QEMU?  Are any of the hcalls like that?
> 
> For example, if H_INT_GET_SOURCE_INFO was implemented in QEMU, could
> we then remove the VC_BASE thing from the xive device?

Yes. H_INT_GET_SOURCE_INFO looks like a good candidate, all info should
be in QEMU and there are no OPAL calls, and we would get rid of the 
VC_BASE kvm device ioctl at the same time.

Thanks,

C.


Re: [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls

2019-01-23 Thread Paul Mackerras
On Wed, Jan 23, 2019 at 09:48:31AM +0100, Cédric Le Goater wrote:
> On 1/23/19 7:44 AM, Benjamin Herrenschmidt wrote:
> > On Tue, 2019-01-22 at 16:23 +1100, Paul Mackerras wrote:
> >> Why do we need to provide real-mode versions of these hypercall
> >> handlers?  I thought these hypercalls would only get called
> >> infrequently, and in any case certainly much less frequently than once
> >> per interrupt delivered.  If they are infrequent, then let's leave out
> >> the real-mode version and just handle them in book3s_hv.c.
> > 
> > Agreed with the exception maybe of H_INT_ESB
> 
> ok. 
> 
> Some of these hcalls are really simple and only getting local info from 
> the host (h_int_get_*). I thought handling the hcall ASAP was a preferred 
> practice, even if the hcall is not called frequently. Isn't it ?

If we are going to handle a given hcall in the kernel at all, then we
have to have a virtual mode handler.  If we have a real-mode handler
as well then we in general incur a certain amount of code duplication
with consequent maintenance costs and possibility of bugs.  So we
generally only have real-mode handlers for the hcalls where it is
critical to minimize the latency.  From what Ben is saying that would
only be H_INT_ESB, and maybe not even that.

If H_INT_ESB is only used for LSIs, then is a guest going to be using
it at all?  My understanding was that with XIVE, only a small number
of interrupts that are to do with system management functions are
LSIs; all of the interrupts relating to PCI-e devices are MSIs.  So do
we actually have a real high-frequency use case for LSIs in a guest?

For now I would prefer that you remove all the real-mode hcall
handlers.  We can add them later if we get performance data showing
that they are needed.

Regarding whether or not to have a given hcall handler in the kernel
at all - if there is for example an hcall which is just called once
on guest startup, and its function is just to provide information to
the guest, and QEMU has that information, then why not have that hcall
implemented by QEMU?  Are any of the hcalls like that?

For example, if H_INT_GET_SOURCE_INFO was implemented in QEMU, could
we then remove the VC_BASE thing from the xive device?

Paul.


Re: [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls

2019-01-23 Thread Cédric Le Goater
On 1/23/19 7:44 AM, Benjamin Herrenschmidt wrote:
> On Tue, 2019-01-22 at 16:23 +1100, Paul Mackerras wrote:
>>
>> Which ones of these could be implemented in QEMU?  Are there any that
>> can't possibly be implemented in QEMU because they need to do things
>> that require calling internal interfaces that userspace doesn't have
>> access to?
> 
> Implementing them in qemu doesn't make a lot of sense. Qemu doesn't
> have access to most of the XIVE HW state. There's a XIVE model for full
> emulation but when using the real thing, almost none of it is visible.
> A lot of those hcalls effectively turn into OPAL calls.
> 
>> How often do we expect each of these hypercalls to be called?
> 
> It depends, I can't tell for AIX. For Linux, not often with one
> exception: H_INT_ESB which will be used whenever you EOI an emulated
> LSI.

yes. This one is only doing loads and stores.

>  .../...
> 
>> Why do we need to provide real-mode versions of these hypercall
>> handlers?  I thought these hypercalls would only get called
>> infrequently, and in any case certainly much less frequently than once
>> per interrupt delivered.  If they are infrequent, then let's leave out
>> the real-mode version and just handle them in book3s_hv.c.
> 
> Agreed with the exception maybe of H_INT_ESB

ok. 

Some of these hcalls are really simple and only getting local info from 
the host (h_int_get_*). I thought handling the hcall ASAP was a preferred 
practice, even if the hcall is not called frequently. Isn't it ?

 
>>> @@ -5153,6 +5169,19 @@ static unsigned int default_hcall_list[] = {
>>>H_IPOLL,
>>>H_XIRR,
>>>H_XIRR_X,
>>> +#endif
>>> +#ifdef CONFIG_KVM_XIVE
>>> + H_INT_GET_SOURCE_INFO,
>>> + H_INT_SET_SOURCE_CONFIG,
>>> + H_INT_GET_SOURCE_CONFIG,
>>> + H_INT_GET_QUEUE_INFO,
>>> + H_INT_SET_QUEUE_CONFIG,
>>> + H_INT_GET_QUEUE_CONFIG,
>>> + H_INT_SET_OS_REPORTING_LINE,
>>> + H_INT_GET_OS_REPORTING_LINE,
>>> + H_INT_ESB,
>>> + H_INT_SYNC,
>>> + H_INT_RESET,
>>>   #endif
>>
>> The policy is not to add new hcalls to default_hcall_list[].  Is there
>> a strong reason for adding them here?

I don't remember. I will check for v2.

Thanks, 

C.



Re: [PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls

2019-01-21 Thread Paul Mackerras
On Mon, Jan 07, 2019 at 07:43:23PM +0100, Cédric Le Goater wrote:
> The XIVE native exploitation mode specs define a set of Hypervisor
> calls to configure the sources and the event queues :
> 
>  - H_INT_GET_SOURCE_INFO
> 
>used to obtain the address of the MMIO page of the Event State
>Buffer (PQ bits) entry associated with the source.
> 
>  - H_INT_SET_SOURCE_CONFIG
> 
>assigns a source to a "target".
> 
>  - H_INT_GET_SOURCE_CONFIG
> 
>determines which "target" and "priority" is assigned to a source
> 
>  - H_INT_GET_QUEUE_INFO
> 
>returns the address of the notification management page associated
>with the specified "target" and "priority".
> 
>  - H_INT_SET_QUEUE_CONFIG
> 
>sets or resets the event queue for a given "target" and "priority".
>It is also used to set the notification configuration associated
>with the queue, only unconditional notification is supported for
>the moment. Reset is performed with a queue size of 0 and queueing
>is disabled in that case.
> 
>  - H_INT_GET_QUEUE_CONFIG
> 
>returns the queue settings for a given "target" and "priority".
> 
>  - H_INT_RESET
> 
>resets all of the guest's internal interrupt structures to their
>initial state, losing all configuration set via the hcalls
>H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
> 
>  - H_INT_SYNC
> 
>issue a synchronisation on a source to make sure all notifications
>have reached their queue.

Which ones of these could be implemented in QEMU?  Are there any that
can't possibly be implemented in QEMU because they need to do things
that require calling internal interfaces that userspace doesn't have
access to?

How often do we expect each of these hypercalls to be called?

[snip]

> @@ -682,6 +685,46 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned 
> long cppr);
>  int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr);
>  void kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu);
>  
> +int kvmppc_rm_h_int_get_source_info(struct kvm_vcpu *vcpu,
> + unsigned long flag,
> + unsigned long lisn);
> +int kvmppc_rm_h_int_set_source_config(struct kvm_vcpu *vcpu,
> +   unsigned long flag,
> +   unsigned long lisn,
> +   unsigned long target,
> +   unsigned long priority,
> +   unsigned long eisn);
> +int kvmppc_rm_h_int_get_source_config(struct kvm_vcpu *vcpu,
> +   unsigned long flag,
> +   unsigned long lisn);
> +int kvmppc_rm_h_int_get_queue_info(struct kvm_vcpu *vcpu,
> +unsigned long flag,
> +unsigned long target,
> +unsigned long priority);
> +int kvmppc_rm_h_int_set_queue_config(struct kvm_vcpu *vcpu,
> +  unsigned long flag,
> +  unsigned long target,
> +  unsigned long priority,
> +  unsigned long qpage,
> +  unsigned long qsize);
> +int kvmppc_rm_h_int_get_queue_config(struct kvm_vcpu *vcpu,
> +  unsigned long flag,
> +  unsigned long target,
> +  unsigned long priority);
> +int kvmppc_rm_h_int_set_os_reporting_line(struct kvm_vcpu *vcpu,
> +   unsigned long flag,
> +   unsigned long reportingline);
> +int kvmppc_rm_h_int_get_os_reporting_line(struct kvm_vcpu *vcpu,
> +   unsigned long flag,
> +   unsigned long target,
> +   unsigned long reportingline);
> +int kvmppc_rm_h_int_esb(struct kvm_vcpu *vcpu, unsigned long flag,
> + unsigned long lisn, unsigned long offset,
> + unsigned long data);
> +int kvmppc_rm_h_int_sync(struct kvm_vcpu *vcpu, unsigned long flag,
> +  unsigned long lisn);
> +int kvmppc_rm_h_int_reset(struct kvm_vcpu *vcpu, unsigned long flag);

Why do we need to provide real-mode versions of these hypercall
handlers?  I thought these hypercalls would only get called
infrequently, and in any case certainly much less frequently than once
per interrupt delivered.  If they are infrequent, then let's leave out
the real-mode version and just handle them in book3s_hv.c.

> @@ -5153,6 +5169,19 @@ static unsigned int default_hcall_list[] = {
>   H_IPOLL,
>   H_XIRR,
>   H_XIRR_X,
> +#endif
> +#ifdef CONFIG_KVM_XIVE
> + H_INT_GET_SOURCE_INFO,
> + H_INT_SET_SOURCE_CONFIG,
> + H_INT_GET_SOURCE_CONFIG,
> + H_INT_GET_QUEUE_INFO,
> + 

[PATCH 11/19] KVM: PPC: Book3S HV: add support for the XIVE native exploitation mode hcalls

2019-01-07 Thread Cédric Le Goater
The XIVE native exploitation mode specs define a set of Hypervisor
calls to configure the sources and the event queues :

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (PQ bits) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines which "target" and "priority" is assigned to a source

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification configuration associated
   with the queue, only unconditional notification is supported for
   the moment. Reset is performed with a queue size of 0 and queueing
   is disabled in that case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the guest's internal interrupt structures to their
   initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issue a synchronisation on a source to make sure all notifications
   have reached their queue.

Calls that still need to be addressed :

   H_INT_SET_OS_REPORTING_LINE
   H_INT_GET_OS_REPORTING_LINE

Signed-off-by: Cédric Le Goater 
---
 arch/powerpc/include/asm/kvm_ppc.h|  43 ++
 arch/powerpc/kvm/book3s_xive.h|  54 +++
 arch/powerpc/kvm/book3s_hv.c  |  29 ++
 arch/powerpc/kvm/book3s_hv_builtin.c  | 196 +
 arch/powerpc/kvm/book3s_hv_rm_xive_native.c   |  47 +++
 arch/powerpc/kvm/book3s_xive_native.c | 326 ++-
 .../powerpc/kvm/book3s_xive_native_template.c | 371 ++
 arch/powerpc/kvm/Makefile |   2 +
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   |  52 +++
 9 files changed, 1118 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/kvm/book3s_hv_rm_xive_native.c

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 1bb313f238fe..4cc897039485 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -602,6 +602,7 @@ extern int kvmppc_xive_native_connect_vcpu(struct 
kvm_device *dev,
 extern void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu);
 extern void kvmppc_xive_native_init_module(void);
 extern void kvmppc_xive_native_exit_module(void);
+extern int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd);
 
 #else
 static inline int kvmppc_xive_set_xive(struct kvm *kvm, u32 irq, u32 server,
@@ -634,6 +635,8 @@ static inline int kvmppc_xive_native_connect_vcpu(struct 
kvm_device *dev,
 static inline void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu) { }
 static inline void kvmppc_xive_native_init_module(void) { }
 static inline void kvmppc_xive_native_exit_module(void) { }
+static inline int kvmppc_xive_native_hcall(struct kvm_vcpu *vcpu, u32 cmd)
+   { return 0; }
 
 #endif /* CONFIG_KVM_XIVE */
 
@@ -682,6 +685,46 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long 
cppr);
 int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr);
 void kvmppc_guest_entry_inject_int(struct kvm_vcpu *vcpu);
 
+int kvmppc_rm_h_int_get_source_info(struct kvm_vcpu *vcpu,
+   unsigned long flag,
+   unsigned long lisn);
+int kvmppc_rm_h_int_set_source_config(struct kvm_vcpu *vcpu,
+ unsigned long flag,
+ unsigned long lisn,
+ unsigned long target,
+ unsigned long priority,
+ unsigned long eisn);
+int kvmppc_rm_h_int_get_source_config(struct kvm_vcpu *vcpu,
+ unsigned long flag,
+ unsigned long lisn);
+int kvmppc_rm_h_int_get_queue_info(struct kvm_vcpu *vcpu,
+  unsigned long flag,
+  unsigned long target,
+  unsigned long priority);
+int kvmppc_rm_h_int_set_queue_config(struct kvm_vcpu *vcpu,
+unsigned long flag,
+unsigned long target,
+unsigned long priority,
+unsigned long qpage,
+unsigned long qsize);
+int kvmppc_rm_h_int_get_queue_config(struct kvm_vcpu *vcpu,
+unsigned long flag,
+unsigned long target,
+unsigned long priority);
+int kvmppc_rm_h_int_set_os_reporting_line(struct kvm_vcpu *vcpu,
+