Re: [PATCH v2 -next] powerpc: kernel/time.c - cleanup warnings

2021-03-23 Thread Christophe Leroy




On 24/03/2021 at 07:14, Christophe Leroy wrote:



On 24/03/2021 at 00:05, Alexandre Belloni wrote:

On 23/03/2021 23:18:17+0100, Alexandre Belloni wrote:

Hello,

On 23/03/2021 05:12:57-0400, He Ying wrote:

We found these warnings in arch/powerpc/kernel/time.c as follows:
warning: symbol 'decrementer_max' was not declared. Should it be static?
warning: symbol 'rtc_lock' was not declared. Should it be static?
warning: symbol 'dtl_consumer' was not declared. Should it be static?

Declare 'decrementer_max' and 'rtc_lock' in powerpc asm/time.h.
Rename 'rtc_lock' in drivers/rtc/rtc-vr41xx.c to 'vr41xx_rtc_lock' to
avoid the conflict with the variable in powerpc asm/time.h.
Move 'dtl_consumer' definition behind "include " because it
is declared there.

Reported-by: Hulk Robot 
Signed-off-by: He Ying 
---
v2:
- Instead of including linux/mc146818rtc.h in powerpc kernel/time.c, declare
   rtc_lock in powerpc asm/time.h.



V1 was actually the correct thing to do. rtc_lock is there exactly
because chrp and maple are using mc146818 compatible RTCs. This is then
useful because then drivers/char/nvram.c is enabled. The proper fix
would be to scrap all of that and use rtc-cmos for those platforms as
this drives the RTC properly and exposes the NVRAM for the mc146818.
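
(As a rough illustration of that rtc-cmos route, the platform code would
register an "rtc_cmos" platform device and let the generic driver handle
the RTC and NVRAM. This is a sketch only: the 0x70/0x71 port range and
IRQ 8 are the usual mc146818 defaults and are assumed here, not taken
from the chrp/maple code.)

#include <linux/err.h>
#include <linux/init.h>
#include <linux/ioport.h>
#include <linux/kernel.h>
#include <linux/platform_device.h>

/* Sketch: hand the mc146818-compatible RTC over to drivers/rtc/rtc-cmos.c */
static struct resource cmos_rtc_resources[] = {
	{ .start = 0x70, .end = 0x71, .flags = IORESOURCE_IO },  /* index/data ports (assumed) */
	{ .start = 8,    .end = 8,    .flags = IORESOURCE_IRQ }, /* RTC interrupt (assumed) */
};

static int __init add_cmos_rtc(void)
{
	struct platform_device *pdev;

	pdev = platform_device_register_simple("rtc_cmos", -1,
					       cmos_rtc_resources,
					       ARRAY_SIZE(cmos_rtc_resources));
	return PTR_ERR_OR_ZERO(pdev);
}
device_initcall(add_cmos_rtc);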

Or at least, if there are no users for the char/nvram driver on those
two platforms, remove the spinlock and stop enabling CONFIG_NVRAM or
more likely rename the symbol as it seems to be abused by both chrp and
powermac.



Ok so rtc_lock is not even used by the char/nvram.c driver as it is
completely compiled out.

I guess it is fine having it move to the individual platform as looking
very quickly at the Kconfig, it is not possible to select both
simultaneously. Tentative patch:



Looking at it once more, it looks like including linux/mc146818rtc.h is the thing to do, at least 
for now. Several platforms are defining the rtc_lock exactly the same way as powerpc does, and 
including mc146818rtc.h
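
(For reference, that v1-style fix amounts to roughly the sketch below:
linux/mc146818rtc.h already carries the extern declaration, so time.c only
needs to include it next to its existing definition. Illustration only, not
the exact v1 diff.)

/* include/linux/mc146818rtc.h (mainline) declares: */
extern spinlock_t rtc_lock;		/* serialises CMOS RTC / NVRAM access */

/* arch/powerpc/kernel/time.c then keeps its definition and adds the include: */
#include <linux/mc146818rtc.h>

DEFINE_SPINLOCK(rtc_lock);
EXPORT_SYMBOL_GPL(rtc_lock);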


I think that to get it clean, this change should go in a dedicated patch and do a bit more and
explain exactly what is being done and why. I'll try to draft something for it.


He Y., can you make a version v3 of your patch excluding the rtc_lock change ?



Finally, I think there are not enough changes to justify a separate patch.

So you can send a V3 based on your V1. In addition to the changes you had in V1, please remove the 
declaration of rtc_lock in arch/powerpc/platforms/chrp/chrp.h


Christophe


Re: [PATCH 02/10] ARM: disable CONFIG_IDE in footbridge_defconfig

2021-03-23 Thread Cye Borg
Sure, here it is:
snow / # lspci -vxxx -s 7.0
00:07.0 ISA bridge: Contaq Microsystems 82c693
Flags: bus master, medium devsel, latency 0
Kernel modules: pata_cypress
00: 80 10 93 c6 47 00 80 02 00 00 01 06 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 03 02 00 00 26 60 00 01 f0 60 00 80 80 71 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Best regards,
Barnabas

PS: let me know if there is anything else I can do.

On Tue, Mar 23, 2021 at 7:43 PM Russell King - ARM Linux admin
 wrote:
>
> On Mon, Mar 22, 2021 at 06:10:01PM +0100, Cye Borg wrote:
> > PWS 500au:
> >
> > snow / # lspci -vvx -s 7.1
> > 00:07.1 IDE interface: Contaq Microsystems 82c693 (prog-if 80 [ISA
> > Compatibility mode-only controller, supports bus mastering])
> > Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > ParErr+ Stepping- SERR- FastB2B- DisINTx-
> > Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
> > >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > Latency: 0
> > Interrupt: pin A routed to IRQ 0
> > Region 0: I/O ports at 01f0 [size=8]
> > Region 1: I/O ports at 03f4
> > Region 4: I/O ports at 9080 [size=16]
> > Kernel driver in use: pata_cypress
> > Kernel modules: pata_cypress
> > 00: 80 10 93 c6 45 00 80 02 00 80 01 01 00 00 80 00
> > 10: f1 01 00 00 f5 03 00 00 00 00 00 00 00 00 00 00
> > 20: 81 90 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00
> >
> > snow / # lspci -vvx -s 7.2
> > 00:07.2 IDE interface: Contaq Microsystems 82c693 (prog-if 00 [ISA
> > Compatibility mode-only controller])
> > Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > ParErr+ Stepping- SERR- FastB2B- DisINTx-
> > Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
> > >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > Latency: 0
> > Interrupt: pin B routed to IRQ 0
> > Region 0: I/O ports at 0170 [size=8]
> > Region 1: I/O ports at 0374
> > Region 4: Memory at 0c24 (32-bit, non-prefetchable)
> > [disabled] [size=64K]
> > Kernel modules: pata_cypress
> > 00: 80 10 93 c6 45 00 80 02 00 00 01 01 00 00 80 00
> > 10: 71 01 00 00 75 03 00 00 00 00 00 00 00 00 00 00
> > 20: 00 00 24 0c 00 00 00 00 00 00 00 00 00 00 00 00
> > 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00
>
> Thanks very much.
>
> Could I also ask for the output of:
>
> # lspci -vxxx -s 7.0
>
> as well please - this will dump all 256 bytes for the ISA bridge, which
> contains a bunch of configuration registers. Thanks.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH V2 1/5] powerpc/perf: Expose processor pipeline stage cycles using PERF_SAMPLE_WEIGHT_STRUCT

2021-03-23 Thread Madhavan Srinivasan



On 3/22/21 8:27 PM, Athira Rajeev wrote:

Performance Monitoring Unit (PMU) registers in powerpc provides
information on cycles elapsed between different stages in the
pipeline. This can be used for application tuning. On ISA v3.1
platform, this information is exposed by sampling registers.
Patch adds kernel support to capture two of the cycle counters
as part of perf sample using the sample type:
PERF_SAMPLE_WEIGHT_STRUCT.

The power PMU function 'get_mem_weight' currently uses 64 bit weight
field of perf_sample_data to capture memory latency. But following the
introduction of PERF_SAMPLE_WEIGHT_TYPE, weight field could contain
64-bit or 32-bit value depending on the architecture support for
PERF_SAMPLE_WEIGHT_STRUCT. Patch uses WEIGHT_STRUCT to expose the
pipeline stage cycles info. Hence update the ppmu functions to work for
64-bit and 32-bit weight values.

If the sample type is PERF_SAMPLE_WEIGHT, use the 64-bit weight field.
If the sample type is PERF_SAMPLE_WEIGHT_STRUCT, memory subsystem
latency is stored in the low 32 bits of the perf_sample_weight structure.
Also, for CPU_FTR_ARCH_31, capture the two cycle counter values in
two 16-bit fields of the perf_sample_weight structure.
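
(For reference, the perf_sample_weight union being filled here looks roughly
like the sketch below — little-endian layout only, paraphrased from the perf
uapi header rather than copied verbatim.)

union perf_sample_weight {
	__u64	full;			/* used with PERF_SAMPLE_WEIGHT */
	struct {			/* used with PERF_SAMPLE_WEIGHT_STRUCT */
		__u32	var1_dw;	/* low 32 bits: memory subsystem latency */
		__u16	var2_w;		/* 16 bits: e.g. pipeline stage cycles (ISA v3.1) */
		__u16	var3_w;		/* 16 bits: e.g. another stage's cycles (ISA v3.1) */
	};
};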


Changes look fine to me.

Reviewed-by: Madhavan Srinivasan 



Signed-off-by: Athira Rajeev 
---
  arch/powerpc/include/asm/perf_event_server.h |  2 +-
  arch/powerpc/perf/core-book3s.c  |  4 ++--
  arch/powerpc/perf/isa207-common.c| 29 +---
  arch/powerpc/perf/isa207-common.h|  6 +-
  4 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/perf_event_server.h 
b/arch/powerpc/include/asm/perf_event_server.h
index 00e7e671bb4b..112cf092d7b3 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -43,7 +43,7 @@ struct power_pmu {
u64 alt[]);
void(*get_mem_data_src)(union perf_mem_data_src *dsrc,
u32 flags, struct pt_regs *regs);
-   void(*get_mem_weight)(u64 *weight);
+   void(*get_mem_weight)(u64 *weight, u64 type);
unsigned long   group_constraint_mask;
unsigned long   group_constraint_val;
u64 (*bhrb_filter_map)(u64 branch_sample_type);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 766f064f00fb..6936763246bd 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2206,9 +2206,9 @@ static void record_and_restart(struct perf_event *event, 
unsigned long val,
ppmu->get_mem_data_src)
ppmu->get_mem_data_src(&data.data_src, ppmu->flags, 
regs);
  
-		if (event->attr.sample_type & PERF_SAMPLE_WEIGHT &&

+   if (event->attr.sample_type & PERF_SAMPLE_WEIGHT_TYPE &&
ppmu->get_mem_weight)
-   ppmu->get_mem_weight(&data.weight.full);
+   ppmu->get_mem_weight(&data.weight.full, 
event->attr.sample_type);
  
  		if (perf_event_overflow(event, &data, regs))

power_pmu_stop(event, 0);
diff --git a/arch/powerpc/perf/isa207-common.c 
b/arch/powerpc/perf/isa207-common.c
index e4f577da33d8..5dcbdbd54598 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -284,8 +284,10 @@ void isa207_get_mem_data_src(union perf_mem_data_src 
*dsrc, u32 flags,
}
  }
  
-void isa207_get_mem_weight(u64 *weight)

+void isa207_get_mem_weight(u64 *weight, u64 type)
  {
+   union perf_sample_weight *weight_fields;
+   u64 weight_lat;
u64 mmcra = mfspr(SPRN_MMCRA);
u64 exp = MMCRA_THR_CTR_EXP(mmcra);
u64 mantissa = MMCRA_THR_CTR_MANT(mmcra);
@@ -296,9 +298,30 @@ void isa207_get_mem_weight(u64 *weight)
mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
  
  	if (val == 0 || val == 7)

-   *weight = 0;
+   weight_lat = 0;
else
-   *weight = mantissa << (2 * exp);
+   weight_lat = mantissa << (2 * exp);
+
+   /*
+* Use 64 bit weight field (full) if sample type is
+* WEIGHT.
+*
+* if sample type is WEIGHT_STRUCT:
+* - store memory latency in the lower 32 bits.
+* - For ISA v3.1, use remaining two 16 bit fields of
+*   perf_sample_weight to store cycle counter values
+*   from sier2.
+*/
+   weight_fields = (union perf_sample_weight *)weight;
+   if (type & PERF_SAMPLE_WEIGHT)
+   weight_fields->full = weight_lat;
+   else {
+   weight_fields->var1_dw = (u32)weight_lat;
+   if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+   weight_fields->var2_w = 
P10_SIER2_FINISH_CYC(mfspr(SPRN_SIER2));
+   weight_fields->var3_w =

Re: [PATCH v2 -next] powerpc: kernel/time.c - cleanup warnings

2021-03-23 Thread heying (H)

Dear,


On 2021/3/24 6:18, Alexandre Belloni wrote:

Hello,

On 23/03/2021 05:12:57-0400, He Ying wrote:

We found these warnings in arch/powerpc/kernel/time.c as follows:
warning: symbol 'decrementer_max' was not declared. Should it be static?
warning: symbol 'rtc_lock' was not declared. Should it be static?
warning: symbol 'dtl_consumer' was not declared. Should it be static?

Declare 'decrementer_max' and 'rtc_lock' in powerpc asm/time.h.
Rename 'rtc_lock' in drivers/rtc/rtc-vr41xx.c to 'vr41xx_rtc_lock' to
avoid the conflict with the variable in powerpc asm/time.h.
Move 'dtl_consumer' definition behind "include " because it
is declared there.

Reported-by: Hulk Robot 
Signed-off-by: He Ying 
---
v2:
- Instead of including linux/mc146818rtc.h in powerpc kernel/time.c, declare
   rtc_lock in powerpc asm/time.h.


V1 was actually the correct thing to do. rtc_lock is there exactly
because chrp and maple are using mc146818 compatible RTCs. This is then
useful because then drivers/char/nvram.c is enabled. The proper fix
would be to scrap all of that and use rtc-cmos for those platforms as
this drives the RTC properly and exposes the NVRAM for the mc146818.


Do you mean that 'rtc_lock' declared in linux/mc146818rtc.h points to the
same thing as that defined in powerpc kernel/time.c? And you think V1
was correct? Oh, I should have added you to my patch V1 senders :)



Or at least, if there are no users for the char/nvram driver on those
two platforms, remove the spinlock and stop enabling CONFIG_NVRAM or
more likely rename the symbol as it seems to be abused by both chrp and
powermac.

I'm not completely against the rename in vr41xx but the fix for the
warnings can and should be contained in arch/powerpc.


Yes, I agree with you. But I have no choice because there is a compilation
error.


Maybe there's a better way.

So, what about my patch V1? Should I resend it and add you to senders?


Thanks.



Re: [PATCH v4 44/46] KVM: PPC: Book3S HV P9: implement hash guest support

2021-03-23 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of March 24, 2021 1:53 am:
> Nicholas Piggin  writes:
> 
>> Guest entry/exit has to restore and save/clear the SLB, plus several
>> other bits to accommodate hash guests in the P9 path.
>>
>> Radix host, hash guest support is removed from the P7/8 path.
>>
>> Signed-off-by: Nicholas Piggin 
>> ---
> 
> 
> 
>> diff --git a/arch/powerpc/kvm/book3s_hv_interrupt.c 
>> b/arch/powerpc/kvm/book3s_hv_interrupt.c
>> index cd84d2c37632..03fbfef708a8 100644
>> --- a/arch/powerpc/kvm/book3s_hv_interrupt.c
>> +++ b/arch/powerpc/kvm/book3s_hv_interrupt.c
>> @@ -55,6 +55,50 @@ static void __accumulate_time(struct kvm_vcpu *vcpu, 
>> struct kvmhv_tb_accumulator
>>  #define accumulate_time(vcpu, next) do {} while (0)
>>  #endif
>>
>> +static inline void mfslb(unsigned int idx, u64 *slbee, u64 *slbev)
>> +{
>> +asm volatile("slbmfev  %0,%1" : "=r" (*slbev) : "r" (idx));
>> +asm volatile("slbmfee  %0,%1" : "=r" (*slbee) : "r" (idx));
>> +}
>> +
>> +static inline void __mtslb(u64 slbee, u64 slbev)
>> +{
>> +asm volatile("slbmte %0,%1" :: "r" (slbev), "r" (slbee));
>> +}
>> +
>> +static inline void mtslb(unsigned int idx, u64 slbee, u64 slbev)
>> +{
>> +BUG_ON((slbee & 0xfff) != idx);
>> +
>> +__mtslb(slbee, slbev);
>> +}
>> +
>> +static inline void slb_invalidate(unsigned int ih)
>> +{
>> +asm volatile("slbia %0" :: "i"(ih));
>> +}
> 
> Fyi, in my environment the assembler complains:
> 
> {standard input}: Assembler messages:
> {standard input}:1293: Error: junk at end of line: `6'
>  
> {standard input}:2138: Error: junk at end of line: `6'
> make[3]: *** [../scripts/Makefile.build:271:
> arch/powerpc/kvm/book3s_hv_interrupt.o] Error 1
> 
> This works:
> 
> -   asm volatile("slbia %0" :: "i"(ih));
> +   asm volatile(PPC_SLBIA(%0) :: "i"(ih));
> 
> But I don't know what is going on.

Ah yes, we still need to use PPC_SLBIA. IH parameter to slbia was only 
added in binutils 2.27 and we support down to 2.23.

Thanks for the fix I'll add it.

Thanks,
Nick


Re: [PATCH v4 22/46] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

2021-03-23 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of March 24, 2021 8:57 am:
> Nicholas Piggin  writes:
> 
>> In the interest of minimising the amount of code that is run in
>> "real-mode", don't handle hcalls in real mode in the P9 path.
>>
>> POWER8 and earlier are much more expensive to exit from HV real mode
>> and switch to host mode, because on those processors HV interrupts get
>> to the hypervisor with the MMU off, and the other threads in the core
>> need to be pulled out of the guest, and SLBs all need to be saved,
>> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
>> in host mode. Hash guests also require a lot of hcalls to run. The
>> XICS interrupt controller requires hcalls to run.
>>
>> By contrast, POWER9 has independent thread switching, and in radix mode
>> the hypervisor is already in a host virtual memory mode when the HV
>> interrupt is taken. Radix + xive guests don't need hcalls to handle
>> interrupts or manage translations.
>>
>> So it's much less important to handle hcalls in real mode in P9.
>>
>> Signed-off-by: Nicholas Piggin 
>> ---
> 
> 
> 
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index fa7614c37e08..17739aaee3d8 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -1142,12 +1142,13 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>>  }
>>
>>  /*
>> - * Handle H_CEDE in the nested virtualization case where we haven't
>> - * called the real-mode hcall handlers in book3s_hv_rmhandlers.S.
>> + * Handle H_CEDE in the P9 path where we don't call the real-mode hcall
>> + * handlers in book3s_hv_rmhandlers.S.
>> + *
>>   * This has to be done early, not in kvmppc_pseries_do_hcall(), so
>>   * that the cede logic in kvmppc_run_single_vcpu() works properly.
>>   */
>> -static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
>> +static void kvmppc_cede(struct kvm_vcpu *vcpu)
>>  {
>>  vcpu->arch.shregs.msr |= MSR_EE;
>>  vcpu->arch.ceded = 1;
>> @@ -1403,9 +1404,15 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu 
>> *vcpu,
>>  /* hcall - punt to userspace */
>>  int i;
>>
>> -/* hypercall with MSR_PR has already been handled in rmode,
>> - * and never reaches here.
>> - */
>> +if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
>> +/*
>> + * Guest userspace executed sc 1, reflect it back as a
>> + * privileged program check interrupt.
>> + */
>> +kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
>> +r = RESUME_GUEST;
>> +break;
>> +}
> 
> This patch bypasses sc_1_fast_return so it breaks KVM-PR. L1 loops with
> the following output:
> 
> [9.503929][ T3443] Couldn't emulate instruction 0x4e800020 (op 19 xop 16)
> [9.503990][ T3443] kvmppc_exit_pr_progint: emulation at 48f4 failed 
> (4e800020)
> [9.504080][ T3443] Couldn't emulate instruction 0x4e800020 (op 19 xop 16)
> [9.504170][ T3443] kvmppc_exit_pr_progint: emulation at 48f4 failed 
> (4e800020)
> 
> 0x4e800020 is a blr after a sc 1 in SLOF.
> 
> For KVM-PR we need to inject a 0xc00 at some point, either here or
> before branching to no_try_real in book3s_hv_rmhandlers.S.

Ah, I didn't know about that PR KVM (I suppose I should test it but I 
haven't been able to get it running in the past).

Should be able to deal with that. This patch probably shouldn't change 
the syscall behaviour like this anyway.

Thanks,
Nick


Re: [PATCH v4 22/46] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

2021-03-23 Thread Nicholas Piggin
Excerpts from Fabiano Rosas's message of March 24, 2021 4:03 am:
> Nicholas Piggin  writes:
> 
>> In the interest of minimising the amount of code that is run in
>> "real-mode", don't handle hcalls in real mode in the P9 path.
>>
>> POWER8 and earlier are much more expensive to exit from HV real mode
>> and switch to host mode, because on those processors HV interrupts get
>> to the hypervisor with the MMU off, and the other threads in the core
>> need to be pulled out of the guest, and SLBs all need to be saved,
>> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
>> in host mode. Hash guests also require a lot of hcalls to run. The
>> XICS interrupt controller requires hcalls to run.
>>
>> By contrast, POWER9 has independent thread switching, and in radix mode
>> the hypervisor is already in a host virtual memory mode when the HV
>> interrupt is taken. Radix + xive guests don't need hcalls to handle
>> interrupts or manage translations.
>>
>> So it's much less important to handle hcalls in real mode in P9.
>>
>> Signed-off-by: Nicholas Piggin 
> 
> I tried this again in the L2 with xive=off and it works as expected now.
> 
> Tested-by: Fabiano Rosas 

Oh good, thanks for spotting the problem and re-testing.

Thanks,
Nick



Re: [PATCH v4 22/46] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

2021-03-23 Thread Nicholas Piggin
Excerpts from Cédric Le Goater's message of March 23, 2021 11:23 pm:
> On 3/23/21 2:02 AM, Nicholas Piggin wrote:
>> In the interest of minimising the amount of code that is run in
>> "real-mode", don't handle hcalls in real mode in the P9 path.
>> 
>> POWER8 and earlier are much more expensive to exit from HV real mode
>> and switch to host mode, because on those processors HV interrupts get
>> to the hypervisor with the MMU off, and the other threads in the core
>> need to be pulled out of the guest, and SLBs all need to be saved,
>> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
>> in host mode. Hash guests also require a lot of hcalls to run. The
>> XICS interrupt controller requires hcalls to run.
>> 
>> By contrast, POWER9 has independent thread switching, and in radix mode
>> the hypervisor is already in a host virtual memory mode when the HV
>> interrupt is taken. Radix + xive guests don't need hcalls to handle
>> interrupts or manage translations.
>> 
>> So it's much less important to handle hcalls in real mode in P9.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>  arch/powerpc/include/asm/kvm_ppc.h  |  5 ++
>>  arch/powerpc/kvm/book3s_hv.c| 57 
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  5 ++
>>  arch/powerpc/kvm/book3s_xive.c  | 70 +
>>  4 files changed, 127 insertions(+), 10 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
>> b/arch/powerpc/include/asm/kvm_ppc.h
>> index 73b1ca5a6471..db6646c2ade2 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -607,6 +607,7 @@ extern void kvmppc_free_pimap(struct kvm *kvm);
>>  extern int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall);
>>  extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
>>  extern int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd);
>> +extern int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req);
>>  extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
>>  extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
>>  extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
>> @@ -639,6 +640,8 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu 
>> *vcpu)
>>  static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
>>  static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
>>  { return 0; }
>> +static inline int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
>> +{ return 0; }
>>  #endif
>>  
>>  #ifdef CONFIG_KVM_XIVE
>> @@ -673,6 +676,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int 
>> irq_source_id, u32 irq,
>> int level, bool line_status);
>>  extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
>>  extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
>> +extern void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu);
>>  
>>  static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>>  {
>> @@ -714,6 +718,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, 
>> int irq_source_id, u32 ir
>>int level, bool line_status) { return 
>> -ENODEV; }
>>  static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
>>  static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
>> +static inline void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu) { }
>>  
>>  static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>>  { return 0; }
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index fa7614c37e08..17739aaee3d8 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -1142,12 +1142,13 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>>  }
>>  
>>  /*
>> - * Handle H_CEDE in the nested virtualization case where we haven't
>> - * called the real-mode hcall handlers in book3s_hv_rmhandlers.S.
>> + * Handle H_CEDE in the P9 path where we don't call the real-mode hcall
>> + * handlers in book3s_hv_rmhandlers.S.
>> + *
>>   * This has to be done early, not in kvmppc_pseries_do_hcall(), so
>>   * that the cede logic in kvmppc_run_single_vcpu() works properly.
>>   */
>> -static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
>> +static void kvmppc_cede(struct kvm_vcpu *vcpu)
>>  {
>>  vcpu->arch.shregs.msr |= MSR_EE;
>>  vcpu->arch.ceded = 1;
>> @@ -1403,9 +1404,15 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu 
>> *vcpu,
>>  /* hcall - punt to userspace */
>>  int i;
>>  
>> -/* hypercall with MSR_PR has already been handled in rmode,
>> - * and never reaches here.
>> - */
>> +if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
>> +/*
>> + * Guest userspace executed sc 1, reflect it back as a
>> + * privileged program check interrupt.
>> + */
>> +kvmppc_co

[powerpc:next-test] BUILD SUCCESS 8a83feefbd5254ae7f13aff3e4097dd7d8723bce

2021-03-23 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
branch HEAD: 8a83feefbd5254ae7f13aff3e4097dd7d8723bce  cxl: Fix couple of 
spellings

elapsed time: 725m

configs tested: 109
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
x86_64   allyesconfig
riscvallmodconfig
i386 allyesconfig
riscvallyesconfig
h8300allyesconfig
arc nsimosci_hs_smp_defconfig
powerpc  ppc40x_defconfig
powerpc  makalu_defconfig
m68k   m5208evb_defconfig
mips cu1000-neo_defconfig
powerpc ksi8560_defconfig
armmps2_defconfig
powerpc  walnut_defconfig
arm rpc_defconfig
mipsjmr3927_defconfig
arm am200epdkit_defconfig
powerpc   currituck_defconfig
sh sh7710voipgw_defconfig
arcvdk_hs38_defconfig
mips  bmips_stb_defconfig
ia64generic_defconfig
arcnsim_700_defconfig
arm  pxa910_defconfig
xtensa  nommu_kc705_defconfig
powerpc mpc8272_ads_defconfig
powerpc linkstation_defconfig
powerpc rainier_defconfig
mipsmaltaup_defconfig
arm  pxa168_defconfig
arm  collie_defconfig
arm pxa_defconfig
powerpc tqm8555_defconfig
powerpc   eiger_defconfig
arm   aspeed_g5_defconfig
powerpc pseries_defconfig
arm  pxa255-idp_defconfig
arm  exynos_defconfig
h8300alldefconfig
sh   se7780_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
parisc   allyesconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
i386   tinyconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a006-20210323
x86_64   randconfig-a004-20210323
x86_64   randconfig-a005-20210323
i386 randconfig-a003-20210323
i386 randconfig-a004-20210323
i386 randconfig-a001-20210323
i386 randconfig-a002-20210323
i386 randconfig-a006-20210323
i386 randconfig-a005-20210323
i386 randconfig-a014-20210323
i386 randconfig-a011-20210323
i386 randconfig-a015-20210323
i386 randconfig-a016-20210323
i386 randconfig-a012-20210323
i386 randconfig-a013-20210323
x86_64   randconfig-a002-20210323
x86_64   randconfig-a003-20210323
x86_64   randconfig-a001-20210323
riscvnommu_k210_defconfig
riscvnommu_virt_defconfig
riscv allnoconfig
riscv   defconfig
riscv  rv32_defconfig
x86_64rhel-7.6-kselftests
x86_64  defconfig
x86_64   rhel

[powerpc:merge] BUILD SUCCESS 909b15d4ac3524a89c6df8c60e0cb0b4d5a3c248

2021-03-23 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
merge
branch HEAD: 909b15d4ac3524a89c6df8c60e0cb0b4d5a3c248  Automatic merge of 
'fixes' into merge (2021-03-23 22:53)

elapsed time: 725m

configs tested: 127
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
x86_64   allyesconfig
i386 allyesconfig
powerpc   maple_defconfig
powerpc   lite5200b_defconfig
sparcalldefconfig
mipsmaltaup_xpa_defconfig
powerpc mpc836x_rdk_defconfig
armlart_defconfig
m68k   m5208evb_defconfig
mips cu1000-neo_defconfig
powerpc ksi8560_defconfig
armmps2_defconfig
powerpc  tqm8xx_defconfig
m68k alldefconfig
powerpc  mgcoge_defconfig
sh   se7751_defconfig
mipsomega2p_defconfig
powerpc  ppc64e_defconfig
powerpc  walnut_defconfig
arm rpc_defconfig
mipsjmr3927_defconfig
arm am200epdkit_defconfig
powerpc   currituck_defconfig
sh sh7710voipgw_defconfig
powerpcsocrates_defconfig
nds32 allnoconfig
arm   imx_v6_v7_defconfig
armneponset_defconfig
shhp6xx_defconfig
arm orion5x_defconfig
mipsmalta_qemu_32r6_defconfig
mips   capcella_defconfig
arm lubbock_defconfig
sh   alldefconfig
powerpc ep8248e_defconfig
powerpc tqm8540_defconfig
arm  integrator_defconfig
riscv  rv32_defconfig
powerpc  mpc866_ads_defconfig
arm   mainstone_defconfig
sh sh03_defconfig
m68k  multi_defconfig
arm pxa_defconfig
powerpc tqm8555_defconfig
powerpc   eiger_defconfig
arm  pxa168_defconfig
mips cu1830-neo_defconfig
powerpc  obs600_defconfig
powerpc64   defconfig
mips  ath25_defconfig
arm   aspeed_g5_defconfig
powerpc pseries_defconfig
arm  pxa255-idp_defconfig
arm  exynos_defconfig
h8300alldefconfig
sh   se7780_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
m68k allyesconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
parisc   allyesconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
i386   tinyconfig
i386defconfig
nios2   defconfig
arc  allyesconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a002-20210323
x86_64   randconfig-a003-20210323
x86_64   randconfig-a006-20210323
x86_64   randconfig-a001-20210323
x86_64   randconfig-a004-20210323
x86_64   randconfig-a005-20210323
i386 randconfig-a003-20210323
i386 randconfig-a004-202

[powerpc:fixes-test] BUILD SUCCESS 274cb1ca2e7ce02cab56f5f4c61a74aeb566f931

2021-03-23 Thread kernel test robot
  obs600_defconfig
m68kmvme16x_defconfig
nios2 3c120_defconfig
sh  landisk_defconfig
sh   secureedge5410_defconfig
arm  integrator_defconfig
powerpc mpc836x_mds_defconfig
powerpc mpc8272_ads_defconfig
powerpc linkstation_defconfig
powerpc rainier_defconfig
mipsmaltaup_defconfig
arm  pxa168_defconfig
arm  collie_defconfig
arm lpc18xx_defconfig
shecovec24-romimage_defconfig
mips   rs90_defconfig
shsh7785lcr_defconfig
sh   se7721_defconfig
arm davinci_all_defconfig
powerpc  ppc6xx_defconfig
powerpc mpc834x_mds_defconfig
sh  rsk7201_defconfig
powerpc tqm8541_defconfig
powerpc mpc834x_itx_defconfig
sh  rsk7203_defconfig
mips loongson1b_defconfig
arm pxa_defconfig
powerpc tqm8555_defconfig
powerpc   eiger_defconfig
mips cu1830-neo_defconfig
powerpc64   defconfig
mips  ath25_defconfig
arm axm55xx_defconfig
arc nsimosci_hs_smp_defconfig
powerpc asp8347_defconfig
archsdk_defconfig
ia64 allmodconfig
ia64defconfig
ia64 allyesconfig
m68k allmodconfig
m68kdefconfig
nios2   defconfig
arc  allyesconfig
nds32   defconfig
nios2allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
parisc   allyesconfig
s390defconfig
sparcallyesconfig
sparc   defconfig
i386   tinyconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a002-20210323
x86_64   randconfig-a003-20210323
x86_64   randconfig-a006-20210323
x86_64   randconfig-a001-20210323
x86_64   randconfig-a004-20210323
x86_64   randconfig-a005-20210323
i386 randconfig-a003-20210323
i386 randconfig-a001-20210323
i386 randconfig-a002-20210323
i386 randconfig-a004-20210323
i386 randconfig-a006-20210323
i386 randconfig-a005-20210323
i386 randconfig-a004-20210324
i386 randconfig-a003-20210324
i386 randconfig-a001-20210324
i386 randconfig-a002-20210324
i386 randconfig-a006-20210324
i386 randconfig-a005-20210324
i386 randconfig-a015-20210323
i386 randconfig-a016-20210323
i386 randconfig-a014-20210323
i386 randconfig-a011-20210323
i386 randconfig-a012-20210323
i386 randconfig-a013-20210323
riscvnommu_virt_defconfig
riscv  rv32_defconfig
riscvnommu_k210_defconfig
riscv allnoconfig
riscv   defconfig
x86_64rhel-7.6-kselftests
x86_64  defconfig
x86_64   rhel-8.3
x86_64  rhel-8.3-kbuiltin
x86_64  kexec

clang tested configs:
x86_64   randconfig-a012-20210323
x86_64   randconfig-a015-20210323
x86_64   randconfig-a013-20210323
x86_64   randconfig-a014-20210323
x86_64   randconfig-a011-20210323
x86_64   randconfig-a016-20210323

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01

Re: [PATCH v2 -next] powerpc: kernel/time.c - cleanup warnings

2021-03-23 Thread Alexandre Belloni
On 23/03/2021 23:18:17+0100, Alexandre Belloni wrote:
> Hello,
> 
> On 23/03/2021 05:12:57-0400, He Ying wrote:
> > We found these warnings in arch/powerpc/kernel/time.c as follows:
> > warning: symbol 'decrementer_max' was not declared. Should it be static?
> > warning: symbol 'rtc_lock' was not declared. Should it be static?
> > warning: symbol 'dtl_consumer' was not declared. Should it be static?
> > 
> > Declare 'decrementer_max' and 'rtc_lock' in powerpc asm/time.h.
> > Rename 'rtc_lock' in drivers/rtc/rtc-vr41xx.c to 'vr41xx_rtc_lock' to
> > avoid the conflict with the variable in powerpc asm/time.h.
> > Move 'dtl_consumer' definition behind "include " because it
> > is declared there.
> > 
> > Reported-by: Hulk Robot 
> > Signed-off-by: He Ying 
> > ---
> > v2:
> > - Instead of including linux/mc146818rtc.h in powerpc kernel/time.c, declare
> >   rtc_lock in powerpc asm/time.h.
> > 
> 
> V1 was actually the correct thing to do. rtc_lock is there exactly
> because chrp and maple are using mc146818 compatible RTCs. This is then
> useful because then drivers/char/nvram.c is enabled. The proper fix
> would be to scrap all of that and use rtc-cmos for those platforms as
> this drives the RTC properly and exposes the NVRAM for the mc146818.
> 
> Or at least, if there are no users for the char/nvram driver on those
> two platforms, remove the spinlock and stop enabling CONFIG_NVRAM or
> more likely rename the symbol as it seems to be abused by both chrp and
> powermac.
> 

Ok so rtc_lock is not even used by the char/nvram.c driver as it is
completely compiled out.

I guess it is fine having it move to the individual platform as looking
very quickly at the Kconfig, it is not possible to select both
simultaneously. Tentative patch:

8<-
From dfa59b6f44fdfdefafffa7666aec89e62bbd5c80 Mon Sep 17 00:00:00 2001
From: Alexandre Belloni 
Date: Wed, 24 Mar 2021 00:00:03 +0100
Subject: [PATCH] powerpc: move rtc_lock to specific platforms

Signed-off-by: Alexandre Belloni 
---
 arch/powerpc/kernel/time.c  | 3 ---
 arch/powerpc/platforms/chrp/time.c  | 2 +-
 arch/powerpc/platforms/maple/time.c | 2 ++
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 67feb3524460..d3bb189ea7f4 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -123,9 +123,6 @@ EXPORT_SYMBOL(tb_ticks_per_usec);
 unsigned long tb_ticks_per_sec;
 EXPORT_SYMBOL(tb_ticks_per_sec);   /* for cputime_t conversions */
 
-DEFINE_SPINLOCK(rtc_lock);
-EXPORT_SYMBOL_GPL(rtc_lock);
-
 static u64 tb_to_ns_scale __read_mostly;
 static unsigned tb_to_ns_shift __read_mostly;
 static u64 boot_tb __read_mostly;
diff --git a/arch/powerpc/platforms/chrp/time.c 
b/arch/powerpc/platforms/chrp/time.c
index acde7bbe0716..ea90c15f5edd 100644
--- a/arch/powerpc/platforms/chrp/time.c
+++ b/arch/powerpc/platforms/chrp/time.c
@@ -30,7 +30,7 @@
 
 #include 
 
-extern spinlock_t rtc_lock;
+DEFINE_SPINLOCK(rtc_lock);
 
 #define NVRAM_AS0  0x74
 #define NVRAM_AS1  0x75
diff --git a/arch/powerpc/platforms/maple/time.c 
b/arch/powerpc/platforms/maple/time.c
index 78209bb7629c..ddda02010d86 100644
--- a/arch/powerpc/platforms/maple/time.c
+++ b/arch/powerpc/platforms/maple/time.c
@@ -34,6 +34,8 @@
 #define DBG(x...)
 #endif
 
+DEFINE_SPINLOCK(rtc_lock);
+
 static int maple_rtc_addr;
 
 static int maple_clock_read(int addr)
-- 
2.25.1


> I'm not completely against the rename in vr41xx but the fix for the
> warnings can and should be contained in arch/powerpc.
> 
> >  arch/powerpc/include/asm/time.h |  3 +++
> >  arch/powerpc/kernel/time.c  |  6 ++
> >  drivers/rtc/rtc-vr41xx.c| 22 +++---
> >  3 files changed, 16 insertions(+), 15 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/time.h 
> > b/arch/powerpc/include/asm/time.h
> > index 8dd3cdb25338..64a3ef0b4270 100644
> > --- a/arch/powerpc/include/asm/time.h
> > +++ b/arch/powerpc/include/asm/time.h
> > @@ -12,6 +12,7 @@
> >  #ifdef __KERNEL__
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -22,6 +23,8 @@ extern unsigned long tb_ticks_per_jiffy;
> >  extern unsigned long tb_ticks_per_usec;
> >  extern unsigned long tb_ticks_per_sec;
> >  extern struct clock_event_device decrementer_clockevent;
> > +extern u64 decrementer_max;
> > +extern spinlock_t rtc_lock;
> >  
> >  
> >  extern void generic_calibrate_decr(void);
> > diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> > index b67d93a609a2..60b6ac7d3685 100644
> > --- a/arch/powerpc/kernel/time.c
> > +++ b/arch/powerpc/kernel/time.c
> > @@ -150,10 +150,6 @@ bool tb_invalid;
> >  u64 __cputime_usec_factor;
> >  EXPORT_SYMBOL(__cputime_usec_factor);
> >  
> > -#ifdef CONFIG_PPC_SPLPAR
> > -void (*dtl_consumer)(struct dtl_entry *, u64);
> > -#endif
> > -
> >  static void calc_cputime_factors(void)
> >  {
> > struct div_result res;
> > @@ -179,6 +175,8 @@ static inline uns

Re: [PATCH v4 22/46] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

2021-03-23 Thread Fabiano Rosas
Nicholas Piggin  writes:

> In the interest of minimising the amount of code that is run in
> "real-mode", don't handle hcalls in real mode in the P9 path.
>
> POWER8 and earlier are much more expensive to exit from HV real mode
> and switch to host mode, because on those processors HV interrupts get
> to the hypervisor with the MMU off, and the other threads in the core
> need to be pulled out of the guest, and SLBs all need to be saved,
> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
> in host mode. Hash guests also require a lot of hcalls to run. The
> XICS interrupt controller requires hcalls to run.
>
> By contrast, POWER9 has independent thread switching, and in radix mode
> the hypervisor is already in a host virtual memory mode when the HV
> interrupt is taken. Radix + xive guests don't need hcalls to handle
> interrupts or manage translations.
>
> So it's much less important to handle hcalls in real mode in P9.
>
> Signed-off-by: Nicholas Piggin 
> ---



> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index fa7614c37e08..17739aaee3d8 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1142,12 +1142,13 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>  }
>
>  /*
> - * Handle H_CEDE in the nested virtualization case where we haven't
> - * called the real-mode hcall handlers in book3s_hv_rmhandlers.S.
> + * Handle H_CEDE in the P9 path where we don't call the real-mode hcall
> + * handlers in book3s_hv_rmhandlers.S.
> + *
>   * This has to be done early, not in kvmppc_pseries_do_hcall(), so
>   * that the cede logic in kvmppc_run_single_vcpu() works properly.
>   */
> -static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
> +static void kvmppc_cede(struct kvm_vcpu *vcpu)
>  {
>   vcpu->arch.shregs.msr |= MSR_EE;
>   vcpu->arch.ceded = 1;
> @@ -1403,9 +1404,15 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
>   /* hcall - punt to userspace */
>   int i;
>
> - /* hypercall with MSR_PR has already been handled in rmode,
> -  * and never reaches here.
> -  */
> + if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
> + /*
> +  * Guest userspace executed sc 1, reflect it back as a
> +  * privileged program check interrupt.
> +  */
> + kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
> + r = RESUME_GUEST;
> + break;
> + }

This patch bypasses sc_1_fast_return so it breaks KVM-PR. L1 loops with
the following output:

[9.503929][ T3443] Couldn't emulate instruction 0x4e800020 (op 19 xop 16)
[9.503990][ T3443] kvmppc_exit_pr_progint: emulation at 48f4 failed 
(4e800020)
[9.504080][ T3443] Couldn't emulate instruction 0x4e800020 (op 19 xop 16)
[9.504170][ T3443] kvmppc_exit_pr_progint: emulation at 48f4 failed 
(4e800020)

0x4e800020 is a blr after a sc 1 in SLOF.

For KVM-PR we need to inject a 0xc00 at some point, either here or
before branching to no_try_real in book3s_hv_rmhandlers.S.
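
(Something along these lines, as a sketch only — reusing the existing
kvmppc_book3s_queue_irqprio() helper with BOOK3S_INTERRUPT_SYSCALL (0xc00)
instead of queueing a program check; whether this is the right place and
mechanism is exactly the open question above.)

	/* Sketch: reflect a guest-userspace sc 1 back into the guest as a
	 * 0xc00 system call interrupt, so a PR KVM hypervisor running in
	 * the guest can emulate it, rather than raising a program check. */
	if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
		kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_SYSCALL);
		r = RESUME_GUEST;
		break;
	}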

>
>   run->papr_hcall.nr = kvmppc_get_gpr(vcpu, 3);
>   for (i = 0; i < 9; ++i)
> @@ -3663,6 +3670,12 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu 
> *vcpu, u64 time_limit,
>   return trap;
>  }
>
> +static inline bool hcall_is_xics(unsigned long req)
> +{
> + return (req == H_EOI || req == H_CPPR || req == H_IPI ||
> + req == H_IPOLL || req == H_XIRR || req == H_XIRR_X);
> +}
> +
>  /*
>   * Virtual-mode guest entry for POWER9 and later when the host and
>   * guest are both using the radix MMU.  The LPIDR has already been set.
> @@ -3774,15 +3787,36 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu 
> *vcpu, u64 time_limit,
>   /* H_CEDE has to be handled now, not later */
>   if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
>   kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
> - kvmppc_nested_cede(vcpu);
> + kvmppc_cede(vcpu);
>   kvmppc_set_gpr(vcpu, 3, 0);
>   trap = 0;
>   }
>   } else {
>   kvmppc_xive_push_vcpu(vcpu);
>   trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, lpcr);
> + if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
> + !(vcpu->arch.shregs.msr & MSR_PR)) {
> + unsigned long req = kvmppc_get_gpr(vcpu, 3);
> +
> + /* H_CEDE has to be handled now, not later */
> + if (req == H_CEDE) {
> + kvmppc_cede(vcpu);
> + kvmppc_xive_cede_vcpu(vcpu); /* may un-cede */
> + kvmppc_set_gpr(vcpu, 3, 0);
> + trap = 0;
> +
> + /* XICS hca

Re: [PATCH v2 -next] powerpc: kernel/time.c - cleanup warnings

2021-03-23 Thread Alexandre Belloni
Hello,

On 23/03/2021 05:12:57-0400, He Ying wrote:
> We found these warnings in arch/powerpc/kernel/time.c as follows:
> warning: symbol 'decrementer_max' was not declared. Should it be static?
> warning: symbol 'rtc_lock' was not declared. Should it be static?
> warning: symbol 'dtl_consumer' was not declared. Should it be static?
> 
> Declare 'decrementer_max' and 'rtc_lock' in powerpc asm/time.h.
> Rename 'rtc_lock' in drivers/rtc/rtc-vr41xx.c to 'vr41xx_rtc_lock' to
> avoid the conflict with the variable in powerpc asm/time.h.
> Move 'dtl_consumer' definition behind "include " because it
> is declared there.
> 
> Reported-by: Hulk Robot 
> Signed-off-by: He Ying 
> ---
> v2:
> - Instead of including linux/mc146818rtc.h in powerpc kernel/time.c, declare
>   rtc_lock in powerpc asm/time.h.
> 

V1 was actually the correct thing to do. rtc_lock is there exactly
because chrp and maple are using mc146818 compatible RTCs. This is then
useful because then drivers/char/nvram.c is enabled. The proper fix
would be to scrap all of that and use rtc-cmos for those platforms as
this drives the RTC properly and exposes the NVRAM for the mc146818.

Or at least, if there are no users for the char/nvram driver on those
two platforms, remove the spinlock and stop enabling CONFIG_NVRAM or
more likely rename the symbol as it seems to be abused by both chrp and
powermac.

I'm not completely against the rename in vr41xx but the fix for the
warnings can and should be contained in arch/powerpc.

>  arch/powerpc/include/asm/time.h |  3 +++
>  arch/powerpc/kernel/time.c  |  6 ++
>  drivers/rtc/rtc-vr41xx.c| 22 +++---
>  3 files changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
> index 8dd3cdb25338..64a3ef0b4270 100644
> --- a/arch/powerpc/include/asm/time.h
> +++ b/arch/powerpc/include/asm/time.h
> @@ -12,6 +12,7 @@
>  #ifdef __KERNEL__
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -22,6 +23,8 @@ extern unsigned long tb_ticks_per_jiffy;
>  extern unsigned long tb_ticks_per_usec;
>  extern unsigned long tb_ticks_per_sec;
>  extern struct clock_event_device decrementer_clockevent;
> +extern u64 decrementer_max;
> +extern spinlock_t rtc_lock;
>  
>  
>  extern void generic_calibrate_decr(void);
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index b67d93a609a2..60b6ac7d3685 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -150,10 +150,6 @@ bool tb_invalid;
>  u64 __cputime_usec_factor;
>  EXPORT_SYMBOL(__cputime_usec_factor);
>  
> -#ifdef CONFIG_PPC_SPLPAR
> -void (*dtl_consumer)(struct dtl_entry *, u64);
> -#endif
> -
>  static void calc_cputime_factors(void)
>  {
>   struct div_result res;
> @@ -179,6 +175,8 @@ static inline unsigned long read_spurr(unsigned long tb)
>  
>  #include 
>  
> +void (*dtl_consumer)(struct dtl_entry *, u64);
> +
>  /*
>   * Scan the dispatch trace log and count up the stolen time.
>   * Should be called with interrupts disabled.
> diff --git a/drivers/rtc/rtc-vr41xx.c b/drivers/rtc/rtc-vr41xx.c
> index 5a9f9ad86d32..cc31db058197 100644
> --- a/drivers/rtc/rtc-vr41xx.c
> +++ b/drivers/rtc/rtc-vr41xx.c
> @@ -72,7 +72,7 @@ static void __iomem *rtc2_base;
>  
>  static unsigned long epoch = 1970;   /* Jan 1 1970 00:00:00 */
>  
> -static DEFINE_SPINLOCK(rtc_lock);
> +static DEFINE_SPINLOCK(vr41xx_rtc_lock);
>  static char rtc_name[] = "RTC";
>  static unsigned long periodic_count;
>  static unsigned int alarm_enabled;
> @@ -101,13 +101,13 @@ static inline time64_t read_elapsed_second(void)
>  
>  static inline void write_elapsed_second(time64_t sec)
>  {
> - spin_lock_irq(&rtc_lock);
> + spin_lock_irq(&vr41xx_rtc_lock);
>  
>   rtc1_write(ETIMELREG, (uint16_t)(sec << 15));
>   rtc1_write(ETIMEMREG, (uint16_t)(sec >> 1));
>   rtc1_write(ETIMEHREG, (uint16_t)(sec >> 17));
>  
> - spin_unlock_irq(&rtc_lock);
> + spin_unlock_irq(&vr41xx_rtc_lock);
>  }
>  
>  static int vr41xx_rtc_read_time(struct device *dev, struct rtc_time *time)
> @@ -139,14 +139,14 @@ static int vr41xx_rtc_read_alarm(struct device *dev, 
> struct rtc_wkalrm *wkalrm)
>   unsigned long low, mid, high;
>   struct rtc_time *time = &wkalrm->time;
>  
> - spin_lock_irq(&rtc_lock);
> + spin_lock_irq(&vr41xx_rtc_lock);
>  
>   low = rtc1_read(ECMPLREG);
>   mid = rtc1_read(ECMPMREG);
>   high = rtc1_read(ECMPHREG);
>   wkalrm->enabled = alarm_enabled;
>  
> - spin_unlock_irq(&rtc_lock);
> + spin_unlock_irq(&vr41xx_rtc_lock);
>  
>   rtc_time64_to_tm((high << 17) | (mid << 1) | (low >> 15), time);
>  
> @@ -159,7 +159,7 @@ static int vr41xx_rtc_set_alarm(struct device *dev, 
> struct rtc_wkalrm *wkalrm)
>  
>   alarm_sec = rtc_tm_to_time64(&wkalrm->time);
>  
> - spin_lock_irq(&rtc_lock);
> + spin_lock_irq(&vr41xx_rtc_lock);
>  
>   if (alarm_enabled)
>  

Re: [PATCH] macintosh: A typo fix

2021-03-23 Thread Randy Dunlap
On 3/23/21 1:46 PM, Bhaskar Chowdhury wrote:
> 
> s/coment/comment/
> 
> Signed-off-by: Bhaskar Chowdhury 

Acked-by: Randy Dunlap 

> ---
>  drivers/macintosh/windfarm_smu_controls.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/macintosh/windfarm_smu_controls.c 
> b/drivers/macintosh/windfarm_smu_controls.c
> index 79cb1ad09bfd..75966052819a 100644
> --- a/drivers/macintosh/windfarm_smu_controls.c
> +++ b/drivers/macintosh/windfarm_smu_controls.c
> @@ -94,7 +94,7 @@ static int smu_set_fan(int pwm, u8 id, u16 value)
>   return rc;
>   wait_for_completion(&comp);
> 
> - /* Handle fallback (see coment above) */
> + /* Handle fallback (see comment above) */
>   if (cmd.status != 0 && smu_supports_new_fans_ops) {
>   printk(KERN_WARNING "windfarm: SMU failed new fan command "
>  "falling back to old method\n");
> --


-- 
~Randy



[PATCH v2 1/1] hotplug-cpu.c: show 'last online CPU' error in dlpar_cpu_offline()

2021-03-23 Thread Daniel Henrique Barboza
One of the reasons that dlpar_cpu_offline can fail is when attempting to
offline the last online CPU of the kernel. This can be observed in a
pseries QEMU guest that has hotplugged CPUs. If the user offlines all
other CPUs of the guest, and a hotplugged CPU is now the last online
CPU, trying to reclaim it will fail. See [1] for an example.

The current error message in this situation returns rc with -EBUSY and a
generic explanation, e.g.:

pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9, rc: -16

EBUSY can be caused by other conditions, such as cpu_hotplug_disable
being true. Throwing a more specific error message for this case,
instead of just "Failed to offline CPU", makes it clearer that the error
is in fact a known error situation instead of other generic/unknown
cause.

This patch adds a 'last online' check in dlpar_cpu_offline() to catch
the 'last online CPU' offline error, returning a more informative error
message:

pseries-hotplug-cpu: Unable to remove last online CPU PowerPC,POWER9

[1] https://bugzilla.redhat.com/1911414

Signed-off-by: Daniel Henrique Barboza 
---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c 
b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..3ac7e904385c 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -271,6 +271,18 @@ static int dlpar_offline_cpu(struct device_node *dn)
if (!cpu_online(cpu))
break;
 
+   /* device_offline() will return -EBUSY (via cpu_down())
+    * if there is only one CPU left. Check it here to fail
+    * earlier and with a more informative error message,
+    * while also retaining the cpu_add_remove_lock to be sure
+    * that no CPUs are being online/offlined during this
+    * check. */
+   if (num_online_cpus() == 1) {
+           pr_warn("Unable to remove last online CPU %pOFn\n", dn);
+           rc = -EBUSY;
+           goto out_unlock;
+   }
+
cpu_maps_update_done();
rc = device_offline(get_cpu_device(cpu));
if (rc)
@@ -283,6 +295,7 @@ static int dlpar_offline_cpu(struct device_node *dn)
thread);
}
}
+out_unlock:
cpu_maps_update_done();
 
 out:
-- 
2.30.2



[PATCH v2 0/1] show 'last online CPU' error in dlpar_cpu_offline()

2021-03-23 Thread Daniel Henrique Barboza
changes in v2 after Michael Ellerman review:
- moved the verification code from dlpar_cpu_remove() to
  dlpar_cpu_offline(), while holding cpu_add_remove_lock
- reworded the commit message and code comment
v1 link: 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210305173845.451158-1-danielhb...@gmail.com/

Daniel Henrique Barboza (1):
  hotplug-cpu.c: show 'last online CPU' error in dlpar_cpu_offline()

 arch/powerpc/platforms/pseries/hotplug-cpu.c | 13 +
 1 file changed, 13 insertions(+)

-- 
2.30.2



[PATCH] macintosh: A typo fix

2021-03-23 Thread Bhaskar Chowdhury


s/coment/comment/

Signed-off-by: Bhaskar Chowdhury 
---
 drivers/macintosh/windfarm_smu_controls.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/macintosh/windfarm_smu_controls.c 
b/drivers/macintosh/windfarm_smu_controls.c
index 79cb1ad09bfd..75966052819a 100644
--- a/drivers/macintosh/windfarm_smu_controls.c
+++ b/drivers/macintosh/windfarm_smu_controls.c
@@ -94,7 +94,7 @@ static int smu_set_fan(int pwm, u8 id, u16 value)
return rc;
wait_for_completion(&comp);

-   /* Handle fallback (see coment above) */
+   /* Handle fallback (see comment above) */
if (cmd.status != 0 && smu_supports_new_fans_ops) {
printk(KERN_WARNING "windfarm: SMU failed new fan command "
   "falling back to old method\n");
--
2.30.1



Re: [PATCH 02/10] ARM: disable CONFIG_IDE in footbridge_defconfig

2021-03-23 Thread Russell King - ARM Linux admin
On Mon, Mar 22, 2021 at 06:10:01PM +0100, Cye Borg wrote:
> PWS 500au:
> 
> snow / # lspci -vvx -s 7.1
> 00:07.1 IDE interface: Contaq Microsystems 82c693 (prog-if 80 [ISA
> Compatibility mode-only controller, supports bus mastering])
> Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr+ Stepping- SERR- FastB2B- DisINTx-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
> >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0
> Interrupt: pin A routed to IRQ 0
> Region 0: I/O ports at 01f0 [size=8]
> Region 1: I/O ports at 03f4
> Region 4: I/O ports at 9080 [size=16]
> Kernel driver in use: pata_cypress
> Kernel modules: pata_cypress
> 00: 80 10 93 c6 45 00 80 02 00 80 01 01 00 00 80 00
> 10: f1 01 00 00 f5 03 00 00 00 00 00 00 00 00 00 00
> 20: 81 90 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00
> 
> snow / # lspci -vvx -s 7.2
> 00:07.2 IDE interface: Contaq Microsystems 82c693 (prog-if 00 [ISA
> Compatibility mode-only controller])
> Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr+ Stepping- SERR- FastB2B- DisINTx-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium
> >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0
> Interrupt: pin B routed to IRQ 0
> Region 0: I/O ports at 0170 [size=8]
> Region 1: I/O ports at 0374
> Region 4: Memory at 0c24 (32-bit, non-prefetchable)
> [disabled] [size=64K]
> Kernel modules: pata_cypress
> 00: 80 10 93 c6 45 00 80 02 00 00 01 01 00 00 80 00
> 10: 71 01 00 00 75 03 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 24 0c 00 00 00 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00

Thanks very much.

Could I also ask for the output of:

# lspci -vxxx -s 7.0

as well please - this will dump all 256 bytes for the ISA bridge, which
contains a bunch of configuration registers. Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH v4 02/46] KVM: PPC: Book3S HV: Add a function to filter guest LPCR bits

2021-03-23 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Guest LPCR depends on hardware type, and future changes will add
> restrictions based on errata and guest MMU mode. Move this logic
> to a common function and use it for the cases where the guest
> wants to update its LPCR (or the LPCR of a nested guest).
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/include/asm/kvm_book3s.h |  2 +
>  arch/powerpc/kvm/book3s_hv.c  | 60 ++-
>  arch/powerpc/kvm/book3s_hv_nested.c   |  3 +-
>  3 files changed, 45 insertions(+), 20 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
> b/arch/powerpc/include/asm/kvm_book3s.h
> index 2f5f919f6cd3..3eec3ef6f083 100644
> --- a/arch/powerpc/include/asm/kvm_book3s.h
> +++ b/arch/powerpc/include/asm/kvm_book3s.h
> @@ -258,6 +258,8 @@ extern long kvmppc_hv_get_dirty_log_hpt(struct kvm *kvm,
>  extern void kvmppc_harvest_vpa_dirty(struct kvmppc_vpa *vpa,
>   struct kvm_memory_slot *memslot,
>   unsigned long *map);
> +extern unsigned long kvmppc_filter_lpcr_hv(struct kvmppc_vcore *vc,
> + unsigned long lpcr);
>  extern void kvmppc_update_lpcr(struct kvm *kvm, unsigned long lpcr,
>   unsigned long mask);
>  extern void kvmppc_set_fscr(struct kvm_vcpu *vcpu, u64 fscr);
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 13bad6bf4c95..c4539c38c639 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1635,6 +1635,27 @@ static int kvm_arch_vcpu_ioctl_set_sregs_hv(struct 
> kvm_vcpu *vcpu,
>   return 0;
>  }
>
> +/*
> + * Enforce limits on guest LPCR values based on hardware availability,
> + * guest configuration, and possibly hypervisor support and security
> + * concerns.
> + */
> +unsigned long kvmppc_filter_lpcr_hv(struct kvmppc_vcore *vc, unsigned long 
> lpcr)
> +{
> + /* On POWER8 and above, userspace can modify AIL */
> + if (!cpu_has_feature(CPU_FTR_ARCH_207S))
> + lpcr &= ~LPCR_AIL;
> +
> + /*
> +  * On POWER9, allow userspace to enable large decrementer for the
> +  * guest, whether or not the host has it enabled.
> +  */
> + if (!cpu_has_feature(CPU_FTR_ARCH_300))
> + lpcr &= ~LPCR_LD;
> +
> + return lpcr;
> +}
> +
>  static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
>   bool preserve_top32)
>  {
> @@ -1643,6 +1664,23 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 
> new_lpcr,
>   u64 mask;
>
>   spin_lock(&vc->lock);
> +
> + /*
> +  * Userspace can only modify
> +  * DPFD (default prefetch depth), ILE (interrupt little-endian),
> +  * TC (translation control), AIL (alternate interrupt location),
> +  * LD (large decrementer).
> +  * These are subject to restrictions from kvmppc_filter_lcpr_hv().
> +  */
> + mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD;
> +
> + /* Broken 32-bit version of LPCR must not clear top bits */
> + if (preserve_top32)
> + mask &= 0x;
> +
> + new_lpcr = kvmppc_filter_lpcr_hv(vc,
> + (vc->lpcr & ~mask) | (new_lpcr & mask));
> +
>   /*
>* If ILE (interrupt little-endian) has changed, update the
>* MSR_LE bit in the intr_msr for each vcpu in this vcore.
> @@ -1661,25 +1699,8 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 
> new_lpcr,
>   }
>   }
>
> - /*
> -  * Userspace can only modify DPFD (default prefetch depth),
> -  * ILE (interrupt little-endian) and TC (translation control).
> -  * On POWER8 and POWER9 userspace can also modify AIL (alt. interrupt 
> loc.).
> -  */
> - mask = LPCR_DPFD | LPCR_ILE | LPCR_TC;
> - if (cpu_has_feature(CPU_FTR_ARCH_207S))
> - mask |= LPCR_AIL;
> - /*
> -  * On POWER9, allow userspace to enable large decrementer for the
> -  * guest, whether or not the host has it enabled.
> -  */
> - if (cpu_has_feature(CPU_FTR_ARCH_300))
> - mask |= LPCR_LD;
> + vc->lpcr = new_lpcr;
>
> - /* Broken 32-bit version of LPCR must not clear top bits */
> - if (preserve_top32)
> - mask &= 0x;
> - vc->lpcr = (vc->lpcr & ~mask) | (new_lpcr & mask);
>   spin_unlock(&vc->lock);
>  }
>
> @@ -4641,8 +4662,9 @@ void kvmppc_update_lpcr(struct kvm *kvm, unsigned long 
> lpcr, unsigned long mask)
>   struct kvmppc_vcore *vc = kvm->arch.vcores[i];
>   if (!vc)
>   continue;
> +
>   spin_lock(&vc->lock);
> - vc->lpcr = (vc->lpcr & ~mask) | lpcr;
> + vc->lpcr = kvmppc_filter_lpcr_hv(vc, (vc->lpcr & ~mask) | lpcr);
>   spin_unlock(&vc->lock);
>   if (++cores_done >= kvm->arch.online_vcores)
>   break;
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
> b/arch/powerpc/

Re: [PATCH v4 01/46] KVM: PPC: Book3S HV: Nested move LPCR sanitising to sanitise_hv_regs

2021-03-23 Thread Fabiano Rosas
Nicholas Piggin  writes:

> This will get a bit more complicated in future patches. Move it
> into the helper function.
>
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Fabiano Rosas 

> ---
>  arch/powerpc/kvm/book3s_hv_nested.c | 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
> b/arch/powerpc/kvm/book3s_hv_nested.c
> index 0cd0e7aad588..2fe1fea4c934 100644
> --- a/arch/powerpc/kvm/book3s_hv_nested.c
> +++ b/arch/powerpc/kvm/book3s_hv_nested.c
> @@ -134,6 +134,16 @@ static void save_hv_return_state(struct kvm_vcpu *vcpu, 
> int trap,
>
>  static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state 
> *hr)
>  {
> + struct kvmppc_vcore *vc = vcpu->arch.vcore;
> + u64 mask;
> +
> + /*
> +  * Don't let L1 change LPCR bits for the L2 except these:
> +  */
> + mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
> + LPCR_LPES | LPCR_MER;
> + hr->lpcr = (vc->lpcr & ~mask) | (hr->lpcr & mask);
> +
>   /*
>* Don't let L1 enable features for L2 which we've disabled for L1,
>* but preserve the interrupt cause field.
> @@ -271,8 +281,6 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>   u64 hv_ptr, regs_ptr;
>   u64 hdec_exp;
>   s64 delta_purr, delta_spurr, delta_ic, delta_vtb;
> - u64 mask;
> - unsigned long lpcr;
>
>   if (vcpu->kvm->arch.l1_ptcr == 0)
>   return H_NOT_AVAILABLE;
> @@ -321,9 +329,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>   vcpu->arch.nested_vcpu_id = l2_hv.vcpu_token;
>   vcpu->arch.regs = l2_regs;
>   vcpu->arch.shregs.msr = vcpu->arch.regs.msr;
> - mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
> - LPCR_LPES | LPCR_MER;
> - lpcr = (vc->lpcr & ~mask) | (l2_hv.lpcr & mask);
> +
>   sanitise_hv_regs(vcpu, &l2_hv);
>   restore_hv_regs(vcpu, &l2_hv);
>
> @@ -335,7 +341,7 @@ long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu)
>   r = RESUME_HOST;
>   break;
>   }
> - r = kvmhv_run_single_vcpu(vcpu, hdec_exp, lpcr);
> + r = kvmhv_run_single_vcpu(vcpu, hdec_exp, l2_hv.lpcr);
>   } while (is_kvmppc_resume_guest(r));
>
>   /* save L2 state for return */


Re: [PATCH v4 22/46] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

2021-03-23 Thread Fabiano Rosas
Nicholas Piggin  writes:

> In the interest of minimising the amount of code that is run in
> "real-mode", don't handle hcalls in real mode in the P9 path.
>
> POWER8 and earlier are much more expensive to exit from HV real mode
> and switch to host mode, because on those processors HV interrupts get
> to the hypervisor with the MMU off, and the other threads in the core
> need to be pulled out of the guest, and SLBs all need to be saved,
> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
> in host mode. Hash guests also require a lot of hcalls to run. The
> XICS interrupt controller requires hcalls to run.
>
> By contrast, POWER9 has independent thread switching, and in radix mode
> the hypervisor is already in a host virtual memory mode when the HV
> interrupt is taken. Radix + xive guests don't need hcalls to handle
> interrupts or manage translations.
>
> So it's much less important to handle hcalls in real mode in P9.
>
> Signed-off-by: Nicholas Piggin 

I tried this again in the L2 with xive=off and it works as expected now.

Tested-by: Fabiano Rosas 

> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  5 ++
>  arch/powerpc/kvm/book3s_hv.c| 57 
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  5 ++
>  arch/powerpc/kvm/book3s_xive.c  | 70 +
>  4 files changed, 127 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index 73b1ca5a6471..db6646c2ade2 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -607,6 +607,7 @@ extern void kvmppc_free_pimap(struct kvm *kvm);
>  extern int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall);
>  extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
>  extern int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd);
> +extern int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req);
>  extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
>  extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
>  extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
> @@ -639,6 +640,8 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu 
> *vcpu)
>  static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
>  static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
>   { return 0; }
> +static inline int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
> + { return 0; }
>  #endif
>
>  #ifdef CONFIG_KVM_XIVE
> @@ -673,6 +676,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int 
> irq_source_id, u32 irq,
>  int level, bool line_status);
>  extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
>  extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
> +extern void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu);
>
>  static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>  {
> @@ -714,6 +718,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, 
> int irq_source_id, u32 ir
> int level, bool line_status) { return 
> -ENODEV; }
>  static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
>  static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
> +static inline void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu) { }
>
>  static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>   { return 0; }
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index fa7614c37e08..17739aaee3d8 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1142,12 +1142,13 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>  }
>
>  /*
> - * Handle H_CEDE in the nested virtualization case where we haven't
> - * called the real-mode hcall handlers in book3s_hv_rmhandlers.S.
> + * Handle H_CEDE in the P9 path where we don't call the real-mode hcall
> + * handlers in book3s_hv_rmhandlers.S.
> + *
>   * This has to be done early, not in kvmppc_pseries_do_hcall(), so
>   * that the cede logic in kvmppc_run_single_vcpu() works properly.
>   */
> -static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
> +static void kvmppc_cede(struct kvm_vcpu *vcpu)
>  {
>   vcpu->arch.shregs.msr |= MSR_EE;
>   vcpu->arch.ceded = 1;
> @@ -1403,9 +1404,15 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
>   /* hcall - punt to userspace */
>   int i;
>
> - /* hypercall with MSR_PR has already been handled in rmode,
> -  * and never reaches here.
> -  */
> + if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
> + /*
> +  * Guest userspace executed sc 1, reflect it back as a
> +  * privileged program check interrupt.
> +  */
> + kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
> + r = RESUME_GUEST;
> + 

[PATCH v2] powerpc/papr_scm: Implement support for H_SCM_FLUSH hcall

2021-03-23 Thread Shivaprasad G Bhat
Add support for ND_REGION_ASYNC capability if the device tree
indicates 'ibm,hcall-flush-required' property in the NVDIMM node.
Flush is done by issuing H_SCM_FLUSH hcall to the hypervisor.

If the flush request failed, the hypervisor is expected to reflect the
problem in the subsequent dimm health request call.

This patch prevents mmap of namespaces with MAP_SYNC flag if the
nvdimm requires explicit flush[1].

References:
[1] 
https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c

Signed-off-by: Shivaprasad G Bhat 
---
v1 - https://www.spinics.net/lists/kvm-ppc/msg18272.html
Changes from v1:
   - Hcall semantics finalized, all changes are to accommodate them.

 Documentation/powerpc/papr_hcalls.rst |   14 ++
 arch/powerpc/include/asm/hvcall.h |3 +-
 arch/powerpc/platforms/pseries/papr_scm.c |   39 +
 3 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/Documentation/powerpc/papr_hcalls.rst 
b/Documentation/powerpc/papr_hcalls.rst
index 48fcf1255a33..648f278eea8f 100644
--- a/Documentation/powerpc/papr_hcalls.rst
+++ b/Documentation/powerpc/papr_hcalls.rst
@@ -275,6 +275,20 @@ Health Bitmap Flags:
 Given a DRC Index collect the performance statistics for NVDIMM and copy them
 to the resultBuffer.
 
+**H_SCM_FLUSH**
+
+| Input: *drcIndex, continue-token*
+| Out: *continue-token*
+| Return Value: *H_SUCCESS, H_Parameter, H_P2, H_BUSY*
+
+Given a DRC Index Flush the data to backend NVDIMM device.
+
+The hcall returns H_BUSY when the flush takes longer time and the hcall needs
+to be issued multiple times in order to be completely serviced. The
+*continue-token* from the output to be passed in the argument list of
+subsequent hcalls to the hypervisor until the hcall is completely serviced
+at which point H_SUCCESS or other error is returned by the hypervisor.
+
 References
 ==
 .. [1] "Power Architecture Platform Reference"
diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index ed6086d57b22..9f7729a97ebd 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -315,7 +315,8 @@
 #define H_SCM_HEALTH0x400
 #define H_SCM_PERFORMANCE_STATS 0x418
 #define H_RPT_INVALIDATE   0x448
-#define MAX_HCALL_OPCODE   H_RPT_INVALIDATE
+#define H_SCM_FLUSH0x44C
+#define MAX_HCALL_OPCODE   H_SCM_FLUSH
 
 /* Scope args for H_SCM_UNBIND_ALL */
 #define H_UNBIND_SCOPE_ALL (0x1)
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 835163f54244..f0407e135410 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -93,6 +93,7 @@ struct papr_scm_priv {
uint64_t block_size;
int metadata_size;
bool is_volatile;
+   bool hcall_flush_required;
 
uint64_t bound_addr;
 
@@ -117,6 +118,38 @@ struct papr_scm_priv {
size_t stat_buffer_len;
 };
 
+static int papr_scm_pmem_flush(struct nd_region *nd_region,
+  struct bio *bio __maybe_unused)
+{
+   struct papr_scm_priv *p = nd_region_provider_data(nd_region);
+   unsigned long ret_buf[PLPAR_HCALL_BUFSIZE];
+   uint64_t token = 0;
+   int64_t rc;
+
+   do {
+   rc = plpar_hcall(H_SCM_FLUSH, ret_buf, p->drc_index, token);
+   token = ret_buf[0];
+
+   /* Check if we are stalled for some time */
+   if (H_IS_LONG_BUSY(rc)) {
+   msleep(get_longbusy_msecs(rc));
+   rc = H_BUSY;
+   } else if (rc == H_BUSY) {
+   cond_resched();
+   }
+
+   } while (rc == H_BUSY);
+
+   if (rc) {
+   dev_err(&p->pdev->dev, "flush error: %lld", rc);
+   rc = -EIO;
+   } else {
+   dev_dbg(&p->pdev->dev, "flush drc 0x%x complete", p->drc_index);
+   }
+
+   return rc;
+}
+
 static LIST_HEAD(papr_nd_regions);
 static DEFINE_MUTEX(papr_ndr_lock);
 
@@ -943,6 +976,11 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
ndr_desc.num_mappings = 1;
ndr_desc.nd_set = &p->nd_set;
 
+   if (p->hcall_flush_required) {
+   set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
+   ndr_desc.flush = papr_scm_pmem_flush;
+   }
+
if (p->is_volatile)
p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
else {
@@ -1088,6 +1126,7 @@ static int papr_scm_probe(struct platform_device *pdev)
p->block_size = block_size;
p->blocks = blocks;
p->is_volatile = !of_property_read_bool(dn, "ibm,cache-flush-required");
+   p->hcall_flush_required = of_property_read_bool(dn, 
"ibm,hcall-flush-required");
 
/* We just need to ensure that set cookies are unique across */
uuid_parse(uuid_str, (uuid_t *) uuid);
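
As a side note on the MAP_SYNC behaviour mentioned in the changelog: a minimal
userspace sketch of the check (loosely modelled on the referenced map_sync.c
test, not a verbatim copy, and assuming a libc that exposes MAP_SYNC and
MAP_SHARED_VALIDATE via <sys/mman.h>) could look like the following. On a
region flagged ND_REGION_ASYNC the mmap() is expected to fail, which is how
the "no MAP_SYNC without a synchronous flush guarantee" policy is surfaced to
applications.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	void *p;
	int fd;

	if (argc < 2)
		return 1;

	fd = open(argv[1], O_RDWR);	/* a file on the fsdax namespace */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		 MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (p == MAP_FAILED)
		perror("mmap(MAP_SYNC)");	/* expected on ND_REGION_ASYNC regions */
	else
		munmap(p, 4096);

	close(fd);
	return 0;
}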




Re: [PATCH] xsysace: Remove SYSACE driver

2021-03-23 Thread Michal Simek



On 3/23/21 5:28 PM, Jens Axboe wrote:
> On 3/23/21 10:25 AM, Michal Simek wrote:
>>
>>
>> On 3/23/21 5:23 PM, Jens Axboe wrote:
>>> On 3/22/21 6:04 PM, Davidlohr Bueso wrote:
 Hi,

 On Mon, 09 Nov 2020, Michal Simek wrote:

> Sysace IP is no longer used on Xilinx PowerPC 405/440 and Microblaze
> systems. The driver is not regularly tested and very likely not working 
> for
> quite a long time that's why remove it.

 Is there a reason this patch was never merged? can the driver be
 removed? I ran into this as a potential tasklet user that can be
 replaced/removed.
>>>
>>> I'd be happy to merge it for 5.13.
>>>
>>
>> Can you just take this version? Or do you want me to send it again?
> 
> Minor edits needed for fuzz, but I've applied this version.

Thanks,
Michal



Re: [PATCH] xsysace: Remove SYSACE driver

2021-03-23 Thread Jens Axboe
On 3/23/21 10:25 AM, Michal Simek wrote:
> 
> 
> On 3/23/21 5:23 PM, Jens Axboe wrote:
>> On 3/22/21 6:04 PM, Davidlohr Bueso wrote:
>>> Hi,
>>>
>>> On Mon, 09 Nov 2020, Michal Simek wrote:
>>>
 Sysace IP is no longer used on Xilinx PowerPC 405/440 and Microblaze
 systems. The driver is not regularly tested and very likely not working for
 quite a long time that's why remove it.
>>>
>>> Is there a reason this patch was never merged? can the driver be
>>> removed? I ran into this as a potential tasklet user that can be
>>> replaced/removed.
>>
>> I'd be happy to merge it for 5.13.
>>
> 
> Can you just take this version? Or do you want me to send it again?

Minor edits needed for fuzz, but I've applied this version.

-- 
Jens Axboe



Re: [PATCH] xsysace: Remove SYSACE driver

2021-03-23 Thread Michal Simek



On 3/23/21 5:23 PM, Jens Axboe wrote:
> On 3/22/21 6:04 PM, Davidlohr Bueso wrote:
>> Hi,
>>
>> On Mon, 09 Nov 2020, Michal Simek wrote:
>>
>>> Sysace IP is no longer used on Xilinx PowerPC 405/440 and Microblaze
>>> systems. The driver is not regularly tested and very likely not working for
>>> quite a long time that's why remove it.
>>
>> Is there a reason this patch was never merged? can the driver be
>> removed? I ran into this as a potential tasklet user that can be
>> replaced/removed.
> 
> I'd be happy to merge it for 5.13.
> 

Can you just take this version? Or do you want me to send it again?

Thanks,
Michal


Re: [PATCH] xsysace: Remove SYSACE driver

2021-03-23 Thread Jens Axboe
On 3/22/21 6:04 PM, Davidlohr Bueso wrote:
> Hi,
> 
> On Mon, 09 Nov 2020, Michal Simek wrote:
> 
>> Sysace IP is no longer used on Xilinx PowerPC 405/440 and Microblaze
>> systems. The driver is not regularly tested and very likely not working for
>> quite a long time that's why remove it.
> 
> Is there a reason this patch was never merged? can the driver be
> removed? I ran into this as a potential tasklet user that can be
> replaced/removed.

I'd be happy to merge it for 5.13.

-- 
Jens Axboe



Re: [PATCH v4 44/46] KVM: PPC: Book3S HV P9: implement hash guest support

2021-03-23 Thread Fabiano Rosas
Nicholas Piggin  writes:

> Guest entry/exit has to restore and save/clear the SLB, plus several
> other bits to accommodate hash guests in the P9 path.
>
> Radix host, hash guest support is removed from the P7/8 path.
>
> Signed-off-by: Nicholas Piggin 
> ---



> diff --git a/arch/powerpc/kvm/book3s_hv_interrupt.c 
> b/arch/powerpc/kvm/book3s_hv_interrupt.c
> index cd84d2c37632..03fbfef708a8 100644
> --- a/arch/powerpc/kvm/book3s_hv_interrupt.c
> +++ b/arch/powerpc/kvm/book3s_hv_interrupt.c
> @@ -55,6 +55,50 @@ static void __accumulate_time(struct kvm_vcpu *vcpu, 
> struct kvmhv_tb_accumulator
>  #define accumulate_time(vcpu, next) do {} while (0)
>  #endif
>
> +static inline void mfslb(unsigned int idx, u64 *slbee, u64 *slbev)
> +{
> + asm volatile("slbmfev  %0,%1" : "=r" (*slbev) : "r" (idx));
> + asm volatile("slbmfee  %0,%1" : "=r" (*slbee) : "r" (idx));
> +}
> +
> +static inline void __mtslb(u64 slbee, u64 slbev)
> +{
> + asm volatile("slbmte %0,%1" :: "r" (slbev), "r" (slbee));
> +}
> +
> +static inline void mtslb(unsigned int idx, u64 slbee, u64 slbev)
> +{
> + BUG_ON((slbee & 0xfff) != idx);
> +
> + __mtslb(slbee, slbev);
> +}
> +
> +static inline void slb_invalidate(unsigned int ih)
> +{
> + asm volatile("slbia %0" :: "i"(ih));
> +}

Fyi, in my environment the assembler complains:

{standard input}: Assembler messages:
{standard input}:1293: Error: junk at end of line: `6'
{standard input}:2138: Error: junk at end of line: `6'
make[3]: *** [../scripts/Makefile.build:271:
arch/powerpc/kvm/book3s_hv_interrupt.o] Error 1

This works:

-   asm volatile("slbia %0" :: "i"(ih));
+   asm volatile(PPC_SLBIA(%0) :: "i"(ih));

But I don't know what is going on.
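
A plausible explanation (my assumption, not confirmed in this thread): the
extended "slbia IH" mnemonic is only understood by newer binutils, whereas the
PPC_SLBIA() macro from arch/powerpc/include/asm/ppc-opcode.h hand-encodes the
instruction word, so the assembler never has to parse the IH operand form.
A hedged sketch of the working variant:

static inline void slb_invalidate(unsigned int ih)
{
	/*
	 * PPC_SLBIA(%0) expands to a raw instruction encoding with the IH
	 * field substituted in, so it assembles even when the toolchain
	 * does not accept "slbia" with an operand.
	 */
	asm volatile(PPC_SLBIA(%0) :: "i" (ih));
}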



[PATCH] powerpc: Switch to relative jump labels

2021-03-23 Thread Christophe Leroy
Convert powerpc to relative jump labels.

Before the patch, pseries_defconfig vmlinux.o has:
9074 __jump_table  0003f2a0      01321fa8  2**0

With the patch, the same config gets:
9074 __jump_table  0002a0e0      01321fb4  2**0

Size is 258720 without the patch, 172256 with the patch.
That's a 33% size reduction.

Largely copied from commit c296146c058c ("arm64/kernel: jump_label:
Switch to relative references")

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig  |  1 +
 arch/powerpc/include/asm/jump_label.h | 21 ++---
 arch/powerpc/kernel/jump_label.c  |  4 ++--
 3 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index d46db0bfb998..a52938c0f85b 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -182,6 +182,7 @@ config PPC
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_HUGE_VMAP  if PPC_BOOK3S_64 && 
PPC_RADIX_MMU
select HAVE_ARCH_JUMP_LABEL
+   select HAVE_ARCH_JUMP_LABEL_RELATIVE
select HAVE_ARCH_KASAN  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KASAN_VMALLOC  if PPC32 && PPC_PAGE_SHIFT <= 14
select HAVE_ARCH_KGDB
diff --git a/arch/powerpc/include/asm/jump_label.h 
b/arch/powerpc/include/asm/jump_label.h
index 09297ec9fa52..2d5c6bec2b4f 100644
--- a/arch/powerpc/include/asm/jump_label.h
+++ b/arch/powerpc/include/asm/jump_label.h
@@ -20,7 +20,8 @@ static __always_inline bool arch_static_branch(struct 
static_key *key, bool bran
asm_volatile_goto("1:\n\t"
 "nop # arch_static_branch\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
-JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t"
+".long 1b - ., %l[l_yes] - .\n\t"
+JUMP_ENTRY_TYPE "%c0 - .\n\t"
 ".popsection \n\t"
 : :  "i" (&((char *)key)[branch]) : : l_yes);
 
@@ -34,7 +35,8 @@ static __always_inline bool arch_static_branch_jump(struct 
static_key *key, bool
asm_volatile_goto("1:\n\t"
 "b %l[l_yes] # arch_static_branch_jump\n\t"
 ".pushsection __jump_table,  \"aw\"\n\t"
-JUMP_ENTRY_TYPE "1b, %l[l_yes], %c0\n\t"
+".long 1b - ., %l[l_yes] - .\n\t"
+JUMP_ENTRY_TYPE "%c0 - .\n\t"
 ".popsection \n\t"
 : :  "i" (&((char *)key)[branch]) : : l_yes);
 
@@ -43,23 +45,12 @@ static __always_inline bool arch_static_branch_jump(struct 
static_key *key, bool
return true;
 }
 
-#ifdef CONFIG_PPC64
-typedef u64 jump_label_t;
-#else
-typedef u32 jump_label_t;
-#endif
-
-struct jump_entry {
-   jump_label_t code;
-   jump_label_t target;
-   jump_label_t key;
-};
-
 #else
 #define ARCH_STATIC_BRANCH(LABEL, KEY) \
 1098:  nop;\
.pushsection __jump_table, "aw";\
-   FTR_ENTRY_LONG 1098b, LABEL, KEY;   \
+   .long 1098b - ., LABEL - .; \
+   FTR_ENTRY_LONG KEY; \
.popsection
 #endif
 
diff --git a/arch/powerpc/kernel/jump_label.c b/arch/powerpc/kernel/jump_label.c
index 144858027fa3..ce87dc5ea23c 100644
--- a/arch/powerpc/kernel/jump_label.c
+++ b/arch/powerpc/kernel/jump_label.c
@@ -11,10 +11,10 @@
 void arch_jump_label_transform(struct jump_entry *entry,
   enum jump_label_type type)
 {
-   struct ppc_inst *addr = (struct ppc_inst *)(unsigned long)entry->code;
+   struct ppc_inst *addr = (struct ppc_inst *)jump_entry_code(entry);
 
if (type == JUMP_LABEL_JMP)
-   patch_branch(addr, entry->target, 0);
+   patch_branch(addr, jump_entry_target(entry), 0);
else
patch_instruction(addr, ppc_inst(PPC_INST_NOP));
 }
-- 
2.25.0
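
For readers not familiar with HAVE_ARCH_JUMP_LABEL_RELATIVE: selecting it makes
the generic struct jump_entry store 32-bit self-relative offsets instead of
absolute addresses, which is where the __jump_table size reduction comes from,
and generic helpers reconstruct the absolute addresses at patch time. A rough
sketch of that generic layout and the accessors used in the patch above
(approximate, from memory rather than quoted verbatim from
include/linux/jump_label.h):

struct jump_entry {
	s32 code;	/* offset of the patched instruction, relative to &code */
	s32 target;	/* offset of the branch target, relative to &target     */
	long key;	/* offset of the static_key; low bits carry flags       */
};

static inline unsigned long jump_entry_code(const struct jump_entry *entry)
{
	return (unsigned long)&entry->code + entry->code;
}

static inline unsigned long jump_entry_target(const struct jump_entry *entry)
{
	return (unsigned long)&entry->target + entry->target;
}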



Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-23 Thread Petr Mladek
On Wed 2021-03-17 00:33:25, John Ogness wrote:
> With @logbuf_lock removed, the high level printk functions for
> storing messages are lockless. Messages can be stored from any
> context, so there is no need for the NMI and safe buffers anymore.
> Remove the NMI and safe buffers.
> 
> Although the safe buffers are removed, the NMI and safe context
> tracking is still in place. In these contexts, store the message
> immediately but still use irq_work to defer the console printing.
> 
> Since printk recursion tracking is in place, safe context tracking
> for most of printk is not needed. Remove it. Only safe context
> tracking relating to the console lock is left in place. This is
> because the console lock is needed for the actual printing.

I have two more questions after actually checking the entire patch.
See below.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1084,7 +1069,6 @@ void __init setup_log_buf(int early)
>   struct printk_record r;
>   size_t new_descs_size;
>   size_t new_infos_size;
> - unsigned long flags;
>   char *new_log_buf;
>   unsigned int free;
>   u64 seq;
> @@ -1142,8 +1126,6 @@ void __init setup_log_buf(int early)
>new_descs, ilog2(new_descs_count),
>new_infos);
>  
> - printk_safe_enter_irqsave(flags);
> -
>   log_buf_len = new_log_buf_len;
>   log_buf = new_log_buf;
>   new_log_buf_len = 0;
> @@ -1159,8 +1141,6 @@ void __init setup_log_buf(int early)
>*/
>   prb = &printk_rb_dynamic;
>  
> - printk_safe_exit_irqrestore(flags);

This will allow new messages to be added from the IRQ context while we
are copying them to the new buffer. They might get lost in
the small race window.

Also the messages from NMI might get lost because they are no
longer stored in the per-CPU buffer.

A possible solution might be to do something like this:

prb_for_each_record(0, &printk_rb_static, seq, &r)
free -= add_to_rb(&printk_rb_dynamic, &r);

prb = &printk_rb_dynamic;

/*
 * Copy the remaining messages that might have appeared
 * from IRQ or NMI context after we ended copying and
 * before we switched the buffers. They must be finalized
 * because only one CPU is up at this stage.
 */
prb_for_each_record(seq, &printk_rb_static, seq, &r)
free -= add_to_rb(&printk_rb_dynamic, &r);


> -
>   if (seq != prb_next_seq(&printk_rb_static)) {
>   pr_err("dropped %llu messages\n",
>  prb_next_seq(&printk_rb_static) - seq);
> @@ -2666,7 +2631,6 @@ void console_unlock(void)
>   size_t ext_len = 0;
>   size_t len;
>  
> - printk_safe_enter_irqsave(flags);
>  skip:
>   if (!prb_read_valid(prb, console_seq, &r))
>   break;
> @@ -2711,6 +2675,8 @@ void console_unlock(void)
>   printk_time);
>   console_seq++;
>  
> + printk_safe_enter_irqsave(flags);

What is the purpose of the printk_safe context here, please?

I guess that you wanted to prevent calling console drivers
recursively. But it is already serialized by console_lock().

IMHO, the only risk is when manipulating console_sem->lock
or console_owner_lock. But they are already guarded by
printk_safe context, for example, in console_lock() or
console_lock_spinning_enable().

Do I miss something, please?


> +
>   /*
>* While actively printing out messages, if another printk()
>* were to occur on another CPU, it may wait for this one to
> @@ -2745,8 +2711,6 @@ void console_unlock(void)
>* flush, no worries.
>*/
>   retry = prb_read_valid(prb, console_seq, NULL);
> - printk_safe_exit_irqrestore(flags);
> -
>   if (retry && console_trylock())
>   goto again;
>  }

Heh, all these patches feel like stripping printk of its armour. I hope
that we trained it enough to be flexible and avoid any damage.

Best Regards,
Petr


Re: [PATCH 0/4] Rust for Linux for ppc64le

2021-03-23 Thread Miguel Ojeda
On Tue, Mar 23, 2021 at 1:16 PM Michael Ellerman  wrote:
>
> It would be nice to be in the CI. I was building natively so I haven't
> tried cross compiling yet (which we'll need for CI).

Indeed -- in the CI we already cross-compile arm64 (and run under QEMU
both arm64 as well as x86_64), so it is easy to add new ones to the
matrix.

> I can send a pull request if that's easiest.

No worries, I will pick the patches. But, of course, feel free to join
us in GitHub! :-)

Cheers,
Miguel


Re: [PATCH 02/10] ARM: disable CONFIG_IDE in footbridge_defconfig

2021-03-23 Thread Russell King - ARM Linux admin
On Mon, Mar 22, 2021 at 04:33:14PM +0100, Christoph Hellwig wrote:
> On Mon, Mar 22, 2021 at 04:18:23PM +0100, Christoph Hellwig wrote:
> > On Mon, Mar 22, 2021 at 03:15:03PM +, Russell King - ARM Linux admin 
> > wrote:
> > > It gets worse than that though - due to a change to remove
> > > pcibios_min_io from the generic code, moving it into the ARM
> > > architecture code, this has caused a regression that prevents the
> > > legacy resources being registered against the bus resource. So even
> > > if they are there, they cause probe failures. I haven't found a
> > > reasonable way to solve this yet, but until there is, there is no
> > > way that the PATA driver can be used as the "legacy mode" support
> > > is effectively done via the PCI code assigning virtual IO port
> > > resources.
> > > 
> > > I'm quite surprised that the CY82C693 even works on Alpha - I've
> > > asked for a lspci for that last week but nothing has yet been
> > > forthcoming from whoever responded to your patch for Alpha - so I
> > > can't compare what I'm seeing with what's happening with Alpha.
> > 
> > That sounds like something we could fix with a quirk for function 2
> > in the PCI resource assignment code.  Can you show what vendor and
> > device ID function 2 has so that I could try to come up with one?
> 
> Something like this:

That solves the problem for the IDE driver, which knows how to deal
with legacy mode, but not the PATA driver, which doesn't. The PATA
driver needs these resources.

As I say, having these resources presents a problem on ARM. A previous
commit (3c5d1699887b) changed the way the bus resources are setup which
results in /proc/ioports containing:

-000f : dma1
0020-003f : pic1
0060-006f : i8042
0070-0073 : rtc_cmos
  0070-0073 : rtc0
0080-008f : dma low page
00a0-00bf : pic2
00c0-00df : dma2
0213-0213 : ISAPnP
02f8-02ff : serial8250.0
  02f8-02ff : serial
03c0-03df : vga+
03f8-03ff : serial8250.0
  03f8-03ff : serial
0480-048f : dma high page
0a79-0a79 : isapnp write
1000- : PCI0 I/O
  1000-107f : :00:08.0
1000-107f : 3c59x
  1080-108f : :00:06.1
  1090-109f : :00:07.0
1090-109f : pata_it821x
  10a0-10a7 : :00:07.0
10a0-10a7 : pata_it821x
  10a8-10af : :00:07.0
10a8-10af : pata_it821x
  10b0-10b3 : :00:07.0
10b0-10b3 : pata_it821x
  10b4-10b7 : :00:07.0
10b4-10b7 : pata_it821x

The "PCI0 I/O" resource is the bus level resource, and the legacy
resources can not be claimed against that.

Without these resources, the PATA cypress driver doesn't work.

As I said previously, the reason this regression was not picked up
earlier is because I don't upgrade the kernel on this machine very
often; the machine has had uptimes into thousands of days.

I need to try reverting Rob's commit to find out if anything breaks
on this platform - it's completely wrong from a technical point of
view for any case where we have a PCI southbridge, since the
southbridge provides ISA based resources. I'm not entirely sure
what the point of it was, since we still have the PCIBIOS_MIN_IO
macro which still uses pcibios_min_io.

I'm looking at some of the other changes Rob made back at that time
which also look wrong, such as 8ef6e6201b26 which has the effect of
locating the 21285 IO resources to PCI address 0, over the top of
the ISA southbridge resources. I've no idea what Rob was thinking
when he removed the csrio allocation code in that commit, but
looking at it today, it's so obviously wrong even to a casual
glance.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [PATCH next v1 2/3] printk: remove safe buffers

2021-03-23 Thread Petr Mladek
On Mon 2021-03-22 22:58:47, John Ogness wrote:
> On 2021-03-22, Petr Mladek  wrote:
> > On Mon 2021-03-22 12:16:15, John Ogness wrote:
> >> On 2021-03-21, Sergey Senozhatsky  wrote:
> >> >> @@ -369,7 +70,10 @@ __printf(1, 0) int vprintk_func(const char *fmt, 
> >> >> va_list args)
> >> >>  * Use the main logbuf even in NMI. But avoid calling console
> >> >>  * drivers that might have their own locks.
> >> >>  */
> >> >> -   if ((this_cpu_read(printk_context) & 
> >> >> PRINTK_NMI_DIRECT_CONTEXT_MASK)) {
> >> >> +   if (this_cpu_read(printk_context) &
> >> >> +   (PRINTK_NMI_DIRECT_CONTEXT_MASK |
> >> >> +PRINTK_NMI_CONTEXT_MASK |
> >> >> +PRINTK_SAFE_CONTEXT_MASK)) {
> >> >

> >> But I suppose I could switch
> >> the 1 printk_nmi_direct_enter() user to printk_nmi_enter() so that
> >> PRINTK_NMI_DIRECT_CONTEXT_MASK can be removed now. I would do this in a
> >> 4th patch of the series.
> >
> > Yes, please unify the PRINTK_NMI_CONTEXT. One is enough.
> 
> Agreed. (But I'll go even further. See below.)
> 
> > I wonder if it would make sense to go even further at this stage.
> > What is possible?
> >
> > 1. We could get rid of printk_nmi_enter()/exit() and
> >PRINTK_NMI_CONTEXT completely already now. It is enough
> >to check in_nmi() in printk_func().
> >
> 
> Agreed. in_nmi() within vprintk_emit() is enough to detect if the
> console code should be skipped:
> 
> if (!in_sched && !in_nmi()) {
> ...
> }

Well, we also need to make sure that the irq work is scheduled to
call the console later. We should keep this decision in
printk_func(). I mean to replace the current

if (this_cpu_read(printk_context) &
(PRINTK_NMI_DIRECT_CONTEXT_MASK |
 PRINTK_NMI_CONTEXT_MASK |
 PRINTK_SAFE_CONTEXT_MASK)) {

with

/*
 * Avoid calling console drivers in recursive printk()
 * and in NMI context.
 */
if (this_cpu_read(printk_context) || in_nmi() {

That said, I am not sure how this fits your further rework.
I do not want to complicate it too much.

I am just afraid that the discussion about console rework might
take some time. And this would remove some complexity before we
started the more complicated or controversial changes.


> > 2. I thought about unifying printk_safe_enter()/exit() and
> >printk_enter()/exit(). They both count recursion with
> >IRQs disabled, have similar name. But they are used
> >different way.
> >
> >But better might be to rename printk_safe_enter()/exit() to
> >console_enter()/exit() or to printk_deferred_enter()/exit().
> >It would make more clear what it does now. And it might help
> >to better distinguish it from the new printk_enter()/exit().
> >
> >I am not sure if it is worth it.
> 
> I am also not sure if it is worth the extra "noise" just to give the
> function a more appropriate name. The plan is to remove it completely
> soon anyway. My vote is to leave the name as it is.

OK, let's keep the printk_safe() name. It was just an idea. I wrote it
primarily to sort my thoughts.

Best Regards,
Petr


Re: [PATCH v4 39/46] KVM: PPC: Book3S HV: Remove virt mode checks from real mode handlers

2021-03-23 Thread Cédric Le Goater
On 3/23/21 2:02 AM, Nicholas Piggin wrote:
> Now that the P7/8 path no longer supports radix, real-mode handlers
> do not need to deal with being called in virt mode.
> 
> This change effectively reverts commit acde25726bc6 ("KVM: PPC: Book3S
> HV: Add radix checks in real-mode hypercall handlers").
> 
> It removes a few more real-mode tests in rm hcall handlers, which also
> allows the indirect ops for the xive module to be removed from the
> built-in xics rm handlers.
> 
> kvmppc_h_random is renamed to kvmppc_rm_h_random to be a bit more
> descriptive of its function.
> 
> Cc: Cédric Le Goater 
> Signed-off-by: Nicholas Piggin 

Reviewed-by: Cédric Le Goater 

> ---
>  arch/powerpc/include/asm/kvm_ppc.h  | 10 +--
>  arch/powerpc/kvm/book3s.c   | 11 +--
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 12 
>  arch/powerpc/kvm/book3s_hv_builtin.c| 91 ++---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  2 +-
>  arch/powerpc/kvm/book3s_xive.c  | 18 -
>  arch/powerpc/kvm/book3s_xive.h  |  7 --
>  arch/powerpc/kvm/book3s_xive_native.c   | 10 ---
>  8 files changed, 23 insertions(+), 138 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index db6646c2ade2..5dfb3f167f2c 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -659,8 +659,6 @@ extern int kvmppc_xive_get_xive(struct kvm *kvm, u32 irq, 
> u32 *server,
>   u32 *priority);
>  extern int kvmppc_xive_int_on(struct kvm *kvm, u32 irq);
>  extern int kvmppc_xive_int_off(struct kvm *kvm, u32 irq);
> -extern void kvmppc_xive_init_module(void);
> -extern void kvmppc_xive_exit_module(void);
>  
>  extern int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
>   struct kvm_vcpu *vcpu, u32 cpu);
> @@ -686,8 +684,6 @@ static inline int kvmppc_xive_enabled(struct kvm_vcpu 
> *vcpu)
>  extern int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
>  struct kvm_vcpu *vcpu, u32 cpu);
>  extern void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu);
> -extern void kvmppc_xive_native_init_module(void);
> -extern void kvmppc_xive_native_exit_module(void);
>  extern int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu,
>union kvmppc_one_reg *val);
>  extern int kvmppc_xive_native_set_vp(struct kvm_vcpu *vcpu,
> @@ -701,8 +697,6 @@ static inline int kvmppc_xive_get_xive(struct kvm *kvm, 
> u32 irq, u32 *server,
>  u32 *priority) { return -1; }
>  static inline int kvmppc_xive_int_on(struct kvm *kvm, u32 irq) { return -1; }
>  static inline int kvmppc_xive_int_off(struct kvm *kvm, u32 irq) { return -1; 
> }
> -static inline void kvmppc_xive_init_module(void) { }
> -static inline void kvmppc_xive_exit_module(void) { }
>  
>  static inline int kvmppc_xive_connect_vcpu(struct kvm_device *dev,
>  struct kvm_vcpu *vcpu, u32 cpu) { 
> return -EBUSY; }
> @@ -725,8 +719,6 @@ static inline int kvmppc_xive_enabled(struct kvm_vcpu 
> *vcpu)
>  static inline int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
> struct kvm_vcpu *vcpu, u32 cpu) { return -EBUSY; }
>  static inline void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu) { }
> -static inline void kvmppc_xive_native_init_module(void) { }
> -static inline void kvmppc_xive_native_exit_module(void) { }
>  static inline int kvmppc_xive_native_get_vp(struct kvm_vcpu *vcpu,
>   union kvmppc_one_reg *val)
>  { return 0; }
> @@ -762,7 +754,7 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
>  unsigned long tce_value, unsigned long npages);
>  long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int target,
>  unsigned int yield_count);
> -long kvmppc_h_random(struct kvm_vcpu *vcpu);
> +long kvmppc_rm_h_random(struct kvm_vcpu *vcpu);
>  void kvmhv_commence_exit(int trap);
>  void kvmppc_realmode_machine_check(struct kvm_vcpu *vcpu);
>  void kvmppc_subcore_enter_guest(void);
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index 44bf567b6589..1888aedfd410 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -1046,13 +1046,10 @@ static int kvmppc_book3s_init(void)
>  #ifdef CONFIG_KVM_XICS
>  #ifdef CONFIG_KVM_XIVE
>   if (xics_on_xive()) {
> - kvmppc_xive_init_module();
>   kvm_register_device_ops(&kvm_xive_ops, KVM_DEV_TYPE_XICS);
> - if (kvmppc_xive_native_supported()) {
> - kvmppc_xive_native_init_module();
> + if (kvmppc_xive_native_supported())
>   kvm_register_device_ops(&kvm_xive_native_ops,
>   KVM_DEV_TYPE_XIVE);
> - }
>   } else

Re: [PATCH v4 22/46] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

2021-03-23 Thread Cédric Le Goater
On 3/23/21 2:02 AM, Nicholas Piggin wrote:
> In the interest of minimising the amount of code that is run in
> "real-mode", don't handle hcalls in real mode in the P9 path.
> 
> POWER8 and earlier are much more expensive to exit from HV real mode
> and switch to host mode, because on those processors HV interrupts get
> to the hypervisor with the MMU off, and the other threads in the core
> need to be pulled out of the guest, and SLBs all need to be saved,
> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
> in host mode. Hash guests also require a lot of hcalls to run. The
> XICS interrupt controller requires hcalls to run.
> 
> By contrast, POWER9 has independent thread switching, and in radix mode
> the hypervisor is already in a host virtual memory mode when the HV
> interrupt is taken. Radix + xive guests don't need hcalls to handle
> interrupts or manage translations.
> 
> So it's much less important to handle hcalls in real mode in P9.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/include/asm/kvm_ppc.h  |  5 ++
>  arch/powerpc/kvm/book3s_hv.c| 57 
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  5 ++
>  arch/powerpc/kvm/book3s_xive.c  | 70 +
>  4 files changed, 127 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
> b/arch/powerpc/include/asm/kvm_ppc.h
> index 73b1ca5a6471..db6646c2ade2 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -607,6 +607,7 @@ extern void kvmppc_free_pimap(struct kvm *kvm);
>  extern int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall);
>  extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
>  extern int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd);
> +extern int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req);
>  extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
>  extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
>  extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
> @@ -639,6 +640,8 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu 
> *vcpu)
>  static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
>  static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
>   { return 0; }
> +static inline int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
> + { return 0; }
>  #endif
>  
>  #ifdef CONFIG_KVM_XIVE
> @@ -673,6 +676,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int 
> irq_source_id, u32 irq,
>  int level, bool line_status);
>  extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
>  extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
> +extern void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu);
>  
>  static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>  {
> @@ -714,6 +718,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, 
> int irq_source_id, u32 ir
> int level, bool line_status) { return 
> -ENODEV; }
>  static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
>  static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
> +static inline void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu) { }
>  
>  static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>   { return 0; }
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index fa7614c37e08..17739aaee3d8 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1142,12 +1142,13 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>  }
>  
>  /*
> - * Handle H_CEDE in the nested virtualization case where we haven't
> - * called the real-mode hcall handlers in book3s_hv_rmhandlers.S.
> + * Handle H_CEDE in the P9 path where we don't call the real-mode hcall
> + * handlers in book3s_hv_rmhandlers.S.
> + *
>   * This has to be done early, not in kvmppc_pseries_do_hcall(), so
>   * that the cede logic in kvmppc_run_single_vcpu() works properly.
>   */
> -static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
> +static void kvmppc_cede(struct kvm_vcpu *vcpu)
>  {
>   vcpu->arch.shregs.msr |= MSR_EE;
>   vcpu->arch.ceded = 1;
> @@ -1403,9 +1404,15 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
>   /* hcall - punt to userspace */
>   int i;
>  
> - /* hypercall with MSR_PR has already been handled in rmode,
> -  * and never reaches here.
> -  */
> + if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
> + /*
> +  * Guest userspace executed sc 1, reflect it back as a
> +  * privileged program check interrupt.
> +  */
> + kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
> + r = RESUME_GUEST;
> + break;
> + }
>  
>   run->papr

Re: [PATCH v11 0/6] KASAN for powerpc64 radix

2021-03-23 Thread Christophe Leroy




Le 23/03/2021 à 02:21, Daniel Axtens a écrit :

Hi Christophe,


In the discussion we had a long time ago,
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190806233827.16454-5-...@axtens.net/#2321067
, I challenged you on why it was not possible to implement things the same way 
as other
architectures, in extenso with an early mapping.

Your first answer was that too many things were done in real mode at startup. 
After some discussion
you said that finally there were not that many things at startup but the issue
was KVM.

Now you say that instrumentation on KVM is fully disabled.

So my question is, if KVM is not a problem anymore, why not go the standard way 
with an early shadow
? Then you could also support inline instrumentation.


Fair enough, I've had some trouble both understanding the problem myself
and clearly articulating it. Let me try again.

We need translations on to access the shadow area.

We reach setup_64.c::early_setup() with translations off. At this point
we don't know what MMU we're running under, or our CPU features.


What do you need to know ? Whether it is Hash or Radix, or more/different 
details ?

IIUC, today we only support KASAN on Radix. Would it make sense to say that a kernel built with 
KASAN can only run on processors having Radix capability ? Then select CONFIG_PPC_RADIX_MMU_DEFAULT 
when KASAN is set, and accept that the kernel crashes if Radix is not available ?




To determine our MMU and CPU features, early_setup() calls functions
(dt_cpu_ftrs_init, early_init_devtree) that call out to generic code
like of_scan_flat_dt. We need to do this before we turn on translations
because we can't set up the MMU until we know what MMU we have.

So this puts us in a bind:

  - We can't set up an early shadow until we have translations on, which
requires that the MMU is set up.

  - We can't set up an MMU until we call out to generic code for FDT
parsing.

So there will be calls to generic FDT parsing code that happen before the
early shadow is set up.


I see some logic in kernel/prom_init.c for detecting the MMU. Can we get the information from there in 
order to set up the MMU ?




The setup code also prints a bunch of information about the platform
with printk() while translations are off, so it wouldn't even be enough
to disable instrumentation for bits of the generic DT code on ppc64.


I'm sure the printk() stuff can be avoided or delayed without much trouble; I guess the main 
problem is the DT code, isn't it ?


As far as I can see the code only uses udbg_printf() before the MMU is on, and this could be simply 
skipped when KASAN is selected; I see no situation where you need early printk together with KASAN.




Does that make sense? If you can figure out how to 'square the circle'
here I'm all ears.


Yes, it is a lot clearer now, thank you. I gave a few ideas above, does it 
help ?



Other notes:

  - There's a comment about printk() being 'safe' in early_setup(), that
refers to having a valid PACA, it doesn't mean that it's safe in any
other sense.

  - KVM does indeed also run stuff with translations off but we can catch
all of that by disabling instrumentation on the real-mode handlers:
it doesn't seem to leak out to generic code. So you are right that
KVM is no longer an issue.



Christophe


[PATCH] soc/fsl: qbman: fix conflicting alignment attributes

2021-03-23 Thread Arnd Bergmann
From: Arnd Bergmann 

When building with W=1, gcc points out that the __packed attribute
on struct qm_eqcr_entry conflicts with the 8-byte alignment
attribute on struct qm_fd inside it:

drivers/soc/fsl/qbman/qman.c:189:1: error: alignment 1 of 'struct 
qm_eqcr_entry' is less than 8 [-Werror=packed-not-aligned]

I assume that the alignment attribute is the correct one, and
that qm_eqcr_entry cannot actually be unaligned in memory,
so add the same alignment on the outer struct.

Fixes: c535e923bb97 ("soc/fsl: Introduce DPAA 1.x QMan device driver")
Signed-off-by: Arnd Bergmann 
---
 drivers/soc/fsl/qbman/qman.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/soc/fsl/qbman/qman.c b/drivers/soc/fsl/qbman/qman.c
index a1b9be1d105a..fde4edd83c14 100644
--- a/drivers/soc/fsl/qbman/qman.c
+++ b/drivers/soc/fsl/qbman/qman.c
@@ -186,7 +186,7 @@ struct qm_eqcr_entry {
__be32 tag;
struct qm_fd fd;
u8 __reserved3[32];
-} __packed;
+} __packed __aligned(8);
 #define QM_EQCR_VERB_VBIT  0x80
 #define QM_EQCR_VERB_CMD_MASK  0x61/* but only one value; */
 #define QM_EQCR_VERB_CMD_ENQUEUE   0x01
-- 
2.29.2
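
A minimal, self-contained sketch (my own illustration, not the driver code) of
the conflict gcc reports here with -Wpacked-not-aligned, as enabled by W=1; the
field names and sizes below are only an approximation of the qm_eqcr_entry
layout:

#include <stdint.h>

struct inner {
	uint64_t a;
} __attribute__((aligned(8)));		/* this type requires 8-byte alignment */

struct outer {
	uint32_t fqid;
	uint32_t tag;
	struct inner fd;		/* sits at an 8-byte offset in the struct */
	uint8_t reserved[8];
} __attribute__((packed));		/* gcc: alignment 1 of 'struct outer' is
					   less than 8 [-Wpacked-not-aligned]   */

/* Declaring it "packed, aligned(8)" keeps the member layout identical but
 * restores the 8-byte alignment guarantee for the struct as a whole, which
 * is the approach the patch takes for qm_eqcr_entry. */
struct outer_fixed {
	uint32_t fqid;
	uint32_t tag;
	struct inner fd;
	uint8_t reserved[8];
} __attribute__((packed, aligned(8)));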



Re: [PATCH 0/4] Rust for Linux for ppc64le

2021-03-23 Thread Michael Ellerman
Miguel Ojeda  writes:
> Hi Michael,
>
> On Tue, Mar 23, 2021 at 4:27 AM Michael Ellerman  wrote:
>>
>> Hi all,
>>
>> Here's a first attempt at getting the kernel Rust support building on 
>> powerpc.
>
> Thanks a *lot*! It is great to have more architectures rolling.

No worries.

>> It's powerpc64le only for now, as that's what I can easily test given the
>> distros I have installed. Though powerpc and powerpc64 are also Tier 2 
>> platforms
>
> Even if it is just 64-bit, it is very good to have it!
>
>> so in theory should work. Supporting those would require something more
>> complicated than just pointing rustc at arch/$(ARCH)/rust/target.json.
>
> Yeah, the arch/$(ARCH)/rust/target.json dance is a placeholder -- I
> need to figure out how to do that more cleanly, likely generating them
> on the fly.

Yeah that's a good idea. That way they can be made to exactly match the
kernel configuration.

>> This is based on 832575d934a2 from the Rust-for-Linux tree. Anything newer 
>> gives
>> me errors about symbol name lengths. I figured I'd send this anyway, as it 
>> seems
>> like those errors are probably not powerpc specific.
>
> Sure, feel free to send things even if they don't work completely.
>
> I will take a look at the symbol name lengths -- I increased that
> limit to 512 and added support for 2-byte lengths in the tables, but
> perhaps something is missing. If I manage to make it work, I can add
> ppc64le to our CI! :-)

It would be nice to be in the CI. I was building natively so I haven't
tried cross compiling yet (which we'll need for CI).

>> Michael Ellerman (4):
>>   rust: Export symbols in initialized data section
>>   rust: Add powerpc64 as a 64-bit target_arch in c_types.rs
>>   powerpc/rust: Add target.json for ppc64le
>>   rust: Enable for ppc64le
>
> Regarding the development process: at least until the RFC we are
> working with the usual GitHub PR workflow (for several reasons: having
> a quick CI setup, getting new Rust developers on-board, having a list
> of "issues", cross-reference with the Rust repo, etc.).
>
> I can take patches from the list, of course, but since we are pre-RFC,
> do you mind if they get rebased etc. through there?

No I don't mind at all. I just sent patches so other ppc folks could see
what I had, and it's kind of the process I'm used to.

I can send a pull request if that's easiest.

cheers


Re: [PATCH 0/2] handle premature return from H_JOIN in pseries mobility code

2021-03-23 Thread Michael Ellerman
On Mon, 15 Mar 2021 03:00:43 -0500, Nathan Lynch wrote:
> pseries VMs in shared processor mode are susceptible to failed
> migrations because stray H_PRODs from the paravirt spinlock
> implementation can bump threads out of joining state before the
> suspend has occurred. Fix this by adding a small amount of shared
> state and ordering accesses to it with respect to H_PROD and H_JOIN.
> 
> Nathan Lynch (2):
>   powerpc/pseries/mobility: use struct for shared state
>   powerpc/pseries/mobility: handle premature return from H_JOIN
> 
> [...]

Applied to powerpc/fixes.

[1/2] powerpc/pseries/mobility: use struct for shared state
  https://git.kernel.org/powerpc/c/e834df6cfc71d8e5ce2c27a0184145ea125c3f0f
[2/2] powerpc/pseries/mobility: handle premature return from H_JOIN
  https://git.kernel.org/powerpc/c/274cb1ca2e7ce02cab56f5f4c61a74aeb566f931

cheers


Re: [PATCH v4 28/46] KVM: PPC: Book3S HV P9: Reduce irq_work vs guest decrementer races

2021-03-23 Thread Nicholas Piggin
Excerpts from Nicholas Piggin's message of March 23, 2021 8:36 pm:
> Excerpts from Alexey Kardashevskiy's message of March 23, 2021 8:13 pm:
>> 
>> 
>> On 23/03/2021 12:02, Nicholas Piggin wrote:
>>> irq_work's use of the DEC SPR is racy with guest<->host switch and guest
>>> entry which flips the DEC interrupt to guest, which could lose a host
>>> work interrupt.
>>> 
>>> This patch closes one race, and attempts to comment another class of
>>> races.
>>> 
>>> Signed-off-by: Nicholas Piggin 
>>> ---
>>>   arch/powerpc/kvm/book3s_hv.c | 15 ++-
>>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>> index 1f38a0abc611..989a1ff5ad11 100644
>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>> @@ -3745,6 +3745,18 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu 
>>> *vcpu, u64 time_limit,
>>> if (!(vcpu->arch.ctrl & 1))
>>> mtspr(SPRN_CTRLT, mfspr(SPRN_CTRLF) & ~1);
>>>   
>>> +   /*
>>> +* When setting DEC, we must always deal with irq_work_raise via NMI vs
>>> +* setting DEC. The problem occurs right as we switch into guest mode
>>> +* if a NMI hits and sets pending work and sets DEC, then that will
>>> +* apply to the guest and not bring us back to the host.
>>> +*
>>> +* irq_work_raise could check a flag (or possibly LPCR[HDICE] for
>>> +* example) and set HDEC to 1? That wouldn't solve the nested hv
>>> +* case which needs to abort the hcall or zero the time limit.
>>> +*
>>> +* XXX: Another day's problem.
>>> +*/
>>> mtspr(SPRN_DEC, vcpu->arch.dec_expires - tb);
>>>   
>>> if (kvmhv_on_pseries()) {
>>> @@ -3879,7 +3891,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu 
>>> *vcpu, u64 time_limit,
>>> vc->entry_exit_map = 0x101;
>>> vc->in_guest = 0;
>>>   
>>> -   mtspr(SPRN_DEC, local_paca->kvm_hstate.dec_expires - tb);
>>> +   set_dec_or_work(local_paca->kvm_hstate.dec_expires - tb);
>> 
>> 
>> set_dec_or_work() will write local_paca->kvm_hstate.dec_expires - tb - 1 
>> to SPRN_DEC which is not exactly the same, is this still alright?
>> 
>> I asked in v3 but it is probably lost :)
> 
> Oh I did see that then forgot.
> 
> It will write dec_expires - tb, then it will write 1 if it found irq_work
> was pending.

Ah you were actually asking about set_dec writing val - 1. I totally 
missed that.

Yes that was an unintentional change. This is the way timer.c code works 
with respect to the decrementers_next_tb value, so it's probably better 
to make them match, and it seems like it should be okay (and better to bring the 
KVM code up to match the timer code rather than be different or the other
way around). The difference should be noted in the changelog though.

Thanks,
Nick
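
For reference, the extra -1 that Alexey noticed comes from the generic
set_dec() helper rather than from the new set_dec_or_work() logic itself.
Roughly (a sketch from memory of the non-BookE variant, not quoted verbatim
from arch/powerpc/include/asm/time.h):

static inline void set_dec(u64 val)
{
	/*
	 * The decrementer exception is raised when DEC goes negative, so
	 * programming val - 1 makes it fire roughly val ticks from now.
	 */
	mtspr(SPRN_DEC, val - 1);
}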


Re: [PATCH v4 28/46] KVM: PPC: Book3S HV P9: Reduce irq_work vs guest decrementer races

2021-03-23 Thread Nicholas Piggin
Excerpts from Alexey Kardashevskiy's message of March 23, 2021 8:13 pm:
> 
> 
> On 23/03/2021 12:02, Nicholas Piggin wrote:
>> irq_work's use of the DEC SPR is racy with guest<->host switch and guest
>> entry which flips the DEC interrupt to guest, which could lose a host
>> work interrupt.
>> 
>> This patch closes one race, and attempts to comment another class of
>> races.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>   arch/powerpc/kvm/book3s_hv.c | 15 ++-
>>   1 file changed, 14 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 1f38a0abc611..989a1ff5ad11 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -3745,6 +3745,18 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu 
>> *vcpu, u64 time_limit,
>>  if (!(vcpu->arch.ctrl & 1))
>>  mtspr(SPRN_CTRLT, mfspr(SPRN_CTRLF) & ~1);
>>   
>> +/*
>> + * When setting DEC, we must always deal with irq_work_raise via NMI vs
>> + * setting DEC. The problem occurs right as we switch into guest mode
>> + * if a NMI hits and sets pending work and sets DEC, then that will
>> + * apply to the guest and not bring us back to the host.
>> + *
>> + * irq_work_raise could check a flag (or possibly LPCR[HDICE] for
>> + * example) and set HDEC to 1? That wouldn't solve the nested hv
>> + * case which needs to abort the hcall or zero the time limit.
>> + *
>> + * XXX: Another day's problem.
>> + */
>>  mtspr(SPRN_DEC, vcpu->arch.dec_expires - tb);
>>   
>>  if (kvmhv_on_pseries()) {
>> @@ -3879,7 +3891,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
>> u64 time_limit,
>>  vc->entry_exit_map = 0x101;
>>  vc->in_guest = 0;
>>   
>> -mtspr(SPRN_DEC, local_paca->kvm_hstate.dec_expires - tb);
>> +set_dec_or_work(local_paca->kvm_hstate.dec_expires - tb);
> 
> 
> set_dec_or_work() will write local_paca->kvm_hstate.dec_expires - tb - 1 
> to SPRN_DEC which is not exactly the same, is this still alright?
> 
> I asked in v3 but it is probably lost :)

Oh I did see that then forgot.

It will write dec_expires - tb, then it will write 1 if it found irq_work
was pending.

The change is intentional; it fixes one of the lost irq_work races.
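
Roughly, the lost-interrupt scenario this closes (an illustrative timeline,
not text from the patch):

/*
 * host:  computes val = dec_expires - tb
 * NMI:   irq_work_raise() sets the pending flag and programs a short DEC
 *        so the host takes a decrementer interrupt soon
 * host:  mtspr(SPRN_DEC, val) overwrites the NMI's short DEC, so the
 *        pending irq_work is not noticed until much later (or the
 *        interrupt is taken by the guest instead of the host)
 *
 * Re-checking the pending flag after the mtspr and forcing DEC to 1
 * means the work interrupt is not lost.
 */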

Thanks,
Nick


[PATCH] sound:ppc: fix spelling typo of values

2021-03-23 Thread caizhichao
From: caizhichao 

vaules -> values

Signed-off-by: caizhichao 
---
 sound/ppc/snd_ps3_reg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/ppc/snd_ps3_reg.h b/sound/ppc/snd_ps3_reg.h
index 566a318..e2212b7 100644
--- a/sound/ppc/snd_ps3_reg.h
+++ b/sound/ppc/snd_ps3_reg.h
@@ -308,7 +308,7 @@
 each interrupt in this register.
 Writing 1b to a field containing 1b clears field and de-asserts interrupt.
 Writing 0b to a field has no effect.
-Field vaules are the following:
+Field values are the following:
 0 - Interrupt hasn't occurred.
 1 - Interrupt has occurred.
 
-- 
1.9.1




Re: [PATCH v4 28/46] KVM: PPC: Book3S HV P9: Reduce irq_work vs guest decrementer races

2021-03-23 Thread Alexey Kardashevskiy




On 23/03/2021 12:02, Nicholas Piggin wrote:

irq_work's use of the DEC SPR is racy with guest<->host switch and guest
entry which flips the DEC interrupt to guest, which could lose a host
work interrupt.

This patch closes one race, and attempts to comment another class of
races.

Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/kvm/book3s_hv.c | 15 ++-
  1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 1f38a0abc611..989a1ff5ad11 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3745,6 +3745,18 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
if (!(vcpu->arch.ctrl & 1))
mtspr(SPRN_CTRLT, mfspr(SPRN_CTRLF) & ~1);
  
+	/*

+* When setting DEC, we must always deal with irq_work_raise via NMI vs
+* setting DEC. The problem occurs right as we switch into guest mode
+* if a NMI hits and sets pending work and sets DEC, then that will
+* apply to the guest and not bring us back to the host.
+*
+* irq_work_raise could check a flag (or possibly LPCR[HDICE] for
+* example) and set HDEC to 1? That wouldn't solve the nested hv
+* case which needs to abort the hcall or zero the time limit.
+*
+* XXX: Another day's problem.
+*/
mtspr(SPRN_DEC, vcpu->arch.dec_expires - tb);
  
  	if (kvmhv_on_pseries()) {

@@ -3879,7 +3891,8 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, 
u64 time_limit,
vc->entry_exit_map = 0x101;
vc->in_guest = 0;
  
-	mtspr(SPRN_DEC, local_paca->kvm_hstate.dec_expires - tb);

+   set_dec_or_work(local_paca->kvm_hstate.dec_expires - tb);



set_dec_or_work() will write local_paca->kvm_hstate.dec_expires - tb - 1 
to SPRN_DEC which is not exactly the same, is this still alright?


I asked in v3 but it is probably lost :)


+
mtspr(SPRN_SPRG_VDSO_WRITE, local_paca->sprg_vdso);
  
  	kvmhv_load_host_pmu();




--
Alexey


Re: [PATCH 0/4] Rust for Linux for ppc64le

2021-03-23 Thread Miguel Ojeda
Hi Michael,

On Tue, Mar 23, 2021 at 4:27 AM Michael Ellerman  wrote:
>
> Hi all,
>
> Here's a first attempt at getting the kernel Rust support building on powerpc.

Thanks a *lot*! It is great to have more architectures rolling.

> It's powerpc64le only for now, as that's what I can easily test given the
> distros I have installed. Though powerpc and powerpc64 are also Tier 2 
> platforms

Even if it is just 64-bit, it is very good to have it!

> so in theory should work. Supporting those would require something more
> complicated than just pointing rustc at arch/$(ARCH)/rust/target.json.

Yeah, the arch/$(ARCH)/rust/target.json dance is a placeholder -- I
need to figure out how to do that more cleanly, likely generating them
on the fly.

> This is based on 832575d934a2 from the Rust-for-Linux tree. Anything newer 
> gives
> me errors about symbol name lengths. I figured I'd send this anyway, as it 
> seems
> like those errors are probably not powerpc specific.

Sure, feel free to send things even if they don't work completely.

I will take a look at the symbol name lengths -- I increased that
limit to 512 and added support for 2-byte lengths in the tables, but
perhaps something is missing. If I manage to make it work, I can add
ppc64le to our CI! :-)

> Michael Ellerman (4):
>   rust: Export symbols in initialized data section
>   rust: Add powerpc64 as a 64-bit target_arch in c_types.rs
>   powerpc/rust: Add target.json for ppc64le
>   rust: Enable for ppc64le

Regarding the development process: at least until the RFC we are
working with the usual GitHub PR workflow (for several reasons: having
a quick CI setup, getting new Rust developers on-board, having a list
of "issues", cross-reference with the Rust repo, etc.).

I can take patches from the list, of course, but since we are pre-RFC,
do you mind if they get rebased etc. through there?

Thanks again!

Cheers,
Miguel


Re: [PATCH v4 22/46] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

2021-03-23 Thread Nicholas Piggin
Excerpts from Alexey Kardashevskiy's message of March 23, 2021 7:24 pm:
> 
> 
> On 23/03/2021 20:16, Nicholas Piggin wrote:
>> Excerpts from Alexey Kardashevskiy's message of March 23, 2021 7:02 pm:
>>>
>>>
>>> On 23/03/2021 12:02, Nicholas Piggin wrote:
 diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
 b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 index c11597f815e4..2d0d14ed1d92 100644
 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
 @@ -1397,9 +1397,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
mr  r4,r9
bge fast_guest_return
2:
 +  /* If we came in through the P9 short path, no real mode hcalls */
 +  lwz r0, STACK_SLOT_SHORT_PATH(r1)
 +  cmpwi   r0, 0
 +  bne no_try_real
>>>
>>>
>>> btw is mmu on at this point? or it gets enabled by rfid at the end of
>>> guest_exit_short_path?
>> 
>> Hash guest it's off. Radix guest it can be on or off depending on the
>> interrupt type and MSR and LPCR[AIL] values.
> 
> What I meant was - what do we expect here on p9? mmu on? ^w^w^w^w^w^w^w^w^w

P9 radix can be on or off. If the guest had MSR[IR] or MSR[DR] clear, or 
if the guest is running AIL=0 mode, or if this is a machine check, 
system reset, or HMI interrupt then the MMU will be off here.

> I just realized - it is radix, so there is no problem with vmalloc
> addresses in real mode, as these do not use the top 2 bits as on hash,
> and the exact mmu state is less important here. Cheers.

We still can't use vmalloc addresses in real mode on radix because they 
don't translate with the page tables.
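
A tiny illustration of that constraint (hypothetical helper, not from the
series): with the MMU off, only linear-map addresses are usable, because
they translate by a fixed offset rather than through the page tables.

#include <linux/mm.h>

/* Sketch: is this pointer safe to dereference with MSR[DR] = 0? */
static bool real_mode_deref_ok(const void *p)
{
	/* Linear-map addresses translate as virt - PAGE_OFFSET and so
	 * work in real mode; vmalloc addresses only exist in the page
	 * tables and do not. */
	return virt_addr_valid(p) && !is_vmalloc_addr(p);
}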

Thanks,
Nick


Re: [PATCH 3/4] powerpc/rust: Add target.json for ppc64le

2021-03-23 Thread Miguel Ojeda
On Tue, Mar 23, 2021 at 4:27 AM Michael Ellerman  wrote:
>
> ppc64le only for now. We'll eventually need to come up with some way to
> change the target.json that's used based on more than just $(ARCH).

Indeed, it is one reason I didn't tackle e.g. x86 32-bit, because I
wanted to figure out how to do the whole `target.json` cleanly (i.e.
likely have a script generate them on the fly), so I thought it was
better to wait post-RFC.

Cheers,
Miguel


Re: [PATCH v4 22/46] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

2021-03-23 Thread Alexey Kardashevskiy




On 23/03/2021 20:16, Nicholas Piggin wrote:

Excerpts from Alexey Kardashevskiy's message of March 23, 2021 7:02 pm:



On 23/03/2021 12:02, Nicholas Piggin wrote:

In the interest of minimising the amount of code that is run in
"real-mode", don't handle hcalls in real mode in the P9 path.

POWER8 and earlier are much more expensive to exit from HV real mode
and switch to host mode, because on those processors HV interrupts get
to the hypervisor with the MMU off, and the other threads in the core
need to be pulled out of the guest, and SLBs all need to be saved,
ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
in host mode. Hash guests also require a lot of hcalls to run. The
XICS interrupt controller requires hcalls to run.

By contrast, POWER9 has independent thread switching, and in radix mode
the hypervisor is already in a host virtual memory mode when the HV
interrupt is taken. Radix + xive guests don't need hcalls to handle
interrupts or manage translations.

So it's much less important to handle hcalls in real mode in P9.

Signed-off-by: Nicholas Piggin 
---
   arch/powerpc/include/asm/kvm_ppc.h  |  5 ++
   arch/powerpc/kvm/book3s_hv.c| 57 
   arch/powerpc/kvm/book3s_hv_rmhandlers.S |  5 ++
   arch/powerpc/kvm/book3s_xive.c  | 70 +
   4 files changed, 127 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 73b1ca5a6471..db6646c2ade2 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -607,6 +607,7 @@ extern void kvmppc_free_pimap(struct kvm *kvm);
   extern int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall);
   extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
   extern int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd);
+extern int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req);
   extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
   extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
   extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
@@ -639,6 +640,8 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
   static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
   static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
{ return 0; }
+static inline int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
+   { return 0; }
   #endif
   
   #ifdef CONFIG_KVM_XIVE

@@ -673,6 +676,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int 
irq_source_id, u32 irq,
   int level, bool line_status);
   extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
   extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
+extern void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu);
   
   static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)

   {
@@ -714,6 +718,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, int 
irq_source_id, u32 ir
  int level, bool line_status) { return 
-ENODEV; }
   static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
   static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
+static inline void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu) { }
   
   static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)

{ return 0; }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index fa7614c37e08..17739aaee3d8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1142,12 +1142,13 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
   }
   
   /*

- * Handle H_CEDE in the nested virtualization case where we haven't
- * called the real-mode hcall handlers in book3s_hv_rmhandlers.S.
+ * Handle H_CEDE in the P9 path where we don't call the real-mode hcall
+ * handlers in book3s_hv_rmhandlers.S.
+ *
* This has to be done early, not in kvmppc_pseries_do_hcall(), so
* that the cede logic in kvmppc_run_single_vcpu() works properly.
*/
-static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
+static void kvmppc_cede(struct kvm_vcpu *vcpu)
   {
vcpu->arch.shregs.msr |= MSR_EE;
vcpu->arch.ceded = 1;
@@ -1403,9 +1404,15 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
/* hcall - punt to userspace */
int i;
   
-		/* hypercall with MSR_PR has already been handled in rmode,

-* and never reaches here.
-*/
+   if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
+   /*
+* Guest userspace executed sc 1, reflect it back as a
+* privileged program check interrupt.
+*/
+   kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
+   r = RESUME_GUEST;
+   break;
+   }
   
   		run->

Re: [PATCH v4 22/46] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

2021-03-23 Thread Nicholas Piggin
Excerpts from Alexey Kardashevskiy's message of March 23, 2021 7:02 pm:
> 
> 
> On 23/03/2021 12:02, Nicholas Piggin wrote:
>> In the interest of minimising the amount of code that is run in
>> "real-mode", don't handle hcalls in real mode in the P9 path.
>> 
>> POWER8 and earlier are much more expensive to exit from HV real mode
>> and switch to host mode, because on those processors HV interrupts get
>> to the hypervisor with the MMU off, and the other threads in the core
>> need to be pulled out of the guest, and SLBs all need to be saved,
>> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
>> in host mode. Hash guests also require a lot of hcalls to run. The
>> XICS interrupt controller requires hcalls to run.
>> 
>> By contrast, POWER9 has independent thread switching, and in radix mode
>> the hypervisor is already in a host virtual memory mode when the HV
>> interrupt is taken. Radix + xive guests don't need hcalls to handle
>> interrupts or manage translations.
>> 
>> So it's much less important to handle hcalls in real mode in P9.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>>   arch/powerpc/include/asm/kvm_ppc.h  |  5 ++
>>   arch/powerpc/kvm/book3s_hv.c| 57 
>>   arch/powerpc/kvm/book3s_hv_rmhandlers.S |  5 ++
>>   arch/powerpc/kvm/book3s_xive.c  | 70 +
>>   4 files changed, 127 insertions(+), 10 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
>> b/arch/powerpc/include/asm/kvm_ppc.h
>> index 73b1ca5a6471..db6646c2ade2 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -607,6 +607,7 @@ extern void kvmppc_free_pimap(struct kvm *kvm);
>>   extern int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall);
>>   extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
>>   extern int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd);
>> +extern int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req);
>>   extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
>>   extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
>>   extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
>> @@ -639,6 +640,8 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu 
>> *vcpu)
>>   static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
>>   static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
>>  { return 0; }
>> +static inline int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
>> +{ return 0; }
>>   #endif
>>   
>>   #ifdef CONFIG_KVM_XIVE
>> @@ -673,6 +676,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int 
>> irq_source_id, u32 irq,
>> int level, bool line_status);
>>   extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
>>   extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
>> +extern void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu);
>>   
>>   static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>>   {
>> @@ -714,6 +718,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, 
>> int irq_source_id, u32 ir
>>int level, bool line_status) { return 
>> -ENODEV; }
>>   static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
>>   static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
>> +static inline void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu) { }
>>   
>>   static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)
>>  { return 0; }
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index fa7614c37e08..17739aaee3d8 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -1142,12 +1142,13 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>>   }
>>   
>>   /*
>> - * Handle H_CEDE in the nested virtualization case where we haven't
>> - * called the real-mode hcall handlers in book3s_hv_rmhandlers.S.
>> + * Handle H_CEDE in the P9 path where we don't call the real-mode hcall
>> + * handlers in book3s_hv_rmhandlers.S.
>> + *
>>* This has to be done early, not in kvmppc_pseries_do_hcall(), so
>>* that the cede logic in kvmppc_run_single_vcpu() works properly.
>>*/
>> -static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
>> +static void kvmppc_cede(struct kvm_vcpu *vcpu)
>>   {
>>  vcpu->arch.shregs.msr |= MSR_EE;
>>  vcpu->arch.ceded = 1;
>> @@ -1403,9 +1404,15 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu 
>> *vcpu,
>>  /* hcall - punt to userspace */
>>  int i;
>>   
>> -/* hypercall with MSR_PR has already been handled in rmode,
>> - * and never reaches here.
>> - */
>> +if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
>> +/*
>> + * Guest userspace executed sc 1, reflect it back as a
>> + * privileged program check interrupt.
>> +  

[PATCH v2 -next] powerpc: kernel/time.c - cleanup warnings

2021-03-23 Thread He Ying
We found these warnings in arch/powerpc/kernel/time.c as follows:
warning: symbol 'decrementer_max' was not declared. Should it be static?
warning: symbol 'rtc_lock' was not declared. Should it be static?
warning: symbol 'dtl_consumer' was not declared. Should it be static?

Declare 'decrementer_max' and 'rtc_lock' in powerpc asm/time.h.
Rename 'rtc_lock' in drivers/rtc/rtc-vr41xx.c to 'vr41xx_rtc_lock' to
avoid the conflict with the variable in powerpc asm/time.h.
Move 'dtl_consumer' definition behind "include " because it
is declared there.

Reported-by: Hulk Robot 
Signed-off-by: He Ying 
---
v2:
- Instead of including linux/mc146818rtc.h in powerpc kernel/time.c, declare
  rtc_lock in powerpc asm/time.h.

 arch/powerpc/include/asm/time.h |  3 +++
 arch/powerpc/kernel/time.c  |  6 ++
 drivers/rtc/rtc-vr41xx.c| 22 +++---
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 8dd3cdb25338..64a3ef0b4270 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -12,6 +12,7 @@
 #ifdef __KERNEL__
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -22,6 +23,8 @@ extern unsigned long tb_ticks_per_jiffy;
 extern unsigned long tb_ticks_per_usec;
 extern unsigned long tb_ticks_per_sec;
 extern struct clock_event_device decrementer_clockevent;
+extern u64 decrementer_max;
+extern spinlock_t rtc_lock;
 
 
 extern void generic_calibrate_decr(void);
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index b67d93a609a2..60b6ac7d3685 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -150,10 +150,6 @@ bool tb_invalid;
 u64 __cputime_usec_factor;
 EXPORT_SYMBOL(__cputime_usec_factor);
 
-#ifdef CONFIG_PPC_SPLPAR
-void (*dtl_consumer)(struct dtl_entry *, u64);
-#endif
-
 static void calc_cputime_factors(void)
 {
struct div_result res;
@@ -179,6 +175,8 @@ static inline unsigned long read_spurr(unsigned long tb)
 
 #include 
 
+void (*dtl_consumer)(struct dtl_entry *, u64);
+
 /*
  * Scan the dispatch trace log and count up the stolen time.
  * Should be called with interrupts disabled.
diff --git a/drivers/rtc/rtc-vr41xx.c b/drivers/rtc/rtc-vr41xx.c
index 5a9f9ad86d32..cc31db058197 100644
--- a/drivers/rtc/rtc-vr41xx.c
+++ b/drivers/rtc/rtc-vr41xx.c
@@ -72,7 +72,7 @@ static void __iomem *rtc2_base;
 
 static unsigned long epoch = 1970; /* Jan 1 1970 00:00:00 */
 
-static DEFINE_SPINLOCK(rtc_lock);
+static DEFINE_SPINLOCK(vr41xx_rtc_lock);
 static char rtc_name[] = "RTC";
 static unsigned long periodic_count;
 static unsigned int alarm_enabled;
@@ -101,13 +101,13 @@ static inline time64_t read_elapsed_second(void)
 
 static inline void write_elapsed_second(time64_t sec)
 {
-   spin_lock_irq(&rtc_lock);
+   spin_lock_irq(&vr41xx_rtc_lock);
 
rtc1_write(ETIMELREG, (uint16_t)(sec << 15));
rtc1_write(ETIMEMREG, (uint16_t)(sec >> 1));
rtc1_write(ETIMEHREG, (uint16_t)(sec >> 17));
 
-   spin_unlock_irq(&rtc_lock);
+   spin_unlock_irq(&vr41xx_rtc_lock);
 }
 
 static int vr41xx_rtc_read_time(struct device *dev, struct rtc_time *time)
@@ -139,14 +139,14 @@ static int vr41xx_rtc_read_alarm(struct device *dev, 
struct rtc_wkalrm *wkalrm)
unsigned long low, mid, high;
struct rtc_time *time = &wkalrm->time;
 
-   spin_lock_irq(&rtc_lock);
+   spin_lock_irq(&vr41xx_rtc_lock);
 
low = rtc1_read(ECMPLREG);
mid = rtc1_read(ECMPMREG);
high = rtc1_read(ECMPHREG);
wkalrm->enabled = alarm_enabled;
 
-   spin_unlock_irq(&rtc_lock);
+   spin_unlock_irq(&vr41xx_rtc_lock);
 
rtc_time64_to_tm((high << 17) | (mid << 1) | (low >> 15), time);
 
@@ -159,7 +159,7 @@ static int vr41xx_rtc_set_alarm(struct device *dev, struct 
rtc_wkalrm *wkalrm)
 
alarm_sec = rtc_tm_to_time64(&wkalrm->time);
 
-   spin_lock_irq(&rtc_lock);
+   spin_lock_irq(&vr41xx_rtc_lock);
 
if (alarm_enabled)
disable_irq(aie_irq);
@@ -173,7 +173,7 @@ static int vr41xx_rtc_set_alarm(struct device *dev, struct 
rtc_wkalrm *wkalrm)
 
alarm_enabled = wkalrm->enabled;
 
-   spin_unlock_irq(&rtc_lock);
+   spin_unlock_irq(&vr41xx_rtc_lock);
 
return 0;
 }
@@ -202,7 +202,7 @@ static int vr41xx_rtc_ioctl(struct device *dev, unsigned 
int cmd, unsigned long
 
 static int vr41xx_rtc_alarm_irq_enable(struct device *dev, unsigned int 
enabled)
 {
-   spin_lock_irq(&rtc_lock);
+   spin_lock_irq(&vr41xx_rtc_lock);
if (enabled) {
if (!alarm_enabled) {
enable_irq(aie_irq);
@@ -214,7 +214,7 @@ static int vr41xx_rtc_alarm_irq_enable(struct device *dev, 
unsigned int enabled)
alarm_enabled = 0;
}
}
-   spin_unlock_irq(&rtc_lock);
+   spin_unlock_irq(&vr41xx_rtc_lock);
return 0;
 }
 
@@ -296,7 +296,7 @@ static 

Re: [PATCH v4 22/46] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

2021-03-23 Thread Alexey Kardashevskiy




On 23/03/2021 12:02, Nicholas Piggin wrote:

In the interest of minimising the amount of code that is run in
"real-mode", don't handle hcalls in real mode in the P9 path.

POWER8 and earlier are much more expensive to exit from HV real mode
and switch to host mode, because on those processors HV interrupts get
to the hypervisor with the MMU off, and the other threads in the core
need to be pulled out of the guest, and SLBs all need to be saved,
ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
in host mode. Hash guests also require a lot of hcalls to run. The
XICS interrupt controller requires hcalls to run.

By contrast, POWER9 has independent thread switching, and in radix mode
the hypervisor is already in a host virtual memory mode when the HV
interrupt is taken. Radix + xive guests don't need hcalls to handle
interrupts or manage translations.

So it's much less important to handle hcalls in real mode in P9.

Signed-off-by: Nicholas Piggin 
---
  arch/powerpc/include/asm/kvm_ppc.h  |  5 ++
  arch/powerpc/kvm/book3s_hv.c| 57 
  arch/powerpc/kvm/book3s_hv_rmhandlers.S |  5 ++
  arch/powerpc/kvm/book3s_xive.c  | 70 +
  4 files changed, 127 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 73b1ca5a6471..db6646c2ade2 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -607,6 +607,7 @@ extern void kvmppc_free_pimap(struct kvm *kvm);
  extern int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall);
  extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
  extern int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd);
+extern int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req);
  extern u64 kvmppc_xics_get_icp(struct kvm_vcpu *vcpu);
  extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
  extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
@@ -639,6 +640,8 @@ static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
  static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
  static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
{ return 0; }
+static inline int kvmppc_xive_xics_hcall(struct kvm_vcpu *vcpu, u32 req)
+   { return 0; }
  #endif
  
  #ifdef CONFIG_KVM_XIVE

@@ -673,6 +676,7 @@ extern int kvmppc_xive_set_irq(struct kvm *kvm, int 
irq_source_id, u32 irq,
   int level, bool line_status);
  extern void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu);
  extern void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu);
+extern void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu);
  
  static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)

  {
@@ -714,6 +718,7 @@ static inline int kvmppc_xive_set_irq(struct kvm *kvm, int 
irq_source_id, u32 ir
  int level, bool line_status) { return 
-ENODEV; }
  static inline void kvmppc_xive_push_vcpu(struct kvm_vcpu *vcpu) { }
  static inline void kvmppc_xive_pull_vcpu(struct kvm_vcpu *vcpu) { }
+static inline void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu) { }
  
  static inline int kvmppc_xive_enabled(struct kvm_vcpu *vcpu)

{ return 0; }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index fa7614c37e08..17739aaee3d8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1142,12 +1142,13 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
  }
  
  /*

- * Handle H_CEDE in the nested virtualization case where we haven't
- * called the real-mode hcall handlers in book3s_hv_rmhandlers.S.
+ * Handle H_CEDE in the P9 path where we don't call the real-mode hcall
+ * handlers in book3s_hv_rmhandlers.S.
+ *
   * This has to be done early, not in kvmppc_pseries_do_hcall(), so
   * that the cede logic in kvmppc_run_single_vcpu() works properly.
   */
-static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
+static void kvmppc_cede(struct kvm_vcpu *vcpu)
  {
vcpu->arch.shregs.msr |= MSR_EE;
vcpu->arch.ceded = 1;
@@ -1403,9 +1404,15 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
/* hcall - punt to userspace */
int i;
  
-		/* hypercall with MSR_PR has already been handled in rmode,

-* and never reaches here.
-*/
+   if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
+   /*
+* Guest userspace executed sc 1, reflect it back as a
+* privileged program check interrupt.
+*/
+   kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
+   r = RESUME_GUEST;
+   break;
+   }
  
  		run->papr_hcall.nr = kvmppc_get_gpr(vcpu, 3);

for (i = 0; i < 9; ++i)
@@ -3663,6 +3670,12 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vc

Re: [PATCH v4 04/46] KVM: PPC: Book3S HV: Prevent radix guests from setting LPCR[TC]

2021-03-23 Thread Alexey Kardashevskiy




On 23/03/2021 12:02, Nicholas Piggin wrote:

This bit only applies to hash partitions.

Signed-off-by: Nicholas Piggin 




Reviewed-by: Alexey Kardashevskiy 


---
  arch/powerpc/kvm/book3s_hv.c| 6 ++
  arch/powerpc/kvm/book3s_hv_nested.c | 3 +--
  2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index c5de7e3f22b6..1ffb0902e779 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1645,6 +1645,12 @@ static int kvm_arch_vcpu_ioctl_set_sregs_hv(struct 
kvm_vcpu *vcpu,
   */
  unsigned long kvmppc_filter_lpcr_hv(struct kvmppc_vcore *vc, unsigned long 
lpcr)
  {
+   struct kvm *kvm = vc->kvm;
+
+   /* LPCR_TC only applies to HPT guests */
+   if (kvm_is_radix(kvm))
+   lpcr &= ~LPCR_TC;
+
/* On POWER8 and above, userspace can modify AIL */
if (!cpu_has_feature(CPU_FTR_ARCH_207S))
lpcr &= ~LPCR_AIL;
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c 
b/arch/powerpc/kvm/book3s_hv_nested.c
index f7b441b3eb17..851e3f527eb2 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -140,8 +140,7 @@ static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct 
hv_guest_state *hr)
/*
 * Don't let L1 change LPCR bits for the L2 except these:
 */
-   mask = LPCR_DPFD | LPCR_ILE | LPCR_TC | LPCR_AIL | LPCR_LD |
-   LPCR_LPES | LPCR_MER;
+   mask = LPCR_DPFD | LPCR_ILE | LPCR_AIL | LPCR_LD | LPCR_LPES | LPCR_MER;
hr->lpcr = kvmppc_filter_lpcr_hv(vc,
(vc->lpcr & ~mask) | (hr->lpcr & mask));
  



--
Alexey


Re: [RFC Qemu PATCH v2 1/2] spapr: drc: Add support for async hcalls at the drc level

2021-03-23 Thread Shivaprasad G Bhat

Hi David,

Sorry about the delay.

On 2/8/21 11:51 AM, David Gibson wrote:

On Tue, Jan 19, 2021 at 12:40:31PM +0530, Shivaprasad G Bhat wrote:

Thanks for the comments!


On 12/28/20 2:08 PM, David Gibson wrote:


On Mon, Dec 21, 2020 at 01:08:53PM +0100, Greg Kurz wrote:

...

The overall idea looks good but I think you should consider using
a thread pool to implement it. See below.

I am not convinced, however.  Specifically, attaching this to the DRC
doesn't make sense to me.  We're adding exactly one DRC related async
hcall, and I can't really see much call for another one.  We could
have other async hcalls - indeed we already have one for HPT resizing
- but attaching this to DRCs doesn't help for those.

The semantics of the hcall made me wonder whether this would be
reusable in the future if implemented at the DRC level.

It would only be re-usable for operations that are actually connected
to DRCs.  It doesn't seem to me particularly likely that we'll ever
have more asynchronous hcalls that are also associated with DRCs.

Okay


Other option
is to move the async-hcall-state/list into the NVDIMMState structure
in include/hw/mem/nvdimm.h and handle it with machine->nvdimms_state
at a global level.

I'm ok with either of two options:

A) Implement this ad-hoc for this specific case, making whatever
simplifications you can based on this specific case.


I am simplifying it to nvdimm use-case alone and limiting the scope.



B) Implement a general mechanism for async hcalls that is *not* tied
to DRCs.  Then use that for the existing H_RESIZE_HPT_PREPARE call as
well as this new one.


Hope you are okay with using the pool based approach that Greg suggested.

Honestly a thread pool seems like it might be overkill for this
application.


I think it's appropriate here, as that is what virtio-pmem does for its
flush requests too. The aio infrastructure simplifies a lot of the thread
handling. Please suggest if you think there are better ways.
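
For context, a rough sketch of the virtio-pmem-style pattern referred to
above (the helper names are placeholders; the thread-pool calls are QEMU's
existing aio API, on the assumption it matches what the flush path uses):

#include "qemu/osdep.h"
#include "block/aio.h"
#include "block/thread-pool.h"

/* Runs in a worker thread: the slow part of the async hcall. */
static int async_flush_worker(void *opaque)
{
    /* ... e.g. fsync() the backing file ... */
    return 0;
}

/* Runs back in the main loop once the worker is done. */
static void async_flush_done(void *opaque, int ret)
{
    /* ... record the result so a later hcall can collect it ... */
}

static void async_flush_submit(void *req)
{
    ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());

    thread_pool_submit_aio(pool, async_flush_worker, req,
                           async_flush_done, req);
}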


I am sending the next version addressing all the comments from you and Greg.


Thanks,

Shivaprasad



Re: [PATCH 1/1] powerpc/iommu: Enable remaining IOMMU Pagesizes present in LoPAR

2021-03-23 Thread Alexey Kardashevskiy




On 23/03/2021 06:09, Leonardo Bras wrote:

According to LoPAR, ibm,query-pe-dma-window output named "IO Page Sizes"
will let the OS know all possible pagesizes that can be used for creating a
new DDW.

Currently Linux will only try using 3 of the 8 available options:
4K, 64K and 16M. According to LoPAR, the hypervisor may also offer 32M, 64M,
128M, 256M and 16G.

Enabling bigger pages would be interesting for direct mapping systems
with a lot of RAM, while using fewer TCE entries.

Signed-off-by: Leonardo Bras 
---
  arch/powerpc/include/asm/iommu.h   |  8 
  arch/powerpc/platforms/pseries/iommu.c | 28 +++---
  2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index deef7c94d7b6..c170048b7a1b 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -19,6 +19,14 @@
  #include 
  #include 
  
+#define IOMMU_PAGE_SHIFT_16G	34

+#define IOMMU_PAGE_SHIFT_256M  28
+#define IOMMU_PAGE_SHIFT_128M  27
+#define IOMMU_PAGE_SHIFT_64M   26
+#define IOMMU_PAGE_SHIFT_32M   25
+#define IOMMU_PAGE_SHIFT_16M   24
+#define IOMMU_PAGE_SHIFT_64K   16



These are not very descriptive; they are just normal shifts and could be
as simple as __builtin_ctz(SZ_4K) (gcc will optimize this), and so on.
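
Something like this is what that suggestion amounts to (a sketch, assuming
the SZ_* constants from linux/sizes.h):

#include <linux/sizes.h>

/* gcc folds __builtin_ctzl() of a power-of-two constant at compile
 * time, so the plain shifts need not be open-coded. */
#define IOMMU_PAGE_SHIFT_64K	__builtin_ctzl(SZ_64K)	/* 16 */
#define IOMMU_PAGE_SHIFT_16M	__builtin_ctzl(SZ_16M)	/* 24 */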


OTOH the PAPR page sizes need macros as they are the ones which are 
weird and screaming for macros.


I'd steal/rework spapr_page_mask_to_query_mask() from QEMU. Thanks,





+
  #define IOMMU_PAGE_SHIFT_4K  12
  #define IOMMU_PAGE_SIZE_4K   (ASM_CONST(1) << IOMMU_PAGE_SHIFT_4K)
  #define IOMMU_PAGE_MASK_4K   (~((1 << IOMMU_PAGE_SHIFT_4K) - 1))
diff --git a/arch/powerpc/platforms/pseries/iommu.c 
b/arch/powerpc/platforms/pseries/iommu.c
index 9fc5217f0c8e..02958e80aa91 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1099,6 +1099,24 @@ static void reset_dma_window(struct pci_dev *dev, struct 
device_node *par_dn)
 ret);
  }
  
+/* Returns page shift based on "IO Page Sizes" output at ibm,query-pe-dma-window. See LoPAR */

+static int iommu_get_page_shift(u32 query_page_size)
+{
+   const int shift[] = {IOMMU_PAGE_SHIFT_4K,   IOMMU_PAGE_SHIFT_64K,  
IOMMU_PAGE_SHIFT_16M,
+IOMMU_PAGE_SHIFT_32M,  IOMMU_PAGE_SHIFT_64M,  
IOMMU_PAGE_SHIFT_128M,
+IOMMU_PAGE_SHIFT_256M, IOMMU_PAGE_SHIFT_16G};
+   int i = ARRAY_SIZE(shift) - 1;
+
+   /* Looks for the largest page size supported */
+   for (; i >= 0; i--) {
+   if (query_page_size & (1 << i))
+   return shift[i];
+   }
+
+   /* No valid page size found. */
+   return 0;
+}
+
  /*
   * If the PE supports dynamic dma windows, and there is space for a table
   * that can map all pages in a linear offset, then setup such a table,
@@ -1206,13 +1224,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct 
device_node *pdn)
goto out_failed;
}
}
-   if (query.page_size & 4) {
-   page_shift = 24; /* 16MB */
-   } else if (query.page_size & 2) {
-   page_shift = 16; /* 64kB */
-   } else if (query.page_size & 1) {
-   page_shift = 12; /* 4kB */
-   } else {
+
+   page_shift = iommu_get_page_shift(query.page_size);
+   if (!page_shift) {
dev_dbg(&dev->dev, "no supported direct page size in mask %x",
  query.page_size);
goto out_failed;



--
Alexey


Re: [PATCH v3 19/41] KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path

2021-03-23 Thread Cédric Le Goater
On 3/22/21 7:22 PM, Nicholas Piggin wrote:
> Excerpts from Cédric Le Goater's message of March 23, 2021 2:01 am:
>> On 3/22/21 2:15 PM, Nicholas Piggin wrote:
>>> Excerpts from Alexey Kardashevskiy's message of March 22, 2021 5:30 pm:


 On 06/03/2021 02:06, Nicholas Piggin wrote:
> In the interest of minimising the amount of code that is run in>>> 
> "real-mode", don't handle hcalls in real mode in the P9 path.
>
> POWER8 and earlier are much more expensive to exit from HV real mode
> and switch to host mode, because on those processors HV interrupts get
> to the hypervisor with the MMU off, and the other threads in the core
> need to be pulled out of the guest, and SLBs all need to be saved,
> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
> in host mode. Hash guests also require a lot of hcalls to run. The
> XICS interrupt controller requires hcalls to run.
>
> By contrast, POWER9 has independent thread switching, and in radix mode
> the hypervisor is already in a host virtual memory mode when the HV
> interrupt is taken. Radix + xive guests don't need hcalls to handle
> interrupts or manage translations.
>>
>> Do we need to handle the host-is-a-P9-without-xive case ?
> 
> I'm not sure really. Is there an intention for OPAL to be able to 
> provide a fallback layer in the worst case?

Yes. OPAL has an XICS-on-XIVE emulation for P9, implemented for bringup,
and it still boots; XICS guests can run. P10 doesn't have it though.

> Maybe microwatt grows HV capability before XIVE?

I don't know if we should develop the same XIVE logic for microwatt. 
It's awfully complex and we have the XICS interface which works already. 

> So it's much less important to handle hcalls in real mode in P9.

 So acde25726bc6034b (which added if(kvm_is_radix(vcpu->kvm))return 
 H_TOO_HARD) can be reverted, pretty much?
>>>
>>> Yes. Although that calls attention to the fact I missed doing
>>> a P9 h_random handler in this patch. I'll fix that, then I think
>>> acde2572 could be reverted entirely.
>>>
>>> [...]
>>>
>   } else {
>   kvmppc_xive_push_vcpu(vcpu);
>   trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, 
> lpcr);
> - kvmppc_xive_pull_vcpu(vcpu);
> + /* H_CEDE has to be handled now, not later */
> + /* XICS hcalls must be handled before xive is pulled */
> + if (trap == BOOK3S_INTERRUPT_SYSCALL &&
> + !(vcpu->arch.shregs.msr & MSR_PR)) {
> + unsigned long req = kvmppc_get_gpr(vcpu, 3);
>   
> + if (req == H_CEDE) {
> + kvmppc_cede(vcpu);
> + kvmppc_xive_cede_vcpu(vcpu); /* may un-cede */
> + kvmppc_set_gpr(vcpu, 3, 0);
> + trap = 0;
> + }
> + if (req == H_EOI || req == H_CPPR ||

 else if (req == H_EOI ... ?
>>>
>>> Hummm, sure.
>>
>> you could integrate the H_CEDE in the switch statement below.
> 
> Below is in a different file just for the emulation calls.
> 
>>>
>>> [...]
>>>
> +void kvmppc_xive_cede_vcpu(struct kvm_vcpu *vcpu)
> +{
> + void __iomem *esc_vaddr = (void __iomem *)vcpu->arch.xive_esc_vaddr;
> +
> + if (!esc_vaddr)
> + return;
> +
> + /* we are using XIVE with single escalation */
> +
> + if (vcpu->arch.xive_esc_on) {
> + /*
> +  * If we still have a pending escalation, abort the cede,
> +  * and we must set PQ to 10 rather than 00 so that we don't
> +  * potentially end up with two entries for the escalation
> +  * interrupt in the XIVE interrupt queue.  In that case
> +  * we also don't want to set xive_esc_on to 1 here in
> +  * case we race with xive_esc_irq().
> +  */
> + vcpu->arch.ceded = 0;
> + /*
> +  * The escalation interrupts are special as we don't EOI them.
> +  * There is no need to use the load-after-store ordering offset
> +  * to set PQ to 10 as we won't use StoreEOI.
> +  */
> + __raw_readq(esc_vaddr + XIVE_ESB_SET_PQ_10);
> + } else {
> + vcpu->arch.xive_esc_on = true;
> + mb();
> + __raw_readq(esc_vaddr + XIVE_ESB_SET_PQ_00);
> + }
> + mb();


 Uff. Thanks for cut-n-pasting the comments, helped a lot to match this c 
 to that asm!
>>>
>>> Glad it helped.
> +}
>>
>> I had to do the PowerNV models in QEMU to start understanding that stuff ... 
>>
> +EXPORT_SYMBOL_GPL(kvmppc_xive_cede_vcpu);
> +
>   /*
>* This is a simple trigger for a generic XIVE IRQ. This must
>* only be called for interrupts that support a trigger page
> @@ -2106,6 +214