Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

2024-04-06 Thread Anup Patel
On Fri, Apr 5, 2024 at 5:28 PM Paolo Bonzini  wrote:
>
> The .change_pte() MMU notifier callback was intended as an
> optimization. The original point of it was that KSM could tell KVM to flip
> its secondary PTE to a new location without having to first zap it. At
> the time there was also an .invalidate_page() callback; both of them were
> *not* bracketed by calls to mmu_notifier_invalidate_range_{start,end}(),
> and .invalidate_page() also doubled as a fallback implementation of
> .change_pte().
>
> Later on, however, both callbacks were changed to occur within an
> invalidate_range_start/end() block.
>
> In the case of .change_pte(), commit 6bdb913f0a70 ("mm: wrap calls to
> set_pte_at_notify with invalidate_range_start and invalidate_range_end",
> 2012-10-09) did so to remove the fallback from .invalidate_page() to
> .change_pte() and allow sleepable .invalidate_page() hooks.
>
> This however made KVM's usage of the .change_pte() callback completely
> moot, because KVM unmaps the sPTEs during .invalidate_range_start()
> and therefore .change_pte() has no hope of finding a sPTE to change.
> Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as
> well as all the architecture specific implementations.
>
> Signed-off-by: Paolo Bonzini 

For KVM RISC-V:
Acked-by: Anup Patel 

Regards,
Anup

> ---
>  arch/arm64/kvm/mmu.c  | 34 -
>  arch/loongarch/include/asm/kvm_host.h |  1 -
>  arch/loongarch/kvm/mmu.c  | 32 
>  arch/mips/kvm/mmu.c   | 30 ---
>  arch/powerpc/include/asm/kvm_ppc.h|  1 -
>  arch/powerpc/kvm/book3s.c |  5 ---
>  arch/powerpc/kvm/book3s.h |  1 -
>  arch/powerpc/kvm/book3s_64_mmu_hv.c   | 12 --
>  arch/powerpc/kvm/book3s_hv.c  |  1 -
>  arch/powerpc/kvm/book3s_pr.c  |  7 
>  arch/powerpc/kvm/e500_mmu_host.c  |  6 ---
>  arch/riscv/kvm/mmu.c  | 20 --
>  arch/x86/kvm/mmu/mmu.c| 54 +--
>  arch/x86/kvm/mmu/spte.c   | 16 
>  arch/x86/kvm/mmu/spte.h   |  2 -
>  arch/x86/kvm/mmu/tdp_mmu.c| 46 ---
>  arch/x86/kvm/mmu/tdp_mmu.h|  1 -
>  include/linux/kvm_host.h  |  2 -
>  include/trace/events/kvm.h| 15 
>  virt/kvm/kvm_main.c   | 43 -
>  20 files changed, 2 insertions(+), 327 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index dc04bc767865..ff17849be9f4 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1768,40 +1768,6 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct 
> kvm_gfn_range *range)
> return false;
>  }
>
> -bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> -{
> -   kvm_pfn_t pfn = pte_pfn(range->arg.pte);
> -
> -   if (!kvm->arch.mmu.pgt)
> -   return false;
> -
> -   WARN_ON(range->end - range->start != 1);
> -
> -   /*
> -* If the page isn't tagged, defer to user_mem_abort() for sanitising
> -* the MTE tags. The S2 pte should have been unmapped by
> -* mmu_notifier_invalidate_range_end().
> -*/
> -   if (kvm_has_mte(kvm) && !page_mte_tagged(pfn_to_page(pfn)))
> -   return false;
> -
> -   /*
> -* We've moved a page around, probably through CoW, so let's treat
> -* it just like a translation fault and the map handler will clean
> -* the cache to the PoC.
> -*
> -* The MMU notifiers will have unmapped a huge PMD before calling
> -* ->change_pte() (which in turn calls kvm_set_spte_gfn()) and
> -* therefore we never need to clear out a huge PMD through this
> -* calling path and a memcache is not required.
> -*/
> -   kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, range->start << PAGE_SHIFT,
> -  PAGE_SIZE, __pfn_to_phys(pfn),
> -  KVM_PGTABLE_PROT_R, NULL, 0);
> -
> -   return false;
> -}
> -
>  bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
>  {
> u64 size = (range->end - range->start) << PAGE_SHIFT;
> diff --git a/arch/loongarch/include/asm/kvm_host.h 
> b/arch/loongarch/include/asm/kvm_host.h
> index 2d62f7b0d377..69305441f40d 100644
> --- a/arch/loongarch/include/asm/kvm_host.h
> +++ b/arch/loongarch/include/asm/kvm_host.h
> @@ -203,7 +203,6 @@ void kvm_flush_tlb_all(void);
>  void kvm_flush_tlb_gpa(struct kvm_vcpu *vcpu, unsigned long gpa);
>  int kvm_handle_mm_

Re: [PATCH v3 2/2] riscv: Fix text patching when IPI are used

2024-03-04 Thread Anup Patel
On Tue, Mar 5, 2024 at 1:54 AM Björn Töpel  wrote:
>
> Conor Dooley  writes:
>
> > On Thu, Feb 29, 2024 at 01:10:56PM +0100, Alexandre Ghiti wrote:
> >> For now, we use stop_machine() to patch the text and when we use IPIs for
> >> remote icache flushes (which is emitted in patch_text_nosync()), the system
> >> hangs.
> >>
> >> So instead, make sure every CPU executes the stop_machine() patching
> >> function and emit a local icache flush there.
> >>
> >> Co-developed-by: Björn Töpel 
> >> Signed-off-by: Björn Töpel 
> >> Signed-off-by: Alexandre Ghiti 
> >> Reviewed-by: Andrea Parri 
> >
> > What commit does this fix?
>
> Hmm. The bug is exposed when the AIA IPI are introduced, and used
> (instead of the firmware-based).
>
> I'm not sure this is something we'd like backported, but rather a
> prerequisite to AIA.
>
> @Anup @Alex WDYT?
>

The current text patching never considered IPIs being injected
directly in S-mode from one hart to another, so we are seeing this
issue now with AIA IPIs.

We certainly don't need to backport this fix since it's more
of a preparatory fix for AIA IPIs.
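
For anyone following along, the idea of the fix is roughly the following
(a simplified sketch only, not the actual patch; riscv_insn_write() is a
hypothetical helper standing in for the low-level instruction write):

#include <linux/stop_machine.h>
#include <linux/atomic.h>
#include <linux/cpumask.h>
#include <asm/cacheflush.h>

struct patch_insn {
        void *addr;
        u32 insn;
        atomic_t cpu_count;
};

static int patch_text_cb(void *data)
{
        struct patch_insn *p = data;
        int ret = 0;

        if (atomic_inc_return(&p->cpu_count) == num_online_cpus()) {
                /* Last CPU to arrive writes the new instruction
                 * (riscv_insn_write() is a hypothetical helper). */
                ret = riscv_insn_write(p->addr, &p->insn, sizeof(p->insn));
                /* Release the CPUs spinning below. */
                atomic_inc(&p->cpu_count);
        } else {
                while (atomic_read(&p->cpu_count) <= num_online_cpus())
                        cpu_relax();
                smp_mb();
        }

        /* Every CPU flushes its own icache; no IPI is emitted here. */
        local_flush_icache_all();

        return ret;
}

static int patch_text_stop_machine(void *addr, u32 insn)
{
        struct patch_insn p = {
                .addr = addr,
                .insn = insn,
                .cpu_count = ATOMIC_INIT(0),
        };

        return stop_machine(patch_text_cb, &p, cpu_online_mask);
}

The key point is that the icache flush stays local on every CPU, so no
fence.i IPI has to be delivered while all harts are held in stop_machine().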

Regards,
Anup



Re: [PATCH] riscv: Remove 32b kernel mapping from page table dump

2021-04-19 Thread Anup Patel
On Sun, Apr 18, 2021 at 4:59 PM Alexandre Ghiti  wrote:
>
> The 32b kernel mapping lies in the linear mapping, there is no point in
> printing its address in page table dump, so remove this leftover that
> comes from moving the kernel mapping outside the linear mapping for 64b
> kernel.
>
> Fixes: e9efb21fe352 ("riscv: Prepare ptdump for vm layout dynamic addresses")
> Signed-off-by: Alexandre Ghiti 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/mm/ptdump.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
> index 0aba4421115c..a4ed4bdbbfde 100644
> --- a/arch/riscv/mm/ptdump.c
> +++ b/arch/riscv/mm/ptdump.c
> @@ -76,8 +76,8 @@ enum address_markers_idx {
> PAGE_OFFSET_NR,
>  #ifdef CONFIG_64BIT
> MODULES_MAPPING_NR,
> -#endif
> KERNEL_MAPPING_NR,
> +#endif
> END_OF_SPACE_NR
>  };
>
> @@ -99,8 +99,8 @@ static struct addr_marker address_markers[] = {
> {0, "Linear mapping"},
>  #ifdef CONFIG_64BIT
> {0, "Modules mapping"},
> -#endif
> {0, "Kernel mapping (kernel, BPF)"},
> +#endif
> {-1, NULL},
>  };
>
> @@ -379,8 +379,8 @@ static int ptdump_init(void)
> address_markers[PAGE_OFFSET_NR].start_address = PAGE_OFFSET;
>  #ifdef CONFIG_64BIT
> address_markers[MODULES_MAPPING_NR].start_address = MODULES_VADDR;
> -#endif
> address_markers[KERNEL_MAPPING_NR].start_address = kernel_virt_addr;
> +#endif
>
> kernel_ptd_info.base_addr = KERN_VIRT_START;
>
> --
> 2.20.1
>


Re: [PATCH] riscv: Fix 32b kernel caused by 64b kernel mapping moving outside linear mapping

2021-04-19 Thread Anup Patel
On Sat, Apr 17, 2021 at 10:52 PM Alexandre Ghiti  wrote:
>
> Fix multiple leftovers when moving the kernel mapping outside the linear
> mapping for 64b kernel that left the 32b kernel unusable.
>
> Fixes: 4b67f48da707 ("riscv: Move kernel mapping outside of linear mapping")
> Signed-off-by: Alexandre Ghiti 

Quite a few #ifdefs, but I don't see any better way at the moment. Maybe we
can clean this up later. Otherwise, looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/include/asm/page.h|  9 +
>  arch/riscv/include/asm/pgtable.h | 16 
>  arch/riscv/mm/init.c | 25 -
>  3 files changed, 45 insertions(+), 5 deletions(-)
>
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 22cfb2be60dc..f64b61296c0c 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -90,15 +90,20 @@ typedef struct page *pgtable_t;
>
>  #ifdef CONFIG_MMU
>  extern unsigned long va_pa_offset;
> +#ifdef CONFIG_64BIT
>  extern unsigned long va_kernel_pa_offset;
> +#endif
>  extern unsigned long pfn_base;
>  #define ARCH_PFN_OFFSET(pfn_base)
>  #else
>  #define va_pa_offset   0
> +#ifdef CONFIG_64BIT
>  #define va_kernel_pa_offset0
> +#endif
>  #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT)
>  #endif /* CONFIG_MMU */
>
> +#ifdef CONFIG_64BIT
>  extern unsigned long kernel_virt_addr;
>
>  #define linear_mapping_pa_to_va(x) ((void *)((unsigned long)(x) + 
> va_pa_offset))
> @@ -112,6 +117,10 @@ extern unsigned long kernel_virt_addr;
> (_x < kernel_virt_addr) ? 
>   \
> linear_mapping_va_to_pa(_x) : kernel_mapping_va_to_pa(_x);
>   \
> })
> +#else
> +#define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))
> +#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
> +#endif
>
>  #ifdef CONFIG_DEBUG_VIRTUAL
>  extern phys_addr_t __virt_to_phys(unsigned long x);
> diff --git a/arch/riscv/include/asm/pgtable.h 
> b/arch/riscv/include/asm/pgtable.h
> index 80e63a93e903..5afda75cc2c3 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -16,19 +16,27 @@
>  #else
>
>  #define ADDRESS_SPACE_END  (UL(-1))
> -/*
> - * Leave 2GB for kernel and BPF at the end of the address space
> - */
> +
> +#ifdef CONFIG_64BIT
> +/* Leave 2GB for kernel and BPF at the end of the address space */
>  #define KERNEL_LINK_ADDR   (ADDRESS_SPACE_END - SZ_2G + 1)
> +#else
> +#define KERNEL_LINK_ADDR   PAGE_OFFSET
> +#endif
>
>  #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
>  #define VMALLOC_END  (PAGE_OFFSET - 1)
>  #define VMALLOC_START(PAGE_OFFSET - VMALLOC_SIZE)
>
> -/* KASLR should leave at least 128MB for BPF after the kernel */
>  #define BPF_JIT_REGION_SIZE(SZ_128M)
> +#ifdef CONFIG_64BIT
> +/* KASLR should leave at least 128MB for BPF after the kernel */
>  #define BPF_JIT_REGION_START   PFN_ALIGN((unsigned long)&_end)
>  #define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
> +#else
> +#define BPF_JIT_REGION_START   (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
> +#define BPF_JIT_REGION_END (VMALLOC_END)
> +#endif
>
>  /* Modules always live before the kernel */
>  #ifdef CONFIG_64BIT
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 093f3a96ecfc..dc9b988e0778 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -91,8 +91,10 @@ static void print_vm_layout(void)
>   (unsigned long)VMALLOC_END);
> print_mlm("lowmem", (unsigned long)PAGE_OFFSET,
>   (unsigned long)high_memory);
> +#ifdef CONFIG_64BIT
> print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR,
>   (unsigned long)ADDRESS_SPACE_END);
> +#endif
>  }
>  #else
>  static void print_vm_layout(void) { }
> @@ -165,9 +167,11 @@ static struct pt_alloc_ops pt_ops;
>  /* Offset between linear mapping virtual address and kernel load address */
>  unsigned long va_pa_offset;
>  EXPORT_SYMBOL(va_pa_offset);
> +#ifdef CONFIG_64BIT
>  /* Offset between kernel mapping virtual address and kernel load address */
>  unsigned long va_kernel_pa_offset;
>  EXPORT_SYMBOL(va_kernel_pa_offset);
> +#endif
>  unsigned long pfn_base;
>  EXPORT_SYMBOL(pfn_base);
>
> @@ -410,7 +414,9 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> load_sz = (uintptr_t)(&_end) - load_pa;
>
> va_pa_offset = PAGE_OFFSET - load_pa;
> 

Re: [PATCH] riscv: Protect kernel linear mapping only if CONFIG_STRICT_KERNEL_RWX is set

2021-04-16 Thread Anup Patel
On Thu, Apr 15, 2021 at 4:34 PM Alexandre Ghiti  wrote:
>
> If CONFIG_STRICT_KERNEL_RWX is not set, we cannot set different permissions
> to the kernel data and text sections, so make sure it is defined before
> trying to protect the kernel linear mapping.
>
> Signed-off-by: Alexandre Ghiti 

Maybe you should add a "Fixes:" tag to the commit description?

Otherwise it looks good.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/kernel/setup.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 626003bb5fca..ab394d173cd4 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -264,12 +264,12 @@ void __init setup_arch(char **cmdline_p)
>
> sbi_init();
>
> -   if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX))
> +   if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)) {
> protect_kernel_text_data();
> -
> -#if defined(CONFIG_64BIT) && defined(CONFIG_MMU)
> -   protect_kernel_linear_mapping_text_rodata();
> +#ifdef CONFIG_64BIT
> +   protect_kernel_linear_mapping_text_rodata();
>  #endif
> +   }
>
>  #ifdef CONFIG_SWIOTLB
> swiotlb_init(1);
> --
> 2.20.1
>


[PATCH] RISC-V: Fix error code returned by riscv_hartid_to_cpuid()

2021-04-15 Thread Anup Patel
We should return a negative error code upon failure in
riscv_hartid_to_cpuid() instead of NR_CPUS. This also aligns
with all users of riscv_hartid_to_cpuid(), which expect a
negative error code upon failure.
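
For illustration, the kind of caller pattern this return value is meant for
(not a snippet from the tree, just a sketch):

/* Illustrative only: propagate the error instead of silently
 * using NR_CPUS as an array index. */
static int example_mark_hart(int hartid, struct cpumask *mask)
{
        int cpu = riscv_hartid_to_cpuid(hartid);

        if (cpu < 0)
                return cpu;     /* e.g. -ENOENT for an unknown hartid */

        cpumask_set_cpu(cpu, mask);
        return 0;
}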

Fixes: 6825c7a80f18 ("RISC-V: Add logical CPU indexing
for RISC-V")
Cc: sta...@vger.kernel.org
Signed-off-by: Anup Patel 
---
 arch/riscv/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index ea028d9e0d24..d44567490d91 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -54,7 +54,7 @@ int riscv_hartid_to_cpuid(int hartid)
return i;
 
pr_err("Couldn't find cpu id for hartid [%d]\n", hartid);
-   return i;
+   return -ENOENT;
 }
 
 void riscv_cpuid_to_hartid_mask(const struct cpumask *in, struct cpumask *out)
-- 
2.25.1



Re: [PATCH v3 02/10] riscv: add __init section marker to some functions

2021-04-12 Thread Anup Patel
On Mon, Apr 12, 2021 at 9:47 PM Jisheng Zhang  wrote:
>
> From: Jisheng Zhang 
>
> They are not needed after booting, so mark them as __init to move them
> to the __init section.
>
> Signed-off-by: Jisheng Zhang 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/kernel/cpufeature.c | 2 +-
>  arch/riscv/kernel/traps.c  | 2 +-
>  arch/riscv/mm/init.c   | 4 ++--
>  arch/riscv/mm/kasan_init.c | 6 +++---
>  arch/riscv/mm/ptdump.c | 2 +-
>  5 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index ac202f44a670..e4741e1f0add 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -59,7 +59,7 @@ bool __riscv_isa_extension_available(const unsigned long 
> *isa_bitmap, int bit)
>  }
>  EXPORT_SYMBOL_GPL(__riscv_isa_extension_available);
>
> -void riscv_fill_hwcap(void)
> +void __init riscv_fill_hwcap(void)
>  {
> struct device_node *node;
> const char *isa;
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 0879b5df11b9..041f4b44262e 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -196,6 +196,6 @@ int is_valid_bugaddr(unsigned long pc)
>  #endif /* CONFIG_GENERIC_BUG */
>
>  /* stvec & scratch is already set from head.S */
> -void trap_init(void)
> +void __init trap_init(void)
>  {
>  }
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index dbeaa4144e4d..ecd485662b07 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -70,7 +70,7 @@ static inline void print_mlm(char *name, unsigned long b, 
> unsigned long t)
>   (((t) - (b)) >> 20));
>  }
>
> -static void print_vm_layout(void)
> +static void __init print_vm_layout(void)
>  {
> pr_notice("Virtual kernel memory layout:\n");
> print_mlk("fixmap", (unsigned long)FIXADDR_START,
> @@ -553,7 +553,7 @@ static inline void setup_vm_final(void)
>  #endif /* CONFIG_MMU */
>
>  #ifdef CONFIG_STRICT_KERNEL_RWX
> -void protect_kernel_text_data(void)
> +void __init protect_kernel_text_data(void)
>  {
> unsigned long text_start = (unsigned long)_start;
> unsigned long init_text_start = (unsigned long)__init_text_begin;
> diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
> index ec0029097251..e459290d2629 100644
> --- a/arch/riscv/mm/kasan_init.c
> +++ b/arch/riscv/mm/kasan_init.c
> @@ -48,7 +48,7 @@ asmlinkage void __init kasan_early_init(void)
> local_flush_tlb_all();
>  }
>
> -static void kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned 
> long end)
> +static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, 
> unsigned long end)
>  {
> phys_addr_t phys_addr;
> pte_t *ptep, *base_pte;
> @@ -70,7 +70,7 @@ static void kasan_populate_pte(pmd_t *pmd, unsigned long 
> vaddr, unsigned long en
> set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
>  }
>
> -static void kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned 
> long end)
> +static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, 
> unsigned long end)
>  {
> phys_addr_t phys_addr;
> pmd_t *pmdp, *base_pmd;
> @@ -105,7 +105,7 @@ static void kasan_populate_pmd(pgd_t *pgd, unsigned long 
> vaddr, unsigned long en
> set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
>  }
>
> -static void kasan_populate_pgd(unsigned long vaddr, unsigned long end)
> +static void __init kasan_populate_pgd(unsigned long vaddr, unsigned long end)
>  {
> phys_addr_t phys_addr;
> pgd_t *pgdp = pgd_offset_k(vaddr);
> diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
> index ace74dec7492..3b7b6e4d025e 100644
> --- a/arch/riscv/mm/ptdump.c
> +++ b/arch/riscv/mm/ptdump.c
> @@ -331,7 +331,7 @@ static int ptdump_show(struct seq_file *m, void *v)
>
>  DEFINE_SHOW_ATTRIBUTE(ptdump);
>
> -static int ptdump_init(void)
> +static int __init ptdump_init(void)
>  {
> unsigned int i, j;
>
> --
> 2.31.0
>
>


Re: [PATCH v3 00/10] riscv: improve self-protection

2021-04-12 Thread Anup Patel
On Mon, Apr 12, 2021 at 9:46 PM Jisheng Zhang  wrote:
>
> From: Jisheng Zhang 
>
> patch1 removes the non-necessary setup_zero_page()
> patch2 is a trivial improvement patch to move some functions to .init
> section
>
> Then following patches improve self-protection by:
>
> Marking some variables __ro_after_init
> Constifing some variables
> Enabling ARCH_HAS_STRICT_MODULE_RWX
>
> Hi Anup,
>
> I kept the __init modification to trap_init(); I will cook a trivial
> series to provide a __weak but NULL trap_init() implementation in
> init/main.c and then remove all NULL implementations from all arches.

Yes, it makes sense to do this as a separate series.

Regards,
Anup

>
> Thanks
>
> Since v2:
>   - collect Reviewed-by tag
>   - add one patch to remove unnecessary setup_zero_page()
>
> Since v1:
>   - no need to move bpf_jit_alloc_exec() and bpf_jit_free_exec() to core
> because RV32 uses the default module_alloc() for jit code which also
> meets W^X after patch8
>   - fix a build error caused by local debug code clean up
>
>
> Jisheng Zhang (10):
>   riscv: mm: Remove setup_zero_page()
>   riscv: add __init section marker to some functions
>   riscv: Mark some global variables __ro_after_init
>   riscv: Constify sys_call_table
>   riscv: Constify sbi_ipi_ops
>   riscv: kprobes: Implement alloc_insn_page()
>   riscv: bpf: Write protect JIT code
>   riscv: bpf: Avoid breaking W^X on RV64
>   riscv: module: Create module allocations without exec permissions
>   riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU
>
>  arch/riscv/Kconfig |  1 +
>  arch/riscv/include/asm/smp.h   |  4 ++--
>  arch/riscv/include/asm/syscall.h   |  2 +-
>  arch/riscv/kernel/cpufeature.c |  2 +-
>  arch/riscv/kernel/module.c | 10 --
>  arch/riscv/kernel/probes/kprobes.c |  8 
>  arch/riscv/kernel/sbi.c| 10 +-
>  arch/riscv/kernel/smp.c|  6 +++---
>  arch/riscv/kernel/syscall_table.c  |  2 +-
>  arch/riscv/kernel/time.c   |  2 +-
>  arch/riscv/kernel/traps.c  |  2 +-
>  arch/riscv/kernel/vdso.c   |  4 ++--
>  arch/riscv/mm/init.c   | 16 +---
>  arch/riscv/mm/kasan_init.c |  6 +++---
>  arch/riscv/mm/ptdump.c |  2 +-
>  arch/riscv/net/bpf_jit_comp64.c|  2 +-
>  arch/riscv/net/bpf_jit_core.c  |  1 +
>  17 files changed, 45 insertions(+), 35 deletions(-)
>
> --
> 2.31.0
>
>


Re: [PATCH v3 01/10] riscv: mm: Remove setup_zero_page()

2021-04-12 Thread Anup Patel
On Mon, Apr 12, 2021 at 9:47 PM Jisheng Zhang  wrote:
>
> From: Jisheng Zhang 
>
> The empty_zero_page sits in the .bss..page_aligned section, so it will
> already be cleared to zero when the bss is cleared; we don't need to
> clear it again.
>
> Signed-off-by: Jisheng Zhang 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/mm/init.c | 6 --
>  1 file changed, 6 deletions(-)
>
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 7f5036fbee8c..dbeaa4144e4d 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -57,11 +57,6 @@ static void __init zone_sizes_init(void)
> free_area_init(max_zone_pfns);
>  }
>
> -static void setup_zero_page(void)
> -{
> -   memset((void *)empty_zero_page, 0, PAGE_SIZE);
> -}
> -
>  #if defined(CONFIG_MMU) && defined(CONFIG_DEBUG_VM)
>  static inline void print_mlk(char *name, unsigned long b, unsigned long t)
>  {
> @@ -589,7 +584,6 @@ void mark_rodata_ro(void)
>  void __init paging_init(void)
>  {
> setup_vm_final();
> -   setup_zero_page();
>  }
>
>  void __init misc_mem_init(void)
> --
> 2.31.0
>
>


Re: [PATCH v2 8/9] riscv: module: Create module allocations without exec permissions

2021-04-01 Thread Anup Patel
On Wed, Mar 31, 2021 at 10:04 PM Jisheng Zhang
 wrote:
>
> From: Jisheng Zhang 
>
> The core code manages the executable permissions of code regions of
> modules explicitly, so it is not necessary to create the module vmalloc
> regions with RWX permissions. Create them with RW- permissions instead.
>
> Signed-off-by: Jisheng Zhang 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/kernel/module.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/kernel/module.c b/arch/riscv/kernel/module.c
> index 104fba889cf7..e89367bba7c9 100644
> --- a/arch/riscv/kernel/module.c
> +++ b/arch/riscv/kernel/module.c
> @@ -407,14 +407,20 @@ int apply_relocate_add(Elf_Shdr *sechdrs, const char 
> *strtab,
> return 0;
>  }
>
> -#if defined(CONFIG_MMU) && defined(CONFIG_64BIT)
> +#ifdef CONFIG_MMU
> +
> +#ifdef CONFIG_64BIT
>  #define VMALLOC_MODULE_START \
>  max(PFN_ALIGN((unsigned long)&_end - SZ_2G), VMALLOC_START)
> +#else
> +#define VMALLOC_MODULE_START   VMALLOC_START
> +#endif
> +
>  void *module_alloc(unsigned long size)
>  {
> return __vmalloc_node_range(size, 1, VMALLOC_MODULE_START,
> VMALLOC_END, GFP_KERNEL,
> -   PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE,
> +   PAGE_KERNEL, 0, NUMA_NO_NODE,
> __builtin_return_address(0));
>  }
>  #endif
> --
> 2.31.0
>
>
>


Re: [PATCH v2 5/9] riscv: kprobes: Implement alloc_insn_page()

2021-04-01 Thread Anup Patel
On Wed, Mar 31, 2021 at 10:02 PM Jisheng Zhang
 wrote:
>
> From: Jisheng Zhang 
>
> Allocate PAGE_KERNEL_READ_EXEC(read only, executable) page for kprobes
> insn page. This is to prepare for STRICT_MODULE_RWX.
>
> Signed-off-by: Jisheng Zhang 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/kernel/probes/kprobes.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/arch/riscv/kernel/probes/kprobes.c 
> b/arch/riscv/kernel/probes/kprobes.c
> index 7e2c78e2ca6b..8c1f7a30aeed 100644
> --- a/arch/riscv/kernel/probes/kprobes.c
> +++ b/arch/riscv/kernel/probes/kprobes.c
> @@ -84,6 +84,14 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
> return 0;
>  }
>
> +void *alloc_insn_page(void)
> +{
> +   return  __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
> +GFP_KERNEL, PAGE_KERNEL_READ_EXEC,
> +VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> +__builtin_return_address(0));
> +}
> +
>  /* install breakpoint in text */
>  void __kprobes arch_arm_kprobe(struct kprobe *p)
>  {
> --
> 2.31.0
>
>
>


Re: [PATCH v2 9/9] riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU

2021-04-01 Thread Anup Patel
On Wed, Mar 31, 2021 at 10:05 PM Jisheng Zhang
 wrote:
>
> From: Jisheng Zhang 
>
> Now we can set ARCH_HAS_STRICT_MODULE_RWX for MMU riscv platforms, which
> is good from a security perspective.
>
> Signed-off-by: Jisheng Zhang 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 87d7b52f278f..9716be3674a2 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -28,6 +28,7 @@ config RISCV
> select ARCH_HAS_SET_DIRECT_MAP
> select ARCH_HAS_SET_MEMORY
> select ARCH_HAS_STRICT_KERNEL_RWX if MMU
> +   select ARCH_HAS_STRICT_MODULE_RWX if MMU
> select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
> select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
> select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
> --
> 2.31.0
>
>
>


Re: [PATCH v2 4/9] riscv: Constify sbi_ipi_ops

2021-04-01 Thread Anup Patel
On Wed, Mar 31, 2021 at 10:02 PM Jisheng Zhang
 wrote:
>
> From: Jisheng Zhang 
>
> Constify the sbi_ipi_ops so that it will be placed in the .rodata
> section. This will cause attempts to modify it to fail when strict
> page permissions are in place.
>
> Signed-off-by: Jisheng Zhang 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/include/asm/smp.h | 4 ++--
>  arch/riscv/kernel/sbi.c  | 2 +-
>  arch/riscv/kernel/smp.c  | 4 ++--
>  3 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
> index df1f7c4cd433..a7d2811f3536 100644
> --- a/arch/riscv/include/asm/smp.h
> +++ b/arch/riscv/include/asm/smp.h
> @@ -46,7 +46,7 @@ int riscv_hartid_to_cpuid(int hartid);
>  void riscv_cpuid_to_hartid_mask(const struct cpumask *in, struct cpumask 
> *out);
>
>  /* Set custom IPI operations */
> -void riscv_set_ipi_ops(struct riscv_ipi_ops *ops);
> +void riscv_set_ipi_ops(const struct riscv_ipi_ops *ops);
>
>  /* Clear IPI for current CPU */
>  void riscv_clear_ipi(void);
> @@ -92,7 +92,7 @@ static inline void riscv_cpuid_to_hartid_mask(const struct 
> cpumask *in,
> cpumask_set_cpu(boot_cpu_hartid, out);
>  }
>
> -static inline void riscv_set_ipi_ops(struct riscv_ipi_ops *ops)
> +static inline void riscv_set_ipi_ops(const struct riscv_ipi_ops *ops)
>  {
>  }
>
> diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> index cbd94a72eaa7..cb848e80865e 100644
> --- a/arch/riscv/kernel/sbi.c
> +++ b/arch/riscv/kernel/sbi.c
> @@ -556,7 +556,7 @@ static void sbi_send_cpumask_ipi(const struct cpumask 
> *target)
> sbi_send_ipi(cpumask_bits(&hartid_mask));
>  }
>
> -static struct riscv_ipi_ops sbi_ipi_ops = {
> +static const struct riscv_ipi_ops sbi_ipi_ops = {
> .ipi_inject = sbi_send_cpumask_ipi
>  };
>
> diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
> index 504284d49135..e035124f06dc 100644
> --- a/arch/riscv/kernel/smp.c
> +++ b/arch/riscv/kernel/smp.c
> @@ -85,9 +85,9 @@ static void ipi_stop(void)
> wait_for_interrupt();
>  }
>
> -static struct riscv_ipi_ops *ipi_ops __ro_after_init;
> +static const struct riscv_ipi_ops *ipi_ops __ro_after_init;
>
> -void riscv_set_ipi_ops(struct riscv_ipi_ops *ops)
> +void riscv_set_ipi_ops(const struct riscv_ipi_ops *ops)
>  {
> ipi_ops = ops;
>  }
> --
> 2.31.0
>
>
>


Re: [PATCH v2 3/9] riscv: Constify sys_call_table

2021-04-01 Thread Anup Patel
On Wed, Mar 31, 2021 at 10:01 PM Jisheng Zhang
 wrote:
>
> From: Jisheng Zhang 
>
> Constify the sys_call_table so that it will be placed in the .rodata
> section. This will cause attempts to modify the table to fail when
> strict page permissions are in place.
>
> Signed-off-by: Jisheng Zhang 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/include/asm/syscall.h  | 2 +-
>  arch/riscv/kernel/syscall_table.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/syscall.h 
> b/arch/riscv/include/asm/syscall.h
> index 49350c8bd7b0..b933b1583c9f 100644
> --- a/arch/riscv/include/asm/syscall.h
> +++ b/arch/riscv/include/asm/syscall.h
> @@ -15,7 +15,7 @@
>  #include 
>
>  /* The array of function pointers for syscalls. */
> -extern void *sys_call_table[];
> +extern void * const sys_call_table[];
>
>  /*
>   * Only the low 32 bits of orig_r0 are meaningful, so we return int.
> diff --git a/arch/riscv/kernel/syscall_table.c 
> b/arch/riscv/kernel/syscall_table.c
> index f1ead9df96ca..a63c667c27b3 100644
> --- a/arch/riscv/kernel/syscall_table.c
> +++ b/arch/riscv/kernel/syscall_table.c
> @@ -13,7 +13,7 @@
>  #undef __SYSCALL
>  #define __SYSCALL(nr, call)[nr] = (call),
>
> -void *sys_call_table[__NR_syscalls] = {
> +void * const sys_call_table[__NR_syscalls] = {
> [0 ... __NR_syscalls - 1] = sys_ni_syscall,
>  #include 
>  };
> --
> 2.31.0
>
>
>


Re: [PATCH v2 2/9] riscv: Mark some global variables __ro_after_init

2021-04-01 Thread Anup Patel
On Wed, Mar 31, 2021 at 10:01 PM Jisheng Zhang
 wrote:
>
> From: Jisheng Zhang 
>
> All of these are never modified after init, so they can be
> __ro_after_init.
>
> Signed-off-by: Jisheng Zhang 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/kernel/sbi.c  | 8 
>  arch/riscv/kernel/smp.c  | 4 ++--
>  arch/riscv/kernel/time.c | 2 +-
>  arch/riscv/kernel/vdso.c | 4 ++--
>  arch/riscv/mm/init.c | 6 +++---
>  5 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> index d3bf756321a5..cbd94a72eaa7 100644
> --- a/arch/riscv/kernel/sbi.c
> +++ b/arch/riscv/kernel/sbi.c
> @@ -11,14 +11,14 @@
>  #include 
>
>  /* default SBI version is 0.1 */
> -unsigned long sbi_spec_version = SBI_SPEC_VERSION_DEFAULT;
> +unsigned long sbi_spec_version __ro_after_init = SBI_SPEC_VERSION_DEFAULT;
>  EXPORT_SYMBOL(sbi_spec_version);
>
> -static void (*__sbi_set_timer)(uint64_t stime);
> -static int (*__sbi_send_ipi)(const unsigned long *hart_mask);
> +static void (*__sbi_set_timer)(uint64_t stime) __ro_after_init;
> +static int (*__sbi_send_ipi)(const unsigned long *hart_mask) __ro_after_init;
>  static int (*__sbi_rfence)(int fid, const unsigned long *hart_mask,
>unsigned long start, unsigned long size,
> -  unsigned long arg4, unsigned long arg5);
> +  unsigned long arg4, unsigned long arg5) 
> __ro_after_init;
>
>  struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
> unsigned long arg1, unsigned long arg2,
> diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
> index ea028d9e0d24..504284d49135 100644
> --- a/arch/riscv/kernel/smp.c
> +++ b/arch/riscv/kernel/smp.c
> @@ -30,7 +30,7 @@ enum ipi_message_type {
> IPI_MAX
>  };
>
> -unsigned long __cpuid_to_hartid_map[NR_CPUS] = {
> +unsigned long __cpuid_to_hartid_map[NR_CPUS] __ro_after_init = {
> [0 ... NR_CPUS-1] = INVALID_HARTID
>  };
>
> @@ -85,7 +85,7 @@ static void ipi_stop(void)
> wait_for_interrupt();
>  }
>
> -static struct riscv_ipi_ops *ipi_ops;
> +static struct riscv_ipi_ops *ipi_ops __ro_after_init;
>
>  void riscv_set_ipi_ops(struct riscv_ipi_ops *ops)
>  {
> diff --git a/arch/riscv/kernel/time.c b/arch/riscv/kernel/time.c
> index 1b432264f7ef..8217b0f67c6c 100644
> --- a/arch/riscv/kernel/time.c
> +++ b/arch/riscv/kernel/time.c
> @@ -11,7 +11,7 @@
>  #include 
>  #include 
>
> -unsigned long riscv_timebase;
> +unsigned long riscv_timebase __ro_after_init;
>  EXPORT_SYMBOL_GPL(riscv_timebase);
>
>  void __init time_init(void)
> diff --git a/arch/riscv/kernel/vdso.c b/arch/riscv/kernel/vdso.c
> index 3f1d35e7c98a..25a3b8849599 100644
> --- a/arch/riscv/kernel/vdso.c
> +++ b/arch/riscv/kernel/vdso.c
> @@ -20,8 +20,8 @@
>
>  extern char vdso_start[], vdso_end[];
>
> -static unsigned int vdso_pages;
> -static struct page **vdso_pagelist;
> +static unsigned int vdso_pages __ro_after_init;
> +static struct page **vdso_pagelist __ro_after_init;
>
>  /*
>   * The vDSO data page.
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 76bf2de8aa59..719ec72ef069 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -149,11 +149,11 @@ void __init setup_bootmem(void)
>  }
>
>  #ifdef CONFIG_MMU
> -static struct pt_alloc_ops pt_ops;
> +static struct pt_alloc_ops pt_ops __ro_after_init;
>
> -unsigned long va_pa_offset;
> +unsigned long va_pa_offset __ro_after_init;
>  EXPORT_SYMBOL(va_pa_offset);
> -unsigned long pfn_base;
> +unsigned long pfn_base __ro_after_init;
>  EXPORT_SYMBOL(pfn_base);
>
>  pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
> --
> 2.31.0
>
>
>


Re: [PATCH v2 1/9] riscv: add __init section marker to some functions

2021-04-01 Thread Anup Patel
On Wed, Mar 31, 2021 at 10:00 PM Jisheng Zhang
 wrote:
>
> From: Jisheng Zhang 
>
> They are not needed after booting, so mark them as __init to move them
> to the __init section.
>
> Signed-off-by: Jisheng Zhang 
> ---
>  arch/riscv/kernel/traps.c  | 2 +-
>  arch/riscv/mm/init.c   | 6 +++---
>  arch/riscv/mm/kasan_init.c | 6 +++---
>  arch/riscv/mm/ptdump.c | 2 +-
>  4 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 1357abf79570..07fdded10c21 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -197,6 +197,6 @@ int is_valid_bugaddr(unsigned long pc)
>  #endif /* CONFIG_GENERIC_BUG */
>
>  /* stvec & scratch is already set from head.S */
> -void trap_init(void)
> +void __init trap_init(void)
>  {
>  }

The trap_init() is unused currently so you can drop this change
and remove trap_init() as a separate patch.

> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 067583ab1bd7..76bf2de8aa59 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -57,7 +57,7 @@ static void __init zone_sizes_init(void)
> free_area_init(max_zone_pfns);
>  }
>
> -static void setup_zero_page(void)
> +static void __init setup_zero_page(void)
>  {
> memset((void *)empty_zero_page, 0, PAGE_SIZE);
>  }
> @@ -75,7 +75,7 @@ static inline void print_mlm(char *name, unsigned long b, 
> unsigned long t)
>   (((t) - (b)) >> 20));
>  }
>
> -static void print_vm_layout(void)
> +static void __init print_vm_layout(void)
>  {
> pr_notice("Virtual kernel memory layout:\n");
> print_mlk("fixmap", (unsigned long)FIXADDR_START,
> @@ -557,7 +557,7 @@ static inline void setup_vm_final(void)
>  #endif /* CONFIG_MMU */
>
>  #ifdef CONFIG_STRICT_KERNEL_RWX
> -void protect_kernel_text_data(void)
> +void __init protect_kernel_text_data(void)
>  {
> unsigned long text_start = (unsigned long)_start;
> unsigned long init_text_start = (unsigned long)__init_text_begin;
> diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
> index 4f85c6d0ddf8..e1d041ac1534 100644
> --- a/arch/riscv/mm/kasan_init.c
> +++ b/arch/riscv/mm/kasan_init.c
> @@ -60,7 +60,7 @@ asmlinkage void __init kasan_early_init(void)
> local_flush_tlb_all();
>  }
>
> -static void kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, unsigned 
> long end)
> +static void __init kasan_populate_pte(pmd_t *pmd, unsigned long vaddr, 
> unsigned long end)
>  {
> phys_addr_t phys_addr;
> pte_t *ptep, *base_pte;
> @@ -82,7 +82,7 @@ static void kasan_populate_pte(pmd_t *pmd, unsigned long 
> vaddr, unsigned long en
> set_pmd(pmd, pfn_pmd(PFN_DOWN(__pa(base_pte)), PAGE_TABLE));
>  }
>
> -static void kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, unsigned 
> long end)
> +static void __init kasan_populate_pmd(pgd_t *pgd, unsigned long vaddr, 
> unsigned long end)
>  {
> phys_addr_t phys_addr;
> pmd_t *pmdp, *base_pmd;
> @@ -117,7 +117,7 @@ static void kasan_populate_pmd(pgd_t *pgd, unsigned long 
> vaddr, unsigned long en
> set_pgd(pgd, pfn_pgd(PFN_DOWN(__pa(base_pmd)), PAGE_TABLE));
>  }
>
> -static void kasan_populate_pgd(unsigned long vaddr, unsigned long end)
> +static void __init kasan_populate_pgd(unsigned long vaddr, unsigned long end)
>  {
> phys_addr_t phys_addr;
> pgd_t *pgdp = pgd_offset_k(vaddr);
> diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
> index ace74dec7492..3b7b6e4d025e 100644
> --- a/arch/riscv/mm/ptdump.c
> +++ b/arch/riscv/mm/ptdump.c
> @@ -331,7 +331,7 @@ static int ptdump_show(struct seq_file *m, void *v)
>
>  DEFINE_SHOW_ATTRIBUTE(ptdump);
>
> -static int ptdump_init(void)
> +static int __init ptdump_init(void)
>  {
> unsigned int i, j;
>
> --
> 2.31.0
>
>
>

Apart from above, looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup


[PATCH v17 12/17] RISC-V: KVM: Add timer functionality

2021-04-01 Thread Anup Patel
From: Atish Patra 

The RISC-V hypervisor specification doesn't have any virtual timer
feature.

Due to this, the guest VCPU timer will be programmed via SBI calls.
The host will use a separate hrtimer event for each guest VCPU to
provide timer functionality. We inject a virtual timer interrupt to
the guest VCPU whenever the guest VCPU hrtimer event expires.

This patch adds guest VCPU timer implementation along with ONE_REG
interface to access VCPU timer state from user space.
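
The hrtimer expiry path described above boils down to roughly the following
(simplified sketch; the actual implementation also re-checks the guest time
and re-arms the hrtimer if it fired early):

static enum hrtimer_restart kvm_riscv_vcpu_hrtimer_expired(struct hrtimer *h)
{
        struct kvm_vcpu_timer *t = container_of(h, struct kvm_vcpu_timer, hrt);
        struct kvm_vcpu *vcpu = container_of(t, struct kvm_vcpu, arch.timer);

        /* The guest VCPU timer event has fired: mark it as no longer
         * programmed and inject the virtual timer interrupt. */
        t->next_set = false;
        kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_VS_TIMER);

        return HRTIMER_NORESTART;
}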

Signed-off-by: Atish Patra 
Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Acked-by: Daniel Lezcano 
---
 arch/riscv/include/asm/kvm_host.h   |   7 +
 arch/riscv/include/asm/kvm_vcpu_timer.h |  44 +
 arch/riscv/include/uapi/asm/kvm.h   |  17 ++
 arch/riscv/kvm/Makefile |   2 +-
 arch/riscv/kvm/vcpu.c   |  14 ++
 arch/riscv/kvm/vcpu_timer.c | 225 
 arch/riscv/kvm/vm.c |   2 +-
 drivers/clocksource/timer-riscv.c   |   8 +
 include/clocksource/timer-riscv.h   |  16 ++
 9 files changed, 333 insertions(+), 2 deletions(-)
 create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
 create mode 100644 arch/riscv/kvm/vcpu_timer.c
 create mode 100644 include/clocksource/timer-riscv.h

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 4e318137d82a..20308046d5df 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_64BIT
 #define KVM_MAX_VCPUS  (1U << 16)
@@ -65,6 +66,9 @@ struct kvm_arch {
/* stage2 page table */
pgd_t *pgd;
phys_addr_t pgd_phys;
+
+   /* Guest Timer */
+   struct kvm_guest_timer timer;
 };
 
 struct kvm_mmio_decode {
@@ -180,6 +184,9 @@ struct kvm_vcpu_arch {
unsigned long irqs_pending;
unsigned long irqs_pending_mask;
 
+   /* VCPU Timer */
+   struct kvm_vcpu_timer timer;
+
/* MMIO instruction details */
struct kvm_mmio_decode mmio_decode;
 
diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h 
b/arch/riscv/include/asm/kvm_vcpu_timer.h
new file mode 100644
index ..375281eb49e0
--- /dev/null
+++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra 
+ */
+
+#ifndef __KVM_VCPU_RISCV_TIMER_H
+#define __KVM_VCPU_RISCV_TIMER_H
+
+#include 
+
+struct kvm_guest_timer {
+   /* Mult & Shift values to get nanoseconds from cycles */
+   u32 nsec_mult;
+   u32 nsec_shift;
+   /* Time delta value */
+   u64 time_delta;
+};
+
+struct kvm_vcpu_timer {
+   /* Flag for whether init is done */
+   bool init_done;
+   /* Flag for whether timer event is configured */
+   bool next_set;
+   /* Next timer event cycles */
+   u64 next_cycles;
+   /* Underlying hrtimer instance */
+   struct hrtimer hrt;
+};
+
+int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu, u64 ncycles);
+int kvm_riscv_vcpu_get_reg_timer(struct kvm_vcpu *vcpu,
+const struct kvm_one_reg *reg);
+int kvm_riscv_vcpu_set_reg_timer(struct kvm_vcpu *vcpu,
+const struct kvm_one_reg *reg);
+int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu);
+int kvm_riscv_guest_timer_init(struct kvm *kvm);
+
+#endif
diff --git a/arch/riscv/include/uapi/asm/kvm.h 
b/arch/riscv/include/uapi/asm/kvm.h
index f7e9dc388d54..08691dd27bcf 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -74,6 +74,18 @@ struct kvm_riscv_csr {
unsigned long scounteren;
 };
 
+/* TIMER registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_timer {
+   __u64 frequency;
+   __u64 time;
+   __u64 compare;
+   __u64 state;
+};
+
+/* Possible states for kvm_riscv_timer */
+#define KVM_RISCV_TIMER_STATE_OFF  0
+#define KVM_RISCV_TIMER_STATE_ON   1
+
 #define KVM_REG_SIZE(id)   \
(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
 
@@ -96,6 +108,11 @@ struct kvm_riscv_csr {
 #define KVM_REG_RISCV_CSR_REG(name)\
(offsetof(struct kvm_riscv_csr, name) / sizeof(unsigned long))
 
+/* Timer registers are mapped as type 4 */
+#define KVM_REG_RISCV_TIMER(0x04 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_TIMER_REG(name)  \
+   (offsetof(struct kvm_riscv_timer, name) / sizeof(__u64))
+
 #endif
 
 #endif /* __LINUX_KVM_RISCV_H */
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index b32f60edf48c

[PATCH v17 01/17] RISC-V: Add hypervisor extension related CSR defines

2021-04-01 Thread Anup Patel
This patch extends asm/csr.h by adding RISC-V hypervisor extension
related defines.

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/csr.h | 89 
 1 file changed, 89 insertions(+)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index caadfc1d7487..bdf00bd558e4 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -30,6 +30,8 @@
 #define SR_XS_CLEAN_AC(0x0001, UL)
 #define SR_XS_DIRTY_AC(0x00018000, UL)
 
+#define SR_MXR _AC(0x0008, UL)
+
 #ifndef CONFIG_64BIT
 #define SR_SD  _AC(0x8000, UL) /* FS/XS dirty */
 #else
@@ -58,22 +60,32 @@
 
 /* Interrupt causes (minus the high bit) */
 #define IRQ_S_SOFT 1
+#define IRQ_VS_SOFT2
 #define IRQ_M_SOFT 3
 #define IRQ_S_TIMER5
+#define IRQ_VS_TIMER   6
 #define IRQ_M_TIMER7
 #define IRQ_S_EXT  9
+#define IRQ_VS_EXT 10
 #define IRQ_M_EXT  11
 
 /* Exception causes */
 #define EXC_INST_MISALIGNED0
 #define EXC_INST_ACCESS1
+#define EXC_INST_ILLEGAL   2
 #define EXC_BREAKPOINT 3
 #define EXC_LOAD_ACCESS5
 #define EXC_STORE_ACCESS   7
 #define EXC_SYSCALL8
+#define EXC_HYPERVISOR_SYSCALL 9
+#define EXC_SUPERVISOR_SYSCALL 10
 #define EXC_INST_PAGE_FAULT12
 #define EXC_LOAD_PAGE_FAULT13
 #define EXC_STORE_PAGE_FAULT   15
+#define EXC_INST_GUEST_PAGE_FAULT  20
+#define EXC_LOAD_GUEST_PAGE_FAULT  21
+#define EXC_VIRTUAL_INST_FAULT 22
+#define EXC_STORE_GUEST_PAGE_FAULT 23
 
 /* PMP configuration */
 #define PMP_R  0x01
@@ -85,6 +97,58 @@
 #define PMP_A_NAPOT0x18
 #define PMP_L  0x80
 
+/* HSTATUS flags */
+#ifdef CONFIG_64BIT
+#define HSTATUS_VSXL   _AC(0x3, UL)
+#define HSTATUS_VSXL_SHIFT 32
+#endif
+#define HSTATUS_VTSR   _AC(0x0040, UL)
+#define HSTATUS_VTW_AC(0x0020, UL)
+#define HSTATUS_VTVM   _AC(0x0010, UL)
+#define HSTATUS_VGEIN  _AC(0x0003f000, UL)
+#define HSTATUS_VGEIN_SHIFT12
+#define HSTATUS_HU _AC(0x0200, UL)
+#define HSTATUS_SPVP   _AC(0x0100, UL)
+#define HSTATUS_SPV_AC(0x0080, UL)
+#define HSTATUS_GVA_AC(0x0040, UL)
+#define HSTATUS_VSBE   _AC(0x0020, UL)
+
+/* HGATP flags */
+#define HGATP_MODE_OFF _AC(0, UL)
+#define HGATP_MODE_SV32X4  _AC(1, UL)
+#define HGATP_MODE_SV39X4  _AC(8, UL)
+#define HGATP_MODE_SV48X4  _AC(9, UL)
+
+#define HGATP32_MODE_SHIFT 31
+#define HGATP32_VMID_SHIFT 22
+#define HGATP32_VMID_MASK  _AC(0x1FC0, UL)
+#define HGATP32_PPN_AC(0x003F, UL)
+
+#define HGATP64_MODE_SHIFT 60
+#define HGATP64_VMID_SHIFT 44
+#define HGATP64_VMID_MASK  _AC(0x03FFF000, UL)
+#define HGATP64_PPN_AC(0x0FFF, UL)
+
+#define HGATP_PAGE_SHIFT   12
+
+#ifdef CONFIG_64BIT
+#define HGATP_PPN  HGATP64_PPN
+#define HGATP_VMID_SHIFT   HGATP64_VMID_SHIFT
+#define HGATP_VMID_MASKHGATP64_VMID_MASK
+#define HGATP_MODE_SHIFT   HGATP64_MODE_SHIFT
+#else
+#define HGATP_PPN  HGATP32_PPN
+#define HGATP_VMID_SHIFT   HGATP32_VMID_SHIFT
+#define HGATP_VMID_MASKHGATP32_VMID_MASK
+#define HGATP_MODE_SHIFT   HGATP32_MODE_SHIFT
+#endif
+
+/* VSIP & HVIP relation */
+#define VSIP_TO_HVIP_SHIFT (IRQ_VS_SOFT - IRQ_S_SOFT)
+#define VSIP_VALID_MASK((_AC(1, UL) << IRQ_S_SOFT) | \
+(_AC(1, UL) << IRQ_S_TIMER) | \
+(_AC(1, UL) << IRQ_S_EXT))
+
 /* symbolic CSR names: */
 #define CSR_CYCLE  0xc00
 #define CSR_TIME   0xc01
@@ -104,6 +168,31 @@
 #define CSR_SIP0x144
 #define CSR_SATP   0x180
 
+#define CSR_VSSTATUS   0x200
+#define CSR_VSIE   0x204
+#define CSR_VSTVEC 0x205
+#define CSR_VSSCRATCH  0x240
+#define CSR_VSEPC  0x241
+#define CSR_VSCAUSE0x242
+#define CSR_VSTVAL 0x243
+#define CSR_VSIP   0x244
+#define CSR_VSATP  0x280
+
+#define CSR_HSTATUS0x600
+#define CSR_HEDELEG0x602
+#define CSR_HIDELEG0x603
+#define CSR_HIE0x604
+#define CSR_HTIMEDELTA 0x605
+#define CSR_HCOUNTEREN 0x606
+#define CSR_HGEIE  0x607
+#define CSR_HTIMEDELTAH0x615
+#define CSR_HTVAL  0x643
+#define CSR_HIP0x644
+#define CSR_HVIP   0x645
+#define CSR_HTINST 0x64a
+#define CSR_HGATP  0x680
+#define CSR_HGEIP  0xe12

[PATCH v17 16/17] RISC-V: KVM: Document RISC-V specific parts of KVM API

2021-04-01 Thread Anup Patel
Document RISC-V specific parts of the KVM API, such as:
 - The interrupt numbers passed to the KVM_INTERRUPT ioctl.
 - The states supported by the KVM_{GET,SET}_MP_STATE ioctls.
 - The registers supported by the KVM_{GET,SET}_ONE_REG interface
   and the encoding of those register ids.
 - The exit reason KVM_EXIT_RISCV_SBI for SBI calls forwarded to
   userspace tool.
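
As a quick illustration of the register id encoding being documented here,
reading the "isa" config register on a 64-bit host from user space would
look roughly like this (macro names as added by the uapi headers in this
series; error handling omitted):

#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <asm/kvm.h>

static unsigned long read_isa_reg(int vcpu_fd)
{
        unsigned long isa = 0;
        struct kvm_one_reg reg = {
                /* = 0x8030 0000 0100 0000 on a 64-bit host */
                .id = KVM_REG_RISCV | KVM_REG_SIZE_U64 |
                      KVM_REG_RISCV_CONFIG | KVM_REG_RISCV_CONFIG_REG(isa),
                .addr = (unsigned long)&isa,
        };

        ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
        return isa;
}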

CC: Jonathan Corbet 
CC: linux-...@vger.kernel.org
Signed-off-by: Anup Patel 
---
 Documentation/virt/kvm/api.rst | 193 +++--
 1 file changed, 184 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 307f2fcf1b02..c8fe09a62690 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -532,7 +532,7 @@ translation mode.
 --
 
 :Capability: basic
-:Architectures: x86, ppc, mips
+:Architectures: x86, ppc, mips, riscv
 :Type: vcpu ioctl
 :Parameters: struct kvm_interrupt (in)
 :Returns: 0 on success, negative on failure.
@@ -601,6 +601,23 @@ interrupt number dequeues the interrupt.
 
 This is an asynchronous vcpu ioctl and can be invoked from any thread.
 
+RISC-V:
+^^^
+
+Queues an external interrupt to be injected into the virtual CPU. This ioctl
+is overloaded with 2 different irq values:
+
+a) KVM_INTERRUPT_SET
+
+   This sets external interrupt for a virtual CPU and it will receive
+   once it is ready.
+
+b) KVM_INTERRUPT_UNSET
+
+   This clears pending external interrupt for a virtual CPU.
+
+This is an asynchronous vcpu ioctl and can be invoked from any thread.
+
 
 4.17 KVM_DEBUG_GUEST
 
@@ -1394,7 +1411,7 @@ for vm-wide capabilities.
 -
 
 :Capability: KVM_CAP_MP_STATE
-:Architectures: x86, s390, arm, arm64
+:Architectures: x86, s390, arm, arm64, riscv
 :Type: vcpu ioctl
 :Parameters: struct kvm_mp_state (out)
 :Returns: 0 on success; -1 on error
@@ -1411,7 +1428,8 @@ uniprocessor guests).
 Possible values are:
 
==
===
-   KVM_MP_STATE_RUNNABLE the vcpu is currently running [x86,arm/arm64]
+   KVM_MP_STATE_RUNNABLE the vcpu is currently running
+ [x86,arm/arm64,riscv]
   KVM_MP_STATE_UNINITIALIZED    the vcpu is an application processor (AP)
  which has not yet received an INIT signal 
[x86]
   KVM_MP_STATE_INIT_RECEIVED    the vcpu has received an INIT signal, and is
@@ -1420,7 +1438,7 @@ Possible values are:
  is waiting for an interrupt [x86]
   KVM_MP_STATE_SIPI_RECEIVED    the vcpu has just received a SIPI (vector
  accessible via KVM_GET_VCPU_EVENTS) [x86]
-   KVM_MP_STATE_STOPPED  the vcpu is stopped [s390,arm/arm64]
+   KVM_MP_STATE_STOPPED  the vcpu is stopped [s390,arm/arm64,riscv]
KVM_MP_STATE_CHECK_STOP   the vcpu is in a special error state [s390]
   KVM_MP_STATE_OPERATING        the vcpu is operating (running or halted)
  [s390]
@@ -1432,8 +1450,8 @@ On x86, this ioctl is only useful after 
KVM_CREATE_IRQCHIP. Without an
 in-kernel irqchip, the multiprocessing state must be maintained by userspace on
 these architectures.
 
-For arm/arm64:
-^^
+For arm/arm64/riscv:
+
 
 The only states that are valid are KVM_MP_STATE_STOPPED and
 KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
@@ -1442,7 +1460,7 @@ KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused 
or not.
 -
 
 :Capability: KVM_CAP_MP_STATE
-:Architectures: x86, s390, arm, arm64
+:Architectures: x86, s390, arm, arm64, riscv
 :Type: vcpu ioctl
 :Parameters: struct kvm_mp_state (in)
 :Returns: 0 on success; -1 on error
@@ -1454,8 +1472,8 @@ On x86, this ioctl is only useful after 
KVM_CREATE_IRQCHIP. Without an
 in-kernel irqchip, the multiprocessing state must be maintained by userspace on
 these architectures.
 
-For arm/arm64:
-^^
+For arm/arm64/riscv:
+
 
 The only states that are valid are KVM_MP_STATE_STOPPED and
 KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
@@ -2572,6 +2590,144 @@ following id bit patterns::
 
   0x7020  0003 02 <0:3> 
 
+RISC-V registers are mapped using the lower 32 bits. The upper 8 bits of
+that is the register group type.
+
+RISC-V config registers are meant for configuring a Guest VCPU and it has
+the following id bit patterns::
+
+  0x8020  01  (32bit Host)
+  0x8030  01  (64bit Host)
+
+Following are the RISC-V config registers:
+
+=== = =
+Encoding                     Register      Description
+=== = =
+  0x80x0  0100  isa   ISA feature bitmap of Gues

[PATCH v17 15/17] RISC-V: KVM: Add SBI v0.1 support

2021-04-01 Thread Anup Patel
From: Atish Patra 

The KVM host kernel runs in HS-mode, so we need to handle the SBI calls
coming from the guest kernel running in VS-mode.

This patch adds SBI v0.1 support in KVM RISC-V. Almost all SBI v0.1
calls are implemented in the KVM kernel module, except the GETCHAR and
PUTCHAR calls, which are forwarded to user space because they cannot be
implemented in kernel space. In the future, when we implement SBI v0.2
for the Guest, we will forward SBI v0.2 experimental and vendor extension
calls to user space.
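
For context, the user-space side of that forwarding ends up looking roughly
like the sketch below (hypothetical VMM snippet; extension ids 0x1 and 0x2
are the legacy SBI v0.1 console calls):

#include <stdio.h>
#include <linux/kvm.h>

/* Hypothetical VMM handler for the KVM_EXIT_RISCV_SBI exit reason. */
static void handle_riscv_sbi(struct kvm_run *run)
{
        switch (run->riscv_sbi.extension_id) {
        case 0x1:       /* SBI v0.1 CONSOLE_PUTCHAR */
                putchar((int)run->riscv_sbi.args[0]);
                run->riscv_sbi.ret[0] = 0;
                break;
        case 0x2:       /* SBI v0.1 CONSOLE_GETCHAR */
                run->riscv_sbi.ret[0] = getchar();
                break;
        default:
                run->riscv_sbi.ret[0] = -1;     /* not supported */
                break;
        }
}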

Signed-off-by: Atish Patra 
Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
---
 arch/riscv/include/asm/kvm_host.h |  10 ++
 arch/riscv/kvm/Makefile   |   2 +-
 arch/riscv/kvm/vcpu.c |   9 ++
 arch/riscv/kvm/vcpu_exit.c|   4 +
 arch/riscv/kvm/vcpu_sbi.c | 173 ++
 include/uapi/linux/kvm.h  |   8 ++
 6 files changed, 205 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/kvm/vcpu_sbi.c

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index c6e717087a25..0818705e3c2b 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -79,6 +79,10 @@ struct kvm_mmio_decode {
int return_handled;
 };
 
+struct kvm_sbi_context {
+   int return_handled;
+};
+
 #define KVM_MMU_PAGE_CACHE_NR_OBJS 32
 
 struct kvm_mmu_page_cache {
@@ -191,6 +195,9 @@ struct kvm_vcpu_arch {
/* MMIO instruction details */
struct kvm_mmio_decode mmio_decode;
 
+   /* SBI context */
+   struct kvm_sbi_context sbi_context;
+
/* Cache pages needed to program page tables with spinlock held */
struct kvm_mmu_page_cache mmu_page_cache;
 
@@ -264,4 +271,7 @@ bool kvm_riscv_vcpu_has_interrupts(struct kvm_vcpu *vcpu, 
unsigned long mask);
 void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
 void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
 
+int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
 #endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index a034826f9a3f..7cf0015d9142 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -10,6 +10,6 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
 kvm-objs := $(common-objs-y)
 
 kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
-kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o vcpu_sbi.o
 
 obj-$(CONFIG_KVM)  += kvm.o
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index a797f247db64..2c2c5e078c30 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -867,6 +867,15 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
}
}
 
+   /* Process SBI value returned from user-space */
+   if (run->exit_reason == KVM_EXIT_RISCV_SBI) {
+   ret = kvm_riscv_vcpu_sbi_return(vcpu, vcpu->run);
+   if (ret) {
+   srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
+   return ret;
+   }
+   }
+
if (run->immediate_exit) {
srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
return -EINTR;
diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index 1873b8c35101..6d4e98e2ad6f 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -678,6 +678,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct 
kvm_run *run,
if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
ret = stage2_page_fault(vcpu, run, trap);
break;
+   case EXC_SUPERVISOR_SYSCALL:
+   if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
+   ret = kvm_riscv_vcpu_sbi_ecall(vcpu, run);
+   break;
default:
break;
};
diff --git a/arch/riscv/kvm/vcpu_sbi.c b/arch/riscv/kvm/vcpu_sbi.c
new file mode 100644
index ..9d1d25cf217f
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_sbi.c
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * Copyright (c) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Atish Patra 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SBI_VERSION_MAJOR  0
+#define SBI_VERSION_MINOR  1
+
+static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu,
+   struct kvm_run *run, u32 type)
+{
+   int i;
+   struct kvm_vcpu *tmp;
+
+   kvm_for_each_vcpu(i, tmp, vcpu->kvm)
+   tmp->arch.power_off = true;
+   kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
+
+   memset(&run->system_event, 0, sizeof(run->system_event));
+  

[PATCH v17 04/17] RISC-V: KVM: Implement VCPU interrupts and requests handling

2021-04-01 Thread Anup Patel
This patch implements VCPU interrupts and requests which are both
asynchronous events.

The VCPU interrupts can be set/unset using KVM_INTERRUPT ioctl from
user-space. In future, the in-kernel IRQCHIP emulation will use
kvm_riscv_vcpu_set_interrupt() and kvm_riscv_vcpu_unset_interrupt()
functions to set/unset VCPU interrupts.

Important VCPU requests implemented by this patch are:
KVM_REQ_SLEEP   - set whenever VCPU itself goes to sleep state
KVM_REQ_VCPU_RESET  - set whenever VCPU reset is requested

The WFI trap-n-emulate (added later) will use KVM_REQ_SLEEP request
and kvm_riscv_vcpu_has_interrupt() function.

The KVM_REQ_VCPU_RESET request will be used by SBI emulation (added
later) to power-up a VCPU in power-off state. The user-space can use
the GET_MPSTATE/SET_MPSTATE ioctls to get/set power state of a VCPU.
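
Roughly, the producer side of the lockless pending-interrupt tracking looks
like the sketch below (based on the description above; details may differ
from the actual patch):

int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
{
        if (irq != IRQ_VS_SOFT &&
            irq != IRQ_VS_TIMER &&
            irq != IRQ_VS_EXT)
                return -EINVAL;

        /* Producers only ever set bits; the VCPU itself is the single
         * consumer and folds irqs_pending into the guest CSR state
         * using irqs_pending_mask. */
        set_bit(irq, &vcpu->arch.irqs_pending);
        smp_mb__before_atomic();
        set_bit(irq, &vcpu->arch.irqs_pending_mask);

        kvm_vcpu_kick(vcpu);

        return 0;
}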

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h |  23 
 arch/riscv/include/uapi/asm/kvm.h |   3 +
 arch/riscv/kvm/vcpu.c | 182 +++---
 3 files changed, 195 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 2796a4211508..1bf660b1a9d8 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -132,6 +132,21 @@ struct kvm_vcpu_arch {
/* CPU CSR context upon Guest VCPU reset */
struct kvm_vcpu_csr guest_reset_csr;
 
+   /*
+* VCPU interrupts
+*
+* We have a lockless approach for tracking pending VCPU interrupts
+* implemented using atomic bitops. The irqs_pending bitmap represent
+* pending interrupts whereas irqs_pending_mask represent bits changed
+* in irqs_pending. Our approach is modeled around multiple producer
+* and single consumer problem where the consumer is the VCPU itself.
+*/
+   unsigned long irqs_pending;
+   unsigned long irqs_pending_mask;
+
+   /* VCPU power-off state */
+   bool power_off;
+
/* Don't run the VCPU (blocked) */
bool pause;
 
@@ -156,4 +171,12 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct 
kvm_run *run,
 
 static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
 
+int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu);
+bool kvm_riscv_vcpu_has_interrupts(struct kvm_vcpu *vcpu, unsigned long mask);
+void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
+
 #endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/include/uapi/asm/kvm.h 
b/arch/riscv/include/uapi/asm/kvm.h
index 984d041a3e3b..3d3d703713c6 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -18,6 +18,9 @@
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
 
+#define KVM_INTERRUPT_SET  -1U
+#define KVM_INTERRUPT_UNSET    -2U
+
 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
 };
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index d87f56126df6..ae85a5d9b979 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -54,6 +55,9 @@ static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
memcpy(csr, reset_csr, sizeof(*csr));
 
memcpy(cntx, reset_cntx, sizeof(*cntx));
+
+   WRITE_ONCE(vcpu->arch.irqs_pending, 0);
+   WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
 }
 
 int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
@@ -97,8 +101,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 {
-   /* TODO: */
-   return 0;
+   return kvm_riscv_vcpu_has_interrupts(vcpu, 1UL << IRQ_VS_TIMER);
 }
 
 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
@@ -111,20 +114,18 @@ void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 {
-   /* TODO: */
-   return 0;
+   return (kvm_riscv_vcpu_has_interrupts(vcpu, -1UL) &&
+   !vcpu->arch.power_off && !vcpu->arch.pause);
 }
 
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
-   /* TODO: */
-   return 0;
+   return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
 }
 
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 {
-   /* TODO: */
-   return false;
+   return (vcpu->arch.guest_context.sstatus & SR_SPP) ? true : false;
 }
 
 vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
@@ -135,7 +136,21 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, 
struct vm_fault *vmf)
 long kv

[PATCH v17 14/17] RISC-V: KVM: Implement ONE REG interface for FP registers

2021-04-01 Thread Anup Patel
From: Atish Patra 

Add a KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctl interface for floating
point registers such as F0-F31 and FCSR. This support is added for
both 'F' and 'D' extensions.

Signed-off-by: Atish Patra 
Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/uapi/asm/kvm.h |  10 +++
 arch/riscv/kvm/vcpu.c | 104 ++
 2 files changed, 114 insertions(+)

diff --git a/arch/riscv/include/uapi/asm/kvm.h 
b/arch/riscv/include/uapi/asm/kvm.h
index 08691dd27bcf..f808ad1ce500 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -113,6 +113,16 @@ struct kvm_riscv_timer {
 #define KVM_REG_RISCV_TIMER_REG(name)  \
(offsetof(struct kvm_riscv_timer, name) / sizeof(__u64))
 
+/* F extension registers are mapped as type 5 */
+#define KVM_REG_RISCV_FP_F (0x05 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_FP_F_REG(name)   \
+   (offsetof(struct __riscv_f_ext_state, name) / sizeof(__u32))
+
+/* D extension registers are mapped as type 6 */
+#define KVM_REG_RISCV_FP_D (0x06 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_FP_D_REG(name)   \
+   (offsetof(struct __riscv_d_ext_state, name) / sizeof(__u64))
+
 #endif
 
 #endif /* __LINUX_KVM_RISCV_H */
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 581fa55f7232..a797f247db64 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -416,6 +416,98 @@ static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu 
*vcpu,
return 0;
 }
 
+static int kvm_riscv_vcpu_get_reg_fp(struct kvm_vcpu *vcpu,
+const struct kvm_one_reg *reg,
+unsigned long rtype)
+{
+   struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+   unsigned long isa = vcpu->arch.isa;
+   unsigned long __user *uaddr =
+   (unsigned long __user *)(unsigned long)reg->addr;
+   unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+   KVM_REG_SIZE_MASK |
+   rtype);
+   void *reg_val;
+
+   if ((rtype == KVM_REG_RISCV_FP_F) &&
+   riscv_isa_extension_available(&isa, f)) {
+   if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+   return -EINVAL;
+   if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
+   reg_val = &cntx->fp.f.fcsr;
+   else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
+ reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+   reg_val = &cntx->fp.f.f[reg_num];
+   else
+   return -EINVAL;
+   } else if ((rtype == KVM_REG_RISCV_FP_D) &&
+  riscv_isa_extension_available(&isa, d)) {
+   if (reg_num == KVM_REG_RISCV_FP_D_REG(fcsr)) {
+   if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+   return -EINVAL;
+   reg_val = &cntx->fp.d.fcsr;
+   } else if ((KVM_REG_RISCV_FP_D_REG(f[0]) <= reg_num) &&
+  reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
+   if (KVM_REG_SIZE(reg->id) != sizeof(u64))
+   return -EINVAL;
+   reg_val = &cntx->fp.d.f[reg_num];
+   } else
+   return -EINVAL;
+   } else
+   return -EINVAL;
+
+   if (copy_to_user(uaddr, reg_val, KVM_REG_SIZE(reg->id)))
+   return -EFAULT;
+
+   return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg_fp(struct kvm_vcpu *vcpu,
+const struct kvm_one_reg *reg,
+unsigned long rtype)
+{
+   struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+   unsigned long isa = vcpu->arch.isa;
+   unsigned long __user *uaddr =
+   (unsigned long __user *)(unsigned long)reg->addr;
+   unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+   KVM_REG_SIZE_MASK |
+   rtype);
+   void *reg_val;
+
+   if ((rtype == KVM_REG_RISCV_FP_F) &&
+   riscv_isa_extension_available(&isa, f)) {
+   if (KVM_REG_SIZE(reg->id) != sizeof(u32))
+   return -EINVAL;
+   if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
+   reg_val = &cntx->fp.f.fcsr;
+   else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
+ reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
+   reg_val = &cntx->fp.f.f[reg_num];
+   else
+   return -E

[PATCH v17 08/17] RISC-V: KVM: Handle WFI exits for VCPU

2021-04-01 Thread Anup Patel
We get an illegal instruction trap whenever a Guest/VM executes the WFI
instruction.

This patch handles the WFI trap by blocking the trapped VCPU using the
kvm_vcpu_block() API. The blocked VCPU will be automatically resumed
whenever a VCPU interrupt is injected from user-space or from the
in-kernel IRQCHIP emulation.

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
---
 arch/riscv/kvm/vcpu_exit.c | 76 ++
 1 file changed, 76 insertions(+)

diff --git a/arch/riscv/kvm/vcpu_exit.c b/arch/riscv/kvm/vcpu_exit.c
index dc66be032ad7..1873b8c35101 100644
--- a/arch/riscv/kvm/vcpu_exit.c
+++ b/arch/riscv/kvm/vcpu_exit.c
@@ -12,6 +12,13 @@
 #include 
 #include 
 
+#define INSN_OPCODE_MASK   0x007c
+#define INSN_OPCODE_SHIFT  2
+#define INSN_OPCODE_SYSTEM 28
+
+#define INSN_MASK_WFI  0xffffffff
+#define INSN_MATCH_WFI 0x10500073
+
 #define INSN_MATCH_LB  0x3
 #define INSN_MASK_LB   0x707f
 #define INSN_MATCH_LH  0x1003
@@ -116,6 +123,71 @@
 (s32)(((insn) >> 7) & 0x1f))
 #define MASK_FUNCT3    0x7000
 
+static int truly_illegal_insn(struct kvm_vcpu *vcpu,
+ struct kvm_run *run,
+ ulong insn)
+{
+   struct kvm_cpu_trap utrap = { 0 };
+
+   /* Redirect trap to Guest VCPU */
+   utrap.sepc = vcpu->arch.guest_context.sepc;
+   utrap.scause = EXC_INST_ILLEGAL;
+   utrap.stval = insn;
+   kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
+
+   return 1;
+}
+
+static int system_opcode_insn(struct kvm_vcpu *vcpu,
+ struct kvm_run *run,
+ ulong insn)
+{
+   if ((insn & INSN_MASK_WFI) == INSN_MATCH_WFI) {
+   vcpu->stat.wfi_exit_stat++;
+   if (!kvm_arch_vcpu_runnable(vcpu)) {
+   srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
+   kvm_vcpu_block(vcpu);
+   vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+   kvm_clear_request(KVM_REQ_UNHALT, vcpu);
+   }
+   vcpu->arch.guest_context.sepc += INSN_LEN(insn);
+   return 1;
+   }
+
+   return truly_illegal_insn(vcpu, run, insn);
+}
+
+static int virtual_inst_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ struct kvm_cpu_trap *trap)
+{
+   unsigned long insn = trap->stval;
+   struct kvm_cpu_trap utrap = { 0 };
+   struct kvm_cpu_context *ct;
+
+   if (unlikely(INSN_IS_16BIT(insn))) {
+   if (insn == 0) {
+   ct = &vcpu->arch.guest_context;
+   insn = kvm_riscv_vcpu_unpriv_read(vcpu, true,
+ ct->sepc,
+ &utrap);
+   if (utrap.scause) {
+   utrap.sepc = ct->sepc;
+   kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
+   return 1;
+   }
+   }
+   if (INSN_IS_16BIT(insn))
+   return truly_illegal_insn(vcpu, run, insn);
+   }
+
+   switch ((insn & INSN_OPCODE_MASK) >> INSN_OPCODE_SHIFT) {
+   case INSN_OPCODE_SYSTEM:
+   return system_opcode_insn(vcpu, run, insn);
+   default:
+   return truly_illegal_insn(vcpu, run, insn);
+   }
+}
+
 static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
unsigned long fault_addr, unsigned long htinst)
 {
@@ -596,6 +668,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct 
kvm_run *run,
ret = -EFAULT;
run->exit_reason = KVM_EXIT_UNKNOWN;
switch (trap->scause) {
+   case EXC_VIRTUAL_INST_FAULT:
+   if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV)
+   ret = virtual_inst_fault(vcpu, run, trap);
+   break;
case EXC_INST_GUEST_PAGE_FAULT:
case EXC_LOAD_GUEST_PAGE_FAULT:
case EXC_STORE_GUEST_PAGE_FAULT:
-- 
2.25.1



[PATCH v17 10/17] RISC-V: KVM: Implement stage2 page table programming

2021-04-01 Thread Anup Patel
This patch implements all required functions for programming
the stage2 page table for each Guest/VM.

At a high level, the flow of the stage2-related functions is similar
to the KVM ARM/ARM64 implementation, but the stage2 page table
format is quite different for KVM RISC-V.
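
For intuition, here is an illustrative sketch (not taken from the patch) of
how a guest physical address splits into page-table indices under Sv39x4,
using the stage2_index_bits/stage2_pgd_xbits values that appear in the
mmu.c hunk below; the root (PGD) level gets two extra index bits, so the
root table spans four pages and the guest physical address space is 41 bits:

  /* illustrative only: 12-bit page offset, 9 index bits per level,
   * plus 2 extra index bits at the root level */
  #define EX_PAGE_SHIFT   12
  #define EX_INDEX_BITS   9
  #define EX_PGD_XBITS    2

  static unsigned long ex_stage2_index(unsigned long gpa, int level)
  {
          /* level 2 is the root level for Sv39x4, level 0 the leaf level */
          unsigned long shift = EX_PAGE_SHIFT + EX_INDEX_BITS * level;
          unsigned long bits = EX_INDEX_BITS +
                               (level == 2 ? EX_PGD_XBITS : 0);

          return (gpa >> shift) & ((1UL << bits) - 1);
  }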

[jiangyifei: stage2 dirty log support]
Signed-off-by: Yifei Jiang 
Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
---
 arch/riscv/include/asm/kvm_host.h |  12 +
 arch/riscv/include/asm/pgtable-bits.h |   1 +
 arch/riscv/kvm/Kconfig|   1 +
 arch/riscv/kvm/main.c |  19 +
 arch/riscv/kvm/mmu.c  | 650 +-
 arch/riscv/kvm/vm.c   |   6 -
 6 files changed, 673 insertions(+), 16 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 8612d8b35322..f80c394312b8 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -75,6 +75,13 @@ struct kvm_mmio_decode {
int return_handled;
 };
 
+#define KVM_MMU_PAGE_CACHE_NR_OBJS 32
+
+struct kvm_mmu_page_cache {
+   int nobjs;
+   void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS];
+};
+
 struct kvm_cpu_trap {
unsigned long sepc;
unsigned long scause;
@@ -176,6 +183,9 @@ struct kvm_vcpu_arch {
/* MMIO instruction details */
struct kvm_mmio_decode mmio_decode;
 
+   /* Cache pages needed to program page tables with spinlock held */
+   struct kvm_mmu_page_cache mmu_page_cache;
+
/* VCPU power-off state */
bool power_off;
 
@@ -204,6 +214,8 @@ void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
 int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
+void kvm_riscv_stage2_mode_detect(void);
+unsigned long kvm_riscv_stage2_mode(void);
 
 void kvm_riscv_stage2_vmid_detect(void);
 unsigned long kvm_riscv_stage2_vmid_bits(void);
diff --git a/arch/riscv/include/asm/pgtable-bits.h 
b/arch/riscv/include/asm/pgtable-bits.h
index bbaeb5d35842..be49d62fcc2b 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -26,6 +26,7 @@
 
 #define _PAGE_SPECIAL   _PAGE_SOFT
 #define _PAGE_TABLE _PAGE_PRESENT
+#define _PAGE_LEAF  (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC)
 
 /*
  * _PAGE_PROT_NONE is set on not-present pages (and ignored by the hardware) to
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
index b42979f84042..633063edaee8 100644
--- a/arch/riscv/kvm/Kconfig
+++ b/arch/riscv/kvm/Kconfig
@@ -23,6 +23,7 @@ config KVM
select PREEMPT_NOTIFIERS
select ANON_INODES
select KVM_MMIO
+   select KVM_GENERIC_DIRTYLOG_READ_PROTECT
select HAVE_KVM_VCPU_ASYNC_IOCTL
select HAVE_KVM_EVENTFD
select SRCU
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 49a4941e3838..421ecf4e6360 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -64,6 +64,8 @@ void kvm_arch_hardware_disable(void)
 
 int kvm_arch_init(void *opaque)
 {
+   const char *str;
+
if (!riscv_isa_extension_available(NULL, h)) {
kvm_info("hypervisor extension not available\n");
return -ENODEV;
@@ -79,10 +81,27 @@ int kvm_arch_init(void *opaque)
return -ENODEV;
}
 
+   kvm_riscv_stage2_mode_detect();
+
kvm_riscv_stage2_vmid_detect();
 
kvm_info("hypervisor extension available\n");
 
+   switch (kvm_riscv_stage2_mode()) {
+   case HGATP_MODE_SV32X4:
+   str = "Sv32x4";
+   break;
+   case HGATP_MODE_SV39X4:
+   str = "Sv39x4";
+   break;
+   case HGATP_MODE_SV48X4:
+   str = "Sv48x4";
+   break;
+   default:
+   return -ENODEV;
+   }
+   kvm_info("using %s G-stage page table format\n", str);
+
kvm_info("VMID %ld bits available\n", kvm_riscv_stage2_vmid_bits());
 
return 0;
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 8ec10ef861e7..4c533a41b887 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -17,11 +17,415 @@
 #include 
 #include 
 #include 
+#include 
+
+#ifdef CONFIG_64BIT
+static unsigned long stage2_mode = (HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
+static unsigned long stage2_pgd_levels = 3;
+#define stage2_index_bits  9
+#else
+static unsigned long stage2_mode = (HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
+static unsigned long stage2_pgd_levels = 2;
+#define stage2_index_bits  10
+#endif
+
+#define stage2_pgd_xbits   2
+#define stage2_pgd_size    (1UL << (HGATP_PAGE_SHIFT + stage2_pgd_xbits))
+#define stage2_gpa_bits    (HGATP_PAGE_SHIFT + \
+(stage2_pgd_levels * stage2_index_bits) + \

[PATCH v17 13/17] RISC-V: KVM: FP lazy save/restore

2021-04-01 Thread Anup Patel
From: Atish Patra 

This patch adds floating point (F and D extension) context save/restore
for guest VCPUs. The FP context is saved and restored lazily, only when
the kernel enters/exits the in-kernel run loop and not during the KVM world
switch. This way, FP save/restore has minimal impact on KVM performance.
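
A rough sketch (not the literal patch code) of the save half of this lazy
policy: on run-loop exit the guest FP registers are written back only when
the guest actually dirtied them. The __kvm_riscv_fp_*_save() helpers are the
assembly routines added by this patch; the wrapper name and the SR_FS check
shown here are illustrative:

  static void example_guest_fp_save(struct kvm_cpu_context *cntx,
                                    unsigned long isa)
  {
          /* skip the expensive register dump unless the guest used FP */
          if ((cntx->sstatus & SR_FS) != SR_FS_DIRTY)
                  return;

          if (riscv_isa_extension_available(&isa, d))
                  __kvm_riscv_fp_d_save(cntx);
          else if (riscv_isa_extension_available(&isa, f))
                  __kvm_riscv_fp_f_save(cntx);
  }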

Signed-off-by: Atish Patra 
Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h |   5 +
 arch/riscv/kernel/asm-offsets.c   |  72 +
 arch/riscv/kvm/vcpu.c |  91 
 arch/riscv/kvm/vcpu_switch.S  | 174 ++
 4 files changed, 342 insertions(+)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 20308046d5df..c6e717087a25 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -130,6 +130,7 @@ struct kvm_cpu_context {
unsigned long sepc;
unsigned long sstatus;
unsigned long hstatus;
+   union __riscv_fp_state fp;
 };
 
 struct kvm_vcpu_csr {
@@ -250,6 +251,10 @@ int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct 
kvm_run *run,
struct kvm_cpu_trap *trap);
 
 void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch);
+void __kvm_riscv_fp_f_save(struct kvm_cpu_context *context);
+void __kvm_riscv_fp_f_restore(struct kvm_cpu_context *context);
+void __kvm_riscv_fp_d_save(struct kvm_cpu_context *context);
+void __kvm_riscv_fp_d_restore(struct kvm_cpu_context *context);
 
 int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
 int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 21e7cf76948d..c7252b76d3af 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -195,6 +195,78 @@ void asm_offsets(void)
OFFSET(KVM_ARCH_TRAP_HTVAL, kvm_cpu_trap, htval);
OFFSET(KVM_ARCH_TRAP_HTINST, kvm_cpu_trap, htinst);
 
+   /* F extension */
+
+   OFFSET(KVM_ARCH_FP_F_F0, kvm_cpu_context, fp.f.f[0]);
+   OFFSET(KVM_ARCH_FP_F_F1, kvm_cpu_context, fp.f.f[1]);
+   OFFSET(KVM_ARCH_FP_F_F2, kvm_cpu_context, fp.f.f[2]);
+   OFFSET(KVM_ARCH_FP_F_F3, kvm_cpu_context, fp.f.f[3]);
+   OFFSET(KVM_ARCH_FP_F_F4, kvm_cpu_context, fp.f.f[4]);
+   OFFSET(KVM_ARCH_FP_F_F5, kvm_cpu_context, fp.f.f[5]);
+   OFFSET(KVM_ARCH_FP_F_F6, kvm_cpu_context, fp.f.f[6]);
+   OFFSET(KVM_ARCH_FP_F_F7, kvm_cpu_context, fp.f.f[7]);
+   OFFSET(KVM_ARCH_FP_F_F8, kvm_cpu_context, fp.f.f[8]);
+   OFFSET(KVM_ARCH_FP_F_F9, kvm_cpu_context, fp.f.f[9]);
+   OFFSET(KVM_ARCH_FP_F_F10, kvm_cpu_context, fp.f.f[10]);
+   OFFSET(KVM_ARCH_FP_F_F11, kvm_cpu_context, fp.f.f[11]);
+   OFFSET(KVM_ARCH_FP_F_F12, kvm_cpu_context, fp.f.f[12]);
+   OFFSET(KVM_ARCH_FP_F_F13, kvm_cpu_context, fp.f.f[13]);
+   OFFSET(KVM_ARCH_FP_F_F14, kvm_cpu_context, fp.f.f[14]);
+   OFFSET(KVM_ARCH_FP_F_F15, kvm_cpu_context, fp.f.f[15]);
+   OFFSET(KVM_ARCH_FP_F_F16, kvm_cpu_context, fp.f.f[16]);
+   OFFSET(KVM_ARCH_FP_F_F17, kvm_cpu_context, fp.f.f[17]);
+   OFFSET(KVM_ARCH_FP_F_F18, kvm_cpu_context, fp.f.f[18]);
+   OFFSET(KVM_ARCH_FP_F_F19, kvm_cpu_context, fp.f.f[19]);
+   OFFSET(KVM_ARCH_FP_F_F20, kvm_cpu_context, fp.f.f[20]);
+   OFFSET(KVM_ARCH_FP_F_F21, kvm_cpu_context, fp.f.f[21]);
+   OFFSET(KVM_ARCH_FP_F_F22, kvm_cpu_context, fp.f.f[22]);
+   OFFSET(KVM_ARCH_FP_F_F23, kvm_cpu_context, fp.f.f[23]);
+   OFFSET(KVM_ARCH_FP_F_F24, kvm_cpu_context, fp.f.f[24]);
+   OFFSET(KVM_ARCH_FP_F_F25, kvm_cpu_context, fp.f.f[25]);
+   OFFSET(KVM_ARCH_FP_F_F26, kvm_cpu_context, fp.f.f[26]);
+   OFFSET(KVM_ARCH_FP_F_F27, kvm_cpu_context, fp.f.f[27]);
+   OFFSET(KVM_ARCH_FP_F_F28, kvm_cpu_context, fp.f.f[28]);
+   OFFSET(KVM_ARCH_FP_F_F29, kvm_cpu_context, fp.f.f[29]);
+   OFFSET(KVM_ARCH_FP_F_F30, kvm_cpu_context, fp.f.f[30]);
+   OFFSET(KVM_ARCH_FP_F_F31, kvm_cpu_context, fp.f.f[31]);
+   OFFSET(KVM_ARCH_FP_F_FCSR, kvm_cpu_context, fp.f.fcsr);
+
+   /* D extension */
+
+   OFFSET(KVM_ARCH_FP_D_F0, kvm_cpu_context, fp.d.f[0]);
+   OFFSET(KVM_ARCH_FP_D_F1, kvm_cpu_context, fp.d.f[1]);
+   OFFSET(KVM_ARCH_FP_D_F2, kvm_cpu_context, fp.d.f[2]);
+   OFFSET(KVM_ARCH_FP_D_F3, kvm_cpu_context, fp.d.f[3]);
+   OFFSET(KVM_ARCH_FP_D_F4, kvm_cpu_context, fp.d.f[4]);
+   OFFSET(KVM_ARCH_FP_D_F5, kvm_cpu_context, fp.d.f[5]);
+   OFFSET(KVM_ARCH_FP_D_F6, kvm_cpu_context, fp.d.f[6]);
+   OFFSET(KVM_ARCH_FP_D_F7, kvm_cpu_context, fp.d.f[7]);
+   OFFSET(KVM_ARCH_FP_D_F8, kvm_cpu_context, fp.d.f[8]);
+   OFFSET(KVM_ARCH_FP_D_F9, kvm_cpu_context, fp.d.f[9]);
+   OFFSET(KVM_ARCH_FP_D_F10, kvm_cpu_context, fp.d.f[10]);
+   OFFSET(KVM_ARCH_FP_D_F11, kvm_cpu_context, fp.d.f[11

[PATCH v17 11/17] RISC-V: KVM: Implement MMU notifiers

2021-04-01 Thread Anup Patel
This patch implements MMU notifiers for KVM RISC-V so that the Guest
physical address space stays in sync with the Host physical address space.

This will allow swapping, page migration, etc. to work transparently
with KVM RISC-V.

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h |   7 ++
 arch/riscv/kvm/Kconfig|   1 +
 arch/riscv/kvm/mmu.c  | 144 +-
 arch/riscv/kvm/vm.c   |   1 +
 4 files changed, 149 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index f80c394312b8..4e318137d82a 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -202,6 +202,13 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu 
*vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start,
+   unsigned long end, unsigned int flags);
+int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
+int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
+
 void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long gpa, unsigned long vmid);
 void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid);
 void __kvm_riscv_hfence_gvma_gpa(unsigned long gpa);
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
index 633063edaee8..a712bb910cda 100644
--- a/arch/riscv/kvm/Kconfig
+++ b/arch/riscv/kvm/Kconfig
@@ -20,6 +20,7 @@ if VIRTUALIZATION
 config KVM
tristate "Kernel-based Virtual Machine (KVM) support (EXPERIMENTAL)"
depends on RISCV_SBI && MMU
+   select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select ANON_INODES
select KVM_MMIO
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 4c533a41b887..b64704aaed7d 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -296,7 +296,8 @@ static void stage2_op_pte(struct kvm *kvm, gpa_t addr,
}
 }
 
-static void stage2_unmap_range(struct kvm *kvm, gpa_t start, gpa_t size)
+static void stage2_unmap_range(struct kvm *kvm, gpa_t start,
+  gpa_t size, bool may_block)
 {
int ret;
pte_t *ptep;
@@ -321,6 +322,13 @@ static void stage2_unmap_range(struct kvm *kvm, gpa_t 
start, gpa_t size)
 
 next:
addr += page_size;
+
+   /*
+* If the range is too large, release the kvm->mmu_lock
+* to prevent starvation and lockup detector warnings.
+*/
+   if (may_block && addr < end)
+   cond_resched_lock(&kvm->mmu_lock);
}
 }
 
@@ -404,6 +412,38 @@ static int stage2_ioremap(struct kvm *kvm, gpa_t gpa, 
phys_addr_t hpa,
 
 }
 
+static int handle_hva_to_gpa(struct kvm *kvm,
+unsigned long start,
+unsigned long end,
+int (*handler)(struct kvm *kvm,
+   gpa_t gpa, u64 size,
+   void *data),
+void *data)
+{
+   struct kvm_memslots *slots;
+   struct kvm_memory_slot *memslot;
+   int ret = 0;
+
+   slots = kvm_memslots(kvm);
+
+   /* we only care about the pages that the guest sees */
+   kvm_for_each_memslot(memslot, slots) {
+   unsigned long hva_start, hva_end;
+   gfn_t gpa;
+
+   hva_start = max(start, memslot->userspace_addr);
+   hva_end = min(end, memslot->userspace_addr +
+   (memslot->npages << PAGE_SHIFT));
+   if (hva_start >= hva_end)
+   continue;
+
+   gpa = hva_to_gfn_memslot(hva_start, memslot) << PAGE_SHIFT;
+   ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data);
+   }
+
+   return ret;
+}
+
 void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 struct kvm_memory_slot *slot,
 gfn_t gfn_offset,
@@ -543,7 +583,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
spin_lock(&kvm->mmu_lock);
if (ret)
stage2_unmap_range(kvm, mem->guest_phys_addr,
-  mem->memory_size);
+  mem->memory_size, false);
spin_unlock(&kvm->mmu_lock);
 
 out:
@@ -551,6 +591,96 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
return ret;
 }
 
+static int kvm_unmap_hva_handler(struct kvm *kvm,
+gpa_t

[PATCH v17 17/17] RISC-V: KVM: Add MAINTAINERS entry

2021-04-01 Thread Anup Patel
Add myself as maintainer for KVM RISC-V and Atish as designated reviewer.

Signed-off-by: Atish Patra 
Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 MAINTAINERS | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index fb2a3633b719..3d1aa899fc4c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9796,6 +9796,17 @@ F:   arch/powerpc/include/uapi/asm/kvm*
 F: arch/powerpc/kernel/kvm*
 F: arch/powerpc/kvm/
 
+KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)
+M: Anup Patel 
+R: Atish Patra 
+L: k...@vger.kernel.org
+L: kvm-ri...@lists.infradead.org
+S: Maintained
+T: git git://github.com/kvm-riscv/linux.git
+F: arch/riscv/include/asm/kvm*
+F: arch/riscv/include/uapi/asm/kvm*
+F: arch/riscv/kvm/
+
 KERNEL VIRTUAL MACHINE for s390 (KVM/s390)
 M: Christian Borntraeger 
 M: Janosch Frank 
-- 
2.25.1



[PATCH v17 05/17] RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls

2021-04-01 Thread Anup Patel
For KVM RISC-V, we use KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls to access
VCPU config and registers from user-space.

We have three types of VCPU registers:
1. CONFIG - these are VCPU config and capabilities
2. CORE   - these are VCPU general purpose registers
3. CSR- these are VCPU control and status registers

The CONFIG register available to user-space is ISA. The ISA register is
a read and write register where user-space can only write the desired
VCPU ISA capabilities before running the VCPU.

The CORE registers available to user-space are PC, RA, SP, GP, TP, A0-A7,
T0-T6, S0-S11 and MODE. Most of these are RISC-V general-purpose registers,
except PC and MODE. The PC register represents the program counter whereas
the MODE register represents the VCPU privilege mode (i.e. S/U-mode).

The CSRs available to user-space are SSTATUS, SIE, STVEC, SSCRATCH, SEPC,
SCAUSE, STVAL, SIP, and SATP. All of these are read/write registers.

In the future, more VCPU register types (such as FP) will be added for the
KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls.
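
For illustration (not part of this patch), reading the ISA CONFIG register
from user-space could look roughly like this; vcpu_fd is assumed to be a
VCPU file descriptor, KVM_REG_RISCV is the architecture prefix from the
generic uapi header, and KVM_REG_SIZE_U64 assumes an RV64 host where
unsigned long is 64 bits:

  /* illustrative helper: fetch the VCPU ISA capability word */
  static int get_vcpu_isa(int vcpu_fd, unsigned long *isa_out)
  {
          struct kvm_one_reg reg = {
                  .id   = KVM_REG_RISCV | KVM_REG_SIZE_U64 |
                          KVM_REG_RISCV_CONFIG |
                          KVM_REG_RISCV_CONFIG_REG(isa),
                  .addr = (unsigned long)isa_out,
          };

          return ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
  }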

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
---
 arch/riscv/include/uapi/asm/kvm.h |  53 ++-
 arch/riscv/kvm/vcpu.c | 246 +-
 2 files changed, 295 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/include/uapi/asm/kvm.h 
b/arch/riscv/include/uapi/asm/kvm.h
index 3d3d703713c6..f7e9dc388d54 100644
--- a/arch/riscv/include/uapi/asm/kvm.h
+++ b/arch/riscv/include/uapi/asm/kvm.h
@@ -41,10 +41,61 @@ struct kvm_guest_debug_arch {
 struct kvm_sync_regs {
 };
 
-/* dummy definition */
+/* for KVM_GET_SREGS and KVM_SET_SREGS */
 struct kvm_sregs {
 };
 
+/* CONFIG registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_config {
+   unsigned long isa;
+};
+
+/* CORE registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_core {
+   struct user_regs_struct regs;
+   unsigned long mode;
+};
+
+/* Possible privilege modes for kvm_riscv_core */
+#define KVM_RISCV_MODE_S   1
+#define KVM_RISCV_MODE_U   0
+
+/* CSR registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_csr {
+   unsigned long sstatus;
+   unsigned long sie;
+   unsigned long stvec;
+   unsigned long sscratch;
+   unsigned long sepc;
+   unsigned long scause;
+   unsigned long stval;
+   unsigned long sip;
+   unsigned long satp;
+   unsigned long scounteren;
+};
+
+#define KVM_REG_SIZE(id)   \
+   (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
+
+/* If you need to interpret the index values, here is the key: */
+#define KVM_REG_RISCV_TYPE_MASK    0x00000000FF000000
+#define KVM_REG_RISCV_TYPE_SHIFT   24
+
+/* Config registers are mapped as type 1 */
+#define KVM_REG_RISCV_CONFIG   (0x01 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CONFIG_REG(name) \
+   (offsetof(struct kvm_riscv_config, name) / sizeof(unsigned long))
+
+/* Core registers are mapped as type 2 */
+#define KVM_REG_RISCV_CORE (0x02 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CORE_REG(name)   \
+   (offsetof(struct kvm_riscv_core, name) / sizeof(unsigned long))
+
+/* Control and status registers are mapped as type 3 */
+#define KVM_REG_RISCV_CSR  (0x03 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CSR_REG(name)\
+   (offsetof(struct kvm_riscv_csr, name) / sizeof(unsigned long))
+
 #endif
 
 #endif /* __LINUX_KVM_RISCV_H */
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index ae85a5d9b979..551359c9136c 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -18,7 +18,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 struct kvm_stats_debugfs_item debugfs_entries[] = {
@@ -133,6 +132,225 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, 
struct vm_fault *vmf)
return VM_FAULT_SIGBUS;
 }
 
+static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
+const struct kvm_one_reg *reg)
+{
+   unsigned long __user *uaddr =
+   (unsigned long __user *)(unsigned long)reg->addr;
+   unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+   KVM_REG_SIZE_MASK |
+   KVM_REG_RISCV_CONFIG);
+   unsigned long reg_val;
+
+   if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
+   return -EINVAL;
+
+   switch (reg_num) {
+   case KVM_REG_RISCV_CONFIG_REG(isa):
+   reg_val = vcpu->arch.isa;
+   break;
+   default:
+   return -EINVAL;
+   };
+
+   if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
+   return -EFAULT;
+
+   return 0;
+}
+
+static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
+

[PATCH v17 06/17] RISC-V: KVM: Implement VCPU world-switch

2021-04-01 Thread Anup Patel
This patch implements the VCPU world-switch for KVM RISC-V.

The KVM RISC-V world-switch (i.e. __kvm_riscv_switch_to()) mostly
switches the general purpose registers and the SSTATUS, STVEC, SSCRATCH
and HSTATUS CSRs. Other CSRs are switched via the vcpu_load() and
vcpu_put() interface, in the kvm_arch_vcpu_load() and kvm_arch_vcpu_put()
functions respectively.

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h |  10 +-
 arch/riscv/kernel/asm-offsets.c   |  78 
 arch/riscv/kvm/Makefile   |   2 +-
 arch/riscv/kvm/vcpu.c |  30 -
 arch/riscv/kvm/vcpu_switch.S  | 203 ++
 5 files changed, 319 insertions(+), 4 deletions(-)
 create mode 100644 arch/riscv/kvm/vcpu_switch.S

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 1bf660b1a9d8..ca9b8dfcd406 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -120,6 +120,14 @@ struct kvm_vcpu_arch {
/* ISA feature bits (similar to MISA) */
unsigned long isa;
 
+   /* SSCRATCH, STVEC, and SCOUNTEREN of Host */
+   unsigned long host_sscratch;
+   unsigned long host_stvec;
+   unsigned long host_scounteren;
+
+   /* CPU context of Host */
+   struct kvm_cpu_context host_context;
+
/* CPU context of Guest VCPU */
struct kvm_cpu_context guest_context;
 
@@ -169,7 +177,7 @@ int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, 
struct kvm_run *run);
 int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
struct kvm_cpu_trap *trap);
 
-static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
+void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch);
 
 int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
 int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 9ef33346853c..21f867d35b65 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -7,7 +7,9 @@
 #define GENERATING_ASM_OFFSETS
 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 
@@ -111,6 +113,82 @@ void asm_offsets(void)
OFFSET(PT_BADADDR, pt_regs, badaddr);
OFFSET(PT_CAUSE, pt_regs, cause);
 
+   OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
+   OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
+   OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
+   OFFSET(KVM_ARCH_GUEST_GP, kvm_vcpu_arch, guest_context.gp);
+   OFFSET(KVM_ARCH_GUEST_TP, kvm_vcpu_arch, guest_context.tp);
+   OFFSET(KVM_ARCH_GUEST_T0, kvm_vcpu_arch, guest_context.t0);
+   OFFSET(KVM_ARCH_GUEST_T1, kvm_vcpu_arch, guest_context.t1);
+   OFFSET(KVM_ARCH_GUEST_T2, kvm_vcpu_arch, guest_context.t2);
+   OFFSET(KVM_ARCH_GUEST_S0, kvm_vcpu_arch, guest_context.s0);
+   OFFSET(KVM_ARCH_GUEST_S1, kvm_vcpu_arch, guest_context.s1);
+   OFFSET(KVM_ARCH_GUEST_A0, kvm_vcpu_arch, guest_context.a0);
+   OFFSET(KVM_ARCH_GUEST_A1, kvm_vcpu_arch, guest_context.a1);
+   OFFSET(KVM_ARCH_GUEST_A2, kvm_vcpu_arch, guest_context.a2);
+   OFFSET(KVM_ARCH_GUEST_A3, kvm_vcpu_arch, guest_context.a3);
+   OFFSET(KVM_ARCH_GUEST_A4, kvm_vcpu_arch, guest_context.a4);
+   OFFSET(KVM_ARCH_GUEST_A5, kvm_vcpu_arch, guest_context.a5);
+   OFFSET(KVM_ARCH_GUEST_A6, kvm_vcpu_arch, guest_context.a6);
+   OFFSET(KVM_ARCH_GUEST_A7, kvm_vcpu_arch, guest_context.a7);
+   OFFSET(KVM_ARCH_GUEST_S2, kvm_vcpu_arch, guest_context.s2);
+   OFFSET(KVM_ARCH_GUEST_S3, kvm_vcpu_arch, guest_context.s3);
+   OFFSET(KVM_ARCH_GUEST_S4, kvm_vcpu_arch, guest_context.s4);
+   OFFSET(KVM_ARCH_GUEST_S5, kvm_vcpu_arch, guest_context.s5);
+   OFFSET(KVM_ARCH_GUEST_S6, kvm_vcpu_arch, guest_context.s6);
+   OFFSET(KVM_ARCH_GUEST_S7, kvm_vcpu_arch, guest_context.s7);
+   OFFSET(KVM_ARCH_GUEST_S8, kvm_vcpu_arch, guest_context.s8);
+   OFFSET(KVM_ARCH_GUEST_S9, kvm_vcpu_arch, guest_context.s9);
+   OFFSET(KVM_ARCH_GUEST_S10, kvm_vcpu_arch, guest_context.s10);
+   OFFSET(KVM_ARCH_GUEST_S11, kvm_vcpu_arch, guest_context.s11);
+   OFFSET(KVM_ARCH_GUEST_T3, kvm_vcpu_arch, guest_context.t3);
+   OFFSET(KVM_ARCH_GUEST_T4, kvm_vcpu_arch, guest_context.t4);
+   OFFSET(KVM_ARCH_GUEST_T5, kvm_vcpu_arch, guest_context.t5);
+   OFFSET(KVM_ARCH_GUEST_T6, kvm_vcpu_arch, guest_context.t6);
+   OFFSET(KVM_ARCH_GUEST_SEPC, kvm_vcpu_arch, guest_context.sepc);
+   OFFSET(KVM_ARCH_GUEST_SSTATUS, kvm_vcpu_arch, guest_context.sstatus);
+   OFFSET(KVM_ARCH_GUEST_HSTATUS, kvm_vcpu_arch, guest_context.hstatus);
+   OFFSET(KVM_ARCH_GUEST_SCOUNTEREN, kvm_vcpu_arch, guest_csr.scounteren);
+
+   OFFSET(KVM_ARCH_HOST_ZERO, kvm_vcpu_arch

[PATCH v17 07/17] RISC-V: KVM: Handle MMIO exits for VCPU

2021-04-01 Thread Anup Patel
We will get stage2 page faults whenever a Guest/VM accesses a SW-emulated
MMIO device or unmapped Guest RAM.

This patch implements MMIO read/write emulation by extracting the MMIO
details from the trapped load/store instruction and forwarding the
MMIO read/write to user-space. The actual MMIO emulation happens in
user-space, and the KVM kernel module only takes care of register
updates before resuming the trapped VCPU.
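
For illustration (not part of this patch), the user-space side of this split
typically looks like the following, where run is the mmap()ed struct kvm_run
of the VCPU and mmio_read()/mmio_write() stand in for the VMM's device model:

  void mmio_read(__u64 addr, void *data, __u32 len);   /* VMM device model */
  void mmio_write(__u64 addr, void *data, __u32 len);

  static void handle_mmio_exit(struct kvm_run *run)
  {
          if (run->exit_reason != KVM_EXIT_MMIO)
                  return;

          if (run->mmio.is_write)
                  mmio_write(run->mmio.phys_addr, run->mmio.data,
                             run->mmio.len);
          else
                  mmio_read(run->mmio.phys_addr, run->mmio.data,
                            run->mmio.len);

          /* the next KVM_RUN lets kvm_riscv_vcpu_mmio_return() update the
           * trapped register and step past the faulting instruction */
  }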

The handling of stage2 page faults for unmapped Guest RAM will be
implemented by a separate patch later.

[jiangyifei: ioeventfd and in-kernel mmio device support]
Signed-off-by: Yifei Jiang 
Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h |  22 ++
 arch/riscv/kernel/asm-offsets.c   |   6 +
 arch/riscv/kvm/Kconfig|   1 +
 arch/riscv/kvm/Makefile   |   1 +
 arch/riscv/kvm/mmu.c  |   8 +
 arch/riscv/kvm/vcpu_exit.c| 592 +-
 arch/riscv/kvm/vcpu_switch.S  |  23 ++
 arch/riscv/kvm/vm.c   |   1 +
 8 files changed, 651 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index ca9b8dfcd406..7541018314a4 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -54,6 +54,14 @@ struct kvm_arch {
phys_addr_t pgd_phys;
 };
 
+struct kvm_mmio_decode {
+   unsigned long insn;
+   int insn_len;
+   int len;
+   int shift;
+   int return_handled;
+};
+
 struct kvm_cpu_trap {
unsigned long sepc;
unsigned long scause;
@@ -152,6 +160,9 @@ struct kvm_vcpu_arch {
unsigned long irqs_pending;
unsigned long irqs_pending_mask;
 
+   /* MMIO instruction details */
+   struct kvm_mmio_decode mmio_decode;
+
/* VCPU power-off state */
bool power_off;
 
@@ -168,11 +179,22 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu 
*vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 
+int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
+struct kvm_memory_slot *memslot,
+gpa_t gpa, unsigned long hva, bool is_write);
 void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
 int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
 
+void __kvm_riscv_unpriv_trap(void);
+
+unsigned long kvm_riscv_vcpu_unpriv_read(struct kvm_vcpu *vcpu,
+bool read_insn,
+unsigned long guest_addr,
+struct kvm_cpu_trap *trap);
+void kvm_riscv_vcpu_trap_redirect(struct kvm_vcpu *vcpu,
+ struct kvm_cpu_trap *trap);
 int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
struct kvm_cpu_trap *trap);
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 21f867d35b65..21e7cf76948d 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -189,6 +189,12 @@ void asm_offsets(void)
OFFSET(KVM_ARCH_HOST_STVEC, kvm_vcpu_arch, host_stvec);
OFFSET(KVM_ARCH_HOST_SCOUNTEREN, kvm_vcpu_arch, host_scounteren);
 
+   OFFSET(KVM_ARCH_TRAP_SEPC, kvm_cpu_trap, sepc);
+   OFFSET(KVM_ARCH_TRAP_SCAUSE, kvm_cpu_trap, scause);
+   OFFSET(KVM_ARCH_TRAP_STVAL, kvm_cpu_trap, stval);
+   OFFSET(KVM_ARCH_TRAP_HTVAL, kvm_cpu_trap, htval);
+   OFFSET(KVM_ARCH_TRAP_HTINST, kvm_cpu_trap, htinst);
+
/*
 * THREAD_{F,X}* might be larger than a S-type offset can handle, but
 * these are used in performance-sensitive assembly so we can't resort
diff --git a/arch/riscv/kvm/Kconfig b/arch/riscv/kvm/Kconfig
index 88edd477b3a8..b42979f84042 100644
--- a/arch/riscv/kvm/Kconfig
+++ b/arch/riscv/kvm/Kconfig
@@ -24,6 +24,7 @@ config KVM
select ANON_INODES
select KVM_MMIO
select HAVE_KVM_VCPU_ASYNC_IOCTL
+   select HAVE_KVM_EVENTFD
select SRCU
help
  Support hosting virtualized guest machines.
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 845579273727..54991cc55c00 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -3,6 +3,7 @@
 #
 
 common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
+common-objs-y += $(addprefix ../../../virt/kvm/, eventfd.o)
 
 ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
 
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index abfd2b22fa8e..8ec10ef861e7 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -58,6 +58,14 @@ int kvm_arch_prepare_memory_region(struct kvm

[PATCH v17 03/17] RISC-V: KVM: Implement VCPU create, init and destroy functions

2021-04-01 Thread Anup Patel
This patch implements the VCPU create, init and destroy functions
required by the generic KVM module. We don't have many dynamic
resources in struct kvm_vcpu_arch, so these functions are quite
simple for KVM RISC-V.

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h | 69 +++
 arch/riscv/kvm/vcpu.c | 55 
 2 files changed, 115 insertions(+), 9 deletions(-)

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index e1a8f89b2b81..2796a4211508 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -62,7 +62,76 @@ struct kvm_cpu_trap {
unsigned long htinst;
 };
 
+struct kvm_cpu_context {
+   unsigned long zero;
+   unsigned long ra;
+   unsigned long sp;
+   unsigned long gp;
+   unsigned long tp;
+   unsigned long t0;
+   unsigned long t1;
+   unsigned long t2;
+   unsigned long s0;
+   unsigned long s1;
+   unsigned long a0;
+   unsigned long a1;
+   unsigned long a2;
+   unsigned long a3;
+   unsigned long a4;
+   unsigned long a5;
+   unsigned long a6;
+   unsigned long a7;
+   unsigned long s2;
+   unsigned long s3;
+   unsigned long s4;
+   unsigned long s5;
+   unsigned long s6;
+   unsigned long s7;
+   unsigned long s8;
+   unsigned long s9;
+   unsigned long s10;
+   unsigned long s11;
+   unsigned long t3;
+   unsigned long t4;
+   unsigned long t5;
+   unsigned long t6;
+   unsigned long sepc;
+   unsigned long sstatus;
+   unsigned long hstatus;
+};
+
+struct kvm_vcpu_csr {
+   unsigned long vsstatus;
+   unsigned long hie;
+   unsigned long vstvec;
+   unsigned long vsscratch;
+   unsigned long vsepc;
+   unsigned long vscause;
+   unsigned long vstval;
+   unsigned long hvip;
+   unsigned long vsatp;
+   unsigned long scounteren;
+};
+
 struct kvm_vcpu_arch {
+   /* VCPU ran at least once */
+   bool ran_atleast_once;
+
+   /* ISA feature bits (similar to MISA) */
+   unsigned long isa;
+
+   /* CPU context of Guest VCPU */
+   struct kvm_cpu_context guest_context;
+
+   /* CPU CSR context of Guest VCPU */
+   struct kvm_vcpu_csr guest_csr;
+
+   /* CPU context upon Guest VCPU reset */
+   struct kvm_cpu_context guest_reset_context;
+
+   /* CPU CSR context upon Guest VCPU reset */
+   struct kvm_vcpu_csr guest_reset_csr;
+
/* Don't run the VCPU (blocked) */
bool pause;
 
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index 6f1c6c8049e1..d87f56126df6 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -35,6 +35,27 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
 };
 
+#define KVM_RISCV_ISA_ALLOWED  (riscv_isa_extension_mask(a) | \
+riscv_isa_extension_mask(c) | \
+riscv_isa_extension_mask(d) | \
+riscv_isa_extension_mask(f) | \
+riscv_isa_extension_mask(i) | \
+riscv_isa_extension_mask(m) | \
+riscv_isa_extension_mask(s) | \
+riscv_isa_extension_mask(u))
+
+static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
+{
+   struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
+   struct kvm_vcpu_csr *reset_csr = &vcpu->arch.guest_reset_csr;
+   struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
+   struct kvm_cpu_context *reset_cntx = &vcpu->arch.guest_reset_context;
+
+   memcpy(csr, reset_csr, sizeof(*csr));
+
+   memcpy(cntx, reset_cntx, sizeof(*cntx));
+}
+
 int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 {
return 0;
@@ -42,7 +63,25 @@ int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
 
 int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 {
-   /* TODO: */
+   struct kvm_cpu_context *cntx;
+
+   /* Mark this VCPU never ran */
+   vcpu->arch.ran_atleast_once = false;
+
+   /* Setup ISA features available to VCPU */
+   vcpu->arch.isa = riscv_isa_extension_base(NULL) & KVM_RISCV_ISA_ALLOWED;
+
+   /* Setup reset state of shadow SSTATUS and HSTATUS CSRs */
+   cntx = &vcpu->arch.guest_reset_context;
+   cntx->sstatus = SR_SPP | SR_SPIE;
+   cntx->hstatus = 0;
+   cntx->hstatus |= HSTATUS_VTW;
+   cntx->hstatus |= HSTATUS_SPVP;
+   cntx->hstatus |= HSTATUS_SPV;
+
+   /* Reset VCPU */
+   kvm_riscv_reset_vcpu(vcpu);
+
return 0;
 }
 
@@ -50,15 +89,10 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
 {
 }
 
-int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
-{
-   /* TODO: */
-   return 0;
-}
-
 void kvm_arch_vcpu_destroy(s

[PATCH v17 09/17] RISC-V: KVM: Implement VMID allocator

2021-04-01 Thread Anup Patel
We implement a simple VMID allocator for Guests/VMs which:
1. Detects the number of VMID bits at boot-time
2. Uses an atomic number to track the VMID version and increments
   the VMID version whenever we run out of VMIDs
3. Flushes Guest TLBs on all host CPUs whenever we run out
   of VMIDs
4. Force-updates the HW Stage2 VMID for each Guest VCPU, whenever
   the VMID changes, using the VCPU request KVM_REQ_UPDATE_HGATP
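
A simplified sketch of this versioned scheme (illustrative only; locking and
the actual TLB flush call are omitted):

  /* vmid_bits is assumed to be detected at boot */
  static unsigned long vmid_version = 1, next_vmid = 1, vmid_bits;

  static void example_vmid_update(struct kvm_vmid *vmid)
  {
          if (vmid->vmid_version == vmid_version)
                  return;                         /* still in current generation */

          if (next_vmid >= (1UL << vmid_bits)) {  /* ran out of VMIDs */
                  vmid_version++;
                  next_vmid = 1;
                  /* flush Guest TLBs on all host CPUs here */
          }

          vmid->vmid = next_vmid++;
          vmid->vmid_version = vmid_version;
          /* each VCPU then handles KVM_REQ_UPDATE_HGATP and reprograms HGATP */
  }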

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/include/asm/kvm_host.h |  24 ++
 arch/riscv/kvm/Makefile   |   3 +-
 arch/riscv/kvm/main.c |   4 +
 arch/riscv/kvm/tlb.S  |  74 ++
 arch/riscv/kvm/vcpu.c |   9 +++
 arch/riscv/kvm/vm.c   |   6 ++
 arch/riscv/kvm/vmid.c | 120 ++
 7 files changed, 239 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/kvm/tlb.S
 create mode 100644 arch/riscv/kvm/vmid.c

diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
index 7541018314a4..8612d8b35322 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -26,6 +26,7 @@
 #define KVM_REQ_SLEEP \
KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(1)
+#define KVM_REQ_UPDATE_HGATP   KVM_ARCH_REQ(2)
 
 struct kvm_vm_stat {
ulong remote_tlb_flush;
@@ -48,7 +49,19 @@ struct kvm_vcpu_stat {
 struct kvm_arch_memory_slot {
 };
 
+struct kvm_vmid {
+   /*
+* Writes to vmid_version and vmid happen with vmid_lock held
+* whereas reads happen without any lock held.
+*/
+   unsigned long vmid_version;
+   unsigned long vmid;
+};
+
 struct kvm_arch {
+   /* stage2 vmid */
+   struct kvm_vmid vmid;
+
/* stage2 page table */
pgd_t *pgd;
phys_addr_t pgd_phys;
@@ -179,6 +192,11 @@ static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu 
*vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 
+void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long gpa, unsigned long vmid);
+void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid);
+void __kvm_riscv_hfence_gvma_gpa(unsigned long gpa);
+void __kvm_riscv_hfence_gvma_all(void);
+
 int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
 struct kvm_memory_slot *memslot,
 gpa_t gpa, unsigned long hva, bool is_write);
@@ -187,6 +205,12 @@ int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
 void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
 
+void kvm_riscv_stage2_vmid_detect(void);
+unsigned long kvm_riscv_stage2_vmid_bits(void);
+int kvm_riscv_stage2_vmid_init(struct kvm *kvm);
+bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid);
+void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu);
+
 void __kvm_riscv_unpriv_trap(void);
 
 unsigned long kvm_riscv_vcpu_unpriv_read(struct kvm_vcpu *vcpu,
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index 54991cc55c00..b32f60edf48c 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -9,6 +9,7 @@ ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
 
 kvm-objs := $(common-objs-y)
 
-kvm-objs += main.o vm.o mmu.o vcpu.o vcpu_exit.o vcpu_switch.o
+kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
 
 obj-$(CONFIG_KVM)  += kvm.o
diff --git a/arch/riscv/kvm/main.c b/arch/riscv/kvm/main.c
index 47926f0c175d..49a4941e3838 100644
--- a/arch/riscv/kvm/main.c
+++ b/arch/riscv/kvm/main.c
@@ -79,8 +79,12 @@ int kvm_arch_init(void *opaque)
return -ENODEV;
}
 
+   kvm_riscv_stage2_vmid_detect();
+
kvm_info("hypervisor extension available\n");
 
+   kvm_info("VMID %ld bits available\n", kvm_riscv_stage2_vmid_bits());
+
return 0;
 }
 
diff --git a/arch/riscv/kvm/tlb.S b/arch/riscv/kvm/tlb.S
new file mode 100644
index ..c858570f0856
--- /dev/null
+++ b/arch/riscv/kvm/tlb.S
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel 
+ */
+
+#include 
+#include 
+
+   .text
+   .altmacro
+   .option norelax
+
+   /*
+* Instruction encoding of hfence.gvma is:
+* HFENCE.GVMA rs1, rs2
+* HFENCE.GVMA zero, rs2
+* HFENCE.GVMA rs1
+* HFENCE.GVMA
+*
+* rs1!=zero and rs2!=zero ==> HFENCE.GVMA rs1, rs2
+* rs1==zero and rs2!=zero ==> HFENCE.GVMA zero, rs2
+* rs1!=zero and rs2==zero ==> HFENCE.GVMA rs1
+* rs1==zero and rs2==zero ==> HFENCE.GVMA
+*
+* Instruction encoding of HFENCE.GVMA is:
+* 0110001

[PATCH v17 02/17] RISC-V: Add initial skeletal KVM support

2021-04-01 Thread Anup Patel
This patch adds initial skeletal KVM RISC-V support which has:
1. A simple implementation of arch specific VM functions
   except kvm_vm_ioctl_get_dirty_log() which will be implemented
   in the future as part of stage2 page logging.
2. Stubs of required arch specific VCPU functions except
   kvm_arch_vcpu_ioctl_run() which is semi-complete and
   extended by subsequent patches.
3. Stubs for required arch specific stage2 MMU functions.

Signed-off-by: Anup Patel 
Acked-by: Paolo Bonzini 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Alexander Graf 
---
 arch/riscv/Kconfig |   1 +
 arch/riscv/Makefile|   2 +
 arch/riscv/include/asm/kvm_host.h  |  90 +
 arch/riscv/include/asm/kvm_types.h |   7 +
 arch/riscv/include/uapi/asm/kvm.h  |  47 +
 arch/riscv/kvm/Kconfig |  33 +++
 arch/riscv/kvm/Makefile|  13 ++
 arch/riscv/kvm/main.c  |  95 +
 arch/riscv/kvm/mmu.c   |  80 
 arch/riscv/kvm/vcpu.c  | 311 +
 arch/riscv/kvm/vcpu_exit.c |  35 
 arch/riscv/kvm/vm.c|  79 
 12 files changed, 793 insertions(+)
 create mode 100644 arch/riscv/include/asm/kvm_host.h
 create mode 100644 arch/riscv/include/asm/kvm_types.h
 create mode 100644 arch/riscv/include/uapi/asm/kvm.h
 create mode 100644 arch/riscv/kvm/Kconfig
 create mode 100644 arch/riscv/kvm/Makefile
 create mode 100644 arch/riscv/kvm/main.c
 create mode 100644 arch/riscv/kvm/mmu.c
 create mode 100644 arch/riscv/kvm/vcpu.c
 create mode 100644 arch/riscv/kvm/vcpu_exit.c
 create mode 100644 arch/riscv/kvm/vm.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index dc9182680897..45caf2d8c35c 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -475,4 +475,5 @@ source "kernel/power/Kconfig"
 
 endmenu
 
+source "arch/riscv/kvm/Kconfig"
 source "drivers/firmware/Kconfig"
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 1368d943f1f3..7b5a5c7fd9b4 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -88,6 +88,8 @@ head-y := arch/riscv/kernel/head.o
 
 core-y += arch/riscv/
 
+core-$(CONFIG_KVM) += arch/riscv/kvm/
+
 libs-y += arch/riscv/lib/
 libs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
 
diff --git a/arch/riscv/include/asm/kvm_host.h 
b/arch/riscv/include/asm/kvm_host.h
new file mode 100644
index ..e1a8f89b2b81
--- /dev/null
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ * Anup Patel 
+ */
+
+#ifndef __RISCV_KVM_HOST_H__
+#define __RISCV_KVM_HOST_H__
+
+#include 
+#include 
+#include 
+
+#ifdef CONFIG_64BIT
+#define KVM_MAX_VCPUS  (1U << 16)
+#else
+#define KVM_MAX_VCPUS  (1U << 9)
+#endif
+
+#define KVM_HALT_POLL_NS_DEFAULT   500000
+
+#define KVM_VCPU_MAX_FEATURES  0
+
+#define KVM_REQ_SLEEP \
+   KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VCPU_RESET KVM_ARCH_REQ(1)
+
+struct kvm_vm_stat {
+   ulong remote_tlb_flush;
+};
+
+struct kvm_vcpu_stat {
+   u64 halt_successful_poll;
+   u64 halt_attempted_poll;
+   u64 halt_poll_success_ns;
+   u64 halt_poll_fail_ns;
+   u64 halt_poll_invalid;
+   u64 halt_wakeup;
+   u64 ecall_exit_stat;
+   u64 wfi_exit_stat;
+   u64 mmio_exit_user;
+   u64 mmio_exit_kernel;
+   u64 exits;
+};
+
+struct kvm_arch_memory_slot {
+};
+
+struct kvm_arch {
+   /* stage2 page table */
+   pgd_t *pgd;
+   phys_addr_t pgd_phys;
+};
+
+struct kvm_cpu_trap {
+   unsigned long sepc;
+   unsigned long scause;
+   unsigned long stval;
+   unsigned long htval;
+   unsigned long htinst;
+};
+
+struct kvm_vcpu_arch {
+   /* Don't run the VCPU (blocked) */
+   bool pause;
+
+   /* SRCU lock index for in-kernel run loop */
+   int srcu_idx;
+};
+
+static inline void kvm_arch_hardware_unsetup(void) {}
+static inline void kvm_arch_sync_events(struct kvm *kvm) {}
+static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
+static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
+
+void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
+int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
+void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
+void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
+
+int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+   struct kvm_cpu_trap *trap);
+
+static inline void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch) {}
+
+#endif /* __RISCV_KVM_HOST_H__ */
diff --git a/arch/riscv/include/asm/kvm_types.h

[PATCH v17 00/17] KVM RISC-V Support

2021-04-01 Thread Anup Patel
Changes since v7:
 - Rebased series on Linux-5.4-rc1 and Atish's SBI v0.2 patches
 - Removed PATCH1, PATCH3, and PATCH20 because these already merged
 - Use kernel doc style comments for ISA bitmap functions
 - Don't parse X, Y, and Z extension in riscv_fill_hwcap() because it will
   be added in-future
 - Mark KVM RISC-V kconfig option as EXPERIMENTAL
 - Typo fix in commit description of PATCH6 of v7 series
 - Use separate structs for CORE and CSR registers of ONE_REG interface
 - Explicitly include asm/sbi.h in kvm/vcpu_sbi.c
 - Removed implicit switch-case fall-through in kvm_riscv_vcpu_exit()
 - No need to set VSSTATUS.MXR bit in kvm_riscv_vcpu_unpriv_read()
 - Removed register for instruction length in kvm_riscv_vcpu_unpriv_read()
 - Added defines for checking/decoding instruction length
 - Added separate patch to forward unhandled SBI calls to userspace tool

Changes since v6:
 - Rebased patches on Linux-5.3-rc7
 - Added "return_handled" in struct kvm_mmio_decode to ensure that
   kvm_riscv_vcpu_mmio_return() updates SEPC only once
 - Removed trap_stval parameter from kvm_riscv_vcpu_unpriv_read()
 - Updated git repo URL in MAINTAINERS entry

Changes since v5:
 - Renamed KVM_REG_RISCV_CONFIG_TIMEBASE register to
   KVM_REG_RISCV_CONFIG_TBFREQ register in ONE_REG interface
 - Update SEPC in kvm_riscv_vcpu_mmio_return() for MMIO exits
 - Use switch case instead of illegal instruction opcode table for simplicity
 - Improve comments in stage2_remote_tlb_flush() for a potential remote TLB
  flush optimization
 - Handle all unsupported SBI calls in default case of
   kvm_riscv_vcpu_sbi_ecall() function
 - Fixed kvm_riscv_vcpu_sync_interrupts() for software interrupts
 - Improved unprivilege reads to handle traps due to Guest stage1 page table
 - Added separate patch to document RISC-V specific things in
   Documentation/virt/kvm/api.txt

Changes since v4:
 - Rebased patches on Linux-5.3-rc5
 - Added Paolo's Acked-by and Reviewed-by
 - Updated mailing list in MAINTAINERS entry

Changes since v3:
 - Moved patch for ISA bitmap from KVM prep series to this series
 - Make vsip_shadow as run-time percpu variable instead of compile-time
 - Flush Guest TLBs on all Host CPUs whenever we run-out of VMIDs

Changes since v2:
 - Removed references of KVM_REQ_IRQ_PENDING from all patches
 - Use kvm->srcu within in-kernel KVM run loop
 - Added percpu vsip_shadow to track last value programmed in VSIP CSR
 - Added comments about irqs_pending and irqs_pending_mask
 - Used kvm_arch_vcpu_runnable() in-place-of kvm_riscv_vcpu_has_interrupt()
   in system_opcode_insn()
 - Removed unwanted smp_wmb() in kvm_riscv_stage2_vmid_update()
 - Use kvm_flush_remote_tlbs() in kvm_riscv_stage2_vmid_update()
 - Use READ_ONCE() in kvm_riscv_stage2_update_hgatp() for vmid

Changes since v1:
 - Fixed compile errors in building KVM RISC-V as module
 - Removed unused kvm_riscv_halt_guest() and kvm_riscv_resume_guest()
 - Set KVM_CAP_SYNC_MMU capability only after MMU notifiers are implemented
 - Made vmid_version as unsigned long instead of atomic
 - Renamed KVM_REQ_UPDATE_PGTBL to KVM_REQ_UPDATE_HGATP
 - Renamed kvm_riscv_stage2_update_pgtbl() to kvm_riscv_stage2_update_hgatp()
 - Configure HIDELEG and HEDELEG in kvm_arch_hardware_enable()
 - Updated ONE_REG interface for CSR access to user-space
 - Removed irqs_pending_lock and use atomic bitops instead
 - Added separate patch for FP ONE_REG interface
 - Added separate patch for updating MAINTAINERS file

Anup Patel (13):
  RISC-V: Add hypervisor extension related CSR defines
  RISC-V: Add initial skeletal KVM support
  RISC-V: KVM: Implement VCPU create, init and destroy functions
  RISC-V: KVM: Implement VCPU interrupts and requests handling
  RISC-V: KVM: Implement KVM_GET_ONE_REG/KVM_SET_ONE_REG ioctls
  RISC-V: KVM: Implement VCPU world-switch
  RISC-V: KVM: Handle MMIO exits for VCPU
  RISC-V: KVM: Handle WFI exits for VCPU
  RISC-V: KVM: Implement VMID allocator
  RISC-V: KVM: Implement stage2 page table programming
  RISC-V: KVM: Implement MMU notifiers
  RISC-V: KVM: Document RISC-V specific parts of KVM API
  RISC-V: KVM: Add MAINTAINERS entry

Atish Patra (4):
  RISC-V: KVM: Add timer functionality
  RISC-V: KVM: FP lazy save/restore
  RISC-V: KVM: Implement ONE REG interface for FP registers
  RISC-V: KVM: Add SBI v0.1 support

 Documentation/virt/kvm/api.rst  | 193 -
 MAINTAINERS |  11 +
 arch/riscv/Kconfig  |   1 +
 arch/riscv/Makefile |   2 +
 arch/riscv/include/asm/csr.h|  89 +++
 arch/riscv/include/asm/kvm_host.h   | 277 +++
 arch/riscv/include/asm/kvm_types.h  |   7 +
 arch/riscv/include/asm/kvm_vcpu_timer.h |  44 ++
 arch/riscv/include/asm/pgtable-bits.h   |   1 +
 arch/riscv/include/uapi/asm/kvm.h   | 128 +++
 arch/riscv/kernel/asm-offsets.c | 156 
 arch/riscv/kvm/Kconfig  |  36 +
 arch/riscv/kvm/Makefile |  15 +
 ar

Re: [PATCH v16 00/17] KVM RISC-V Support

2021-04-01 Thread Anup Patel
On Wed, Mar 31, 2021 at 2:52 PM Paolo Bonzini  wrote:
>
> On 30/03/21 07:48, Anup Patel wrote:
> >
> > It seems Andrew does not want to freeze H-extension until we have 
> > virtualization
> > aware interrupt controller (such as RISC-V AIA specification) and IOMMU. Lot
> > of us feel that these things can be done independently because RISC-V
> > H-extension already has provisions for external interrupt controller with
> > virtualization support.
>
> Yes, frankly that's pretty ridiculous as it's perfectly possible to
> emulate the interrupt controller in software (and an IOMMU is not needed
> at all if you are okay with emulated or paravirtualized devices---which
> is almost always the case except for partitioning hypervisors).
>
> Palmer, are you okay with merging RISC-V KVM?  Or should we place it in
> drivers/staging/riscv/kvm?
>
> Either way, the best way to do it would be like this:
>
> 1) you apply patch 1 in a topic branch
>
> 2) you merge the topic branch in the risc-v tree
>
> 3) Anup merges the topic branch too and sends me a pull request.

In any case, I will send v17 based on Linux-5.12-rc5 so that people
can at least try KVM RISC-V based on latest kernel.

Regards,
Anup


Re: [PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

2021-03-29 Thread Anup Patel
On Tue, Mar 30, 2021 at 7:56 AM Guo Ren  wrote:
>
> On Mon, Mar 29, 2021 at 9:56 PM Arnd Bergmann  wrote:
> >
> > On Mon, Mar 29, 2021 at 2:52 PM Guo Ren  wrote:
> > >
> > > On Mon, Mar 29, 2021 at 7:31 PM Peter Zijlstra  
> > > wrote:
> > > >
> > > > On Mon, Mar 29, 2021 at 01:16:53PM +0200, Peter Zijlstra wrote:
> > > > > Anyway, an additional 'funny' is that I suspect you cannot prove fwd
> > > > > progress of the entire primitive with any of this on. But who cares
> > > > > about details anyway.. :/
> > > >
> > > > What's the architectural guarantee on LL/SC progress for RISC-V ?
> > >
> > > funct5| aq | rl   | rs2 |  rs1  | funct3 | rd | opcode
> > >  5  11  5   5 35  7
> > > LR.W/D  ordering  0 addrwidth   destAMO
> > > SC.W/D  ordering  src  addrwidth   destAMO
> > >
> > > LR.W loads a word from the address in rs1, places the sign-extended
> > > value in rd, and registers a reservation set—a set of bytes that
> > > subsumes the bytes in the addressed word. SC.W conditionally writes a
> > > word in rs2 to the address in rs1: the SC.W succeeds only if the
> > > reservation is still valid and the reservation set contains the bytes
> > > being written. If the SC.W succeeds, the instruction writes the word
> > > in rs2 to memory, and it writes zero to rd. If the SC.W fails, the
> > > instruction does not write to memory, and it writes a nonzero value to
> > > rd. Regardless of success or failure, executing an SC.W instruction
> > > *invalidates any reservation held by this hart*.
> > >
> > > More details, ref:
> > > https://github.com/riscv/riscv-isa-manual
> >
> > I think section "3.5.3.2 Reservability PMA" [1] would be a more relevant
> > link, as this defines memory areas that either do or do not have
> > forward progress guarantees, including this part:
> >
> >"When LR/SC is used for memory locations marked RsrvNonEventual,
> >  software should provide alternative fall-back mechanisms used when
> >  lack of progress is detected."
> >
> > My reading of this is that if the example you tried stalls, then either
> > the PMA is not RsrvEventual, and it is wrong to rely on ll/sc on this,
> > or that the PMA is marked RsrvEventual but the implementation is
> > buggy.
> Yes, PMA just defines physical memory region attributes, But in our
> processor, when MMU is enabled (satp's value register > 2) in s-mode,
> it will look at our custom PTE's attributes BIT(63) ref [1]:
>
>PTE format:
>| 63 | 62 | 61 | 60 | 59 | 58-8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
>  SO   CBSH   SERSW   D   A   G   U   X   W   R   V
>  ^^^^^
>BIT(63): SO - Strong Order
>BIT(62): C  - Cacheable
>BIT(61): B  - Bufferable
>BIT(60): SH - Shareable
>BIT(59): SE - Security
>
> So the memory also could be RsrvNone/RsrvEventual.
>
> [1] 
> https://github.com/c-sky/csky-linux/commit/e837aad23148542771794d8a2fcc52afd0fcbf88

Is this about your C-SKY architecture or your RISC-V implementation?

If these PTE bits are in your RISC-V implementation, then clearly your
RISC-V implementation is not compliant with the RISC-V privileged spec,
because these bits are not defined in the RISC-V privileged spec.

Regards,
Anup
>
> >
> > It also seems that the current "amoswap" based implementation
> > would be reliable independent of RsrvEventual/RsrvNonEventual.
> Yes, the hardware implementation of AMO could be different from LR/SC.
> AMO could use ACE snoop holding to lock the bus in hw coherency
> design, but LR/SC uses an exclusive monitor without locking the bus.
>
> > arm64 is already in the situation of having to choose between
> > two cmpxchg() implementation at runtime to allow falling back to
> > a slower but more general version, but it's best to avoid that if you
> > can.
> Current RISC-V needn't multiple versions to select, and all AMO &
> LR/SC has been defined in the spec.
>
> RISC-V hasn't CAS instructions, and it uses LR/SC for cmpxchg. I don't
> think LR/SC would be slower than CAS, and CAS is just good for code
> size.
>
> >
> >  Arnd
> >
> > [1] 
> > http://www.five-embeddev.com/riscv-isa-manual/latest/machine.html#atomicity-pmas
>
> --
> Best Regards
>  Guo Ren
>
> ML: https://lore.kernel.org/linux-csky/


Re: [PATCH v16 00/17] KVM RISC-V Support

2021-03-29 Thread Anup Patel
On Sat, Jan 23, 2021 at 9:10 AM Palmer Dabbelt  wrote:
>
> On Fri, 15 Jan 2021 04:18:29 PST (-0800), Anup Patel wrote:
> > This series adds initial KVM RISC-V support. Currently, we are able to boot
> > Linux on RV64/RV32 Guest with multiple VCPUs.
>
> Thanks.  IIUC the spec is still in limbo at the RISC-V foundation?  I haven't
> really been paying attention lately.

There has been no change in the H-extension spec for more than a year now.

The H-extension spec also has provisions for an external interrupt controller
with virtualization support (such as the RISC-V AIA specification).

It seems Andrew does not want to freeze the H-extension until we have a
virtualization-aware interrupt controller (such as the RISC-V AIA
specification) and an IOMMU. A lot of us feel that these things can be done
independently because the RISC-V H-extension already has provisions for an
external interrupt controller with virtualization support.

The freeze criteria for the H-extension are still not clear to me.
Refer, 
https://lists.riscv.org/g/tech-privileged/topic/risc_v_h_extension_freeze/80346318?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,80346318

Regards,
Anup

>
> >
> > Key aspects of KVM RISC-V added by this series are:
> > 1. No RISC-V specific KVM IOCTL
> > 2. Minimal possible KVM world-switch which touches only GPRs and few CSRs
> > 3. Both RV64 and RV32 host supported
> > 4. Full Guest/VM switch is done via vcpu_get/vcpu_put infrastructure
> > 5. KVM ONE_REG interface for VCPU register access from user-space
> > 6. PLIC emulation is done in user-space
> > 7. Timer and IPI emulation is done in-kernel
> > 8. Both Sv39x4 and Sv48x4 supported for RV64 host
> > 9. MMU notifiers supported
> > 10. Generic dirtylog supported
> > 11. FP lazy save/restore supported
> > 12. SBI v0.1 emulation for KVM Guest available
> > 13. Forward unhandled SBI calls to KVM userspace
> > 14. Hugepage support for Guest/VM
> > 15. IOEVENTFD support for Vhost
> >
> > Here's a brief TODO list which we will work upon after this series:
> > 1. SBI v0.2 emulation in-kernel
> > 2. SBI v0.2 hart state management emulation in-kernel
> > 3. In-kernel PLIC emulation
> > 4. ... and more ...
> >
> > This series can be found in riscv_kvm_v16 branch at:
> > https://github.com/avpatel/linux.git
> >
> > Our work-in-progress KVMTOOL RISC-V port can be found in riscv_v6 branch
> > at: https://github.com/avpatel/kvmtool.git
> >
> > The QEMU RISC-V hypervisor emulation is done by Alistair and is available
> > in master branch at: https://git.qemu.org/git/qemu.git
> >
> > To play around with KVM RISC-V, refer KVM RISC-V wiki at:
> > https://github.com/kvm-riscv/howto/wiki
> > https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-QEMU
> > https://github.com/kvm-riscv/howto/wiki/KVM-RISCV64-on-Spike
> >
> > Changes since v15:
> >  - Rebased on Linux-5.11-rc3
> >  - Fixed kvm_stage2_map() to use gfn_to_pfn_prot() for determining
> >writeability of a host pfn.
> >  - Use "__u64" in-place of "u64" and "__u32" in-place of "u32" for
> >uapi/asm/kvm.h
> >
> > Changes since v14:
> >  - Rebased on Linux-5.10-rc3
> >  - Fixed Stage2 (G-stage) PGD allocation to ensure it is 16KB aligned
> >
> > Changes since v13:
> >  - Rebased on Linux-5.9-rc3
> >  - Fixed kvm_riscv_vcpu_set_reg_csr() for SIP updation in PATCH5
> >  - Fixed instruction length computation in PATCH7
> >  - Added ioeventfd support in PATCH7
> >  - Ensure HSTATUS.SPVP is set to correct value before using HLV/HSV
> >instructions in PATCH7
> >  - Fixed stage2_map_page() to set PTE 'A' and 'D' bits correctly
> >in PATCH10
> >  - Added stage2 dirty page logging in PATCH10
> >  - Allow KVM user-space to SET/GET SCOUNTER CSR in PATCH5
> >  - Save/restore SCOUNTEREN in PATCH6
> >  - Reduced quite a few instructions for __kvm_riscv_switch_to() by
> >using CSR swap instruction in PATCH6
> >  - Detect and use Sv48x4 when available in PATCH10
> >
> > Changes since v12:
> >  - Rebased patches on Linux-5.8-rc4
> >  - By default enable all counters in HCOUNTEREN
> >  - RISC-V H-Extension v0.6.1 spec support
> >
> > Changes since v11:
> >  - Rebased patches on Linux-5.7-rc3
> >  - Fixed typo in typecast of stage2_map_size define
> >  - Introduced struct kvm_cpu_trap to represent trap details and
> >use it as function parameter wherever applicable
> >  - Pass memslot to kvm_riscv_stage2_map() for supporting dirty page
> >logging in future
> >  - RISC-V H-Extension v0.6 spec support
> >  - Se

RE: [PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

2021-03-29 Thread Anup Patel


> -Original Message-
> From: Guo Ren 
> Sent: 30 March 2021 08:44
> To: Peter Zijlstra 
> Cc: linux-riscv ; Linux Kernel Mailing List
> ; linux-c...@vger.kernel.org; linux-arch
> ; Guo Ren ; Will
> Deacon ; Ingo Molnar ; Waiman
> Long ; Arnd Bergmann ; Anup
> Patel 
> Subject: Re: [PATCH v4 3/4] locking/qspinlock: Add
> ARCH_USE_QUEUED_SPINLOCKS_XCHG32
> 
> On Mon, Mar 29, 2021 at 8:50 PM Peter Zijlstra 
> wrote:
> >
> > On Mon, Mar 29, 2021 at 08:01:41PM +0800, Guo Ren wrote:
> > > u32 a = 0x55aa66bb;
> > > u16 *ptr = &a;
> > >
> > > CPU0                       CPU1
> > > =========                  =========
> > > xchg16(ptr, new)           while(1)
> > >                                WRITE_ONCE(*(ptr + 1), x);
> > >
> > > When we use lr.w/sc.w implement xchg16, it'll cause CPU0 deadlock.
> >
> > Then I think your LL/SC is broken.
> >
> > That also means you really don't want to build super complex locking
> > primitives on top, because that live-lock will percolate through.
> Do you mean the below implementation has live-lock risk?
> +static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
> +{
> +   u32 old, new, val = atomic_read(&lock->val);
> +
> +   for (;;) {
> +   new = (val & _Q_LOCKED_PENDING_MASK) | tail;
> +   old = atomic_cmpxchg(&lock->val, val, new);
> +   if (old == val)
> +   break;
> +
> +   val = old;
> +   }
> +   return old;
> +}
> 
> 
> >
> > Step 1 would be to get your architecute fixed such that it can provide
> > fwd progress guarantees for LL/SC. Otherwise there's absolutely no
> > point in building complex systems with it.
> 
> Quote Waiman's comment [1] on xchg16 optimization:
> 
> "This optimization is needed to make the qspinlock achieve performance
> parity with ticket spinlock at light load."
> 
> [1] https://lore.kernel.org/kvm/1429901803-29771-6-git-send-email-
> waiman.l...@hp.com/
> 
> So for a non-xhg16 machine:
>  - ticket-lock for small numbers of CPUs
>  - qspinlock for large numbers of CPUs
> 
> Okay, I'll put all of them into the next patch 

I would suggest having a separate Kconfig option for ticket spinlock
in Linux RISC-V which will be disabled by default. This means Linux
RISC-V will use qspinlock by default and use ticket spinlock only when
the ticket spinlock Kconfig option is enabled.

Regards,
Anup
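
For reference, the xchg_tail() snippet quoted above already shows the usual
fallback for machines without sub-word atomics: do the 16-bit update through
a full-word compare-and-swap. A minimal standalone C11 sketch of that idea
(the function and parameter names are illustrative; this is not the kernel's
implementation):

#include <stdatomic.h>
#include <stdint.h>

/*
 * Emulate a 16-bit exchange with a 32-bit compare-and-swap: replace one
 * halfword lane (bits 0-15 or 16-31) of an atomic 32-bit word and return
 * the old value of that lane.
 */
static uint16_t xchg16_via_cas32(_Atomic uint32_t *word, int high_lane,
                                 uint16_t newval)
{
        unsigned int shift = high_lane ? 16 : 0;
        uint32_t mask = 0xffffu << shift;
        uint32_t old = atomic_load(word);
        uint32_t updated;

        do {
                updated = (old & ~mask) | ((uint32_t)newval << shift);
                /* On failure, 'old' is refreshed with the current value. */
        } while (!atomic_compare_exchange_weak(word, &old, updated));

        return (uint16_t)((old & mask) >> shift);
}

Note that the retry loop only completes once the compare-and-swap succeeds,
so concurrent stores to the other halfword can keep it spinning, which is
exactly the forward-progress concern raised in this thread.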


RE: [PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

2021-03-29 Thread Anup Patel



> -Original Message-
> From: Peter Zijlstra 
> Sent: 29 March 2021 16:57
> To: Guo Ren 
> Cc: linux-riscv ; Linux Kernel Mailing List
> ; linux-c...@vger.kernel.org; linux-arch
> ; Guo Ren ; Will
> Deacon ; Ingo Molnar ; Waiman
> Long ; Arnd Bergmann ; Anup
> Patel 
> Subject: Re: [PATCH v4 3/4] locking/qspinlock: Add
> ARCH_USE_QUEUED_SPINLOCKS_XCHG32
> 
> On Mon, Mar 29, 2021 at 07:19:29PM +0800, Guo Ren wrote:
> > On Mon, Mar 29, 2021 at 3:50 PM Peter Zijlstra 
> wrote:
> > >
> > > On Sat, Mar 27, 2021 at 06:06:38PM +, guo...@kernel.org wrote:
> > > > From: Guo Ren 
> > > >
> > > > Some architectures don't have sub-word swap atomic instruction,
> > > > they only have the full word's one.
> > > >
> > > > The sub-word swap only improve the performance when:
> > > > NR_CPUS < 16K
> > > >  *  0- 7: locked byte
> > > >  * 8: pending
> > > >  *  9-15: not used
> > > >  * 16-17: tail index
> > > >  * 18-31: tail cpu (+1)
> > > >
> > > > The 9-15 bits are wasted to use xchg16 in xchg_tail.
> > > >
> > > > Please let architecture select xchg16/xchg32 to implement
> > > > xchg_tail.
> > >
> > > So I really don't like this, this pushes complexity into the generic
> > > code for something that's really not needed.
> > >
> > > Lots of RISC already implement sub-word atomics using word ll/sc.
> > > Obviously they're not sharing code like they should be :/ See for
> > > example arch/mips/kernel/cmpxchg.c.
> > I see, we've done two versions of this:
> >  - Using cmpxchg codes from MIPS by Michael
> >  - Re-write with assembly codes by Guo
> >
> > But using the full-word atomic xchg instructions implement xchg16 has
> > the semantic risk for atomic operations.
> 
> What? -ENOPARSE
> 
> > > Also, I really do think doing ticket locks first is a far more
> > > sensible step.
> > NACK by Anup
> 
> Who's he when he's not sending NAKs ?

We had discussions in the RISC-V platforms group about this. There, we
evaluated all the spinlock approaches (ticket, qspinlock, etc.) tried in
Linux till now. It was concluded in those discussions that eventually we
have to move to qspinlock (even if we moved to ticket spinlock temporarily)
because qspinlock avoids cache line bouncing. Also, moving to qspinlock
will be aligned with other major architectures supported in Linux (such as
x86 and ARM64).

Some of the organizations working on high-end RISC-V systems (> 32
CPUs) are interested in having an optimized spinlock implementation
(just like the other major architectures, x86 and ARM64).

Based on the above, Linux RISC-V should move to qspinlock.

Regards,
Anup
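
To make the cache-line bouncing argument concrete, here is a minimal
standalone C11 sketch (simplified types; names and memory orderings are
illustrative, this is not the kernel's qspinlock) contrasting a ticket lock,
where every waiter polls the same lock word, with an MCS-style queued lock,
where each waiter spins on its own node:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Ticket lock: every waiter polls the same 'owner' word (one shared line). */
struct ticket_lock {
        atomic_uint next;
        atomic_uint owner;
};

static void ticket_lock_acquire(struct ticket_lock *lock)
{
        unsigned int ticket = atomic_fetch_add(&lock->next, 1);

        /* All waiting CPUs keep re-reading the same location. */
        while (atomic_load_explicit(&lock->owner, memory_order_acquire) != ticket)
                ;
}

static void ticket_lock_release(struct ticket_lock *lock)
{
        atomic_fetch_add_explicit(&lock->owner, 1, memory_order_release);
}

/* MCS-style queued lock: each waiter spins on a flag in its own node. */
struct mcs_node {
        struct mcs_node *_Atomic next;
        atomic_bool locked;
};

struct mcs_lock {
        struct mcs_node *_Atomic tail;
};

static void mcs_lock_acquire(struct mcs_lock *lock, struct mcs_node *node)
{
        struct mcs_node *prev;

        atomic_store(&node->next, NULL);
        atomic_store(&node->locked, true);

        prev = atomic_exchange(&lock->tail, node);
        if (!prev)
                return;         /* lock was free, no spinning at all */

        atomic_store(&prev->next, node);
        /* Spin on our own node: nobody else writes this cache line until
         * the previous owner hands the lock over. */
        while (atomic_load_explicit(&node->locked, memory_order_acquire))
                ;
}

static void mcs_lock_release(struct mcs_lock *lock, struct mcs_node *node)
{
        struct mcs_node *succ = atomic_load(&node->next);

        if (!succ) {
                struct mcs_node *expected = node;

                /* No successor yet: try to mark the lock free. */
                if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
                        return;
                /* A successor is enqueueing; wait for it to link itself. */
                while (!(succ = atomic_load(&node->next)))
                        ;
        }
        atomic_store_explicit(&succ->locked, false, memory_order_release);
}

Under contention only the ticket lock forces every waiter to hammer the same
cache line; each queued waiter above polls a line it owns, which is the
property qspinlock preserves while also packing the lock state into a single
32-bit word.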


Re: [PATCH] riscv: Do not invoke early_init_dt_verify() twice

2021-03-24 Thread Anup Patel
On Wed, Mar 24, 2021 at 8:33 PM Changbin Du  wrote:
>
> In the setup_arch() of riscv, function early_init_dt_verify() has
> been done by parse_dtb(). So no need to call it again. Just directly
> invoke unflatten_device_tree().
>
> Signed-off-by: Changbin Du 
> ---
>  arch/riscv/kernel/setup.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index f8f15332caa2..2a3d487e1710 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -255,10 +255,7 @@ void __init setup_arch(char **cmdline_p)
>  #if IS_ENABLED(CONFIG_BUILTIN_DTB)
> unflatten_and_copy_device_tree();
>  #else
> -   if (early_init_dt_verify(__va(dtb_early_pa)))
> -   unflatten_device_tree();
> -   else
> -   pr_err("No DTB found in kernel mappings\n");
> +   unflatten_device_tree();
>  #endif

The early_init_dt_verify() call sets the DTB base address in the Linux OF
core.

When parse_dtb() calls early_init_dt_verify(), the MMU is enabled but we
only have a temporary mapping for the DTB (i.e. dtb_early_va).

After paging_init(), we have moved to the final swapper_pg_dir, so the
temporary mapping for the DTB does not exist anymore. The DTB is still at
the same physical address, though, so we update the DTB base address in the
Linux OF core by calling early_init_dt_verify() again.

Based on the above, NACK to this patch.

Regards,
Anup

> misc_mem_init();
>
> --
> 2.30.2
>
>
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv


Re: [PATCH] riscv: locks: introduce ticket-based spinlock implementation

2021-03-24 Thread Anup Patel
On Wed, Mar 24, 2021 at 6:08 PM Peter Zijlstra  wrote:
>
> On Wed, Mar 24, 2021 at 05:58:58PM +0530, Anup Patel wrote:
> > On Wed, Mar 24, 2021 at 3:45 PM  wrote:
> > >
> > > From: Guo Ren 
> > >
> > > This patch introduces a ticket lock implementation for riscv, along the
> > > same lines as the implementation for arch/arm & arch/csky.
> > >
> > > Signed-off-by: Guo Ren 
> > > Cc: Catalin Marinas 
> > > Cc: Will Deacon 
> > > Cc: Peter Zijlstra 
> > > Cc: Palmer Dabbelt 
> > > Cc: Anup Patel 
> > > Cc: Arnd Bergmann 
> > > ---
> > >  arch/riscv/Kconfig  |   1 +
> > >  arch/riscv/include/asm/Kbuild   |   1 +
> > >  arch/riscv/include/asm/spinlock.h   | 158 
> > > 
> > >  arch/riscv/include/asm/spinlock_types.h |  19 ++--
> >
> > NACK from myside.
> >
> > Linux ARM64 has moved away from ticket spinlock to qspinlock.
> >
> > We should directly go for qspinlock.
>
> I think it is a sensible intermediate step, even if you want to go
> qspinlock. Ticket locks are more or less trivial and get you fairness
> and all that goodness without the mind bending complexity of qspinlock.
>
> Once you have the ticket lock implementation solid (and qrwlock) and
> everything, *then* start to carefully look at qspinlock.

I do understand that qspinlock is relatively complex, but the best thing
about qspinlock is that it tries to ensure each CPU spins on its own location.

Instead of adding ticket spinlock now and later replacing it with qspinlock,
it is better to explore qspinlock straight away, hence my NACK.

>
> Now, arguably arm64 did the heavy lifting of making qspinlock good on
> weak architectures, but if you want to do it right, you still have to
> analyze the whole thing for your own architecture.

Most RISC-V implementations use weak memory ordering, so it makes more
sense to explore qspinlock first.

Regards,
Anup


Re: [PATCH] riscv: locks: introduce ticket-based spinlock implementation

2021-03-24 Thread Anup Patel
On Wed, Mar 24, 2021 at 3:45 PM  wrote:
>
> From: Guo Ren 
>
> This patch introduces a ticket lock implementation for riscv, along the
> same lines as the implementation for arch/arm & arch/csky.
>
> Signed-off-by: Guo Ren 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Peter Zijlstra 
> Cc: Palmer Dabbelt 
> Cc: Anup Patel 
> Cc: Arnd Bergmann 
> ---
>  arch/riscv/Kconfig  |   1 +
>  arch/riscv/include/asm/Kbuild   |   1 +
>  arch/riscv/include/asm/spinlock.h   | 158 
> 
>  arch/riscv/include/asm/spinlock_types.h |  19 ++--

NACK from myside.

Linux ARM64 has moved away from ticket spinlock to qspinlock.

We should directly go for qspinlock.

Regards,
Anup

>  4 files changed, 74 insertions(+), 105 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 87d7b52..7c56a20 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -30,6 +30,7 @@ config RISCV
> select ARCH_HAS_STRICT_KERNEL_RWX if MMU
> select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
> select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
> +   select ARCH_USE_QUEUED_RWLOCKS
> select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
> select ARCH_WANT_FRAME_POINTERS
> select ARCH_WANT_HUGE_PMD_SHARE if 64BIT
> diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> index 445ccc9..e57ef80 100644
> --- a/arch/riscv/include/asm/Kbuild
> +++ b/arch/riscv/include/asm/Kbuild
> @@ -3,5 +3,6 @@ generic-y += early_ioremap.h
>  generic-y += extable.h
>  generic-y += flat.h
>  generic-y += kvm_para.h
> +generic-y += qrwlock.h
>  generic-y += user.h
>  generic-y += vmlinux.lds.h
> diff --git a/arch/riscv/include/asm/spinlock.h 
> b/arch/riscv/include/asm/spinlock.h
> index f4f7fa1..2c81764 100644
> --- a/arch/riscv/include/asm/spinlock.h
> +++ b/arch/riscv/include/asm/spinlock.h
> @@ -7,129 +7,91 @@
>  #ifndef _ASM_RISCV_SPINLOCK_H
>  #define _ASM_RISCV_SPINLOCK_H
>
> -#include 
> -#include 
> -#include 
> -
>  /*
> - * Simple spin lock operations.  These provide no fairness guarantees.
> + * Ticket-based spin-locking.
>   */
> +static inline void arch_spin_lock(arch_spinlock_t *lock)
> +{
> +   arch_spinlock_t lockval;
> +   u32 tmp;
> +
> +   asm volatile (
> +   "1: lr.w%0, %2  \n"
> +   "   mv  %1, %0  \n"
> +   "   addw%0, %0, %3  \n"
> +   "   sc.w%0, %0, %2  \n"
> +   "   bnez%0, 1b  \n"
> +   : "=" (tmp), "=" (lockval), "+A" (lock->lock)
> +   : "r" (1 << TICKET_NEXT)
> +   : "memory");
>
> -/* FIXME: Replace this with a ticket lock, like MIPS. */
> -
> -#define arch_spin_is_locked(x) (READ_ONCE((x)->lock) != 0)
> +   while (lockval.tickets.next != lockval.tickets.owner) {
> +   /*
> +* FIXME - we need wfi/wfe here to prevent:
> +*  - cache line bouncing
> +*  - saving cpu pipeline in multi-harts-per-core
> +*processor
> +*/
> +   lockval.tickets.owner = READ_ONCE(lock->tickets.owner);
> +   }
>
> -static inline void arch_spin_unlock(arch_spinlock_t *lock)
> -{
> > > -   smp_store_release(&lock->lock, 0);
> +   __atomic_acquire_fence();
>  }
>
>  static inline int arch_spin_trylock(arch_spinlock_t *lock)
>  {
> -   int tmp = 1, busy;
> -
> -   __asm__ __volatile__ (
> -   "   amoswap.w %0, %2, %1\n"
> -   RISCV_ACQUIRE_BARRIER
> -   : "=r" (busy), "+A" (lock->lock)
> -   : "r" (tmp)
> +   u32 tmp, contended, res;
> +
> +   do {
> +   asm volatile (
> +   "   lr.w%0, %3  \n"
> +   "   srliw   %1, %0, %5  \n"
> +   "   slliw   %2, %0, %5  \n"
> +   "   or  %1, %2, %1  \n"
> +   "   li  %2, 0   \n"
> +   "   sub %1, %1, %0  \n"
> +   "   bnez%1, 1f  \n"
> +   "   addw%0, %0, %4  \n"
> +   "   sc.w%2, %0, %3  \n"
> +   "1: \n"
> +   : "=&q

[RFC PATCH v3 7/8] dt-bindings: Add common bindings for ARM and RISC-V idle states

2021-03-18 Thread Anup Patel
The RISC-V CPU idle states will be described under the
/cpus/idle-states DT node in the same way as ARM CPU idle
states.

This patch adds common bindings documentation for both ARM
and RISC-V idle states.

Signed-off-by: Anup Patel 
---
 .../bindings/{arm => cpu}/idle-states.yaml| 228 --
 .../devicetree/bindings/riscv/cpus.yaml   |   6 +
 2 files changed, 217 insertions(+), 17 deletions(-)
 rename Documentation/devicetree/bindings/{arm => cpu}/idle-states.yaml (74%)

diff --git a/Documentation/devicetree/bindings/arm/idle-states.yaml 
b/Documentation/devicetree/bindings/cpu/idle-states.yaml
similarity index 74%
rename from Documentation/devicetree/bindings/arm/idle-states.yaml
rename to Documentation/devicetree/bindings/cpu/idle-states.yaml
index 52bce5dbb11f..74466f160cb2 100644
--- a/Documentation/devicetree/bindings/arm/idle-states.yaml
+++ b/Documentation/devicetree/bindings/cpu/idle-states.yaml
@@ -1,25 +1,30 @@
 # SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
 %YAML 1.2
 ---
-$id: http://devicetree.org/schemas/arm/idle-states.yaml#
+$id: http://devicetree.org/schemas/cpu/idle-states.yaml#
 $schema: http://devicetree.org/meta-schemas/core.yaml#
 
-title: ARM idle states binding description
+title: Idle states binding description
 
 maintainers:
   - Lorenzo Pieralisi 
+  - Anup Patel 
 
 description: |+
   ==
   1 - Introduction
   ==
 
-  ARM systems contain HW capable of managing power consumption dynamically,
-  where cores can be put in different low-power states (ranging from simple wfi
-  to power gating) according to OS PM policies. The CPU states representing the
-  range of dynamic idle states that a processor can enter at run-time, can be
-  specified through device tree bindings representing the parameters required 
to
-  enter/exit specific idle states on a given processor.
+  ARM and RISC-V systems contain HW capable of managing power consumption
+  dynamically, where cores can be put in different low-power states (ranging
+  from simple wfi to power gating) according to OS PM policies. The CPU states
+  representing the range of dynamic idle states that a processor can enter at
+  run-time, can be specified through device tree bindings representing the
+  parameters required to enter/exit specific idle states on a given processor.
+
+  ==
+  2 - ARM idle states
+  ==
 
   According to the Server Base System Architecture document (SBSA, [3]), the
   power states an ARM CPU can be put into are identified by the following list:
@@ -43,8 +48,23 @@ description: |+
   The device tree binding definition for ARM idle states is the subject of this
   document.
 
+  ==
+  3 - RISC-V idle states
+  ==
+
+  On RISC-V systems, the HARTs (or CPUs) [6] can be put in platform specific
+  suspend (or idle) states (ranging from simple WFI, power gating, etc). The
+  RISC-V SBI v0.3 (or higher) [7] hart state management extension provides a
+  standard mechanism for OS to request HART state transitions.
+
+  The platform specific suspend (or idle) states of a hart can be either
+  retentive or non-retentive in nature. A retentive suspend state will
+  preserve HART registers and CSR values for all privilege modes whereas
+  a non-retentive suspend state will not preserve HART registers and CSR
+  values.
+
   ===
-  2 - idle-states definitions
+  4 - idle-states definitions
   ===
 
   Idle states are characterized for a specific system through a set of
@@ -211,10 +231,10 @@ description: |+
   properties specification that is the subject of the following sections.
 
   ===
-  3 - idle-states node
+  5 - idle-states node
   ===
 
-  ARM processor idle states are defined within the idle-states node, which is
+  The processor idle states are defined within the idle-states node, which is
   a direct child of the cpus node [1] and provides a container where the
   processor idle states, defined as device tree nodes, are listed.
 
@@ -223,7 +243,7 @@ description: |+
   just supports idle_standby, an idle-states node is not required.
 
   ===
-  4 - References
+  6 - References
   ===
 
   [1] ARM Linux Kernel documentation - CPUs bindings
@@ -238,9 +258,15 @@ description: |+
   [4] ARM Architecture Reference Manuals
   http://infocenter.arm.com/help/index.jsp
 
-  [6] ARM Linux Kernel documentation - Booting AArch64 Linux
+  [5] ARM Linux Kernel documentation - Booting AArch64 Linux
   Documentation/arm64/booting.rst
 
+  [6] RISC-V Linux Kernel documentation - CPUs bindings
+  Documen

[RFC PATCH v3 8/8] RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine

2021-03-18 Thread Anup Patel
We enable the RISC-V SBI CPU idle driver for the QEMU virt machine to test
SBI HSM suspend on QEMU.

Signed-off-by: Anup Patel 
---
 arch/riscv/Kconfig.socs   | 3 +++
 arch/riscv/configs/defconfig  | 1 +
 arch/riscv/configs/rv32_defconfig | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/riscv/Kconfig.socs b/arch/riscv/Kconfig.socs
index 7efcece8896c..efdf6fbe18dd 100644
--- a/arch/riscv/Kconfig.socs
+++ b/arch/riscv/Kconfig.socs
@@ -19,6 +19,9 @@ config SOC_VIRT
select GOLDFISH
select RTC_DRV_GOLDFISH if RTC_CLASS
select SIFIVE_PLIC
+   select PM_GENERIC_DOMAINS if PM
+   select PM_GENERIC_DOMAINS_OF if PM && OF
+   select RISCV_SBI_CPUIDLE if CPU_IDLE
help
  This enables support for QEMU Virt Machine.
 
diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index dc4927c0e44b..aac26c20bbf5 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -19,6 +19,7 @@ CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
+CONFIG_PM=y
 CONFIG_CPU_IDLE=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
diff --git a/arch/riscv/configs/rv32_defconfig 
b/arch/riscv/configs/rv32_defconfig
index 332e43a4a2c3..2285c95e34b3 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -20,6 +20,7 @@ CONFIG_SOC_VIRT=y
 CONFIG_ARCH_RV32I=y
 CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
+CONFIG_PM=y
 CONFIG_CPU_IDLE=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
-- 
2.25.1



[RFC PATCH v3 6/8] cpuidle: Add RISC-V SBI CPU idle driver

2021-03-18 Thread Anup Patel
The RISC-V SBI HSM extension provides a HSM suspend call which can
be used by Linux RISC-V to enter a platform-specific low-power state.

This patch adds a CPU idle driver based on RISC-V SBI calls which
will populate idle states from the device tree and use SBI calls to
enter these idle states.

Signed-off-by: Anup Patel 
---
 MAINTAINERS   |   8 +
 drivers/cpuidle/Kconfig   |   5 +
 drivers/cpuidle/Kconfig.riscv |  15 +
 drivers/cpuidle/Makefile  |   4 +
 drivers/cpuidle/cpuidle-sbi.c | 502 ++
 5 files changed, 534 insertions(+)
 create mode 100644 drivers/cpuidle/Kconfig.riscv
 create mode 100644 drivers/cpuidle/cpuidle-sbi.c

diff --git a/MAINTAINERS b/MAINTAINERS
index aa84121c5611..4954112efdb4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4679,6 +4679,14 @@ S:   Supported
 F: drivers/cpuidle/cpuidle-psci.h
 F: drivers/cpuidle/cpuidle-psci-domain.c
 
+CPUIDLE DRIVER - RISC-V SBI
+M: Anup Patel 
+R: Sandeep Tripathy 
+L: linux...@vger.kernel.org
+L: linux-ri...@lists.infradead.org
+S: Supported
+F: drivers/cpuidle/cpuidle-sbi.c
+
 CRAMFS FILESYSTEM
 M: Nicolas Pitre 
 S: Maintained
diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index f1afe7ab6b54..ff71dd662880 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -66,6 +66,11 @@ depends on PPC
 source "drivers/cpuidle/Kconfig.powerpc"
 endmenu
 
+menu "RISC-V CPU Idle Drivers"
+depends on RISCV
+source "drivers/cpuidle/Kconfig.riscv"
+endmenu
+
 config HALTPOLL_CPUIDLE
tristate "Halt poll cpuidle driver"
depends on X86 && KVM_GUEST
diff --git a/drivers/cpuidle/Kconfig.riscv b/drivers/cpuidle/Kconfig.riscv
new file mode 100644
index ..78518c26af74
--- /dev/null
+++ b/drivers/cpuidle/Kconfig.riscv
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# RISC-V CPU Idle drivers
+#
+
+config RISCV_SBI_CPUIDLE
+   bool "RISC-V SBI CPU idle Driver"
+   depends on RISCV_SBI
+   select DT_IDLE_STATES
+   select CPU_IDLE_MULTIPLE_DRIVERS
+   select DT_IDLE_GENPD if PM_GENERIC_DOMAINS_OF
+   help
+ Select this option to enable RISC-V SBI firmware based CPU idle
+ driver for RISC-V systems. This driver also supports a hierarchical
+ DT based layout of the idle states.
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index 11a26cef279f..a36922c18510 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -35,3 +35,7 @@ obj-$(CONFIG_MIPS_CPS_CPUIDLE)+= cpuidle-cps.o
 # POWERPC drivers
 obj-$(CONFIG_PSERIES_CPUIDLE)  += cpuidle-pseries.o
 obj-$(CONFIG_POWERNV_CPUIDLE)  += cpuidle-powernv.o
+
+###
+# RISC-V drivers
+obj-$(CONFIG_RISCV_SBI_CPUIDLE)+= cpuidle-sbi.o
diff --git a/drivers/cpuidle/cpuidle-sbi.c b/drivers/cpuidle/cpuidle-sbi.c
new file mode 100644
index ..47938fff61e1
--- /dev/null
+++ b/drivers/cpuidle/cpuidle-sbi.c
@@ -0,0 +1,502 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * RISC-V SBI CPU idle driver.
+ *
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#define pr_fmt(fmt) "cpuidle-sbi: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dt_idle_states.h"
+#include "dt_idle_genpd.h"
+
+struct sbi_cpuidle_data {
+   u32 *states;
+   struct device *dev;
+};
+
+struct sbi_domain_state {
+   bool available;
+   u32 state;
+};
+
+static DEFINE_PER_CPU_READ_MOSTLY(struct sbi_cpuidle_data, sbi_cpuidle_data);
+static DEFINE_PER_CPU(struct sbi_domain_state, domain_state);
+static bool sbi_cpuidle_use_osi;
+static bool sbi_cpuidle_use_cpuhp;
+static bool sbi_cpuidle_pd_allow_domain_state;
+
+static inline void sbi_set_domain_state(u32 state)
+{
+   struct sbi_domain_state *data = this_cpu_ptr(&domain_state);
+
+   data->available = true;
+   data->state = state;
+}
+
+static inline u32 sbi_get_domain_state(void)
+{
+   struct sbi_domain_state *data = this_cpu_ptr(&domain_state);
+
+   return data->state;
+}
+
+static inline void sbi_clear_domain_state(void)
+{
+   struct sbi_domain_state *data = this_cpu_ptr(&domain_state);
+
+   data->available = false;
+}
+
+static inline bool sbi_is_domain_state_available(void)
+{
+   struct sbi_domain_state *data = this_cpu_ptr(&domain_state);
+
+   return data->available;
+}
+
+static int sbi_suspend_finisher(unsigned long suspend_type,
+   unsigned long resume_addr,
+   unsigned long opaque)
+{
+   struct sbiret ret;
+
+   ret = sbi_ecall(SBI_EXT_HSM, SBI_EXT_HSM_HART_SUSPEND,
+   

[RFC PATCH v3 4/8] RISC-V: Add SBI HSM suspend related defines

2021-03-18 Thread Anup Patel
We add defines related to the SBI HSM suspend call and also
update the HSM state naming as per the latest SBI specification.

Signed-off-by: Anup Patel 
---
 arch/riscv/include/asm/sbi.h| 27 ++-
 arch/riscv/kernel/cpu_ops_sbi.c |  2 +-
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 79fa9f28b786..4bdccec77a84 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -62,15 +62,32 @@ enum sbi_ext_hsm_fid {
SBI_EXT_HSM_HART_START = 0,
SBI_EXT_HSM_HART_STOP,
SBI_EXT_HSM_HART_STATUS,
+   SBI_EXT_HSM_HART_SUSPEND,
 };
 
-enum sbi_hsm_hart_status {
-   SBI_HSM_HART_STATUS_STARTED = 0,
-   SBI_HSM_HART_STATUS_STOPPED,
-   SBI_HSM_HART_STATUS_START_PENDING,
-   SBI_HSM_HART_STATUS_STOP_PENDING,
+enum sbi_hsm_hart_state {
+   SBI_HSM_STATE_STARTED = 0,
+   SBI_HSM_STATE_STOPPED,
+   SBI_HSM_STATE_START_PENDING,
+   SBI_HSM_STATE_STOP_PENDING,
+   SBI_HSM_STATE_SUSPENDED,
+   SBI_HSM_STATE_SUSPEND_PENDING,
+   SBI_HSM_STATE_RESUME_PENDING,
 };
 
+#define SBI_HSM_SUSP_BASE_MASK         0x7fffffff
+#define SBI_HSM_SUSP_NON_RET_BIT       0x80000000
+#define SBI_HSM_SUSP_PLAT_BASE         0x10000000
+
+#define SBI_HSM_SUSPEND_RET_DEFAULT    0x00000000
+#define SBI_HSM_SUSPEND_RET_PLATFORM   SBI_HSM_SUSP_PLAT_BASE
+#define SBI_HSM_SUSPEND_RET_LAST       SBI_HSM_SUSP_BASE_MASK
+#define SBI_HSM_SUSPEND_NON_RET_DEFAULT        SBI_HSM_SUSP_NON_RET_BIT
+#define SBI_HSM_SUSPEND_NON_RET_PLATFORM   (SBI_HSM_SUSP_NON_RET_BIT | \
+                                        SBI_HSM_SUSP_PLAT_BASE)
+#define SBI_HSM_SUSPEND_NON_RET_LAST   (SBI_HSM_SUSP_NON_RET_BIT | \
+                                        SBI_HSM_SUSP_BASE_MASK)
+
 enum sbi_ext_srst_fid {
SBI_EXT_SRST_RESET = 0,
 };
diff --git a/arch/riscv/kernel/cpu_ops_sbi.c b/arch/riscv/kernel/cpu_ops_sbi.c
index 685fae72b7f5..5fd90f03a3e9 100644
--- a/arch/riscv/kernel/cpu_ops_sbi.c
+++ b/arch/riscv/kernel/cpu_ops_sbi.c
@@ -97,7 +97,7 @@ static int sbi_cpu_is_stopped(unsigned int cpuid)
 
rc = sbi_hsm_hart_get_status(hartid);
 
-   if (rc == SBI_HSM_HART_STATUS_STOPPED)
+   if (rc == SBI_HSM_STATE_STOPPED)
return 0;
return rc;
 }
-- 
2.25.1
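
A small standalone illustration of how a suspend_type value for
SBI_EXT_HSM_HART_SUSPEND is composed from the defines above: bit 31 selects
a non-retentive suspend and the remaining bits carry the state value, with
platform-specific states starting at SBI_HSM_SUSP_PLAT_BASE. The exact mask
values simply mirror the patch and should be treated as assumptions here:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SBI_HSM_SUSP_BASE_MASK          0x7fffffffU
#define SBI_HSM_SUSP_NON_RET_BIT        0x80000000U
#define SBI_HSM_SUSP_PLAT_BASE          0x10000000U

/* Compose a suspend_type argument for the HSM HART_SUSPEND call. */
static uint32_t make_suspend_type(bool retentive, uint32_t state)
{
        uint32_t type = state & SBI_HSM_SUSP_BASE_MASK;

        if (!retentive)
                type |= SBI_HSM_SUSP_NON_RET_BIT;
        return type;
}

int main(void)
{
        /* Default retentive suspend, then the first platform-specific
         * non-retentive suspend state. */
        printf("%#x\n", (unsigned int)make_suspend_type(true, 0));
        printf("%#x\n", (unsigned int)make_suspend_type(false, SBI_HSM_SUSP_PLAT_BASE));
        return 0;
}

The cpuidle-sbi driver earlier in this series passes such a value through its
suspend finisher as the suspend_type argument of the HSM suspend ecall.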



[RFC PATCH v3 5/8] cpuidle: Factor-out power domain related code from PSCI domain driver

2021-03-18 Thread Anup Patel
The generic power domain related code in the PSCI domain driver is largely
independent of PSCI and can be shared with the RISC-V SBI domain driver,
hence we factor out this code into dt_idle_genpd.c and dt_idle_genpd.h.

Signed-off-by: Anup Patel 
---
 drivers/cpuidle/Kconfig   |   4 +
 drivers/cpuidle/Kconfig.arm   |   1 +
 drivers/cpuidle/Makefile  |   1 +
 drivers/cpuidle/cpuidle-psci-domain.c | 244 +-
 drivers/cpuidle/cpuidle-psci.h|  15 +-
 ...{cpuidle-psci-domain.c => dt_idle_genpd.c} | 165 
 drivers/cpuidle/dt_idle_genpd.h   |  42 +++
 7 files changed, 121 insertions(+), 351 deletions(-)
 copy drivers/cpuidle/{cpuidle-psci-domain.c => dt_idle_genpd.c} (52%)
 create mode 100644 drivers/cpuidle/dt_idle_genpd.h

diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index c0aeedd66f02..f1afe7ab6b54 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -47,6 +47,10 @@ config CPU_IDLE_GOV_HALTPOLL
 config DT_IDLE_STATES
bool
 
+config DT_IDLE_GENPD
+   depends on PM_GENERIC_DOMAINS_OF
+   bool
+
 menu "ARM CPU Idle Drivers"
 depends on ARM || ARM64
 source "drivers/cpuidle/Kconfig.arm"
diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
index 0844fadc4be8..1007435ae298 100644
--- a/drivers/cpuidle/Kconfig.arm
+++ b/drivers/cpuidle/Kconfig.arm
@@ -27,6 +27,7 @@ config ARM_PSCI_CPUIDLE_DOMAIN
bool "PSCI CPU idle Domain"
depends on ARM_PSCI_CPUIDLE
depends on PM_GENERIC_DOMAINS_OF
+   select DT_IDLE_GENPD
default y
help
  Select this to enable the PSCI based CPUidle driver to use PM domains,
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index 26bbc5e74123..11a26cef279f 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -6,6 +6,7 @@
 obj-y += cpuidle.o driver.o governor.o sysfs.o governors/
 obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
 obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o
+obj-$(CONFIG_DT_IDLE_GENPD)  += dt_idle_genpd.o
 obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o
 obj-$(CONFIG_HALTPOLL_CPUIDLE)   += cpuidle-haltpoll.o
 
diff --git a/drivers/cpuidle/cpuidle-psci-domain.c 
b/drivers/cpuidle/cpuidle-psci-domain.c
index ff2c3f8e4668..b0621d890ab7 100644
--- a/drivers/cpuidle/cpuidle-psci-domain.c
+++ b/drivers/cpuidle/cpuidle-psci-domain.c
@@ -16,17 +16,9 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 
 #include "cpuidle-psci.h"
 
-struct psci_pd_provider {
-   struct list_head link;
-   struct device_node *node;
-};
-
-static LIST_HEAD(psci_pd_providers);
 static bool psci_pd_allow_domain_state;
 
 static int psci_pd_power_off(struct generic_pm_domain *pd)
@@ -47,178 +39,6 @@ static int psci_pd_power_off(struct generic_pm_domain *pd)
return 0;
 }
 
-static int psci_pd_parse_state_nodes(struct genpd_power_state *states,
-int state_count)
-{
-   int i, ret;
-   u32 psci_state, *psci_state_buf;
-
-   for (i = 0; i < state_count; i++) {
-   ret = psci_dt_parse_state_node(to_of_node(states[i].fwnode),
-   _state);
-   if (ret)
-   goto free_state;
-
-   psci_state_buf = kmalloc(sizeof(u32), GFP_KERNEL);
-   if (!psci_state_buf) {
-   ret = -ENOMEM;
-   goto free_state;
-   }
-   *psci_state_buf = psci_state;
-   states[i].data = psci_state_buf;
-   }
-
-   return 0;
-
-free_state:
-   i--;
-   for (; i >= 0; i--)
-   kfree(states[i].data);
-   return ret;
-}
-
-static int psci_pd_parse_states(struct device_node *np,
-   struct genpd_power_state **states, int *state_count)
-{
-   int ret;
-
-   /* Parse the domain idle states. */
-   ret = of_genpd_parse_idle_states(np, states, state_count);
-   if (ret)
-   return ret;
-
-   /* Fill out the PSCI specifics for each found state. */
-   ret = psci_pd_parse_state_nodes(*states, *state_count);
-   if (ret)
-   kfree(*states);
-
-   return ret;
-}
-
-static void psci_pd_free_states(struct genpd_power_state *states,
-   unsigned int state_count)
-{
-   int i;
-
-   for (i = 0; i < state_count; i++)
-   kfree(states[i].data);
-   kfree(states);
-}
-
-static int psci_pd_init(struct device_node *np, bool use_osi)
-{
-   struct generic_pm_domain *pd;
-   struct psci_pd_provider *pd_provider;
-   struct dev_power_governor *pd_gov;
-   struct genpd_power_state *states = NULL;
-   int ret = -ENOMEM, state_count = 0;
-
-   pd = kzalloc(sizeof(*pd),

[RFC PATCH v3 0/8] RISC-V CPU Idle Support

2021-03-18 Thread Anup Patel
This series adds RISC-V CPU idle support using the SBI HSM suspend function.
The RISC-V SBI CPU idle driver added by this series is heavily inspired
by the ARM PSCI CPU idle driver.

At high-level, this series includes the following changes:
1) Preparatory arch/riscv patches (Patches 1 to 3)
2) Defines for RISC-V SBI HSM suspend (Patch 4)
3) Preparatory patch to share code between RISC-V SBI CPU idle driver
   and ARM PSCI CPU idle driver (Patch 5)
4) RISC-V SBI CPU idle driver and related DT bindings (Patches 6 to 7)

These patches can be found in riscv_sbi_hsm_suspend_v3 branch at
https://github.com/avpatel/linux

Special thanks to Sandeep Tripathy for providing early feedback on SBI HSM
support in all the above projects (RISC-V SBI specification, OpenSBI, and
Linux RISC-V).

Changes since v2:
 - Rebased on Linux-5.12-rc3
 - Updated PATCH7 to add common DT bindings for both ARM and RISC-V
   idle states
 - Added "additionalProperties = false" for both idle-states node and
   child nodes in PATCH7

Changes since v1:
 - Fixed minor typo in PATCH1
 - Use just "idle-states" as DT node name for CPU idle states
 - Added documentation for "cpu-idle-states" DT property in
   devicetree/bindings/riscv/cpus.yaml
 - Added documentation for "riscv,sbi-suspend-param" DT property in
   devicetree/bindings/riscv/idle-states.yaml

Anup Patel (8):
  RISC-V: Enable CPU_IDLE drivers
  RISC-V: Rename relocate() and make it global
  RISC-V: Add arch functions for non-retentive suspend entry/exit
  RISC-V: Add SBI HSM suspend related defines
  cpuidle: Factor-out power domain related code from PSCI domain driver
  cpuidle: Add RISC-V SBI CPU idle driver
  dt-bindings: Add common bindings for ARM and RISC-V idle states
  RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine

 .../bindings/{arm => cpu}/idle-states.yaml| 228 +++-
 .../devicetree/bindings/riscv/cpus.yaml   |   6 +
 MAINTAINERS   |   8 +
 arch/riscv/Kconfig|   7 +
 arch/riscv/Kconfig.socs   |   3 +
 arch/riscv/configs/defconfig  |   8 +-
 arch/riscv/configs/rv32_defconfig |   5 +-
 arch/riscv/include/asm/cpuidle.h  |  24 +
 arch/riscv/include/asm/sbi.h  |  27 +-
 arch/riscv/include/asm/suspend.h  |  35 ++
 arch/riscv/kernel/Makefile|   2 +
 arch/riscv/kernel/asm-offsets.c   |   3 +
 arch/riscv/kernel/cpu_ops_sbi.c   |   2 +-
 arch/riscv/kernel/head.S  |   7 +-
 arch/riscv/kernel/process.c   |   3 +-
 arch/riscv/kernel/suspend.c   |  86 +++
 arch/riscv/kernel/suspend_entry.S | 116 
 drivers/cpuidle/Kconfig   |   9 +
 drivers/cpuidle/Kconfig.arm   |   1 +
 drivers/cpuidle/Kconfig.riscv |  15 +
 drivers/cpuidle/Makefile  |   5 +
 drivers/cpuidle/cpuidle-psci-domain.c | 244 +
 drivers/cpuidle/cpuidle-psci.h|  15 +-
 drivers/cpuidle/cpuidle-sbi.c | 502 ++
 ...{cpuidle-psci-domain.c => dt_idle_genpd.c} | 165 ++
 drivers/cpuidle/dt_idle_genpd.h   |  42 ++
 26 files changed, 1184 insertions(+), 384 deletions(-)
 rename Documentation/devicetree/bindings/{arm => cpu}/idle-states.yaml (74%)
 create mode 100644 arch/riscv/include/asm/cpuidle.h
 create mode 100644 arch/riscv/include/asm/suspend.h
 create mode 100644 arch/riscv/kernel/suspend.c
 create mode 100644 arch/riscv/kernel/suspend_entry.S
 create mode 100644 drivers/cpuidle/Kconfig.riscv
 create mode 100644 drivers/cpuidle/cpuidle-sbi.c
 copy drivers/cpuidle/{cpuidle-psci-domain.c => dt_idle_genpd.c} (52%)
 create mode 100644 drivers/cpuidle/dt_idle_genpd.h

-- 
2.25.1



[RFC PATCH v3 3/8] RISC-V: Add arch functions for non-retentive suspend entry/exit

2021-03-18 Thread Anup Patel
The hart registers and CSRs are not preserved in the non-retentive
suspend state, so we provide arch-specific helper functions which
will save/restore the hart context upon entry to and exit from the
non-retentive suspend state. These helper functions can be used by
cpuidle drivers for non-retentive suspend entry/exit.

Signed-off-by: Anup Patel 
---
 arch/riscv/include/asm/suspend.h  |  35 +
 arch/riscv/kernel/Makefile|   2 +
 arch/riscv/kernel/asm-offsets.c   |   3 +
 arch/riscv/kernel/suspend.c   |  86 ++
 arch/riscv/kernel/suspend_entry.S | 116 ++
 5 files changed, 242 insertions(+)
 create mode 100644 arch/riscv/include/asm/suspend.h
 create mode 100644 arch/riscv/kernel/suspend.c
 create mode 100644 arch/riscv/kernel/suspend_entry.S

diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
new file mode 100644
index ..63e9f434fb89
--- /dev/null
+++ b/arch/riscv/include/asm/suspend.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#ifndef _ASM_RISCV_SUSPEND_H
+#define _ASM_RISCV_SUSPEND_H
+
+#include 
+
+struct suspend_context {
+   /* Saved and restored by low-level functions */
+   struct pt_regs regs;
+   /* Saved and restored by high-level functions */
+   unsigned long scratch;
+   unsigned long tvec;
+   unsigned long ie;
+#ifdef CONFIG_MMU
+   unsigned long satp;
+#endif
+};
+
+/* Low-level CPU suspend entry function */
+int __cpu_suspend_enter(struct suspend_context *context);
+
+/* High-level CPU suspend which will save context and call finish() */
+int cpu_suspend(unsigned long arg,
+   int (*finish)(unsigned long arg,
+ unsigned long entry,
+ unsigned long context));
+
+/* Low-level CPU resume entry function */
+int __cpu_resume_enter(unsigned long hartid, unsigned long context);
+
+#endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 3dc0abde988a..b9b1b05ab860 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -42,6 +42,8 @@ obj-$(CONFIG_SMP) += cpu_ops_spinwait.o
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_MODULE_SECTIONS)  += module-sections.o
 
+obj-$(CONFIG_CPU_PM)   += suspend_entry.o suspend.o
+
 obj-$(CONFIG_FUNCTION_TRACER)  += mcount.o ftrace.o
 obj-$(CONFIG_DYNAMIC_FTRACE)   += mcount-dyn.o
 
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 9ef33346853c..2628dfd0f77d 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void asm_offsets(void);
 
@@ -111,6 +112,8 @@ void asm_offsets(void)
OFFSET(PT_BADADDR, pt_regs, badaddr);
OFFSET(PT_CAUSE, pt_regs, cause);
 
+   OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
+
/*
 * THREAD_{F,X}* might be larger than a S-type offset can handle, but
 * these are used in performance-sensitive assembly so we can't resort
diff --git a/arch/riscv/kernel/suspend.c b/arch/riscv/kernel/suspend.c
new file mode 100644
index ..49dddec30e99
--- /dev/null
+++ b/arch/riscv/kernel/suspend.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#include 
+#include 
+#include 
+
+static void suspend_save_csrs(struct suspend_context *context)
+{
+   context->scratch = csr_read(CSR_SCRATCH);
+   context->tvec = csr_read(CSR_TVEC);
+   context->ie = csr_read(CSR_IE);
+
+   /*
+* No need to save/restore IP CSR (i.e. MIP or SIP) because:
+*
+* 1. For no-MMU (M-mode) kernel, the bits in MIP are set by
+*external devices (such as interrupt controller, timer, etc).
+* 2. For MMU (S-mode) kernel, the bits in SIP are set by
+*M-mode firmware and external devices (such as interrupt
+*controller, etc).
+*/
+
+#ifdef CONFIG_MMU
+   context->satp = csr_read(CSR_SATP);
+#endif
+}
+
+static void suspend_restore_csrs(struct suspend_context *context)
+{
+   csr_write(CSR_SCRATCH, context->scratch);
+   csr_write(CSR_TVEC, context->tvec);
+   csr_write(CSR_IE, context->ie);
+
+#ifdef CONFIG_MMU
+   csr_write(CSR_SATP, context->satp);
+#endif
+}
+
+int cpu_suspend(unsigned long arg,
+   int (*finish)(unsigned long arg,
+ unsigned long entry,
+ unsigned long context))
+{
+   int rc = 0;
+   struct suspend_context context = { 0 };
+
+   /* Finisher should be non-NULL */
+   if (!finish)
+   return -EINVAL;
+
+   /* Save additional CSRs*/
+   suspend_save_csrs(&context);
+
+   /*
+* Function graph tracer sta

[RFC PATCH v3 2/8] RISC-V: Rename relocate() and make it global

2021-03-18 Thread Anup Patel
The low-level relocate() function enables the MMU and relocates
execution to link-time addresses. We rename the relocate() function
to relocate_enable_mmu(), which is more informative.

Also, the relocate_enable_mmu() function will be used in the
resume path when a CPU wakes up from a non-retentive suspend,
so we make it a global symbol.

Signed-off-by: Anup Patel 
---
 arch/riscv/kernel/head.S | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index f5a9bad86e58..9d10f89e8ab7 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -67,7 +67,8 @@ pe_head_start:
 
 .align 2
 #ifdef CONFIG_MMU
-relocate:
+   .global relocate_enable_mmu
+relocate_enable_mmu:
/* Relocate return address */
li a1, PAGE_OFFSET
la a2, _start
@@ -156,7 +157,7 @@ secondary_start_common:
 #ifdef CONFIG_MMU
/* Enable virtual memory and relocate to virtual address */
la a0, swapper_pg_dir
-   call relocate
+   call relocate_enable_mmu
 #endif
call setup_trap_vector
tail smp_callin
@@ -268,7 +269,7 @@ clear_bss_done:
call setup_vm
 #ifdef CONFIG_MMU
la a0, early_pg_dir
-   call relocate
+   call relocate_enable_mmu
 #endif /* CONFIG_MMU */
 
call setup_trap_vector
-- 
2.25.1



[RFC PATCH v3 1/8] RISC-V: Enable CPU_IDLE drivers

2021-03-18 Thread Anup Patel
We force-select CPU_PM and provide asm/cpuidle.h so that we can
use CPU idle drivers with the Linux RISC-V kernel.

Signed-off-by: Anup Patel 
---
 arch/riscv/Kconfig|  7 +++
 arch/riscv/configs/defconfig  |  7 +++
 arch/riscv/configs/rv32_defconfig |  4 ++--
 arch/riscv/include/asm/cpuidle.h  | 24 
 arch/riscv/kernel/process.c   |  3 ++-
 5 files changed, 38 insertions(+), 7 deletions(-)
 create mode 100644 arch/riscv/include/asm/cpuidle.h

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 468642c4e92f..19c9ae909001 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -37,6 +37,7 @@ config RISCV
select CLONE_BACKWARDS
select CLINT_TIMER if !MMU
select COMMON_CLK
+   select CPU_PM if CPU_IDLE
select EDAC_SUPPORT
select GENERIC_ARCH_TOPOLOGY if SMP
select GENERIC_ATOMIC64 if !64BIT
@@ -475,4 +476,10 @@ source "kernel/power/Kconfig"
 
 endmenu
 
+menu "CPU Power Management"
+
+source "drivers/cpuidle/Kconfig"
+
+endmenu
+
 source "drivers/firmware/Kconfig"
diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index 6c0625aa96c7..dc4927c0e44b 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -13,11 +13,13 @@ CONFIG_USER_NS=y
 CONFIG_CHECKPOINT_RESTORE=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_EXPERT=y
+# CONFIG_SYSFS_SYSCALL is not set
 CONFIG_BPF_SYSCALL=y
 CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
+CONFIG_CPU_IDLE=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
@@ -65,10 +67,9 @@ CONFIG_HW_RANDOM=y
 CONFIG_HW_RANDOM_VIRTIO=y
 CONFIG_SPI=y
 CONFIG_SPI_SIFIVE=y
+# CONFIG_PTP_1588_CLOCK is not set
 CONFIG_GPIOLIB=y
 CONFIG_GPIO_SIFIVE=y
-# CONFIG_PTP_1588_CLOCK is not set
-CONFIG_POWER_RESET=y
 CONFIG_DRM=y
 CONFIG_DRM_RADEON=y
 CONFIG_DRM_VIRTIO_GPU=y
@@ -132,5 +133,3 @@ CONFIG_DEBUG_BLOCK_EXT_DEVT=y
 # CONFIG_FTRACE is not set
 # CONFIG_RUNTIME_TESTING_MENU is not set
 CONFIG_MEMTEST=y
-# CONFIG_SYSFS_SYSCALL is not set
-CONFIG_EFI=y
diff --git a/arch/riscv/configs/rv32_defconfig 
b/arch/riscv/configs/rv32_defconfig
index 8dd02b842fef..332e43a4a2c3 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -13,12 +13,14 @@ CONFIG_USER_NS=y
 CONFIG_CHECKPOINT_RESTORE=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_EXPERT=y
+# CONFIG_SYSFS_SYSCALL is not set
 CONFIG_BPF_SYSCALL=y
 CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_ARCH_RV32I=y
 CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
+CONFIG_CPU_IDLE=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
@@ -67,7 +69,6 @@ CONFIG_HW_RANDOM_VIRTIO=y
 CONFIG_SPI=y
 CONFIG_SPI_SIFIVE=y
 # CONFIG_PTP_1588_CLOCK is not set
-CONFIG_POWER_RESET=y
 CONFIG_DRM=y
 CONFIG_DRM_RADEON=y
 CONFIG_DRM_VIRTIO_GPU=y
@@ -131,4 +132,3 @@ CONFIG_DEBUG_BLOCK_EXT_DEVT=y
 # CONFIG_FTRACE is not set
 # CONFIG_RUNTIME_TESTING_MENU is not set
 CONFIG_MEMTEST=y
-# CONFIG_SYSFS_SYSCALL is not set
diff --git a/arch/riscv/include/asm/cpuidle.h b/arch/riscv/include/asm/cpuidle.h
new file mode 100644
index ..71fdc607d4bc
--- /dev/null
+++ b/arch/riscv/include/asm/cpuidle.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2021 Allwinner Ltd
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#ifndef _ASM_RISCV_CPUIDLE_H
+#define _ASM_RISCV_CPUIDLE_H
+
+#include 
+#include 
+
+static inline void cpu_do_idle(void)
+{
+   /*
+* Add mb() here to ensure that all
+* IO/MEM accesses are completed prior
+* to entering WFI.
+*/
+   mb();
+   wait_for_interrupt();
+}
+
+#endif
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 6f728e731bed..dd2ef18517f4 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 register unsigned long gp_in_global __asm__("gp");
 
@@ -36,7 +37,7 @@ extern asmlinkage void ret_from_kernel_thread(void);
 
 void arch_cpu_idle(void)
 {
-   wait_for_interrupt();
+   cpu_do_idle();
raw_local_irq_enable();
 }
 
-- 
2.25.1



Re: [RFC PATCH 7/8] dt-bindings: Add bindings documentation for RISC-V idle states

2021-03-18 Thread Anup Patel
On Tue, Mar 16, 2021 at 9:24 PM Rob Herring  wrote:
>
> On Sun, Mar 7, 2021 at 8:18 PM Anup Patel  wrote:
> >
> > On Sat, Mar 6, 2021 at 4:52 AM Rob Herring  wrote:
> > >
> > > On Sun, Feb 21, 2021 at 03:07:57PM +0530, Anup Patel wrote:
> > > > The RISC-V CPU idle states will be described in DT under the
> > > > /cpus/riscv-idle-states DT node. This patch adds the bindings
> > > > documentation for riscv-idle-states DT nodes and idle state DT
> > > > nodes under it.
> > > >
> > > > Signed-off-by: Anup Patel 
> > > > ---
> > > >  .../bindings/riscv/idle-states.yaml   | 250 ++
> > > >  1 file changed, 250 insertions(+)
> > > >  create mode 100644 
> > > > Documentation/devicetree/bindings/riscv/idle-states.yaml
> > > >
> > > > diff --git a/Documentation/devicetree/bindings/riscv/idle-states.yaml 
> > > > b/Documentation/devicetree/bindings/riscv/idle-states.yaml
> > > > new file mode 100644
> > > > index ..3eff763fed23
> > > > --- /dev/null
> > > > +++ b/Documentation/devicetree/bindings/riscv/idle-states.yaml
> > > > @@ -0,0 +1,250 @@
> > > > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > +%YAML 1.2
> > > > +---
> > > > +$id: http://devicetree.org/schemas/riscv/idle-states.yaml#
> > > > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > > > +
> > > > +title: RISC-V idle states binding description
> > > > +
> > > > +maintainers:
> > > > +  - Anup Patel 
> > > > +
> > > > +description: |+
> > > > +  RISC-V systems can manage power consumption dynamically, where HARTs
> > > > +  (or CPUs) [1] can be put in different platform specific suspend (or
> > > > +  idle) states (ranging from simple WFI, power gating, etc). The RISC-V
> > > > +  SBI [2] hart state management extension provides a standard mechanism
> > > > +  for OSes to request HART state transitions.
> > > > +
> > > > +  The platform specific suspend (or idle) states of a hart can be 
> > > > either
> > > > +  retentive or non-rententive in nature. A retentive suspend state will
> > > > +  preserve hart register and CSR values for all privilege modes whereas
> > > > +  a non-retentive suspend state will not preserve hart register and CSR
> > > > +  values. The suspend (or idle) state entered by executing the WFI
> > > > +  instruction is considered standard on all RISC-V systems and 
> > > > therefore
> > > > +  must not be listed in device tree.
> > > > +
> > > > +  The device tree binding definition for RISC-V idle states described
> > > > +  in this document is quite similar to the ARM idle states [3].
> > > > +
> > > > +  References
> > > > +
> > > > +  [1] RISC-V Linux Kernel documentation - CPUs bindings
> > > > +  Documentation/devicetree/bindings/riscv/cpus.yaml
> > > > +
> > > > +  [2] RISC-V Supervisor Binary Interface (SBI)
> > > > +  http://github.com/riscv/riscv-sbi-doc/riscv-sbi.adoc
> > > > +
> > > > +  [3] ARM idle states binding description - Idle states bindings
> > > > +  Documentation/devicetree/bindings/arm/idle-states.yaml
> > >
> > > I'd assume there's common parts we can share.
> >
> > Yes, except few properties most are the same.
> >
> > We can have a shared DT bindings for both ARM and RISC-V but
> > both architectures will always have some architecture specific details
> > (or properties) which need to be documented under arch specific
> > DT documentation. Is it okay if this is done as a separate series ?
>
> No...

Okay, I will create common DT bindings for both ARM and RISC-V
in the next revision.

>
> > > > +
> > > > +properties:
> > > > +  $nodename:
> > > > +const: riscv-idle-states
> > >
> > > Just 'idle-states' like Arm.
> >
> > I had tried "idle-states" node name but DT bindings check complaints
> > conflict with ARM idle state bindings.
>
> ...and this being one reason why.
>
> Actually, I think this can all be in 1 doc if you want. It's fine with
> me if a common doc has RiscV and Arm specific properties.

Sure, will add common DT bindings.

>
> > > > +
> > > > +patternProperties

Re: [PATCH v6 1/2] RISC-V: Don't print SBI version for all detected extensions

2021-03-17 Thread Anup Patel
On Wed, Mar 17, 2021 at 10:38 AM Palmer Dabbelt  wrote:
>
> On Mon, 15 Mar 2021 04:04:59 PDT (-0700), Anup Patel wrote:
> > The sbi_init() already prints SBI version before detecting
> > various SBI extensions so we don't need to print SBI version
> > for all detected SBI extensions.
> >
> > Signed-off-by: Anup Patel 
> > ---
> >  arch/riscv/kernel/sbi.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > index f4a7db3d309e..c0dcebdd30ec 100644
> > --- a/arch/riscv/kernel/sbi.c
> > +++ b/arch/riscv/kernel/sbi.c
> > @@ -577,19 +577,19 @@ void __init sbi_init(void)
> >   sbi_get_firmware_id(), sbi_get_firmware_version());
> >   if (sbi_probe_extension(SBI_EXT_TIME) > 0) {
> >   __sbi_set_timer = __sbi_set_timer_v02;
> > - pr_info("SBI v0.2 TIME extension detected\n");
> > + pr_info("SBI TIME extension detected\n");
> >   } else {
> >   __sbi_set_timer = __sbi_set_timer_v01;
> >   }
> >   if (sbi_probe_extension(SBI_EXT_IPI) > 0) {
> >   __sbi_send_ipi  = __sbi_send_ipi_v02;
> > - pr_info("SBI v0.2 IPI extension detected\n");
> > + pr_info("SBI IPI extension detected\n");
> >   } else {
> >   __sbi_send_ipi  = __sbi_send_ipi_v01;
> >   }
> >   if (sbi_probe_extension(SBI_EXT_RFENCE) > 0) {
> >   __sbi_rfence= __sbi_rfence_v02;
> > - pr_info("SBI v0.2 RFENCE extension detected\n");
> > + pr_info("SBI RFENCE extension detected\n");
> >   } else {
> >   __sbi_rfence= __sbi_rfence_v01;
> >   }
>
> Thanks.  I'm just putting this one on for-next so you don't have to carry
> around the diff.

Thanks Palmer.


Re: [PATCH] riscv,entry: fix misaligned base for excp_vect_table

2021-03-17 Thread Anup Patel
On Wed, Mar 17, 2021 at 1:48 PM Zihao Yu  wrote:
>
> * In RV64, the size of each entry in excp_vect_table is 8 bytes. If the
>   base of the table is not 8-byte aligned, loading an entry in the table
>   will raise a misaligned exception. Although such an exception will be
>   handled by OpenSBI/BBL, this still causes performance degradation.
>
> Signed-off-by: Zihao Yu 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/kernel/entry.S | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 744f3209c..76274a4a1 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -447,6 +447,7 @@ ENDPROC(__switch_to)
>  #endif
>
> .section ".rodata"
> +   .align LGREG
> /* Exception vector table */
>  ENTRY(excp_vect_table)
> RISCV_PTR do_trap_insn_misaligned
> --
> 2.20.1
>
>
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv


Re: [RFC PATCH v1 0/3] IPI and remote TBL flush improvement

2021-03-17 Thread Anup Patel
On Wed, Mar 17, 2021 at 10:23 AM Palmer Dabbelt
 wrote:
>
> On Thu, 11 Mar 2021 08:47:09 PST (-0800), Anup Patel wrote:
> > This series primarily does three things:
> > 1. Allows RISC-V IPI provider to specify whether IPI operations are
> >suitable for remote TLB flush (PATCH1)
> > 2. Improve remote TLB flush to use IPIs whenever possible (PATCH2)
> > 3. Allow irqchip drivers to handle IPIs from chained IRQ handlers (PATCH3)
>
> IIUC this last one isn't technically used in both forms, as we don't have any
> drivers that behave that way yet?  I'm OK taking it, under the assumption it
> makes keeping the out-of-tree driver for the draft interrupt controller 
> easier,
> but if I was wrong then it's probably out of order, so I figured I'd check.

The last patch is for RISC-V AIA drivers I am working on.

The draft RISC-V AIA specification is available at:
http://www.jhauser.us/RISCV/riscv-interrupts-019.pdf

>
> Aside from that this generally LGTM.  We are making quite a bit of mess in
> here, but I don't really see a way around that as we need to support the old
> hardware.  We can always do a cleanup when the specifications settle down.

Not all RISC-V platforms will have a mechanism for direct IPI injection from
S-mode, so to maintain backward compatibility with older platforms (where
IPI injection will always be through SBI calls) we have chosen the current
approach.

The RISC-V AIA spec is trying to solve this in a way which works for both
S-mode (or HS-mode) and VS-mode. The current RISC-V AIA plan is to
provide IPIs as software-injected MSIs between HARTs, and this will work
fine for a Guest/VM as well.
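
To make the resulting call pattern concrete, here is a rough sketch of how a
caller picks between the two paths once PATCH1 is applied. It mirrors the
flush_icache_all() hunk in that patch and is only meant as an illustration:

static void ipi_remote_fence_i(void *info)
{
        /* Runs on each hart via IPI: fence the local instruction cache */
        local_flush_icache_all();
}

void flush_icache_all(void)
{
        /*
         * SBI-call-based IPIs are not suitable for remote fences, so fall
         * back to the SBI remote fence; direct IPI providers (CLINT, and
         * later AIA IMSIC) take the on_each_cpu() path instead.
         */
        if (!riscv_use_ipi_for_rfence())
                sbi_remote_fence_i(NULL);
        else
                on_each_cpu(ipi_remote_fence_i, NULL, 1);
}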

>
> Oddly enough this did come up in IRC recently and there may be some new bits 
> in
> the CLINT on the FU740 that allow S-mode SW interrupts to show up directly --
> there's at least a "delegate supervisor software interrupt" bit now, but the
> manual only calls out machine mode as being able to set it (though IIUC it's
> memory mapped, so not sure how that would be enforced).  Not saying we need
> that in order to take the last patch, but if it is possible it's probably 
> worth
> giving it a shot when the boards show up.

Adding a few bits in CLINT will not be a complete solution because we also need
a mechanism where a Guest/VM can directly inject IPIs without SBI calls to the
hypervisor.

>
> > This series also a preparatory series for upcoming RISC-V advanced
> > interrupt architecture (AIA) support.
> >
> > These patches can be found in riscv_ipi_imp_v1 branch at
> > https://github.com/avpatel/linux
> >
> > Anup Patel (3):
> >   RISC-V: IPI provider should specify if we can use IPI for remote FENCE
> >   RISC-V: Use IPIs for remote TLB flush when possible
> >   RISC-V: Add handle_IPI_noregs() for irqchip drivers
> >
> >  arch/riscv/include/asm/smp.h  | 19 +-
> >  arch/riscv/kernel/sbi.c   |  2 +-
> >  arch/riscv/kernel/smp.c   | 30 +++
> >  arch/riscv/mm/cacheflush.c|  2 +-
> >  arch/riscv/mm/tlbflush.c  | 62 ---
> >  drivers/clocksource/timer-clint.c |  2 +-
> >  6 files changed, 91 insertions(+), 26 deletions(-)

Regards,
Anup


[RFC PATCH v2 8/8] RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine

2021-03-16 Thread Anup Patel
We enable the RISC-V SBI CPU idle driver for the QEMU virt machine to test
SBI HSM suspend on QEMU.

Signed-off-by: Anup Patel 
---
 arch/riscv/Kconfig.socs   | 3 +++
 arch/riscv/configs/defconfig  | 1 +
 arch/riscv/configs/rv32_defconfig | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/riscv/Kconfig.socs b/arch/riscv/Kconfig.socs
index 7efcece8896c..efdf6fbe18dd 100644
--- a/arch/riscv/Kconfig.socs
+++ b/arch/riscv/Kconfig.socs
@@ -19,6 +19,9 @@ config SOC_VIRT
select GOLDFISH
select RTC_DRV_GOLDFISH if RTC_CLASS
select SIFIVE_PLIC
+   select PM_GENERIC_DOMAINS if PM
+   select PM_GENERIC_DOMAINS_OF if PM && OF
+   select RISCV_SBI_CPUIDLE if CPU_IDLE
help
  This enables support for QEMU Virt Machine.
 
diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index dc4927c0e44b..aac26c20bbf5 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -19,6 +19,7 @@ CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
+CONFIG_PM=y
 CONFIG_CPU_IDLE=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
diff --git a/arch/riscv/configs/rv32_defconfig 
b/arch/riscv/configs/rv32_defconfig
index 332e43a4a2c3..2285c95e34b3 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -20,6 +20,7 @@ CONFIG_SOC_VIRT=y
 CONFIG_ARCH_RV32I=y
 CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
+CONFIG_PM=y
 CONFIG_CPU_IDLE=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
-- 
2.25.1



[RFC PATCH v2 7/8] dt-bindings: Add bindings documentation for RISC-V idle states

2021-03-16 Thread Anup Patel
The RISC-V CPU idle states will be described in DT under the
/cpus/idle-states DT node. This patch adds the bindings documentation
for the idle-states DT node and the idle state DT nodes under it.

Signed-off-by: Anup Patel 
---
 .../devicetree/bindings/riscv/cpus.yaml   |   6 +
 .../bindings/riscv/idle-states.yaml   | 256 ++
 2 files changed, 262 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/riscv/idle-states.yaml

diff --git a/Documentation/devicetree/bindings/riscv/cpus.yaml 
b/Documentation/devicetree/bindings/riscv/cpus.yaml
index e534f6a7cfa1..482936630525 100644
--- a/Documentation/devicetree/bindings/riscv/cpus.yaml
+++ b/Documentation/devicetree/bindings/riscv/cpus.yaml
@@ -95,6 +95,12 @@ properties:
   - compatible
   - interrupt-controller
 
+  cpu-idle-states:
+$ref: '/schemas/types.yaml#/definitions/phandle-array'
+description: |
+  List of phandles to idle state nodes supported
+  by this hart (see ./idle-states.yaml).
+
 required:
   - riscv,isa
   - interrupt-controller
diff --git a/Documentation/devicetree/bindings/riscv/idle-states.yaml 
b/Documentation/devicetree/bindings/riscv/idle-states.yaml
new file mode 100644
index ..1dbf98905c8e
--- /dev/null
+++ b/Documentation/devicetree/bindings/riscv/idle-states.yaml
@@ -0,0 +1,256 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/riscv/idle-states.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: RISC-V idle states binding description
+
+maintainers:
+  - Anup Patel 
+
+description: |+
+  RISC-V systems can manage power consumption dynamically, where HARTs
+  (or CPUs) [1] can be put in different platform specific suspend (or
+  idle) states (ranging from simple WFI, power gating, etc). The RISC-V
+  SBI [2] hart state management extension provides a standard mechanism
+  for OSes to request HART state transitions.
+
+  The platform specific suspend (or idle) states of a hart can be either
+  retentive or non-retentive in nature. A retentive suspend state will
+  preserve hart register and CSR values for all privilege modes whereas
+  a non-retentive suspend state will not preserve hart register and CSR
+  values. The suspend (or idle) state entered by executing the WFI
+  instruction is considered standard on all RISC-V systems and therefore
+  must not be listed in device tree.
+
+  The device tree binding definition for RISC-V idle states described
+  in this document is quite similar to the ARM idle states [3].
+
+  References
+
+  [1] RISC-V Linux Kernel documentation - CPUs bindings
+  Documentation/devicetree/bindings/riscv/cpus.yaml
+
+  [2] RISC-V Supervisor Binary Interface (SBI)
+  http://github.com/riscv/riscv-sbi-doc/riscv-sbi.adoc
+
+  [3] ARM idle states binding description - Idle states bindings
+  Documentation/devicetree/bindings/arm/idle-states.yaml
+
+properties:
+  $nodename:
+const: idle-states
+
+patternProperties:
+  "^(cpu|cluster)-":
+type: object
+description: |
+  Each state node represents an idle state description and must be
+  defined as follows.
+
+properties:
+  compatible:
+const: riscv,idle-state
+
+  riscv,sbi-suspend-param:
+$ref: /schemas/types.yaml#/definitions/uint32
+description: |
+  suspend_type parameter to pass to the SBI HSM suspend call. For
+  more details on this parameter, see the SBI specification v0.3 (or higher).
+
+  local-timer-stop:
+description:
+  If present the CPU local timer control logic is lost on state
+  entry, otherwise it is retained.
+type: boolean
+
+  entry-latency-us:
+description:
+  Worst case latency in microseconds required to enter the idle state.
+
+  exit-latency-us:
+description:
+  Worst case latency in microseconds required to exit the idle state.
+  The exit-latency-us duration may be guaranteed only after
+  entry-latency-us has passed.
+
+  min-residency-us:
+description:
+  Minimum residency duration in microseconds, inclusive of preparation
+  and entry, for this idle state to be considered worthwhile energy
+  wise (refer to section 2 of this document for a complete 
description).
+
+  wakeup-latency-us:
+description: |
+  Maximum delay between the signaling of a wake-up event and the CPU
+  being able to execute normal code again. If omitted, this is assumed
+  to be equal to:
+
+entry-latency-us + exit-latency-us
+
+  It is important to supply this value on systems where the duration
+  of the PREP phase (see diagram 1, section 2) is non-negligible. In such
+  systems entry-latency-us + exit-latency-us will exceed
+  wakeup-latency-us by this duration.
+
+  idle-state-name:
+$ref: /schemas/

[RFC PATCH v2 6/8] cpuidle: Add RISC-V SBI CPU idle driver

2021-03-16 Thread Anup Patel
The RISC-V SBI HSM extension provides a HSM suspend call which can
be used by Linux RISC-V to enter platform-specific low-power states.

This patch adds a CPU idle driver based on RISC-V SBI calls which
will populate idle states from the device tree and use SBI calls to
enter these idle states.
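
For reference, the idle-state entry flow of the driver boils down to the
sketch below (simplified; the full sbi_suspend_finisher() is in the diff,
and the dispatch helper here is modeled on the driver rather than quoted
from it). Retentive states can issue the HSM suspend call directly, while
non-retentive states go through cpu_suspend() from PATCH3 so that the hart
context is saved and restored:

static int sbi_suspend_finisher(unsigned long suspend_type,
                                unsigned long resume_addr,
                                unsigned long opaque)
{
        struct sbiret ret;

        ret = sbi_ecall(SBI_EXT_HSM, SBI_EXT_HSM_HART_SUSPEND,
                        suspend_type, resume_addr, opaque, 0, 0, 0);

        return ret.error ? sbi_err_map_linux_errno(ret.error) : 0;
}

static int sbi_suspend(u32 state)
{
        /* Non-retentive states need the full save/restore path */
        if (state & SBI_HSM_SUSP_NON_RET_BIT)
                return cpu_suspend(state, sbi_suspend_finisher);

        /* Retentive states can invoke the SBI call in place */
        return sbi_suspend_finisher(state, 0, 0);
}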

Signed-off-by: Anup Patel 
---
 MAINTAINERS   |   8 +
 drivers/cpuidle/Kconfig   |   5 +
 drivers/cpuidle/Kconfig.riscv |  15 +
 drivers/cpuidle/Makefile  |   4 +
 drivers/cpuidle/cpuidle-sbi.c | 502 ++
 5 files changed, 534 insertions(+)
 create mode 100644 drivers/cpuidle/Kconfig.riscv
 create mode 100644 drivers/cpuidle/cpuidle-sbi.c

diff --git a/MAINTAINERS b/MAINTAINERS
index aa84121c5611..4954112efdb4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4679,6 +4679,14 @@ S:   Supported
 F: drivers/cpuidle/cpuidle-psci.h
 F: drivers/cpuidle/cpuidle-psci-domain.c
 
+CPUIDLE DRIVER - RISC-V SBI
+M: Anup Patel 
+R: Sandeep Tripathy 
+L: linux...@vger.kernel.org
+L: linux-ri...@lists.infradead.org
+S: Supported
+F: drivers/cpuidle/cpuidle-sbi.c
+
 CRAMFS FILESYSTEM
 M: Nicolas Pitre 
 S: Maintained
diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index f1afe7ab6b54..ff71dd662880 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -66,6 +66,11 @@ depends on PPC
 source "drivers/cpuidle/Kconfig.powerpc"
 endmenu
 
+menu "RISC-V CPU Idle Drivers"
+depends on RISCV
+source "drivers/cpuidle/Kconfig.riscv"
+endmenu
+
 config HALTPOLL_CPUIDLE
tristate "Halt poll cpuidle driver"
depends on X86 && KVM_GUEST
diff --git a/drivers/cpuidle/Kconfig.riscv b/drivers/cpuidle/Kconfig.riscv
new file mode 100644
index ..78518c26af74
--- /dev/null
+++ b/drivers/cpuidle/Kconfig.riscv
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# RISC-V CPU Idle drivers
+#
+
+config RISCV_SBI_CPUIDLE
+   bool "RISC-V SBI CPU idle Driver"
+   depends on RISCV_SBI
+   select DT_IDLE_STATES
+   select CPU_IDLE_MULTIPLE_DRIVERS
+   select DT_IDLE_GENPD if PM_GENERIC_DOMAINS_OF
+   help
+ Select this option to enable the RISC-V SBI firmware based CPU idle
+ driver for RISC-V systems. This driver also supports a hierarchical
+ DT based layout of the idle states.
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index 11a26cef279f..a36922c18510 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -35,3 +35,7 @@ obj-$(CONFIG_MIPS_CPS_CPUIDLE)+= cpuidle-cps.o
 # POWERPC drivers
 obj-$(CONFIG_PSERIES_CPUIDLE)  += cpuidle-pseries.o
 obj-$(CONFIG_POWERNV_CPUIDLE)  += cpuidle-powernv.o
+
+###
+# RISC-V drivers
+obj-$(CONFIG_RISCV_SBI_CPUIDLE)+= cpuidle-sbi.o
diff --git a/drivers/cpuidle/cpuidle-sbi.c b/drivers/cpuidle/cpuidle-sbi.c
new file mode 100644
index ..47938fff61e1
--- /dev/null
+++ b/drivers/cpuidle/cpuidle-sbi.c
@@ -0,0 +1,502 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * RISC-V SBI CPU idle driver.
+ *
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#define pr_fmt(fmt) "cpuidle-sbi: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dt_idle_states.h"
+#include "dt_idle_genpd.h"
+
+struct sbi_cpuidle_data {
+   u32 *states;
+   struct device *dev;
+};
+
+struct sbi_domain_state {
+   bool available;
+   u32 state;
+};
+
+static DEFINE_PER_CPU_READ_MOSTLY(struct sbi_cpuidle_data, sbi_cpuidle_data);
+static DEFINE_PER_CPU(struct sbi_domain_state, domain_state);
+static bool sbi_cpuidle_use_osi;
+static bool sbi_cpuidle_use_cpuhp;
+static bool sbi_cpuidle_pd_allow_domain_state;
+
+static inline void sbi_set_domain_state(u32 state)
+{
+   struct sbi_domain_state *data = this_cpu_ptr(&domain_state);
+
+   data->available = true;
+   data->state = state;
+}
+
+static inline u32 sbi_get_domain_state(void)
+{
+   struct sbi_domain_state *data = this_cpu_ptr(&domain_state);
+
+   return data->state;
+}
+
+static inline void sbi_clear_domain_state(void)
+{
+   struct sbi_domain_state *data = this_cpu_ptr(&domain_state);
+
+   data->available = false;
+}
+
+static inline bool sbi_is_domain_state_available(void)
+{
+   struct sbi_domain_state *data = this_cpu_ptr(&domain_state);
+
+   return data->available;
+}
+
+static int sbi_suspend_finisher(unsigned long suspend_type,
+   unsigned long resume_addr,
+   unsigned long opaque)
+{
+   struct sbiret ret;
+
+   ret = sbi_ecall(SBI_EXT_HSM, SBI_EXT_HSM_HART_SUSPEND,
+   

[RFC PATCH v2 5/8] cpuidle: Factor-out power domain related code from PSCI domain driver

2021-03-16 Thread Anup Patel
The generic power domain related code in the PSCI domain driver is largely
independent of PSCI and can be shared with the RISC-V SBI domain driver,
hence we factor out this code into dt_idle_genpd.c and dt_idle_genpd.h.

Signed-off-by: Anup Patel 
---
 drivers/cpuidle/Kconfig   |   4 +
 drivers/cpuidle/Kconfig.arm   |   1 +
 drivers/cpuidle/Makefile  |   1 +
 drivers/cpuidle/cpuidle-psci-domain.c | 244 +-
 drivers/cpuidle/cpuidle-psci.h|  15 +-
 ...{cpuidle-psci-domain.c => dt_idle_genpd.c} | 165 
 drivers/cpuidle/dt_idle_genpd.h   |  42 +++
 7 files changed, 121 insertions(+), 351 deletions(-)
 copy drivers/cpuidle/{cpuidle-psci-domain.c => dt_idle_genpd.c} (52%)
 create mode 100644 drivers/cpuidle/dt_idle_genpd.h

diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index c0aeedd66f02..f1afe7ab6b54 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -47,6 +47,10 @@ config CPU_IDLE_GOV_HALTPOLL
 config DT_IDLE_STATES
bool
 
+config DT_IDLE_GENPD
+   depends on PM_GENERIC_DOMAINS_OF
+   bool
+
 menu "ARM CPU Idle Drivers"
 depends on ARM || ARM64
 source "drivers/cpuidle/Kconfig.arm"
diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
index 0844fadc4be8..1007435ae298 100644
--- a/drivers/cpuidle/Kconfig.arm
+++ b/drivers/cpuidle/Kconfig.arm
@@ -27,6 +27,7 @@ config ARM_PSCI_CPUIDLE_DOMAIN
bool "PSCI CPU idle Domain"
depends on ARM_PSCI_CPUIDLE
depends on PM_GENERIC_DOMAINS_OF
+   select DT_IDLE_GENPD
default y
help
  Select this to enable the PSCI based CPUidle driver to use PM domains,
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index 26bbc5e74123..11a26cef279f 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -6,6 +6,7 @@
 obj-y += cpuidle.o driver.o governor.o sysfs.o governors/
 obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
 obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o
+obj-$(CONFIG_DT_IDLE_GENPD)  += dt_idle_genpd.o
 obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o
 obj-$(CONFIG_HALTPOLL_CPUIDLE)   += cpuidle-haltpoll.o
 
diff --git a/drivers/cpuidle/cpuidle-psci-domain.c 
b/drivers/cpuidle/cpuidle-psci-domain.c
index ff2c3f8e4668..b0621d890ab7 100644
--- a/drivers/cpuidle/cpuidle-psci-domain.c
+++ b/drivers/cpuidle/cpuidle-psci-domain.c
@@ -16,17 +16,9 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 
 #include "cpuidle-psci.h"
 
-struct psci_pd_provider {
-   struct list_head link;
-   struct device_node *node;
-};
-
-static LIST_HEAD(psci_pd_providers);
 static bool psci_pd_allow_domain_state;
 
 static int psci_pd_power_off(struct generic_pm_domain *pd)
@@ -47,178 +39,6 @@ static int psci_pd_power_off(struct generic_pm_domain *pd)
return 0;
 }
 
-static int psci_pd_parse_state_nodes(struct genpd_power_state *states,
-int state_count)
-{
-   int i, ret;
-   u32 psci_state, *psci_state_buf;
-
-   for (i = 0; i < state_count; i++) {
-   ret = psci_dt_parse_state_node(to_of_node(states[i].fwnode),
-   &psci_state);
-   if (ret)
-   goto free_state;
-
-   psci_state_buf = kmalloc(sizeof(u32), GFP_KERNEL);
-   if (!psci_state_buf) {
-   ret = -ENOMEM;
-   goto free_state;
-   }
-   *psci_state_buf = psci_state;
-   states[i].data = psci_state_buf;
-   }
-
-   return 0;
-
-free_state:
-   i--;
-   for (; i >= 0; i--)
-   kfree(states[i].data);
-   return ret;
-}
-
-static int psci_pd_parse_states(struct device_node *np,
-   struct genpd_power_state **states, int *state_count)
-{
-   int ret;
-
-   /* Parse the domain idle states. */
-   ret = of_genpd_parse_idle_states(np, states, state_count);
-   if (ret)
-   return ret;
-
-   /* Fill out the PSCI specifics for each found state. */
-   ret = psci_pd_parse_state_nodes(*states, *state_count);
-   if (ret)
-   kfree(*states);
-
-   return ret;
-}
-
-static void psci_pd_free_states(struct genpd_power_state *states,
-   unsigned int state_count)
-{
-   int i;
-
-   for (i = 0; i < state_count; i++)
-   kfree(states[i].data);
-   kfree(states);
-}
-
-static int psci_pd_init(struct device_node *np, bool use_osi)
-{
-   struct generic_pm_domain *pd;
-   struct psci_pd_provider *pd_provider;
-   struct dev_power_governor *pd_gov;
-   struct genpd_power_state *states = NULL;
-   int ret = -ENOMEM, state_count = 0;
-
-   pd = kzalloc(sizeof(*pd),

[RFC PATCH v2 4/8] RISC-V: Add SBI HSM suspend related defines

2021-03-16 Thread Anup Patel
We add defines related to the SBI HSM suspend call and also
update the HSM state names as per the latest SBI specification.
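
As a quick illustration of how these defines compose (a sketch only, not
part of the patch): a platform-specific non-retentive suspend type is an
offset above SBI_HSM_SUSPEND_NON_RET_PLATFORM, i.e. it has the
non-retentive bit set and lies in the platform-specific range:

/* Sketch: build a hypothetical platform-specific non-retentive suspend type */
static u32 make_platform_nonret_suspend_type(u32 offset)
{
        /*
         * Valid values lie between SBI_HSM_SUSPEND_NON_RET_PLATFORM and
         * SBI_HSM_SUSPEND_NON_RET_LAST; 'offset' selects the platform state.
         */
        return SBI_HSM_SUSPEND_NON_RET_PLATFORM + offset;
}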

Signed-off-by: Anup Patel 
---
 arch/riscv/include/asm/sbi.h| 27 ++-
 arch/riscv/kernel/cpu_ops_sbi.c |  2 +-
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 79fa9f28b786..4bdccec77a84 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -62,15 +62,32 @@ enum sbi_ext_hsm_fid {
SBI_EXT_HSM_HART_START = 0,
SBI_EXT_HSM_HART_STOP,
SBI_EXT_HSM_HART_STATUS,
+   SBI_EXT_HSM_HART_SUSPEND,
 };
 
-enum sbi_hsm_hart_status {
-   SBI_HSM_HART_STATUS_STARTED = 0,
-   SBI_HSM_HART_STATUS_STOPPED,
-   SBI_HSM_HART_STATUS_START_PENDING,
-   SBI_HSM_HART_STATUS_STOP_PENDING,
+enum sbi_hsm_hart_state {
+   SBI_HSM_STATE_STARTED = 0,
+   SBI_HSM_STATE_STOPPED,
+   SBI_HSM_STATE_START_PENDING,
+   SBI_HSM_STATE_STOP_PENDING,
+   SBI_HSM_STATE_SUSPENDED,
+   SBI_HSM_STATE_SUSPEND_PENDING,
+   SBI_HSM_STATE_RESUME_PENDING,
 };
 
+#define SBI_HSM_SUSP_BASE_MASK 0x7fff
+#define SBI_HSM_SUSP_NON_RET_BIT   0x8000
+#define SBI_HSM_SUSP_PLAT_BASE 0x1000
+
+#define SBI_HSM_SUSPEND_RET_DEFAULT0x
+#define SBI_HSM_SUSPEND_RET_PLATFORM   SBI_HSM_SUSP_PLAT_BASE
+#define SBI_HSM_SUSPEND_RET_LAST   SBI_HSM_SUSP_BASE_MASK
+#define SBI_HSM_SUSPEND_NON_RET_DEFAULTSBI_HSM_SUSP_NON_RET_BIT
+#define SBI_HSM_SUSPEND_NON_RET_PLATFORM   (SBI_HSM_SUSP_NON_RET_BIT | \
+SBI_HSM_SUSP_PLAT_BASE)
+#define SBI_HSM_SUSPEND_NON_RET_LAST   (SBI_HSM_SUSP_NON_RET_BIT | \
+SBI_HSM_SUSP_BASE_MASK)
+
 enum sbi_ext_srst_fid {
SBI_EXT_SRST_RESET = 0,
 };
diff --git a/arch/riscv/kernel/cpu_ops_sbi.c b/arch/riscv/kernel/cpu_ops_sbi.c
index 685fae72b7f5..5fd90f03a3e9 100644
--- a/arch/riscv/kernel/cpu_ops_sbi.c
+++ b/arch/riscv/kernel/cpu_ops_sbi.c
@@ -97,7 +97,7 @@ static int sbi_cpu_is_stopped(unsigned int cpuid)
 
rc = sbi_hsm_hart_get_status(hartid);
 
-   if (rc == SBI_HSM_HART_STATUS_STOPPED)
+   if (rc == SBI_HSM_STATE_STOPPED)
return 0;
return rc;
 }
-- 
2.25.1



[RFC PATCH v2 3/8] RISC-V: Add arch functions for non-retentive suspend entry/exit

2021-03-16 Thread Anup Patel
The hart registers and CSRs are not preserved in non-retentive
suspend states, so we provide arch-specific helper functions which
will save/restore the hart context upon entry to/exit from a
non-retentive suspend state. These helper functions can be used by
cpuidle drivers for non-retentive suspend entry/exit.
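
To make the intended usage concrete, a cpuidle driver wraps its firmware
call in a finisher and hands that to cpu_suspend(); a minimal sketch
(my_hsm_suspend() and the surrounding names are placeholders, not a real
API):

#include <asm/suspend.h>        /* cpu_suspend(), added by this patch */

/* Placeholder for the driver's actual firmware suspend call */
extern int my_hsm_suspend(unsigned long type, unsigned long resume_addr,
                          unsigned long opaque);

static int my_suspend_finisher(unsigned long arg, unsigned long entry,
                               unsigned long context)
{
        /*
         * Ask the firmware to suspend this hart. For a non-retentive state
         * the hart resumes at 'entry' (i.e. __cpu_resume_enter) with
         * 'context' as its argument, so returning here normally means the
         * suspend request failed.
         */
        return my_hsm_suspend(arg, entry, context);
}

static int enter_nonret_idle_state(unsigned long state)
{
        /* cpu_suspend() saves/restores registers and CSRs around the finisher */
        return cpu_suspend(state, my_suspend_finisher);
}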

Signed-off-by: Anup Patel 
---
 arch/riscv/include/asm/suspend.h  |  35 +
 arch/riscv/kernel/Makefile|   2 +
 arch/riscv/kernel/asm-offsets.c   |   3 +
 arch/riscv/kernel/suspend.c   |  86 ++
 arch/riscv/kernel/suspend_entry.S | 116 ++
 5 files changed, 242 insertions(+)
 create mode 100644 arch/riscv/include/asm/suspend.h
 create mode 100644 arch/riscv/kernel/suspend.c
 create mode 100644 arch/riscv/kernel/suspend_entry.S

diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
new file mode 100644
index ..63e9f434fb89
--- /dev/null
+++ b/arch/riscv/include/asm/suspend.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#ifndef _ASM_RISCV_SUSPEND_H
+#define _ASM_RISCV_SUSPEND_H
+
+#include 
+
+struct suspend_context {
+   /* Saved and restored by low-level functions */
+   struct pt_regs regs;
+   /* Saved and restored by high-level functions */
+   unsigned long scratch;
+   unsigned long tvec;
+   unsigned long ie;
+#ifdef CONFIG_MMU
+   unsigned long satp;
+#endif
+};
+
+/* Low-level CPU suspend entry function */
+int __cpu_suspend_enter(struct suspend_context *context);
+
+/* High-level CPU suspend which will save context and call finish() */
+int cpu_suspend(unsigned long arg,
+   int (*finish)(unsigned long arg,
+ unsigned long entry,
+ unsigned long context));
+
+/* Low-level CPU resume entry function */
+int __cpu_resume_enter(unsigned long hartid, unsigned long context);
+
+#endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 3dc0abde988a..b9b1b05ab860 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -42,6 +42,8 @@ obj-$(CONFIG_SMP) += cpu_ops_spinwait.o
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_MODULE_SECTIONS)  += module-sections.o
 
+obj-$(CONFIG_CPU_PM)   += suspend_entry.o suspend.o
+
 obj-$(CONFIG_FUNCTION_TRACER)  += mcount.o ftrace.o
 obj-$(CONFIG_DYNAMIC_FTRACE)   += mcount-dyn.o
 
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 9ef33346853c..2628dfd0f77d 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void asm_offsets(void);
 
@@ -111,6 +112,8 @@ void asm_offsets(void)
OFFSET(PT_BADADDR, pt_regs, badaddr);
OFFSET(PT_CAUSE, pt_regs, cause);
 
+   OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
+
/*
 * THREAD_{F,X}* might be larger than a S-type offset can handle, but
 * these are used in performance-sensitive assembly so we can't resort
diff --git a/arch/riscv/kernel/suspend.c b/arch/riscv/kernel/suspend.c
new file mode 100644
index ..49dddec30e99
--- /dev/null
+++ b/arch/riscv/kernel/suspend.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#include 
+#include 
+#include 
+
+static void suspend_save_csrs(struct suspend_context *context)
+{
+   context->scratch = csr_read(CSR_SCRATCH);
+   context->tvec = csr_read(CSR_TVEC);
+   context->ie = csr_read(CSR_IE);
+
+   /*
+* No need to save/restore IP CSR (i.e. MIP or SIP) because:
+*
+* 1. For no-MMU (M-mode) kernel, the bits in MIP are set by
+*external devices (such as interrupt controller, timer, etc).
+* 2. For MMU (S-mode) kernel, the bits in SIP are set by
+*M-mode firmware and external devices (such as interrupt
+*controller, etc).
+*/
+
+#ifdef CONFIG_MMU
+   context->satp = csr_read(CSR_SATP);
+#endif
+}
+
+static void suspend_restore_csrs(struct suspend_context *context)
+{
+   csr_write(CSR_SCRATCH, context->scratch);
+   csr_write(CSR_TVEC, context->tvec);
+   csr_write(CSR_IE, context->ie);
+
+#ifdef CONFIG_MMU
+   csr_write(CSR_SATP, context->satp);
+#endif
+}
+
+int cpu_suspend(unsigned long arg,
+   int (*finish)(unsigned long arg,
+ unsigned long entry,
+ unsigned long context))
+{
+   int rc = 0;
+   struct suspend_context context = { 0 };
+
+   /* Finisher should be non-NULL */
+   if (!finish)
+   return -EINVAL;
+
+   /* Save additional CSRs */
+   suspend_save_csrs(&context);
+
+   /*
+* Function graph tracer sta

[RFC PATCH v2 2/8] RISC-V: Rename relocate() and make it global

2021-03-16 Thread Anup Patel
The low-level relocate() function enables the MMU and relocates
execution to link-time addresses. We rename the relocate() function
to relocate_enable_mmu(), which is more informative.

Also, the relocate_enable_mmu() function will be used in the
resume path when a CPU wakes up from a non-retentive suspend,
so we make it a global symbol.

Signed-off-by: Anup Patel 
---
 arch/riscv/kernel/head.S | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index f5a9bad86e58..9d10f89e8ab7 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -67,7 +67,8 @@ pe_head_start:
 
 .align 2
 #ifdef CONFIG_MMU
-relocate:
+   .global relocate_enable_mmu
+relocate_enable_mmu:
/* Relocate return address */
li a1, PAGE_OFFSET
la a2, _start
@@ -156,7 +157,7 @@ secondary_start_common:
 #ifdef CONFIG_MMU
/* Enable virtual memory and relocate to virtual address */
la a0, swapper_pg_dir
-   call relocate
+   call relocate_enable_mmu
 #endif
call setup_trap_vector
tail smp_callin
@@ -268,7 +269,7 @@ clear_bss_done:
call setup_vm
 #ifdef CONFIG_MMU
la a0, early_pg_dir
-   call relocate
+   call relocate_enable_mmu
 #endif /* CONFIG_MMU */
 
call setup_trap_vector
-- 
2.25.1



[RFC PATCH v2 1/8] RISC-V: Enable CPU_IDLE drivers

2021-03-16 Thread Anup Patel
We force-select CPU_PM and provide asm/cpuidle.h so that we can
use CPU idle drivers with the Linux RISC-V kernel.

Signed-off-by: Anup Patel 
---
 arch/riscv/Kconfig|  7 +++
 arch/riscv/configs/defconfig  |  7 +++
 arch/riscv/configs/rv32_defconfig |  4 ++--
 arch/riscv/include/asm/cpuidle.h  | 24 
 arch/riscv/kernel/process.c   |  3 ++-
 5 files changed, 38 insertions(+), 7 deletions(-)
 create mode 100644 arch/riscv/include/asm/cpuidle.h

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 468642c4e92f..19c9ae909001 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -37,6 +37,7 @@ config RISCV
select CLONE_BACKWARDS
select CLINT_TIMER if !MMU
select COMMON_CLK
+   select CPU_PM if CPU_IDLE
select EDAC_SUPPORT
select GENERIC_ARCH_TOPOLOGY if SMP
select GENERIC_ATOMIC64 if !64BIT
@@ -475,4 +476,10 @@ source "kernel/power/Kconfig"
 
 endmenu
 
+menu "CPU Power Management"
+
+source "drivers/cpuidle/Kconfig"
+
+endmenu
+
 source "drivers/firmware/Kconfig"
diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index 6c0625aa96c7..dc4927c0e44b 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -13,11 +13,13 @@ CONFIG_USER_NS=y
 CONFIG_CHECKPOINT_RESTORE=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_EXPERT=y
+# CONFIG_SYSFS_SYSCALL is not set
 CONFIG_BPF_SYSCALL=y
 CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
+CONFIG_CPU_IDLE=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
@@ -65,10 +67,9 @@ CONFIG_HW_RANDOM=y
 CONFIG_HW_RANDOM_VIRTIO=y
 CONFIG_SPI=y
 CONFIG_SPI_SIFIVE=y
+# CONFIG_PTP_1588_CLOCK is not set
 CONFIG_GPIOLIB=y
 CONFIG_GPIO_SIFIVE=y
-# CONFIG_PTP_1588_CLOCK is not set
-CONFIG_POWER_RESET=y
 CONFIG_DRM=y
 CONFIG_DRM_RADEON=y
 CONFIG_DRM_VIRTIO_GPU=y
@@ -132,5 +133,3 @@ CONFIG_DEBUG_BLOCK_EXT_DEVT=y
 # CONFIG_FTRACE is not set
 # CONFIG_RUNTIME_TESTING_MENU is not set
 CONFIG_MEMTEST=y
-# CONFIG_SYSFS_SYSCALL is not set
-CONFIG_EFI=y
diff --git a/arch/riscv/configs/rv32_defconfig 
b/arch/riscv/configs/rv32_defconfig
index 8dd02b842fef..332e43a4a2c3 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -13,12 +13,14 @@ CONFIG_USER_NS=y
 CONFIG_CHECKPOINT_RESTORE=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_EXPERT=y
+# CONFIG_SYSFS_SYSCALL is not set
 CONFIG_BPF_SYSCALL=y
 CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_ARCH_RV32I=y
 CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
+CONFIG_CPU_IDLE=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
@@ -67,7 +69,6 @@ CONFIG_HW_RANDOM_VIRTIO=y
 CONFIG_SPI=y
 CONFIG_SPI_SIFIVE=y
 # CONFIG_PTP_1588_CLOCK is not set
-CONFIG_POWER_RESET=y
 CONFIG_DRM=y
 CONFIG_DRM_RADEON=y
 CONFIG_DRM_VIRTIO_GPU=y
@@ -131,4 +132,3 @@ CONFIG_DEBUG_BLOCK_EXT_DEVT=y
 # CONFIG_FTRACE is not set
 # CONFIG_RUNTIME_TESTING_MENU is not set
 CONFIG_MEMTEST=y
-# CONFIG_SYSFS_SYSCALL is not set
diff --git a/arch/riscv/include/asm/cpuidle.h b/arch/riscv/include/asm/cpuidle.h
new file mode 100644
index ..71fdc607d4bc
--- /dev/null
+++ b/arch/riscv/include/asm/cpuidle.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2021 Allwinner Ltd
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#ifndef _ASM_RISCV_CPUIDLE_H
+#define _ASM_RISCV_CPUIDLE_H
+
+#include 
+#include 
+
+static inline void cpu_do_idle(void)
+{
+   /*
+* Add mb() here to ensure that all
+* IO/MEM accesses are completed prior
+* to entering WFI.
+*/
+   mb();
+   wait_for_interrupt();
+}
+
+#endif
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 6f728e731bed..dd2ef18517f4 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 register unsigned long gp_in_global __asm__("gp");
 
@@ -36,7 +37,7 @@ extern asmlinkage void ret_from_kernel_thread(void);
 
 void arch_cpu_idle(void)
 {
-   wait_for_interrupt();
+   cpu_do_idle();
raw_local_irq_enable();
 }
 
-- 
2.25.1



[RFC PATCH v2 0/8] RISC-V CPU Idle Support

2021-03-16 Thread Anup Patel
This series adds RISC-V CPU idle support using the SBI HSM suspend function.
The RISC-V SBI CPU idle driver added by this series is heavily inspired
by the ARM PSCI CPU idle driver.

At a high level, this series includes the following changes:
1) Preparatory arch/riscv patches (Patches 1 to 3)
2) Defines for RISC-V SBI HSM suspend (Patch 4)
3) Preparatory patch to share code between RISC-V SBI CPU idle driver
   and ARM PSCI CPU idle driver (Patch 5)
4) RISC-V SBI CPU idle driver and related DT bindings (Patches 6 to 7)

These patches can be found in riscv_sbi_hsm_suspend_v2 branch at
https://github.com/avpatel/linux

Special thanks to Sandeep Tripathy for providing early feedback on SBI HSM
support in all the above projects (RISC-V SBI specification, OpenSBI, and
Linux RISC-V).

Changes since v1:
 - Fixed minor typo in PATCH1
 - Use just "idle-states" as DT node name for CPU idle states
 - Added documentation for "cpu-idle-states" DT property in
   devicetree/bindings/riscv/cpus.yaml
 - Added documentation for "riscv,sbi-suspend-param" DT property in
   devicetree/bindings/riscv/idle-states.yaml

Anup Patel (8):
  RISC-V: Enable CPU_IDLE drivers
  RISC-V: Rename relocate() and make it global
  RISC-V: Add arch functions for non-retentive suspend entry/exit
  RISC-V: Add SBI HSM suspend related defines
  cpuidle: Factor-out power domain related code from PSCI domain driver
  cpuidle: Add RISC-V SBI CPU idle driver
  dt-bindings: Add bindings documentation for RISC-V idle states
  RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine

 .../devicetree/bindings/riscv/cpus.yaml   |   6 +
 .../bindings/riscv/idle-states.yaml   | 256 +
 MAINTAINERS   |   8 +
 arch/riscv/Kconfig|   7 +
 arch/riscv/Kconfig.socs   |   3 +
 arch/riscv/configs/defconfig  |   8 +-
 arch/riscv/configs/rv32_defconfig |   5 +-
 arch/riscv/include/asm/cpuidle.h  |  24 +
 arch/riscv/include/asm/sbi.h  |  27 +-
 arch/riscv/include/asm/suspend.h  |  35 ++
 arch/riscv/kernel/Makefile|   2 +
 arch/riscv/kernel/asm-offsets.c   |   3 +
 arch/riscv/kernel/cpu_ops_sbi.c   |   2 +-
 arch/riscv/kernel/head.S  |   7 +-
 arch/riscv/kernel/process.c   |   3 +-
 arch/riscv/kernel/suspend.c   |  86 +++
 arch/riscv/kernel/suspend_entry.S | 116 
 drivers/cpuidle/Kconfig   |   9 +
 drivers/cpuidle/Kconfig.arm   |   1 +
 drivers/cpuidle/Kconfig.riscv |  15 +
 drivers/cpuidle/Makefile  |   5 +
 drivers/cpuidle/cpuidle-psci-domain.c | 244 +
 drivers/cpuidle/cpuidle-psci.h|  15 +-
 drivers/cpuidle/cpuidle-sbi.c | 502 ++
 ...{cpuidle-psci-domain.c => dt_idle_genpd.c} | 165 ++
 drivers/cpuidle/dt_idle_genpd.h   |  42 ++
 26 files changed, 1229 insertions(+), 367 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/riscv/idle-states.yaml
 create mode 100644 arch/riscv/include/asm/cpuidle.h
 create mode 100644 arch/riscv/include/asm/suspend.h
 create mode 100644 arch/riscv/kernel/suspend.c
 create mode 100644 arch/riscv/kernel/suspend_entry.S
 create mode 100644 drivers/cpuidle/Kconfig.riscv
 create mode 100644 drivers/cpuidle/cpuidle-sbi.c
 copy drivers/cpuidle/{cpuidle-psci-domain.c => dt_idle_genpd.c} (52%)
 create mode 100644 drivers/cpuidle/dt_idle_genpd.h

-- 
2.25.1



Re: [PATCH] Insert SFENCE.VMA in function set_pte_at for RISCV

2021-03-16 Thread Anup Patel
On Tue, Mar 16, 2021 at 1:59 PM Andrew Waterman
 wrote:
>
> On Tue, Mar 16, 2021 at 12:32 AM Anup Patel  wrote:
> >
> > On Tue, Mar 16, 2021 at 12:27 PM Jiuyang Liu  wrote:
> > >
> > > > As per my understanding, we don't need to explicitly invalidate local 
> > > > TLB
> > > > in set_pte() or set_pet_at() because generic Linux page table management
> > > > (/mm/*) will call the appropriate flush_tlb_xyz() function after 
> > > > page
> > > > table updates.
> > >
> > > I witnessed this bug in our micro-architecture: set_pte instruction is
> > > still in the store buffer, no functions are inserting SFENCE.VMA in
> > > the stack below, so TLB cannot witness this modification.
> > > Here is my call stack:
> > > set_pte
> > > set_pte_at
> > > map_vm_area
> > > __vmalloc_area_node
> > > __vmalloc_node_range
> > > __vmalloc_node
> > > __vmalloc_node_flags
> > > vzalloc
> > > n_tty_open
> > >
> > > I think this is an architecture specific code, so /mm/* should
> > > not be modified.
> > > And spec requires SFENCE.VMA to be inserted on each modification to
> > > TLB. So I added code here.
> >
> > The generic linux/mm/* already calls the appropriate tlb_flush_xyz()
> > function defined in arch/riscv/include/asm/tlbflush.h
> >
> > Better to have a write-barrier in set_pte().
> >
> > >
> > > > Also, just local TLB flush is generally not sufficient because
> > > > a lot of page tables will be used across on multiple HARTs.
> > >
> > > Yes, this is the biggest issue, in RISC-V Volume 2, Privileged Spec v.
> > > 20190608 page 67 gave a solution:
> >
> > This is not an issue with RISC-V privilege spec rather it is more about
> > placing RISC-V fences at right locations.
> >
> > > Consequently, other harts must be notified separately when the
> > > memory-management data structures have been modified. One approach is
> > > to use
> > > 1) a local data fence to ensure local writes are visible globally,
> > > then 2) an interprocessor interrupt to the other thread,
> > > then 3) a local SFENCE.VMA in the interrupt handler of the remote thread,
> > > and finally 4) signal back to originating thread that operation is
> > > complete. This is, of course, the RISC-V analog to a TLB shootdown.
> >
> > I would suggest trying approach#1.
> >
> > You can include "asm/barrier.h" here and use wmb() or __smp_wmb()
> > in-place of local TLB flush.
>
> wmb() doesn't suffice to order older stores before younger page-table
> walks, so that might hide the problem without actually fixing it.

If we treat page-table walks as reads, then mb() might be more
suitable in this case?

ARM64 also has an explicit barrier in its set_pte() implementation: it does
"dsb(ishst); isb()", which is an inner-shareable store barrier followed
by an instruction barrier.
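
Purely to illustrate the suggestion being discussed (this is not a reviewed
fix; whether mb() is sufficient, and whether set_pte_at() is even the right
place, is exactly the open question here):

static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
                              pte_t *ptep, pte_t pteval)
{
        if (pte_present(pteval) && pte_exec(pteval))
                flush_icache_pte(pteval);

        set_pte(ptep, pteval);

        /*
         * Assumption under discussion: a full barrier so the PTE store is
         * globally visible before any later page-table walk, analogous to
         * arm64's dsb(ishst)/isb() pair.
         */
        mb();
}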

>
> Based upon Jiuyang's description, it does sound plausible that we are
> missing an SFENCE.VMA (or TLB shootdown) somewhere.  But I don't
> understand the situation well enough to know where that might be, or
> what the best fix is.

Yes, I agree, but set_pte() doesn't seem to be the right place for a TLB
shootdown, based on the set_pte() implementations of other architectures.

Regards,
Anup

>
>
> >
> > >
> > > In general, this patch didn't handle the G bit in PTE, kernel trap it
> > > to sbi_remote_sfence_vma. do you think I should use flush_tlb_all?
> > >
> > > Jiuyang
> > >
> > >
> > >
> > >
> > > arch/arm/mm/mmu.c
> > > void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > >   pte_t *ptep, pte_t pteval)
> > > {
> > > unsigned long ext = 0;
> > >
> > > if (addr < TASK_SIZE && pte_valid_user(pteval)) {
> > > if (!pte_special(pteval))
> > > __sync_icache_dcache(pteval);
> > > ext |= PTE_EXT_NG;
> > > }
> > >
> > > set_pte_ext(ptep, pteval, ext);
> > > }
> > >
> > > arch/mips/include/asm/pgtable.h
> > > static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
> > >   pte_t *ptep, pte_t pteval)
> > > {
> > >
> > > if (!pte_present(pteval))
> > > goto cache_sync_done;
> > &g

Re: [PATCH] Insert SFENCE.VMA in function set_pte_at for RISCV

2021-03-16 Thread Anup Patel
On Tue, Mar 16, 2021 at 12:27 PM Jiuyang Liu  wrote:
>
> > As per my understanding, we don't need to explicitly invalidate local TLB
> > in set_pte() or set_pet_at() because generic Linux page table management
> > (/mm/*) will call the appropriate flush_tlb_xyz() function after page
> > table updates.
>
> I witnessed this bug in our micro-architecture: set_pte instruction is
> still in the store buffer, no functions are inserting SFENCE.VMA in
> the stack below, so TLB cannot witness this modification.
> Here is my call stack:
> set_pte
> set_pte_at
> map_vm_area
> __vmalloc_area_node
> __vmalloc_node_range
> __vmalloc_node
> __vmalloc_node_flags
> vzalloc
> n_tty_open
>
> I think this is an architecture specific code, so /mm/* should
> not be modified.
> And spec requires SFENCE.VMA to be inserted on each modification to
> TLB. So I added code here.

The generic linux/mm/* already calls the appropriate tlb_flush_xyz()
function defined in arch/riscv/include/asm/tlbflush.h

Better to have a write-barrier in set_pte().

>
> > Also, just local TLB flush is generally not sufficient because
> > a lot of page tables will be used across on multiple HARTs.
>
> Yes, this is the biggest issue, in RISC-V Volume 2, Privileged Spec v.
> 20190608 page 67 gave a solution:

This is not an issue with RISC-V privilege spec rather it is more about
placing RISC-V fences at right locations.

> Consequently, other harts must be notified separately when the
> memory-management data structures have been modified. One approach is
> to use
> 1) a local data fence to ensure local writes are visible globally,
> then 2) an interprocessor interrupt to the other thread,
> then 3) a local SFENCE.VMA in the interrupt handler of the remote thread,
> and finally 4) signal back to originating thread that operation is
> complete. This is, of course, the RISC-V analog to a TLB shootdown.

I would suggest trying approach#1.

You can include "asm/barrier.h" here and use wmb() or __smp_wmb()
in-place of local TLB flush.

>
> In general, this patch didn't handle the G bit in PTE, kernel trap it
> to sbi_remote_sfence_vma. do you think I should use flush_tlb_all?
>
> Jiuyang
>
>
>
>
> arch/arm/mm/mmu.c
> void set_pte_at(struct mm_struct *mm, unsigned long addr,
>   pte_t *ptep, pte_t pteval)
> {
> unsigned long ext = 0;
>
> if (addr < TASK_SIZE && pte_valid_user(pteval)) {
> if (!pte_special(pteval))
> __sync_icache_dcache(pteval);
> ext |= PTE_EXT_NG;
> }
>
> set_pte_ext(ptep, pteval, ext);
> }
>
> arch/mips/include/asm/pgtable.h
> static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
>   pte_t *ptep, pte_t pteval)
> {
>
> if (!pte_present(pteval))
> goto cache_sync_done;
>
> if (pte_present(*ptep) && (pte_pfn(*ptep) == pte_pfn(pteval)))
> goto cache_sync_done;
>
>     __update_cache(addr, pteval);
> cache_sync_done:
> set_pte(ptep, pteval);
> }
>
>
> Also, just local TLB flush is generally not sufficient because
> > a lot of page tables will be used accross on multiple HARTs.
>
>
> On Tue, Mar 16, 2021 at 5:05 AM Anup Patel  wrote:
> >
> > +Alex
> >
> > On Tue, Mar 16, 2021 at 9:20 AM Jiuyang Liu  wrote:
> > >
> > > This patch inserts SFENCE.VMA after modifying PTE based on RISC-V
> > > specification.
> > >
> > > arch/riscv/include/asm/pgtable.h:
> > > 1. implement pte_user, pte_global and pte_leaf to check correspond
> > > attribute of a pte_t.
> >
> > Adding pte_user(), pte_global(), and pte_leaf() is fine.
> >
> > >
> > > 2. insert SFENCE.VMA in set_pte_at based on RISC-V Volume 2, Privileged
> > > Spec v. 20190608 page 66 and 67:
> > > If software modifies a non-leaf PTE, it should execute SFENCE.VMA with
> > > rs1=x0. If any PTE along the traversal path had its G bit set, rs2 must
> > > be x0; otherwise, rs2 should be set to the ASID for which the
> > > translation is being modified.
> > > If software modifies a leaf PTE, it should execute SFENCE.VMA with rs1
> > > set to a virtual address within the page. If any PTE along the traversal
> > > path had its G bit set, rs2 must be x0; otherwise, rs2 should be set to
> > > the ASID for which the translation is being modified.
> > >
> > > arch/riscv/include/asm/tlbflush.h:
> > > 1. implement get_current_asid to get current program asid.
> > > 

Re: [PATCH] Insert SFENCE.VMA in function set_pte_at for RISCV

2021-03-15 Thread Anup Patel
+Alex

On Tue, Mar 16, 2021 at 9:20 AM Jiuyang Liu  wrote:
>
> This patch inserts SFENCE.VMA after modifying PTE based on RISC-V
> specification.
>
> arch/riscv/include/asm/pgtable.h:
> 1. implement pte_user, pte_global and pte_leaf to check correspond
> attribute of a pte_t.

Adding pte_user(), pte_global(), and pte_leaf() is fine.

>
> 2. insert SFENCE.VMA in set_pte_at based on RISC-V Volume 2, Privileged
> Spec v. 20190608 page 66 and 67:
> If software modifies a non-leaf PTE, it should execute SFENCE.VMA with
> rs1=x0. If any PTE along the traversal path had its G bit set, rs2 must
> be x0; otherwise, rs2 should be set to the ASID for which the
> translation is being modified.
> If software modifies a leaf PTE, it should execute SFENCE.VMA with rs1
> set to a virtual address within the page. If any PTE along the traversal
> path had its G bit set, rs2 must be x0; otherwise, rs2 should be set to
> the ASID for which the translation is being modified.
>
> arch/riscv/include/asm/tlbflush.h:
> 1. implement get_current_asid to get current program asid.
> 2. implement local_flush_tlb_asid to flush tlb with asid.

As per my understanding, we don't need to explicitly invalidate the local TLB
in set_pte() or set_pte_at() because the generic Linux page table management
code (/mm/*) will call the appropriate flush_tlb_xyz() function after page
table updates. Also, just a local TLB flush is generally not sufficient because
a lot of page tables will be used across multiple HARTs.
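
In other words, the remote fence is expected to come from the arch TLB-flush
hooks that the generic mm code invokes, roughly along these lines (a
simplified sketch, not the literal arch/riscv/mm/tlbflush.c code):

void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
{
        struct cpumask hmask;

        /* Convert the mm's CPU mask into a hart mask */
        riscv_cpuid_to_hartid_mask(mm_cpumask(vma->vm_mm), &hmask);

        /*
         * Fence every hart that may have used this mm; a purely local
         * SFENCE.VMA in set_pte()/set_pte_at() would miss the other harts.
         */
        sbi_remote_sfence_vma(cpumask_bits(&hmask), addr, PAGE_SIZE);
}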

>
> Signed-off-by: Jiuyang Liu 
> ---
>  arch/riscv/include/asm/pgtable.h  | 27 +++
>  arch/riscv/include/asm/tlbflush.h | 12 
>  2 files changed, 39 insertions(+)
>
> diff --git a/arch/riscv/include/asm/pgtable.h 
> b/arch/riscv/include/asm/pgtable.h
> index ebf817c1bdf4..5a47c60372c1 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -222,6 +222,16 @@ static inline int pte_write(pte_t pte)
> return pte_val(pte) & _PAGE_WRITE;
>  }
>
> +static inline int pte_user(pte_t pte)
> +{
> +   return pte_val(pte) & _PAGE_USER;
> +}
> +
> +static inline int pte_global(pte_t pte)
> +{
> +   return pte_val(pte) & _PAGE_GLOBAL;
> +}
> +
>  static inline int pte_exec(pte_t pte)
>  {
> return pte_val(pte) & _PAGE_EXEC;
> @@ -248,6 +258,11 @@ static inline int pte_special(pte_t pte)
> return pte_val(pte) & _PAGE_SPECIAL;
>  }
>
> +static inline int pte_leaf(pte_t pte)
> +{
> +   return pte_val(pte) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC);
> +}
> +
>  /* static inline pte_t pte_rdprotect(pte_t pte) */
>
>  static inline pte_t pte_wrprotect(pte_t pte)
> @@ -358,6 +373,18 @@ static inline void set_pte_at(struct mm_struct *mm,
> flush_icache_pte(pteval);
>
> set_pte(ptep, pteval);
> +
> +   if (pte_present(pteval)) {
> +   if (pte_leaf(pteval)) {
> +   local_flush_tlb_page(addr);
> +   } else {
> +   if (pte_global(pteval))
> +   local_flush_tlb_all();
> +   else
> +   local_flush_tlb_asid();
> +
> +   }
> +   }
>  }
>
>  static inline void pte_clear(struct mm_struct *mm,
> diff --git a/arch/riscv/include/asm/tlbflush.h 
> b/arch/riscv/include/asm/tlbflush.h
> index 394cfbccdcd9..1f9b62b3670b 100644
> --- a/arch/riscv/include/asm/tlbflush.h
> +++ b/arch/riscv/include/asm/tlbflush.h
> @@ -21,6 +21,18 @@ static inline void local_flush_tlb_page(unsigned long addr)
>  {
> __asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory");
>  }
> +
> +static inline unsigned long get_current_asid(void)
> +{
> +   return (csr_read(CSR_SATP) >> SATP_ASID_SHIFT) & SATP_ASID_MASK;
> +}
> +
> +static inline void local_flush_tlb_asid(void)
> +{
> +   unsigned long asid = get_current_asid();
> +   __asm__ __volatile__ ("sfence.vma x0, %0" : : "r" (asid) : "memory");
> +}
> +
>  #else /* CONFIG_MMU */
>  #define local_flush_tlb_all()  do { } while (0)
>  #define local_flush_tlb_page(addr) do { } while (0)
> --
> 2.30.2
>
>
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Regards,
Anup


[PATCH v6 2/2] RISC-V: Use SBI SRST extension when available

2021-03-15 Thread Anup Patel
The SBI SRST extension provides a standard way to power off and
reboot the system irrespective of whether Linux RISC-V S-mode
is running natively (HS-mode) or inside a Guest/VM (VS-mode).

The SBI SRST extension is available in the latest SBI v0.3-draft
specification at: https://github.com/riscv/riscv-sbi-doc.

This patch extends the Linux RISC-V SBI implementation to detect
and use the SBI SRST extension.

Signed-off-by: Anup Patel 
Reviewed-by: Atish Patra 
---
 arch/riscv/include/asm/sbi.h | 24 
 arch/riscv/kernel/sbi.c  | 35 +++
 2 files changed, 59 insertions(+)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 99895d9c3bdd..79fa9f28b786 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -27,6 +27,7 @@ enum sbi_ext_id {
SBI_EXT_IPI = 0x735049,
SBI_EXT_RFENCE = 0x52464E43,
SBI_EXT_HSM = 0x48534D,
+   SBI_EXT_SRST = 0x53525354,
 };
 
 enum sbi_ext_base_fid {
@@ -70,6 +71,21 @@ enum sbi_hsm_hart_status {
SBI_HSM_HART_STATUS_STOP_PENDING,
 };
 
+enum sbi_ext_srst_fid {
+   SBI_EXT_SRST_RESET = 0,
+};
+
+enum sbi_srst_reset_type {
+   SBI_SRST_RESET_TYPE_SHUTDOWN = 0,
+   SBI_SRST_RESET_TYPE_COLD_REBOOT,
+   SBI_SRST_RESET_TYPE_WARM_REBOOT,
+};
+
+enum sbi_srst_reset_reason {
+   SBI_SRST_RESET_REASON_NONE = 0,
+   SBI_SRST_RESET_REASON_SYS_FAILURE,
+};
+
 #define SBI_SPEC_VERSION_DEFAULT   0x1
 #define SBI_SPEC_VERSION_MAJOR_SHIFT   24
 #define SBI_SPEC_VERSION_MAJOR_MASK0x7f
@@ -145,6 +161,14 @@ static inline unsigned long sbi_minor_version(void)
return sbi_spec_version & SBI_SPEC_VERSION_MINOR_MASK;
 }
 
+/* Make SBI version */
+static inline unsigned long sbi_mk_version(unsigned long major,
+   unsigned long minor)
+{
+   return ((major & SBI_SPEC_VERSION_MAJOR_MASK) <<
+   SBI_SPEC_VERSION_MAJOR_SHIFT) | minor;
+}
+
 int sbi_err_map_linux_errno(int err);
 #else /* CONFIG_RISCV_SBI */
 static inline int sbi_remote_fence_i(const unsigned long *hart_mask) { return 
-1; }
diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index c0dcebdd30ec..e94ea8053984 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -501,6 +502,32 @@ int sbi_remote_hfence_vvma_asid(const unsigned long 
*hart_mask,
 }
 EXPORT_SYMBOL(sbi_remote_hfence_vvma_asid);
 
+static void sbi_srst_reset(unsigned long type, unsigned long reason)
+{
+   sbi_ecall(SBI_EXT_SRST, SBI_EXT_SRST_RESET, type, reason,
+ 0, 0, 0, 0);
+   pr_warn("%s: type=0x%lx reason=0x%lx failed\n",
+   __func__, type, reason);
+}
+
+static int sbi_srst_reboot(struct notifier_block *this,
+  unsigned long mode, void *cmd)
+{
+   sbi_srst_reset((mode == REBOOT_WARM || mode == REBOOT_SOFT) ?
+  SBI_SRST_RESET_TYPE_WARM_REBOOT :
+  SBI_SRST_RESET_TYPE_COLD_REBOOT,
+  SBI_SRST_RESET_REASON_NONE);
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block sbi_srst_reboot_nb;
+
+static void sbi_srst_power_off(void)
+{
+   sbi_srst_reset(SBI_SRST_RESET_TYPE_SHUTDOWN,
+  SBI_SRST_RESET_REASON_NONE);
+}
+
 /**
  * sbi_probe_extension() - Check if an SBI extension ID is supported or not.
  * @extid: The extension ID to be probed.
@@ -593,6 +620,14 @@ void __init sbi_init(void)
} else {
__sbi_rfence= __sbi_rfence_v01;
}
+   if ((sbi_spec_version >= sbi_mk_version(0, 3)) &&
+   (sbi_probe_extension(SBI_EXT_SRST) > 0)) {
+   pr_info("SBI SRST extension detected\n");
+   pm_power_off = sbi_srst_power_off;
+   sbi_srst_reboot_nb.notifier_call = sbi_srst_reboot;
+   sbi_srst_reboot_nb.priority = 192;
+   register_restart_handler(&sbi_srst_reboot_nb);
+   }
} else {
__sbi_set_timer = __sbi_set_timer_v01;
__sbi_send_ipi  = __sbi_send_ipi_v01;
-- 
2.25.1



[PATCH v6 1/2] RISC-V: Don't print SBI version for all detected extensions

2021-03-15 Thread Anup Patel
The sbi_init() already prints SBI version before detecting
various SBI extensions so we don't need to print SBI version
for all detected SBI extensions.

Signed-off-by: Anup Patel 
---
 arch/riscv/kernel/sbi.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index f4a7db3d309e..c0dcebdd30ec 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -577,19 +577,19 @@ void __init sbi_init(void)
sbi_get_firmware_id(), sbi_get_firmware_version());
if (sbi_probe_extension(SBI_EXT_TIME) > 0) {
__sbi_set_timer = __sbi_set_timer_v02;
-   pr_info("SBI v0.2 TIME extension detected\n");
+   pr_info("SBI TIME extension detected\n");
} else {
__sbi_set_timer = __sbi_set_timer_v01;
}
if (sbi_probe_extension(SBI_EXT_IPI) > 0) {
__sbi_send_ipi  = __sbi_send_ipi_v02;
-   pr_info("SBI v0.2 IPI extension detected\n");
+   pr_info("SBI IPI extension detected\n");
} else {
__sbi_send_ipi  = __sbi_send_ipi_v01;
}
if (sbi_probe_extension(SBI_EXT_RFENCE) > 0) {
__sbi_rfence= __sbi_rfence_v02;
-   pr_info("SBI v0.2 RFENCE extension detected\n");
+   pr_info("SBI RFENCE extension detected\n");
} else {
__sbi_rfence= __sbi_rfence_v01;
}
-- 
2.25.1



[PATCH v6 0/2] SBI SRST extension support

2021-03-15 Thread Anup Patel
This series adds SBI SRST extension support to Linux RISC-V.

These patches can be found in riscv_sbi_srst_v6 branch at:
https://github.com/avpatel/linux

Changes since v5:
 - Factored-out pr_info() related change into separate patch
 - Added cover letter

Changes since v4:
 - We should compare both major and minor number to ensure that
   SBI spec version is 0.3 (or above) for detecting SRST extension.

Changes since v3:
 - Rebased on Linux-5.12-rc1
 - Check SBI spec version when probing for SRST extension

Changes since v2:
 - Rebased on Linux-5.10-rc5
 - Updated patch as-per SBI SRST extension available in the latest
   SBI v0.3-draft specification

Changes since v1:
 - Updated patch as-per latest SBI SRST extension draft spec where
   we have only one SBI call with "reset_type" parameter

Anup Patel (2):
  RISC-V: Don't print SBI version for all detected extensions
  RISC-V: Use SBI SRST extension when available

 arch/riscv/include/asm/sbi.h | 24 +
 arch/riscv/kernel/sbi.c  | 41 +---
 2 files changed, 62 insertions(+), 3 deletions(-)

-- 
2.25.1



[RFC PATCH v1 1/3] RISC-V: IPI provider should specify if we can use IPI for remote FENCE

2021-03-11 Thread Anup Patel
We extend riscv_set_ipi_ops() so that IPI providers (such as the SBI, the
CLINT driver, etc.) can specify whether their IPIs are suitable for doing
remote FENCEs (i.e. remote TLB shootdown).

The upcoming AIA specification allows IPI injection directly from S-mode
(or VS-mode) using the IMSIC controller, so the extended riscv_set_ipi_ops()
will be useful to the AIA IMSIC driver as well.
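
Usage-wise the change is small: each provider states at registration time
whether its IPIs may be used for remote fences. The SBI-based provider
passes false (see the diff below), while a direct-injection provider such
as a hypothetical AIA IMSIC driver (not part of this series) would register
roughly like this:

/* Hypothetical nested irqchip provider; callbacks filled in elsewhere */
static struct riscv_ipi_ops imsic_ipi_ops;

static void __init imsic_ipi_init(void)
{
        /*
         * IMSIC IPIs are injected directly from S-mode, so they are also
         * suitable for remote fences (second argument = true).
         */
        riscv_set_ipi_ops(&imsic_ipi_ops, true);
}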

Signed-off-by: Anup Patel 
---
 arch/riscv/include/asm/smp.h  | 13 +++--
 arch/riscv/kernel/sbi.c   |  2 +-
 arch/riscv/kernel/smp.c   | 10 +-
 arch/riscv/mm/cacheflush.c|  2 +-
 drivers/clocksource/timer-clint.c |  2 +-
 5 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index df1f7c4cd433..82c23e5f22f6 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -45,8 +45,11 @@ void arch_send_call_function_single_ipi(int cpu);
 int riscv_hartid_to_cpuid(int hartid);
 void riscv_cpuid_to_hartid_mask(const struct cpumask *in, struct cpumask *out);
 
+/* Check if we can use IPIs for remote FENCE */
+bool riscv_use_ipi_for_rfence(void);
+
 /* Set custom IPI operations */
-void riscv_set_ipi_ops(struct riscv_ipi_ops *ops);
+void riscv_set_ipi_ops(struct riscv_ipi_ops *ops, bool use_for_rfence);
 
 /* Clear IPI for current CPU */
 void riscv_clear_ipi(void);
@@ -92,7 +95,13 @@ static inline void riscv_cpuid_to_hartid_mask(const struct 
cpumask *in,
cpumask_set_cpu(boot_cpu_hartid, out);
 }
 
-static inline void riscv_set_ipi_ops(struct riscv_ipi_ops *ops)
+static inline bool riscv_use_ipi_for_rfence(void)
+{
+   return false;
+}
+
+static inline void riscv_set_ipi_ops(struct riscv_ipi_ops *ops,
+bool use_for_rfence)
 {
 }
 
diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index 49155588e56c..15a09680fdb6 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -634,5 +634,5 @@ void __init sbi_init(void)
__sbi_rfence= __sbi_rfence_v01;
}
 
-   riscv_set_ipi_ops(&sbi_ipi_ops);
+   riscv_set_ipi_ops(&sbi_ipi_ops, false);
 }
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index ea028d9e0d24..9258e3eaa8c6 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -85,11 +85,19 @@ static void ipi_stop(void)
wait_for_interrupt();
 }
 
+static bool ipi_for_rfence;
 static struct riscv_ipi_ops *ipi_ops;
 
-void riscv_set_ipi_ops(struct riscv_ipi_ops *ops)
+bool riscv_use_ipi_for_rfence(void)
+{
+   return ipi_for_rfence;
+}
+EXPORT_SYMBOL_GPL(riscv_use_ipi_for_rfence);
+
+void riscv_set_ipi_ops(struct riscv_ipi_ops *ops, bool use_for_rfence)
 {
ipi_ops = ops;
+   ipi_for_rfence = use_for_rfence;
 }
 EXPORT_SYMBOL_GPL(riscv_set_ipi_ops);
 
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 094118663285..0ffe7d560dc8 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -16,7 +16,7 @@ static void ipi_remote_fence_i(void *info)
 
 void flush_icache_all(void)
 {
-   if (IS_ENABLED(CONFIG_RISCV_SBI))
+   if (!riscv_use_ipi_for_rfence())
sbi_remote_fence_i(NULL);
else
on_each_cpu(ipi_remote_fence_i, NULL, 1);
diff --git a/drivers/clocksource/timer-clint.c 
b/drivers/clocksource/timer-clint.c
index 6cfe2ab73eb0..fe018a2c008f 100644
--- a/drivers/clocksource/timer-clint.c
+++ b/drivers/clocksource/timer-clint.c
@@ -228,7 +228,7 @@ static int __init clint_timer_init_dt(struct device_node 
*np)
goto fail_free_irq;
}
 
-   riscv_set_ipi_ops(&clint_ipi_ops);
+   riscv_set_ipi_ops(&clint_ipi_ops, true);
clint_clear_ipi();
 
return 0;
-- 
2.25.1



[RFC PATCH v1 3/3] RISC-V: Add handle_IPI_noregs() for irqchip drivers

2021-03-11 Thread Anup Patel
We will have IPIs handled through nested interrupt controllers (such as
the AIA IMSIC). The irqchip driver of such a nested interrupt controller
will not do irq_enter() and save pt_regs because this has already been
done by the irqchip driver of the parent interrupt controller.

This patch adds handle_IPI_noregs() for nested irqchip drivers.
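
For illustration only, a chained flow handler in such a nested irqchip
driver would call the new entry point roughly as follows; the imsic_*
name is hypothetical, and chained_irq_enter()/chained_irq_exit() are the
usual helpers for chained handlers:

#include <linux/irq.h>
#include <linux/irqchip/chained_irq.h>
#include <asm/smp.h>

/* Hypothetical chained handler in a nested irqchip driver. */
static void imsic_ipi_chained_handle_irq(struct irq_desc *desc)
{
	struct irq_chip *chip = irq_desc_get_chip(desc);

	chained_irq_enter(chip, desc);
	/* The parent irqchip driver already did irq_enter() and saved pt_regs. */
	handle_IPI_noregs();
	chained_irq_exit(chip, desc);
}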

Signed-off-by: Anup Patel 
---
 arch/riscv/include/asm/smp.h |  6 ++
 arch/riscv/kernel/smp.c  | 20 ++--
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index 82c23e5f22f6..b31d3ec2f71b 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -33,6 +33,12 @@ void show_ipi_stats(struct seq_file *p, int prec);
 /* SMP initialization hook for setup_arch */
 void __init setup_smp(void);
 
+/*
+ * Called from C code, this handles an IPI assuming irq_enter() and
+ * pt_regs already saved by caller.
+ */
+void handle_IPI_noregs(void);
+
 /* Called from C code, this handles an IPI. */
 void handle_IPI(struct pt_regs *regs);
 
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 9258e3eaa8c6..19e102e2d5e6 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -144,14 +144,11 @@ void arch_irq_work_raise(void)
 }
 #endif
 
-void handle_IPI(struct pt_regs *regs)
+void handle_IPI_noregs(void)
 {
-   struct pt_regs *old_regs = set_irq_regs(regs);
unsigned long *pending_ipis = &ipi_data[smp_processor_id()].bits;
unsigned long *stats = ipi_data[smp_processor_id()].stats;
 
-   irq_enter();
-
riscv_clear_ipi();
 
while (true) {
@@ -162,7 +159,7 @@ void handle_IPI(struct pt_regs *regs)
 
ops = xchg(pending_ipis, 0);
if (ops == 0)
-   goto done;
+   break;
 
if (ops & (1 << IPI_RESCHEDULE)) {
stats[IPI_RESCHEDULE]++;
@@ -189,9 +186,20 @@ void handle_IPI(struct pt_regs *regs)
/* Order data access and bit testing. */
mb();
}
+}
+
+void handle_IPI(struct pt_regs *regs)
+{
+   struct pt_regs *old_regs = set_irq_regs(regs);
+
+   irq_enter();
+
+   handle_IPI_noregs();
+
+   riscv_clear_ipi();
 
-done:
irq_exit();
+
set_irq_regs(old_regs);
 }
 
-- 
2.25.1



[RFC PATCH v1 2/3] RISC-V: Use IPIs for remote TLB flush when possible

2021-03-11 Thread Anup Patel
If IPIs are injected using SBI IPI calls, then a remote TLB flush using
SBI RFENCE calls is much faster, because doing the remote TLB flush via
IPIs would still end up as SBI IPI calls with extra processing on the
kernel side.

It is now possible to have specialized hardware (such as RISC-V AIA)
which allows S-mode software to directly inject IPIs without any
assistance from M-mode runtime firmware.

This patch extends remote TLB flush functions to use IPIs whenever
underlying IPI operations are suitable for remote FENCEs.

Signed-off-by: Anup Patel 
---
 arch/riscv/mm/tlbflush.c | 62 +++-
 1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 720b443c4528..009c56fa102d 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -1,39 +1,73 @@
 // SPDX-License-Identifier: GPL-2.0
+/*
+ * TLB flush implementation.
+ *
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
 
 #include 
 #include 
 #include 
 #include 
 
+static void ipi_flush_tlb_all(void *info)
+{
+   local_flush_tlb_all();
+}
+
 void flush_tlb_all(void)
 {
-   sbi_remote_sfence_vma(NULL, 0, -1);
+   if (!riscv_use_ipi_for_rfence())
+   sbi_remote_sfence_vma(NULL, 0, -1);
+   else
+   on_each_cpu(ipi_flush_tlb_all, NULL, 1);
+}
+
+struct flush_range_data {
+   unsigned long start;
+   unsigned long size;
+};
+
+static void ipi_flush_range(void *info)
+{
+   struct flush_range_data *data = info;
+
+   /* local cpu is the only cpu present in cpumask */
+   if (data->size <= PAGE_SIZE)
+   local_flush_tlb_page(data->start);
+   else
+   local_flush_tlb_all();
 }
 
 /*
- * This function must not be called with cmask being null.
+ * This function must not be called with NULL cpumask.
  * Kernel may panic if cmask is NULL.
  */
-static void __sbi_tlb_flush_range(struct cpumask *cmask, unsigned long start,
- unsigned long size)
+static void flush_range(struct cpumask *cmask, unsigned long start,
+   unsigned long size)
 {
+   struct flush_range_data info;
struct cpumask hmask;
unsigned int cpuid;
 
if (cpumask_empty(cmask))
return;
 
+   info.start = start;
+   info.size = size;
+
cpuid = get_cpu();
 
if (cpumask_any_but(cmask, cpuid) >= nr_cpu_ids) {
-   /* local cpu is the only cpu present in cpumask */
-   if (size <= PAGE_SIZE)
-   local_flush_tlb_page(start);
-   else
-   local_flush_tlb_all();
+   ipi_flush_range(&info);
} else {
-   riscv_cpuid_to_hartid_mask(cmask, &hmask);
-   sbi_remote_sfence_vma(cpumask_bits(&hmask), start, size);
+   if (!riscv_use_ipi_for_rfence()) {
+   riscv_cpuid_to_hartid_mask(cmask, &hmask);
+   sbi_remote_sfence_vma(cpumask_bits(&hmask),
+ start, size);
+   } else {
+   on_each_cpu_mask(cmask, ipi_flush_range, &info, 1);
+   }
+   }
}
 
put_cpu();
@@ -41,16 +75,16 @@ static void __sbi_tlb_flush_range(struct cpumask *cmask, 
unsigned long start,
 
 void flush_tlb_mm(struct mm_struct *mm)
 {
-   __sbi_tlb_flush_range(mm_cpumask(mm), 0, -1);
+   flush_range(mm_cpumask(mm), 0, -1);
 }
 
 void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
 {
-   __sbi_tlb_flush_range(mm_cpumask(vma->vm_mm), addr, PAGE_SIZE);
+   flush_range(mm_cpumask(vma->vm_mm), addr, PAGE_SIZE);
 }
 
 void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 unsigned long end)
 {
-   __sbi_tlb_flush_range(mm_cpumask(vma->vm_mm), start, end - start);
+   flush_range(mm_cpumask(vma->vm_mm), start, end - start);
 }
-- 
2.25.1



[RFC PATCH v1 0/3] IPI and remote TLB flush improvement

2021-03-11 Thread Anup Patel
This series primarily does three things:
1. Allows a RISC-V IPI provider to specify whether IPI operations are
   suitable for remote TLB flush (PATCH1)
2. Improves remote TLB flush to use IPIs whenever possible (PATCH2)
3. Allows irqchip drivers to handle IPIs from chained IRQ handlers (PATCH3)

This series is also preparatory work for the upcoming RISC-V advanced
interrupt architecture (AIA) support.

These patches can be found in riscv_ipi_imp_v1 branch at
https://github.com/avpatel/linux

Anup Patel (3):
  RISC-V: IPI provider should specify if we can use IPI for remote FENCE
  RISC-V: Use IPIs for remote TLB flush when possible
  RISC-V: Add handle_IPI_noregs() for irqchip drivers

 arch/riscv/include/asm/smp.h  | 19 +-
 arch/riscv/kernel/sbi.c   |  2 +-
 arch/riscv/kernel/smp.c   | 30 +++
 arch/riscv/mm/cacheflush.c|  2 +-
 arch/riscv/mm/tlbflush.c  | 62 ---
 drivers/clocksource/timer-clint.c |  2 +-
 6 files changed, 91 insertions(+), 26 deletions(-)

-- 
2.25.1



Re: [PATCH v4] RISC-V: Use SBI SRST extension when available

2021-03-09 Thread Anup Patel
On Wed, Mar 10, 2021 at 8:31 AM Palmer Dabbelt  wrote:
>
> On Mon, 01 Mar 2021 03:58:33 PST (-0800), Anup Patel wrote:
> > The SBI SRST extension provides a standard way to poweroff and
> > reboot the system irrespective to whether Linux RISC-V S-mode
> > is running natively (HS-mode) or inside Guest/VM (VS-mode).
> >
> > The SBI SRST extension is available in latest SBI v0.3-draft
> > specification at: https://github.com/riscv/riscv-sbi-doc.
> >
> > This patch extends Linux RISC-V SBI implementation to detect
> > and use SBI SRST extension.
> >
> > Signed-off-by: Anup Patel 
> > ---
> >  arch/riscv/include/asm/sbi.h | 16 ++
> >  arch/riscv/kernel/sbi.c  | 41 +---
> >  2 files changed, 54 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> > index 99895d9c3bdd..8add0209c9c7 100644
> > --- a/arch/riscv/include/asm/sbi.h
> > +++ b/arch/riscv/include/asm/sbi.h
> > @@ -27,6 +27,7 @@ enum sbi_ext_id {
> >   SBI_EXT_IPI = 0x735049,
> >   SBI_EXT_RFENCE = 0x52464E43,
> >   SBI_EXT_HSM = 0x48534D,
> > + SBI_EXT_SRST = 0x53525354,
> >  };
> >
> >  enum sbi_ext_base_fid {
> > @@ -70,6 +71,21 @@ enum sbi_hsm_hart_status {
> >   SBI_HSM_HART_STATUS_STOP_PENDING,
> >  };
> >
> > +enum sbi_ext_srst_fid {
> > + SBI_EXT_SRST_RESET = 0,
> > +};
> > +
> > +enum sbi_srst_reset_type {
> > + SBI_SRST_RESET_TYPE_SHUTDOWN = 0,
> > + SBI_SRST_RESET_TYPE_COLD_REBOOT,
> > + SBI_SRST_RESET_TYPE_WARM_REBOOT,
> > +};
> > +
> > +enum sbi_srst_reset_reason {
> > + SBI_SRST_RESET_REASON_NONE = 0,
> > + SBI_SRST_RESET_REASON_SYS_FAILURE,
> > +};
> > +
> >  #define SBI_SPEC_VERSION_DEFAULT 0x1
> >  #define SBI_SPEC_VERSION_MAJOR_SHIFT 24
> >  #define SBI_SPEC_VERSION_MAJOR_MASK  0x7f
> > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> > index f4a7db3d309e..49155588e56c 100644
> > --- a/arch/riscv/kernel/sbi.c
> > +++ b/arch/riscv/kernel/sbi.c
> > @@ -7,6 +7,7 @@
> >
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >
> > @@ -501,6 +502,32 @@ int sbi_remote_hfence_vvma_asid(const unsigned long 
> > *hart_mask,
> >  }
> >  EXPORT_SYMBOL(sbi_remote_hfence_vvma_asid);
> >
> > +static void sbi_srst_reset(unsigned long type, unsigned long reason)
> > +{
> > + sbi_ecall(SBI_EXT_SRST, SBI_EXT_SRST_RESET, type, reason,
> > +   0, 0, 0, 0);
> > + pr_warn("%s: type=0x%lx reason=0x%lx failed\n",
> > + __func__, type, reason);
> > +}
> > +
> > +static int sbi_srst_reboot(struct notifier_block *this,
> > +unsigned long mode, void *cmd)
> > +{
> > + sbi_srst_reset((mode == REBOOT_WARM || mode == REBOOT_SOFT) ?
> > +SBI_SRST_RESET_TYPE_WARM_REBOOT :
> > +SBI_SRST_RESET_TYPE_COLD_REBOOT,
> > +SBI_SRST_RESET_REASON_NONE);
> > + return NOTIFY_DONE;
> > +}
> > +
> > +static struct notifier_block sbi_srst_reboot_nb;
> > +
> > +static void sbi_srst_power_off(void)
> > +{
> > + sbi_srst_reset(SBI_SRST_RESET_TYPE_SHUTDOWN,
> > +SBI_SRST_RESET_REASON_NONE);
> > +}
> > +
> >  /**
> >   * sbi_probe_extension() - Check if an SBI extension ID is supported or 
> > not.
> >   * @extid: The extension ID to be probed.
> > @@ -577,22 +604,30 @@ void __init sbi_init(void)
> >   sbi_get_firmware_id(), sbi_get_firmware_version());
> >   if (sbi_probe_extension(SBI_EXT_TIME) > 0) {
> >   __sbi_set_timer = __sbi_set_timer_v02;
> > - pr_info("SBI v0.2 TIME extension detected\n");
> > + pr_info("SBI TIME extension detected\n");
> >   } else {
> >   __sbi_set_timer = __sbi_set_timer_v01;
> >   }
> >   if (sbi_probe_extension(SBI_EXT_IPI) > 0) {
> >   __sbi_send_ipi  = __sbi_send_ipi_v02;
> > - pr_info("SBI v0.2 IPI extension detected\n");
> > + pr_info("SBI IPI extension detected\n");
>
> These aren't really part of the reset stuff and should be split into a
> separate patch.


Re: [PATCH 1/1] RISC-V: correct enum sbi_ext_rfence_fid

2021-03-08 Thread Anup Patel
On Mon, Mar 8, 2021 at 1:19 PM Atish Patra  wrote:
>
> On Sat, Mar 6, 2021 at 4:12 AM Anup Patel  wrote:
> >
> > On Sat, Mar 6, 2021 at 11:19 AM Heinrich Schuchardt  
> > wrote:
> > >
> > > The constants in enum sbi_ext_rfence_fid should match the SBI
> > > specification. See
> > > https://github.com/riscv/riscv-sbi-doc/blob/master/riscv-sbi.adoc#78-function-listing
> > >
> > > | Function Name   | FID | EID
> > > | sbi_remote_fence_i  |   0 | 0x52464E43
> > > | sbi_remote_sfence_vma   |   1 | 0x52464E43
> > > | sbi_remote_sfence_vma_asid  |   2 | 0x52464E43
> > > | sbi_remote_hfence_gvma_vmid |   3 | 0x52464E43
> > > | sbi_remote_hfence_gvma  |   4 | 0x52464E43
> > > | sbi_remote_hfence_vvma_asid |   5 | 0x52464E43
> > > | sbi_remote_hfence_vvma  |   6 | 0x52464E43
> > >
> > > Fixes: ecbacc2a3efd ("RISC-V: Add SBI v0.2 extension definitions")
> > > Reported-by: Sean Anderson 
> > > Signed-off-by: Heinrich Schuchardt 
> >
> > Good catch.
> >
> > I guess we never saw any issues because these calls are only used by
> > KVM RISC-V which is not merged yet. Further for KVM RISC-V, the HFENCE
> > instruction is emulated as flush everything on FPGA, QEMU, and Spike so
> > we did not notice any issue with KVM RISC-V too.
> >
>
> OpenSBI & Xvisor also define the same order as Linux kernel. The
> existing order(in Linux kernel)
> makes more sense w.r.to Lexicographic order as well.
>
> Should we just fix the spec instead ?

I would not recommend that because RFENCE is part of the released SBI v0.2 spec.

We have to be more careful in software to follow the spec correctly.

Regards,
Anup

>
> > Looks good to me.
> >
> > Reviewed-by: Anup Patel 
> >
> > Regards,
> > Anup
> >
> > > ---
> > >  arch/riscv/include/asm/sbi.h | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> > > index 99895d9c3bdd..d7027411dde8 100644
> > > --- a/arch/riscv/include/asm/sbi.h
> > > +++ b/arch/riscv/include/asm/sbi.h
> > > @@ -51,10 +51,10 @@ enum sbi_ext_rfence_fid {
> > > SBI_EXT_RFENCE_REMOTE_FENCE_I = 0,
> > > SBI_EXT_RFENCE_REMOTE_SFENCE_VMA,
> > > SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID,
> > > -   SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA,
> > > SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID,
> > > -   SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA,
> > > +   SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA,
> > > SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID,
> > > +   SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA,
> > >  };
> > >
> > >  enum sbi_ext_hsm_fid {
> > > --
> > > 2.30.1
> > >
> > >
> > > ___
> > > linux-riscv mailing list
> > > linux-ri...@lists.infradead.org
> > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
> > ___
> > linux-riscv mailing list
> > linux-ri...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>
>
>
> --
> Regards,
> Atish


Re: [RFC PATCH 7/8] dt-bindings: Add bindings documentation for RISC-V idle states

2021-03-07 Thread Anup Patel
On Sat, Mar 6, 2021 at 4:52 AM Rob Herring  wrote:
>
> On Sun, Feb 21, 2021 at 03:07:57PM +0530, Anup Patel wrote:
> > The RISC-V CPU idle states will be described in DT under the
> > /cpus/riscv-idle-states DT node. This patch adds the bindings
> > documentation for riscv-idle-states DT nodes and idle state DT
> > nodes under it.
> >
> > Signed-off-by: Anup Patel 
> > ---
> >  .../bindings/riscv/idle-states.yaml   | 250 ++
> >  1 file changed, 250 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/riscv/idle-states.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/riscv/idle-states.yaml 
> > b/Documentation/devicetree/bindings/riscv/idle-states.yaml
> > new file mode 100644
> > index ..3eff763fed23
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/riscv/idle-states.yaml
> > @@ -0,0 +1,250 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/riscv/idle-states.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: RISC-V idle states binding description
> > +
> > +maintainers:
> > +  - Anup Patel 
> > +
> > +description: |+
> > +  RISC-V systems can manage power consumption dynamically, where HARTs
> > +  (or CPUs) [1] can be put in different platform specific suspend (or
> > +  idle) states (ranging from simple WFI, power gating, etc). The RISC-V
> > +  SBI [2] hart state management extension provides a standard mechanism
> > +  for OSes to request HART state transitions.
> > +
> > +  The platform specific suspend (or idle) states of a hart can be either
> > +  retentive or non-rententive in nature. A retentive suspend state will
> > +  preserve hart register and CSR values for all privilege modes whereas
> > +  a non-retentive suspend state will not preserve hart register and CSR
> > +  values. The suspend (or idle) state entered by executing the WFI
> > +  instruction is considered standard on all RISC-V systems and therefore
> > +  must not be listed in device tree.
> > +
> > +  The device tree binding definition for RISC-V idle states described
> > +  in this document is quite similar to the ARM idle states [3].
> > +
> > +  References
> > +
> > +  [1] RISC-V Linux Kernel documentation - CPUs bindings
> > +  Documentation/devicetree/bindings/riscv/cpus.yaml
> > +
> > +  [2] RISC-V Supervisor Binary Interface (SBI)
> > +  http://github.com/riscv/riscv-sbi-doc/riscv-sbi.adoc
> > +
> > +  [3] ARM idle states binding description - Idle states bindings
> > +  Documentation/devicetree/bindings/arm/idle-states.yaml
>
> I'd assume there's common parts we can share.

Yes, except for a few properties most of them are the same.

We can have shared DT bindings for both ARM and RISC-V, but both
architectures will always have some architecture-specific details
(or properties) which need to be documented under the arch-specific
DT documentation. Is it okay if this is done as a separate series?

>
> > +
> > +properties:
> > +  $nodename:
> > +const: riscv-idle-states
>
> Just 'idle-states' like Arm.

I had tried the "idle-states" node name but the DT bindings check
complains about a conflict with the ARM idle state bindings.

>
> > +
> > +patternProperties:
> > +  "^(cpu|cluster)-":
> > +type: object
> > +description: |
> > +  Each state node represents an idle state description and must be
> > +  defined as follows.
> > +
>
>additionalProperties: false

okay, will update.

>
> > +properties:
> > +  compatible:
> > +const: riscv,idle-state
> > +
> > +  local-timer-stop:
> > +description:
> > +  If present the CPU local timer control logic is lost on state
> > +  entry, otherwise it is retained.
> > +type: boolean
> > +
> > +  entry-latency-us:
> > +description:
> > +  Worst case latency in microseconds required to enter the idle 
> > state.
> > +
> > +  exit-latency-us:
> > +description:
> > +  Worst case latency in microseconds required to exit the idle 
> > state.
> > +  The exit-latency-us duration may be guaranteed only after
> > +  entry-latency-us has passed.
> > +
> > +  min-residency-us:
> > +description:
> > +  Minimum residency duration in microseconds, inclusive of 
> > preparation

Re: [PATCH 2/2] riscv: Enable generic clockevent broadcast

2021-03-06 Thread Anup Patel
On Sun, Mar 7, 2021 at 7:55 AM  wrote:
>
> From: Guo Ren 
>
> When percpu-timers are stopped by deep power saving mode, we
> need system timer help to broadcast IPI_TIMER.
>
> This is first introduced by broken x86 hardware, where the local apic
> timer stops in C3 state. But many other architectures(powerpc, mips,
> arm, hexagon, openrisc, sh) have supported the infrastructure to
> deal with Power Management issues.
>
> Signed-off-by: Guo Ren 
> Cc: Arnd Bergmann 
> Cc: Thomas Gleixner 
> Cc: Daniel Lezcano 
> Cc: Anup Patel 
> Cc: Atish Patra 
> Cc: Palmer Dabbelt 
> Cc: Greentime Hu 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/Kconfig  |  2 ++
>  arch/riscv/kernel/smp.c | 16 
>  2 files changed, 18 insertions(+)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 85d626b8ce5e..8637e7344abe 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -28,6 +28,7 @@ config RISCV
> select ARCH_HAS_SET_DIRECT_MAP
> select ARCH_HAS_SET_MEMORY
> select ARCH_HAS_STRICT_KERNEL_RWX if MMU
> +   select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
> select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
> select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
> select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
> @@ -39,6 +40,7 @@ config RISCV
> select EDAC_SUPPORT
> select GENERIC_ARCH_TOPOLOGY if SMP
> select GENERIC_ATOMIC64 if !64BIT
> +   select GENERIC_CLOCKEVENTS_BROADCAST if SMP
> select GENERIC_EARLY_IOREMAP
> select GENERIC_GETTIMEOFDAY if HAVE_GENERIC_VDSO
> select GENERIC_IOREMAP
> diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
> index ea028d9e0d24..8325d33411d8 100644
> --- a/arch/riscv/kernel/smp.c
> +++ b/arch/riscv/kernel/smp.c
> @@ -9,6 +9,7 @@
>   */
>
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -27,6 +28,7 @@ enum ipi_message_type {
> IPI_CALL_FUNC,
> IPI_CPU_STOP,
> IPI_IRQ_WORK,
> +   IPI_TIMER,
> IPI_MAX
>  };
>
> @@ -176,6 +178,12 @@ void handle_IPI(struct pt_regs *regs)
> irq_work_run();
> }
>
> +#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
> +   if (ops & (1 << IPI_TIMER)) {
> +   stats[IPI_TIMER]++;
> +   tick_receive_broadcast();
> +   }
> +#endif
> BUG_ON((ops >> IPI_MAX) != 0);
>
> /* Order data access and bit testing. */
> @@ -192,6 +200,7 @@ static const char * const ipi_names[] = {
> [IPI_CALL_FUNC] = "Function call interrupts",
> [IPI_CPU_STOP]  = "CPU stop interrupts",
> [IPI_IRQ_WORK]  = "IRQ work interrupts",
> +   [IPI_TIMER] = "Timer broadcast interrupts",
>  };
>
>  void show_ipi_stats(struct seq_file *p, int prec)
> @@ -217,6 +226,13 @@ void arch_send_call_function_single_ipi(int cpu)
> send_ipi_single(cpu, IPI_CALL_FUNC);
>  }
>
> +#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
> +void tick_broadcast(const struct cpumask *mask)
> +{
> +   send_ipi_mask(mask, IPI_TIMER);
> +}
> +#endif
> +
>  void smp_send_stop(void)
>  {
> unsigned long timeout;
> --
> 2.25.1
>
>
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv


Re: [PATCH 1/1] RISC-V: correct enum sbi_ext_rfence_fid

2021-03-06 Thread Anup Patel
On Sat, Mar 6, 2021 at 11:19 AM Heinrich Schuchardt  wrote:
>
> The constants in enum sbi_ext_rfence_fid should match the SBI
> specification. See
> https://github.com/riscv/riscv-sbi-doc/blob/master/riscv-sbi.adoc#78-function-listing
>
> | Function Name   | FID | EID
> | sbi_remote_fence_i  |   0 | 0x52464E43
> | sbi_remote_sfence_vma   |   1 | 0x52464E43
> | sbi_remote_sfence_vma_asid  |   2 | 0x52464E43
> | sbi_remote_hfence_gvma_vmid |   3 | 0x52464E43
> | sbi_remote_hfence_gvma  |   4 | 0x52464E43
> | sbi_remote_hfence_vvma_asid |   5 | 0x52464E43
> | sbi_remote_hfence_vvma  |   6 | 0x52464E43
>
> Fixes: ecbacc2a3efd ("RISC-V: Add SBI v0.2 extension definitions")
> Reported-by: Sean Anderson 
> Signed-off-by: Heinrich Schuchardt 

Good catch.

I guess we never saw any issues because these calls are only used by
KVM RISC-V, which is not merged yet. Further, for KVM RISC-V the HFENCE
instruction is emulated as a flush-everything operation on FPGA, QEMU,
and Spike, so we did not notice any issue with KVM RISC-V either.
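
Since the FID values come from the enumerator positions, the effect of the
fix is easier to see with the values spelled out; this is just the corrected
enum from the patch, annotated against the spec table quoted above:

enum sbi_ext_rfence_fid {
	SBI_EXT_RFENCE_REMOTE_FENCE_I = 0,		/* FID 0 */
	SBI_EXT_RFENCE_REMOTE_SFENCE_VMA,		/* FID 1 */
	SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID,		/* FID 2 */
	SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID,		/* FID 3 */
	SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA,		/* FID 4 */
	SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID,		/* FID 5 */
	SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA,		/* FID 6 */
};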

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/include/asm/sbi.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index 99895d9c3bdd..d7027411dde8 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -51,10 +51,10 @@ enum sbi_ext_rfence_fid {
> SBI_EXT_RFENCE_REMOTE_FENCE_I = 0,
> SBI_EXT_RFENCE_REMOTE_SFENCE_VMA,
> SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID,
> -   SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA,
> SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID,
> -   SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA,
> +   SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA,
> SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID,
> +   SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA,
>  };
>
>  enum sbi_ext_hsm_fid {
> --
> 2.30.1
>
>
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv


[PATCH v5] RISC-V: Use SBI SRST extension when available

2021-03-05 Thread Anup Patel
The SBI SRST extension provides a standard way to power off and
reboot the system irrespective of whether Linux RISC-V S-mode
is running natively (HS-mode) or inside a Guest/VM (VS-mode).

The SBI SRST extension is available in the latest SBI v0.3-draft
specification at: https://github.com/riscv/riscv-sbi-doc.

This patch extends Linux RISC-V SBI implementation to detect
and use SBI SRST extension.

Signed-off-by: Anup Patel 
Reviewed-by: Atish Patra 
---
Changes since v4:
 - We should compare both the major and minor numbers to ensure that
   the SBI spec version is 0.3 (or above) when detecting the SRST extension.
Changes since v3:
 - Rebased on Linux-5.12-rc1
 - Check SBI spec version when probing for SRST extension
Changes since v2:
 - Rebased on Linux-5.10-rc5
 - Updated patch as-per SBI SRST extension available in the latest
   SBI v0.3-draft specification
Changes since v1:
 - Updated patch as-per latest SBI SRST extension draft spec where
   we have only one SBI call with "reset_type" parameter
---
 arch/riscv/include/asm/sbi.h | 24 +
 arch/riscv/kernel/sbi.c  | 41 +---
 2 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 99895d9c3bdd..79fa9f28b786 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -27,6 +27,7 @@ enum sbi_ext_id {
SBI_EXT_IPI = 0x735049,
SBI_EXT_RFENCE = 0x52464E43,
SBI_EXT_HSM = 0x48534D,
+   SBI_EXT_SRST = 0x53525354,
 };
 
 enum sbi_ext_base_fid {
@@ -70,6 +71,21 @@ enum sbi_hsm_hart_status {
SBI_HSM_HART_STATUS_STOP_PENDING,
 };
 
+enum sbi_ext_srst_fid {
+   SBI_EXT_SRST_RESET = 0,
+};
+
+enum sbi_srst_reset_type {
+   SBI_SRST_RESET_TYPE_SHUTDOWN = 0,
+   SBI_SRST_RESET_TYPE_COLD_REBOOT,
+   SBI_SRST_RESET_TYPE_WARM_REBOOT,
+};
+
+enum sbi_srst_reset_reason {
+   SBI_SRST_RESET_REASON_NONE = 0,
+   SBI_SRST_RESET_REASON_SYS_FAILURE,
+};
+
 #define SBI_SPEC_VERSION_DEFAULT   0x1
 #define SBI_SPEC_VERSION_MAJOR_SHIFT   24
 #define SBI_SPEC_VERSION_MAJOR_MASK0x7f
@@ -145,6 +161,14 @@ static inline unsigned long sbi_minor_version(void)
return sbi_spec_version & SBI_SPEC_VERSION_MINOR_MASK;
 }
 
+/* Make SBI version */
+static inline unsigned long sbi_mk_version(unsigned long major,
+   unsigned long minor)
+{
+   return ((major & SBI_SPEC_VERSION_MAJOR_MASK) <<
+   SBI_SPEC_VERSION_MAJOR_SHIFT) | minor;
+}
+
 int sbi_err_map_linux_errno(int err);
 #else /* CONFIG_RISCV_SBI */
 static inline int sbi_remote_fence_i(const unsigned long *hart_mask) { return 
-1; }
diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index f4a7db3d309e..e94ea8053984 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -501,6 +502,32 @@ int sbi_remote_hfence_vvma_asid(const unsigned long 
*hart_mask,
 }
 EXPORT_SYMBOL(sbi_remote_hfence_vvma_asid);
 
+static void sbi_srst_reset(unsigned long type, unsigned long reason)
+{
+   sbi_ecall(SBI_EXT_SRST, SBI_EXT_SRST_RESET, type, reason,
+ 0, 0, 0, 0);
+   pr_warn("%s: type=0x%lx reason=0x%lx failed\n",
+   __func__, type, reason);
+}
+
+static int sbi_srst_reboot(struct notifier_block *this,
+  unsigned long mode, void *cmd)
+{
+   sbi_srst_reset((mode == REBOOT_WARM || mode == REBOOT_SOFT) ?
+  SBI_SRST_RESET_TYPE_WARM_REBOOT :
+  SBI_SRST_RESET_TYPE_COLD_REBOOT,
+  SBI_SRST_RESET_REASON_NONE);
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block sbi_srst_reboot_nb;
+
+static void sbi_srst_power_off(void)
+{
+   sbi_srst_reset(SBI_SRST_RESET_TYPE_SHUTDOWN,
+  SBI_SRST_RESET_REASON_NONE);
+}
+
 /**
  * sbi_probe_extension() - Check if an SBI extension ID is supported or not.
  * @extid: The extension ID to be probed.
@@ -577,22 +604,30 @@ void __init sbi_init(void)
sbi_get_firmware_id(), sbi_get_firmware_version());
if (sbi_probe_extension(SBI_EXT_TIME) > 0) {
__sbi_set_timer = __sbi_set_timer_v02;
-   pr_info("SBI v0.2 TIME extension detected\n");
+   pr_info("SBI TIME extension detected\n");
} else {
__sbi_set_timer = __sbi_set_timer_v01;
}
if (sbi_probe_extension(SBI_EXT_IPI) > 0) {
__sbi_send_ipi  = __sbi_send_ipi_v02;
-   pr_info("SBI v0.2 IPI extension detected\n");
+   pr_info("SBI IPI extension detected\n");
} else {
__sbi_send_ipi  = __sbi_send_ipi_v0
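
The diff is truncated at this point in the archive. Based on the v5
changelog (the major and minor version check), the SRST probe hunk
presumably gates on the full spec version using the new sbi_mk_version()
helper, roughly along these lines (a sketch, not the literal missing hunk):

	if (sbi_spec_version >= sbi_mk_version(0, 3) &&
	    sbi_probe_extension(SBI_EXT_SRST) > 0) {
		pr_info("SBI SRST extension detected\n");
		pm_power_off = sbi_srst_power_off;
		sbi_srst_reboot_nb.notifier_call = sbi_srst_reboot;
		sbi_srst_reboot_nb.priority = 192;
		register_restart_handler(&sbi_srst_reboot_nb);
	}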

Re: [RFC PATCH 1/8] RISC-V: Enable CPU_IDLE drivers

2021-03-02 Thread Anup Patel
On Fri, Feb 26, 2021 at 6:46 PM Alex Ghiti  wrote:
>
> Hi Anup,
>
> Le 2/21/21 à 4:37 AM, Anup Patel a écrit :
> > We force select CPU_PM and provide asm/cpuidle.h so that we can
> > use CPU IDLE drivers for Linux RISC-V kernel.
> >
> > Signed-off-by: Anup Patel 
> > ---
> >   arch/riscv/Kconfig|  7 +++
> >   arch/riscv/configs/defconfig  |  7 +++
> >   arch/riscv/configs/rv32_defconfig |  4 ++--
> >   arch/riscv/include/asm/cpuidle.h  | 24 
> >   arch/riscv/kernel/process.c   |  3 ++-
> >   5 files changed, 38 insertions(+), 7 deletions(-)
> >   create mode 100644 arch/riscv/include/asm/cpuidle.h
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index fe6862b06ead..4901200b6b6c 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -37,6 +37,7 @@ config RISCV
> >   select CLONE_BACKWARDS
> >   select CLINT_TIMER if !MMU
> >   select COMMON_CLK
> > + select CPU_PM if CPU_IDLE
> >   select EDAC_SUPPORT
> >   select GENERIC_ARCH_TOPOLOGY if SMP
> >   select GENERIC_ATOMIC64 if !64BIT
> > @@ -430,4 +431,10 @@ source "kernel/power/Kconfig"
> >
> >   endmenu
> >
> > +menu "CPU Power Management"
> > +
> > +source "drivers/cpuidle/Kconfig"
> > +
> > +endmenu
> > +
> >   source "drivers/firmware/Kconfig"
> > diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
> > index 6c0625aa96c7..dc4927c0e44b 100644
> > --- a/arch/riscv/configs/defconfig
> > +++ b/arch/riscv/configs/defconfig
> > @@ -13,11 +13,13 @@ CONFIG_USER_NS=y
> >   CONFIG_CHECKPOINT_RESTORE=y
> >   CONFIG_BLK_DEV_INITRD=y
> >   CONFIG_EXPERT=y
> > +# CONFIG_SYSFS_SYSCALL is not set
> >   CONFIG_BPF_SYSCALL=y
> >   CONFIG_SOC_SIFIVE=y
> >   CONFIG_SOC_VIRT=y
> >   CONFIG_SMP=y
> >   CONFIG_HOTPLUG_CPU=y
> > +CONFIG_CPU_IDLE=y
> >   CONFIG_JUMP_LABEL=y
> >   CONFIG_MODULES=y
> >   CONFIG_MODULE_UNLOAD=y
> > @@ -65,10 +67,9 @@ CONFIG_HW_RANDOM=y
> >   CONFIG_HW_RANDOM_VIRTIO=y
> >   CONFIG_SPI=y
> >   CONFIG_SPI_SIFIVE=y
> > +# CONFIG_PTP_1588_CLOCK is not set
> >   CONFIG_GPIOLIB=y
> >   CONFIG_GPIO_SIFIVE=y
> > -# CONFIG_PTP_1588_CLOCK is not set
> > -CONFIG_POWER_RESET=y
>
> Why do you remove this config ?

This option is selected by CONFIG_SOC_VIRT so it is being
removed from defconfig by savedefconfig.

>
> >   CONFIG_DRM=y
> >   CONFIG_DRM_RADEON=y
> >   CONFIG_DRM_VIRTIO_GPU=y
> > @@ -132,5 +133,3 @@ CONFIG_DEBUG_BLOCK_EXT_DEVT=y
> >   # CONFIG_FTRACE is not set
> >   # CONFIG_RUNTIME_TESTING_MENU is not set
> >   CONFIG_MEMTEST=y
> > -# CONFIG_SYSFS_SYSCALL is not set
> > -CONFIG_EFI=y
>
> And this is one too ? If those removals are intentional, maybe you can
> add something about that in the commit description ?

This is enabled by default so savedefconfig removes it.

>
> > diff --git a/arch/riscv/configs/rv32_defconfig 
> > b/arch/riscv/configs/rv32_defconfig
> > index 8dd02b842fef..332e43a4a2c3 100644
> > --- a/arch/riscv/configs/rv32_defconfig
> > +++ b/arch/riscv/configs/rv32_defconfig
> > @@ -13,12 +13,14 @@ CONFIG_USER_NS=y
> >   CONFIG_CHECKPOINT_RESTORE=y
> >   CONFIG_BLK_DEV_INITRD=y
> >   CONFIG_EXPERT=y
> > +# CONFIG_SYSFS_SYSCALL is not set
> >   CONFIG_BPF_SYSCALL=y
> >   CONFIG_SOC_SIFIVE=y
> >   CONFIG_SOC_VIRT=y
> >   CONFIG_ARCH_RV32I=y
> >   CONFIG_SMP=y
> >   CONFIG_HOTPLUG_CPU=y
> > +CONFIG_CPU_IDLE=y
> >   CONFIG_JUMP_LABEL=y
> >   CONFIG_MODULES=y
> >   CONFIG_MODULE_UNLOAD=y
> > @@ -67,7 +69,6 @@ CONFIG_HW_RANDOM_VIRTIO=y
> >   CONFIG_SPI=y
> >   CONFIG_SPI_SIFIVE=y
> >   # CONFIG_PTP_1588_CLOCK is not set
> > -CONFIG_POWER_RESET=y
> >   CONFIG_DRM=y
> >   CONFIG_DRM_RADEON=y
> >   CONFIG_DRM_VIRTIO_GPU=y
> > @@ -131,4 +132,3 @@ CONFIG_DEBUG_BLOCK_EXT_DEVT=y
> >   # CONFIG_FTRACE is not set
> >   # CONFIG_RUNTIME_TESTING_MENU is not set
> >   CONFIG_MEMTEST=y
> > -# CONFIG_SYSFS_SYSCALL is not set
> > diff --git a/arch/riscv/include/asm/cpuidle.h 
> > b/arch/riscv/include/asm/cpuidle.h
> > new file mode 100644
> > index ..1042d790e446
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/cpuidle.h
> > @@ -0,0 +1,24 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2021 Allwinner Ltd
> > + * Copyri

Re: [PATCH v4] RISC-V: Use SBI SRST extension when available

2021-03-02 Thread Anup Patel
On Mon, Mar 1, 2021 at 5:29 PM Anup Patel  wrote:
>
> The SBI SRST extension provides a standard way to poweroff and
> reboot the system irrespective to whether Linux RISC-V S-mode
> is running natively (HS-mode) or inside Guest/VM (VS-mode).
>
> The SBI SRST extension is available in latest SBI v0.3-draft
> specification at: https://github.com/riscv/riscv-sbi-doc.
>
> This patch extends Linux RISC-V SBI implementation to detect
> and use SBI SRST extension.
>
> Signed-off-by: Anup Patel 
> ---

I missed adding changelog here so here it is ...

Changes since v3:
 - Rebased on Linux-5.12-rc1
 - Check SBI spec version when probing for SRST extension
Changes since v2:
 - Rebased on Linux-5.10-rc5
 - Updated patch as-per SBI SRST extension available in the latest
   SBI v0.3-draft specification
Changes since v1:
 - Updated patch as-per latest SBI SRST extension draft spec where
   we have only one SBI call with "reset_type" parameter

Regards,
Anup

>  arch/riscv/include/asm/sbi.h | 16 ++
>  arch/riscv/kernel/sbi.c  | 41 +---
>  2 files changed, 54 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index 99895d9c3bdd..8add0209c9c7 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -27,6 +27,7 @@ enum sbi_ext_id {
> SBI_EXT_IPI = 0x735049,
> SBI_EXT_RFENCE = 0x52464E43,
> SBI_EXT_HSM = 0x48534D,
> +   SBI_EXT_SRST = 0x53525354,
>  };
>
>  enum sbi_ext_base_fid {
> @@ -70,6 +71,21 @@ enum sbi_hsm_hart_status {
> SBI_HSM_HART_STATUS_STOP_PENDING,
>  };
>
> +enum sbi_ext_srst_fid {
> +   SBI_EXT_SRST_RESET = 0,
> +};
> +
> +enum sbi_srst_reset_type {
> +   SBI_SRST_RESET_TYPE_SHUTDOWN = 0,
> +   SBI_SRST_RESET_TYPE_COLD_REBOOT,
> +   SBI_SRST_RESET_TYPE_WARM_REBOOT,
> +};
> +
> +enum sbi_srst_reset_reason {
> +   SBI_SRST_RESET_REASON_NONE = 0,
> +   SBI_SRST_RESET_REASON_SYS_FAILURE,
> +};
> +
>  #define SBI_SPEC_VERSION_DEFAULT   0x1
>  #define SBI_SPEC_VERSION_MAJOR_SHIFT   24
>  #define SBI_SPEC_VERSION_MAJOR_MASK0x7f
> diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
> index f4a7db3d309e..49155588e56c 100644
> --- a/arch/riscv/kernel/sbi.c
> +++ b/arch/riscv/kernel/sbi.c
> @@ -7,6 +7,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>
> @@ -501,6 +502,32 @@ int sbi_remote_hfence_vvma_asid(const unsigned long 
> *hart_mask,
>  }
>  EXPORT_SYMBOL(sbi_remote_hfence_vvma_asid);
>
> +static void sbi_srst_reset(unsigned long type, unsigned long reason)
> +{
> +   sbi_ecall(SBI_EXT_SRST, SBI_EXT_SRST_RESET, type, reason,
> + 0, 0, 0, 0);
> +   pr_warn("%s: type=0x%lx reason=0x%lx failed\n",
> +   __func__, type, reason);
> +}
> +
> +static int sbi_srst_reboot(struct notifier_block *this,
> +  unsigned long mode, void *cmd)
> +{
> +   sbi_srst_reset((mode == REBOOT_WARM || mode == REBOOT_SOFT) ?
> +  SBI_SRST_RESET_TYPE_WARM_REBOOT :
> +  SBI_SRST_RESET_TYPE_COLD_REBOOT,
> +  SBI_SRST_RESET_REASON_NONE);
> +   return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block sbi_srst_reboot_nb;
> +
> +static void sbi_srst_power_off(void)
> +{
> +   sbi_srst_reset(SBI_SRST_RESET_TYPE_SHUTDOWN,
> +  SBI_SRST_RESET_REASON_NONE);
> +}
> +
>  /**
>   * sbi_probe_extension() - Check if an SBI extension ID is supported or not.
>   * @extid: The extension ID to be probed.
> @@ -577,22 +604,30 @@ void __init sbi_init(void)
> sbi_get_firmware_id(), sbi_get_firmware_version());
> if (sbi_probe_extension(SBI_EXT_TIME) > 0) {
> __sbi_set_timer = __sbi_set_timer_v02;
> -   pr_info("SBI v0.2 TIME extension detected\n");
> +   pr_info("SBI TIME extension detected\n");
> } else {
> __sbi_set_timer = __sbi_set_timer_v01;
> }
> if (sbi_probe_extension(SBI_EXT_IPI) > 0) {
> __sbi_send_ipi  = __sbi_send_ipi_v02;
> -   pr_info("SBI v0.2 IPI extension detected\n");
> +   pr_info("SBI IPI extension detected\n");
> } else {
> __sbi_send_ipi  = __sbi_send_ipi_v01;
> }
> if (sbi_probe_extension(SBI_EXT_RFENCE) > 0) {
>  

[PATCH v4] RISC-V: Use SBI SRST extension when available

2021-03-01 Thread Anup Patel
The SBI SRST extension provides a standard way to power off and
reboot the system irrespective of whether Linux RISC-V S-mode
is running natively (HS-mode) or inside a Guest/VM (VS-mode).

The SBI SRST extension is available in the latest SBI v0.3-draft
specification at: https://github.com/riscv/riscv-sbi-doc.

This patch extends Linux RISC-V SBI implementation to detect
and use SBI SRST extension.

Signed-off-by: Anup Patel 
---
 arch/riscv/include/asm/sbi.h | 16 ++
 arch/riscv/kernel/sbi.c  | 41 +---
 2 files changed, 54 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 99895d9c3bdd..8add0209c9c7 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -27,6 +27,7 @@ enum sbi_ext_id {
SBI_EXT_IPI = 0x735049,
SBI_EXT_RFENCE = 0x52464E43,
SBI_EXT_HSM = 0x48534D,
+   SBI_EXT_SRST = 0x53525354,
 };
 
 enum sbi_ext_base_fid {
@@ -70,6 +71,21 @@ enum sbi_hsm_hart_status {
SBI_HSM_HART_STATUS_STOP_PENDING,
 };
 
+enum sbi_ext_srst_fid {
+   SBI_EXT_SRST_RESET = 0,
+};
+
+enum sbi_srst_reset_type {
+   SBI_SRST_RESET_TYPE_SHUTDOWN = 0,
+   SBI_SRST_RESET_TYPE_COLD_REBOOT,
+   SBI_SRST_RESET_TYPE_WARM_REBOOT,
+};
+
+enum sbi_srst_reset_reason {
+   SBI_SRST_RESET_REASON_NONE = 0,
+   SBI_SRST_RESET_REASON_SYS_FAILURE,
+};
+
 #define SBI_SPEC_VERSION_DEFAULT   0x1
 #define SBI_SPEC_VERSION_MAJOR_SHIFT   24
 #define SBI_SPEC_VERSION_MAJOR_MASK0x7f
diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index f4a7db3d309e..49155588e56c 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -7,6 +7,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -501,6 +502,32 @@ int sbi_remote_hfence_vvma_asid(const unsigned long 
*hart_mask,
 }
 EXPORT_SYMBOL(sbi_remote_hfence_vvma_asid);
 
+static void sbi_srst_reset(unsigned long type, unsigned long reason)
+{
+   sbi_ecall(SBI_EXT_SRST, SBI_EXT_SRST_RESET, type, reason,
+ 0, 0, 0, 0);
+   pr_warn("%s: type=0x%lx reason=0x%lx failed\n",
+   __func__, type, reason);
+}
+
+static int sbi_srst_reboot(struct notifier_block *this,
+  unsigned long mode, void *cmd)
+{
+   sbi_srst_reset((mode == REBOOT_WARM || mode == REBOOT_SOFT) ?
+  SBI_SRST_RESET_TYPE_WARM_REBOOT :
+  SBI_SRST_RESET_TYPE_COLD_REBOOT,
+  SBI_SRST_RESET_REASON_NONE);
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block sbi_srst_reboot_nb;
+
+static void sbi_srst_power_off(void)
+{
+   sbi_srst_reset(SBI_SRST_RESET_TYPE_SHUTDOWN,
+  SBI_SRST_RESET_REASON_NONE);
+}
+
 /**
  * sbi_probe_extension() - Check if an SBI extension ID is supported or not.
  * @extid: The extension ID to be probed.
@@ -577,22 +604,30 @@ void __init sbi_init(void)
sbi_get_firmware_id(), sbi_get_firmware_version());
if (sbi_probe_extension(SBI_EXT_TIME) > 0) {
__sbi_set_timer = __sbi_set_timer_v02;
-   pr_info("SBI v0.2 TIME extension detected\n");
+   pr_info("SBI TIME extension detected\n");
} else {
__sbi_set_timer = __sbi_set_timer_v01;
}
if (sbi_probe_extension(SBI_EXT_IPI) > 0) {
__sbi_send_ipi  = __sbi_send_ipi_v02;
-   pr_info("SBI v0.2 IPI extension detected\n");
+   pr_info("SBI IPI extension detected\n");
} else {
__sbi_send_ipi  = __sbi_send_ipi_v01;
}
if (sbi_probe_extension(SBI_EXT_RFENCE) > 0) {
__sbi_rfence= __sbi_rfence_v02;
-   pr_info("SBI v0.2 RFENCE extension detected\n");
+   pr_info("SBI RFENCE extension detected\n");
} else {
__sbi_rfence= __sbi_rfence_v01;
}
+   if (sbi_probe_extension(SBI_EXT_SRST) > 0 &&
+   sbi_minor_version() >= 3) {
+   pr_info("SBI SRST extension detected\n");
+   pm_power_off = sbi_srst_power_off;
+   sbi_srst_reboot_nb.notifier_call = sbi_srst_reboot;
+   sbi_srst_reboot_nb.priority = 192;
+   register_restart_handler(&sbi_srst_reboot_nb);
+   }
} else {
__sbi_set_timer = __sbi_set_timer_v01;
__sbi_send_ipi  = __sbi_send_ipi_v01;
-- 
2.25.1



Re: [RFC PATCH 1/8] RISC-V: Enable CPU_IDLE drivers

2021-02-26 Thread Anup Patel
Hi Alex,

On Fri, Feb 26, 2021 at 6:46 PM Alex Ghiti  wrote:
>
> Hi Anup,
>
> Le 2/21/21 à 4:37 AM, Anup Patel a écrit :
> > We force select CPU_PM and provide asm/cpuidle.h so that we can
> > use CPU IDLE drivers for Linux RISC-V kernel.
> >
> > Signed-off-by: Anup Patel 
> > ---
> >   arch/riscv/Kconfig|  7 +++
> >   arch/riscv/configs/defconfig  |  7 +++
> >   arch/riscv/configs/rv32_defconfig |  4 ++--
> >   arch/riscv/include/asm/cpuidle.h  | 24 
> >   arch/riscv/kernel/process.c   |  3 ++-
> >   5 files changed, 38 insertions(+), 7 deletions(-)
> >   create mode 100644 arch/riscv/include/asm/cpuidle.h
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index fe6862b06ead..4901200b6b6c 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -37,6 +37,7 @@ config RISCV
> >   select CLONE_BACKWARDS
> >   select CLINT_TIMER if !MMU
> >   select COMMON_CLK
> > + select CPU_PM if CPU_IDLE
> >   select EDAC_SUPPORT
> >   select GENERIC_ARCH_TOPOLOGY if SMP
> >   select GENERIC_ATOMIC64 if !64BIT
> > @@ -430,4 +431,10 @@ source "kernel/power/Kconfig"
> >
> >   endmenu
> >
> > +menu "CPU Power Management"
> > +
> > +source "drivers/cpuidle/Kconfig"
> > +
> > +endmenu
> > +
> >   source "drivers/firmware/Kconfig"
> > diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
> > index 6c0625aa96c7..dc4927c0e44b 100644
> > --- a/arch/riscv/configs/defconfig
> > +++ b/arch/riscv/configs/defconfig
> > @@ -13,11 +13,13 @@ CONFIG_USER_NS=y
> >   CONFIG_CHECKPOINT_RESTORE=y
> >   CONFIG_BLK_DEV_INITRD=y
> >   CONFIG_EXPERT=y
> > +# CONFIG_SYSFS_SYSCALL is not set
> >   CONFIG_BPF_SYSCALL=y
> >   CONFIG_SOC_SIFIVE=y
> >   CONFIG_SOC_VIRT=y
> >   CONFIG_SMP=y
> >   CONFIG_HOTPLUG_CPU=y
> > +CONFIG_CPU_IDLE=y
> >   CONFIG_JUMP_LABEL=y
> >   CONFIG_MODULES=y
> >   CONFIG_MODULE_UNLOAD=y
> > @@ -65,10 +67,9 @@ CONFIG_HW_RANDOM=y
> >   CONFIG_HW_RANDOM_VIRTIO=y
> >   CONFIG_SPI=y
> >   CONFIG_SPI_SIFIVE=y
> > +# CONFIG_PTP_1588_CLOCK is not set
> >   CONFIG_GPIOLIB=y
> >   CONFIG_GPIO_SIFIVE=y
> > -# CONFIG_PTP_1588_CLOCK is not set
> > -CONFIG_POWER_RESET=y
>
> Why do you remove this config ?

Argh, I don't know how this got here. I will remove
this change in the next revision. Thanks for catching.

>
> >   CONFIG_DRM=y
> >   CONFIG_DRM_RADEON=y
> >   CONFIG_DRM_VIRTIO_GPU=y
> > @@ -132,5 +133,3 @@ CONFIG_DEBUG_BLOCK_EXT_DEVT=y
> >   # CONFIG_FTRACE is not set
> >   # CONFIG_RUNTIME_TESTING_MENU is not set
> >   CONFIG_MEMTEST=y
> > -# CONFIG_SYSFS_SYSCALL is not set
> > -CONFIG_EFI=y
>
> And this is one too ? If those removals are intentional, maybe you can
> add something about that in the commit description ?

I will remove this change as well.

>
> > diff --git a/arch/riscv/configs/rv32_defconfig 
> > b/arch/riscv/configs/rv32_defconfig
> > index 8dd02b842fef..332e43a4a2c3 100644
> > --- a/arch/riscv/configs/rv32_defconfig
> > +++ b/arch/riscv/configs/rv32_defconfig
> > @@ -13,12 +13,14 @@ CONFIG_USER_NS=y
> >   CONFIG_CHECKPOINT_RESTORE=y
> >   CONFIG_BLK_DEV_INITRD=y
> >   CONFIG_EXPERT=y
> > +# CONFIG_SYSFS_SYSCALL is not set
> >   CONFIG_BPF_SYSCALL=y
> >   CONFIG_SOC_SIFIVE=y
> >   CONFIG_SOC_VIRT=y
> >   CONFIG_ARCH_RV32I=y
> >   CONFIG_SMP=y
> >   CONFIG_HOTPLUG_CPU=y
> > +CONFIG_CPU_IDLE=y
> >   CONFIG_JUMP_LABEL=y
> >   CONFIG_MODULES=y
> >   CONFIG_MODULE_UNLOAD=y
> > @@ -67,7 +69,6 @@ CONFIG_HW_RANDOM_VIRTIO=y
> >   CONFIG_SPI=y
> >   CONFIG_SPI_SIFIVE=y
> >   # CONFIG_PTP_1588_CLOCK is not set
> > -CONFIG_POWER_RESET=y
> >   CONFIG_DRM=y
> >   CONFIG_DRM_RADEON=y
> >   CONFIG_DRM_VIRTIO_GPU=y
> > @@ -131,4 +132,3 @@ CONFIG_DEBUG_BLOCK_EXT_DEVT=y
> >   # CONFIG_FTRACE is not set
> >   # CONFIG_RUNTIME_TESTING_MENU is not set
> >   CONFIG_MEMTEST=y
> > -# CONFIG_SYSFS_SYSCALL is not set
> > diff --git a/arch/riscv/include/asm/cpuidle.h 
> > b/arch/riscv/include/asm/cpuidle.h
> > new file mode 100644
> > index ..1042d790e446
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/cpuidle.h
> > @@ -0,0 +1,24 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2021 Allwinner Ltd
> > + * Copyri

Re: [PATCH] RISC-V: Enable CPU Hotplug in defconfigs

2021-02-26 Thread Anup Patel
Hi Palmer,

On Fri, Feb 19, 2021 at 12:45 PM Palmer Dabbelt
 wrote:
>
> On Mon, 08 Feb 2021 21:46:20 PST (-0800), Anup Patel wrote:
> > The CPU hotplug support has been tested on QEMU, Spike, and SiFive
> > Unleashed so let's enable it by default in RV32 and RV64 defconfigs.
> >
> > Signed-off-by: Anup Patel 
> > ---
> >  arch/riscv/configs/defconfig  | 1 +
> >  arch/riscv/configs/rv32_defconfig | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
> > index 8c3d1e451703..6c0625aa96c7 100644
> > --- a/arch/riscv/configs/defconfig
> > +++ b/arch/riscv/configs/defconfig
> > @@ -17,6 +17,7 @@ CONFIG_BPF_SYSCALL=y
> >  CONFIG_SOC_SIFIVE=y
> >  CONFIG_SOC_VIRT=y
> >  CONFIG_SMP=y
> > +CONFIG_HOTPLUG_CPU=y
> >  CONFIG_JUMP_LABEL=y
> >  CONFIG_MODULES=y
> >  CONFIG_MODULE_UNLOAD=y
> > diff --git a/arch/riscv/configs/rv32_defconfig 
> > b/arch/riscv/configs/rv32_defconfig
> > index 2c2cda6cc1c5..8dd02b842fef 100644
> > --- a/arch/riscv/configs/rv32_defconfig
> > +++ b/arch/riscv/configs/rv32_defconfig
> > @@ -18,6 +18,7 @@ CONFIG_SOC_SIFIVE=y
> >  CONFIG_SOC_VIRT=y
> >  CONFIG_ARCH_RV32I=y
> >  CONFIG_SMP=y
> > +CONFIG_HOTPLUG_CPU=y
> >  CONFIG_JUMP_LABEL=y
> >  CONFIG_MODULES=y
> >  CONFIG_MODULE_UNLOAD=y
>
> Thanks, this is on for-next.

This patch was missed in your PR for Linux-5.12-rc1

Regards,
Anup


[RFC PATCH 8/8] RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine

2021-02-21 Thread Anup Patel
We enable the RISC-V SBI CPU idle driver for the QEMU virt machine to test
SBI HSM suspend on QEMU.

Signed-off-by: Anup Patel 
---
 arch/riscv/Kconfig.socs   | 3 +++
 arch/riscv/configs/defconfig  | 1 +
 arch/riscv/configs/rv32_defconfig | 1 +
 3 files changed, 5 insertions(+)

diff --git a/arch/riscv/Kconfig.socs b/arch/riscv/Kconfig.socs
index 3284d5c291be..5f6f4a520772 100644
--- a/arch/riscv/Kconfig.socs
+++ b/arch/riscv/Kconfig.socs
@@ -19,6 +19,9 @@ config SOC_VIRT
select GOLDFISH
select RTC_DRV_GOLDFISH if RTC_CLASS
select SIFIVE_PLIC
+   select PM_GENERIC_DOMAINS if PM
+   select PM_GENERIC_DOMAINS_OF if PM && OF
+   select RISCV_SBI_CPUIDLE if CPU_IDLE
help
  This enables support for QEMU Virt Machine.
 
diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index dc4927c0e44b..aac26c20bbf5 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -19,6 +19,7 @@ CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
+CONFIG_PM=y
 CONFIG_CPU_IDLE=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
diff --git a/arch/riscv/configs/rv32_defconfig 
b/arch/riscv/configs/rv32_defconfig
index 332e43a4a2c3..2285c95e34b3 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -20,6 +20,7 @@ CONFIG_SOC_VIRT=y
 CONFIG_ARCH_RV32I=y
 CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
+CONFIG_PM=y
 CONFIG_CPU_IDLE=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
-- 
2.25.1



[RFC PATCH 5/8] cpuidle: Factor-out power domain related code from PSCI domain driver

2021-02-21 Thread Anup Patel
The generic power domain related code in the PSCI domain driver is largely
independent of PSCI and can be shared with the RISC-V SBI domain driver,
hence we factor out this code into dt_idle_genpd.c and dt_idle_genpd.h.

Signed-off-by: Anup Patel 
---
 drivers/cpuidle/Kconfig   |   4 +
 drivers/cpuidle/Kconfig.arm   |   1 +
 drivers/cpuidle/Makefile  |   1 +
 drivers/cpuidle/cpuidle-psci-domain.c | 244 +-
 drivers/cpuidle/cpuidle-psci.h|  15 +-
 ...{cpuidle-psci-domain.c => dt_idle_genpd.c} | 165 
 drivers/cpuidle/dt_idle_genpd.h   |  42 +++
 7 files changed, 121 insertions(+), 351 deletions(-)
 copy drivers/cpuidle/{cpuidle-psci-domain.c => dt_idle_genpd.c} (52%)
 create mode 100644 drivers/cpuidle/dt_idle_genpd.h

diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index c0aeedd66f02..f1afe7ab6b54 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -47,6 +47,10 @@ config CPU_IDLE_GOV_HALTPOLL
 config DT_IDLE_STATES
bool
 
+config DT_IDLE_GENPD
+   depends on PM_GENERIC_DOMAINS_OF
+   bool
+
 menu "ARM CPU Idle Drivers"
 depends on ARM || ARM64
 source "drivers/cpuidle/Kconfig.arm"
diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
index 0844fadc4be8..1007435ae298 100644
--- a/drivers/cpuidle/Kconfig.arm
+++ b/drivers/cpuidle/Kconfig.arm
@@ -27,6 +27,7 @@ config ARM_PSCI_CPUIDLE_DOMAIN
bool "PSCI CPU idle Domain"
depends on ARM_PSCI_CPUIDLE
depends on PM_GENERIC_DOMAINS_OF
+   select DT_IDLE_GENPD
default y
help
  Select this to enable the PSCI based CPUidle driver to use PM domains,
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index 26bbc5e74123..11a26cef279f 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -6,6 +6,7 @@
 obj-y += cpuidle.o driver.o governor.o sysfs.o governors/
 obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
 obj-$(CONFIG_DT_IDLE_STATES) += dt_idle_states.o
+obj-$(CONFIG_DT_IDLE_GENPD)  += dt_idle_genpd.o
 obj-$(CONFIG_ARCH_HAS_CPU_RELAX) += poll_state.o
 obj-$(CONFIG_HALTPOLL_CPUIDLE)   += cpuidle-haltpoll.o
 
diff --git a/drivers/cpuidle/cpuidle-psci-domain.c 
b/drivers/cpuidle/cpuidle-psci-domain.c
index ff2c3f8e4668..b0621d890ab7 100644
--- a/drivers/cpuidle/cpuidle-psci-domain.c
+++ b/drivers/cpuidle/cpuidle-psci-domain.c
@@ -16,17 +16,9 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 
 #include "cpuidle-psci.h"
 
-struct psci_pd_provider {
-   struct list_head link;
-   struct device_node *node;
-};
-
-static LIST_HEAD(psci_pd_providers);
 static bool psci_pd_allow_domain_state;
 
 static int psci_pd_power_off(struct generic_pm_domain *pd)
@@ -47,178 +39,6 @@ static int psci_pd_power_off(struct generic_pm_domain *pd)
return 0;
 }
 
-static int psci_pd_parse_state_nodes(struct genpd_power_state *states,
-int state_count)
-{
-   int i, ret;
-   u32 psci_state, *psci_state_buf;
-
-   for (i = 0; i < state_count; i++) {
-   ret = psci_dt_parse_state_node(to_of_node(states[i].fwnode),
-   &psci_state);
-   if (ret)
-   goto free_state;
-
-   psci_state_buf = kmalloc(sizeof(u32), GFP_KERNEL);
-   if (!psci_state_buf) {
-   ret = -ENOMEM;
-   goto free_state;
-   }
-   *psci_state_buf = psci_state;
-   states[i].data = psci_state_buf;
-   }
-
-   return 0;
-
-free_state:
-   i--;
-   for (; i >= 0; i--)
-   kfree(states[i].data);
-   return ret;
-}
-
-static int psci_pd_parse_states(struct device_node *np,
-   struct genpd_power_state **states, int *state_count)
-{
-   int ret;
-
-   /* Parse the domain idle states. */
-   ret = of_genpd_parse_idle_states(np, states, state_count);
-   if (ret)
-   return ret;
-
-   /* Fill out the PSCI specifics for each found state. */
-   ret = psci_pd_parse_state_nodes(*states, *state_count);
-   if (ret)
-   kfree(*states);
-
-   return ret;
-}
-
-static void psci_pd_free_states(struct genpd_power_state *states,
-   unsigned int state_count)
-{
-   int i;
-
-   for (i = 0; i < state_count; i++)
-   kfree(states[i].data);
-   kfree(states);
-}
-
-static int psci_pd_init(struct device_node *np, bool use_osi)
-{
-   struct generic_pm_domain *pd;
-   struct psci_pd_provider *pd_provider;
-   struct dev_power_governor *pd_gov;
-   struct genpd_power_state *states = NULL;
-   int ret = -ENOMEM, state_count = 0;
-
-   pd = kzalloc(sizeof(*pd),

[RFC PATCH 7/8] dt-bindings: Add bindings documentation for RISC-V idle states

2021-02-21 Thread Anup Patel
The RISC-V CPU idle states will be described in DT under the
/cpus/riscv-idle-states DT node. This patch adds the bindings
documentation for riscv-idle-states DT nodes and idle state DT
nodes under it.

Signed-off-by: Anup Patel 
---
 .../bindings/riscv/idle-states.yaml   | 250 ++
 1 file changed, 250 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/riscv/idle-states.yaml

diff --git a/Documentation/devicetree/bindings/riscv/idle-states.yaml 
b/Documentation/devicetree/bindings/riscv/idle-states.yaml
new file mode 100644
index ..3eff763fed23
--- /dev/null
+++ b/Documentation/devicetree/bindings/riscv/idle-states.yaml
@@ -0,0 +1,250 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/riscv/idle-states.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: RISC-V idle states binding description
+
+maintainers:
+  - Anup Patel 
+
+description: |+
+  RISC-V systems can manage power consumption dynamically, where HARTs
+  (or CPUs) [1] can be put in different platform specific suspend (or
+  idle) states (ranging from simple WFI to power gating, etc.). The RISC-V
+  SBI [2] hart state management extension provides a standard mechanism
+  for OSes to request HART state transitions.
+
+  The platform specific suspend (or idle) states of a hart can be either
+  retentive or non-retentive in nature. A retentive suspend state will
+  preserve hart register and CSR values for all privilege modes whereas
+  a non-retentive suspend state will not preserve hart register and CSR
+  values. The suspend (or idle) state entered by executing the WFI
+  instruction is considered standard on all RISC-V systems and therefore
+  must not be listed in device tree.
+
+  The device tree binding definition for RISC-V idle states described
+  in this document is quite similar to the ARM idle states [3].
+
+  References
+
+  [1] RISC-V Linux Kernel documentation - CPUs bindings
+  Documentation/devicetree/bindings/riscv/cpus.yaml
+
+  [2] RISC-V Supervisor Binary Interface (SBI)
+  http://github.com/riscv/riscv-sbi-doc/riscv-sbi.adoc
+
+  [3] ARM idle states binding description - Idle states bindings
+  Documentation/devicetree/bindings/arm/idle-states.yaml
+
+properties:
+  $nodename:
+const: riscv-idle-states
+
+patternProperties:
+  "^(cpu|cluster)-":
+type: object
+description: |
+  Each state node represents an idle state description and must be
+  defined as follows.
+
+properties:
+  compatible:
+const: riscv,idle-state
+
+  local-timer-stop:
+description:
+  If present the CPU local timer control logic is lost on state
+  entry, otherwise it is retained.
+type: boolean
+
+  entry-latency-us:
+description:
+  Worst case latency in microseconds required to enter the idle state.
+
+  exit-latency-us:
+description:
+  Worst case latency in microseconds required to exit the idle state.
+  The exit-latency-us duration may be guaranteed only after
+  entry-latency-us has passed.
+
+  min-residency-us:
+description:
+  Minimum residency duration in microseconds, inclusive of preparation
+  and entry, for this idle state to be considered worthwhile energy
+  wise (refer to section 2 of this document for a complete description).
+
+  wakeup-latency-us:
+description: |
+  Maximum delay between the signaling of a wake-up event and the CPU
+  being able to execute normal code again. If omitted, this is assumed
+  to be equal to:
+
+entry-latency-us + exit-latency-us
+
+  It is important to supply this value on systems where the duration
+  of PREP phase (see diagram 1, section 2) is non-negligible. In such
+  systems entry-latency-us + exit-latency-us will exceed
+  wakeup-latency-us by this duration.
+
+  idle-state-name:
+$ref: /schemas/types.yaml#/definitions/string
+description:
+  A string used as a descriptive name for the idle state.
+
+required:
+  - compatible
+  - entry-latency-us
+  - exit-latency-us
+  - min-residency-us
+
+additionalProperties: false
+
+examples:
+  - |
+
+cpus {
+#size-cells = <0>;
+#address-cells = <1>;
+
+cpu@0 {
+device_type = "cpu";
+compatible = "riscv";
+reg = <0x0>;
+riscv,isa = "rv64imafdc";
+mmu-type = "riscv,sv48";
+cpu-idle-states = <&CPU_RET_0_0 &CPU_NONRET_0_0
+   &CLUSTER_RET_0 &CLUSTER_NONRET_0>;
+
+cpu_intc0: interrupt-controller {
+#interrupt-cells = <1>;
+compatible = "riscv,cpu-intc";
+interrupt-

[RFC PATCH 4/8] RISC-V: Add SBI HSM suspend related defines

2021-02-21 Thread Anup Patel
We add defines related to the SBI HSM suspend call and also
update the HSM state naming as per the latest SBI specification.
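
As a side note (not part of this patch), the new defines combine as
follows; the helper below is only an illustrative sketch, and its name
and parameters are made up for this example:

/*
 * Illustrative sketch only: compose an SBI HSM suspend type value.
 * A zero 'plat_offset' selects the default state, a non-zero value a
 * platform-specific one; 'non_retentive' selects the non-retentive encoding.
 */
static inline unsigned long example_hsm_suspend_type(bool non_retentive,
						     unsigned long plat_offset)
{
	unsigned long type;

	type = plat_offset ? (SBI_HSM_SUSP_PLAT_BASE + plat_offset) : 0;
	type &= SBI_HSM_SUSP_BASE_MASK;
	if (non_retentive)
		type |= SBI_HSM_SUSP_NON_RET_BIT;

	return type;
}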

Signed-off-by: Anup Patel 
---
 arch/riscv/include/asm/sbi.h| 27 ++-
 arch/riscv/kernel/cpu_ops_sbi.c |  2 +-
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 5b2d6d614c20..4f101f1aa4ea 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -62,15 +62,32 @@ enum sbi_ext_hsm_fid {
SBI_EXT_HSM_HART_START = 0,
SBI_EXT_HSM_HART_STOP,
SBI_EXT_HSM_HART_STATUS,
+   SBI_EXT_HSM_HART_SUSPEND,
 };
 
-enum sbi_hsm_hart_status {
-   SBI_HSM_HART_STATUS_STARTED = 0,
-   SBI_HSM_HART_STATUS_STOPPED,
-   SBI_HSM_HART_STATUS_START_PENDING,
-   SBI_HSM_HART_STATUS_STOP_PENDING,
+enum sbi_hsm_hart_state {
+   SBI_HSM_STATE_STARTED = 0,
+   SBI_HSM_STATE_STOPPED,
+   SBI_HSM_STATE_START_PENDING,
+   SBI_HSM_STATE_STOP_PENDING,
+   SBI_HSM_STATE_SUSPENDED,
+   SBI_HSM_STATE_SUSPEND_PENDING,
+   SBI_HSM_STATE_RESUME_PENDING,
 };
 
+#define SBI_HSM_SUSP_BASE_MASK             0x7fffffff
+#define SBI_HSM_SUSP_NON_RET_BIT           0x80000000
+#define SBI_HSM_SUSP_PLAT_BASE             0x10000000
+
+#define SBI_HSM_SUSPEND_RET_DEFAULT        0x00000000
+#define SBI_HSM_SUSPEND_RET_PLATFORM       SBI_HSM_SUSP_PLAT_BASE
+#define SBI_HSM_SUSPEND_RET_LAST           SBI_HSM_SUSP_BASE_MASK
+#define SBI_HSM_SUSPEND_NON_RET_DEFAULT    SBI_HSM_SUSP_NON_RET_BIT
+#define SBI_HSM_SUSPEND_NON_RET_PLATFORM   (SBI_HSM_SUSP_NON_RET_BIT | \
+                                            SBI_HSM_SUSP_PLAT_BASE)
+#define SBI_HSM_SUSPEND_NON_RET_LAST       (SBI_HSM_SUSP_NON_RET_BIT | \
+                                            SBI_HSM_SUSP_BASE_MASK)
+
 enum sbi_ext_srst_fid {
SBI_EXT_SRST_RESET = 0,
 };
diff --git a/arch/riscv/kernel/cpu_ops_sbi.c b/arch/riscv/kernel/cpu_ops_sbi.c
index 685fae72b7f5..5fd90f03a3e9 100644
--- a/arch/riscv/kernel/cpu_ops_sbi.c
+++ b/arch/riscv/kernel/cpu_ops_sbi.c
@@ -97,7 +97,7 @@ static int sbi_cpu_is_stopped(unsigned int cpuid)
 
rc = sbi_hsm_hart_get_status(hartid);
 
-   if (rc == SBI_HSM_HART_STATUS_STOPPED)
+   if (rc == SBI_HSM_STATE_STOPPED)
return 0;
return rc;
 }
-- 
2.25.1



[RFC PATCH 3/8] RISC-V: Add arch functions for non-retentive suspend entry/exit

2021-02-21 Thread Anup Patel
The hart registers and CSRs are not preserved in the non-retentive
suspend state, so we provide arch-specific helper functions which
save/restore the hart context upon entry/exit to non-retentive
suspend. These helper functions can be used by cpuidle drivers
for non-retentive suspend entry/exit.
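
For context, a hypothetical usage sketch of how a cpuidle driver could
use cpu_suspend(); the finisher and example_firmware_suspend() below
are made-up names, not part of this series:

/* Hypothetical firmware call which suspends the hart and arranges for
 * it to resume at 'entry' with 'context' as an argument. */
extern int example_firmware_suspend(unsigned long arg, unsigned long entry,
				    unsigned long context);

static int example_finisher(unsigned long arg, unsigned long entry,
			    unsigned long context)
{
	return example_firmware_suspend(arg, entry, context);
}

static int example_enter_nonret_state(unsigned long suspend_arg)
{
	/* cpu_suspend() saves CSRs and registers, then calls the finisher */
	return cpu_suspend(suspend_arg, example_finisher);
}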

Signed-off-by: Anup Patel 
---
 arch/riscv/include/asm/suspend.h  |  35 +
 arch/riscv/kernel/Makefile|   2 +
 arch/riscv/kernel/asm-offsets.c   |   3 +
 arch/riscv/kernel/suspend.c   |  86 ++
 arch/riscv/kernel/suspend_entry.S | 116 ++
 5 files changed, 242 insertions(+)
 create mode 100644 arch/riscv/include/asm/suspend.h
 create mode 100644 arch/riscv/kernel/suspend.c
 create mode 100644 arch/riscv/kernel/suspend_entry.S

diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
new file mode 100644
index ..63e9f434fb89
--- /dev/null
+++ b/arch/riscv/include/asm/suspend.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#ifndef _ASM_RISCV_SUSPEND_H
+#define _ASM_RISCV_SUSPEND_H
+
+#include 
+
+struct suspend_context {
+   /* Saved and restored by low-level functions */
+   struct pt_regs regs;
+   /* Saved and restored by high-level functions */
+   unsigned long scratch;
+   unsigned long tvec;
+   unsigned long ie;
+#ifdef CONFIG_MMU
+   unsigned long satp;
+#endif
+};
+
+/* Low-level CPU suspend entry function */
+int __cpu_suspend_enter(struct suspend_context *context);
+
+/* High-level CPU suspend which will save context and call finish() */
+int cpu_suspend(unsigned long arg,
+   int (*finish)(unsigned long arg,
+ unsigned long entry,
+ unsigned long context));
+
+/* Low-level CPU resume entry function */
+int __cpu_resume_enter(unsigned long hartid, unsigned long context);
+
+#endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index f6caf4d9ca15..d83038bf93b2 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -40,6 +40,8 @@ obj-$(CONFIG_SMP) += cpu_ops_spinwait.o
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_MODULE_SECTIONS)  += module-sections.o
 
+obj-$(CONFIG_CPU_PM)   += suspend_entry.o suspend.o
+
 obj-$(CONFIG_FUNCTION_TRACER)  += mcount.o ftrace.o
 obj-$(CONFIG_DYNAMIC_FTRACE)   += mcount-dyn.o
 
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index b79ffa3561fd..8259f5f16da8 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 void asm_offsets(void);
 
@@ -108,6 +109,8 @@ void asm_offsets(void)
OFFSET(PT_BADADDR, pt_regs, badaddr);
OFFSET(PT_CAUSE, pt_regs, cause);
 
+   OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
+
/*
 * THREAD_{F,X}* might be larger than a S-type offset can handle, but
 * these are used in performance-sensitive assembly so we can't resort
diff --git a/arch/riscv/kernel/suspend.c b/arch/riscv/kernel/suspend.c
new file mode 100644
index ..49dddec30e99
--- /dev/null
+++ b/arch/riscv/kernel/suspend.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#include 
+#include 
+#include 
+
+static void suspend_save_csrs(struct suspend_context *context)
+{
+   context->scratch = csr_read(CSR_SCRATCH);
+   context->tvec = csr_read(CSR_TVEC);
+   context->ie = csr_read(CSR_IE);
+
+   /*
+* No need to save/restore IP CSR (i.e. MIP or SIP) because:
+*
+* 1. For no-MMU (M-mode) kernel, the bits in MIP are set by
+*external devices (such as interrupt controller, timer, etc).
+* 2. For MMU (S-mode) kernel, the bits in SIP are set by
+*M-mode firmware and external devices (such as interrupt
+*controller, etc).
+*/
+
+#ifdef CONFIG_MMU
+   context->satp = csr_read(CSR_SATP);
+#endif
+}
+
+static void suspend_restore_csrs(struct suspend_context *context)
+{
+   csr_write(CSR_SCRATCH, context->scratch);
+   csr_write(CSR_TVEC, context->tvec);
+   csr_write(CSR_IE, context->ie);
+
+#ifdef CONFIG_MMU
+   csr_write(CSR_SATP, context->satp);
+#endif
+}
+
+int cpu_suspend(unsigned long arg,
+   int (*finish)(unsigned long arg,
+ unsigned long entry,
+ unsigned long context))
+{
+   int rc = 0;
+   struct suspend_context context = { 0 };
+
+   /* Finisher should be non-NULL */
+   if (!finish)
+   return -EINVAL;
+
+   /* Save additional CSRs*/
+   suspend_save_csrs(&context);
+
+   /*
+* Function graph tracer sta

[RFC PATCH 6/8] cpuidle: Add RISC-V SBI CPU idle driver

2021-02-21 Thread Anup Patel
The RISC-V SBI HSM extension provides a HSM suspend call which can
be used by Linux RISC-V to enter platform-specific low-power states.

This patch adds a CPU idle driver based on RISC-V SBI calls which
will populate idle states from the device tree and use SBI calls to
enter these idle states.
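
At its core, the driver's suspend path is a single SBI HSM call; a
simplified sketch (the real code maps SBI errors to Linux error codes,
which is glossed over here):

static int example_hsm_suspend(u32 suspend_type, unsigned long resume_addr,
			       unsigned long opaque)
{
	struct sbiret ret;

	/* Issue the HSM suspend call for the requested idle state */
	ret = sbi_ecall(SBI_EXT_HSM, SBI_EXT_HSM_HART_SUSPEND,
			suspend_type, resume_addr, opaque, 0, 0, 0);

	return ret.error ? -EIO : 0;
}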

Signed-off-by: Anup Patel 
---
 MAINTAINERS   |   8 +
 drivers/cpuidle/Kconfig   |   5 +
 drivers/cpuidle/Kconfig.riscv |  15 +
 drivers/cpuidle/Makefile  |   4 +
 drivers/cpuidle/cpuidle-sbi.c | 503 ++
 5 files changed, 535 insertions(+)
 create mode 100644 drivers/cpuidle/Kconfig.riscv
 create mode 100644 drivers/cpuidle/cpuidle-sbi.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 667d03852191..eeeab188a8ac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4641,6 +4641,14 @@ S:   Supported
 F: drivers/cpuidle/cpuidle-psci.h
 F: drivers/cpuidle/cpuidle-psci-domain.c
 
+CPUIDLE DRIVER - RISC-V SBI
+M: Anup Patel 
+R: Sandeep Tripathy 
+L: linux...@vger.kernel.org
+L: linux-ri...@lists.infradead.org
+S: Supported
+F: drivers/cpuidle/cpuidle-sbi.c
+
 CRAMFS FILESYSTEM
 M: Nicolas Pitre 
 S: Maintained
diff --git a/drivers/cpuidle/Kconfig b/drivers/cpuidle/Kconfig
index f1afe7ab6b54..ff71dd662880 100644
--- a/drivers/cpuidle/Kconfig
+++ b/drivers/cpuidle/Kconfig
@@ -66,6 +66,11 @@ depends on PPC
 source "drivers/cpuidle/Kconfig.powerpc"
 endmenu
 
+menu "RISC-V CPU Idle Drivers"
+depends on RISCV
+source "drivers/cpuidle/Kconfig.riscv"
+endmenu
+
 config HALTPOLL_CPUIDLE
tristate "Halt poll cpuidle driver"
depends on X86 && KVM_GUEST
diff --git a/drivers/cpuidle/Kconfig.riscv b/drivers/cpuidle/Kconfig.riscv
new file mode 100644
index ..78518c26af74
--- /dev/null
+++ b/drivers/cpuidle/Kconfig.riscv
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# RISC-V CPU Idle drivers
+#
+
+config RISCV_SBI_CPUIDLE
+   bool "RISC-V SBI CPU idle Driver"
+   depends on RISCV_SBI
+   select DT_IDLE_STATES
+   select CPU_IDLE_MULTIPLE_DRIVERS
+   select DT_IDLE_GENPD if PM_GENERIC_DOMAINS_OF
+   help
+ Select this option to enable the RISC-V SBI firmware based CPU idle
+ driver for RISC-V systems. This driver also supports a hierarchical
+ DT based layout of the idle states.
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index 11a26cef279f..a36922c18510 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -35,3 +35,7 @@ obj-$(CONFIG_MIPS_CPS_CPUIDLE)+= cpuidle-cps.o
 # POWERPC drivers
 obj-$(CONFIG_PSERIES_CPUIDLE)  += cpuidle-pseries.o
 obj-$(CONFIG_POWERNV_CPUIDLE)  += cpuidle-powernv.o
+
+###
+# RISC-V drivers
+obj-$(CONFIG_RISCV_SBI_CPUIDLE)   += cpuidle-sbi.o
diff --git a/drivers/cpuidle/cpuidle-sbi.c b/drivers/cpuidle/cpuidle-sbi.c
new file mode 100644
index ..abbcabca0e91
--- /dev/null
+++ b/drivers/cpuidle/cpuidle-sbi.c
@@ -0,0 +1,503 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * RISC-V SBI CPU idle driver.
+ *
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#define pr_fmt(fmt) "cpuidle-sbi: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "dt_idle_states.h"
+#include "dt_idle_genpd.h"
+
+struct sbi_cpuidle_data {
+   u32 *states;
+   struct device *dev;
+};
+
+struct sbi_domain_state {
+   bool available;
+   u32 state;
+};
+
+static DEFINE_PER_CPU_READ_MOSTLY(struct sbi_cpuidle_data, sbi_cpuidle_data);
+static DEFINE_PER_CPU(struct sbi_domain_state, domain_state);
+static bool sbi_cpuidle_use_osi;
+static bool sbi_cpuidle_use_cpuhp;
+static bool sbi_cpuidle_pd_allow_domain_state;
+
+static inline void sbi_set_domain_state(u32 state)
+{
+   struct sbi_domain_state *data = this_cpu_ptr(&domain_state);
+
+   data->available = true;
+   data->state = state;
+}
+
+static inline u32 sbi_get_domain_state(void)
+{
+   struct sbi_domain_state *data = this_cpu_ptr(&domain_state);
+
+   return data->state;
+}
+
+static inline void sbi_clear_domain_state(void)
+{
+   struct sbi_domain_state *data = this_cpu_ptr(&domain_state);
+
+   data->available = false;
+}
+
+static inline bool sbi_is_domain_state_available(void)
+{
+   struct sbi_domain_state *data = this_cpu_ptr(&domain_state);
+
+   return data->available;
+}
+
+static int sbi_suspend_finisher(unsigned long suspend_type,
+   unsigned long resume_addr,
+   unsigned long opaque)
+{
+   struct sbiret ret;
+
+   ret = sbi_ecall(SBI_EXT_HSM, SBI_EXT_HSM_HART_SUSPEND,
+   

[RFC PATCH 2/8] RISC-V: Rename relocate() and make it global

2021-02-21 Thread Anup Patel
The low-level relocate() function enables the MMU and relocates
execution to link-time addresses. We rename the relocate() function
to relocate_enable_mmu(), which is more informative.

Also, the relocate_enable_mmu() function will be used in the
resume path when a CPU wakes up from a non-retentive suspend,
so we make it a global symbol.

Signed-off-by: Anup Patel 
---
 arch/riscv/kernel/head.S | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index 16e9941900c4..a8aca43929d8 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -67,7 +67,8 @@ pe_head_start:
 
 .align 2
 #ifdef CONFIG_MMU
-relocate:
+   .global relocate_enable_mmu
+relocate_enable_mmu:
/* Relocate return address */
li a1, PAGE_OFFSET
la a2, _start
@@ -156,7 +157,7 @@ secondary_start_common:
 #ifdef CONFIG_MMU
/* Enable virtual memory and relocate to virtual address */
la a0, swapper_pg_dir
-   call relocate
+   call relocate_enable_mmu
 #endif
call setup_trap_vector
tail smp_callin
@@ -264,7 +265,7 @@ clear_bss_done:
call setup_vm
 #ifdef CONFIG_MMU
la a0, early_pg_dir
-   call relocate
+   call relocate_enable_mmu
 #endif /* CONFIG_MMU */
 
call setup_trap_vector
-- 
2.25.1



[RFC PATCH 1/8] RISC-V: Enable CPU_IDLE drivers

2021-02-21 Thread Anup Patel
We force select CPU_PM and provide asm/cpuidle.h so that we can
use CPU idle drivers with the Linux RISC-V kernel.

Signed-off-by: Anup Patel 
---
 arch/riscv/Kconfig|  7 +++
 arch/riscv/configs/defconfig  |  7 +++
 arch/riscv/configs/rv32_defconfig |  4 ++--
 arch/riscv/include/asm/cpuidle.h  | 24 
 arch/riscv/kernel/process.c   |  3 ++-
 5 files changed, 38 insertions(+), 7 deletions(-)
 create mode 100644 arch/riscv/include/asm/cpuidle.h

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index fe6862b06ead..4901200b6b6c 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -37,6 +37,7 @@ config RISCV
select CLONE_BACKWARDS
select CLINT_TIMER if !MMU
select COMMON_CLK
+   select CPU_PM if CPU_IDLE
select EDAC_SUPPORT
select GENERIC_ARCH_TOPOLOGY if SMP
select GENERIC_ATOMIC64 if !64BIT
@@ -430,4 +431,10 @@ source "kernel/power/Kconfig"
 
 endmenu
 
+menu "CPU Power Management"
+
+source "drivers/cpuidle/Kconfig"
+
+endmenu
+
 source "drivers/firmware/Kconfig"
diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index 6c0625aa96c7..dc4927c0e44b 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -13,11 +13,13 @@ CONFIG_USER_NS=y
 CONFIG_CHECKPOINT_RESTORE=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_EXPERT=y
+# CONFIG_SYSFS_SYSCALL is not set
 CONFIG_BPF_SYSCALL=y
 CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
+CONFIG_CPU_IDLE=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
@@ -65,10 +67,9 @@ CONFIG_HW_RANDOM=y
 CONFIG_HW_RANDOM_VIRTIO=y
 CONFIG_SPI=y
 CONFIG_SPI_SIFIVE=y
+# CONFIG_PTP_1588_CLOCK is not set
 CONFIG_GPIOLIB=y
 CONFIG_GPIO_SIFIVE=y
-# CONFIG_PTP_1588_CLOCK is not set
-CONFIG_POWER_RESET=y
 CONFIG_DRM=y
 CONFIG_DRM_RADEON=y
 CONFIG_DRM_VIRTIO_GPU=y
@@ -132,5 +133,3 @@ CONFIG_DEBUG_BLOCK_EXT_DEVT=y
 # CONFIG_FTRACE is not set
 # CONFIG_RUNTIME_TESTING_MENU is not set
 CONFIG_MEMTEST=y
-# CONFIG_SYSFS_SYSCALL is not set
-CONFIG_EFI=y
diff --git a/arch/riscv/configs/rv32_defconfig 
b/arch/riscv/configs/rv32_defconfig
index 8dd02b842fef..332e43a4a2c3 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -13,12 +13,14 @@ CONFIG_USER_NS=y
 CONFIG_CHECKPOINT_RESTORE=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_EXPERT=y
+# CONFIG_SYSFS_SYSCALL is not set
 CONFIG_BPF_SYSCALL=y
 CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_ARCH_RV32I=y
 CONFIG_SMP=y
 CONFIG_HOTPLUG_CPU=y
+CONFIG_CPU_IDLE=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
@@ -67,7 +69,6 @@ CONFIG_HW_RANDOM_VIRTIO=y
 CONFIG_SPI=y
 CONFIG_SPI_SIFIVE=y
 # CONFIG_PTP_1588_CLOCK is not set
-CONFIG_POWER_RESET=y
 CONFIG_DRM=y
 CONFIG_DRM_RADEON=y
 CONFIG_DRM_VIRTIO_GPU=y
@@ -131,4 +132,3 @@ CONFIG_DEBUG_BLOCK_EXT_DEVT=y
 # CONFIG_FTRACE is not set
 # CONFIG_RUNTIME_TESTING_MENU is not set
 CONFIG_MEMTEST=y
-# CONFIG_SYSFS_SYSCALL is not set
diff --git a/arch/riscv/include/asm/cpuidle.h b/arch/riscv/include/asm/cpuidle.h
new file mode 100644
index ..1042d790e446
--- /dev/null
+++ b/arch/riscv/include/asm/cpuidle.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2021 Allwinner Ltd
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#ifndef _ASM_RISCV_CPUIDLE_H
+#define _ASM_RISCV_CPUIDLE_H
+
+#include 
+#include 
+
+static inline void cpu_do_idle(void)
+{
+   /*
+* Add mb() here to ensure that all
+* IO/MEM accesses are completed prior
+* to entering WFI.
+*/
+   mb();
+   wait_for_interrupt();
+}
+
+#endif
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index dd5f985b1f40..b5b51fd26624 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 register unsigned long gp_in_global __asm__("gp");
 
@@ -35,7 +36,7 @@ extern asmlinkage void ret_from_kernel_thread(void);
 
 void arch_cpu_idle(void)
 {
-   wait_for_interrupt();
+   cpu_do_idle();
raw_local_irq_enable();
 }
 
-- 
2.25.1



[RFC PATCH 0/8] RISC-V CPU Idle Support

2021-02-21 Thread Anup Patel
This series adds RISC-V CPU idle support using the SBI HSM suspend function.
The RISC-V SBI CPU idle driver added by this series is highly inspired
by the ARM PSCI CPU idle driver.

At high-level, this series includes the following changes:
1) Preparatory arch/riscv patches (Patches 1 to 3)
2) Defines for RISC-V SBI HSM suspend (Patch 4)
3) Preparatory patch to share code between the RISC-V SBI CPU idle driver
   and the ARM PSCI CPU idle driver (Patch 5)
4) RISC-V SBI CPU idle driver and related DT bindings (Patches 6 to 7)

These patches can be found in riscv_sbi_hsm_suspend_v1 branch at
https://github.com/avpatel/linux

The proposed SBI HSM suspend definition can be found in hsm_suspend_v3
branch at https://github.com/avpatel/riscv-sbi-doc

The OpenSBI implementation of SBI HSM suspend function can be found
in hsm_suspend_v1 branch at https://github.com/avpatel/opensbi

Special thanks to Sandeep Tripathy for providing early feedback on SBI HSM
support in all the above projects (RISC-V SBI specification, OpenSBI, and
Linux RISC-V).

Anup Patel (8):
  RISC-V: Enable CPU_IDLE drivers
  RISC-V: Rename relocate() and make it global
  RISC-V: Add arch functions for non-retentive suspend entry/exit
  RISC-V: Add SBI HSM suspend related defines
  cpuidle: Factor-out power domain related code from PSCI domain driver
  cpuidle: Add RISC-V SBI CPU idle driver
  dt-bindings: Add bindings documentation for RISC-V idle states
  RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine

 .../bindings/riscv/idle-states.yaml   | 250 +
 MAINTAINERS   |   8 +
 arch/riscv/Kconfig|   7 +
 arch/riscv/Kconfig.socs   |   3 +
 arch/riscv/configs/defconfig  |   8 +-
 arch/riscv/configs/rv32_defconfig |   5 +-
 arch/riscv/include/asm/cpuidle.h  |  24 +
 arch/riscv/include/asm/sbi.h  |  27 +-
 arch/riscv/include/asm/suspend.h  |  35 ++
 arch/riscv/kernel/Makefile|   2 +
 arch/riscv/kernel/asm-offsets.c   |   3 +
 arch/riscv/kernel/cpu_ops_sbi.c   |   2 +-
 arch/riscv/kernel/head.S  |   7 +-
 arch/riscv/kernel/process.c   |   3 +-
 arch/riscv/kernel/suspend.c   |  86 +++
 arch/riscv/kernel/suspend_entry.S | 116 
 drivers/cpuidle/Kconfig   |   9 +
 drivers/cpuidle/Kconfig.arm   |   1 +
 drivers/cpuidle/Kconfig.riscv |  15 +
 drivers/cpuidle/Makefile  |   5 +
 drivers/cpuidle/cpuidle-psci-domain.c | 244 +
 drivers/cpuidle/cpuidle-psci.h|  15 +-
 drivers/cpuidle/cpuidle-sbi.c | 503 ++
 ...{cpuidle-psci-domain.c => dt_idle_genpd.c} | 165 ++
 drivers/cpuidle/dt_idle_genpd.h   |  42 ++
 25 files changed, 1218 insertions(+), 367 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/riscv/idle-states.yaml
 create mode 100644 arch/riscv/include/asm/cpuidle.h
 create mode 100644 arch/riscv/include/asm/suspend.h
 create mode 100644 arch/riscv/kernel/suspend.c
 create mode 100644 arch/riscv/kernel/suspend_entry.S
 create mode 100644 drivers/cpuidle/Kconfig.riscv
 create mode 100644 drivers/cpuidle/cpuidle-sbi.c
 copy drivers/cpuidle/{cpuidle-psci-domain.c => dt_idle_genpd.c} (52%)
 create mode 100644 drivers/cpuidle/dt_idle_genpd.h

-- 
2.25.1



[PATCH] RISC-V: Enable CPU Hotplug in defconfigs

2021-02-08 Thread Anup Patel
The CPU hotplug support has been tested on QEMU, Spike, and SiFive
Unleashed so let's enable it by default in RV32 and RV64 defconfigs.

Signed-off-by: Anup Patel 
---
 arch/riscv/configs/defconfig  | 1 +
 arch/riscv/configs/rv32_defconfig | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index 8c3d1e451703..6c0625aa96c7 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -17,6 +17,7 @@ CONFIG_BPF_SYSCALL=y
 CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_SMP=y
+CONFIG_HOTPLUG_CPU=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
diff --git a/arch/riscv/configs/rv32_defconfig 
b/arch/riscv/configs/rv32_defconfig
index 2c2cda6cc1c5..8dd02b842fef 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -18,6 +18,7 @@ CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_ARCH_RV32I=y
 CONFIG_SMP=y
+CONFIG_HOTPLUG_CPU=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
-- 
2.25.1



[PATCH v5] RISC-V: Implement ASID allocator

2021-02-03 Thread Anup Patel
Currently, we do local TLB flush on every MM switch. This is very harsh on
performance because we are forcing page table walks after every MM switch.

This patch implements an ASID allocator for assigning an ASID to a MM context.
The number of ASIDs is limited in HW, so we create a logical entity named
CONTEXTID for assigning to MM context. The lower bits of CONTEXTID are ASID
and upper bits are VERSION number. The number of usable ASID bits supported
by HW are detected at boot-time by writing 1s to ASID bits in SATP CSR.

We allocate a new CONTEXTID on the first MM switch for a MM context where the
ASID is allocated from an ASID bitmap and VERSION is provided by an atomic
counter. At time of allocating new CONTEXTID, if we run out of available
ASIDs then:
1. We flush the ASID bitmap
2. Increment current VERSION atomic counter
3. Re-allocate ASID from ASID bitmap
4. Flush TLB on all CPUs
5. Try CONTEXTID re-assignment on all CPUs

Please note that we don't use ASID #0 because it is used at boot-time by
all CPUs for initial MM context. Also, newly created context is always
assigned CONTEXTID #0 (i.e. VERSION #0 and ASID #0) which is an invalid
context in our implementation.

Using above approach, we have virtually infinite CONTEXTIDs on-top-of
limited number of HW ASIDs. This approach is inspired from ASID allocator
used for Linux ARM/ARM64 but we have adapted it for RISC-V. Overall, this
ASID allocator helps us reduce rate of local TLB flushes on every CPU
thereby increasing performance.

This patch is tested on QEMU virt machine, Spike and SiFive Unleashed
board. On QEMU virt machine, we see some (3-5% approx) performance
improvement with SW emulated TLBs provided by QEMU. Unfortunately,
the ASID bits of the SATP CSR are not implemented on Spike and SiFive
Unleashed board so we don't see any change in performance. On real HW
having all ASID bits implemented, the performance gains will be much
more due to improved sharing of the TLB among different processes.
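
To make the CONTEXTID layout concrete, here is an illustrative sketch
(the helper names are hypothetical; asid_bits stands for the number of
usable HW ASID bits detected from the SATP CSR at boot):

/* Illustrative only: CONTEXTID = (VERSION << asid_bits) | ASID */
static unsigned long asid_bits;

static inline unsigned long example_cntx2asid(unsigned long cntx)
{
	/* Low bits of the CONTEXTID are the HW ASID */
	return cntx & ((1UL << asid_bits) - 1);
}

static inline unsigned long example_cntx2version(unsigned long cntx)
{
	/* Remaining upper bits are the VERSION */
	return cntx & ~((1UL << asid_bits) - 1);
}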

Signed-off-by: Anup Patel 
Reviewed-by: Palmer Dabbelt 
---
Changes since v4:
- Rebased on Linux-5.11-rc6
- Used lockdep_assert_held() in __new_context() and __flush_context()

Changes since v3:
- Rebased on Linux-5.11-rc4. The previous v3 patch (almost 2 years back)
  was based on Linux-5.1-rc2
- Updated implementation to consider NoMMU kernel
- Converted use_asid_allocator boolean flag into static key
- Improved boot-time print in asids_init() to show number of ASID bits
- Access SATP CSR by number instead of old CSR name "sptbr"

Changes since v2:
- Move to lazy TLB flushing because we get slow path warnings if we
  use flush_tlb_all()
- Don't set ASID bits to all 1s in head.s. Instead just do it on
  boot CPU calling asids_init() for determining number of HW ASID bits
- Make CONTEXT version comparison more readable in set_mm_asid()
- Fix typo in __flush_context()

Changes since v1:
- We adapt good aspects from Gary Guo's ASID allocator implementation
  and provide due credit to him by adding his SoB.
- Track ASIDs active during context flush and mark them as reserved
- Set ASID bits to all 1s to simplify number of ASID bit detection
- Use atomic_long_t instead of atomic64_t for being 32bit friendly
- Use unsigned long instead of u64 for being 32bit friendly
- Use flush_tlb_all() instead of lazy local_tlb_flush_all() at time
  of context flush

This patch is based on Linux-5.11-rc6 and can be found in the
riscv_asid_allocator_v5 branch of https://github.com/avpatel/linux.git
---
 arch/riscv/include/asm/csr.h |   6 +
 arch/riscv/include/asm/mmu.h |   2 +
 arch/riscv/include/asm/mmu_context.h |  10 +
 arch/riscv/mm/context.c  | 265 ++-
 4 files changed, 279 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index cec462e198ce..caadfc1d7487 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -41,10 +41,16 @@
 #define SATP_PPN   _AC(0x003FFFFF, UL)
 #define SATP_MODE_32   _AC(0x80000000, UL)
 #define SATP_MODE  SATP_MODE_32
+#define SATP_ASID_BITS 9
+#define SATP_ASID_SHIFT    22
+#define SATP_ASID_MASK _AC(0x1FF, UL)
 #else
 #define SATP_PPN   _AC(0x00000FFFFFFFFFFF, UL)
 #define SATP_MODE_39   _AC(0x8000000000000000, UL)
 #define SATP_MODE  SATP_MODE_39
+#define SATP_ASID_BITS 16
+#define SATP_ASID_SHIFT    44
+#define SATP_ASID_MASK _AC(0xFFFF, UL)
 #endif
 
 /* Exception cause high bit - is an interrupt if set */
diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
index dabcf2cfb3dc..0099dc116168 100644
--- a/arch/riscv/include/asm/mmu.h
+++ b/arch/riscv/include/asm/mmu.h
@@ -12,6 +12,8 @@
 typedef struct {
 #ifndef CONFIG_MMU
unsigned long   end_brk;
+#else
+   atomic_long_t id;
 #endif
void *vdso;
 #ifdef CONFIG_SMP
diff --git a/arch/riscv/include/asm/mmu_context.h 
b/arch/riscv/include/asm/mmu_context.h
index 250defa06f3a..b0659413a080 100644
--- a/

Re: [PATCH v4] RISC-V: Implement ASID allocator

2021-02-02 Thread Anup Patel
On Wed, Feb 3, 2021 at 7:58 AM Palmer Dabbelt  wrote:
>
> On Thu, 21 Jan 2021 05:50:16 PST (-0800), Anup Patel wrote:
> > Currently, we do local TLB flush on every MM switch. This is very harsh on
> > performance because we are forcing page table walks after every MM switch.
> >
> > This patch implements ASID allocator for assigning an ASID to a MM context.
> > The number of ASIDs are limited in HW so we create a logical entity named
> > CONTEXTID for assigning to MM context. The lower bits of CONTEXTID are ASID
> > and upper bits are VERSION number. The number of usable ASID bits supported
> > by HW are detected at boot-time by writing 1s to ASID bits in SATP CSR.
> >
> > We allocate new CONTEXTID on first MM switch for a MM context where the
> > ASID is allocated from an ASID bitmap and VERSION is provide by an atomic
> > counter. At time of allocating new CONTEXTID, if we run out of available
> > ASIDs then:
> > 1. We flush the ASID bitmap
> > 2. Increment current VERSION atomic counter
> > 3. Re-allocate ASID from ASID bitmap
> > 4. Flush TLB on all CPUs
> > 5. Try CONTEXTID re-assignment on all CPUs
> >
> > Please note that we don't use ASID #0 because it is used at boot-time by
> > all CPUs for initial MM context. Also, newly created context is always
> > assigned CONTEXTID #0 (i.e. VERSION #0 and ASID #0) which is an invalid
> > context in our implementation.
> >
> > Using above approach, we have virtually infinite CONTEXTIDs on-top-of
> > limited number of HW ASIDs. This approach is inspired from ASID allocator
> > used for Linux ARM/ARM64 but we have adapted it for RISC-V. Overall, this
> > ASID allocator helps us reduce rate of local TLB flushes on every CPU
> > thereby increasing performance.
> >
> > This patch is tested on QEMU virt machine, Spike and SiFive Unleashed
> > board. On QEMU virt machine, we see some (3-5% approx) performance
> > improvement with SW emulated TLBs provided by QEMU. Unfortunately,
> > the ASID bits of the SATP CSR are not implemented on Spike and SiFive
> > Unleashed board so we don't see any change in performance. On real
> > HW having all ASID bits implemented, the performance gains will be
> > much more due improved sharing of TLB among different processes.
> >
> > Signed-off-by: Anup Patel 
> > ---
> > Changes since v3:
> > - Rebased on Linux-5.11-rc4. The previous v3 patch (almost 2 years back)
> >   was basd on Linux-5.1-rc2
> > - Updated implementation to consider NoMMU kernel
> > - Converted use_asid_allocator boolean flag into static key
> > - Improved boot-time print in asids_init() to show number of ASID bits
> > - Access SATP CSR by number instead of old CSR name "sptbr"
> >
> > Changes since v2:
> > - Move to lazy TLB flushing because we get slow path warnings if we
> >   use flush_tlb_all()
> > - Don't set ASID bits to all 1s in head.s. Instead just do it on
> >   boot CPU calling asids_init() for determining number of HW ASID bits
> > - Make CONTEXT version comparison more readable in set_mm_asid()
> > - Fix typo in __flush_context()
> >
> > Changes since v1:
> > - We adapt good aspects from Gary Guo's ASID allocator implementation
> >   and provide due credit to him by adding his SoB.
> > - Track ASIDs active during context flush and mark them as reserved
> > - Set ASID bits to all 1s to simplify number of ASID bit detection
> > - Use atomic_long_t instead of atomic64_t for being 32bit friendly
> > - Use unsigned long instead of u64 for being 32bit friendly
> > - Use flush_tlb_all() instead of lazy local_tlb_flush_all() at time
> >   of context flush
> >
> > This patch is based on Linux-5.11-rc4 and can be found in the
> > riscv_asid_allocator_v4 branch of https://github.com/avpatel/linux.git
> > ---
> >  arch/riscv/include/asm/csr.h |   6 +
> >  arch/riscv/include/asm/mmu.h |   2 +
> >  arch/riscv/include/asm/mmu_context.h |  10 +
> >  arch/riscv/mm/context.c  | 261 ++-
> >  4 files changed, 275 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
> > index cec462e198ce..caadfc1d7487 100644
> > --- a/arch/riscv/include/asm/csr.h
> > +++ b/arch/riscv/include/asm/csr.h
> > @@ -41,10 +41,16 @@
> >  #define SATP_PPN _AC(0x003F, UL)
> >  #define SATP_MODE_32 _AC(0x8000, UL)
> >  #define SATP_MODESATP_MODE_32
> > +#define SATP_ASID_BITS   9
> > +#define SATP_ASID_SHIFT  22
> > +#define SATP_ASID_MASK  

[PATCH v4] RISC-V: Implement ASID allocator

2021-01-21 Thread Anup Patel
Currently, we do local TLB flush on every MM switch. This is very harsh on
performance because we are forcing page table walks after every MM switch.

This patch implements an ASID allocator for assigning an ASID to a MM context.
The number of ASIDs is limited in HW, so we create a logical entity named
CONTEXTID for assigning to MM context. The lower bits of CONTEXTID are ASID
and upper bits are VERSION number. The number of usable ASID bits supported
by HW are detected at boot-time by writing 1s to ASID bits in SATP CSR.

We allocate a new CONTEXTID on the first MM switch for a MM context where the
ASID is allocated from an ASID bitmap and VERSION is provided by an atomic
counter. At time of allocating new CONTEXTID, if we run out of available
ASIDs then:
1. We flush the ASID bitmap
2. Increment current VERSION atomic counter
3. Re-allocate ASID from ASID bitmap
4. Flush TLB on all CPUs
5. Try CONTEXTID re-assignment on all CPUs

Please note that we don't use ASID #0 because it is used at boot-time by
all CPUs for initial MM context. Also, newly created context is always
assigned CONTEXTID #0 (i.e. VERSION #0 and ASID #0) which is an invalid
context in our implementation.

Using above approach, we have virtually infinite CONTEXTIDs on-top-of
limited number of HW ASIDs. This approach is inspired from ASID allocator
used for Linux ARM/ARM64 but we have adapted it for RISC-V. Overall, this
ASID allocator helps us reduce rate of local TLB flushes on every CPU
thereby increasing performance.

This patch is tested on QEMU virt machine, Spike and SiFive Unleashed
board. On QEMU virt machine, we see some (3-5% approx) performance
improvement with SW emulated TLBs provided by QEMU. Unfortunately,
the ASID bits of the SATP CSR are not implemented on Spike and SiFive
Unleashed board so we don't see any change in performance. On real
HW having all ASID bits implemented, the performance gains will be
much more due to improved sharing of the TLB among different processes.

Signed-off-by: Anup Patel 
---
Changes since v3:
- Rebased on Linux-5.11-rc4. The previous v3 patch (almost 2 years back)
  was based on Linux-5.1-rc2
- Updated implementation to consider NoMMU kernel
- Converted use_asid_allocator boolean flag into static key
- Improved boot-time print in asids_init() to show number of ASID bits
- Access SATP CSR by number instead of old CSR name "sptbr"

Changes since v2:
- Move to lazy TLB flushing because we get slow path warnings if we
  use flush_tlb_all()
- Don't set ASID bits to all 1s in head.s. Instead just do it on
  boot CPU calling asids_init() for determining number of HW ASID bits
- Make CONTEXT version comparison more readable in set_mm_asid()
- Fix typo in __flush_context()

Changes since v1:
- We adapt good aspects from Gary Guo's ASID allocator implementation
  and provide due credit to him by adding his SoB.
- Track ASIDs active during context flush and mark them as reserved
- Set ASID bits to all 1s to simplify number of ASID bit detection
- Use atomic_long_t instead of atomic64_t for being 32bit friendly
- Use unsigned long instead of u64 for being 32bit friendly
- Use flush_tlb_all() instead of lazy local_tlb_flush_all() at time
  of context flush

This patch is based on Linux-5.11-rc4 and can be found in the
riscv_asid_allocator_v4 branch of https://github.com/avpatel/linux.git
---
 arch/riscv/include/asm/csr.h |   6 +
 arch/riscv/include/asm/mmu.h |   2 +
 arch/riscv/include/asm/mmu_context.h |  10 +
 arch/riscv/mm/context.c  | 261 ++-
 4 files changed, 275 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index cec462e198ce..caadfc1d7487 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -41,10 +41,16 @@
 #define SATP_PPN   _AC(0x003FFFFF, UL)
 #define SATP_MODE_32   _AC(0x80000000, UL)
 #define SATP_MODE  SATP_MODE_32
+#define SATP_ASID_BITS 9
+#define SATP_ASID_SHIFT    22
+#define SATP_ASID_MASK _AC(0x1FF, UL)
 #else
 #define SATP_PPN   _AC(0x00000FFFFFFFFFFF, UL)
 #define SATP_MODE_39   _AC(0x8000000000000000, UL)
 #define SATP_MODE  SATP_MODE_39
+#define SATP_ASID_BITS 16
+#define SATP_ASID_SHIFT    44
+#define SATP_ASID_MASK _AC(0xFFFF, UL)
 #endif
 
 /* Exception cause high bit - is an interrupt if set */
diff --git a/arch/riscv/include/asm/mmu.h b/arch/riscv/include/asm/mmu.h
index dabcf2cfb3dc..0099dc116168 100644
--- a/arch/riscv/include/asm/mmu.h
+++ b/arch/riscv/include/asm/mmu.h
@@ -12,6 +12,8 @@
 typedef struct {
 #ifndef CONFIG_MMU
unsigned long   end_brk;
+#else
+   atomic_long_t id;
 #endif
void *vdso;
 #ifdef CONFIG_SMP
diff --git a/arch/riscv/include/asm/mmu_context.h 
b/arch/riscv/include/asm/mmu_context.h
index 250defa06f3a..b0659413a080 100644
--- a/arch/riscv/include/asm/mmu_context.h
+++ b/arch/riscv/include/asm/mmu_context.h
@@ -23,6 +23,16 @@ static inline void activate_mm(struct mm_st

Re: [PATCH v3] riscv: add BUILTIN_DTB support for MMU-enabled targets

2021-01-21 Thread Anup Patel
On Thu, Jan 21, 2021 at 2:51 PM Vitaly Wool  wrote:
>
> On Sat, Jan 16, 2021 at 12:57 AM Vitaly Wool  wrote:
> >
> > Sometimes, especially in a production system we may not want to
> > use a "smart bootloader" like u-boot to load kernel, ramdisk and
> > device tree from a filesystem on eMMC, but rather load the kernel
> > from a NAND partition and just run it as soon as we can, and in
> > this case it is convenient to have device tree compiled into the
> > kernel binary. Since this case is not limited to MMU-less systems,
> > let's support it for these which have MMU enabled too.
> >
> > While at it, provide __dtb_start as a parameter to setup_vm() in
> > BUILTIN_DTB case, so we don't have to duplicate BUILTIN_DTB specific
> > processing in MMU-enabled and MMU-disabled versions of setup_vm().
>
> @Palmer: ping :)
>
> > Signed-off-by: Vitaly Wool 
>
> While at it, since this is just a respin/concatenation:
> @Damien: are you okay with re-adding 'Tested-By:' ?
> @Anup: are you okay with adding 'Reviewed-by:' since you have reviewed
> both v1 patches that were concatenated?

Yes, my Reviewed-by holds on this patch as well.

Reviewed-by: Anup Patel 

Best Regards,
Anup

>
> Best regards,
>Vitaly
>
> > ---
> > Changes from v2:
> > * folded "RISC-V: simplify BUILTIN_DTB processing" patch
> > [http://lists.infradead.org/pipermail/linux-riscv/2021-January/004153.html]
> > Changes from v1:
> > * no direct initial_boot_params assignment
> > * skips the temporary mapping for DT if BUILTIN_DTB=y
> >
> >  arch/riscv/Kconfig   |  1 -
> >  arch/riscv/kernel/head.S |  4 
> >  arch/riscv/mm/init.c | 19 +--
> >  3 files changed, 17 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 2ef05ef921b5..444a1ed1e847 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -445,7 +445,6 @@ endmenu
> >
> >  config BUILTIN_DTB
> > def_bool n
> > -   depends on RISCV_M_MODE
> > depends on OF
> >
> >  menu "Power management options"
> > diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> > index 16e9941900c4..f5a9bad86e58 100644
> > --- a/arch/riscv/kernel/head.S
> > +++ b/arch/riscv/kernel/head.S
> > @@ -260,7 +260,11 @@ clear_bss_done:
> >
> > /* Initialize page tables and relocate to virtual addresses */
> > la sp, init_thread_union + THREAD_SIZE
> > +#ifdef CONFIG_BUILTIN_DTB
> > +   la a0, __dtb_start
> > +#else
> > mv a0, s1
> > +#endif /* CONFIG_BUILTIN_DTB */
> > call setup_vm
> >  #ifdef CONFIG_MMU
> > la a0, early_pg_dir
> > diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> > index 30b61f2c6b87..45faad7c4291 100644
> > --- a/arch/riscv/mm/init.c
> > +++ b/arch/riscv/mm/init.c
> > @@ -192,10 +192,13 @@ void __init setup_bootmem(void)
> >  #endif /* CONFIG_BLK_DEV_INITRD */
> >
> > /*
> > -* Avoid using early_init_fdt_reserve_self() since __pa() does
> > +* If DTB is built in, no need to reserve its memblock.
> > +* Otherwise, do reserve it but avoid using
> > +* early_init_fdt_reserve_self() since __pa() does
> >  * not work for DTB pointers that are fixmap addresses
> >  */
> > -   memblock_reserve(dtb_early_pa, fdt_totalsize(dtb_early_va));
> > +   if (!IS_ENABLED(CONFIG_BUILTIN_DTB))
> > +   memblock_reserve(dtb_early_pa, fdt_totalsize(dtb_early_va));
> >
> > early_init_fdt_scan_reserved_mem();
> > dma_contiguous_reserve(dma32_phys_limit);
> > @@ -499,6 +502,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> > /* Setup early PMD for DTB */
> > create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
> >(uintptr_t)early_dtb_pmd, PGDIR_SIZE, 
> > PAGE_TABLE);
> > +#ifndef CONFIG_BUILTIN_DTB
> > /* Create two consecutive PMD mappings for FDT early scan */
> > pa = dtb_pa & ~(PMD_SIZE - 1);
> > create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
> > @@ -506,7 +510,11 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> > create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA + PMD_SIZE,
> >pa + PMD_SIZE, PMD_SIZE, PAGE_KERNEL);
> > dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PMD_SIZE - 
> > 1));
> > +#else /* CONFIG_BUILTIN_DTB */

Re: [PATCH v4] drivers/soc/litex: Add restart handler

2021-01-19 Thread Anup Patel
Hi,

On Tue, Jan 19, 2021 at 1:40 PM Geert Uytterhoeven  wrote:
>
> Let the LiteX SoC Controller register a restart handler, which resets
> the LiteX SoC by writing 1 to CSR_CTRL_RESET_ADDR.
>
> Signed-off-by: Geert Uytterhoeven 

We have the SBI System Reset Extension (SRST) in the upcoming
SBI v0.3 spec. Using this SBI extension, you will not require a
dedicated reboot driver in various projects such as the Linux kernel,
U-Boot, EDK2, the FreeBSD kernel, etc.

OpenSBI v0.9 (released yesterday) already has the SBI SRST
extension implemented, so we will just need platform hooks for
LiteX.

The Linux support for the SRST extension is already available on
LKML; so far there have been no comments: https://lkml.org/lkml/2020/11/25/6
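
For reference, a reboot through the SRST extension boils down to a
single SBI call; a minimal sketch (extension and function IDs taken
from the draft SBI v0.3 spec, written out here only for illustration):

/* Sketch only: cold reboot through the SBI SRST extension */
#define SBI_EXT_SRST                       0x53525354
#define SBI_SRST_RESET_TYPE_COLD_REBOOT    0x1
#define SBI_SRST_RESET_REASON_NONE         0x0

static void example_sbi_srst_reboot(void)
{
	/* Does not return on success; the firmware resets the system */
	sbi_ecall(SBI_EXT_SRST, SBI_EXT_SRST_RESET,
		  SBI_SRST_RESET_TYPE_COLD_REBOOT,
		  SBI_SRST_RESET_REASON_NONE, 0, 0, 0, 0);
}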

Regards,
Anup


> ---
> v4:
>   - Drop bogus "a" from description,
>   - Get rid of static litex_soc_ctrl_device and litex_reset_nb
> instances,
>   - Unregister handler on driver unbind,
>
> v3:
>   - Rebase on top of openrisc/for-next,
>
> v2:
>   - Rebase on top of v5.11-rc1,
>   - Change reset handler priority to recommended default value of 128
> (was 192).
>
> (v1 was not sent to a mailing list)
> ---
>  drivers/soc/litex/litex_soc_ctrl.c | 42 +-
>  1 file changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/soc/litex/litex_soc_ctrl.c 
> b/drivers/soc/litex/litex_soc_ctrl.c
> index da17ba56b7956c84..a7dd5be9fd5bd8ad 100644
> --- a/drivers/soc/litex/litex_soc_ctrl.c
> +++ b/drivers/soc/litex/litex_soc_ctrl.c
> @@ -15,6 +15,11 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +
> +/* reset register located at the base address */
> +#define RESET_REG_OFF   0x00
> +#define RESET_REG_VALUE 0x0001
>
>  #define SCRATCH_REG_OFF 0x04
>  #define SCRATCH_REG_VALUE   0x12345678
> @@ -66,8 +71,19 @@ static int litex_check_csr_access(void __iomem *reg_addr)
>
>  struct litex_soc_ctrl_device {
> void __iomem *base;
> +   struct notifier_block reset_nb;
>  };
>
> +static int litex_reset_handler(struct notifier_block *this, unsigned long 
> mode,
> +  void *cmd)
> +{
> +   struct litex_soc_ctrl_device *soc_ctrl_dev =
> +   container_of(this, struct litex_soc_ctrl_device, reset_nb);
> +
> +   litex_write32(soc_ctrl_dev->base + RESET_REG_OFF, RESET_REG_VALUE);
> +   return NOTIFY_DONE;
> +}
> +
>  static const struct of_device_id litex_soc_ctrl_of_match[] = {
> {.compatible = "litex,soc-controller"},
> {},
> @@ -78,6 +94,7 @@ MODULE_DEVICE_TABLE(of, litex_soc_ctrl_of_match);
>  static int litex_soc_ctrl_probe(struct platform_device *pdev)
>  {
> struct litex_soc_ctrl_device *soc_ctrl_dev;
> +   int error;
>
> soc_ctrl_dev = devm_kzalloc(&pdev->dev, sizeof(*soc_ctrl_dev), 
> GFP_KERNEL);
> if (!soc_ctrl_dev)
> @@ -87,7 +104,29 @@ static int litex_soc_ctrl_probe(struct platform_device 
> *pdev)
> if (IS_ERR(soc_ctrl_dev->base))
> return PTR_ERR(soc_ctrl_dev->base);
>
> -   return litex_check_csr_access(soc_ctrl_dev->base);
> +   error = litex_check_csr_access(soc_ctrl_dev->base);
> +   if (error)
> +   return error;
> +
> +   platform_set_drvdata(pdev, soc_ctrl_dev);
> +
> +   soc_ctrl_dev->reset_nb.notifier_call = litex_reset_handler;
> +   soc_ctrl_dev->reset_nb.priority = 128;
> +   error = register_restart_handler(&soc_ctrl_dev->reset_nb);
> +   if (error) {
> +   dev_warn(&pdev->dev, "cannot register restart handler: %d\n",
> +error);
> +   }
> +
> +   return 0;
> +}
> +
> +static int litex_soc_ctrl_remove(struct platform_device *pdev)
> +{
> +   struct litex_soc_ctrl_device *soc_ctrl_dev = 
> platform_get_drvdata(pdev);
> +
> +   unregister_restart_handler(&soc_ctrl_dev->reset_nb);
> +   return 0;
>  }
>
>  static struct platform_driver litex_soc_ctrl_driver = {
> @@ -96,6 +135,7 @@ static struct platform_driver litex_soc_ctrl_driver = {
> .of_match_table = of_match_ptr(litex_soc_ctrl_of_match)
> },
> .probe = litex_soc_ctrl_probe,
> +   .remove = litex_soc_ctrl_remove,
>  };
>
>  module_platform_driver(litex_soc_ctrl_driver);
> --
> 2.25.1
>
>
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

