Re: [PATCH v3 1/3] RISC-V: Issue a local tlbflush if possible.

2019-08-21 Thread Atish Patra
On Thu, 2019-08-22 at 06:27 +0200, h...@lst.de wrote:
> On Thu, Aug 22, 2019 at 04:01:24AM +, Atish Patra wrote:
> > The downside of this is that for every !cmask case in true SMP
> > (more
> > common probably) it will execute 2 extra cpumask instructions. As
> > tlbflush path is in performance critical path, I think we should
> > favor
> > more common case (SMP with more than 1 core).
> 
> Actually, looking at both the current mainline code, and the code
> from my
> cleanups tree I don't think remote_sfence_vma / __sbi_tlb_flush_range
> can ever be called with  NULL cpumask, as we always have a valid mm.
> 

Yes. You are correct.

As both cpumask functions here will crash if cpumask is null, we should
probably leave a harmless comment to warn the consequeunce of cpumask
being null.

> So this is a bit of a moot point, and we can drop andling that case
> entirely.  With that we can also use a simple if / else for the local
> cpu only vs remote case. 

Done.

>  Btw, what was the reason you didn't like
> using cpumask_any_but like x86, which should be more efficient than
> cpumask_test_cpu + hweigt?

I had it in v2 patch but removed as it can potentially return garbage
value if cpumask is empty. 

However, we are already checking empty cpumask before the local cpu
check. I will replace cpumask_test_cpu + hweight with
cpumask_any_but().

-- 
Regards,
Atish


Re: [PATCH v3 1/3] RISC-V: Issue a local tlbflush if possible.

2019-08-21 Thread h...@lst.de
On Thu, Aug 22, 2019 at 04:01:24AM +, Atish Patra wrote:
> The downside of this is that for every !cmask case in true SMP (more
> common probably) it will execute 2 extra cpumask instructions. As
> tlbflush path is in performance critical path, I think we should favor
> more common case (SMP with more than 1 core).

Actually, looking at both the current mainline code, and the code from my
cleanups tree I don't think remote_sfence_vma / __sbi_tlb_flush_range
can ever be called with  NULL cpumask, as we always have a valid mm.

So this is a bit of a moot point, and we can drop andling that case
entirely.  With that we can also use a simple if / else for the local
cpu only vs remote case.  Btw, what was the reason you didn't like
using cpumask_any_but like x86, which should be more efficient than
cpumask_test_cpu + hweigt?


Re: [PATCH v3 1/3] RISC-V: Issue a local tlbflush if possible.

2019-08-21 Thread Atish Patra
On Thu, 2019-08-22 at 03:46 +0200, Christoph Hellwig wrote:
> On Wed, Aug 21, 2019 at 05:46:42PM -0700, Atish Patra wrote:
> > In RISC-V, tlb flush happens via SBI which is expensive. If the
> > local
> > cpu is the only cpu in cpumask, there is no need to invoke a SBI
> > call.
> > 
> > Just do a local flush and return.
> > 
> > Signed-off-by: Atish Patra 
> > ---
> >  arch/riscv/mm/tlbflush.c | 15 +++
> >  1 file changed, 15 insertions(+)
> > 
> > diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> > index df93b26f1b9d..36430ee3bed9 100644
> > --- a/arch/riscv/mm/tlbflush.c
> > +++ b/arch/riscv/mm/tlbflush.c
> > @@ -2,6 +2,7 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  
> >  void flush_tlb_all(void)
> > @@ -13,9 +14,23 @@ static void __sbi_tlb_flush_range(struct cpumask
> > *cmask, unsigned long start,
> > unsigned long size)
> >  {
> > struct cpumask hmask;
> > +   unsigned int cpuid = get_cpu();
> >  
> > +   if (!cmask) {
> > +   riscv_cpuid_to_hartid_mask(cpu_online_mask, );
> > +   goto issue_sfence;
> > +   }
> > +
> > +   if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) ==
> > 1) {
> > +   local_flush_tlb_all();
> > +   goto done;
> > +   }
> 
> I think a single core on a SMP kernel is a valid enough use case
> given
> how litte distros still have UP kernels.  So Maybe this shiuld rather
> be:
> 
>   if (!cmask)
>   cmask = cpu_online_mask;
> 
>   if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) ==
> 1) {
>   local_flush_tlb_all();
>   } else {
>   riscv_cpuid_to_hartid_mask(cmask, );
>   sbi_remote_sfence_vma(hmask.bits, start, size);
>   }

The downside of this is that for every !cmask case in true SMP (more
common probably) it will execute 2 extra cpumask instructions. As
tlbflush path is in performance critical path, I think we should favor
more common case (SMP with more than 1 core).

Thoughts ?

-- 
Regards,
Atish


Re: [PATCH v3 1/3] RISC-V: Issue a local tlbflush if possible.

2019-08-21 Thread Christoph Hellwig
On Wed, Aug 21, 2019 at 05:46:42PM -0700, Atish Patra wrote:
> In RISC-V, tlb flush happens via SBI which is expensive. If the local
> cpu is the only cpu in cpumask, there is no need to invoke a SBI call.
> 
> Just do a local flush and return.
> 
> Signed-off-by: Atish Patra 
> ---
>  arch/riscv/mm/tlbflush.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
> index df93b26f1b9d..36430ee3bed9 100644
> --- a/arch/riscv/mm/tlbflush.c
> +++ b/arch/riscv/mm/tlbflush.c
> @@ -2,6 +2,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  void flush_tlb_all(void)
> @@ -13,9 +14,23 @@ static void __sbi_tlb_flush_range(struct cpumask *cmask, 
> unsigned long start,
>   unsigned long size)
>  {
>   struct cpumask hmask;
> + unsigned int cpuid = get_cpu();
>  
> + if (!cmask) {
> + riscv_cpuid_to_hartid_mask(cpu_online_mask, );
> + goto issue_sfence;
> + }
> +
> + if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
> + local_flush_tlb_all();
> + goto done;
> + }

I think a single core on a SMP kernel is a valid enough use case given
how litte distros still have UP kernels.  So Maybe this shiuld rather be:

if (!cmask)
cmask = cpu_online_mask;

if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
local_flush_tlb_all();
} else {
riscv_cpuid_to_hartid_mask(cmask, );
sbi_remote_sfence_vma(hmask.bits, start, size);
}


[PATCH v3 1/3] RISC-V: Issue a local tlbflush if possible.

2019-08-21 Thread Atish Patra
In RISC-V, tlb flush happens via SBI which is expensive. If the local
cpu is the only cpu in cpumask, there is no need to invoke a SBI call.

Just do a local flush and return.

Signed-off-by: Atish Patra 
---
 arch/riscv/mm/tlbflush.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index df93b26f1b9d..36430ee3bed9 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -2,6 +2,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 void flush_tlb_all(void)
@@ -13,9 +14,23 @@ static void __sbi_tlb_flush_range(struct cpumask *cmask, 
unsigned long start,
unsigned long size)
 {
struct cpumask hmask;
+   unsigned int cpuid = get_cpu();
 
+   if (!cmask) {
+   riscv_cpuid_to_hartid_mask(cpu_online_mask, );
+   goto issue_sfence;
+   }
+
+   if (cpumask_test_cpu(cpuid, cmask) && cpumask_weight(cmask) == 1) {
+   local_flush_tlb_all();
+   goto done;
+   }
riscv_cpuid_to_hartid_mask(cmask, );
+
+issue_sfence:
sbi_remote_sfence_vma(hmask.bits, start, size);
+done:
+   put_cpu();
 }
 
 void flush_tlb_mm(struct mm_struct *mm)
-- 
2.21.0