[PATCH] powerpc/imc-pmu: Use the correct spinlock initializer.

2023-03-09 Thread Sebastian Andrzej Siewior
The macro __SPIN_LOCK_INITIALIZER() is an implementation detail. Users
that want to statically initialize a spinlock inside a struct must use
__SPIN_LOCK_UNLOCKED().

Use __SPIN_LOCK_UNLOCKED() for the spinlock_t in imc_global_refc.
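
For reference, a minimal sketch of the intended pattern (not part of the
patch; the struct layout is abbreviated):

#include <linux/spinlock.h>

struct demo_ref {
	spinlock_t lock;
	unsigned int id;
	int refc;
};

/*
 * __SPIN_LOCK_UNLOCKED() is the public initializer for a spinlock_t inside
 * a statically initialized struct; it expands to whatever the current
 * spinlock implementation (including PREEMPT_RT) requires.
 */
static struct demo_ref demo_ref = {
	.lock	= __SPIN_LOCK_UNLOCKED(demo_ref.lock),
	.id	= 0,
	.refc	= 0,
};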

Fixes: 76d588dddc459 ("powerpc/imc-pmu: Fix use of mutex in IRQs disabled section")
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/perf/imc-pmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 9d229ef7f86ef..ada817c49b722 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -51,7 +51,7 @@ static int trace_imc_mem_size;
  * core and trace-imc
  */
 static struct imc_pmu_ref imc_global_refc = {
-   .lock = __SPIN_LOCK_INITIALIZER(imc_global_refc.lock),
+   .lock = __SPIN_LOCK_UNLOCKED(imc_global_refc.lock),
.id = 0,
.refc = 0,
 };
-- 
2.39.2



[PATCH] powerpc/pseries: Select the generic memory allocator.

2023-03-09 Thread Sebastian Andrzej Siewior
The RTAS work area allocator uses the generic memory allocator (genalloc),
so PPC_PSERIES must select GENERIC_ALLOCATOR.

Select the generic memory allocator on pseries.
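
For context, the RTAS work area allocator builds on the genalloc API that
CONFIG_GENERIC_ALLOCATOR provides. A minimal usage sketch (hypothetical
names, not the pseries code):

#include <linux/genalloc.h>

static struct gen_pool *demo_pool;

/* Carve small allocations out of one preallocated buffer. */
static int demo_pool_init(void *buf, size_t size)
{
	/* 64-byte minimum allocation granularity (2^6), no NUMA preference. */
	demo_pool = gen_pool_create(6, -1);
	if (!demo_pool)
		return -ENOMEM;

	return gen_pool_add(demo_pool, (unsigned long)buf, size, -1);
}

static void *demo_pool_alloc(size_t size)
{
	return (void *)gen_pool_alloc(demo_pool, size);
}

static void demo_pool_free(void *p, size_t size)
{
	gen_pool_free(demo_pool, (unsigned long)p, size);
}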

Fixes: 43033bc62d349 ("powerpc/pseries: add RTAS work area allocator")
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/platforms/pseries/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/platforms/pseries/Kconfig 
b/arch/powerpc/platforms/pseries/Kconfig
index b481c5c8bae11..1c23f2d4ed557 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -2,6 +2,7 @@
 config PPC_PSERIES
depends on PPC64 && PPC_BOOK3S
bool "IBM pSeries & new (POWER5-based) iSeries"
+   select GENERIC_ALLOCATOR
select HAVE_PCSPKR_PLATFORM
select MPIC
select OF_DYNAMIC
-- 
2.39.2



Re: [RFC PATCH RESEND 04/28] mm: move mmap_lock assert function definitions

2022-09-01 Thread Sebastian Andrzej Siewior
On 2022-09-01 16:24:09 [-0400], Kent Overstreet wrote:
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -60,6 +60,18 @@ static inline void __mmap_lock_trace_released(struct 
> > mm_struct *mm, bool write)
> >  
> >  #endif /* CONFIG_TRACING */
> >  
> > +static inline void mmap_assert_locked(struct mm_struct *mm)
> > +{
> > +   lockdep_assert_held(&mm->mmap_lock);
> > +   VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
> 
> These look redundant to me - maybe there's a reason the VM developers want 
> both,
> but I would drop the VM_BUG_ON() and just keep the lockdep_assert_held(), 
> since
> that's the standard way to write that assertion.

Exactly. rwsem_is_locked() returns true if the lock is "locked", not
necessarily by the caller. lockdep_assert_held() checks that the lock is
held by the caller - this is the important part.
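
A minimal illustration of that difference (not from the thread):

#include <linux/mmap_lock.h>
#include <linux/lockdep.h>
#include <linux/rwsem.h>

static void holder(struct mm_struct *mm)
{
	mmap_read_lock(mm);
	/* Passes: *this* task holds mmap_lock. */
	lockdep_assert_held(&mm->mmap_lock);
	mmap_read_unlock(mm);
}

static void bystander(struct mm_struct *mm)
{
	/* Suppose some other task currently holds mmap_lock ... */
	WARN_ON(!rwsem_is_locked(&mm->mmap_lock));	/* ... stays silent */
	lockdep_assert_held(&mm->mmap_lock);		/* ... splats */
}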

Sebastian


Re: [PATCH] arch/*: Disable softirq stacks on PREEMPT_RT.

2022-06-17 Thread Sebastian Andrzej Siewior
On 2022-06-15 17:41:45 [+0200], Arnd Bergmann wrote:
> Applied to the asm-generic tree with the above fixup, thanks!

Thank you Arnd.

>   Arnd

Sebastian


[PATCH] arch/*: Disable softirq stacks on PREEMPT_RT.

2022-06-14 Thread Sebastian Andrzej Siewior
PREEMPT_RT preempts softirqs and the current implementation avoids
do_softirq_own_stack() and only uses __do_softirq().

Disable the unused softirq stacks on PREEMPT_RT to save some memory and to
ensure that do_softirq_own_stack() is not used, because it is not expected
to be.

Signed-off-by: Sebastian Andrzej Siewior 
---

Initially I aimed only for the asm-generic bits and arm since I have
most bits of that port ready. Arnd then suggested doing all architectures
at once, and here it is.
I tried to keep it minimal in the sense that I didn't remove the dedicated
softirq stacks on parisc or powerpc, for instance. That would add another
few ifdefs and I don't know whether we would manage to get it up and
running on parisc. I do have the missing bits for powerpc, however ;)

 arch/arm/kernel/irq.c | 3 ++-
 arch/parisc/kernel/irq.c  | 2 ++
 arch/powerpc/kernel/irq.c | 4 
 arch/s390/include/asm/softirq_stack.h | 3 ++-
 arch/sh/kernel/irq.c  | 2 ++
 arch/sparc/kernel/irq_64.c| 2 ++
 include/asm-generic/softirq_stack.h   | 2 +-
 7 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index 5c6f8d11a3ce5..034cb48c9eeb8 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -70,6 +70,7 @@ static void __init init_irq_stacks(void)
}
 }
 
+#ifndef CONFIG_PREEMPT_RT
 static void do_softirq(void *arg)
 {
__do_softirq();
@@ -80,7 +81,7 @@ void do_softirq_own_stack(void)
call_with_stack(do_softirq, NULL,
__this_cpu_read(irq_stack_ptr));
 }
-
+#endif
 #endif
 
 int arch_show_interrupts(struct seq_file *p, int prec)
diff --git a/arch/parisc/kernel/irq.c b/arch/parisc/kernel/irq.c
index 0fe2d79fb123f..eba193bcdab1b 100644
--- a/arch/parisc/kernel/irq.c
+++ b/arch/parisc/kernel/irq.c
@@ -480,10 +480,12 @@ static void execute_on_irq_stack(void *func, unsigned 
long param1)
*irq_stack_in_use = 1;
 }
 
+#ifndef CONFIG_PREEMPT_RT
 void do_softirq_own_stack(void)
 {
execute_on_irq_stack(__do_softirq, 0);
 }
+#endif
 #endif /* CONFIG_IRQSTACKS */
 
 /* ONLY called from entry.S:intr_extint() */
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index dd09919c3c668..0822a274a549c 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -611,6 +611,7 @@ static inline void check_stack_overflow(void)
}
 }
 
+#ifndef CONFIG_PREEMPT_RT
 static __always_inline void call_do_softirq(const void *sp)
 {
/* Temporarily switch r1 to sp, call __do_softirq() then restore r1. */
@@ -629,6 +630,7 @@ static __always_inline void call_do_softirq(const void *sp)
   "r11", "r12"
);
 }
+#endif
 
 static __always_inline void call_do_irq(struct pt_regs *regs, void *sp)
 {
@@ -747,10 +749,12 @@ void *mcheckirq_ctx[NR_CPUS] __read_mostly;
 void *softirq_ctx[NR_CPUS] __read_mostly;
 void *hardirq_ctx[NR_CPUS] __read_mostly;
 
+#ifndef CONFIG_PREEMPT_RT
 void do_softirq_own_stack(void)
 {
call_do_softirq(softirq_ctx[smp_processor_id()]);
 }
+#endif
 
 irq_hw_number_t virq_to_hw(unsigned int virq)
 {
diff --git a/arch/s390/include/asm/softirq_stack.h 
b/arch/s390/include/asm/softirq_stack.h
index fd17f25704bd5..af68d6c1d5840 100644
--- a/arch/s390/include/asm/softirq_stack.h
+++ b/arch/s390/include/asm/softirq_stack.h
@@ -5,9 +5,10 @@
 #include 
 #include 
 
+#ifndef CONFIG_PREEMPT_RT
 static inline void do_softirq_own_stack(void)
 {
call_on_stack(0, S390_lowcore.async_stack, void, __do_softirq);
 }
-
+#endif
 #endif /* __ASM_S390_SOFTIRQ_STACK_H */
diff --git a/arch/sh/kernel/irq.c b/arch/sh/kernel/irq.c
index ef0f0827cf575..2d3eca8fee011 100644
--- a/arch/sh/kernel/irq.c
+++ b/arch/sh/kernel/irq.c
@@ -149,6 +149,7 @@ void irq_ctx_exit(int cpu)
hardirq_ctx[cpu] = NULL;
 }
 
+#ifndef CONFIG_PREEMPT_RT
 void do_softirq_own_stack(void)
 {
struct thread_info *curctx;
@@ -176,6 +177,7 @@ void do_softirq_own_stack(void)
  "r5", "r6", "r7", "r8", "r9", "r15", "t", "pr"
);
 }
+#endif
 #else
 static inline void handle_one_irq(unsigned int irq)
 {
diff --git a/arch/sparc/kernel/irq_64.c b/arch/sparc/kernel/irq_64.c
index c8848bb681a11..41fa1be980a33 100644
--- a/arch/sparc/kernel/irq_64.c
+++ b/arch/sparc/kernel/irq_64.c
@@ -855,6 +855,7 @@ void __irq_entry handler_irq(int pil, struct pt_regs *regs)
set_irq_regs(old_regs);
 }
 
+#ifndef CONFIG_PREEMPT_RT
 void do_softirq_own_stack(void)
 {
void *orig_sp, *sp = softirq_stack[smp_processor_id()];
@@ -869,6 +870,7 @@ void do_softirq_own_stack(void)
__asm__ __volatile__("mov %0, %%sp"
 : : "r" (orig_sp));
 }
+#endif
 
 #ifdef CONFIG_HOTPLUG_CPU
 void fixup_irqs(void)
diff --git a/include/asm-generic/softirq_stack.h 
b/include/asm-gen
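
(The final hunk, against include/asm-generic/softirq_stack.h, is cut off in
this archive. The generic fallback it adjusts looks roughly like the sketch
below: on PREEMPT_RT the inline fallback is used and __do_softirq() simply
runs on the current stack.)

#if defined(CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK) && !defined(CONFIG_PREEMPT_RT)
void do_softirq_own_stack(void);
#else
static inline void do_softirq_own_stack(void)
{
	__do_softirq();
}
#endif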

Re: [PATCH 09/10] scsi/ibmvscsi: Replace srp tasklet with work

2022-06-14 Thread 'Sebastian Andrzej Siewior'
On 2022-06-09 15:46:04 [+], David Laight wrote:
> From: Sebastian Andrzej Siewior
> > Sent: 09 June 2022 16:03
> > 
> > On 2022-05-30 16:15:11 [-0700], Davidlohr Bueso wrote:
> > > Tasklets have long been deprecated as being too heavy on the system
> > > by running in irq context - and this is not a performance critical
> > > path. If a higher priority process wants to run, it must wait for
> > > the tasklet to finish before doing so.
> > >
> > > Process srps asynchronously in process context in a dedicated
> > > single threaded workqueue.
> > 
> > I would suggest threaded interrupts instead. The pattern here is the
> > same as in the previous driver except here is less locking.
> 
> How long do these actions runs for, and what is waiting for
> them to finish?

That is something that someone with the hardware and workload can answer.

> These changes seem to drop the priority from above that of the
> highest priority RT process down to that of a default priority
> user process.
> There is no real guarantee that the latter will run 'any time soon'.

Not sure I can follow. Threaded interrupts run at SCHED_FIFO priority 50
by default. A workqueue worker, however, runs at SCHED_OTHER. On the other
hand, it is not bound to any particular CPU, so it will run on whichever
CPU is available.
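
For reference, the threaded-interrupt shape being suggested (a sketch with
hypothetical names, not driver code):

#include <linux/interrupt.h>

/* Hard handler: quiesce the interrupt source and defer the real work. */
static irqreturn_t demo_hardirq(int irq, void *data)
{
	disable_irq_nosync(irq);	/* or mask at the device level */
	return IRQ_WAKE_THREAD;
}

/*
 * Threaded handler: task context, SCHED_FIFO priority 50 by default,
 * may sleep and may take regular spinlocks.
 */
static irqreturn_t demo_irq_thread(int irq, void *data)
{
	/* drain the queue / handle completions here */
	enable_irq(irq);
	return IRQ_HANDLED;
}

static int demo_setup_irq(unsigned int irq, void *dev)
{
	return request_threaded_irq(irq, demo_hardirq, demo_irq_thread, 0,
				    "demo", dev);
}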

> Consider some workloads I'm setting up where most of the cpu are
> likely to spend 90%+ of the time running processes under the RT
> scheduler that are processing audio.
> 
> It is quite likely that a non-RT thread (especially one bound
> to a specific cpu) won't run for several milliseconds.
> (We have to go through 'hoops' to avoid dropping ethernet frames.)
> 
> I'd have thought that some of these kernel threads really
> need to run at a 'middling' RT priority.

The threaded interrupts do this by default. If you run your own RT
threads you need to decide if they are more or less important than the
interrupts.

>   David

Sebastian


Re: [PATCH 09/10] scsi/ibmvscsi: Replace srp tasklet with work

2022-06-09 Thread Sebastian Andrzej Siewior
On 2022-05-30 16:15:11 [-0700], Davidlohr Bueso wrote:
> Tasklets have long been deprecated as being too heavy on the system
> by running in irq context - and this is not a performance critical
> path. If a higher priority process wants to run, it must wait for
> the tasklet to finish before doing so.
> 
> Process srps asynchronously in process context in a dedicated
> single threaded workqueue.

I would suggest threaded interrupts instead. The pattern here is the
same as in the previous driver except here is less locking.

Sebastian


Re: [PATCH 08/10] scsi/ibmvfc: Replace tasklet with work

2022-06-09 Thread Sebastian Andrzej Siewior
On 2022-05-30 16:15:10 [-0700], Davidlohr Bueso wrote:
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index d0eab5700dc5..31b1900489e7 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -891,7 +891,7 @@ static void ibmvfc_release_crq_queue(struct ibmvfc_host 
> *vhost)
>  
>   ibmvfc_dbg(vhost, "Releasing CRQ\n");
>   free_irq(vdev->irq, vhost);
> - tasklet_kill(&vhost->tasklet);
> +cancel_work_sync(&vhost->work);
s/ {8}/\t/

Is there a reason not to use threaded interrupts? The workqueue _might_
migrate to another CPU. The locking ensures that nothing bad happens but
ibmvfc_tasklet() has this piece:

| spin_lock_irqsave(vhost->host->host_lock, flags);
…
| while ((async = ibmvfc_next_async_crq(vhost)) != NULL) {
| ibmvfc_handle_async(async, vhost);
| async->valid = 0;
| wmb();
| }
…
| vio_enable_interrupts(vdev);
potentially enables interrupts, which may fire right away.

| if ((async = ibmvfc_next_async_crq(vhost)) != NULL) {
| vio_disable_interrupts(vdev);

disables it again.

| }
| 
| spin_unlock(vhost->crq.q_lock);
| spin_unlock_irqrestore(vhost->host->host_lock, flags);

If the worker runs on another CPU then the CPU servicing the interrupt
will be blocked on the lock, which is not nice.

My guess is that you could enable interrupts right before unlocking, but
that is a different story.

>   do {
>   if (rc)
>   msleep(100);

Sebastian


Re: [PATCH 06/10] scsi/ibmvscsi_tgt: Replace work tasklet with threaded irq

2022-06-03 Thread Sebastian Andrzej Siewior
On 2022-05-30 16:15:08 [-0700], Davidlohr Bueso wrote:
> diff --git a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c 
> b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> index eee1a24f7e15..fafadb7158a3 100644
> --- a/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> +++ b/drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c
> @@ -2948,9 +2948,8 @@ static irqreturn_t ibmvscsis_interrupt(int dummy, void 
> *data)
>   struct scsi_info *vscsi = data;
>  
>   vio_disable_interrupts(vscsi->dma_dev);
> - tasklet_schedule(&vscsi->work_task);
looks good.

> - return IRQ_HANDLED;
> + return IRQ_WAKE_THREAD;
>  }
>  
>  /**
> @@ -3340,7 +3339,7 @@ static void ibmvscsis_handle_crq(unsigned long data)
>   dev_dbg(&vscsi->dev, "handle_crq, don't process: flags 0x%x, 
> state 0x%hx\n",
>   vscsi->flags, vscsi->state);
>   spin_unlock_bh(&vscsi->intr_lock);

So if you move it away from the tasklet you can replace the spin_lock_bh()
with spin_lock() since BH does not matter anymore. The exception is a
timer, because BH locking matters for a timer_list timer, which is invoked
in softirq context. That isn't the case here, except that there is an
hrtimer invoking ibmvscsis_service_wait_q(). This is bad because an hrtimer
is invoked in hard-irq context, so a BH lock must not be acquired there;
lockdep would scream here. You could use HRTIMER_MODE_REL_SOFT, which
would make it a BH timer. Then you could keep the BH locking, but actually
you want to get rid of it ;)
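
For reference, a minimal sketch of the HRTIMER_MODE_REL_SOFT variant
(hypothetical names, not the ibmvscsi_tgt code):

#include <linux/hrtimer.h>
#include <linux/ktime.h>

static struct hrtimer demo_timer;

/*
 * With HRTIMER_MODE_REL_SOFT the callback runs in softirq context, so it
 * still excludes spin_lock_bh() users instead of racing with them from
 * hard-irq context.
 */
static enum hrtimer_restart demo_timer_fn(struct hrtimer *t)
{
	return HRTIMER_NORESTART;
}

static void demo_timer_setup(void)
{
	hrtimer_init(&demo_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_SOFT);
	demo_timer.function = demo_timer_fn;
	hrtimer_start(&demo_timer, ms_to_ktime(10), HRTIMER_MODE_REL_SOFT);
}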

> - return;
> + goto done;
>   }
>  
>   rc = vscsi->flags & SCHEDULE_DISCONNECT;

Sebastian


[PATCH 11/11] locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h

2021-11-29 Thread Sebastian Andrzej Siewior
kernel.org
Cc: linux-i...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux-xte...@linux-xtensa.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/alpha/include/asm/spinlock_types.h  | 2 +-
 arch/arm/include/asm/spinlock_types.h| 2 +-
 arch/arm64/include/asm/spinlock_types.h  | 2 +-
 arch/csky/include/asm/spinlock_types.h   | 2 +-
 arch/hexagon/include/asm/spinlock_types.h| 2 +-
 arch/ia64/include/asm/spinlock_types.h   | 2 +-
 arch/powerpc/include/asm/simple_spinlock_types.h | 2 +-
 arch/powerpc/include/asm/spinlock_types.h| 2 +-
 arch/riscv/include/asm/spinlock_types.h  | 2 +-
 arch/s390/include/asm/spinlock_types.h   | 2 +-
 arch/sh/include/asm/spinlock_types.h | 2 +-
 arch/xtensa/include/asm/spinlock_types.h | 2 +-
 include/linux/ratelimit_types.h  | 2 +-
 include/linux/spinlock_types_up.h| 2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/alpha/include/asm/spinlock_types.h 
b/arch/alpha/include/asm/spinlock_types.h
index 1d5716bc060be..2526fd3be5fd7 100644
--- a/arch/alpha/include/asm/spinlock_types.h
+++ b/arch/alpha/include/asm/spinlock_types.h
@@ -2,7 +2,7 @@
 #ifndef _ALPHA_SPINLOCK_TYPES_H
 #define _ALPHA_SPINLOCK_TYPES_H
 
-#ifndef __LINUX_SPINLOCK_TYPES_H
+#ifndef __LINUX_SPINLOCK_TYPES_RAW_H
 # error "please don't include this file directly"
 #endif
 
diff --git a/arch/arm/include/asm/spinlock_types.h 
b/arch/arm/include/asm/spinlock_types.h
index 5976958647fe1..0c14b36ef1013 100644
--- a/arch/arm/include/asm/spinlock_types.h
+++ b/arch/arm/include/asm/spinlock_types.h
@@ -2,7 +2,7 @@
 #ifndef __ASM_SPINLOCK_TYPES_H
 #define __ASM_SPINLOCK_TYPES_H
 
-#ifndef __LINUX_SPINLOCK_TYPES_H
+#ifndef __LINUX_SPINLOCK_TYPES_RAW_H
 # error "please don't include this file directly"
 #endif
 
diff --git a/arch/arm64/include/asm/spinlock_types.h 
b/arch/arm64/include/asm/spinlock_types.h
index 18782f0c47212..11ab1c0776977 100644
--- a/arch/arm64/include/asm/spinlock_types.h
+++ b/arch/arm64/include/asm/spinlock_types.h
@@ -5,7 +5,7 @@
 #ifndef __ASM_SPINLOCK_TYPES_H
 #define __ASM_SPINLOCK_TYPES_H
 
-#if !defined(__LINUX_SPINLOCK_TYPES_H) && !defined(__ASM_SPINLOCK_H)
+#if !defined(__LINUX_SPINLOCK_TYPES_RAW_H) && !defined(__ASM_SPINLOCK_H)
 # error "please don't include this file directly"
 #endif
 
diff --git a/arch/csky/include/asm/spinlock_types.h 
b/arch/csky/include/asm/spinlock_types.h
index 8ff0f6ff3a006..db87a12c3827d 100644
--- a/arch/csky/include/asm/spinlock_types.h
+++ b/arch/csky/include/asm/spinlock_types.h
@@ -3,7 +3,7 @@
 #ifndef __ASM_CSKY_SPINLOCK_TYPES_H
 #define __ASM_CSKY_SPINLOCK_TYPES_H
 
-#ifndef __LINUX_SPINLOCK_TYPES_H
+#ifndef __LINUX_SPINLOCK_TYPES_RAW_H
 # error "please don't include this file directly"
 #endif
 
diff --git a/arch/hexagon/include/asm/spinlock_types.h 
b/arch/hexagon/include/asm/spinlock_types.h
index 19d233497ba52..d5f66495b670f 100644
--- a/arch/hexagon/include/asm/spinlock_types.h
+++ b/arch/hexagon/include/asm/spinlock_types.h
@@ -8,7 +8,7 @@
 #ifndef _ASM_SPINLOCK_TYPES_H
 #define _ASM_SPINLOCK_TYPES_H
 
-#ifndef __LINUX_SPINLOCK_TYPES_H
+#ifndef __LINUX_SPINLOCK_TYPES_RAW_H
 # error "please don't include this file directly"
 #endif
 
diff --git a/arch/ia64/include/asm/spinlock_types.h 
b/arch/ia64/include/asm/spinlock_types.h
index 6e345fefcdcab..14b8a161c1652 100644
--- a/arch/ia64/include/asm/spinlock_types.h
+++ b/arch/ia64/include/asm/spinlock_types.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_IA64_SPINLOCK_TYPES_H
 #define _ASM_IA64_SPINLOCK_TYPES_H
 
-#ifndef __LINUX_SPINLOCK_TYPES_H
+#ifndef __LINUX_SPINLOCK_TYPES_RAW_H
 # error "please don't include this file directly"
 #endif
 
diff --git a/arch/powerpc/include/asm/simple_spinlock_types.h 
b/arch/powerpc/include/asm/simple_spinlock_types.h
index 0f3cdd8faa959..08243338069d2 100644
--- a/arch/powerpc/include/asm/simple_spinlock_types.h
+++ b/arch/powerpc/include/asm/simple_spinlock_types.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_POWERPC_SIMPLE_SPINLOCK_TYPES_H
 #define _ASM_POWERPC_SIMPLE_SPINLOCK_TYPES_H
 
-#ifndef __LINUX_SPINLOCK_TYPES_H
+#ifndef __LINUX_SPINLOCK_TYPES_RAW_H
 # error "please don't include this file directly"
 #endif
 
diff --git a/arch/powerpc/include/asm/spinlock_types.h 
b/arch/powerpc/include/asm/spinlock_types.h
index c5d742f18021d..d5f8a74ed2e8c 100644
--- a/arch/powerpc/include/asm/spinlock_types.h
+++ b/arch/powerpc/include/asm/spinlock_types.h
@@ -2,7 +2,7 @@
 #ifndef _ASM_POWERPC_SPINLOCK_TYPES_H
 #define _ASM_POWERPC_SPINLOCK_TYPES_H
 
-#ifndef __LINUX_SPINLOCK_TYPES_H
+#ifndef __LINUX_SPINLOCK_TYPES_RAW_H
 # error "please don't include this file directly"
 #endif
 
diff --git a/arch/riscv/include

Re: [PATCH 00/38] Replace deprecated CPU-hotplug

2021-08-03 Thread Sebastian Andrzej Siewior
On 2021-08-03 17:30:40 [+0200], Hans de Goede wrote:
> Hi Sebastien,
Hi Hans,

> On 8/3/21 4:15 PM, Sebastian Andrzej Siewior wrote:
> > This is a tree wide replacement of the deprecated CPU hotplug functions
> > which are only wrappers around the actual functions.
> > 
> > Each patch is independent and can be picked up by the relevant maintainer.
> 
> Ok; and I take it that then also is the plan for merging these ?
> 
> FWIW I'm fine with the drivers/platform/x86 patch going upstream
> through some other tree if its easier to keep the set together ...

There is no need to keep that set together since each patch is
independent. Please merge it through your tree.

> Regards,
> 
> Hans

Sebastian


[PATCH 00/38] Replace deprecated CPU-hotplug

2021-08-03 Thread Sebastian Andrzej Siewior
This is a tree wide replacement of the deprecated CPU hotplug functions
which are only wrappers around the actual functions.

Each patch is independent and can be picked up by the relevant maintainer.

Cc: Alexander Shishkin 
Cc: Amit Kucheria 
Cc: Andrew Morton 
Cc: Andy Lutomirski 
Cc: Arnaldo Carvalho de Melo 
Cc: Arnd Bergmann 
Cc: Benjamin Herrenschmidt 
Cc: Ben Segall 
Cc: Borislav Petkov 
Cc: cgro...@vger.kernel.org
Cc: Christian Borntraeger 
Cc: coresi...@lists.linaro.org
Cc: Daniel Bristot de Oliveira 
Cc: Daniel Jordan 
Cc: Daniel Lezcano 
Cc: Dave Hansen 
Cc: Davidlohr Bueso 
Cc: "David S. Miller" 
Cc: Dietmar Eggemann 
Cc: Gonglei 
Cc: Greg Kroah-Hartman 
Cc: Guenter Roeck 
Cc: Hans de Goede 
Cc: Heiko Carstens 
Cc: Herbert Xu 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Ingo Molnar 
Cc: Jakub Kicinski 
Cc: Jason Wang 
Cc: Jean Delvare 
Cc: Jiri Kosina 
Cc: Jiri Olsa 
Cc: Joe Lawrence 
Cc: Joel Fernandes 
Cc: Johannes Weiner 
Cc: John Stultz 
Cc: Jonathan Corbet 
Cc: Josh Poimboeuf 
Cc: Josh Triplett 
Cc: Julian Wiedmann 
Cc: Juri Lelli 
Cc: Karol Herbst 
Cc: Karsten Graul 
Cc: kvm-...@vger.kernel.org
Cc: Lai Jiangshan 
Cc: Len Brown 
Cc: Len Brown 
Cc: Leo Yan 
Cc: linux-a...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-cry...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-e...@vger.kernel.org
Cc: linux-hw...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-r...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: live-patch...@vger.kernel.org
Cc: Mark Gross 
Cc: Mark Rutland 
Cc: Mathieu Desnoyers 
Cc: Mathieu Poirier 
Cc: Mel Gorman 
Cc: Michael Ellerman 
Cc: "Michael S. Tsirkin" 
Cc: Mike Leach 
Cc: Mike Travis 
Cc: Miroslav Benes 
Cc: Namhyung Kim 
Cc: net...@vger.kernel.org
Cc: nouv...@lists.freedesktop.org
Cc: "Paul E. McKenney" 
Cc: Paul Mackerras 
Cc: Pavel Machek 
Cc: Pekka Paalanen 
Cc: Peter Zijlstra 
Cc: Petr Mladek 
Cc: platform-driver-...@vger.kernel.org
Cc: "Rafael J. Wysocki" 
Cc: r...@vger.kernel.org
Cc: Robin Holt 
Cc: Song Liu 
Cc: Srinivas Pandruvada 
Cc: Steffen Klassert 
Cc: Stephen Boyd 
Cc: Steven Rostedt 
Cc: Steve Wahl 
Cc: Stuart Hayes 
Cc: Suzuki K Poulose 
Cc: Tejun Heo 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: Tony Luck 
Cc: Vasily Gorbik 
Cc: Vincent Guittot 
Cc: Viresh Kumar 
Cc: virtualizat...@lists.linux-foundation.org
Cc: x...@kernel.org
Cc: Zefan Li 
Cc: Zhang Rui 

Sebastian



[PATCH 03/38] powerpc: Replace deprecated CPU-hotplug functions.

2021-08-03 Thread Sebastian Andrzej Siewior
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
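
For illustration, the replacement keeps the same semantics, a read-side
section during which no CPU can be plugged or unplugged:

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/printk.h>

static void demo_walk_online_cpus(void)
{
	unsigned int cpu;

	cpus_read_lock();		/* was: get_online_cpus() */
	for_each_online_cpu(cpu)
		pr_info("cpu %u is online\n", cpu);
	cpus_read_unlock();		/* was: put_online_cpus() */
}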

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: kvm-...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/kernel/rtasd.c   |  4 ++--
 arch/powerpc/kvm/book3s_hv_builtin.c  | 10 +-
 arch/powerpc/platforms/powernv/idle.c |  4 ++--
 arch/powerpc/platforms/powernv/opal-imc.c |  8 
 4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 8561dfb33f241..32ee17753eb4a 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -429,7 +429,7 @@ static void rtas_event_scan(struct work_struct *w)
 
do_event_scan();
 
-   get_online_cpus();
+   cpus_read_lock();
 
/* raw_ OK because just using CPU as starting point. */
cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
@@ -451,7 +451,7 @@ static void rtas_event_scan(struct work_struct *w)
schedule_delayed_work_on(cpu, &event_scan_work,
__round_jiffies_relative(event_scan_delay, cpu));
 
-   put_online_cpus();
+   cpus_read_unlock();
 }
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c 
b/arch/powerpc/kvm/book3s_hv_builtin.c
index be8ef1c5b1bfb..fcf4760a3a0ea 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -137,23 +137,23 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int 
target,
  * exist in the system. We use a counter of VMs to track this.
  *
  * One of the operations we need to block is onlining of secondaries, so we
- * protect hv_vm_count with get/put_online_cpus().
+ * protect hv_vm_count with cpus_read_lock/unlock().
  */
 static atomic_t hv_vm_count;
 
 void kvm_hv_vm_activated(void)
 {
-   get_online_cpus();
+   cpus_read_lock();
atomic_inc(&hv_vm_count);
-   put_online_cpus();
+   cpus_read_unlock();
 }
 EXPORT_SYMBOL_GPL(kvm_hv_vm_activated);
 
 void kvm_hv_vm_deactivated(void)
 {
-   get_online_cpus();
+   cpus_read_lock();
atomic_dec(&hv_vm_count);
-   put_online_cpus();
+   cpus_read_unlock();
 }
 EXPORT_SYMBOL_GPL(kvm_hv_vm_deactivated);
 
diff --git a/arch/powerpc/platforms/powernv/idle.c 
b/arch/powerpc/platforms/powernv/idle.c
index 528a7e0cf83aa..aa27689b832db 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -199,12 +199,12 @@ static ssize_t 
store_fastsleep_workaround_applyonce(struct device *dev,
 */
power7_fastsleep_workaround_exit = false;
 
-   get_online_cpus();
+   cpus_read_lock();
primary_thread_mask = cpu_online_cores_map();
on_each_cpu_mask(&primary_thread_mask,
pnv_fastsleep_workaround_apply,
&err, 1);
-   put_online_cpus();
+   cpus_read_unlock();
if (err) {
pr_err("fastsleep_workaround_applyonce change failed while 
running pnv_fastsleep_workaround_apply");
goto fail;
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c 
b/arch/powerpc/platforms/powernv/opal-imc.c
index 7824cc364bc40..ba02a75c14102 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -186,7 +186,7 @@ static void disable_nest_pmu_counters(void)
int nid, cpu;
const struct cpumask *l_cpumask;
 
-   get_online_cpus();
+   cpus_read_lock();
for_each_node_with_cpus(nid) {
l_cpumask = cpumask_of_node(nid);
cpu = cpumask_first_and(l_cpumask, cpu_online_mask);
@@ -195,7 +195,7 @@ static void disable_nest_pmu_counters(void)
opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST,
   get_hard_smp_processor_id(cpu));
}
-   put_online_cpus();
+   cpus_read_unlock();
 }
 
 static void disable_core_pmu_counters(void)
@@ -203,7 +203,7 @@ static void disable_core_pmu_counters(void)
cpumask_t cores_map;
int cpu, rc;
 
-   get_online_cpus();
+   cpus_read_lock();
/* Disable the IMC Core functions */
cores_map = cpu_online_cores_map();
for_each_cpu(cpu, &cores_map) {
@@ -213,7 +213,7 @@ static void disable_core_pmu_counters(void)
pr_err("%s: Failed to stop Core (cpu = %d)\n",
__FUNCTION__, cpu);
}
-   put_online_cpus();
+   cpus_read_unlock();
 }
 
 int get_max_nest_dev(void)
-- 
2.32.0



[PATCH] powerpc/mm: Move the linear_mapping_mutex to the ifdef where it is used

2021-02-19 Thread Sebastian Andrzej Siewior
The mutex linear_mapping_mutex is defined at the top of the file while its
only two users are within the CONFIG_MEMORY_HOTPLUG block.
A compile without CONFIG_MEMORY_HOTPLUG set fails on PREEMPT_RT because
its mutex implementation is smart enough to realize that it is unused.

Move the definition of linear_mapping_mutex into the ifdef block where it
is used.

Fixes: 1f73ad3e8d755 ("powerpc/mm: print warning in arch_remove_linear_mapping()")
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/mm/mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index afab328d08874..d6c3f0b79f1d1 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -54,7 +54,6 @@
 
 #include 
 
-static DEFINE_MUTEX(linear_mapping_mutex);
 unsigned long long memory_limit;
 bool init_mem_is_free;
 
@@ -72,6 +71,7 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned 
long pfn,
 EXPORT_SYMBOL(phys_mem_access_prot);
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+static DEFINE_MUTEX(linear_mapping_mutex);
 
 #ifdef CONFIG_NUMA
 int memory_add_physaddr_to_nid(u64 start)
-- 
2.30.0



Re: [patch V3 13/37] mips/mm/highmem: Switch to generic kmap atomic

2021-01-11 Thread Sebastian Andrzej Siewior
On 2021-01-09 01:33:52 [+0100], Thomas Bogendoerfer wrote:
> On Sat, Jan 09, 2021 at 12:58:05AM +0100, Thomas Bogendoerfer wrote:
> > On Fri, Jan 08, 2021 at 08:20:43PM +, Paul Cercueil wrote:
> > > Hi Thomas,
> > > 
> > > 5.11 does not boot anymore on Ingenic SoCs, I bisected it to this commit.
> > > 
> > > Any idea what could be happening?
> > 
> > not yet, kernel crash log of a Malta QEMU is below.
> 
> update:
> 
> This dirty hack lets the Malta QEMU boot again:
> 
> diff --git a/mm/highmem.c b/mm/highmem.c
> index c3a9ea7875ef..190cdda1149d 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -515,7 +515,7 @@ void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t 
> prot)
>   vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
>   BUG_ON(!pte_none(*(kmap_pte - idx)));
>   pteval = pfn_pte(pfn, prot);
> - set_pte_at(&init_mm, vaddr, kmap_pte - idx, pteval);
> + set_pte(kmap_pte - idx, pteval);
>   arch_kmap_local_post_map(vaddr, pteval);
>   current->kmap_ctrl.pteval[kmap_local_idx()] = pteval;
>   preempt_enable();
> 
> set_pte_at() tries to update cache and could do an kmap_atomic() there.
So the old implementation used set_pte() while the new one uses
set_pte_at().

> Not sure, if this is allowed at this point.
The problem is the recursion
  kmap_atomic() -> __update_cache() -> kmap_atomic()

and kmap_local_idx_push() runs out of index space before stack space.

I'm not sure whether __update_cache() ever worked for highmem. Highmem
handling was added to it in commit
   f4281bba81810 ("MIPS: Handle highmem pages in __update_cache")

but it assumes that the address returned by kmap_atomic() is the same or
related enough for flush_data_cache_page() to work.

> Thomas.
> 

Sebastian


Re: [patch V3 10/37] ARM: highmem: Switch to generic kmap atomic

2020-11-12 Thread Sebastian Andrzej Siewior
On 2020-11-12 09:10:34 [+0100], Marek Szyprowski wrote:
> I can do more tests to help fixing this issue. Just let me know what to do.

-> https://lkml.kernel.org/r/87y2j6n8mj@nanos.tec.linutronix.de

Sebastian


[PATCH v2 net-next 3/3] crypto: caam: Replace in_irq() usage.

2020-11-01 Thread Sebastian Andrzej Siewior
The driver uses in_irq() + in_serving_softirq() magic to decide whether
NAPI scheduling is required or the packet can be processed directly.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

Use the `sched_napi' argument passed by the callback. It is set true if
called from the interrupt handler and NAPI should be scheduled.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: "Horia Geantă" 
Cc: Aymen Sghaier 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Madalin Bucur 
Cc: Jakub Kicinski 
Cc: Li Yang 
Cc: linux-cry...@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/crypto/caam/qi.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/caam/qi.c b/drivers/crypto/caam/qi.c
index 57f6ab6bfb56a..8163f5df8ebf7 100644
--- a/drivers/crypto/caam/qi.c
+++ b/drivers/crypto/caam/qi.c
@@ -545,14 +545,10 @@ static void cgr_cb(struct qman_portal *qm, struct 
qman_cgr *cgr, int congested)
}
 }
 
-static int caam_qi_napi_schedule(struct qman_portal *p, struct caam_napi *np)
+static int caam_qi_napi_schedule(struct qman_portal *p, struct caam_napi *np,
+bool sched_napi)
 {
-   /*
-* In case of threaded ISR, for RT kernels in_irq() does not return
-* appropriate value, so use in_serving_softirq to distinguish between
-* softirq and irq contexts.
-*/
-   if (unlikely(in_irq() || !in_serving_softirq())) {
+   if (sched_napi) {
/* Disable QMan IRQ source and invoke NAPI */
qman_p_irqsource_remove(p, QM_PIRQ_DQRI);
np->p = p;
@@ -574,7 +570,7 @@ static enum qman_cb_dqrr_result caam_rsp_fq_dqrr_cb(struct 
qman_portal *p,
struct caam_drv_private *priv = dev_get_drvdata(qidev);
u32 status;
 
-   if (caam_qi_napi_schedule(p, caam_napi))
+   if (caam_qi_napi_schedule(p, caam_napi, sched_napi))
return qman_cb_dqrr_stop;
 
fd = &dqrr->fd;
-- 
2.29.1



[PATCH v2 net-next 2/3] net: dpaa: Replace in_irq() usage.

2020-11-01 Thread Sebastian Andrzej Siewior
The driver uses in_irq() + in_serving_softirq() magic to decide whether
NAPI scheduling is required or the packet can be processed directly.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

Use the `sched_napi' argument passed by the callback. It is set true if
called from the interrupt handler and NAPI should be scheduled.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: "Horia Geantă" 
Cc: Aymen Sghaier 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Madalin Bucur 
Cc: Jakub Kicinski 
Cc: Li Yang 
Cc: linux-cry...@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 98ead77c673e5..39c996b64ccec 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2300,9 +2300,9 @@ static void dpaa_tx_conf(struct net_device *net_dev,
 }
 
 static inline int dpaa_eth_napi_schedule(struct dpaa_percpu_priv *percpu_priv,
-struct qman_portal *portal)
+struct qman_portal *portal, bool 
sched_napi)
 {
-   if (unlikely(in_irq() || !in_serving_softirq())) {
+   if (sched_napi) {
/* Disable QMan IRQ and invoke NAPI */
qman_p_irqsource_remove(portal, QM_PIRQ_DQRI);
 
@@ -2333,7 +2333,7 @@ static enum qman_cb_dqrr_result rx_error_dqrr(struct 
qman_portal *portal,
 
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
-   if (dpaa_eth_napi_schedule(percpu_priv, portal))
+   if (dpaa_eth_napi_schedule(percpu_priv, portal, sched_napi))
return qman_cb_dqrr_stop;
 
dpaa_eth_refill_bpools(priv);
@@ -2377,7 +2377,7 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct 
qman_portal *portal,
percpu_priv = this_cpu_ptr(priv->percpu_priv);
percpu_stats = &percpu_priv->stats;
 
-   if (unlikely(dpaa_eth_napi_schedule(percpu_priv, portal)))
+   if (unlikely(dpaa_eth_napi_schedule(percpu_priv, portal, sched_napi)))
return qman_cb_dqrr_stop;
 
/* Make sure we didn't run out of buffers */
@@ -2474,7 +2474,7 @@ static enum qman_cb_dqrr_result conf_error_dqrr(struct 
qman_portal *portal,
 
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
-   if (dpaa_eth_napi_schedule(percpu_priv, portal))
+   if (dpaa_eth_napi_schedule(percpu_priv, portal, sched_napi))
return qman_cb_dqrr_stop;
 
dpaa_tx_error(net_dev, priv, percpu_priv, &dq->fd, fq->fqid);
@@ -2499,7 +2499,7 @@ static enum qman_cb_dqrr_result conf_dflt_dqrr(struct 
qman_portal *portal,
 
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
-   if (dpaa_eth_napi_schedule(percpu_priv, portal))
+   if (dpaa_eth_napi_schedule(percpu_priv, portal, sched_napi))
return qman_cb_dqrr_stop;
 
dpaa_tx_conf(net_dev, priv, percpu_priv, &dq->fd, fq->fqid);
-- 
2.29.1



[PATCH v2 net-next 1/3] soc/fsl/qbman: Add an argument to signal if NAPI processing is required.

2020-11-01 Thread Sebastian Andrzej Siewior
dpaa_eth_napi_schedule() and caam_qi_napi_schedule() schedule NAPI if
invoked from:

 - Hard interrupt context
 - Any context which is not serving soft interrupts

Any context which is not serving soft interrupts includes hard interrupts
so the in_irq() check is redundant. caam_qi_napi_schedule() has a comment
about this:

/*
 * In case of threaded ISR, for RT kernels in_irq() does not return
 * appropriate value, so use in_serving_softirq to distinguish between
 * softirq and irq contexts.
 */
 if (in_irq() || !in_serving_softirq())

This has nothing to do with RT. Even on a non-RT kernel, force-threaded
interrupts obviously run in thread context and therefore in_irq() returns
false when invoked from the handler.

The extension of the in_irq() check with !in_serving_softirq() was there
when the drivers were added, but in the out of tree FSL BSP the original
condition was in_irq() which got extended due to failures on RT.

The usage of in_xxx() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context. Right he is, the above construct is
clearly showing why.

The following callchains have been analyzed to end up in
dpaa_eth_napi_schedule():

qman_p_poll_dqrr()
  __poll_portal_fast()
fq->cb.dqrr()
   dpaa_eth_napi_schedule()

portal_isr()
  __poll_portal_fast()
fq->cb.dqrr()
   dpaa_eth_napi_schedule()

Both need to schedule NAPI.
The crypto part has another code path leading up to this:
  kill_fq()
 empty_retired_fq()
   qman_p_poll_dqrr()
 __poll_portal_fast()
fq->cb.dqrr()
   dpaa_eth_napi_schedule()

kill_fq() is called from task context and ends up scheduling NAPI, but
that's pointless and an unintended side effect of the !in_serving_softirq()
check.

The code path:
  caam_qi_poll() -> qman_p_poll_dqrr()

is invoked from NAPI and I *assume* from crypto's NAPI device and not
from qbman's NAPI device. I *guess* it is okay to skip scheduling NAPI
(because this is what happens now) but could be changed if it is wrong
due to `budget' handling.

Add an argument to __poll_portal_fast() which is true if NAPI needs to be
scheduled. This requires propagating the value to the caller including
`qman_cb_dqrr' typedef which is used by the dpaa and the crypto driver.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: "Horia Geantă" 
Cc: Aymen Sghaier 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Madalin Bucur 
Cc: Jakub Kicinski 
Cc: Li Yang 
Cc: linux-cry...@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/crypto/caam/qi.c   |  3 ++-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 12 
 drivers/soc/fsl/qbman/qman.c   | 12 ++--
 drivers/soc/fsl/qbman/qman_test_api.c  |  6 --
 drivers/soc/fsl/qbman/qman_test_stash.c|  6 --
 include/soc/fsl/qman.h |  3 ++-
 6 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/crypto/caam/qi.c b/drivers/crypto/caam/qi.c
index ec53528d82058..57f6ab6bfb56a 100644
--- a/drivers/crypto/caam/qi.c
+++ b/drivers/crypto/caam/qi.c
@@ -564,7 +564,8 @@ static int caam_qi_napi_schedule(struct qman_portal *p, 
struct caam_napi *np)
 
 static enum qman_cb_dqrr_result caam_rsp_fq_dqrr_cb(struct qman_portal *p,
struct qman_fq *rsp_fq,
-   const struct qm_dqrr_entry 
*dqrr)
+   const struct qm_dqrr_entry 
*dqrr,
+   bool sched_napi)
 {
struct caam_napi *caam_napi = raw_cpu_ptr(&pcpu_qipriv.caam_napi);
struct caam_drv_req *drv_req;
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 06cc863f4dd63..98ead77c673e5 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2316,7 +2316,8 @@ static inline int dpaa_eth_napi_schedule(struct 
dpaa_percpu_priv *percpu_priv,
 
 static enum qman_cb_dqrr_result rx_error_dqrr(struct qman_portal *portal,
  struct qman_fq *fq,
- const struct qm_dqrr_entry *dq)
+ const struct qm_dqrr_entry *dq,
+ bool sched_napi)
 {
struct dpaa_fq *dpaa_fq = container_of(fq, struct dpaa_fq, fq_base);
struct dpaa_percpu_priv *percpu_priv;
@@ -2343,7 +2344,8 @@ static enum qman_cb_dqrr_result rx_error_dqrr(struct 
qman_portal *portal,
 
 stati
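
(The diff is truncated above. The central interface change it describes,
the extra argument on the `qman_cb_dqrr' callback type in
include/soc/fsl/qman.h, looks roughly like this sketch; `sched_napi' is
true when the entry is delivered from the portal interrupt and NAPI should
be scheduled.)

typedef enum qman_cb_dqrr_result (*qman_cb_dqrr)(struct qman_portal *qm,
						 struct qman_fq *fq,
						 const struct qm_dqrr_entry *dqrr,
						 bool sched_napi);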

Re: [PATCH net-next 14/15] net: dpaa: Replace in_irq() usage.

2020-11-01 Thread Sebastian Andrzej Siewior
On 2020-10-31 10:12:15 [-0700], Jakub Kicinski wrote:
> Nit: some networking drivers have a bool napi which means "are we
> running in napi context", the semantics here feel a little backwards,
> at least to me. But if I'm the only one thinking this, so be it.

I renamed it to `sched_napi'.

Sebastian


[PATCH net-next 13/15] soc/fsl/qbman: Add an argument to signal if NAPI processing is required.

2020-10-27 Thread Sebastian Andrzej Siewior
dpaa_eth_napi_schedule() and caam_qi_napi_schedule() schedule NAPI if
invoked from:

 - Hard interrupt context
 - Any context which is not serving soft interrupts

Any context which is not serving soft interrupts includes hard interrupts
so the in_irq() check is redundant. caam_qi_napi_schedule() has a comment
about this:

/*
 * In case of threaded ISR, for RT kernels in_irq() does not return
 * appropriate value, so use in_serving_softirq to distinguish between
 * softirq and irq contexts.
 */
 if (in_irq() || !in_serving_softirq())

This has nothing to do with RT. Even on a non-RT kernel, force-threaded
interrupts obviously run in thread context and therefore in_irq() returns
false when invoked from the handler.

The extension of the in_irq() check with !in_serving_softirq() was there
when the drivers were added, but in the out of tree FSL BSP the original
condition was in_irq() which got extended due to failures on RT.

The usage of in_xxx() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context. Right he is, the above construct is
clearly showing why.

The following callchains have been analyzed to end up in
dpaa_eth_napi_schedule():

qman_p_poll_dqrr()
  __poll_portal_fast()
fq->cb.dqrr()
   dpaa_eth_napi_schedule()

portal_isr()
  __poll_portal_fast()
fq->cb.dqrr()
   dpaa_eth_napi_schedule()

Both need to schedule NAPI.
The crypto part has another code path leading up to this:
  kill_fq()
 empty_retired_fq()
   qman_p_poll_dqrr()
 __poll_portal_fast()
fq->cb.dqrr()
   dpaa_eth_napi_schedule()

kill_fq() is called from task context and ends up scheduling NAPI, but
that's pointless and an unintended side effect of the !in_serving_softirq()
check.

The code path:
  caam_qi_poll() -> qman_p_poll_dqrr()

is invoked from NAPI and I *assume* from crypto's NAPI device and not
from qbman's NAPI device. I *guess* it is okay to skip scheduling NAPI
(because this is what happens now) but could be changed if it is wrong
due to `budget' handling.

Add an argument to __poll_portal_fast() which is true if NAPI needs to be
scheduled. This requires propagating the value to the caller including
`qman_cb_dqrr' typedef which is used by the dpaa and the crypto driver.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: "Horia Geantă" 
Cc: Aymen Sghaier 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Madalin Bucur 
Cc: Jakub Kicinski 
Cc: Li Yang 
Cc: linux-cry...@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/crypto/caam/qi.c   |  3 ++-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 12 
 drivers/soc/fsl/qbman/qman.c   | 12 ++--
 drivers/soc/fsl/qbman/qman_test_api.c  |  6 --
 drivers/soc/fsl/qbman/qman_test_stash.c|  6 --
 include/soc/fsl/qman.h |  3 ++-
 6 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/crypto/caam/qi.c b/drivers/crypto/caam/qi.c
index ec53528d82058..09ea398304c8b 100644
--- a/drivers/crypto/caam/qi.c
+++ b/drivers/crypto/caam/qi.c
@@ -564,7 +564,8 @@ static int caam_qi_napi_schedule(struct qman_portal *p, 
struct caam_napi *np)
 
 static enum qman_cb_dqrr_result caam_rsp_fq_dqrr_cb(struct qman_portal *p,
struct qman_fq *rsp_fq,
-   const struct qm_dqrr_entry 
*dqrr)
+   const struct qm_dqrr_entry 
*dqrr,
+   bool napi)
 {
struct caam_napi *caam_napi = raw_cpu_ptr(&pcpu_qipriv.caam_napi);
struct caam_drv_req *drv_req;
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 06cc863f4dd63..27835310b718e 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2316,7 +2316,8 @@ static inline int dpaa_eth_napi_schedule(struct 
dpaa_percpu_priv *percpu_priv,
 
 static enum qman_cb_dqrr_result rx_error_dqrr(struct qman_portal *portal,
  struct qman_fq *fq,
- const struct qm_dqrr_entry *dq)
+ const struct qm_dqrr_entry *dq,
+ bool napi)
 {
struct dpaa_fq *dpaa_fq = container_of(fq, struct dpaa_fq, fq_base);
struct dpaa_percpu_priv *percpu_priv;
@@ -2343,7 +2344,8 @@ static enum qman_cb_dqrr_result rx_error_dqrr(struct 
qman_portal *portal,
 
 static enum q

[PATCH net-next 14/15] net: dpaa: Replace in_irq() usage.

2020-10-27 Thread Sebastian Andrzej Siewior
The driver uses in_irq() + in_serving_softirq() magic to decide whether
NAPI scheduling is required or the packet can be processed directly.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

Use the `napi' argument passed by the callback. It is set true if
called from the interrupt handler and NAPI should be scheduled.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: "Horia Geantă" 
Cc: Aymen Sghaier 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Madalin Bucur 
Cc: Jakub Kicinski 
Cc: Li Yang 
Cc: linux-cry...@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 27835310b718e..2c949acd74c67 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2300,9 +2300,9 @@ static void dpaa_tx_conf(struct net_device *net_dev,
 }
 
 static inline int dpaa_eth_napi_schedule(struct dpaa_percpu_priv *percpu_priv,
-struct qman_portal *portal)
+struct qman_portal *portal, bool napi)
 {
-   if (unlikely(in_irq() || !in_serving_softirq())) {
+   if (napi) {
/* Disable QMan IRQ and invoke NAPI */
qman_p_irqsource_remove(portal, QM_PIRQ_DQRI);
 
@@ -2333,7 +2333,7 @@ static enum qman_cb_dqrr_result rx_error_dqrr(struct 
qman_portal *portal,
 
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
-   if (dpaa_eth_napi_schedule(percpu_priv, portal))
+   if (dpaa_eth_napi_schedule(percpu_priv, portal, napi))
return qman_cb_dqrr_stop;
 
dpaa_eth_refill_bpools(priv);
@@ -2377,7 +2377,7 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct 
qman_portal *portal,
percpu_priv = this_cpu_ptr(priv->percpu_priv);
percpu_stats = &percpu_priv->stats;
 
-   if (unlikely(dpaa_eth_napi_schedule(percpu_priv, portal)))
+   if (unlikely(dpaa_eth_napi_schedule(percpu_priv, portal, napi)))
return qman_cb_dqrr_stop;
 
/* Make sure we didn't run out of buffers */
@@ -2474,7 +2474,7 @@ static enum qman_cb_dqrr_result conf_error_dqrr(struct 
qman_portal *portal,
 
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
-   if (dpaa_eth_napi_schedule(percpu_priv, portal))
+   if (dpaa_eth_napi_schedule(percpu_priv, portal, napi))
return qman_cb_dqrr_stop;
 
dpaa_tx_error(net_dev, priv, percpu_priv, &dq->fd, fq->fqid);
@@ -2499,7 +2499,7 @@ static enum qman_cb_dqrr_result conf_dflt_dqrr(struct 
qman_portal *portal,
 
percpu_priv = this_cpu_ptr(priv->percpu_priv);
 
-   if (dpaa_eth_napi_schedule(percpu_priv, portal))
+   if (dpaa_eth_napi_schedule(percpu_priv, portal, napi))
return qman_cb_dqrr_stop;
 
dpaa_tx_conf(net_dev, priv, percpu_priv, &dq->fd, fq->fqid);
-- 
2.28.0



[PATCH net-next 12/15] net: rtlwifi: Remove in_interrupt() usage in halbtc_send_bt_mp_operation()

2020-10-27 Thread Sebastian Andrzej Siewior
halbtc_send_bt_mp_operation() uses in_interrupt() to determine if it is
safe to invoke wait_for_completion().

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

Aside of that, in_interrupt() is not correct as it does not catch
preempt-disabled regions, which cannot sleep either.

halbtc_send_bt_mp_operation() is called from:

 rtl_watchdog_wq_callback()
   rtl_btc_periodical()
 halbtc_get()
   case BTC_GET_U4_BT_PATCH_VER:
 halbtc_get_bt_patch_version()

 which is preemptible context.

   rtl_c2h_content_parsing()
 btc_ops->btc_btinfo_notify()
   rtl_btc_btinfo_notify()
 exhalbtc_bt_info_notify()
   ex_btc8723b1ant_bt_info_notify()
   ex_btc8821a1ant_bt_info_notify()
   ex_btc8821a2ant_bt_info_notify()
 btcoexist->btc_set_bt_reg()
   halbtc_set_bt_reg()

   rtl_c2h_content_parsing() is in turn called from:

   rtl_c2hcmd_wq_callback()
 rtl_c2hcmd_launcher()

   which is preemptible context and from:

   _rtl_pci_rx_interrupt
 rtl_c2hcmd_enqueue()

   which is obviously not preemptible but limited to C2H_BT_MP commands
   which does invoke rtl_c2h_content_parsing().

Aside of that it can be reached from:

 halbtc_get()
   case BTC_GET_U4_SUPPORTED_FEATURE:
 halbtc_get_bt_coex_supported_feature()
   case BTC_GET_U4_BT_FORBIDDEN_SLOT_VAL:
 halbtc_get_bt_forbidden_slot_val()
   case BTC_GET_U4_BT_DEVICE_INFO:
 halbtc_get_bt_device_info()
   case BTC_GET_U4_SUPPORTED_VERSION:
 halbtc_get_bt_coex_supported_version()
   case BTC_GET_U4_SUPPORTED_FEATURE:
 halbtc_get_bt_coex_supported_feature()

   btcoexist->btc_get_bt_afh_map_from_bt()
 halbtc_get_bt_afh_map_from_bt()

   btcoexist->btc_get_ble_scan_para_from_bt()
 halbtc_get_ble_scan_para_from_bt()

   btcoexist->btc_get_ble_scan_type_from_bt()
 halbtc_get_ble_scan_type_from_bt()

   btcoexist->btc_get_ant_det_val_from_bt()
 halbtc_get_ant_det_val_from_bt()

   btcoexist->btc_get_bt_coex_supported_version()
 halbtc_get_bt_coex_supported_version()

   btcoexist->btc_get_bt_coex_supported_feature()
 halbtc_get_bt_coex_supported_feature()

None of these have a caller. Welcome to the wonderful world of HALs and
onion layers.

Remove in_interrupt() check.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Ping-Ke Shih 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c 
b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
index 2155a6699ef8d..be4c0e60d44d1 100644
--- a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
+++ b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
@@ -240,9 +240,6 @@ bool halbtc_send_bt_mp_operation(struct btc_coexist 
*btcoexist, u8 op_code,
rtl_dbg(rtlpriv, COMP_BT_COEXIST, DBG_LOUD,
"btmpinfo wait req_num=%d wait=%ld\n", req_num, wait_ms);
 
-   if (in_interrupt())
-   return false;
-
if (wait_for_completion_timeout(&btcoexist->bt_mp_comp,
msecs_to_jiffies(wait_ms)) == 0) {
rtl_dbg(rtlpriv, COMP_BT_COEXIST, DBG_DMESG,
-- 
2.28.0



[PATCH net-next 15/15] crypto: caam: Replace in_irq() usage.

2020-10-27 Thread Sebastian Andrzej Siewior
The driver uses in_irq() + in_serving_softirq() magic to decide whether
NAPI scheduling is required or the packet can be processed directly.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

Use the `napi' argument passed by the callback. It is set true if
called from the interrupt handler and NAPI should be scheduled.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: "Horia Geantă" 
Cc: Aymen Sghaier 
Cc: Herbert Xu 
Cc: "David S. Miller" 
Cc: Madalin Bucur 
Cc: Jakub Kicinski 
Cc: Li Yang 
Cc: linux-cry...@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@lists.infradead.org
---
 drivers/crypto/caam/qi.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/caam/qi.c b/drivers/crypto/caam/qi.c
index 09ea398304c8b..79dbd90887f8a 100644
--- a/drivers/crypto/caam/qi.c
+++ b/drivers/crypto/caam/qi.c
@@ -545,14 +545,10 @@ static void cgr_cb(struct qman_portal *qm, struct 
qman_cgr *cgr, int congested)
}
 }
 
-static int caam_qi_napi_schedule(struct qman_portal *p, struct caam_napi *np)
+static int caam_qi_napi_schedule(struct qman_portal *p, struct caam_napi *np,
+bool napi)
 {
-   /*
-* In case of threaded ISR, for RT kernels in_irq() does not return
-* appropriate value, so use in_serving_softirq to distinguish between
-* softirq and irq contexts.
-*/
-   if (unlikely(in_irq() || !in_serving_softirq())) {
+   if (napi) {
/* Disable QMan IRQ source and invoke NAPI */
qman_p_irqsource_remove(p, QM_PIRQ_DQRI);
np->p = p;
@@ -574,7 +570,7 @@ static enum qman_cb_dqrr_result caam_rsp_fq_dqrr_cb(struct 
qman_portal *p,
struct caam_drv_private *priv = dev_get_drvdata(qidev);
u32 status;
 
-   if (caam_qi_napi_schedule(p, caam_napi))
+   if (caam_qi_napi_schedule(p, caam_napi, napi))
return qman_cb_dqrr_stop;
 
fd = &dqrr->fd;
-- 
2.28.0



[PATCH net-next 08/15] net: airo: Replace in_atomic() usage.

2020-10-27 Thread Sebastian Andrzej Siewior
issuecommand() is using in_atomic() to decide if it is safe to invoke
schedule() while waiting for the command to be accepted.

Usage of in_atomic() for this is only half correct as it cannot detect all
conditions where it is not allowed to schedule(). Also Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

Add a may_sleep argument to issuecommand() indicating when it is safe to
sleep, and change schedule() to cond_resched() because it is pointless to
invoke schedule() if there is no request to reschedule.

Pass the may_sleep condition through the various call chains leading to
issuecommand().
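
A minimal sketch of the pattern (hypothetical helper, not airo code): the
caller states whether sleeping is allowed instead of the callee guessing
with in_atomic().

#include <linux/bits.h>
#include <linux/delay.h>
#include <linux/io.h>
#include <linux/sched.h>
#include <linux/types.h>

/* Poll a hypothetical "ready" bit; yield only when the caller says it is
 * safe to sleep, and then only if a reschedule is actually pending. */
static void demo_wait_ready(void __iomem *reg, bool may_sleep)
{
	while (!(readw(reg) & BIT(0))) {
		if (may_sleep)
			cond_resched();
		else
			udelay(10);
	}
}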

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/cisco/airo.c | 86 +--
 1 file changed, 47 insertions(+), 39 deletions(-)

diff --git a/drivers/net/wireless/cisco/airo.c 
b/drivers/net/wireless/cisco/airo.c
index 369a6ca44d1ff..74acf9af2adb1 100644
--- a/drivers/net/wireless/cisco/airo.c
+++ b/drivers/net/wireless/cisco/airo.c
@@ -1115,7 +1115,8 @@ static int enable_MAC(struct airo_info *ai, int lock);
 static void disable_MAC(struct airo_info *ai, int lock);
 static void enable_interrupts(struct airo_info*);
 static void disable_interrupts(struct airo_info*);
-static u16 issuecommand(struct airo_info*, Cmd *pCmd, Resp *pRsp);
+static u16 issuecommand(struct airo_info*, Cmd *pCmd, Resp *pRsp,
+   bool may_sleep);
 static int bap_setup(struct airo_info*, u16 rid, u16 offset, int whichbap);
 static int aux_bap_read(struct airo_info*, __le16 *pu16Dst, int bytelen,
int whichbap);
@@ -1130,8 +1131,10 @@ static int PC4500_writerid(struct airo_info*, u16 rid, 
const void
 static int do_writerid(struct airo_info*, u16 rid, const void *rid_data,
int len, int dummy);
 static u16 transmit_allocate(struct airo_info*, int lenPayload, int raw);
-static int transmit_802_3_packet(struct airo_info*, int len, char *pPacket);
-static int transmit_802_11_packet(struct airo_info*, int len, char *pPacket);
+static int transmit_802_3_packet(struct airo_info*, int len, char *pPacket,
+bool may_sleep);
+static int transmit_802_11_packet(struct airo_info*, int len, char *pPacket,
+ bool may_sleep);
 
 static int mpi_send_packet(struct net_device *dev);
 static void mpi_unmap_card(struct pci_dev *pci);
@@ -1753,7 +1756,7 @@ static int readBSSListRid(struct airo_info *ai, int first,
if (down_interruptible(&ai->sem))
return -ERESTARTSYS;
ai->list_bss_task = current;
-   issuecommand(ai, &cmd, &rsp);
+   issuecommand(ai, &cmd, &rsp, true);
up(&ai->sem);
/* Let the command take effect */
schedule_timeout_uninterruptible(3 * HZ);
@@ -2096,7 +2099,7 @@ static void get_tx_error(struct airo_info *ai, s32 fid)
}
 }
 
-static void airo_end_xmit(struct net_device *dev)
+static void airo_end_xmit(struct net_device *dev, bool may_sleep)
 {
u16 status;
int i;
@@ -2107,7 +2110,7 @@ static void airo_end_xmit(struct net_device *dev)
 
clear_bit(JOB_XMIT, &priv->jobs);
clear_bit(FLAG_PENDING_XMIT, &priv->flags);
-   status = transmit_802_3_packet (priv, fids[fid], skb->data);
+   status = transmit_802_3_packet(priv, fids[fid], skb->data, may_sleep);
up(&priv->sem);
 
i = 0;
@@ -2164,11 +2167,11 @@ static netdev_tx_t airo_start_xmit(struct sk_buff *skb,
set_bit(JOB_XMIT, &priv->jobs);
wake_up_interruptible(&priv->thr_wait);
} else
-   airo_end_xmit(dev);
+   airo_end_xmit(dev, false);
return NETDEV_TX_OK;
 }
 
-static void airo_end_xmit11(struct net_device *dev)
+static void airo_end_xmit11(struct net_device *dev, bool may_sleep)
 {
u16 status;
int i;
@@ -2179,7 +2182,7 @@ static void airo_end_xmit11(struct net_device *dev)
 
clear_bit(JOB_XMIT11, &priv->jobs);
clear_bit(FLAG_PENDING_XMIT11, &priv->flags);
-   status = transmit_802_11_packet (priv, fids[fid], skb->data);
+   status = transmit_802_11_packet(priv, fids[fid], skb->data, may_sleep);
up(&priv->sem);
 
i = MAX_FIDS / 2;
@@ -2243,7 +2246,7 @@ static netdev_tx_t airo_start_xmit11(struct sk_buff *skb,
set_bit(JOB_XMIT11, &priv->jobs);
wake_up_interruptible(&priv->thr_wait);
} else
-   airo_end_xmit11(dev);
+   airo_end_xmit11(dev, false);
return NETDEV_TX_OK;
 }

[PATCH net-next 11/15] net: rtlwifi: Remove in_interrupt() usage in is_any_client_connect_to_ap().

2020-10-27 Thread Sebastian Andrzej Siewior
is_any_client_connect_to_ap() is using in_interrupt() to determine whether
it should acquire the lock prior to accessing the list.

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

The function is called from:

- halbtc_get()

- halbtc_get()
halbtc_get_wifi_link_status()

- halbtc_display_dbg_msg()
halbtc_display_wifi_status()
  halbtc_get_wifi_link_status()

All top level callers are part of the btc_coexist callback interface and
are never invoked from a context which can hold the lock already.

The contexts which hold the lock are either protecting list add/del
operations or list walks which never call into any of the btc_coexist
interfaces.

In fact the conditional is outright dangerous because if this function
were invoked from a BH-disabled context the check would avoid taking
the lock while on another CPU the list could be manipulated under the lock.

Remove the in_interrupt() check and always acquire the lock.

To simplify the code further use list_empty() instead of walking the list
and counting the entries just to check the count for > 0 at the end.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Ping-Ke Shih 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 .../realtek/rtlwifi/btcoexist/halbtcoutsrc.c  | 25 +--
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c 
b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
index 2c05369b79e4d..2155a6699ef8d 100644
--- a/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
+++ b/drivers/net/wireless/realtek/rtlwifi/btcoexist/halbtcoutsrc.c
@@ -47,30 +47,17 @@ static bool is_any_client_connect_to_ap(struct btc_coexist 
*btcoexist)
 {
struct rtl_priv *rtlpriv = btcoexist->adapter;
struct rtl_mac *mac = rtl_mac(rtlpriv);
-   struct rtl_sta_info *drv_priv;
-   u8 cnt = 0;
+   bool ret = false;
 
if (mac->opmode == NL80211_IFTYPE_ADHOC ||
mac->opmode == NL80211_IFTYPE_MESH_POINT ||
mac->opmode == NL80211_IFTYPE_AP) {
-   if (in_interrupt() > 0) {
-   list_for_each_entry(drv_priv, &rtlpriv->entry_list,
-   list) {
-   cnt++;
-   }
-   } else {
-   spin_lock_bh(&rtlpriv->locks.entry_list_lock);
-   list_for_each_entry(drv_priv, &rtlpriv->entry_list,
-   list) {
-   cnt++;
-   }
-   spin_unlock_bh(&rtlpriv->locks.entry_list_lock);
-   }
+   spin_lock_bh(&rtlpriv->locks.entry_list_lock);
+   if (!list_empty(&rtlpriv->entry_list))
+   ret = true;
+   spin_unlock_bh(&rtlpriv->locks.entry_list_lock);
}
-   if (cnt > 0)
-   return true;
-   else
-   return false;
+   return ret;
 }
 
 static bool halbtc_legacy(struct rtl_priv *adapter)
-- 
2.28.0



[PATCH net-next 09/15] net: hostap: Remove in_atomic() check.

2020-10-27 Thread Sebastian Andrzej Siewior
hostap_get_wireless_stats() is the iw_handler_if::get_wireless_stats()
callback of this driver. This callback was not allowed to sleep until
commit a160ee69c6a46 ("wext: let get_wireless_stats() sleep") in v2.6.32.

Remove the therefore pointless in_atomic() check.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Jouni Malinen 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/intersil/hostap/hostap_ioctl.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/net/wireless/intersil/hostap/hostap_ioctl.c 
b/drivers/net/wireless/intersil/hostap/hostap_ioctl.c
index 514c7b01dbf6f..49766b285230c 100644
--- a/drivers/net/wireless/intersil/hostap/hostap_ioctl.c
+++ b/drivers/net/wireless/intersil/hostap/hostap_ioctl.c
@@ -44,19 +44,8 @@ static struct iw_statistics 
*hostap_get_wireless_stats(struct net_device *dev)
 
if (local->iw_mode != IW_MODE_MASTER &&
local->iw_mode != IW_MODE_REPEAT) {
-   int update = 1;
-#ifdef in_atomic
-   /* RID reading might sleep and it must not be called in
-* interrupt context or while atomic. However, this
-* function seems to be called while atomic (at least in Linux
-* 2.5.59). Update signal quality values only if in suitable
-* context. Otherwise, previous values read from tick timer
-* will be used. */
-   if (in_atomic())
-   update = 0;
-#endif /* in_atomic */
 
-   if (update && prism2_update_comms_qual(dev) == 0)
+   if (prism2_update_comms_qual(dev) == 0)
wstats->qual.updated = IW_QUAL_ALL_UPDATED |
IW_QUAL_DBM;
 
-- 
2.28.0



[PATCH net-next 10/15] net: zd1211rw: Remove in_atomic() usage.

2020-10-27 Thread Sebastian Andrzej Siewior
The usage of in_atomic() in driver code is deprecated as it can not
always detect all states where it is not allowed to sleep.

All callers are in preemptible thread context and all functions invoke core
functions which have checks for invalid calling contexts already.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Daniel Drake 
Cc: Ulrich Kunitz 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/zydas/zd1211rw/zd_usb.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/wireless/zydas/zd1211rw/zd_usb.c 
b/drivers/net/wireless/zydas/zd1211rw/zd_usb.c
index 66367ab7e4c1e..5c4cd0e1adebb 100644
--- a/drivers/net/wireless/zydas/zd1211rw/zd_usb.c
+++ b/drivers/net/wireless/zydas/zd1211rw/zd_usb.c
@@ -1711,11 +1711,6 @@ int zd_usb_ioread16v(struct zd_usb *usb, u16 *values,
 count, USB_MAX_IOREAD16_COUNT);
return -EINVAL;
}
-   if (in_atomic()) {
-   dev_dbg_f(zd_usb_dev(usb),
-"error: io in atomic context not supported\n");
-   return -EWOULDBLOCK;
-   }
if (!usb_int_enabled(usb)) {
dev_dbg_f(zd_usb_dev(usb),
  "error: usb interrupt not enabled\n");
@@ -1882,11 +1877,6 @@ int zd_usb_iowrite16v_async(struct zd_usb *usb, const 
struct zd_ioreq16 *ioreqs,
count, USB_MAX_IOWRITE16_COUNT);
return -EINVAL;
}
-   if (in_atomic()) {
-   dev_dbg_f(zd_usb_dev(usb),
-   "error: io in atomic context not supported\n");
-   return -EWOULDBLOCK;
-   }
 
udev = zd_usb_to_usbdev(usb);
 
@@ -1966,11 +1956,6 @@ int zd_usb_rfwrite(struct zd_usb *usb, u32 value, u8 
bits)
int i, req_len, actual_req_len;
u16 bit_value_template;
 
-   if (in_atomic()) {
-   dev_dbg_f(zd_usb_dev(usb),
-   "error: io in atomic context not supported\n");
-   return -EWOULDBLOCK;
-   }
if (bits < USB_MIN_RFWRITE_BIT_COUNT) {
dev_dbg_f(zd_usb_dev(usb),
"error: bits %d are smaller than"
-- 
2.28.0



[PATCH net-next 07/15] net: airo: Always use JOB_STATS and JOB_EVENT

2020-10-27 Thread Sebastian Andrzej Siewior
issuecommand() is using in_atomic() to decide if it is safe to invoke
schedule() while waiting for the command to be accepted.

Usage of in_atomic() for this is only half correct as it cannot detect all
conditions where it is not allowed to schedule(). Also Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

Chasing the call chains leading up to issuecommand() is straightforward,
but airo_link() and airo_get_stats() would require to pass the context
through a quite large amount of functions.

As this is ancient hardware, avoid the churn and enforce the invocation of
those functions through the JOB machinery.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/cisco/airo.c | 16 +---
 1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/cisco/airo.c 
b/drivers/net/wireless/cisco/airo.c
index ca423f3b6b3ea..369a6ca44d1ff 100644
--- a/drivers/net/wireless/cisco/airo.c
+++ b/drivers/net/wireless/cisco/airo.c
@@ -2286,12 +2286,8 @@ static struct net_device_stats *airo_get_stats(struct 
net_device *dev)
struct airo_info *local =  dev->ml_priv;
 
if (!test_bit(JOB_STATS, &local->jobs)) {
-   /* Get stats out of the card if available */
-   if (down_trylock(&local->sem) != 0) {
-   set_bit(JOB_STATS, &local->jobs);
-   wake_up_interruptible(&local->thr_wait);
-   } else
-   airo_read_stats(dev);
+   set_bit(JOB_STATS, &local->jobs);
+   wake_up_interruptible(&local->thr_wait);
}
 
return &dev->stats;
@@ -3277,11 +3273,9 @@ static void airo_handle_link(struct airo_info *ai)
set_bit(FLAG_UPDATE_UNI, &ai->flags);
set_bit(FLAG_UPDATE_MULTI, &ai->flags);
 
-   if (down_trylock(&ai->sem) != 0) {
-   set_bit(JOB_EVENT, &ai->jobs);
-   wake_up_interruptible(&ai->thr_wait);
-   } else
-   airo_send_event(ai->dev);
+   set_bit(JOB_EVENT, &ai->jobs);
+   wake_up_interruptible(&ai->thr_wait);
+
netif_carrier_on(ai->dev);
} else if (!scan_forceloss) {
if (auto_wep && !ai->expires) {
-- 
2.28.0



[PATCH net-next 04/15] net: mlx5: Replace in_irq() usage.

2020-10-27 Thread Sebastian Andrzej Siewior
mlx5_eq_async_int() uses in_irq() to decide whether eq::lock needs to be
acquired and released with spin_[un]lock() or the irq saving/restoring
variants.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

mlx5_eq_async_int() knows the context via the action argument already so
using it for the lock variant decision is a straight forward replacement
for in_irq().

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Saeed Mahameed 
Cc: Leon Romanovsky 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-r...@vger.kernel.org
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c 
b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 8ebfe782f95e5..3800e9415158b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -189,19 +189,21 @@ u32 mlx5_eq_poll_irq_disabled(struct mlx5_eq_comp *eq)
return count_eqe;
 }
 
-static void mlx5_eq_async_int_lock(struct mlx5_eq_async *eq, unsigned long 
*flags)
+static void mlx5_eq_async_int_lock(struct mlx5_eq_async *eq, bool recovery,
+  unsigned long *flags)
__acquires(&eq->lock)
 {
-   if (in_irq())
+   if (!recovery)
spin_lock(&eq->lock);
else
spin_lock_irqsave(&eq->lock, *flags);
 }
 
-static void mlx5_eq_async_int_unlock(struct mlx5_eq_async *eq, unsigned long 
*flags)
+static void mlx5_eq_async_int_unlock(struct mlx5_eq_async *eq, bool recovery,
+unsigned long *flags)
__releases(&eq->lock)
 {
-   if (in_irq())
+   if (!recovery)
spin_unlock(&eq->lock);
else
spin_unlock_irqrestore(&eq->lock, *flags);
@@ -222,12 +224,14 @@ static int mlx5_eq_async_int(struct notifier_block *nb,
struct mlx5_core_dev *dev;
struct mlx5_eqe *eqe;
unsigned long flags;
+   bool recovery;
int num_eqes = 0;
 
dev = eq->dev;
eqt = dev->priv.eq_table;
 
-   mlx5_eq_async_int_lock(eq_async, &flags);
+   recovery = action == ASYNC_EQ_RECOVER;
+   mlx5_eq_async_int_lock(eq_async, recovery, &flags);
 
eqe = next_eqe_sw(eq);
if (!eqe)
@@ -249,9 +253,9 @@ static int mlx5_eq_async_int(struct notifier_block *nb,
 
 out:
eq_update_ci(eq, 1);
-   mlx5_eq_async_int_unlock(eq_async, &flags);
+   mlx5_eq_async_int_unlock(eq_async, recovery, &flags);
 
-   return unlikely(action == ASYNC_EQ_RECOVER) ? num_eqes : 0;
+   return unlikely(recovery) ? num_eqes : 0;
 }
 
 void mlx5_cmd_eq_recover(struct mlx5_core_dev *dev)
-- 
2.28.0



[PATCH net-next 06/15] net: airo: Invoke airo_read_wireless_stats() directly

2020-10-27 Thread Sebastian Andrzej Siewior
airo_get_wireless_stats() is the iw_handler_if::get_wireless_stats()
callback of this driver. This callback was not allowed to sleep until
commit a160ee69c6a46 ("wext: let get_wireless_stats() sleep") in v2.6.32.

airo still delegates the readout to a thread, which is no longer
necessary.

Invoke airo_read_wireless_stats() directly from the callback and remove
the now unused JOB_WSTATS handling.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/cisco/airo.c | 22 +-
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/net/wireless/cisco/airo.c 
b/drivers/net/wireless/cisco/airo.c
index 87b9398b03fd4..ca423f3b6b3ea 100644
--- a/drivers/net/wireless/cisco/airo.c
+++ b/drivers/net/wireless/cisco/airo.c
@@ -1144,7 +1144,6 @@ static int airo_thread(void *data);
 static void timer_func(struct net_device *dev);
 static int airo_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
 static struct iw_statistics *airo_get_wireless_stats(struct net_device *dev);
-static void airo_read_wireless_stats(struct airo_info *local);
 #ifdef CISCO_EXT
 static int readrids(struct net_device *dev, aironet_ioctl *comp);
 static int writerids(struct net_device *dev, aironet_ioctl *comp);
@@ -1200,7 +1199,6 @@ struct airo_info {
 #define JOB_MIC5
 #define JOB_EVENT  6
 #define JOB_AUTOWEP7
-#define JOB_WSTATS 8
 #define JOB_SCAN_RESULTS  9
unsigned long jobs;
int (*bap_read)(struct airo_info*, __le16 *pu16Dst, int bytelen,
@@ -3155,8 +3153,6 @@ static int airo_thread(void *data)
airo_end_xmit11(dev);
else if (test_bit(JOB_STATS, &ai->jobs))
airo_read_stats(dev);
-   else if (test_bit(JOB_WSTATS, &ai->jobs))
-   airo_read_wireless_stats(ai);
else if (test_bit(JOB_PROMISC, &ai->jobs))
airo_set_promisc(ai);
else if (test_bit(JOB_MIC, &ai->jobs))
@@ -7732,15 +7728,12 @@ static void airo_read_wireless_stats(struct airo_info 
*local)
__le32 *vals = stats_rid.vals;
 
/* Get stats out of the card */
-   clear_bit(JOB_WSTATS, &local->jobs);
-   if (local->power.event) {
-   up(&local->sem);
+   if (local->power.event)
return;
-   }
+
readCapabilityRid(local, &cap_rid, 0);
readStatusRid(local, &status_rid, 0);
readStatsRid(local, &stats_rid, RID_STATS, 0);
-   up(&local->sem);
 
/* The status */
local->wstats.status = le16_to_cpu(status_rid.mode);
@@ -7783,15 +7776,10 @@ static struct iw_statistics 
*airo_get_wireless_stats(struct net_device *dev)
 {
struct airo_info *local =  dev->ml_priv;
 
-   if (!test_bit(JOB_WSTATS, &local->jobs)) {
-   /* Get stats out of the card if available */
-   if (down_trylock(&local->sem) != 0) {
-   set_bit(JOB_WSTATS, &local->jobs);
-   wake_up_interruptible(&local->thr_wait);
-   } else
-   airo_read_wireless_stats(local);
+   if (!down_interruptible(&local->sem)) {
+   airo_read_wireless_stats(local);
+   up(&local->sem);
}
-
return &local->wstats;
 }
 
-- 
2.28.0



[PATCH net-next 03/15] net: forcedeth: Replace context and lock check with a lockdep_assert()

2020-10-27 Thread Sebastian Andrzej Siewior
nv_update_stats() triggers a WARN_ON() when invoked from hard interrupt
context because the locks in use are not hard interrupt safe. It also has
an assert_spin_locked() which was the lock check before the lockdep era.

Lockdep has way broader locking correctness checks and covers both issues,
so replace the warning and the lock assert with lockdep_assert_held().

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Rain River 
Cc: Zhu Yanjun 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: net...@vger.kernel.org
---
 drivers/net/ethernet/nvidia/forcedeth.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/nvidia/forcedeth.c 
b/drivers/net/ethernet/nvidia/forcedeth.c
index 2fc10a36afa4a..7e85cf943be11 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -1666,11 +1666,7 @@ static void nv_update_stats(struct net_device *dev)
struct fe_priv *np = netdev_priv(dev);
u8 __iomem *base = get_hwbase(dev);
 
-   /* If it happens that this is run in top-half context, then
-* replace the spin_lock of hwstats_lock with
-* spin_lock_irqsave() in calling functions. */
-   WARN_ONCE(in_irq(), "forcedeth: estats spin_lock(_bh) from top-half");
-   assert_spin_locked(&np->hwstats_lock);
+   lockdep_assert_held(&np->hwstats_lock);
 
/* query hardware */
np->estats.tx_bytes += readl(base + NvRegTxCnt);
-- 
2.28.0



[PATCH net-next 05/15] net: tlan: Replace in_irq() usage

2020-10-27 Thread Sebastian Andrzej Siewior
The driver uses in_irq() to determine if the tlan_priv::lock has to be
acquired in tlan_mii_read_reg() and tlan_mii_write_reg().

The interrupt handler acquires the lock outside of these functions so the
in_irq() check is meant to prevent a lock recursion deadlock. But this
check is incorrect when interrupt force threading is enabled because then
the handler runs in thread context and in_irq() correctly returns false.

The usage of in_*() in drivers is phased out and Linus clearly requested
that code which changes behaviour depending on context should either be
separated or the context be conveyed in an argument passed by the caller,
which usually knows the context.

tlan_set_timer() has this conditional as well, but this function is only
invoked from task context or the timer callback itself. So it always has to
lock and the check can be removed.

tlan_mii_read_reg(), tlan_mii_write_reg() and tlan_phy_print() are invoked
from interrupt and other contexts.

Split out the actual function body into helper variants which are called
from interrupt context and make the original functions wrappers which
acquire tlan_priv::lock unconditionally.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Samuel Chessman 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: net...@vger.kernel.org
---
 drivers/net/ethernet/ti/tlan.c | 98 --
 1 file changed, 57 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/ti/tlan.c b/drivers/net/ethernet/ti/tlan.c
index 267c080ee084b..0b2ce4bdc2c3d 100644
--- a/drivers/net/ethernet/ti/tlan.c
+++ b/drivers/net/ethernet/ti/tlan.c
@@ -186,6 +186,7 @@ static void tlan_reset_adapter(struct net_device *);
 static voidtlan_finish_reset(struct net_device *);
 static voidtlan_set_mac(struct net_device *, int areg, char *mac);
 
+static void__tlan_phy_print(struct net_device *);
 static voidtlan_phy_print(struct net_device *);
 static voidtlan_phy_detect(struct net_device *);
 static voidtlan_phy_power_down(struct net_device *);
@@ -201,9 +202,11 @@ static voidtlan_phy_finish_auto_neg(struct 
net_device *);
   static int   tlan_phy_dp83840a_check(struct net_device *);
 */
 
-static booltlan_mii_read_reg(struct net_device *, u16, u16, u16 *);
+static bool__tlan_mii_read_reg(struct net_device *, u16, u16, u16 *);
+static voidtlan_mii_read_reg(struct net_device *, u16, u16, u16 *);
 static voidtlan_mii_send_data(u16, u32, unsigned);
 static voidtlan_mii_sync(u16);
+static void__tlan_mii_write_reg(struct net_device *, u16, u16, u16);
 static voidtlan_mii_write_reg(struct net_device *, u16, u16, u16);
 
 static voidtlan_ee_send_start(u16);
@@ -242,23 +245,20 @@ static u32
tlan_handle_rx_eoc
 };
 
-static inline void
+static void
 tlan_set_timer(struct net_device *dev, u32 ticks, u32 type)
 {
struct tlan_priv *priv = netdev_priv(dev);
unsigned long flags = 0;
 
-   if (!in_irq())
-   spin_lock_irqsave(&priv->lock, flags);
+   spin_lock_irqsave(&priv->lock, flags);
if (priv->timer.function != NULL &&
priv->timer_type != TLAN_TIMER_ACTIVITY) {
-   if (!in_irq())
-   spin_unlock_irqrestore(&priv->lock, flags);
+   spin_unlock_irqrestore(&priv->lock, flags);
return;
}
priv->timer.function = tlan_timer;
-   if (!in_irq())
-   spin_unlock_irqrestore(&priv->lock, flags);
+   spin_unlock_irqrestore(&priv->lock, flags);
 
priv->timer_set_at = jiffies;
priv->timer_type = type;
@@ -1703,22 +1703,22 @@ static u32 tlan_handle_status_check(struct net_device 
*dev, u16 host_int)
 dev->name, (unsigned) net_sts);
}
if ((net_sts & TLAN_NET_STS_MIRQ) &&  (priv->phy_num == 0)) {
-   tlan_mii_read_reg(dev, phy, TLAN_TLPHY_STS, &tlphy_sts);
-   tlan_mii_read_reg(dev, phy, TLAN_TLPHY_CTL, &tlphy_ctl);
+   __tlan_mii_read_reg(dev, phy, TLAN_TLPHY_STS, 
&tlphy_sts);
+   __tlan_mii_read_reg(dev, phy, TLAN_TLPHY_CTL, 
&tlphy_ctl);
if (!(tlphy_sts & TLAN_TS_POLOK) &&
!(tlphy_ctl & TLAN_TC_SWAPOL)) {
tlphy_ctl |= TLAN_TC_SWAPOL;
-   tlan_mii_write_reg(dev, phy, TLAN_TLPHY_CTL,
-  tlphy_ctl);
+   __tlan_mii_write_reg(dev, phy, TLAN_TLPHY_CTL,
+tlphy_ctl);
} else if ((tlphy_sts & TLAN_TS_POLOK) &&
   (tlphy_ctl & TLAN_TC_SWAPOL)) {
tlphy_ctl &= ~TLAN_TC_SWAPOL;

[PATCH net-next 02/15] net: neterion: s2io: Replace in_interrupt() for context detection

2020-10-27 Thread Sebastian Andrzej Siewior
wait_for_cmd_complete() uses in_interrupt() to detect whether it is safe to
sleep or not.

The usage of in_interrupt() in drivers is phased out and Linus clearly
requested that code which changes behaviour depending on context should
either be separated or the context be conveyed in an argument passed by the
caller, which usually knows the context.

in_interrupt() also is only partially correct because it fails to choose the
correct code path when just preemption or interrupts are disabled.

Add a 'may_sleep' argument to both functions and adjust the callers to
pass the context information.

The following call chains which end up invoking wait_for_cmd_complete()
were analyzed to be safe to sleep:

 s2io_card_up()
   s2io_set_multicast()

 init_nic()
   init_tti()

 s2io_close()
   do_s2io_delete_unicast_mc()
 do_s2io_add_mac()

 s2io_set_mac_addr()
   do_s2io_prog_unicast()
 do_s2io_add_mac()

 s2io_reset()
   do_s2io_restore_unicast_mc()
 do_s2io_add_mc()
   do_s2io_add_mac()

 s2io_open()
   do_s2io_prog_unicast()
 do_s2io_add_mac()

The following call chains which end up invoking wait_for_cmd_complete()
were analyzed to be unsafe to sleep:

 __dev_set_rx_mode()
s2io_set_multicast()

 s2io_txpic_intr_handle()
   s2io_link()
 init_tti()

Add a may_sleep argument to wait_for_cmd_complete(), s2io_set_multicast()
and init_tti() and hand the context information in from the call sites.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Jon Mason 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: net...@vger.kernel.org
---
 drivers/net/ethernet/neterion/s2io.c | 41 
 drivers/net/ethernet/neterion/s2io.h |  4 +--
 2 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/neterion/s2io.c 
b/drivers/net/ethernet/neterion/s2io.c
index d13d92bf74478..8f2f091bce899 100644
--- a/drivers/net/ethernet/neterion/s2io.c
+++ b/drivers/net/ethernet/neterion/s2io.c
@@ -1106,7 +1106,7 @@ static int s2io_print_pci_mode(struct s2io_nic *nic)
  *  '-1' on failure
  */
 
-static int init_tti(struct s2io_nic *nic, int link)
+static int init_tti(struct s2io_nic *nic, int link, bool may_sleep)
 {
struct XENA_dev_config __iomem *bar0 = nic->bar0;
register u64 val64 = 0;
@@ -1166,7 +1166,7 @@ static int init_tti(struct s2io_nic *nic, int link)
 
if (wait_for_cmd_complete(&bar0->tti_command_mem,
  TTI_CMD_MEM_STROBE_NEW_CMD,
- S2IO_BIT_RESET) != SUCCESS)
+ S2IO_BIT_RESET, may_sleep) != SUCCESS)
return FAILURE;
}
 
@@ -1659,7 +1659,7 @@ static int init_nic(struct s2io_nic *nic)
 */
 
/* Initialize TTI */
-   if (SUCCESS != init_tti(nic, nic->last_link_state))
+   if (SUCCESS != init_tti(nic, nic->last_link_state, true))
return -ENODEV;
 
/* RTI Initialization */
@@ -3331,7 +3331,7 @@ static void s2io_updt_xpak_counter(struct net_device *dev)
  */
 
 static int wait_for_cmd_complete(void __iomem *addr, u64 busy_bit,
-int bit_state)
+int bit_state, bool may_sleep)
 {
int ret = FAILURE, cnt = 0, delay = 1;
u64 val64;
@@ -3353,7 +3353,7 @@ static int wait_for_cmd_complete(void __iomem *addr, u64 
busy_bit,
}
}
 
-   if (in_interrupt())
+   if (!may_sleep)
mdelay(delay);
else
msleep(delay);
@@ -4877,8 +4877,7 @@ static struct net_device_stats *s2io_get_stats(struct 
net_device *dev)
  *  Return value:
  *  void.
  */
-
-static void s2io_set_multicast(struct net_device *dev)
+static void s2io_set_multicast(struct net_device *dev, bool may_sleep)
 {
int i, j, prev_cnt;
struct netdev_hw_addr *ha;
@@ -4903,7 +4902,7 @@ static void s2io_set_multicast(struct net_device *dev)
/* Wait till command completes */
wait_for_cmd_complete(&bar0->rmac_addr_cmd_mem,
  RMAC_ADDR_CMD_MEM_STROBE_CMD_EXECUTING,
- S2IO_BIT_RESET);
+ S2IO_BIT_RESET, may_sleep);
 
sp->m_cast_flg = 1;
sp->all_multi_pos = config->max_mc_addr - 1;
@@ -4920,7 +4919,7 @@ static void s2io_set_multicast(struct net_device *dev)
/* Wait till command completes */
wait_for_cmd_complete(&bar0->rmac_addr_cmd_mem,
  RMAC_ADDR_CMD_MEM_STROBE_CMD_EXECUTING,
- S2IO_BIT_RESET);
+ S2IO_BIT_RESET, may_sleep);
 
sp->m_cast_flg = 0;
sp->all_multi_pos = 0;

[PATCH net-next 01/15] net: orinoco: Remove BUG_ON(in_interrupt/irq())

2020-10-27 Thread Sebastian Andrzej Siewior
The usage of in_irq()/in_interrupt() in drivers is phased out and the
BUG_ON()'s based on those are not covering all contexts in which these
functions cannot be called.

Aside from that, BUG_ON() should only be used as a last resort, which is clearly
not the case here.

A broad variety of checks in the invoked functions (always enabled or debug
option dependent) cover these conditions already, so the BUG_ON()'s do not
really provide additional value.

Just remove them.

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Kalle Valo 
Cc: "David S. Miller" 
Cc: Jakub Kicinski 
Cc: linux-wirel...@vger.kernel.org
Cc: net...@vger.kernel.org
---
 drivers/net/wireless/intersil/orinoco/orinoco_usb.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c 
b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
index b849d27bd741e..046f2453ad5d9 100644
--- a/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/intersil/orinoco/orinoco_usb.c
@@ -859,8 +859,6 @@ static int ezusb_access_ltv(struct ezusb_priv *upriv,
int retval = 0;
enum ezusb_state state;
 
-   BUG_ON(in_irq());
-
if (!upriv->udev) {
retval = -ENODEV;
goto exit;
@@ -1349,7 +1347,6 @@ static int ezusb_init(struct hermes *hw)
struct ezusb_priv *upriv = hw->priv;
int retval;
 
-   BUG_ON(in_interrupt());
if (!upriv)
return -EINVAL;
 
@@ -1448,7 +1445,6 @@ static inline void ezusb_delete(struct ezusb_priv *upriv)
struct list_head *tmp_item;
unsigned long flags;
 
-   BUG_ON(in_interrupt());
BUG_ON(!upriv);
 
mutex_lock(&upriv->mtx);
-- 
2.28.0



[PATCH net-next 00/15] in_interrupt() cleanup, part 2

2020-10-27 Thread Sebastian Andrzej Siewior
Folks,

in the discussion about preempt count consistency across kernel configurations:

  https://lore.kernel.org/r/20200914204209.256266...@linutronix.de/

Linus clearly requested that code in drivers and libraries which changes
behaviour based on execution context should either be split up so that
e.g. task context invocations and BH invocations have different interfaces
or if that's not possible the context information has to be provided by the
caller which knows in which context it is executing.

This includes conditional locking, allocation mode (GFP_*) decisions and
avoidance of code paths which might sleep.

In the long run, usage of 'preemptible, in_*irq etc.' should be banned from
driver code completely.

This is part two addressing remaining drivers except for orinoco-usb.

Sebastian
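
As a rough illustration of the pattern applied throughout this series (a
hypothetical helper, not taken from any of the patches; example_dev and the
EXAMPLE_* register/bit names are made up): the caller, which knows its
context, passes it in, and the helper picks the waiting strategy from that
instead of guessing via in_atomic()/in_interrupt():

static int example_issue_cmd(struct example_dev *dev, u32 cmd, bool may_sleep)
{
	unsigned int retries = 100;

	writel(cmd, dev->regs + EXAMPLE_CMD);

	/* Poll for completion; the wait strategy follows the caller's context. */
	while (!(readl(dev->regs + EXAMPLE_STS) & EXAMPLE_DONE)) {
		if (!retries--)
			return -ETIMEDOUT;
		if (may_sleep)
			usleep_range(50, 100);	/* task context: sleeping is fine */
		else
			udelay(50);		/* atomic context: short busy wait */
	}
	return 0;
}

A threaded worker would pass may_sleep = true, while a caller such as an
ndo_start_xmit() handler, which runs with BH disabled, would pass false.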


[PATCH 4/5] ia64: Remove mm.h from asm/uaccess.h

2020-03-20 Thread Sebastian Andrzej Siewior
The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC  kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|  from include/linux/mm.h:567,
|  from arch/ia64/include/asm/uaccess.h:,
|  from include/linux/uaccess.h:11,
|  from include/linux/sched/task.h:11,
|  from include/linux/sched/signal.h:9,
|  from include/linux/rcuwait.h:6,
|  from include/linux/percpu-rwsem.h:8,
|  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 
'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: linux-i...@vger.kernel.org
Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/ia64/include/asm/uaccess.h | 1 -
 arch/ia64/kernel/process.c  | 1 +
 arch/ia64/mm/ioremap.c  | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/ia64/include/asm/uaccess.h b/arch/ia64/include/asm/uaccess.h
index 89782ad3fb887..5c7e79eccaeed 100644
--- a/arch/ia64/include/asm/uaccess.h
+++ b/arch/ia64/include/asm/uaccess.h
@@ -35,7 +35,6 @@
 
 #include 
 #include 
-#include <linux/mm.h>
 
 #include 
 #include 
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index 968b5f33e725e..743aaf5283278 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -681,3 +681,4 @@ machine_power_off (void)
machine_halt();
 }
 
+EXPORT_SYMBOL(ia64_delay_loop);
diff --git a/arch/ia64/mm/ioremap.c b/arch/ia64/mm/ioremap.c
index a09cfa0645369..55fd3eb753ff9 100644
--- a/arch/ia64/mm/ioremap.c
+++ b/arch/ia64/mm/ioremap.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.26.0.rc2



[PATCH 1/5] nds32: Remove mm.h from asm/uaccess.h

2020-03-20 Thread Sebastian Andrzej Siewior
The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC  kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|  from include/linux/mm.h:567,
|  from arch/nds32/include/asm/uaccess.h:,
|  from include/linux/uaccess.h:11,
|  from include/linux/sched/task.h:11,
|  from include/linux/sched/signal.h:9,
|  from include/linux/rcuwait.h:6,
|  from include/linux/percpu-rwsem.h:8,
|  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 
'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Cc: Nick Hu 
Cc: Greentime Hu 
Cc: Vincent Chen 
Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/nds32/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/nds32/include/asm/uaccess.h b/arch/nds32/include/asm/uaccess.h
index 8916ad9f9f139..3a9219f53ee0d 100644
--- a/arch/nds32/include/asm/uaccess.h
+++ b/arch/nds32/include/asm/uaccess.h
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include <linux/mm.h>
 
 #define __asmeq(x, y)  ".ifnc " x "," y " ; .err ; .endif\n\t"
 
-- 
2.26.0.rc2



[PATCH 2/5] csky: Remove mm.h from asm/uaccess.h

2020-03-20 Thread Sebastian Andrzej Siewior
The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC  kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|  from include/linux/mm.h:567,
|  from arch/csky/include/asm/uaccess.h:,
|  from include/linux/uaccess.h:11,
|  from include/linux/sched/task.h:11,
|  from include/linux/sched/signal.h:9,
|  from include/linux/rcuwait.h:6,
|  from include/linux/percpu-rwsem.h:8,
|  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 
'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Cc: Guo Ren 
Cc: linux-c...@vger.kernel.org
Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/csky/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/csky/include/asm/uaccess.h b/arch/csky/include/asm/uaccess.h
index eaa1c3403a424..abefa125b93cf 100644
--- a/arch/csky/include/asm/uaccess.h
+++ b/arch/csky/include/asm/uaccess.h
@@ -11,7 +11,6 @@
 #include 
 #include 
 #include 
-#include <linux/mm.h>
 #include 
 #include 
 #include 
-- 
2.26.0.rc2



[PATCH 3/5] hexagon: Remove mm.h from asm/uaccess.h

2020-03-20 Thread Sebastian Andrzej Siewior
The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC  kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|  from include/linux/mm.h:567,
|  from arch/hexagon/include/asm/uaccess.h:,
|  from include/linux/uaccess.h:11,
|  from include/linux/sched/task.h:11,
|  from include/linux/sched/signal.h:9,
|  from include/linux/rcuwait.h:6,
|  from include/linux/percpu-rwsem.h:8,
|  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 
'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Cc: Brian Cain 
Cc: linux-hexa...@vger.kernel.org
Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/hexagon/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/hexagon/include/asm/uaccess.h 
b/arch/hexagon/include/asm/uaccess.h
index 00cb38faad0c4..c1019a736ff13 100644
--- a/arch/hexagon/include/asm/uaccess.h
+++ b/arch/hexagon/include/asm/uaccess.h
@@ -10,7 +10,6 @@
 /*
  * User space memory access functions
  */
-#include <linux/mm.h>
 #include 
 
 /*
-- 
2.26.0.rc2



[PATCH 0/5] Remove mm.h from arch/*/include/asm/uaccess.h

2020-03-20 Thread Sebastian Andrzej Siewior
The following mini-series removes linux/mm.h from the asm/uaccess.h of
the individual architectures. The series has been compile-tested with the
defconfig and additionally for ia64 with the "special" allmodconfig
supplied by the bot. The regular allmod for the architectures does not
compile (even without the series).

Sebastian




[PATCH 5/5] microblaze: Remove mm.h from asm/uaccess.h

2020-03-20 Thread Sebastian Andrzej Siewior
The defconfig compiles without linux/mm.h. With mm.h included the
include chain leads to:
|   CC  kernel/locking/percpu-rwsem.o
| In file included from include/linux/huge_mm.h:8,
|  from include/linux/mm.h:567,
|  from arch/microblaze/include/asm/uaccess.h:,
|  from include/linux/uaccess.h:11,
|  from include/linux/sched/task.h:11,
|  from include/linux/sched/signal.h:9,
|  from include/linux/rcuwait.h:6,
|  from include/linux/percpu-rwsem.h:8,
|  from kernel/locking/percpu-rwsem.c:6:
| include/linux/fs.h:1422:29: error: array type has incomplete element type 
'struct percpu_rw_semaphore'
|  1422 |  struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS];

once rcuwait.h includes linux/sched/signal.h.

Remove the linux/mm.h include.

Cc: Michal Simek 
Reported-by: kbuild test robot 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/microblaze/include/asm/uaccess.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/microblaze/include/asm/uaccess.h 
b/arch/microblaze/include/asm/uaccess.h
index a1f206b90753a..4916d5fbea5e3 100644
--- a/arch/microblaze/include/asm/uaccess.h
+++ b/arch/microblaze/include/asm/uaccess.h
@@ -12,7 +12,6 @@
 #define _ASM_MICROBLAZE_UACCESS_H
 
 #include 
-#include <linux/mm.h>
 
 #include 
 #include 
-- 
2.26.0.rc2



Re: [PATCH 19/15] sched/swait: Reword some of the main description

2020-03-20 Thread Sebastian Andrzej Siewior
On 2020-03-20 01:55:27 [-0700], Davidlohr Bueso wrote:
> diff --git a/include/linux/swait.h b/include/linux/swait.h
> index 73e06e9986d4..6e5b5d0e64fd 100644
> --- a/include/linux/swait.h
> +++ b/include/linux/swait.h
> @@ -39,7 +26,7 @@
>   *sleeper state.
>   *
>   *  - the !exclusive mode; because that leads to O(n) wakeups, everything is
> - *exclusive.
> + *exclusive. As such swait_wake_up_one will only ever awake _one_ waiter.
swake_up_one()

>   *  - custom wake callback functions; because you cannot give any guarantees
>   *about random code. This also allows swait to be used in RT, such that

Sebastian


Re: [PATCH 17/15] rcuwait: Inform rcuwait_wake_up() users if a wakeup was attempted

2020-03-20 Thread Sebastian Andrzej Siewior
On 2020-03-20 01:55:25 [-0700], Davidlohr Bueso wrote:
> Let the caller know if wake_up_process() was actually called or not;
> some users can use this information for ad-hoc. Of course returning
> true does not guarantee that wake_up_process() actually woke anything
> up.

Wouldn't it make sense to return wake_up_process()'s return value to know
if a change of state occurred or not?

Sebastian


Re: [patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event()

2020-03-20 Thread Sebastian Andrzej Siewior
On 2020-03-19 22:36:57 [-0700], Davidlohr Bueso wrote:
> On Wed, 18 Mar 2020, Thomas Gleixner wrote:
> 
> Right now I'm not sure what the proper fix should be.

I thought that v2 had it fixed with the previous commit (acpi: Remove
header dependency). The kbot just reported that everything is fine.
Let me look…

> Thanks,
> Davidlohr

Sebastian


Re: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait

2020-03-19 Thread Sebastian Andrzej Siewior
On 2020-03-19 03:04:59 [-0700], Christoph Hellwig wrote:
> But I wonder how alive the whole PS3 support is to start with..

OtherOS can only be used on "old" PS3s which do not have their
firmware upgraded past version 3.21, released April 1, 2010 [0].
It was not possible to install OtherOS on PS3-slim and I don't remember
if it was a successor or a budget version (but it had lower power
consumption as per my memory).
*I* remember from back then that a few universities bought quite a few
of them and used them as a computation cluster. However, whatever broke
over the last 10 years is broken.

[0] https://en.wikipedia.org/wiki/OtherOS

Sebastian


Re: [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait

2020-03-19 Thread Sebastian Andrzej Siewior
On 2020-03-18 21:43:09 [+0100], Thomas Gleixner wrote:
> --- a/arch/powerpc/platforms/ps3/device-init.c
> +++ b/arch/powerpc/platforms/ps3/device-init.c
> @@ -725,12 +728,12 @@ static int ps3_notification_read_write(s
>   unsigned long flags;
>   int res;
>  
> - init_completion(&dev->done);
>   spin_lock_irqsave(&dev->lock, flags);
>   res = write ? lv1_storage_write(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
>   &dev->tag)
>   : lv1_storage_read(dev->sbd.dev_id, 0, 0, 1, 0, lpar,
>  &dev->tag);
> + dev->done = false;
>   spin_unlock_irqrestore(&dev->lock, flags);
>   if (res) {
>   pr_err("%s:%u: %s failed %d\n", __func__, __LINE__, op, res);
> @@ -738,14 +741,10 @@ static int ps3_notification_read_write(s
>   }
>   pr_debug("%s:%u: notification %s issued\n", __func__, __LINE__, op);
>  
> - res = wait_event_interruptible(dev->done.wait,
> -dev->done.done || kthread_should_stop());
> + rcuwait_wait_event(&dev->wait, dev->done || kthread_should_stop(), 
> TASK_IDLE);
> +
…

Not sure it matters but this struct `dev' is allocated on the stack. Should
the interrupt fire *before* rcuwait_wait_event() sets wait.task to NULL
then it is of random value on the first invocation of rcuwait_wake_up().
->

diff --git a/arch/powerpc/platforms/ps3/device-init.c 
b/arch/powerpc/platforms/ps3/device-init.c
index 197347c3c0b24..e87360a0fb40d 100644
--- a/arch/powerpc/platforms/ps3/device-init.c
+++ b/arch/powerpc/platforms/ps3/device-init.c
@@ -809,6 +809,7 @@ static int ps3_probe_thread(void *data)
}
 
spin_lock_init(&dev.lock);
+   rcuwait_init(&dev.wait);
 
res = request_irq(irq, ps3_notification_interrupt, 0,
  "ps3_notification", &dev);


Sebastian


Re: [RFC] per-CPU usage in perf core-book3s

2020-02-05 Thread Sebastian Andrzej Siewior
On 2020-02-05 07:10:59 [+0530], maddy wrote:
> Yes, currently we dont have anything that prevents the timer
> callback to interrupt pmu::event_init. Nice catch. Thanks for
> pointing this out.

You are welcome.

> Looking at the code, per-cpu variable access are made to
> check for constraints and for Branch Stack (BHRB). So could
> wrap this block of  pmu::event_init with local_irq_save/restore.
> Will send a patch to fix it.

Okay. Please keep me in the loop :)

> 
> Maddy

Sebastian


[RFC] per-CPU usage in perf core-book3s

2020-01-27 Thread Sebastian Andrzej Siewior
I've been looking at usage of per-CPU variable cpu_hw_events in
arch/powerpc/perf/core-book3s.c.

power_pmu_enable() and power_pmu_disable() (pmu::pmu_enable() and
pmu::pmu_disable()) are accessing the variable and the callbacks are
invoked always with disabled interrupts.

power_pmu_event_init() (pmu::event_init()) is invoked from preemptible
context and uses get_cpu_var() to obtain a stable pointer (by disabling
preemption).

pmu::pmu_enable() and pmu::pmu_disable() can be invoked via a hrtimer:
perf_mux_hrtimer_handler() invokes both as part of its callback.

Is there anything that prevents the timer callback from interrupting
pmu::event_init() while it is accessing per-CPU data?

Sebastian
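
A minimal sketch of the local_irq_save()/restore() wrapping proposed in the
reply above (this is not the actual core-book3s.c code; the cpu_hw_events
struct here is a stand-in and example_check_constraints() is a hypothetical
helper, just to make the sketch complete):

struct cpu_hw_events {			/* stand-in, not the real struct */
	int n_events;
};
static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);

static int example_check_constraints(struct cpu_hw_events *cpuhw,
				     struct perf_event *event)
{
	return cpuhw->n_events < 8 ? 0 : -EAGAIN;
}

static int example_event_init_constraints(struct perf_event *event)
{
	struct cpu_hw_events *cpuhw;
	unsigned long flags;
	int ret;

	/*
	 * get_cpu_var() only disables preemption; the perf mux hrtimer can
	 * still fire in hard interrupt context and touch the same per-CPU
	 * data via pmu::pmu_disable()/pmu_enable().  Disabling interrupts
	 * closes that window.
	 */
	local_irq_save(flags);
	cpuhw = this_cpu_ptr(&cpu_hw_events);
	ret = example_check_constraints(cpuhw, event);
	local_irq_restore(flags);

	return ret;
}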


[PATCH] powerpc/85xx: Get twr_p102x to compile again

2019-12-19 Thread Sebastian Andrzej Siewior
With CONFIG_QUICC_ENGINE enabled and CONFIG_UCC_GETH + CONFIG_SERIAL_QE
disabled we have an unused variable (np). The code won't compile with
-Werror.

Move the np variable to the block where it is actually used.

Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/platforms/85xx/twr_p102x.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/twr_p102x.c 
b/arch/powerpc/platforms/85xx/twr_p102x.c
index 6c3c0cdaee9ad..b301ef9d6ce75 100644
--- a/arch/powerpc/platforms/85xx/twr_p102x.c
+++ b/arch/powerpc/platforms/85xx/twr_p102x.c
@@ -60,10 +60,6 @@ static void __init twr_p1025_pic_init(void)
  */
 static void __init twr_p1025_setup_arch(void)
 {
-#ifdef CONFIG_QUICC_ENGINE
-   struct device_node *np;
-#endif
-
if (ppc_md.progress)
ppc_md.progress("twr_p1025_setup_arch()", 0);
 
@@ -77,6 +73,7 @@ static void __init twr_p1025_setup_arch(void)
 #if IS_ENABLED(CONFIG_UCC_GETH) || IS_ENABLED(CONFIG_SERIAL_QE)
if (machine_is(twr_p1025)) {
struct ccsr_guts __iomem *guts;
+   struct device_node *np;
 
np = of_find_compatible_node(NULL, NULL, "fsl,p1021-guts");
if (np) {
-- 
2.24.0



Re: [Qemu-ppc] pseries on qemu-system-ppc64le crashes in doorbell_core_ipi()

2019-12-19 Thread Sebastian Andrzej Siewior
On 2019-12-19 11:41:21 [+0100], Jason A. Donenfeld wrote:
> Hi folks,
Hi,

so this should duct tape it:

diff --git a/arch/powerpc/kernel/dbell.c b/arch/powerpc/kernel/dbell.c
index f17ff1200eaae..ec044bdf362a1 100644
--- a/arch/powerpc/kernel/dbell.c
+++ b/arch/powerpc/kernel/dbell.c
@@ -60,16 +60,8 @@ void doorbell_core_ipi(int cpu)
  */
 int doorbell_try_core_ipi(int cpu)
 {
-   int this_cpu = get_cpu();
int ret = 0;
 
-   if (cpumask_test_cpu(cpu, cpu_sibling_mask(this_cpu))) {
-   doorbell_core_ipi(cpu);
-   ret = 1;
-   }
-
-   put_cpu();
-
return ret;
 }
 

> This is with "qemu-system-ppc64 -smp 4 -machine pseries" on QEMU 4.0.0.

Interesting. I didn't get v5.4 to boot a while ago and didn't have the
time to look into it.

> I'm not totally sure what's going on here. I'm emulating a pseries, and
> using that with qemu's pseries model, and I see that selecting the pseries
> forces the selection of 'config PPC_DOORBELL' (twice in the same section,
> actually). Then inside the kernel there appears to be some runtime CPU check
> for doorbell support. Is this a case in which QEMU is advertising doorbell
> support that TCG doesn't have? Or is something else happening here?

My understanding is that the doorbell feature is part of the
architecture. It can be used to signal other siblings on the same CPU.
qemu TCG doesn't support that and does not allow announcing multiple
siblings on the same CPU. However, the kernel uses this interface if it
tries to send an interrupt to itself (the same CPU), so everything
matches.
Last time I ran into this, the interface was changed so the kernel does
not send an IPI to itself. This has now changed for another function…

> Thanks,
> Jason

Sebastian


Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF

2019-12-09 Thread Sebastian Andrzej Siewior
On 2019-12-05 20:01:41 [-0600], Frank Rowand wrote:
> Is there a memory usage issue for the systems that led to this thread?

No, no memory issue led to this thread. I was just testing my patch and
I assumed that I did something wrong in the counting/lock drop/lock
acquire/allocate path because the array was hardly used. So I started to
look deeper…
Once I figured out everything was fine, I was curious if everyone is
aware of the different phandle creation by dtc vs POWER. And I posted
the mail in the thread.
Once you confirmed that everything is "known / not an issue" I was ready
to take off [0].

Later more replies came in such as one mail [1] from Rob describing the
original reason with 814 phandles. _Here_ I was just surprised that 1024
were used over 64 entries for a benefit of 60ms. I understand that this
is low concern for you because that memory is released if modules are
not enabled. I usually see that module support is left enabled.

However, Rob suggested / asked about the fixed size array (this is how I
understood it):
|And yes, as mentioned earlier I don't like the complexity. I didn't
|from the start and I'm  I'm still of the opinion we should have a
|fixed or 1 time sized true cache (i.e. smaller than total # of
|phandles). That would solve the RT memory allocation and locking issue
|too.

so I attempted to ask if we should have the fixed size array maybe
with the hash_32() instead of the mask. This would make my other patch
obsolete because the fixed size array should not have a RT issue. The
hash_32() part here would address the POWER issue where the cache is
currently not used efficiently.

If you want instead to keep things as-is then this is okay from my side.
If you want to keep this cache off on POWER then I could contribute a
patch doing so.

[0] 
https://lore.kernel.org/linux-devicetree/20191202110732.4dvzrro5o6zrl...@linutronix.de/
[1] 
https://lore.kernel.org/linux-devicetree/CAL_JsqKieG5=teL7gABPKbJOQfvoS9s-ZPF-=r0yee_luoy...@mail.gmail.com/
> -Frank

Sebastian


Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF

2019-12-05 Thread Sebastian Andrzej Siewior
On 2019-12-03 10:56:35 [-0600], Rob Herring wrote:
> > Another possibility would be to make the cache be dependent
> > upon not CONFIG_PPC.  It might be possible to disable the
> > cache with a minimal code change.
> 
> I'd rather not do that.
> 
> And yes, as mentioned earlier I don't like the complexity. I didn't
> from the start and I'm  I'm still of the opinion we should have a
> fixed or 1 time sized true cache (i.e. smaller than total # of
> phandles). That would solve the RT memory allocation and locking issue
> too.
> 
> For reference, the performance difference between the current
> implementation (assuming fixes haven't regressed it) was ~400ms vs. a
> ~340ms improvement with a 64 entry cache (using a mask, not a hash).
> IMO, 340ms improvement was good enough.

Okay. So the 814 phandles would result in an array with 1024 slots. That
would need 4KiB of memory.
What about going back to the fixed 64-slot array but with hash_32() for the
lookup? Without the hash we would be 60ms slower during boot (compared
to now, based on ancient data) but then the hash isn't expensive so we
end up with better coverage of the memory on systems which don't have a
plain enumeration of the phandle.

> Rob

Sebastian
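
For illustration only (plain user-space C, not a patch): this is roughly how
the two slot computations compare for a few of the SLOF-style phandle values
from the RFC log further down. GOLDEN_RATIO_32 and the shift match
include/linux/hash.h, so the printed slots should correspond to the
"slot"/"hash" columns of that log:

#include <stdint.h>
#include <stdio.h>

#define GOLDEN_RATIO_32	0x61C88647u	/* as in include/linux/hash.h */

static uint32_t hash_32(uint32_t val, unsigned int bits)
{
	return (val * GOLDEN_RATIO_32) >> (32 - bits);
}

int main(void)
{
	/* phandle values taken from the SLOF log below */
	uint32_t phandles[] = { 0x7e732468, 0x7e732ad0, 0x7e732ee8 };
	unsigned int i;

	for (i = 0; i < 3; i++)
		printf("phandle %x: mask slot %x, hash_32 slot %x\n",
		       phandles[i], phandles[i] & 0x3f, hash_32(phandles[i], 6));
	return 0;
}

With dtc-style phandles (1, 2, 3, ...) the plain mask already spreads the
entries evenly, which is why the difference only shows up on firmware like
SLOF that does not hand out sequential phandles.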


Re: [RFC] Efficiency of the phandle_cache on ppc64/SLOF

2019-12-02 Thread Sebastian Andrzej Siewior
On 2019-11-29 20:14:47 [-0600], Frank Rowand wrote:
> The hash used is based on the assumptions you noted, and as stated in the
> code, that phandle property values are in a contiguous range of 1..n
> (not starting from zero), which is what dtc generates.
> 
> We knew that for systems that do not match the assumptions that the hash
> will not be optimal.  Unless there is a serious performance problem for
> such systems, I do not want to make the phandle hash code more complicated
> to optimize for these cases.  And the pseries have been performing ok
> without phandle related performance issues that I remember hearing since
> before the cache was added, which could have only helped the performance.
> Yes, if your observations are correct, some memory is being wasted, but
> a 64 entry cache is not very large on a pseries.

okay, so it is nothing new and everyone is aware of the situation. I
move on then :)

> -Frank

Sebastian


[RFC] Efficiency of the phandle_cache on ppc64/SLOF

2019-11-29 Thread Sebastian Andrzej Siewior
I've been looking at phandle_cache and noticed the following: The raw
phandle value as generated by dtc starts at zero and is incremented by
one for each phandle entry. The qemu pSeries model is using Slof (which
is probably the same thing as used on real hardware) and this looks like
a poiner value for the phandle.
With
qemu-system-ppc64le -m 16G -machine pseries -smp 8 

I got the following output:
| entries: 64
| phandle 7e732468 slot 28 hash c
| phandle 7e732ad0 slot 10 hash 27
| phandle 7e732ee8 slot 28 hash 3a
| phandle 7e734160 slot 20 hash 36
| phandle 7e734318 slot 18 hash 3a
| phandle 7e734428 slot 28 hash 33
| phandle 7e734538 slot 38 hash 2c
| phandle 7e734850 slot 10 hash e
| phandle 7e735220 slot 20 hash 2d
| phandle 7e735bf0 slot 30 hash d
| phandle 7e7365c0 slot 0 hash 2d
| phandle 7e736f90 slot 10 hash d
| phandle 7e737960 slot 20 hash 2d
| phandle 7e738330 slot 30 hash d
| phandle 7e738d00 slot 0 hash 2d
| phandle 7e739730 slot 30 hash 38
| phandle 7e73bd08 slot 8 hash 17
| phandle 7e73c2e0 slot 20 hash 32
| phandle 7e73c7f8 slot 38 hash 37
| phandle 7e782420 slot 20 hash 13
| phandle 7e782ed8 slot 18 hash 1b
| phandle 7e73ce28 slot 28 hash 39
| phandle 7e73d390 slot 10 hash 22
| phandle 7e73d9a8 slot 28 hash 1a
| phandle 7e73dc28 slot 28 hash 37
| phandle 7e73de00 slot 0 hash a
| phandle 7e73e028 slot 28 hash 0
| phandle 7e7621a8 slot 28 hash 36
| phandle 7e73e458 slot 18 hash 1e
| phandle 7e73e608 slot 8 hash 1e
| phandle 7e740078 slot 38 hash 28
| phandle 7e740180 slot 0 hash 1d
| phandle 7e740240 slot 0 hash 33
| phandle 7e740348 slot 8 hash 29
| phandle 7e740410 slot 10 hash 2
| phandle 7e740eb0 slot 30 hash 3e
| phandle 7e745390 slot 10 hash 33
| phandle 7e747b08 slot 8 hash c
| phandle 7e748528 slot 28 hash f
| phandle 7e74a6e0 slot 20 hash 18
| phandle 7e74aab0 slot 30 hash b
| phandle 7e74f788 slot 8 hash d
| Used entries: 8, hashed: 29

So the hash array has 64 entries out of which only 8 are populated. Using
hash_32() populates 29 entries.
Could someone with real hardware verify this?
I'm not sure how important this is performance-wise; it just looks like a
waste using only 1/8 of the array.

The patch used for testing:

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 1d667eb730e19..2640d4bc81a9a 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -197,6 +197,7 @@ void of_populate_phandle_cache(void)
u32 cache_entries;
struct device_node *np;
u32 phandles = 0;
+   struct device_node **cache2;
 
raw_spin_lock_irqsave(&devtree_lock, flags);
 
@@ -214,14 +215,32 @@ void of_populate_phandle_cache(void)
 
phandle_cache = kcalloc(cache_entries, sizeof(*phandle_cache),
GFP_ATOMIC);
+   cache2 = kcalloc(cache_entries, sizeof(*phandle_cache), GFP_ATOMIC);
if (!phandle_cache)
goto out;
 
+   pr_err("%s(%d) entries: %d\n", __func__, __LINE__, cache_entries);
for_each_of_allnodes(np)
if (np->phandle && np->phandle != OF_PHANDLE_ILLEGAL) {
+   int slot;
of_node_get(np);
phandle_cache[np->phandle & phandle_cache_mask] = np;
+   slot = hash_32(np->phandle, __ffs(cache_entries));
+   cache2[slot] = np;
+   pr_err("%s(%d) phandle %x slot %x hash %x\n", __func__, 
__LINE__,
+  np->phandle, np->phandle & phandle_cache_mask, 
slot);
}
+   {
+   int i, filled = 0, filled_hash = 0;
+
+   for (i = 0; i < cache_entries; i++) {
+   if (phandle_cache[i])
+   filled++;
+   if (cache2[i])
+   filled_hash++;
+   }
+   pr_err("%s(%d) Used entries: %d, hashed: %d\n", __func__, 
__LINE__, filled, filled_hash);
+   }
 
 out:
raw_spin_unlock_irqrestore(&devtree_lock, flags);

Sebastian


[PATCH 03/34 v3] powerpc: Use CONFIG_PREEMPTION

2019-10-24 Thread Sebastian Andrzej Siewior
From: Thomas Gleixner 

CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Switch the entry code over to use CONFIG_PREEMPTION.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Thomas Gleixner 
[bigeasy: +Kconfig]
Signed-off-by: Sebastian Andrzej Siewior 
---
v2…v3: Don't mention die.c changes in the description.
v1…v2: Remove the changes to die.c.

 arch/powerpc/Kconfig   | 2 +-
 arch/powerpc/kernel/entry_32.S | 4 ++--
 arch/powerpc/kernel/entry_64.S | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3e56c9c2f16ee..8ead8d6e1cbc8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -106,7 +106,7 @@ config LOCKDEP_SUPPORT
 config GENERIC_LOCKBREAK
bool
default y
-   depends on SMP && PREEMPT
+   depends on SMP && PREEMPTION
 
 config GENERIC_HWEIGHT
bool
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index d60908ea37fb9..e1a4c39b83b86 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -897,7 +897,7 @@ user_exc_return:/* r10 contains MSR_KERNEL here 
*/
bne-0b
 1:
 
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
/* check current_thread_info->preempt_count */
lwz r0,TI_PREEMPT(r2)
cmpwi   0,r0,0  /* if non-zero, just restore regs and return */
@@ -921,7 +921,7 @@ user_exc_return:/* r10 contains MSR_KERNEL here 
*/
 */
bl  trace_hardirqs_on
 #endif
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
 restore_kuap:
kuap_restore r1, r2, r9, r10, r0
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6467bdab8d405..83733376533e8 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -840,7 +840,7 @@ _GLOBAL(ret_from_except_lite)
bne-0b
 1:
 
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
/* Check if we need to preempt */
andi.   r0,r4,_TIF_NEED_RESCHED
beq+restore
@@ -871,7 +871,7 @@ _GLOBAL(ret_from_except_lite)
li  r10,MSR_RI
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
 
.globl  fast_exc_return_irq
 fast_exc_return_irq:
-- 
2.23.0



[PATCH 03/34 v2] powerpc: Use CONFIG_PREEMPTION

2019-10-24 Thread Sebastian Andrzej Siewior
From: Thomas Gleixner 

CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Switch the entry code over to use CONFIG_PREEMPTION. Add PREEMPT_RT
output in __die().

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Thomas Gleixner 
[bigeasy: +Kconfig]
Signed-off-by: Sebastian Andrzej Siewior 
---
v1…v2: Remove the changes to die.c

 arch/powerpc/Kconfig   | 2 +-
 arch/powerpc/kernel/entry_32.S | 4 ++--
 arch/powerpc/kernel/entry_64.S | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3e56c9c2f16ee..8ead8d6e1cbc8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -106,7 +106,7 @@ config LOCKDEP_SUPPORT
 config GENERIC_LOCKBREAK
bool
default y
-   depends on SMP && PREEMPT
+   depends on SMP && PREEMPTION
 
 config GENERIC_HWEIGHT
bool
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index d60908ea37fb9..e1a4c39b83b86 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -897,7 +897,7 @@ user_exc_return:/* r10 contains MSR_KERNEL here 
*/
bne-0b
 1:
 
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
/* check current_thread_info->preempt_count */
lwz r0,TI_PREEMPT(r2)
cmpwi   0,r0,0  /* if non-zero, just restore regs and return */
@@ -921,7 +921,7 @@ user_exc_return:/* r10 contains MSR_KERNEL here 
*/
 */
bl  trace_hardirqs_on
 #endif
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
 restore_kuap:
kuap_restore r1, r2, r9, r10, r0
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6467bdab8d405..83733376533e8 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -840,7 +840,7 @@ _GLOBAL(ret_from_except_lite)
bne-0b
 1:
 
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
/* Check if we need to preempt */
andi.   r0,r4,_TIF_NEED_RESCHED
beq+restore
@@ -871,7 +871,7 @@ _GLOBAL(ret_from_except_lite)
li  r10,MSR_RI
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
 
.globl  fast_exc_return_irq
 fast_exc_return_irq:
-- 
2.23.0



Re: [PATCH 03/34] powerpc: Use CONFIG_PREEMPTION

2019-10-16 Thread Sebastian Andrzej Siewior
On 2019-10-16 20:33:08 [+1100], Michael Ellerman wrote:
> Christophe Leroy  writes:
> > Le 15/10/2019 à 21:17, Sebastian Andrzej Siewior a écrit :
> >> From: Thomas Gleixner 
> >> 
> >> CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
> >> Both PREEMPT and PREEMPT_RT require the same functionality which today
> >> depends on CONFIG_PREEMPT.
> >> 
> >> Switch the entry code over to use CONFIG_PREEMPTION. Add PREEMPT_RT
> >> output in __die().
> >
> > powerpc doesn't select ARCH_SUPPORTS_RT, so this change is useless as 
> > CONFIG_PREEMPT_RT cannot be selected.
> 
> Yeah I don't think there's any point adding the "_RT" to the __die()
> output until/if we ever start supporting PREEMPT_RT.

so just the PREEMPT -> PREEMPTION bits then?

> cheers

Sebastian


Re: [PATCH 03/34] powerpc: Use CONFIG_PREEMPTION

2019-10-16 Thread Sebastian Andrzej Siewior
On 2019-10-16 06:57:48 [+0200], Christophe Leroy wrote:
> 
> 
> Le 15/10/2019 à 21:17, Sebastian Andrzej Siewior a écrit :
> > From: Thomas Gleixner 
> > 
> > CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
> > Both PREEMPT and PREEMPT_RT require the same functionality which today
> > depends on CONFIG_PREEMPT.
> > 
> > Switch the entry code over to use CONFIG_PREEMPTION. Add PREEMPT_RT
> > output in __die().
> 
> powerpc doesn't select ARCH_SUPPORTS_RT, so this change is useless as
> CONFIG_PREEMPT_RT cannot be selected.

No it is not. It makes it possible for PowerPC to select it one day, and
I have patches for it today. Also, if another arch copies code from
PowerPC it will copy the correct thing (as in distinguishing between the
flavour PREEMPT and the functionality PREEMPTION).
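
To make the flavour/functionality split concrete, a tiny illustration (not
taken from the patch):

/* Illustration only: how the two Kconfig symbols relate. */
#ifdef CONFIG_PREEMPT
        /* built only for the "full preempt" flavour */
#endif

#ifdef CONFIG_PREEMPTION
        /* built whenever the kernel is preemptible, i.e. for both
         * CONFIG_PREEMPT and CONFIG_PREEMPT_RT
         */
#endif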

> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index 82f43535e6867..23d2f20be4f2e 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -252,14 +252,19 @@ NOKPROBE_SYMBOL(oops_end);
> >   static int __die(const char *str, struct pt_regs *regs, long err)
> >   {
> > +   const char *pr = "";
> > +
> 
> Please follow the same approach as already existing. Don't add a local var
> for that.

I would leave it to the maintainer to comment on that and decide which
one they want. My eyes find it more readable and the compiler does not
generate more code.

> > printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter);
> > +   if (IS_ENABLED(CONFIG_PREEMPTION))
> > +   pr = IS_ENABLED(CONFIG_PREEMPT_RT) ? " PREEMPT_RT" : " PREEMPT";
> > +
> 
> drop
> 
> > printk("%s PAGE_SIZE=%luK%s%s%s%s%s%s%s %s\n",
> 
> Add one %s
> 
> >IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN) ? "LE" : "BE",
> >PAGE_SIZE / 1024,
> >early_radix_enabled() ? " MMU=Radix" : "",
> >early_mmu_has_feature(MMU_FTR_HPTE_TABLE) ? " MMU=Hash" : "",
> > -  IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : "",
> 
> Replace by:   IS_ENABLED(CONFIG_PREEMPTION) ? " PREEMPT" : ""
> 
> > +  pr,
> 
> add something like: IS_ENABLED(CONFIG_PREEMPT_RT) ? "_RT" : ""

This, on the other hand, will create more code which is not strictly
required.

> >IS_ENABLED(CONFIG_SMP) ? " SMP" : "",
> >IS_ENABLED(CONFIG_SMP) ? (" NR_CPUS=" __stringify(NR_CPUS)) : "",
> >debug_pagealloc_enabled() ? " DEBUG_PAGEALLOC" : "",
> > 
> 
> Christophe

Sebastian


[PATCH 03/34] powerpc: Use CONFIG_PREEMPTION

2019-10-15 Thread Sebastian Andrzej Siewior
From: Thomas Gleixner 

CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Switch the entry code over to use CONFIG_PREEMPTION. Add PREEMPT_RT
output in __die().

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Thomas Gleixner 
[bigeasy: +traps.c, Kconfig]
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/Kconfig   | 2 +-
 arch/powerpc/kernel/entry_32.S | 4 ++--
 arch/powerpc/kernel/entry_64.S | 4 ++--
 arch/powerpc/kernel/traps.c| 7 ++-
 4 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3e56c9c2f16ee..8ead8d6e1cbc8 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -106,7 +106,7 @@ config LOCKDEP_SUPPORT
 config GENERIC_LOCKBREAK
bool
default y
-   depends on SMP && PREEMPT
+   depends on SMP && PREEMPTION
 
 config GENERIC_HWEIGHT
bool
diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index d60908ea37fb9..e1a4c39b83b86 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -897,7 +897,7 @@ user_exc_return:/* r10 contains MSR_KERNEL here 
*/
bne-0b
 1:
 
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
/* check current_thread_info->preempt_count */
lwz r0,TI_PREEMPT(r2)
cmpwi   0,r0,0  /* if non-zero, just restore regs and return */
@@ -921,7 +921,7 @@ user_exc_return:/* r10 contains MSR_KERNEL here 
*/
 */
bl  trace_hardirqs_on
 #endif
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
 restore_kuap:
kuap_restore r1, r2, r9, r10, r0
 
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 6467bdab8d405..83733376533e8 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -840,7 +840,7 @@ _GLOBAL(ret_from_except_lite)
bne-0b
 1:
 
-#ifdef CONFIG_PREEMPT
+#ifdef CONFIG_PREEMPTION
/* Check if we need to preempt */
andi.   r0,r4,_TIF_NEED_RESCHED
beq+restore
@@ -871,7 +871,7 @@ _GLOBAL(ret_from_except_lite)
li  r10,MSR_RI
mtmsrd  r10,1 /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
-#endif /* CONFIG_PREEMPT */
+#endif /* CONFIG_PREEMPTION */
 
.globl  fast_exc_return_irq
 fast_exc_return_irq:
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 82f43535e6867..23d2f20be4f2e 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -252,14 +252,19 @@ NOKPROBE_SYMBOL(oops_end);
 
 static int __die(const char *str, struct pt_regs *regs, long err)
 {
+   const char *pr = "";
+
printk("Oops: %s, sig: %ld [#%d]\n", str, err, ++die_counter);
 
+   if (IS_ENABLED(CONFIG_PREEMPTION))
+   pr = IS_ENABLED(CONFIG_PREEMPT_RT) ? " PREEMPT_RT" : " PREEMPT";
+
printk("%s PAGE_SIZE=%luK%s%s%s%s%s%s%s %s\n",
   IS_ENABLED(CONFIG_CPU_LITTLE_ENDIAN) ? "LE" : "BE",
   PAGE_SIZE / 1024,
   early_radix_enabled() ? " MMU=Radix" : "",
   early_mmu_has_feature(MMU_FTR_HPTE_TABLE) ? " MMU=Hash" : "",
-  IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : "",
+  pr,
   IS_ENABLED(CONFIG_SMP) ? " SMP" : "",
   IS_ENABLED(CONFIG_SMP) ? (" NR_CPUS=" __stringify(NR_CPUS)) : "",
   debug_pagealloc_enabled() ? " DEBUG_PAGEALLOC" : "",
-- 
2.23.0



Re: [Qemu-ppc] pseries on qemu-system-ppc64le crashes in doorbell_core_ipi()

2019-04-05 Thread Sebastian Andrzej Siewior
On 2019-04-05 02:25:44 [+1000], Nicholas Piggin wrote:
> Sebastian, are you able to test if this patch solves your problem?

yes, it does.

Tested-by: Sebastian Andrzej Siewior 

> Thanks,
> Nick

Sebastian


Re: [Qemu-ppc] pseries on qemu-system-ppc64le crashes in doorbell_core_ipi()

2019-03-29 Thread Sebastian Andrzej Siewior
On 2019-03-29 16:20:51 [+1100], Suraj Jitindar Singh wrote:
> 
> Yeah the kernel must have used msgsndp which isn't implemented for TCG
> yet. We use doorbells in linux but only for threads which are on the
> same core.

It is msgsndp as per instruction decode.

> And when I try to construct a situation with more than 1 thread per
> core (e.g. -smp 4,threads=4), I get "TCG cannot support more than 1
> thread/core on a pseries machine".
>
> So I wonder why the guest thinks it can use msgsndp...

I see
|# cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
|0
|1
|2
|3

so this all works well. The problem is that when a CPU sends an IPI to
itself, Linux's doorbell_try_core_ipi() considers it a valid sibling
and that is where the msgsndp comes from.

Sebastian


pseries on qemu-system-ppc64le crashes in doorbell_core_ipi()

2019-03-27 Thread Sebastian Andrzej Siewior
With qemu-system-ppc64le -machine pseries -smp 4 I get:

|#  chrt 1 hackbench
|Running in process mode with 10 groups using 40 file descriptors each (== 400 
tasks)
|Each sender will pass 100 messages of 100 bytes
| Oops: Exception in kernel mode, sig: 4 [#1]
| LE PAGE_SIZE=64K MMU=Hash PREEMPT SMP NR_CPUS=2048 NUMA pSeries
| Modules linked in:
| CPU: 0 PID: 629 Comm: hackbench Not tainted 5.1.0-rc2 #71
| NIP:  c0046978 LR: c0046a38 CTR: c00b0150
| REGS: c001fffeb8e0 TRAP: 0700   Not tainted  (5.1.0-rc2)
| MSR:  80089033   CR: 42000874  XER: 
| CFAR: c0046a34 IRQMASK: 1
| GPR00: c00b0170 c001fffebb70 c0a6ba00 2800
…
| NIP [c0046978] doorbell_core_ipi+0x28/0x30
| LR [c0046a38] doorbell_try_core_ipi+0xb8/0xf0
| Call Trace:
| [c001fffebb70] [c001fffebba0] 0xc001fffebba0 (unreliable)
| [c001fffebba0] [c00b0170] smp_pseries_cause_ipi+0x20/0x70
| [c001fffebbd0] [c004b02c] 
arch_send_call_function_single_ipi+0x8c/0xa0
| [c001fffebbf0] [c01de600] irq_work_queue_on+0xe0/0x130
| [c001fffebc30] [c01340c8] rto_push_irq_work_func+0xc8/0x120
…
| Instruction dump:
| 6000 6000 3c4c00a2 384250b0 3d220009 392949c8 8129 3929
| 7d231838 7c0004ac 5463017e 64632800 <7c00191c> 4e800020 3c4c00a2 38425080
| ---[ end trace eb842b544538cbdf ]---

and I was wondering whether this is a qemu bug or whether the kernel is
using an opcode it should not. If I skip doorbell_try_core_ipi() in
smp_pseries_cause_ipi() then there is no crash. The comment says "POWER9
should not use this handler" so…

(This is QEMU emulator version 3.1.0 (Debian 1:3.1+dfsg-6))

Sebastian


[PATCH 17/22] KVM/PPC/Book3S HV: Convert to hotplug state machine

2016-11-26 Thread Sebastian Andrzej Siewior
From: Anna-Maria Gleixner 

Install the callbacks via the state machine.

Cc: Alexander Graf 
Cc: Paolo Bonzini 
Cc: "Radim Krčmář" 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: kvm-...@vger.kernel.org
Cc: k...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Anna-Maria Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/kvm/book3s_hv.c | 48 ++--
 include/linux/cpuhotplug.h   |  1 +
 2 files changed, 12 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3686471be32b..39ef1f4a7b02 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2254,12 +2254,12 @@ static void post_guest_process(struct kvmppc_vcore *vc, 
bool is_master)
  * enter the guest. Only do this if it is the primary thread of the
  * core (not if a subcore) that is entering the guest.
  */
-static inline void kvmppc_clear_host_core(int cpu)
+static inline int kvmppc_clear_host_core(unsigned int cpu)
 {
int core;
 
if (!kvmppc_host_rm_ops_hv || cpu_thread_in_core(cpu))
-   return;
+   return 0;
/*
 * Memory barrier can be omitted here as we will do a smp_wmb()
 * later in kvmppc_start_thread and we need ensure that state is
@@ -2267,6 +2267,7 @@ static inline void kvmppc_clear_host_core(int cpu)
 */
core = cpu >> threads_shift;
kvmppc_host_rm_ops_hv->rm_core[core].rm_state.in_host = 0;
+   return 0;
 }
 
 /*
@@ -2274,12 +2275,12 @@ static inline void kvmppc_clear_host_core(int cpu)
  * Only need to do this if it is the primary thread of the core that is
  * exiting.
  */
-static inline void kvmppc_set_host_core(int cpu)
+static inline int kvmppc_set_host_core(unsigned int cpu)
 {
int core;
 
if (!kvmppc_host_rm_ops_hv || cpu_thread_in_core(cpu))
-   return;
+   return 0;
 
/*
 * Memory barrier can be omitted here because we do a spin_unlock
@@ -2287,6 +2288,7 @@ static inline void kvmppc_set_host_core(int cpu)
 */
core = cpu >> threads_shift;
kvmppc_host_rm_ops_hv->rm_core[core].rm_state.in_host = 1;
+   return 0;
 }
 
 /*
@@ -3094,36 +3096,6 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu 
*vcpu)
 }
 
 #ifdef CONFIG_KVM_XICS
-static int kvmppc_cpu_notify(struct notifier_block *self, unsigned long action,
-   void *hcpu)
-{
-   unsigned long cpu = (long)hcpu;
-
-   switch (action) {
-   case CPU_UP_PREPARE:
-   case CPU_UP_PREPARE_FROZEN:
-   kvmppc_set_host_core(cpu);
-   break;
-
-#ifdef CONFIG_HOTPLUG_CPU
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   case CPU_UP_CANCELED:
-   case CPU_UP_CANCELED_FROZEN:
-   kvmppc_clear_host_core(cpu);
-   break;
-#endif
-   default:
-   break;
-   }
-
-   return NOTIFY_OK;
-}
-
-static struct notifier_block kvmppc_cpu_notifier = {
-   .notifier_call = kvmppc_cpu_notify,
-};
-
 /*
  * Allocate a per-core structure for managing state about which cores are
  * running in the host versus the guest and for exchanging data between
@@ -3185,15 +3157,17 @@ void kvmppc_alloc_host_rm_ops(void)
return;
}
 
-   register_cpu_notifier(&kvmppc_cpu_notifier);
-
+   cpuhp_setup_state_nocalls(CPUHP_KVM_PPC_BOOK3S_PREPARE,
+ "ppc/kvm_book3s:prepare",
+ kvmppc_set_host_core,
+ kvmppc_clear_host_core);
put_online_cpus();
 }
 
 void kvmppc_free_host_rm_ops(void)
 {
if (kvmppc_host_rm_ops_hv) {
-   unregister_cpu_notifier(&kvmppc_cpu_notifier);
+   cpuhp_remove_state_nocalls(CPUHP_KVM_PPC_BOOK3S_PREPARE);
kfree(kvmppc_host_rm_ops_hv->rm_core);
kfree(kvmppc_host_rm_ops_hv);
kvmppc_host_rm_ops_hv = NULL;
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 853f8176594d..71c6822dd5be 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -68,6 +68,7 @@ enum cpuhp_state {
CPUHP_MM_ZS_PREPARE,
CPUHP_MM_ZSWP_MEM_PREPARE,
CPUHP_MM_ZSWP_POOL_PREPARE,
+   CPUHP_KVM_PPC_BOOK3S_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_NOTF_ERR_INJ_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
-- 
2.10.2



[PATCH 16/20] powerpc/sysfs: Convert to hotplug state machine

2016-11-17 Thread Sebastian Andrzej Siewior
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

The previous convention of keeping the files around until the CPU is dead
has not been preserved as there is no point in keeping them available when
the CPU is going down. This makes the hotplug call symmetric.
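
The general shape of such a conversion, as a minimal sketch (the state
string, the foo_* names and the callback bodies are made up, not the sysfs
code itself):

#include <linux/cpuhotplug.h>
#include <linux/init.h>

static int foo_cpu_online(unsigned int cpu)
{
        /* create the per-CPU files/state for @cpu */
        return 0;
}

static int foo_cpu_offline(unsigned int cpu)
{
        /* tear down exactly what foo_cpu_online() set up */
        return 0;
}

static int __init foo_init(void)
{
        int ret;

        /*
         * The core invokes foo_cpu_online() for every CPU that is already
         * online and on each later hotplug transition - no manual loop and
         * no cpu_notifier_register_begin()/done() needed.
         */
        ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "foo:online",
                                foo_cpu_online, foo_cpu_offline);
        return ret < 0 ? ret : 0;
}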

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/kernel/sysfs.c | 50 +
 1 file changed, 10 insertions(+), 40 deletions(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index c4f1d1f7bae0..c1fb255a60d6 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -703,7 +703,7 @@ static struct device_attribute pa6t_attrs[] = {
 #endif /* HAS_PPC_PMC_PA6T */
 #endif /* HAS_PPC_PMC_CLASSIC */
 
-static void register_cpu_online(unsigned int cpu)
+static int register_cpu_online(unsigned int cpu)
 {
struct cpu *c = &per_cpu(cpu_devices, cpu);
struct device *s = &c->dev;
@@ -782,11 +782,12 @@ static void register_cpu_online(unsigned int cpu)
}
 #endif
cacheinfo_cpu_online(cpu);
+   return 0;
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-static void unregister_cpu_online(unsigned int cpu)
+static int unregister_cpu_online(unsigned int cpu)
 {
+#ifdef CONFIG_HOTPLUG_CPU
struct cpu *c = &per_cpu(cpu_devices, cpu);
struct device *s = &c->dev;
struct device_attribute *attrs, *pmc_attrs;
@@ -863,6 +864,8 @@ static void unregister_cpu_online(unsigned int cpu)
}
 #endif
cacheinfo_cpu_offline(cpu);
+#endif /* CONFIG_HOTPLUG_CPU */
+   return 0;
 }
 
 #ifdef CONFIG_ARCH_CPU_PROBE_RELEASE
@@ -883,32 +886,6 @@ ssize_t arch_cpu_release(const char *buf, size_t count)
 }
 #endif /* CONFIG_ARCH_CPU_PROBE_RELEASE */
 
-#endif /* CONFIG_HOTPLUG_CPU */
-
-static int sysfs_cpu_notify(struct notifier_block *self,
- unsigned long action, void *hcpu)
-{
-   unsigned int cpu = (unsigned int)(long)hcpu;
-
-   switch (action) {
-   case CPU_ONLINE:
-   case CPU_ONLINE_FROZEN:
-   register_cpu_online(cpu);
-   break;
-#ifdef CONFIG_HOTPLUG_CPU
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   unregister_cpu_online(cpu);
-   break;
-#endif
-   }
-   return NOTIFY_OK;
-}
-
-static struct notifier_block sysfs_cpu_nb = {
-   .notifier_call  = sysfs_cpu_notify,
-};
-
 static DEFINE_MUTEX(cpu_mutex);
 
 int cpu_add_dev_attr(struct device_attribute *attr)
@@ -1023,12 +1000,10 @@ static DEVICE_ATTR(physical_id, 0444, show_physical_id, 
NULL);
 
 static int __init topology_init(void)
 {
-   int cpu;
+   int cpu, r;
 
register_nodes();
 
-   cpu_notifier_register_begin();
-
for_each_possible_cpu(cpu) {
struct cpu *c = &per_cpu(cpu_devices, cpu);
 
@@ -1047,15 +1022,10 @@ static int __init topology_init(void)
 
device_create_file(&c->dev, &dev_attr_physical_id);
}
-
-   if (cpu_online(cpu))
-   register_cpu_online(cpu);
}
-
-   __register_cpu_notifier(&sysfs_cpu_nb);
-
-   cpu_notifier_register_done();
-
+   r = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "powerpc/topology:online",
+ register_cpu_online, unregister_cpu_online);
+   WARN_ON(r < 0);
 #ifdef CONFIG_PPC64
sysfs_create_dscr_default();
 #endif /* CONFIG_PPC64 */
-- 
2.10.2



[PATCH 16/16] powerpc: mmu nohash: Convert to hotplug state machine

2016-08-18 Thread Sebastian Andrzej Siewior
Install the callbacks via the state machine.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/mm/mmu_context_nohash.c | 54 +++-
 include/linux/cpuhotplug.h   |  1 +
 2 files changed, 24 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_nohash.c 
b/arch/powerpc/mm/mmu_context_nohash.c
index 7d95bc402dba..f7755fcbe1f8 100644
--- a/arch/powerpc/mm/mmu_context_nohash.c
+++ b/arch/powerpc/mm/mmu_context_nohash.c
@@ -369,44 +369,34 @@ void destroy_context(struct mm_struct *mm)
 }
 
 #ifdef CONFIG_SMP
-
-static int mmu_context_cpu_notify(struct notifier_block *self,
- unsigned long action, void *hcpu)
+static int mmu_ctx_cpu_prepare(unsigned int cpu)
 {
-   unsigned int cpu = (unsigned int)(long)hcpu;
-
/* We don't touch CPU 0 map, it's allocated at aboot and kept
 * around forever
 */
if (cpu == boot_cpuid)
-   return NOTIFY_OK;
+   return 0;
 
-   switch (action) {
-   case CPU_UP_PREPARE:
-   case CPU_UP_PREPARE_FROZEN:
-   pr_devel("MMU: Allocating stale context map for CPU %d\n", cpu);
-   stale_map[cpu] = kzalloc(CTX_MAP_SIZE, GFP_KERNEL);
-   break;
-#ifdef CONFIG_HOTPLUG_CPU
-   case CPU_UP_CANCELED:
-   case CPU_UP_CANCELED_FROZEN:
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   pr_devel("MMU: Freeing stale context map for CPU %d\n", cpu);
-   kfree(stale_map[cpu]);
-   stale_map[cpu] = NULL;
-
-   /* We also clear the cpu_vm_mask bits of CPUs going away */
-   clear_tasks_mm_cpumask(cpu);
-   break;
-#endif /* CONFIG_HOTPLUG_CPU */
-   }
-   return NOTIFY_OK;
+   pr_devel("MMU: Allocating stale context map for CPU %d\n", cpu);
+   stale_map[cpu] = kzalloc(CTX_MAP_SIZE, GFP_KERNEL);
+   return 0;
 }
 
-static struct notifier_block mmu_context_cpu_nb = {
-   .notifier_call  = mmu_context_cpu_notify,
-};
+static int mmu_ctx_cpu_dead(unsigned int cpu)
+{
+#ifdef CONFIG_HOTPLUG_CPU
+   if (cpu == boot_cpuid)
+   return 0;
+
+   pr_devel("MMU: Freeing stale context map for CPU %d\n", cpu);
+   kfree(stale_map[cpu]);
+   stale_map[cpu] = NULL;
+
+   /* We also clear the cpu_vm_mask bits of CPUs going away */
+   clear_tasks_mm_cpumask(cpu);
+#endif
+   return 0;
+}
 
 #endif /* CONFIG_SMP */
 
@@ -469,7 +459,9 @@ void __init mmu_context_init(void)
 #else
stale_map[boot_cpuid] = memblock_virt_alloc(CTX_MAP_SIZE, 0);
 
-   register_cpu_notifier(&mmu_context_cpu_nb);
+   cpuhp_setup_state_nocalls(CPUHP_POWER_MMU_CTX_PREPARE,
+ "POWER_MMU_CTX_PREPARE", mmu_ctx_cpu_prepare,
+ mmu_ctx_cpu_dead);
 #endif
 
printk(KERN_INFO
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 4974c9fdbf9a..92b9cf3271b2 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -33,6 +33,7 @@ enum cpuhp_state {
CPUHP_MD_RAID5_PREPARE,
CPUHP_CPUIDLE_COUPLED_PREPARE,
CPUHP_POWER_PMAC_PREPARE,
+   CPUHP_POWER_MMU_CTX_PREPARE,
CPUHP_NOTIFY_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_BRINGUP_CPU,
-- 
2.9.3



[PATCH 15/16] powerpc: powermac: Convert to hotplug state machine

2016-08-18 Thread Sebastian Andrzej Siewior
Install the callbacks via the state machine.
I assume here that the powermac has two CPUs and so only one can go up
or down at a time. The variable smp_core99_host_open is here to ensure
that we do not try to open or close the i2c host twice if something goes
wrong and we invoke the prepare or online callback twice due to
rollback.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/platforms/powermac/smp.c | 50 +--
 include/linux/cpuhotplug.h|  1 +
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/smp.c 
b/arch/powerpc/platforms/powermac/smp.c
index 834868b9fdc9..4a853323f906 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -852,37 +852,33 @@ static void smp_core99_setup_cpu(int cpu_nr)
 
 #ifdef CONFIG_PPC64
 #ifdef CONFIG_HOTPLUG_CPU
-static int smp_core99_cpu_notify(struct notifier_block *self,
-unsigned long action, void *hcpu)
+static unsigned int smp_core99_host_open;
+
+static int smp_core99_cpu_prepare(unsigned int cpu)
 {
int rc;
 
-   switch(action & ~CPU_TASKS_FROZEN) {
-   case CPU_UP_PREPARE:
-   /* Open i2c bus if it was used for tb sync */
-   if (pmac_tb_clock_chip_host) {
-   rc = pmac_i2c_open(pmac_tb_clock_chip_host, 1);
-   if (rc) {
-   pr_err("Failed to open i2c bus for time 
sync\n");
-   return notifier_from_errno(rc);
-   }
+   /* Open i2c bus if it was used for tb sync */
+   if (pmac_tb_clock_chip_host && !smp_core99_host_open) {
+   rc = pmac_i2c_open(pmac_tb_clock_chip_host, 1);
+   if (rc) {
+   pr_err("Failed to open i2c bus for time sync\n");
+   return notifier_from_errno(rc);
}
-   break;
-   case CPU_ONLINE:
-   case CPU_UP_CANCELED:
-   /* Close i2c bus if it was used for tb sync */
-   if (pmac_tb_clock_chip_host)
-   pmac_i2c_close(pmac_tb_clock_chip_host);
-   break;
-   default:
-   break;
+   smp_core99_host_open = 1;
}
-   return NOTIFY_OK;
+   return 0;
 }
 
-static struct notifier_block smp_core99_cpu_nb = {
-   .notifier_call  = smp_core99_cpu_notify,
-};
+static int smp_core99_cpu_online(unsigned int cpu)
+{
+   /* Close i2c bus if it was used for tb sync */
+   if (pmac_tb_clock_chip_host && smp_core99_host_open) {
+   pmac_i2c_close(pmac_tb_clock_chip_host);
+   smp_core99_host_open = 0;
+   }
+   return 0;
+}
 #endif /* CONFIG_HOTPLUG_CPU */
 
 static void __init smp_core99_bringup_done(void)
@@ -902,7 +898,11 @@ static void __init smp_core99_bringup_done(void)
g5_phy_disable_cpu1();
}
 #ifdef CONFIG_HOTPLUG_CPU
-   register_cpu_notifier(&smp_core99_cpu_nb);
+   cpuhp_setup_state_nocalls(CPUHP_POWER_PMAC_PREPARE,
+ "POWER_PMAC_PREPARE", smp_core99_cpu_prepare,
+ NULL);
+   cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "AP_POWER_PMAC_ONLINE",
+ smp_core99_cpu_online, NULL);
 #endif
 
if (ppc_md.progress)
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 9e50a7b3bbcd..4974c9fdbf9a 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -32,6 +32,7 @@ enum cpuhp_state {
CPUHP_RCUTREE_PREP,
CPUHP_MD_RAID5_PREPARE,
CPUHP_CPUIDLE_COUPLED_PREPARE,
+   CPUHP_POWER_PMAC_PREPARE,
CPUHP_NOTIFY_PREPARE,
CPUHP_TIMERS_DEAD,
CPUHP_BRINGUP_CPU,
-- 
2.9.3



[PATCH v2] powerpc/numa: Convert to hotplug state machine

2016-07-18 Thread Sebastian Andrzej Siewior
Install the callbacks via the state machine and let the core invoke
the callbacks on the already online CPUs.

v1…v2: manual callback invocation on boot-CPU (cpuhp is not up yet and
   we need them all before additional CPUs are up).

Signed-off-by: Sebastian Andrzej Siewior 
Cc: Andrew Morton 
Cc: Benjamin Herrenschmidt 
Cc: Bharata B Rao 
Cc: Christophe Jaillet 
Cc: Linus Torvalds 
Cc: Michael Ellerman 
Cc: Nikunj A Dadhania 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Raghavendra K T 
Cc: Thomas Gleixner 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Anna-Maria Gleixner 
---
 arch/powerpc/mm/numa.c | 48 ++
 include/linux/cpuhotplug.h |  1 +
 2 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 669a15e7fa76..6dc07ddbfd04 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -581,30 +581,22 @@ static void verify_cpu_node_mapping(int cpu, int node)
}
 }
 
-static int cpu_numa_callback(struct notifier_block *nfb, unsigned long action,
-void *hcpu)
+/* Must run before sched domains notifier. */
+static int ppc_numa_cpu_prepare(unsigned int cpu)
 {
-   unsigned long lcpu = (unsigned long)hcpu;
-   int ret = NOTIFY_DONE, nid;
+   int nid;
 
-   switch (action) {
-   case CPU_UP_PREPARE:
-   case CPU_UP_PREPARE_FROZEN:
-   nid = numa_setup_cpu(lcpu);
-   verify_cpu_node_mapping((int)lcpu, nid);
-   ret = NOTIFY_OK;
-   break;
+   nid = numa_setup_cpu(cpu);
+   verify_cpu_node_mapping(cpu, nid);
+   return 0;
+}
+
+static int ppc_numa_cpu_dead(unsigned int cpu)
+{
 #ifdef CONFIG_HOTPLUG_CPU
-   case CPU_DEAD:
-   case CPU_DEAD_FROZEN:
-   case CPU_UP_CANCELED:
-   case CPU_UP_CANCELED_FROZEN:
-   unmap_cpu_from_node(lcpu);
-   ret = NOTIFY_OK;
-   break;
+   unmap_cpu_from_node(cpu);
 #endif
-   }
-   return ret;
+   return 0;
 }
 
 /*
@@ -913,11 +905,6 @@ static void __init dump_numa_memory_topology(void)
}
 }
 
-static struct notifier_block ppc64_numa_nb = {
-   .notifier_call = cpu_numa_callback,
-   .priority = 1 /* Must run before sched domains notifier. */
-};
-
 /* Initialize NODE_DATA for a node on the local memory */
 static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 {
@@ -985,15 +972,18 @@ void __init initmem_init(void)
setup_node_to_cpumask_map();
 
reset_numa_cpu_lookup_table();
-   register_cpu_notifier(&ppc64_numa_nb);
+
/*
 * We need the numa_cpu_lookup_table to be accurate for all CPUs,
 * even before we online them, so that we can use cpu_to_{node,mem}
 * early in boot, cf. smp_prepare_cpus().
+* _nocalls() + manual invocation is used because cpuhp is not yet
+* initialized for the boot CPU.
 */
-   for_each_present_cpu(cpu) {
-   numa_setup_cpu((unsigned long)cpu);
-   }
+   cpuhp_setup_state_nocalls(CPUHP_POWER_NUMA_PREPARE, 
"POWER_NUMA_PREPARE",
+ ppc_numa_cpu_prepare, ppc_numa_cpu_dead);
+   for_each_present_cpu(cpu)
+   numa_setup_cpu(cpu);
 }
 
 static int __init early_numa(char *p)
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 09ef54bcba39..5015f463d314 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -15,6 +15,7 @@ enum cpuhp_state {
CPUHP_X86_HPET_DEAD,
CPUHP_X86_APB_DEAD,
CPUHP_WORKQUEUE_PREP,
+   CPUHP_POWER_NUMA_PREPARE,
CPUHP_HRTIMERS_PREPARE,
CPUHP_PROFILE_PREPARE,
CPUHP_X2APIC_PREPARE,
-- 
2.8.1


Re: [patch V2 30/67] powerpc/numa: Convert to hotplug state machine

2016-07-15 Thread Sebastian Andrzej Siewior
* Anton Blanchard | 2016-07-15 10:28:25 [+1000]:

>Hi Anna-Maria,
Hi Anton,

>> >> Install the callbacks via the state machine and let the core invoke
>> >> the callbacks on the already online CPUs.  
>> >
>> > This is causing an oops on ppc64le QEMU, looks like a NULL
>> > pointer:  
>> 
>> Did you tested it against tip WIP.hotplug?
>
>I noticed tip started failing in my CI environment which tests on QEMU.
>The failure bisected to commit 425209e0abaf2c6e3a90ce4fedb935c10652bf80
>
>It reproduces running ppc64le QEMU on a x86-64 box. On Ubuntu:
…

Thanks for that. I can reproduce this on top of the latest WIP.hotplug with
this patch.

>Anton

Sebastian

Re: [PATCH v2] powerpc: mm: fixup preempt underflow with huge pages

2016-03-30 Thread Sebastian Andrzej Siewior
On 03/30/2016 02:41 AM, Michael Ellerman wrote:
> The merge window just closed, I'm still recovering.
> 
> I've got it in my fixes branch locally, I'll probably push that today to
> linux-next.

Thank you.

> 
> cheers
> 
Sebastian

Re: [PATCH v2] powerpc: mm: fixup preempt underflow with huge pages

2016-03-29 Thread Sebastian Andrzej Siewior
On 2016-03-10 01:04:24 [+0530], Aneesh Kumar K.V wrote:
> Sebastian Andrzej Siewior  writes:

*ping*
http://patchwork.ozlabs.org/patch/593943/

> > [ text/plain ]
> > hugepd_free() used __get_cpu_var() once. Nothing ensured that the code
> > accessing the variable did not migrate from one CPU to another and soon
> > this was noticed by Tiejun Chen in 94b09d755462 ("powerpc/hugetlb:
> > Replace __get_cpu_var with get_cpu_var"). So we had it fixed.
> >
> > Christoph Lameter was doing his __get_cpu_var() replaces and forgot
> > PowerPC. Then he noticed this and sent his fixed up batch again which
> > got applied as 69111bac42f5 ("powerpc: Replace __get_cpu_var uses").
> >
> > The careful reader will noticed one little detail: get_cpu_var() got
> > replaced with this_cpu_ptr(). So now we have a put_cpu_var() which does
> > a preempt_enable() and nothing that does preempt_disable() so we
> > underflow the preempt counter.
> >
> 
> Reviewed-by: Aneesh Kumar K.V 
> 
> > Cc: Benjamin Herrenschmidt 
> > Cc: Christoph Lameter 
> > Cc: Michael Ellerman 
> > Cc: 
> > Signed-off-by: Sebastian Andrzej Siewior 
> > ---
> > v1…v2: - use get_cpu_var() instead of get_cpu_ptr()
> >- correct indentation of put_cpu_var()
> >
> >  arch/powerpc/mm/hugetlbpage.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> > index 744e24bcb85c..4a811ca7ac9d 100644
> > --- a/arch/powerpc/mm/hugetlbpage.c
> > +++ b/arch/powerpc/mm/hugetlbpage.c
> > @@ -414,13 +414,13 @@ static void hugepd_free(struct mmu_gather *tlb, void 
> > *hugepte)
> >  {
> > struct hugepd_freelist **batchp;
> >  
> > -   batchp = this_cpu_ptr(&hugepd_freelist_cur);
> > +   batchp = &get_cpu_var(hugepd_freelist_cur);
> >  
> > if (atomic_read(&tlb->mm->mm_users) < 2 ||
> > cpumask_equal(mm_cpumask(tlb->mm),
> >   cpumask_of(smp_processor_id( {
> > kmem_cache_free(hugepte_cache, hugepte);
> > -put_cpu_var(hugepd_freelist_cur);
> > +   put_cpu_var(hugepd_freelist_cur);
> > return;
> > }
> >  

Sebastian

[PATCH v2] powerpc: mm: fixup preempt underflow with huge pages

2016-03-08 Thread Sebastian Andrzej Siewior
hugepd_free() used __get_cpu_var() once. Nothing ensured that the code
accessing the variable did not migrate from one CPU to another and soon
this was noticed by Tiejun Chen in 94b09d755462 ("powerpc/hugetlb:
Replace __get_cpu_var with get_cpu_var"). So we had it fixed.

Christoph Lameter was doing his __get_cpu_var() replacements and forgot
PowerPC. Then he noticed this and sent his fixed-up batch again, which
got applied as 69111bac42f5 ("powerpc: Replace __get_cpu_var uses").

The careful reader will notice one little detail: get_cpu_var() got
replaced with this_cpu_ptr(). So now we have a put_cpu_var() which does
a preempt_enable() and nothing that does preempt_disable(), so we
underflow the preempt counter.
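
A minimal sketch of the pairing with a made-up per-CPU variable (not the
hugepd code itself):

#include <linux/percpu.h>

static DEFINE_PER_CPU(int, foo_count);

static void foo_balanced(void)
{
        int *p = &get_cpu_var(foo_count);       /* implies preempt_disable() */

        (*p)++;
        put_cpu_var(foo_count);                 /* implies preempt_enable() */
}

static void foo_unbalanced(void)
{
        int *p = this_cpu_ptr(&foo_count);      /* no preempt_disable() here */

        (*p)++;
        put_cpu_var(foo_count);                 /* preempt_enable() -> underflow */
}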

Cc: Benjamin Herrenschmidt 
Cc: Christoph Lameter 
Cc: Michael Ellerman 
Cc: 
Signed-off-by: Sebastian Andrzej Siewior 
---
v1…v2: - use get_cpu_var() instead of get_cpu_ptr()
   - correct indentation of put_cpu_var()

 arch/powerpc/mm/hugetlbpage.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 744e24bcb85c..4a811ca7ac9d 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -414,13 +414,13 @@ static void hugepd_free(struct mmu_gather *tlb, void 
*hugepte)
 {
struct hugepd_freelist **batchp;
 
-   batchp = this_cpu_ptr(&hugepd_freelist_cur);
+   batchp = &get_cpu_var(hugepd_freelist_cur);
 
if (atomic_read(&tlb->mm->mm_users) < 2 ||
cpumask_equal(mm_cpumask(tlb->mm),
  cpumask_of(smp_processor_id( {
kmem_cache_free(hugepte_cache, hugepte);
-put_cpu_var(hugepd_freelist_cur);
+   put_cpu_var(hugepd_freelist_cur);
return;
}
 
-- 
2.7.0


Re: [PATCH] powerpc: mm: fixup preempt underflow with huge pages

2016-03-08 Thread Sebastian Andrzej Siewior
On 03/08/2016 12:41 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2016-03-07 at 21:04 +0530, Aneesh Kumar K.V wrote:
>> Sebastian Andrzej Siewior  writes:
>>  
>> While you are there, can you also fix the wrong indentation on line
>> 423
>> ?
> 
>  .../...
> 
> Also this looks like stable material no ?

Yes, I had a stable tag attached. I am going to resend it with the
additional changes Aneesh asked for.

> Cheers,
> Ben.
> 
Sebastian

[PATCH] powerpc: mm: fixup preempt underflow with huge pages

2016-03-07 Thread Sebastian Andrzej Siewior
hugepd_free() used __get_cpu_var() once. Nothing ensured that the code
accessing the variable did not migrate from one CPU to another and soon
this was noticed by Tiejun Chen in 94b09d755462 ("powerpc/hugetlb:
Replace __get_cpu_var with get_cpu_var"). So we had it fixed.

Christoph Lameter was doing his __get_cpu_var() replacements and forgot
PowerPC. Then he noticed this and sent his fixed-up batch again, which
got applied as 69111bac42f5 ("powerpc: Replace __get_cpu_var uses").

The careful reader will notice one little detail: get_cpu_var() got
replaced with this_cpu_ptr(). So now we have a put_cpu_var() which does
a preempt_enable() and nothing that does preempt_disable(), so we
underflow the preempt counter.

Cc: Benjamin Herrenschmidt 
Cc: Christoph Lameter 
Cc: Michael Ellerman 
Cc: Tiejun Chen 
Cc: 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/mm/hugetlbpage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 744e24bcb85c..9e8919b39640 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -414,7 +414,7 @@ static void hugepd_free(struct mmu_gather *tlb, void 
*hugepte)
 {
struct hugepd_freelist **batchp;
 
-   batchp = this_cpu_ptr(&hugepd_freelist_cur);
+   batchp = get_cpu_ptr(&hugepd_freelist_cur);
 
if (atomic_read(&tlb->mm->mm_users) < 2 ||
cpumask_equal(mm_cpumask(tlb->mm),
-- 
2.7.0


Re: [PATCH v2 2/2] powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT_FULL

2015-05-14 Thread Sebastian Andrzej Siewior
* Bogdan Purcareata | 2015-04-24 15:53:13 [+]:

>While converting the openpic emulation code to use a raw_spinlock_t enables
>guests to run on RT, there's still a performance issue. For interrupts sent in

>Signed-off-by: Bogdan Purcareata 

Applied with Scott's Acked-by

Sebastian

Re: [PATCH v2 0/2] powerpc/kvm: Enable running guests on RT Linux

2015-05-14 Thread Sebastian Andrzej Siewior
* Bogdan Purcareata | 2015-04-24 15:53:11 [+]:

>0001: converts the openpic spinlock to a raw spinlock, in order to circumvent
>0002: disables in-kernel MPIC emulation for guest running on RT, in order to

Scott, I'm asking here for your explicit Acked-by on those two patches,
i.e. that you want it *that* way. We now have a nice explanation in #1 :)

Sebastian

Re: [PATCH 0/2] powerpc/kvm: Enable running guests on RT Linux

2015-02-26 Thread Sebastian Andrzej Siewior
On 02/26/2015 02:02 PM, Paolo Bonzini wrote:
> 
> 
> On 24/02/2015 00:27, Scott Wood wrote:
>> This isn't a host PIC driver.  It's guest PIC emulation, some of which
>> is indeed not suitable for a rawlock (in particular, openpic_update_irq
>> which loops on the number of vcpus, with a loop body that calls
>> IRQ_check() which loops over all pending IRQs).
> 
> The question is what behavior is wanted of code that isn't quite
> RT-ready.  What is preferred, bugs or bad latency?
> 
> If the answer is bad latency (which can be avoided simply by not running
> KVM on a RT kernel in production), patch 1 can be applied.  If the
can be applied *but* makes no difference if applied or not.

> answer is bugs, patch 1 is not upstream material.
> 
> I myself prefer to have bad latency; if something takes a spinlock in
> atomic context, that spinlock should be raw.  If it hurts (latency),
> don't do it (use the affected code).

The problem that is fixed by this s/spin_lock/raw_spin_lock/ exists
only in -RT. There is no change upstream. In general we fix such things
in -RT first and forward the patches upstream if possible; this conversion
would be such a candidate.
Bug fixing comes before latency, no matter if RT or not. Converting
every lock into a raw lock is not always the answer.
The last thing I read from Scott is that he is not entirely sure whether
this is the right approach, and patch #1 was not acked by him either.

So for now I wait for Scott's feedback and maybe a backtrace :)

> 
> Paolo

Sebastian

Re: [PATCH 0/2] powerpc/kvm: Enable running guests on RT Linux

2015-02-25 Thread Sebastian Andrzej Siewior
* Scott Wood | 2015-02-23 17:27:31 [-0600]:

>This isn't a host PIC driver.  It's guest PIC emulation, some of which
>is indeed not suitable for a rawlock (in particular, openpic_update_irq
>which loops on the number of vcpus, with a loop body that calls
>IRQ_check() which loops over all pending IRQs).  The vcpu limits are a
>temporary bandaid to avoid the worst latencies, but I'm still skeptical
>about this being upstream material.

Okay.

>-Scott

Sebastian

Re: [PATCH 0/2] powerpc/kvm: Enable running guests on RT Linux

2015-02-20 Thread Sebastian Andrzej Siewior
On 02/20/2015 04:10 PM, Paolo Bonzini wrote:
> On 20/02/2015 16:06, Sebastian Andrzej Siewior wrote:
>> On 02/20/2015 03:57 PM, Paolo Bonzini wrote:
> 
>>> Yes, but large latencies just mean the code has to be rewritten (x86
>>> doesn't anymore do event injection in an atomic regions for example).
>>> Until it is, using raw_spin_lock is correct.
>>
>> It does not sound like it. It sounds more like disabling interrupts to
>> get things run faster and then limit it on a different corner to not
>> blow up everything.
> 
> "This patchset enables running KVM SMP guests with external interrupts
> on an underlying RT-enabled Linux. Previous to this patch, a guest with
> in-kernel MPIC emulation could easily panic the kernel due to preemption
> when delivering IPIs and external interrupts, because of the openpic
> spinlock becoming a sleeping mutex on PREEMPT_RT_FULL Linux".
> 
>> Max latencies was decreased "Max latency (us)  7062" and that
>> is why this is done? For 8 us and possible DoS in case there are too
>> many cpus?
> 
> My understanding is that:
> 
> 1) netperf can get you a BUG KVM, and raw_spinlock fixes that

May I please see a backtrace with context tracking which states where
the interrupts / preemption gets disabled and where the lock was taken?

I'm not totally against this patch, I just want to make sure this is not
a blind raw conversion to shut up the warning the kernel throws.

> 2) cyclictest did not trigger the BUG, and you can also get reduced
> latency from using raw_spinlock.
> 
> I think we agree that (2) is not a factor in accepting the patch.
good :)

> 
> Paolo
> 
Sebastian

Re: [PATCH 0/2] powerpc/kvm: Enable running guests on RT Linux

2015-02-20 Thread Sebastian Andrzej Siewior
On 02/20/2015 03:57 PM, Paolo Bonzini wrote:
> 
> 
> On 20/02/2015 15:54, Sebastian Andrzej Siewior wrote:
>> Usually you see "scheduling while atomic" on -RT and convert them to
>> raw locks if it is appropriate.
>>
>> Bogdan wrote in 2/2 that he needs to limit the number of CPUs in oder
>> not cause a DoS and large latencies in the host. I haven't seen an
>> answer to my why question. Because if the conversation leads to
>> large latencies in the host then it does not look right.
>>
>> Each host PIC has a rawlock and does mostly just mask/unmask and the
>> raw lock makes sure the value written is not mixed up due to
>> preemption.
>> This hardly increase latencies because the "locked" path is very short.
>> If this conversation leads to higher latencies then the locked path is
>> too long and hardly suitable to become a rawlock.
> 
> Yes, but large latencies just mean the code has to be rewritten (x86
> doesn't anymore do event injection in an atomic regions for example).
> Until it is, using raw_spin_lock is correct.

It does not sound like it. It sounds more like disabling interrupts to
make things run faster and then limiting it in a different corner so that
everything does not blow up.
The max latency was decreased ("Max latency (us)  7062") and that
is why this is done? For 8 us and a possible DoS in case there are too
many CPUs?

> Paolo
> 

Sebastian

Re: [PATCH 0/2] powerpc/kvm: Enable running guests on RT Linux

2015-02-20 Thread Sebastian Andrzej Siewior
On 02/20/2015 03:12 PM, Paolo Bonzini wrote:
>> Thomas, what is the usual approach for patches like this? Do you take
>> them into your rt tree or should they get integrated to upstream?
> 
> Patch 1 is definitely suitable for upstream, that's the reason why we
> have raw_spin_lock vs. raw_spin_unlock.

raw_spin_lock was introduced in c2f21ce2e31286a0a32 ("locking:
Implement new raw_spinlock"). It is used in contexts which run with
IRQs off - especially on -RT. This usually includes interrupt
controllers and related core-code pieces.
Usually you see "scheduling while atomic" on -RT and convert the affected
locks to raw locks if it is appropriate.

Bogdan wrote in 2/2 that he needs to limit the number of CPUs in order
not to cause a DoS and large latencies in the host. I haven't seen an
answer to my "why" question, because if the conversion leads to
large latencies in the host then it does not look right.

Each host PIC has a rawlock and does mostly just mask/unmask and the
raw lock makes sure the value written is not mixed up due to
preemption.
This hardly increases latencies because the "locked" path is very short.
If this conversion leads to higher latencies then the locked path is
too long and hardly suitable to become a raw lock.
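
As a rough sketch of the pattern (made-up names, not the openpic code):

#include <linux/io.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct foo_pic {
        raw_spinlock_t  lock;   /* stays a spinning lock even on -RT */
        void __iomem    *regs;
};

static void foo_pic_mask(struct foo_pic *pic, u32 mask)
{
        unsigned long flags;

        /*
         * The critical section is only a register update, so keeping this a
         * raw_spinlock_t adds no measurable latency while remaining usable
         * from hard-IRQ context on PREEMPT_RT.
         */
        raw_spin_lock_irqsave(&pic->lock, flags);
        writel(mask, pic->regs);
        raw_spin_unlock_irqrestore(&pic->lock, flags);
}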

> Paolo
> 

Sebastian

Re: [PATCH 2/2] powerpc/kvm: Limit MAX_VCPUS for guests running on RT Linux

2015-02-18 Thread Sebastian Andrzej Siewior
On 02/18/2015 10:32 AM, Bogdan Purcareata wrote:
> Due to the introduction of the raw_spinlock for the KVM openpic, guests with a
> high number of VCPUs may induce great latencies on the underlying RT Linux
> system (e.g. cyclictest reports latencies of ~15ms for guests with 24 VCPUs).
> This can be further aggravated by sending a lot of external interrupts to the
> guest.
> 
> A malicious app can abuse this scenario, causing a DoS of the host Linux.
> Until the KVM openpic code is refactored to use finer lock granularity, impose
> a limitation on the number of VCPUs a guest can have when running on a
> PREEMPT_RT_FULL system with KVM_MPIC emulation.

How is this possible? You take the raw lock, write a register, release
the raw lock. How can the guest lock up the host? Is this write blocking
in the guest?

Sebastian

Re: [PATCH 1/2] irqdomain: add support for creating a continuous mapping

2014-11-04 Thread Sebastian Andrzej Siewior
On 11/03/2014 11:18 PM, Scott Wood wrote:
> Is it really necessary for the virqs to be contiguous?  How is the
> availability of multiple MSIs communicated to the driver?  Is there an
> example of a driver that currently uses multiple MSIs?

I used this in PCI and after enabling 2^x interrupts, I allocated a
contiguous block starting at virq. There is no other way of communicating
this. The first irq number is saved in pci_dev->irq and the remaining
ones have to follow.

Take a look at drivers/ata/ahci.c and you will see (roughly sketched below):
- pci_msi_vec_count() for the number of irqs
- pci_enable_msi_block() to allocate that number of irqs
- pci_enable_msi() (no X)
- ahci_host_activate() does request_threaded_irq() for pdev->irq and
  loops "number of ports" times, which matches the number of irqs
  (or is less than it).
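
Roughly, as a sketch only (made-up foo_* names, error handling trimmed;
not the real ahci.c code, and using the APIs mentioned above):

#include <linux/interrupt.h>
#include <linux/pci.h>

static irqreturn_t foo_port_isr(int irq, void *data)
{
        return IRQ_HANDLED;
}

static int foo_setup_irqs(struct pci_dev *pdev, int nr_ports)
{
        int nvec, i, rc;

        nvec = pci_msi_vec_count(pdev);         /* vectors the device offers */
        if (nvec < 0)
                return nvec;
        if (nvec > nr_ports)
                nvec = nr_ports;

        rc = pci_enable_msi_block(pdev, nvec);  /* allocate the whole block */
        if (rc)
                return rc < 0 ? rc : -ENOSPC;   /* >0: only that many available */

        /* pdev->irq is the first vector, pdev->irq + i the following ones */
        for (i = 0; i < nvec; i++) {
                rc = request_irq(pdev->irq + i, foo_port_isr, 0,
                                 "foo-port", pdev);
                if (rc)
                        return rc;
        }
        return 0;
}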

Sebastian


Re: FSL MSI Mapping

2014-10-31 Thread Sebastian Andrzej Siewior
On 10/31/2014 09:12 AM, Johannes Thumshirn wrote:
> On Thu, Oct 30, 2014 at 02:51:57PM +1100, Michael Ellerman wrote:
>> Why would you not use MSI-X ?

If I'm not mistaken, a PCI-E requirement is to support MSI but MSI-X is
optional. And with MSI you can have multiple interrupts per device
(a power of two, max 32).
Now imagine you have an FPGA with two (or more) different devices in it
and you don't want them to share the IRQ line (for $reason). I bet the
HW developer sees MSI and MSI-X, the former is for some reason cheaper
compared to MSI-X, and it fits the needs. So…

> 
> Does anyone (especially the original author) have any objections if I re-spin
> the patch series?

I didn't get around to addressing the review comments, so it did not go
in and there was no v2. Feel free to re-do the series.

Sebastian

[PATCH 1/2 v2] irqdomain: add support for creating a continuous mapping

2014-02-21 Thread Sebastian Andrzej Siewior
An MSI device may have multiple interrupts. That means that the
interrupt numbers should be continuous so that pdev->irq refers to the
first interrupt, pdev->irq + 1 to the second and so on.
This patch adds support for continuous allocation of virqs for a range
of hwirqs. The function is based on irq_create_mapping() but due to the
number argument there is very little in common now.

Cc: Thomas Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
Scott, this is what you suggested. I must admit, it does not look that
bad. It is just compile tested.

v1…v2:
- use irq_create_mapping_block() for irq_create_mapping()
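
For illustration, a hypothetical caller (foo_map_msi_block() is made up)
mapping a block of four MSI hwirqs:

#include <linux/irqdomain.h>
#include <linux/pci.h>

static int foo_map_msi_block(struct irq_domain *host, irq_hw_number_t hwirq,
                             struct pci_dev *pdev)
{
        unsigned int virq;

        /* Map hwirq .. hwirq + 3 so that the resulting virqs are contiguous. */
        virq = irq_create_mapping_block(host, hwirq, 4);
        if (!virq)
                return -ENOSPC;

        pdev->irq = virq;       /* pdev->irq + 1 .. pdev->irq + 3 follow */
        return 0;
}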

 include/linux/irqdomain.h | 10 --
 kernel/irq/irqdomain.c| 87 ++-
 2 files changed, 72 insertions(+), 25 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index c983ed1..8b09a6b 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -171,12 +171,18 @@ extern int irq_domain_associate(struct irq_domain 
*domain, unsigned int irq,
irq_hw_number_t hwirq);
 extern void irq_domain_associate_many(struct irq_domain *domain,
  unsigned int irq_base,
  irq_hw_number_t hwirq_base, int count);
 
-extern unsigned int irq_create_mapping(struct irq_domain *host,
-  irq_hw_number_t hwirq);
+extern unsigned int irq_create_mapping_block(struct irq_domain *host,
+   irq_hw_number_t hwirq, unsigned int num);
+static inline unsigned int irq_create_mapping(struct irq_domain *host,
+   irq_hw_number_t hwirq)
+{
+   return irq_create_mapping_block(host, hwirq, 1);
+}
+
 extern void irq_dispose_mapping(unsigned int virq);
 
 /**
  * irq_linear_revmap() - Find a linux irq from a hw irq number.
  * @domain: domain owning this hardware interrupt
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf68bb3..cdc6627 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -372,67 +372,108 @@ unsigned int irq_create_direct_mapping(struct irq_domain 
*domain)
 
return virq;
 }
 EXPORT_SYMBOL_GPL(irq_create_direct_mapping);
 
+static int irq_check_continuous_mapping(struct irq_domain *domain,
+   irq_hw_number_t hwirq, unsigned int num)
+{
+   int virq;
+   int i;
+
+   virq = irq_find_mapping(domain, hwirq);
+
+   for (i = 1; i < num; i++) {
+   unsigned int next;
+
+   next = irq_find_mapping(domain, hwirq + i);
+   if (next == virq + i)
+   continue;
+
+   pr_err("irq: invalid partial mapping. First hwirq %lu maps to "
+   "%d and \n", hwirq, virq);
+   pr_err("irq: +%d hwirq (%lu) maps to %d but should be %d.\n",
+   i, hwirq + i, next, virq + i);
+   return -EINVAL;
+   }
+
+   pr_debug("-> existing mapping on virq %d\n", virq);
+   return virq;
+}
+
 /**
- * irq_create_mapping() - Map a hardware interrupt into linux irq space
+ * irq_create_mapping_block() - Map multiple hardware interrupts
  * @domain: domain owning this hardware interrupt or NULL for default domain
  * @hwirq: hardware irq number in that domain space
+ * @num: number of interrupts
+ *
+ * Maps a hwirq to a newly allocated virq. If num is greater than 1 then num
+ * hwirqs (hwirq … hwirq + num - 1) will be mapped and virq will be  
continuous.
+ * Returns the first linux virq number.
  *
- * Only one mapping per hardware interrupt is permitted. Returns a linux
- * irq number.
  * If the sense/trigger is to be specified, set_irq_type() should be called
  * on the number returned from that call.
  */
-unsigned int irq_create_mapping(struct irq_domain *domain,
-   irq_hw_number_t hwirq)
+unsigned int irq_create_mapping_block(struct irq_domain *domain,
+   irq_hw_number_t hwirq, unsigned int num)
 {
-   unsigned int hint;
int virq;
+   int i;
+   int node;
+   unsigned int hint;
 
-   pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq);
+   pr_debug("%s(0x%p, 0x%lx, %d)\n", __func__, domain, hwirq, num);
 
/* Look for default domain if nececssary */
-   if (domain == NULL)
+   if (!domain && num == 1)
domain = irq_default_domain;
+
if (domain == NULL) {
WARN(1, "%s(, %lx) called with NULL domain\n", __func__, hwirq);
return 0;
}
pr_debug("-> using domain @%p\n", domain);
 
/* Check if mapping already exists */
-   virq = irq_find_mapping(domain, hwirq);
-   if (virq) {
-   pr_debug("-> existing mapping on virq %d\n", virq);
-   return virq

Re: [PATCH 1/2] irqdomain: add support for creating a continuous mapping

2014-02-21 Thread Sebastian Andrzej Siewior
On 02/20/2014 10:06 PM, Scott Wood wrote:
> On Thu, 2014-02-20 at 21:53 +0100, Sebastian Andrzej Siewior wrote:
>> A MSI device may have multiple interrupts. That means that the
>> interrupts numbers should be continuos so that pdev->irq refers to the
>> first interrupt, pdev->irq + 1 to the second and so on.
>> This patch adds support for continuous allocation of virqs for a range
>> of hwirqs. The function is based on irq_create_mapping() but due to the
>> number argument there is very little in common now.
> 
> Would it make sense to turn irq_create_mapping() into a call to
> irq_create_mapping_block() with num = 1?

There are a few things that are different and I didn't like it. Now that
I look at it again I've found a bug in the way irq_find_mapping() is called.
Let me redo it with your suggestion and we will see if it makes sense.

> -Scott
> 
> 
Sebastian

[PATCH 2/2] powerpc: msi: fsl: add support for multiple MSI interrupts

2014-02-20 Thread Sebastian Andrzej Siewior
This patch pushes the check for nvec > 1 && MSI into the check function
of each MSI driver except for FSL's MSI where the functionality is
added.

Cc: Arnd Bergmann 
Cc: Gavin Shan 
Cc: Alexey Kardashevskiy 
Cc: Alistair Popple 
Cc: Brian King 
Cc: Anton Blanchard 
Cc: Scott Wood 
Cc: Minghuan Lian 
Cc: Kumar Gala 
Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/kernel/msi.c  |  4 
 arch/powerpc/platforms/cell/axon_msi.c |  3 +++
 arch/powerpc/platforms/powernv/pci.c   |  2 ++
 arch/powerpc/platforms/pseries/msi.c   |  3 +++
 arch/powerpc/platforms/wsp/msi.c   |  8 
 arch/powerpc/sysdev/fsl_msi.c  | 29 +
 arch/powerpc/sysdev/mpic_pasemi_msi.c  |  3 ++-
 arch/powerpc/sysdev/mpic_u3msi.c   |  2 ++
 arch/powerpc/sysdev/ppc4xx_msi.c   |  2 ++
 9 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
index 8bbc12d..46b1470 100644
--- a/arch/powerpc/kernel/msi.c
+++ b/arch/powerpc/kernel/msi.c
@@ -20,10 +20,6 @@ int arch_msi_check_device(struct pci_dev* dev, int nvec, int 
type)
return -ENOSYS;
}
 
-   /* PowerPC doesn't support multiple MSI yet */
-   if (type == PCI_CAP_ID_MSI && nvec > 1)
-   return 1;
-
if (ppc_md.msi_check_device) {
pr_debug("msi: Using platform check routine.\n");
return ppc_md.msi_check_device(dev, nvec, type);
diff --git a/arch/powerpc/platforms/cell/axon_msi.c 
b/arch/powerpc/platforms/cell/axon_msi.c
index 85825b5..6e592ed 100644
--- a/arch/powerpc/platforms/cell/axon_msi.c
+++ b/arch/powerpc/platforms/cell/axon_msi.c
@@ -204,6 +204,9 @@ static int axon_msi_check_device(struct pci_dev *dev, int 
nvec, int type)
if (!find_msi_translator(dev))
return -ENODEV;
 
+   if (type == PCI_CAP_ID_MSI && nvec > 1)
+   return 1;
+
return 0;
 }
 
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 95633d7..1d08040 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -54,6 +54,8 @@ static int pnv_msi_check_device(struct pci_dev* pdev, int 
nvec, int type)
 
if (pdn && pdn->force_32bit_msi && !phb->msi32_support)
return -ENODEV;
+   if (type == PCI_CAP_ID_MSI && nvec > 1)
+   return 1;
 
return (phb && phb->msi_bmp.bitmap) ? 0 : -ENODEV;
 }
diff --git a/arch/powerpc/platforms/pseries/msi.c 
b/arch/powerpc/platforms/pseries/msi.c
index 0c882e8..ad5e766 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -339,6 +339,9 @@ static int rtas_msi_check_device(struct pci_dev *pdev, int 
nvec, int type)
 {
int quota, rc;
 
+   if (type == PCI_CAP_ID_MSI && nvec > 1)
+   return 1;
+
if (type == PCI_CAP_ID_MSIX)
rc = check_req_msix(pdev, nvec);
else
diff --git a/arch/powerpc/platforms/wsp/msi.c b/arch/powerpc/platforms/wsp/msi.c
index 380882f..0cabd46 100644
--- a/arch/powerpc/platforms/wsp/msi.c
+++ b/arch/powerpc/platforms/wsp/msi.c
@@ -21,6 +21,13 @@
 #define MSI_ADDR_320xul
 #define MSI_ADDR_640x1000ul
 
+static int wsp_msi_check_device(struct pci_dev *dev, int nvec, int type)
+{
+   if (type == PCI_CAP_ID_MSI && nvec > 1)
+   return 1;
+   return 0;
+}
+
 int wsp_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
struct pci_controller *phb;
@@ -98,5 +105,6 @@ void wsp_setup_phb_msi(struct pci_controller *phb)
out_be64(phb->cfg_data + PCIE_REG_IODA_DATA0, 1ull << 63);
 
ppc_md.setup_msi_irqs = wsp_setup_msi_irqs;
+   ppc_md.msi_check_device = wsp_msi_check_device;
ppc_md.teardown_msi_irqs = wsp_teardown_msi_irqs;
 }
diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 77efbae..f07840f 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -123,13 +123,19 @@ static void fsl_teardown_msi_irqs(struct pci_dev *pdev)
struct fsl_msi *msi_data;
 
list_for_each_entry(entry, &pdev->msi_list, list) {
+   int num;
+   int i;
+
if (entry->irq == NO_IRQ)
continue;
msi_data = irq_get_chip_data(entry->irq);
irq_set_msi_desc(entry->irq, NULL);
+   num = 1 << entry->msi_attrib.multiple;
msi_bitmap_free_hwirqs(&msi_data->bitmap,
-  virq_to_hw(entry->irq), 1);
-   irq_dispose_mapping(entry->irq);
+  virq_to_hw(entry->irq), num);
+
+   for (i = 0; i < num; i++)
+  

multiple MSI interrupts on FSL-MPIC

2014-02-20 Thread Sebastian Andrzej Siewior
This series adds support for multiple MSI interrupts on FSL's MPIC. The
code has been tested on v3.2 with a custom FPGA device on PCIe and then
forward-ported. Patch #1 was created from scratch during porting,
patch #2 applied almost as-is.

Sebastian

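For context, a minimal sketch of how a PCI driver of that kernel vintage would ask for a block of MSI vectors; pci_enable_msi_block() is the API that existed at the time, and the vector count and function name here are made up for illustration:

#include <linux/errno.h>
#include <linux/pci.h>

/* Sketch only: try to get four consecutive MSI vectors, fall back to a
 * single one if the platform cannot provide a block. */
static int demo_request_msi(struct pci_dev *pdev)
{
	int rc;

	rc = pci_enable_msi_block(pdev, 4);
	if (rc < 0)
		return rc;		/* MSI not usable at all */
	if (rc > 0) {
		/* only a smaller block is possible, settle for one vector */
		if (pci_enable_msi(pdev))
			return -ENODEV;
		return 1;
	}
	/* pdev->irq is the first vector, pdev->irq + 1 .. + 3 follow */
	return 4;
}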


[PATCH 1/2] irqdomain: add support for creating a contiguous mapping

2014-02-20 Thread Sebastian Andrzej Siewior
An MSI device may have multiple interrupts. That means the interrupt
numbers should be consecutive, so that pdev->irq refers to the first
interrupt, pdev->irq + 1 to the second, and so on.
This patch adds support for allocating a contiguous block of virqs for a
range of hwirqs. The function is based on irq_create_mapping(), but due
to the number argument there is very little in common now.

Cc: Thomas Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 include/linux/irqdomain.h |  2 ++
 kernel/irq/irqdomain.c| 61 +++
 2 files changed, 63 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index c983ed1..21d0635 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -175,6 +175,8 @@ extern void irq_domain_associate_many(struct irq_domain 
*domain,
 
 extern unsigned int irq_create_mapping(struct irq_domain *host,
   irq_hw_number_t hwirq);
+extern unsigned int irq_create_mapping_block(struct irq_domain *host,
+   irq_hw_number_t hwirq, unsigned int num);
 extern void irq_dispose_mapping(unsigned int virq);
 
 /**
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf68bb3..323d417 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -433,6 +433,67 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 EXPORT_SYMBOL_GPL(irq_create_mapping);
 
 /**
+ * irq_create_mapping_block() - Map multiple hardware interrupts
+ * @domain: domain owning this hardware interrupt or NULL for default domain
+ * @hwirq: hardware irq number in that domain space
+ * @num: number of interrupts
+ *
+ * Maps a hwirq to a newly allocated virq. Num should be greater than 1 so that
+ * num hwirqs (hwirq … hwirq + num - 1) get mapped and the resulting virqs are
+ * contiguous. Returns the first linux virq number.
+ *
+ * If the sense/trigger is to be specified, set_irq_type() should be called
+ * on the number returned from that call.
+ */
+unsigned int irq_create_mapping_block(struct irq_domain *domain,
+   irq_hw_number_t hwirq, unsigned int num)
+{
+   int virq;
+   int i;
+
+   pr_debug("%s(0x%p, 0x%lx) %d\n", __func__, domain, hwirq, num);
+
+   if (num < 2)
+   return 0;
+
+   /* Look for default domain if necessary */
+   if (domain == NULL) {
+   WARN(1, "%s(, %lx) called with NULL domain\n", __func__, hwirq);
+   return 0;
+   }
+   for (i = 0; i < num; i++) {
+   /* Check if mapping already exists */
+   virq = irq_find_mapping(domain, hwirq + i);
+   if (virq != NO_IRQ) {
+   if (i == 0) {
+   pr_debug("-> existing mapping on virq %d\n",
+   virq);
+   return virq;
+   }
+   pr_err("irq: hwirq %ld has no mapping but hwirq %ld "
+   "maps to virq %d. This can't be a block\n",
+   hwirq, hwirq + i, virq);
+   return -EINVAL;
+   }
+   }
+
+   /* Allocate a virtual interrupt number */
+   virq = irq_alloc_descs_from(1, num, of_node_to_nid(domain->of_node));
+   if (virq <= 0) {
+   pr_debug("-> virq allocation failed\n");
+   return 0;
+   }
+
+   irq_domain_associate_many(domain, virq, hwirq, num);
+
+   pr_debug("irqs %lu…%lu on domain %s mapped to virtual irqs %u…%u\n",
+   hwirq, hwirq + num - 1, of_node_full_name(domain->of_node),
+   virq, virq + num - 1);
+
+   return virq;
+}
+
+/**
  * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs
  * @domain: domain owning the interrupt range
  * @irq_base: beginning of linux IRQ range
-- 
1.9.0.rc3

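For illustration, a caller (e.g. the MSI setup code of an interrupt controller driver) would use the new helper roughly as below; this is a minimal sketch and the function and parameter names are made up:

#include <linux/errno.h>
#include <linux/irqdomain.h>

/* Sketch: map 'num' consecutive hwirqs starting at 'hwirq' and return the
 * first linux virq, so that virq .. virq + num - 1 form a usable block. */
static int demo_map_msi_block(struct irq_domain *host,
			      irq_hw_number_t hwirq, unsigned int num)
{
	unsigned int virq;

	if (num > 1)
		virq = irq_create_mapping_block(host, hwirq, num);
	else
		virq = irq_create_mapping(host, hwirq);
	if (!virq)
		return -ENOSPC;

	return virq;
}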

[PATCH 2/2] powerpc: add "force config cmd line" Kconfig option

2014-02-20 Thread Sebastian Andrzej Siewior
powerpc uses early_init_dt_scan_chosen() from the common fdt code. By
enabling this option, the common code can take the built-in command
line over the one coming from the bootloader / DT.

Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/Kconfig | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 957bf34..957d3e5 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -618,6 +618,15 @@ config CMDLINE
  some command-line options at build time by entering them here.  In
  most cases you will need to specify the root device here.
 
+config CMDLINE_FORCE
+   bool "Always use the default kernel command string"
+   depends on CMDLINE_BOOL
+   help
+ Always use the default kernel command string, even if the boot
+ loader passes other arguments to the kernel.
+ This is useful if you cannot or don't want to change the
+ command-line options your boot loader passes to the kernel.
+
 config EXTRA_TARGETS
string "Additional default image types"
help
-- 
1.9.0.rc3

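With the option applied, a config fragment that makes the kernel ignore whatever the boot loader or DT passes would look roughly like this (the command line itself is just an example):

CONFIG_CMDLINE_BOOL=y
CONFIG_CMDLINE="console=ttyS0,115200 root=/dev/sda2 ro"
CONFIG_CMDLINE_FORCE=y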


[PATCH 1/2] powerpc: 85xx rdb: move np pointer to avoid build error

2014-02-20 Thread Sebastian Andrzej Siewior
If CONFIG_UCC_GETH or CONFIG_SERIAL_QE is not defined, we get a warning
about an unused variable, which leads to a build error.

Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/platforms/85xx/mpc85xx_rdb.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c 
b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
index e15bdd1..232a6a7 100644
--- a/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
+++ b/arch/powerpc/platforms/85xx/mpc85xx_rdb.c
@@ -86,10 +86,6 @@ void __init mpc85xx_rdb_pic_init(void)
  */
 static void __init mpc85xx_rdb_setup_arch(void)
 {
-#ifdef CONFIG_QUICC_ENGINE
-   struct device_node *np;
-#endif
-
if (ppc_md.progress)
ppc_md.progress("mpc85xx_rdb_setup_arch()", 0);
 
@@ -101,6 +97,7 @@ static void __init mpc85xx_rdb_setup_arch(void)
mpc85xx_qe_init();
 #if defined(CONFIG_UCC_GETH) || defined(CONFIG_SERIAL_QE)
if (machine_is(p1025_rdb)) {
+   struct device_node *np;
 
struct ccsr_guts __iomem *guts;
 
-- 
1.9.0.rc3



Re: [PATCH v3] powerpc: Fix PTE page address mismatch in pgtable ctor/dtor

2013-12-15 Thread Sebastian Andrzej Siewior
* Hong H. Pham | 2013-12-07 09:06:33 [-0500]:

>On PPC32, only SMP kernels are affected.
>
>On PPC64, only SMP kernels with 4K page size are affected.

$ uname -a
Linux mpc8536-1 3.12.1-rt3-00281-g9de268d #76 SMP PREEMPT RT Fri Nov 22 
16:53:05 CET 2013 ppc GNU/Linux
$ uptime
 22:01:10 up 22 days, 21:01,  1 user,  load average: 443.08, 563.59, 586.20

This is from an mpc8536 box. The high load comes from a hackbench that has
been running for quite some time. Is Book-E (CONFIG_PPC_BOOK3E_MMU without
CONFIG_PPC_BOOK3E set) not affected, or is this bug not present if an SMP
kernel is booted on a UP machine?

Sebastian


Re: [PATCH] Kind of revert "powerpc: 52xx: provide a default in mpc52xx_irqhost_map()"

2013-10-04 Thread Sebastian Andrzej Siewior
On 10/04/2013 05:37 PM, Wolfram Sang wrote:
> This more or less reverts commit 6391f697d4892a6f233501beea553e13f7745a23.
> Instead of adding an unneeded 'default', mark the variable to prevent
> the false positive 'uninitialized var'. The other change (fixing the
> printout) needs revert, too. We want to know WHICH critical irq failed,
> not which level it had.
> 
> Signed-off-by: Wolfram Sang 
> Cc: Sebastian Andrzej Siewior 
> Cc: Anatolij Gustschin 

Acked-by: Sebastian Andrzej Siewior 

Sebastian


Re: [PATCH] Revert "powerpc: 52xx: provide a default in mpc52xx_irqhost_map()"

2013-10-02 Thread Sebastian Andrzej Siewior
On 10/01/2013 09:03 PM, Wolfram Sang wrote:
> 
> Yup. But I just remembered a better solution:
> 
> From: Wolfram Sang  Subject: [PATCH] ppc:
> mpc52xx: silence false positive from old GCC
> 
> So people can compile with -Werror.
> 
> Signed-off-by: Wolfram Sang  --- 
> arch/powerpc/platforms/52xx/mpc52xx_pic.c |2 +- 1 file changed,
> 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/52xx/mpc52xx_pic.c
> b/arch/powerpc/platforms/52xx/mpc52xx_pic.c index b89ef65..2898b73
> 100644 --- a/arch/powerpc/platforms/52xx/mpc52xx_pic.c +++
> b/arch/powerpc/platforms/52xx/mpc52xx_pic.c @@ -340,7 +340,7 @@
> static int mpc52xx_irqhost_map(struct irq_domain *h, unsigned int
> virq, { int l1irq; int l2irq; -   struct irq_chip *irqchip; + struct
> irq_chip *uninitialized_var(irqchip); void *hndlr; int type; u32
> reg;
> 
> 
> uninitialized_var was created for exactly that purpose IIRC.

Yup, looks good, thanks.

> 
> Thanks,
> 
> Wolfram
> 

Sebastian


Re: [PATCH] Revert "powerpc: 52xx: provide a default in mpc52xx_irqhost_map()"

2013-10-01 Thread Sebastian Andrzej Siewior
On 10/01/2013 11:11 AM, Wolfram Sang wrote:
> Hi,

Hi Wolfram,

> Well, if you insist, I'd prefer the following patch.
> 
> From: Wolfram Sang  Subject: [PATCH] ppc:
> mpc52xx: silence false positive from old GCC
> 
> So people can compile with -Werror (RT patchset).

Why do you mention the RT patch set here? Doesn't the vanilla tree get
compiled with -Werror as well?

> Signed-off-by: Wolfram Sang  --- 
> arch/powerpc/platforms/52xx/mpc52xx_pic.c |2 +- 1 file changed,
> 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/52xx/mpc52xx_pic.c
> b/arch/powerpc/platforms/52xx/mpc52xx_pic.c index b89ef65..ad3c9b0
> 100644 --- a/arch/powerpc/platforms/52xx/mpc52xx_pic.c +++
> b/arch/powerpc/platforms/52xx/mpc52xx_pic.c @@ -340,7 +340,7 @@
> static int mpc52xx_irqhost_map(struct irq_domain *h, unsigned int
> virq, { int l1irq; int l2irq; -   struct irq_chip *irqchip; + struct
> irq_chip *irqchip = NULL; /* pet old compilers */

That would probably work, too. I would drop that comment but then
someone might clean that up :P

> void *hndlr; int type; u32 reg;
> 
>> Why miss leading code? Default here does the same as unhandled
>> and crit where it does nothing.
> 
> People not realizing 'default' is a no-op might wonder why unknown 
> levels are mapped to critical.

I see. And what would you suggest as the default in case we had an
additional bit?

> 
>> Any why do you want to see l2irq since it was not in the case
>> statement? l2 holds the number, l1 the level.
> 
> We know which level it was, since the printout is only for that
> level. We probably want to know which requested IRQ was causing
> this, so we can fix the assorted driver. Otherwise we only know
> that some critical IRQ was requested somewhere.

Hmmm. I assumed that critical / SDMA / … are interrupt numbers but they
seem not to be. In that case I guess l2 is more important. l1 kinda
looks important since it is the value in the switch case which failed,
but since it can only hold one possible value, I guess your info is
better :)

> 
> Thanks,
> 
> Wolfram
> 
Sebastian


Re: [PATCH] Revert "powerpc: 52xx: provide a default in mpc52xx_irqhost_map()"

2013-10-01 Thread Sebastian Andrzej Siewior
On 10/01/2013 09:26 AM, Wolfram Sang wrote:
> This reverts commit 6391f697d4892a6f233501beea553e13f7745a23. The
> compiler warning it wants to fix does not appear with my gcc 4.6.2. IMO
> we don't need superfluous (and here even misleading) code to make old
> compilers happy. Fixing the printout was bogus, too. We want to know
> WHICH critical irq failed, not which level it had.

According to Doc*/Changes, the minimal gcc is 3.2. Mine was 4.3.5.
Why misleading code? The default here does the same as unhandled and crit,
where it does nothing. And why do you want to see l2irq since it was
not in the case statement? l2 holds the number, l1 the level.

> Signed-off-by: Wolfram Sang 
> Cc: Sebastian Andrzej Siewior 
> Cc: Anatolij Gustschin 
> ---
> 
> Have I been on CC when the original patch was sent?

You were but your email bounced. I wasn't aware of this new email
address you are using now.

> 
>  arch/powerpc/platforms/52xx/mpc52xx_pic.c |3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/52xx/mpc52xx_pic.c 
> b/arch/powerpc/platforms/52xx/mpc52xx_pic.c
> index b69221b..b89ef65 100644
> --- a/arch/powerpc/platforms/52xx/mpc52xx_pic.c
> +++ b/arch/powerpc/platforms/52xx/mpc52xx_pic.c
> @@ -373,9 +373,8 @@ static int mpc52xx_irqhost_map(struct irq_domain *h, 
> unsigned int virq,
>   case MPC52xx_IRQ_L1_PERP: irqchip = &mpc52xx_periph_irqchip; break;
>   case MPC52xx_IRQ_L1_SDMA: irqchip = &mpc52xx_sdma_irqchip; break;
>   case MPC52xx_IRQ_L1_CRIT:
> - default:
>   pr_warn("%s: Critical IRQ #%d is unsupported! Nopping it.\n",
> - __func__, l1irq);
> + __func__, l2irq);
>   irq_set_chip(virq, &no_irq_chip);
>   return 0;
>   }
> 

Sebastian


powerpc: Mark low level irq handlers NO_THREAD

2013-08-12 Thread Sebastian Andrzej Siewior
From: Thomas Gleixner 

These low-level handlers cannot be threaded. Mark them NO_THREAD.

Reported-by: leroy christophe 
Tested-by: leroy christophe 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---

This patch was already posted on Feb 13, 2013, and nobody responded back then.

 arch/powerpc/platforms/8xx/m8xx_setup.c |1 +
 arch/powerpc/sysdev/cpm1.c  |1 +
 2 files changed, 2 insertions(+)

--- a/arch/powerpc/platforms/8xx/m8xx_setup.c
+++ b/arch/powerpc/platforms/8xx/m8xx_setup.c
@@ -43,6 +43,7 @@ static irqreturn_t timebase_interrupt(in
 
 static struct irqaction tbint_irqaction = {
.handler = timebase_interrupt,
+   .flags = IRQF_NO_THREAD,
.name = "tbint",
 };
 
--- a/arch/powerpc/sysdev/cpm1.c
+++ b/arch/powerpc/sysdev/cpm1.c
@@ -120,6 +120,7 @@ static irqreturn_t cpm_error_interrupt(i
 
 static struct irqaction cpm_error_irqaction = {
.handler = cpm_error_interrupt,
+   .flags = IRQF_NO_THREAD,
.name = "error",
 };
 
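For handlers registered through request_irq() rather than a static struct irqaction, the same effect comes from passing IRQF_NO_THREAD in the flags; a minimal sketch with made-up names:

#include <linux/interrupt.h>

/* Sketch: this handler must stay in hard interrupt context and must not be
 * picked up by forced irq threading (threadirqs / PREEMPT_RT). */
static irqreturn_t demo_tb_interrupt(int irq, void *dev_id)
{
	/* acknowledge the timebase source here */
	return IRQ_HANDLED;
}

static int demo_setup_tb_irq(unsigned int virq)
{
	return request_irq(virq, demo_tb_interrupt, IRQF_NO_THREAD,
			   "tbint", NULL);
}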


[PATCH] powerpc: 52xx: provide a default in mpc52xx_irqhost_map()

2013-08-12 Thread Sebastian Andrzej Siewior
My gcc-4.3.5 fails to compile due to:

|cc1: warnings being treated as errors
|arch/powerpc/platforms/52xx/mpc52xx_pic.c: In function ‘mpc52xx_irqhost_map’:
|arch/powerpc/platforms/52xx/mpc52xx_pic.c:343: error: ‘irqchip’ may be used 
uninitialized in this function

since commit e34298c ("powerpc: 52xx: nop out unsupported critical
IRQs"). The warning is a false positive since only the values 0…3 are
possible and all of them are checked, but gcc fails to see that. I
wouldn't care much, but since this is compiled with -Werror I made this
patch.
While at it, I replaced the warning's argument from l2irq to l1irq since
that is the number that is evaluated.

Signed-off-by: Sebastian Andrzej Siewior 
---
 arch/powerpc/platforms/52xx/mpc52xx_pic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/52xx/mpc52xx_pic.c 
b/arch/powerpc/platforms/52xx/mpc52xx_pic.c
index b89ef65..b69221b 100644
--- a/arch/powerpc/platforms/52xx/mpc52xx_pic.c
+++ b/arch/powerpc/platforms/52xx/mpc52xx_pic.c
@@ -373,8 +373,9 @@ static int mpc52xx_irqhost_map(struct irq_domain *h, 
unsigned int virq,
case MPC52xx_IRQ_L1_PERP: irqchip = &mpc52xx_periph_irqchip; break;
case MPC52xx_IRQ_L1_SDMA: irqchip = &mpc52xx_sdma_irqchip; break;
case MPC52xx_IRQ_L1_CRIT:
+   default:
pr_warn("%s: Critical IRQ #%d is unsupported! Nopping it.\n",
-   __func__, l2irq);
+   __func__, l1irq);
irq_set_chip(virq, &no_irq_chip);
return 0;
}
-- 
1.8.4.rc1

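The pattern behind the warning is easy to reproduce outside the kernel; in a user-space sketch like the one below, "sel & 3" can only yield 0…3 and every case is handled, yet older gcc versions may still claim that "name" can be used uninitialized (whether it actually warns depends on compiler version and optimization level):

#include <stdio.h>

static const char *pick(unsigned int sel)
{
	const char *name;

	switch (sel & 3) {
	case 0: name = "main";   break;
	case 1: name = "periph"; break;
	case 2: name = "sdma";   break;
	case 3: name = "crit";   break;
	}
	return name;	/* old gcc: "may be used uninitialized" */
}

int main(void)
{
	printf("%s\n", pick(2));
	return 0;
}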

Re: [PATCH] of: Specify initrd location using 64-bit

2013-07-01 Thread Sebastian Andrzej Siewior
On 07/01/2013 09:59 AM, Geert Uytterhoeven wrote:
> That's actual the original reason: DT has it as 64 bit, and passes it to a
> 32 bit kernel when running in 32 bit mode without PAE.

And I think the DT code should check if the u64 fits in phys_addr_t and,
if it does not, write an error message and act like no initrd was
specified (instead of passing "something" to the architecture).

> 
> Gr{oetje,eeting}s,
> 
> Geert
> 

Sebastian
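
A sketch of the kind of check meant here, not actual mainline code; the helper name is made up:

#include <linux/init.h>
#include <linux/printk.h>
#include <linux/types.h>

/* Sketch: refuse an initrd range that phys_addr_t cannot represent instead
 * of silently handing a truncated address to the architecture code. */
static bool __init initrd_range_fits(u64 start, u64 end)
{
	if ((u64)(phys_addr_t)start != start || (u64)(phys_addr_t)end != end) {
		pr_err("initrd [%llx..%llx] not addressable, ignoring\n",
		       (unsigned long long)start, (unsigned long long)end);
		return false;
	}
	return true;
}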


Re: [PATCH] of: Specify initrd location using 64-bit

2013-07-01 Thread Sebastian Andrzej Siewior
On 06/29/2013 01:43 AM, Santosh Shilimkar wrote:
> 
> Sebastian,
> 
> Apart from waste of 32bit, what is the other concern you
> have ?

You pass a u64 as a physical address which is represented in other
parts of the kernel (for a good reason) by phys_addr_t.

> I really want to converge on this patch because it
> has been a open ended discussion for quite some time. Does
> that really break any thing on x86 or your concern is more
> from semantics of the physical address.
You want to have your code in so you can continue with your work, that
is okay. The other two arguments why u64 here is a good thing were "due
to what I said earlier" and "+1", and I don't have the time to look
those up.

There should be no problems on x86 if this goes in as it is now.

But think about this: What happens if you boot your ARM device without
PAE and your initrd is in the upper region? If you are lucky, the kernel
looks at a different place where it also has read permission, notices
nothing sane is there, writes a message and continues. And if it is not
allowed to read? It is clearly the user's fault for booting a non-PAE
kernel.

> 
> Thanks for help.
> 
> Regards,
> Santosh

Sebastian


  1   2   >