RE: [PATCH v2] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks

2018-06-21 Thread Hoeun Ryu
+CC
sergey.senozhatsky.w...@gmail.com
pmla...@suse.com

Please review this patch.

> -Original Message-
> From: Hoeun Ryu [mailto:hoeun@lge.com.com]
> Sent: Tuesday, June 05, 2018 11:19 AM
> To: Andrew Morton ; Kees Cook ; Andi Kleen ; Borislav Petkov ;
> Thomas Gleixner ; Steven Rostedt (VMware)
> 
> Cc: Josh Poimboeuf ; Tejun Heo ;
> Vitaly Kuznetsov ; Hoeun Ryu ;
> linux-kernel@vger.kernel.org
> Subject: [PATCH v2] panic: move bust_spinlocks(0) after
> console_flush_on_panic() to avoid deadlocks
> 
> From: Hoeun Ryu 
> 
>  Many console device drivers hold the uart_port->lock spinlock with irq
> disabled (using spin_lock_irqsave()) while they are writing characters to
> their devices, but the drivers only try to acquire the lock (using
> spin_trylock_irqsave()) when "oops_in_progress" is equal to or greater than
> 1, to avoid deadlocks.
> 
>  There is a case where a deadlock occurs involving this lock and
> oops_in_progress. If the kernel lockup detector calls panic() while the
> device driver is holding the lock, a deadlock can result because panic()
> eventually calls console_unlock() and tries to take the lock again. Here is
> an example.
> 
>  CPU0
> 
>  local_irq_save()
>  .
>  foo()
>  bar()
>  .                                        // foo() + bar() takes long time
>  printk()
>    console_unlock()
>      call_console_drivers()               // close to watchdog threshold
>        some_slow_console_device_write()   // device driver code
>          spin_lock_irqsave(uart->lock)    // acquire uart spin lock
>            slow-write()
>  watchdog_overflow_callback()             // watchdog expired and call panic()
>    panic()
>      bust_spinlocks(0)                    // now, oops_in_progress = 0
>      console_flush_on_panic()
>        console_unlock()
>          call_console_drivers()
>            some_slow_console_device_write()
>              spin_lock_irqsave(uart->lock)
>              deadlock                     // we can use spin_trylock_irqsave()
> 
>  console_flush_on_panic() is called in panic() and it eventually takes the
> uart lock, but the lock is already held by the preempted context (the same
> CPU, now in NMI context), so this is a deadlock.
>  By moving bust_spinlocks(0) after console_flush_on_panic(), the console
> device drivers still think an Oops is in progress and call
> spin_trylock_irqsave() instead of spin_lock_irqsave(), avoiding the deadlock.
> 
>  CPU0
> 
>  watchdog_overflow_callback()             // watchdog expired and call panic()
>    panic()
>      console_flush_on_panic()
>        console_unlock()
>          call_console_drivers()
>            some_slow_console_device_write()
>              spin_trylock_irqsave(uart->lock)   // oops_in_progress = 1
>                                                 // use trylock, no deadlock
>      bust_spinlocks(0)                    // now, oops_in_progress = 0
> 
> Signed-off-by: Hoeun Ryu 
> ---
>  v2: fix the commit message about the reason for the deadlock; no code change.
> 
>  kernel/panic.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 42e4874..b4063b6 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
>   if (_crash_kexec_post_notifiers)
>   __crash_kexec(NULL);
> 
> - bust_spinlocks(0);
> -
>   /*
>* We may have ended up stopping the CPU holding the lock (in
>* smp_send_stop()) while still having some valuable data in the
> console
> @@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
>   debug_locks_off();
>   console_flush_on_panic();
> 
> + bust_spinlocks(0);
> +
>   if (!panic_blink)
>   panic_blink = no_blink;
> 
> --
> 2.1.4



RE: [PATCH] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks

2018-06-04 Thread Hoeun Ryu
I misunderstood the cause of the deadlock.
I have sent v2 with the commit message fixed to describe the actual reason for the deadlock.
Please ignore this patch and review v2 instead.
Thank you.

> -Original Message-
> From: Steven Rostedt [mailto:rost...@goodmis.org]
> Sent: Tuesday, June 05, 2018 10:44 AM
> To: Hoeun Ryu 
> Cc: Andrew Morton ; Kees Cook
> ; Borislav Petkov ; Andi Kleen
> ; Josh Poimboeuf ; Hoeun Ryu
> ; linux-kernel@vger.kernel.org; Tejun Heo
> ; Vitaly Kuznetsov 
> Subject: Re: [PATCH] panic: move bust_spinlocks(0) after
> console_flush_on_panic() to avoid deadlocks
> 
> On Mon,  4 Jun 2018 14:45:57 +0900
> Hoeun Ryu  wrote:
> 
> > From: Hoeun Ryu 
> >
> >  Many console device drivers hold the uart_port->lock spinlock with irq
> > enabled (using spin_lock()) while they are writing characters to their
> > devices, but the drivers only try to acquire the lock (using spin_trylock())
> > when "oops_in_progress" is equal to or greater than 1, to avoid deadlocks.
> >
> >  There is a case where a deadlock occurs involving this lock and
> > oops_in_progress. A CPU could be stopped by smp_send_stop() while it was
> > holding the port lock, because irq was enabled. Once a CPU stops, it no
> > longer responds to interrupts and the lock stays locked forever.
> >
> >  console_flush_on_panic() is called during panic() and it eventually takes
> > the uart lock, but the lock is held by another, stopped CPU, so this is a
> > deadlock. By moving bust_spinlocks(0) after console_flush_on_panic(), the
> > console device drivers still think an Oops is in progress and call
> > spin_trylock() instead of spin_lock(), avoiding the deadlock.
> >
> > Signed-off-by: Hoeun Ryu 
> > ---
> >  kernel/panic.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 42e4874..b4063b6 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
> > if (_crash_kexec_post_notifiers)
> > __crash_kexec(NULL);
> >
> > -   bust_spinlocks(0);
> > -
> > /*
> >  * We may have ended up stopping the CPU holding the lock (in
> >  * smp_send_stop()) while still having some valuable data in the
> console
> > @@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
> > debug_locks_off();
> > console_flush_on_panic();
> >
> > +   bust_spinlocks(0);
> 
> Added a few more to Cc. This looks like it could have subtle
> side-effects. I'd like those that have been touching the code around
> here to have a look.
> 
> -- Steve
> 
> 
> > +
> > if (!panic_blink)
> > panic_blink = no_blink;
> >



[PATCH v2] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks

2018-06-04 Thread Hoeun Ryu
From: Hoeun Ryu 

 Many console device drivers hold the uart_port->lock spinlock with irq disabled
(using spin_lock_irqsave()) while they are writing characters to their devices,
but the drivers only try to acquire the lock (using spin_trylock_irqsave())
when "oops_in_progress" is equal to or greater than 1, to avoid deadlocks.
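
For reference, the locking pattern described above looks roughly like this in a
typical uart console ->write() callback (a simplified sketch modeled on existing
serial drivers; some_console_port() and some_console_putchar() are illustrative
placeholders, not functions from a real driver):

static void some_console_write(struct console *co, const char *s,
                               unsigned int count)
{
        struct uart_port *port = some_console_port(co);  /* placeholder */
        unsigned long flags;
        int locked = 1;

        if (oops_in_progress)
                /* Oops/panic path: never spin on a lock we may not get. */
                locked = spin_trylock_irqsave(&port->lock, flags);
        else
                spin_lock_irqsave(&port->lock, flags);

        uart_console_write(port, s, count, some_console_putchar);

        if (locked)
                spin_unlock_irqrestore(&port->lock, flags);
}

The drivers therefore avoid blocking only while oops_in_progress is non-zero,
which is exactly the window this patch extends across console_flush_on_panic().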

 There is a case where a deadlock occurs involving this lock and oops_in_progress.
If the kernel lockup detector calls panic() while the device driver is holding
the lock, a deadlock can result because panic() eventually calls console_unlock()
and tries to take the lock again. Here is an example.

 CPU0

 local_irq_save()
 .
 foo()
 bar()
 .                                        // foo() + bar() takes long time
 printk()
   console_unlock()
     call_console_drivers()               // close to watchdog threshold
       some_slow_console_device_write()   // device driver code
         spin_lock_irqsave(uart->lock)    // acquire uart spin lock
           slow-write()
 watchdog_overflow_callback()             // watchdog expired and call panic()
   panic()
     bust_spinlocks(0)                    // now, oops_in_progress = 0
     console_flush_on_panic()
       console_unlock()
         call_console_drivers()
           some_slow_console_device_write()
             spin_lock_irqsave(uart->lock)
             deadlock                     // we can use spin_trylock_irqsave()

 console_flush_on_panic() is called in panic() and it eventually takes the uart
lock, but the lock is already held by the preempted context (the same CPU, now
in NMI context), so this is a deadlock.
 By moving bust_spinlocks(0) after console_flush_on_panic(), the console device
drivers still think an Oops is in progress and call spin_trylock_irqsave()
instead of spin_lock_irqsave(), avoiding the deadlock.

 CPU0

 watchdog_overflow_callback()             // watchdog expired and call panic()
   panic()
     console_flush_on_panic()
       console_unlock()
         call_console_drivers()
           some_slow_console_device_write()
             spin_trylock_irqsave(uart->lock)   // oops_in_progress = 1
                                                // use trylock, no deadlock
     bust_spinlocks(0)                    // now, oops_in_progress = 0
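
For completeness, console_flush_on_panic() is essentially a forced
console_unlock(); roughly (simplified from kernel/printk/printk.c of that time):

void console_flush_on_panic(void)
{
        /*
         * If someone else holds the console lock, trylock will fail;
         * ignore that and proceed straight to console_unlock() so the
         * pending messages are still pushed out to the console drivers.
         */
        console_trylock();
        console_may_schedule = 0;
        console_unlock();
}

This is why it ends up in call_console_drivers() and ultimately in the uart
driver's write path shown above.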

Signed-off-by: Hoeun Ryu 
---
 v2: fix the commit message about the reason for the deadlock; no code change.

 kernel/panic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 42e4874..b4063b6 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
if (_crash_kexec_post_notifiers)
__crash_kexec(NULL);
 
-   bust_spinlocks(0);
-
/*
 * We may have ended up stopping the CPU holding the lock (in
 * smp_send_stop()) while still having some valuable data in the console
@@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
debug_locks_off();
console_flush_on_panic();
 
+   bust_spinlocks(0);
+
if (!panic_blink)
panic_blink = no_blink;
 
-- 
2.1.4



[PATCH] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks

2018-06-03 Thread Hoeun Ryu
From: Hoeun Ryu 

 Many console device drivers hold the uart_port->lock spinlock with irq enabled
(using spin_lock()) while they are writing characters to their devices, but the
drivers only try to acquire the lock (using spin_trylock()) when
"oops_in_progress" is equal to or greater than 1, to avoid deadlocks.

 There is a case where a deadlock occurs involving this lock and oops_in_progress.
A CPU could be stopped by smp_send_stop() while it was holding the port lock,
because irq was enabled. Once a CPU stops, it no longer responds to interrupts
and the lock stays locked forever.

 console_flush_on_panic() is called during panic() and it eventually takes the
uart lock, but the lock is held by another, stopped CPU, so this is a deadlock.
By moving bust_spinlocks(0) after console_flush_on_panic(), the console device
drivers still think an Oops is in progress and call spin_trylock() instead of
spin_lock(), avoiding the deadlock.
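
For context, bust_spinlocks() is what raises and drops oops_in_progress; a
simplified sketch of its behaviour (abridged from lib/bust_spinlocks.c):

void bust_spinlocks(int yes)
{
        if (yes) {
                /* Entering Oops/panic: console drivers switch to trylock. */
                ++oops_in_progress;
        } else {
                /* Leaving Oops: unblank and return to normal locking. */
                unblank_screen();
                console_unblank();
                if (--oops_in_progress == 0)
                        wake_up_klogd();
        }
}

So calling bust_spinlocks(0) before console_flush_on_panic() re-enables the
blocking spin_lock() path in the console drivers, which is what this reordering
avoids.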

Signed-off-by: Hoeun Ryu 
---
 kernel/panic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 42e4874..b4063b6 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
if (_crash_kexec_post_notifiers)
__crash_kexec(NULL);
 
-   bust_spinlocks(0);
-
/*
 * We may have ended up stopping the CPU holding the lock (in
 * smp_send_stop()) while still having some valuable data in the console
@@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
debug_locks_off();
console_flush_on_panic();
 
+   bust_spinlocks(0);
+
if (!panic_blink)
panic_blink = no_blink;
 
-- 
2.1.4



[PATCH v2] printk: make printk_safe_flush safe in NMI context by skipping flushing

2018-06-03 Thread Hoeun Ryu
From: Hoeun Ryu 

 Make printk_safe_flush() safe in NMI context.
nmi_trigger_cpumask_backtrace() can be called in NMI context. For example, the
function is called from watchdog_overflow_callback() if the hardlockup backtrace
flag (sysctl_hardlockup_all_cpu_backtrace) is true, and
watchdog_overflow_callback() is called in NMI context on some architectures.
 Calling printk_safe_flush() from nmi_trigger_cpumask_backtrace() eventually
tries to lock logbuf_lock in vprintk_emit(), but that lock might already be part
of another non-nmi context on the same CPU or of a soft- or hard-lockup on
another CPU. An example of the deadlock:

 CPU0
 local_irq_save();
 for (;;)
   req = blk_peek_request(q);
   if (unlikely(!scsi_device_online(sdev)))
     printk()
       vprintk_emit()
         console_unlock()
           logbuf_lock_irqsave()
             slow-serial-console-write()  // close to watchdog threshold
 watchdog_overflow_callback()
   trigger_allbutself_cpu_backtrace()
     printk_safe_flush()
       vprintk_emit()
         logbuf_lock_irqsave()
         deadlock

and some other cases.
 This patch prevents a deadlock in printk_safe_flush() in NMI context. It makes
sure that we continue and eventually call printk_safe_flush_on_panic() from
panic(), which has better chances to succeed.
 There is a risk that logbuf_lock was not part of a soft- or dead-lockup and we
might just lose the messages. But then there is a high chance that irq_work
will get called and the messages will get flushed the normal way.
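
The check added below relies on the per-CPU printk_context flags that
printk_safe already maintains around NMI handlers; roughly (simplified from
kernel/printk/printk_safe.c of that time):

static DEFINE_PER_CPU(int, printk_context);

/* Called by the NMI entry/exit code on each CPU. */
void printk_nmi_enter(void)
{
        this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
}

void printk_nmi_exit(void)
{
        this_cpu_and(printk_context, ~PRINTK_NMI_CONTEXT_MASK);
}

So this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK is non-zero exactly
while the current CPU is handling an NMI, which is the condition used below to
skip the flush.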

Signed-off-by: Hoeun Ryu 
Suggested-by: Petr Mladek 
Suggested-by: Sergey Senozhatsky 
---
 v2: fix comments in the commit message and code; no change to the code itself.

 kernel/printk/printk_safe.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c200..3b5c660 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -254,6 +254,17 @@ void printk_safe_flush(void)
 {
int cpu;
 
+   /*
+    * Just avoid a deadlock here.
+    * It makes sure that we continue and eventually call
+    * printk_safe_flush_on_panic() from panic() that has better chances
+    * to succeed.
+    * There is a risk that logbuf_lock was not part of a soft- or
+    * dead-lockup and we might just lose the messages. But then there is
+    * a high chance that irq_work will get called and the messages will
+    * get flushed the normal way.
+    */
+   if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
+       return;
+
for_each_possible_cpu(cpu) {
 #ifdef CONFIG_PRINTK_NMI
__printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
-- 
2.1.4



RE: [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing

2018-05-31 Thread Hoeun Ryu


> -Original Message-
> From: Petr Mladek [mailto:pmla...@suse.com]
> Sent: Wednesday, May 30, 2018 5:32 PM
> To: Sergey Senozhatsky 
> Cc: Hoeun Ryu ; Sergey Senozhatsky
> ; Steven Rostedt ;
> Hoeun Ryu ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by
> skipping flushing
> 
> On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote:
> > On (05/29/18 11:51), Hoeun Ryu wrote:
> > >  Make printk_safe_flush() safe in NMI context.
> > > nmi_trigger_cpumask_backtrace() can be called in NMI context. For example
> > > the function is called in watchdog_overflow_callback() if the flag of
> > > hardlockup backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > > watchdog_overflow_callback() function is called in NMI context on some
> > > architectures.
> > >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
> > > tries to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be
> > > already locked in preempted contexts (task or irq in this case) or by
> > > other CPUs and it may cause
> 
> The sentence "logbuf_lock can be already locked in preempted contexts"
> does not
> make much sense. It is a spin lock. It means that both interrupts and
> preemption are disabled.
> 

I'd like to say that the preempting context is NMI,
so the preempted contexts could be task/irq/bh contexts on the same CPU.

> I would change it to something like:
> 
> "Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
> tries
> to lock logbuf_lock in vprintk_emit() that might be already be part
> of a soft- or hard-lockup on another CPU."
> 

It looks more clear.
But I'd modify "be part of a soft- or hard-lockup on another CPU." to
"be part of another non-nmi context on the same CPU or a soft- or
hard-lockup on another CPU."

How about it?

> 
> > > deadlocks.
> > >  By making printk_safe_flush() safe in NMI context, the backtrace
> > > triggering CPU just skips flushing if the lock is not avaiable in NMI
> > > context. The messages in per-cpu nmi buffer of the backtrace triggering
> > > CPU can be lost if the CPU is in hard lockup (because irq is disabled
> > > here) but if panic() is not called. The flushing can be delayed by the
> > > next irq work in normal cases.
> 
> I somehow miss there a motivation why the current state is better than
> the previous. It looks like we exchange the risk of a deadlock with
> a risk of loosing the messages.
> 
> I see it the following way:
> 
> "This patch prevents a deadlock in printk_safe_flush() in NMI
> context. It makes sure that we continue and eventually call
> printk_safe_flush_on_panic() from panic() that has better
> chances to succeed.
> 
> There is a risk that logbuf_lock was not part of a soft- or
> dead-lockup and we might just loose the messages. But then there is a high
> chance that irq_work will get called and the messages will get flushed
> the normal way."
> 
> 
> > Any chance we can add more info to the commit message? E.g. backtraces
> > which would describe "how" is this possible (like the one I posted in
> > another email). Just to make it more clear.
> 
> I agree that a backtrace would be helpful. But it is not a must to
> have from my point of view.
> 
> The patch itself looks good to me.
> 
> Best Regards,
> Petr



RE: [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing

2018-05-31 Thread Hoeun Ryu
I appreciate the detailed correction.
I will reflect the corrections in the next patch.
Plus, the explanation in the code will be fixed.

> -Original Message-
> From: Petr Mladek [mailto:pmla...@suse.com]
> Sent: Wednesday, May 30, 2018 5:32 PM
> To: Sergey Senozhatsky 
> Cc: Hoeun Ryu ; Sergey Senozhatsky
> ; Steven Rostedt ;
> Hoeun Ryu ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by
> skipping flushing
> 
> On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote:
> > On (05/29/18 11:51), Hoeun Ryu wrote:
> > >  Make printk_safe_flush() safe in NMI context.
> > > nmi_trigger_cpumask_backtrace() can be called in NMI context. For example
> > > the function is called in watchdog_overflow_callback() if the flag of
> > > hardlockup backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > > watchdog_overflow_callback() function is called in NMI context on some
> > > architectures.
> > >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
> > > tries to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be
> > > already locked in preempted contexts (task or irq in this case) or by
> > > other CPUs and it may cause
> 
> The sentence "logbuf_lock can be already locked in preempted contexts"
> does not
> make much sense. It is a spin lock. It means that both interrupts and
> preemption are disabled.
> 
> I would change it to something like:
> 
> "Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
> tries
> to lock logbuf_lock in vprintk_emit() that might be already be part
> of a soft- or hard-lockup on another CPU."
> 
> 
> > > deadlocks.
> > >  By making printk_safe_flush() safe in NMI context, the backtrace
> > > triggering CPU just skips flushing if the lock is not avaiable in NMI
> > > context. The messages in per-cpu nmi buffer of the backtrace triggering
> > > CPU can be lost if the CPU is in hard lockup (because irq is disabled
> > > here) but if panic() is not called. The flushing can be delayed by the
> > > next irq work in normal cases.
> 
> I somehow miss there a motivation why the current state is better than
> the previous. It looks like we exchange the risk of a deadlock with
> a risk of loosing the messages.
> 
> I see it the following way:
> 
> "This patch prevents a deadlock in printk_safe_flush() in NMI
> context. It makes sure that we continue and eventually call
> printk_safe_flush_on_panic() from panic() that has better
> chances to succeed.
> 
> There is a risk that logbuf_lock was not part of a soft- or
> dead-lockup and we might just loose the messages. But then there is a high
> chance that irq_work will get called and the messages will get flushed
> the normal way."
> 
> 
> > Any chance we can add more info to the commit message? E.g. backtraces
> > which would describe "how" is this possible (like the one I posted in
> > another email). Just to make it more clear.
> 
> I agree that a backtrace would be helpful. But it is not a must to
> have from my point of view.
> 
> The patch itself looks good to me.
> 
> Best Regards,
> Petr



RE: [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing

2018-05-29 Thread Hoeun Ryu


> -Original Message-
> From: Sergey Senozhatsky [mailto:sergey.senozhatsky.w...@gmail.com]
> Sent: Tuesday, May 29, 2018 9:13 PM
> To: Hoeun Ryu 
> Cc: Petr Mladek ; Sergey Senozhatsky
> ; Steven Rostedt ;
> Hoeun Ryu ; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by
> skipping flushing
> 
> On (05/29/18 11:51), Hoeun Ryu wrote:
> >  Make printk_safe_flush() safe in NMI context.
> > nmi_trigger_cpumask_backtrace() can be called in NMI context. For example
> > the function is called in watchdog_overflow_callback() if the flag of
> > hardlockup backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > watchdog_overflow_callback() function is called in NMI context on some
> > architectures.
> >  Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
> > tries to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be
> > already locked in preempted contexts (task or irq in this case) or by other
> > CPUs and it may cause deadlocks.
> >  By making printk_safe_flush() safe in NMI context, the backtrace
> > triggering CPU just skips flushing if the lock is not avaiable in NMI
> > context. The messages in per-cpu nmi buffer of the backtrace triggering CPU
> > can be lost if the CPU is in hard lockup (because irq is disabled here) but
> > if panic() is not called. The flushing can be delayed by the next irq work
> > in normal cases.
> 
> Any chance we can add more info to the commit message? E.g. backtraces
> which would describe "how" is this possible (like the one I posted in
> another email). Just to make it more clear.
> 

OK, I will.

> > @@ -254,6 +254,16 @@ void printk_safe_flush(void)
> >  {
> > int cpu;
> >
> > +   /*
> > +    * Just avoid deadlocks here, we could loose the messages in per-cpu
> > +    * nmi buffer in the case that hardlockup happens but panic() is not
> > +    * called (irq_work won't work).
> > +    * The flushing can be delayed by the next irq_work if flushing is
> > +    * skippped here
> >
^^ skipped
> 

Typo will be fixed.

>   -ss



[PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing

2018-05-28 Thread Hoeun Ryu
From: Hoeun Ryu 

 Make printk_safe_flush() safe in NMI context.
nmi_trigger_cpumask_backtrace() can be called in NMI context. For example the
function is called in watchdog_overflow_callback() if the flag of hardlockup
backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
watchdog_overflow_callback() function is called in NMI context on some
architectures.
 Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually tries
to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already locked in
preempted contexts (task or irq in this case) or by other CPUs and it may cause
deadlocks.
 By making printk_safe_flush() safe in NMI context, the backtrace triggering CPU
just skips flushing if the lock is not avaiable in NMI context. The messages in
per-cpu nmi buffer of the backtrace triggering CPU can be lost if the CPU is in
hard lockup (because irq is disabled here) but if panic() is not called. The
flushing can be delayed by the next irq work in normal cases.

Suggested-by: Sergey Senozhatsky 
Signed-off-by: Hoeun Ryu 
---
 kernel/printk/printk_safe.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c200..62bcc9b 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -254,6 +254,16 @@ void printk_safe_flush(void)
 {
int cpu;
 
+   /*
+    * Just avoid deadlocks here, we could loose the messages in per-cpu
+    * nmi buffer in the case that hardlockup happens but panic() is not
+    * called (irq_work won't work).
+    * The flushing can be delayed by the next irq_work if flushing is
+    * skippped here in normal cases.
+    */
+   if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
+       return;
+
for_each_possible_cpu(cpu) {
 #ifdef CONFIG_PRINTK_NMI
__printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
-- 
2.1.4



[PATCH] printk: make printk_safe_flush safe in NMI context

2018-05-28 Thread Hoeun Ryu
From: Hoeun Ryu 

 Make printk_safe_flush() safe in NMI context. And printk_safe_flush_on_panic()
is folded into this function. The prototype of printk_safe_flush() is changed to
"void printk_safe_flush(bool panic)".

 nmi_trigger_cpumask_backtrace() can be called in NMI context. For example the
function is called in watchdog_overflow_callback() if the flag of hardlockup
backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
watchdog_overflow_callback() function is called in NMI context on some
architectures.
 Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually tries
to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already locked
in preempted contexts (task or irq in this case) or by other CPUs and it may
cause deadlocks.
 By making printk_safe_flush() safe in NMI context, the backtrace triggering CPU
just skips flushing if the lock is not available in NMI context. The messages in
the per-cpu nmi buffer of the backtrace triggering CPU can be lost if the CPU is
in hard lockup (because irq is disabled here) and panic() is not called. The
flushing can be delayed by the next irq work in normal cases.
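
The hunk below is truncated before the new body of the combined function. As an
illustration only (not the actual patch), a fold like this could look roughly as
follows, assuming it keeps the existing on-panic behaviour of re-initializing
logbuf_lock when the lock is already held:

void printk_safe_flush(bool panic)
{
        int cpu;

        /*
         * On panic the lock may be held by a stopped or crashed CPU;
         * re-initializing it is the only way to make progress.
         */
        if (panic && raw_spin_is_locked(&logbuf_lock)) {
                debug_locks_off();
                raw_spin_lock_init(&logbuf_lock);
        }

        for_each_possible_cpu(cpu) {
#ifdef CONFIG_PRINTK_NMI
                __printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
#endif
                __printk_safe_flush(&per_cpu(safe_print_seq, cpu).work);
        }
}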

Suggested-by: Sergey Senozhatsky 
Signed-off-by: Hoeun Ryu 
---
 arch/powerpc/kernel/traps.c|  2 +-
 arch/powerpc/kernel/watchdog.c |  2 +-
 include/linux/printk.h |  9 ++
 kernel/kexec_core.c|  2 +-
 kernel/panic.c |  4 +--
 kernel/printk/printk_safe.c| 62 +-
 lib/nmi_backtrace.c|  2 +-
 7 files changed, 39 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0904492..c50749c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -160,7 +160,7 @@ extern void panic_flush_kmsg_start(void)
 
 extern void panic_flush_kmsg_end(void)
 {
-   printk_safe_flush_on_panic();
+   printk_safe_flush(true);
kmsg_dump(KMSG_DUMP_PANIC);
bust_spinlocks(0);
debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 6256dc3..3c9138b 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -173,7 +173,7 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
wd_smp_unlock(&flags);
 
-   printk_safe_flush();
+   printk_safe_flush(false);
/*
 * printk_safe_flush() seems to require another print
 * before anything actually goes out to console.
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 6d7e800..495fe26 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -203,8 +203,7 @@ void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
 extern void printk_safe_init(void);
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
+extern void printk_safe_flush(bool panic);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -273,11 +272,7 @@ static inline void printk_safe_init(void)
 {
 }
 
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
+static inline void printk_safe_flush(bool panic)
 {
 }
 #endif
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 20fef1a..1b0876e 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -961,7 +961,7 @@ void crash_kexec(struct pt_regs *regs)
old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
if (old_cpu == PANIC_CPU_INVALID) {
/* This is the 1st CPU which comes here, so go ahead. */
-   printk_safe_flush_on_panic();
+   printk_safe_flush(true);
__crash_kexec(regs);
 
/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 42e4874..2f2c86c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -193,7 +193,7 @@ void panic(const char *fmt, ...)
 * Bypass the panic_cpu check and call __crash_kexec directly.
 */
if (!_crash_kexec_post_notifiers) {
-   printk_safe_flush_on_panic();
+   printk_safe_flush(true);
__crash_kexec(NULL);
 
/*
@@ -218,7 +218,7 @@ void panic(const char *fmt, ...)
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
/* Call flush even twice. It tries harder with a single online CPU */
-   printk_safe_flush_on_panic();
+   printk_safe_flush(true);
kmsg_dump(KMSG_DUMP_PANIC);
 
/*
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c200..35ea941 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -244,49 +244,49 @@ static void __printk_safe_flush(struct irq_work *work)
 }
 
 /**
- * printk_safe_flush - flush all per-cpu nmi buffers.
+ * printk_safe_flush - flush all per-cpu nmi buffers. it can be called ev

[PATCH] printk: make printk_safe_flush safe in NMI context

2018-05-28 Thread Hoeun Ryu
From: Hoeun Ryu 

 Make printk_safe_flush() safe in NMI context. And printk_safe_flush_on_panic() 
is
folded into this function. The prototype of printk_safe_flush() is changed to
"void printk_safe_flush(bool panic)".

 nmi_trigger_cpumask_backtrace() can be called in NMI context. For example the
function is called in watchdog_overflow_callback() if the flag of hardlockup
backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
watchdog_overflow_callback() function is called in NMI context on some
architectures.
 Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually 
tries to
lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already locked in
preempted contexts (task or irq in this case) or by other CPUs and it may cause
deadlocks.
 By making printk_safe_flush() safe in NMI context, the backtrace triggering CPU
just skips flushing if the lock is not avaiable in NMI context. The messages in
per-cpu nmi buffer of the backtrace triggering CPU can be lost if the CPU is in
hard lockup (because irq is disabled here) but if panic() is not called. The
flushing can be delayed by the next irq work in normal cases.

Suggested-by: Sergey Senozhatsky 
Signed-off-by: Hoeun Ryu 
---
 arch/powerpc/kernel/traps.c|  2 +-
 arch/powerpc/kernel/watchdog.c |  2 +-
 include/linux/printk.h |  9 ++
 kernel/kexec_core.c|  2 +-
 kernel/panic.c |  4 +--
 kernel/printk/printk_safe.c| 62 +-
 lib/nmi_backtrace.c|  2 +-
 7 files changed, 39 insertions(+), 44 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0904492..c50749c 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -160,7 +160,7 @@ extern void panic_flush_kmsg_start(void)
 
 extern void panic_flush_kmsg_end(void)
 {
-   printk_safe_flush_on_panic();
+   printk_safe_flush(true);
kmsg_dump(KMSG_DUMP_PANIC);
bust_spinlocks(0);
debug_locks_off();
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 6256dc3..3c9138b 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -173,7 +173,7 @@ static void watchdog_smp_panic(int cpu, u64 tb)
 
wd_smp_unlock();
 
-   printk_safe_flush();
+   printk_safe_flush(false);
/*
 * printk_safe_flush() seems to require another print
 * before anything actually goes out to console.
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 6d7e800..495fe26 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -203,8 +203,7 @@ void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
 extern asmlinkage void dump_stack(void) __cold;
 extern void printk_safe_init(void);
-extern void printk_safe_flush(void);
-extern void printk_safe_flush_on_panic(void);
+extern void printk_safe_flush(bool panic);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -273,11 +272,7 @@ static inline void printk_safe_init(void)
 {
 }
 
-static inline void printk_safe_flush(void)
-{
-}
-
-static inline void printk_safe_flush_on_panic(void)
+static inline void printk_safe_flush(bool panic)
 {
 }
 #endif
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 20fef1a..1b0876e 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -961,7 +961,7 @@ void crash_kexec(struct pt_regs *regs)
old_cpu = atomic_cmpxchg(_cpu, PANIC_CPU_INVALID, this_cpu);
if (old_cpu == PANIC_CPU_INVALID) {
/* This is the 1st CPU which comes here, so go ahead. */
-   printk_safe_flush_on_panic();
+   printk_safe_flush(true);
__crash_kexec(regs);
 
/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 42e4874..2f2c86c 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -193,7 +193,7 @@ void panic(const char *fmt, ...)
 * Bypass the panic_cpu check and call __crash_kexec directly.
 */
if (!_crash_kexec_post_notifiers) {
-   printk_safe_flush_on_panic();
+   printk_safe_flush(true);
__crash_kexec(NULL);
 
/*
@@ -218,7 +218,7 @@ void panic(const char *fmt, ...)
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
/* Call flush even twice. It tries harder with a single online CPU */
-   printk_safe_flush_on_panic();
+   printk_safe_flush(true);
kmsg_dump(KMSG_DUMP_PANIC);
 
/*
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c200..35ea941 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -244,49 +244,49 @@ static void __printk_safe_flush(struct irq_work *work)
 }
 
 /**
- * printk_safe_flush - flush all per-cpu nmi buffers.
+ * printk_safe_flush - flush all per-cpu nmi buffers. it can be called ev

[PATCH] nmi_backtrace: use printk_safe_flush_on_panic when triggering backtraces

2018-05-25 Thread Hoeun Ryu
From: Hoeun Ryu <hoeun@lge.com>

 Use printk_safe_flush_on_panic() in nmi_trigger_cpumask_backtrace().
nmi_trigger_cpumask_backtrace() can be called in NMI context. For example, it
is called from watchdog_overflow_callback() when the hardlockup backtrace flag
(sysctl_hardlockup_all_cpu_backtrace) is true, and watchdog_overflow_callback()
runs in NMI context on some architectures.
 printk_safe_flush() eventually tries to take logbuf_lock in vprintk_emit(),
but logbuf_lock may already be held by a preempted context (task or irq in
this case) or by another CPU, which can deadlock.
 By using printk_safe_flush_on_panic() instead of printk_safe_flush() in
nmi_trigger_cpumask_backtrace(), the backtrace-triggering CPU can take the
lock safely because the lock is reinitialized before printk_safe_flush() is
called.
 The messages in logbuf can be corrupted if the NMI context preempts another
context writing the buffer or ignores the lock held by another CPU.

Signed-off-by: Hoeun Ryu <hoeun@lge.com>
---
 lib/nmi_backtrace.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 61a6b5a..0ace3c9 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -78,8 +78,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
/*
 * Force flush any remote buffers that might be stuck in IRQ context
 * and therefore could not run their irq_work.
+* Call nmi-safe version of printk_safe_flush() because this function 
can be
+* called in NMI context like watchdog_overflow_callback() if
+* sysctl_hardlockup_all_cpu_backtrace is true.
 */
-   printk_safe_flush();
+   printk_safe_flush_on_panic();
 
clear_bit_unlock(0, _flag);
put_cpu();
-- 
2.1.4



[PATCHv2] armpmu: broadcast overflow irq on multi-core system having one muxed SPI for PMU.

2018-05-13 Thread Hoeun Ryu
From: Hoeun Ryu <hoeun@lge.com>

 Some SoCs, such as i.MX6DL/QL, have only one muxed SPI for the PMUs on a
multi-core system. On such systems a CPU can be interrupted by the overflow
irq while the overflow actually occurred on another CPU.
 This patch broadcasts the irq using smp_call_function_single_async() so that
the other CPUs can check and handle their own overflows when the overflow did
not actually occur on the interrupted CPU. Per-cpu call_single_data are
allocated in the arm_pmu structure for this purpose during initialization.

 The callback for smp_call_function_single_async() is __armpmu_handle_irq(),
and it calls armpmu->handle_irq() with an invalid irq_num because
smp_call_func_t takes only one parameter and the armpmu pointer is passed
through it. This could be a problem if the irq_num parameter were used by a
handler, but no handler uses the irq parameter for now. Another approach
would be to remove the irq_num argument from handle_irq() entirely.
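
 If a handler ever did need a valid irq number in the broadcast path, the
wrapper mentioned in the in-code comment of the diff below could look roughly
like the following. This is a hypothetical sketch, not part of the posted
patch; the struct and callback names are made up.

	/* Hypothetical wrapper so the csd callback can recover both values. */
	struct armpmu_brdcast_arg {
		int irq;			/* the muxed SPI number */
		struct arm_pmu *armpmu;
	};

	static void __armpmu_handle_irq_wrapped(void *info)
	{
		struct armpmu_brdcast_arg *arg = info;

		arg->armpmu->handle_irq(arg->irq, arg->armpmu);
	}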

Signed-off-by: Hoeun Ryu <hoeun@lge.com>
---
 drivers/perf/arm_pmu.c   | 62 ++--
 include/linux/perf/arm_pmu.h |  3 +++
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 1a0d340..df024a0 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -322,6 +322,29 @@ validate_group(struct perf_event *event)
return 0;
 }
 
+static void __armpmu_handle_irq(void *dev)
+{
+   struct arm_pmu *armpmu;
+   u64 start_clock, finish_clock;
+   irqreturn_t ret;
+
+   armpmu = *(void **)dev;
+   start_clock = sched_clock();
+   /*
+* irq_num should not be used by the handler, we don't have irq_num for
+* the first place. There is no handler using the irq_num argument for 
now.
+* smp_call_func_t has one function argument and irq number cannot be 
handed
+* over to this callback because we need dev pointer here.
+* If you need valid irq_num, you need to declare a wrapper struct 
having
+* irq_num and dev pointer.
+*/
+   ret = armpmu->handle_irq(-1, armpmu);
+   if (ret == IRQ_HANDLED) {
+   finish_clock = sched_clock();
+   perf_sample_event_took(finish_clock - start_clock);
+   }
+}
+
 static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
 {
struct arm_pmu *armpmu;
@@ -340,9 +363,34 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
 
start_clock = sched_clock();
ret = armpmu->handle_irq(irq, armpmu);
-   finish_clock = sched_clock();
+   /*
+* The handler just returns with IRQ_NONE when it checks the overflow
+* and the overflow doesn't occur on the CPU.
+*
+* Some SoCs like i.MX6 have one muxed SPI on multi-core system.
+* On the systems , the irq should be broadcasted to other CPUs so that 
the
+* CPUs can check their own PMU overflow.
+*/
+   if (ret == IRQ_HANDLED) {
+   finish_clock = sched_clock();
+   perf_sample_event_took(finish_clock - start_clock);
+   } else if (ret == IRQ_NONE) {
+   int cpu;
+   struct cpumask mask;
+
+   cpumask_copy(&mask, cpu_online_mask);
+   cpumask_clear_cpu(raw_smp_processor_id(), &mask);
+   for_each_cpu(cpu, &mask) {
+   call_single_data_t *csd =
+   per_cpu_ptr(armpmu->ov_brdcast_csd, cpu);
+
+   csd->func = __armpmu_handle_irq;
+   csd->info = dev;
+
+   smp_call_function_single_async(cpu, csd);
+   }
+   }
 
-   perf_sample_event_took(finish_clock - start_clock);
return ret;
 }
 
@@ -790,6 +838,13 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)
goto out_free_pmu;
}
 
+   pmu->ov_brdcast_csd = alloc_percpu_gfp(call_single_data_t, flags);
+   if (!pmu->ov_brdcast_csd) {
+   pr_info("failed to allocate per-cpu "
+   "overflow broadcasting call single data.\n");
+   goto out_free_hw_events;
+   }
+
pmu->pmu = (struct pmu) {
.pmu_enable = armpmu_enable,
.pmu_disable= armpmu_disable,
@@ -824,6 +879,8 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)
 
return pmu;
 
+out_free_hw_events:
+   free_percpu(pmu->hw_events);
 out_free_pmu:
kfree(pmu);
 out:
@@ -844,6 +901,7 @@ struct arm_pmu *armpmu_alloc_atomic(void)
 void armpmu_free(struct arm_pmu *pmu)
 {
free_percpu(pmu->hw_events);
+   free_percpu(pmu->ov_brdcast_csd);
kfree(pmu);
 }
 
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 40036a5..a63da63 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -107,6 +10
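
 The include/linux/perf/arm_pmu.h hunk above is cut off by the archive. Going
by the diffstat (3 added lines) and the arm_pmu.c code, which allocates the
member with alloc_percpu_gfp(call_single_data_t, flags) and dereferences it
with per_cpu_ptr(armpmu->ov_brdcast_csd, cpu), the added field is presumably
something close to the following reconstruction (not the original hunk):

	/* per-cpu csd used to broadcast a muxed PMU overflow irq */
	call_single_data_t __percpu *ov_brdcast_csd;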

RE: [PATCH] armpmu: broadcast overflow irq on multi-core system having one muxed SPI for PMU.

2018-05-13 Thread Hoeun Ryu
Thank you for the review.
I understand your NACK.

But I'd like to just fix the part of smp_call_function() in the next version.
You can simply ignore it.

> -Original Message-
> From: Mark Rutland [mailto:mark.rutl...@arm.com]
> Sent: Friday, May 11, 2018 7:39 PM
> To: Hoeun Ryu <hoeun@lge.com>
> Cc: 'Hoeun Ryu' <hoeun@lge.com.com>; 'Will Deacon'
> <will.dea...@arm.com>; linux-arm-ker...@lists.infradead.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH] armpmu: broadcast overflow irq on multi-core system
> having one muxed SPI for PMU.
> 
> On Fri, May 11, 2018 at 08:20:49AM +0900, Hoeun Ryu wrote:
> > Thank you for the reply.
> >
> > > -Original Message-
> > > From: Mark Rutland [mailto:mark.rutl...@arm.com]
> > > Sent: Thursday, May 10, 2018 7:21 PM
> > > To: Hoeun Ryu <hoeun@lge.com.com>
> > > Cc: Will Deacon <will.dea...@arm.com>; Hoeun Ryu <hoeun@lge.com>;
> > > linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org
> > > Subject: Re: [PATCH] armpmu: broadcast overflow irq on multi-core
> system
> > > having one muxed SPI for PMU.
> 
> > > Muxing the PMU IRQs is a really broken system design, and there's no
> good
> > > way of supporting it.
> 
> > > What we should do for such systems is:
> > >
> > > * Add a flag to the DT to describe that the IRQs are muxed, as this
> > >   cannot be probed.
> > >
> > > * Add hrtimer code to periodically update the counters, to avoid
> > >   overflow (e.g. as we do in the l2x0 PMU).
> > >
> > > * Reject sampling for such systems, as this cannot be done reliably or
> > >   efficiently.
> > >
> > > NAK to broadcasting the IRQ -- there are a number of issues with the
> > > general approach.
> >
> > The second solution would be good if sampling is necessary even like
> those
> > systems.
> 
> Please note that I mean *all* of the above. There would be no sampling
> on systems with muxed PMU IRQs, since there's no correlation between
> overflow events and the hrtimer interrupts -- the results of sampling
> would be misleading.
> 
> > Actually I'm working on FIQ available ARM32 system and trying to enable
> the
> > hard lockup detector by routing the PMU IRQ to FIQ.
> > Because of that, I really need the interrupt event if it is a muxed SPI,
> > besides I also need to make a dedicated IPI FIQ to broadcast the IRQ in
> > this approach.
> > What would you do if you were in the same situation ?
> 
> I don't think that this can work with a muxed IRQ, sorry.
> 
> It would be better to use some kind of timer.
> 
> [...]
> 
> > > Further, if you ever encounter a case where you need to avoid
> preemption
> > > across enabling IRQs, preemption must be disabled *before* enabling
> IRQs.
> >
> > Ah, OK.
> > Enabling IRQs can cause scheduling tasks in the end of exception or
> other
> > scheduling points, right ?
> 
> Yes. If an IRQ was taken *between* enabling IRQs and disabling
> preemption, preemption may occur as part of the exception return.
> 
> Thanks,
> Mark.



RE: smp_call_function() friends and irq/bottom_half context

2018-05-11 Thread Hoeun Ryu

> -Original Message-
> From: Peter Zijlstra [mailto:pet...@infradead.org]
> Sent: Thursday, May 10, 2018 7:17 PM
> To: Hoeun Ryu 
> Cc: mi...@kernel.org; aaron...@intel.com; adobri...@gmail.com;
> frede...@kernel.org; ying.hu...@intel.com; linux-kernel@vger.kernel.org
> Subject: Re: smp_call_function() friends and irq/bottom_half context
> 
> On Thu, May 10, 2018 at 01:58:29PM +0900, Hoeun Ryu wrote:
> > Hi, all.
> >
> > I'm reading kernel/smp.c code and I found comments on smp_call_function()
> > and smp_call_function_[single/many]
> > saying that these functions are cannot be called in interrupt disabled
> > context or irq/bottom half handlers.
> >
> > I understand that there is a potential deadlock issue when caller CPU of
> > the functions is waiting for the completion of the callback of other
> CPUs.
> > But I was wondering if this is the case even when the caller CPU doesn't
> > wait for the completion (wait == 0).
> 
> IIRC yes, because csd_lock(). You can however use
> smp_call_function_single_async() with your own csd. Be very careful
> though, it is very easy to construct deadlocks.

Thank you for the explanation and the suggestion.

Would you please explain more on csd lock and irq disabled or irq/bh context ?
How can deadlocks happen when calling smp_call_function(wait=0) with irq 
disabled ?
How can deadlocks happen when calling smp_call_function(wait=0) from irq or 
bottom half context ?

I think of it as a 2D array like call_single_data[caller cpu][target cpu].flags.
csd_lock(csd[caller][target]) spins on the flag only when LOCKED is already
set, which means the caller CPU already called smp_call_function(wait=0), and
on the second call the caller CPU spins in csd_lock(csd[caller][target]) while
the target CPU is still processing its callback list (maybe processing some
other csd[other cpu][target]) and has not yet processed csd[caller][target].
But the callback list will eventually be flushed and the LOCKED flag of
csd[caller][target] cleared, won't it ?



[PATCH] armpmu: broadcast overflow irq on multi-core system having one muxed SPI for PMU.

2018-05-10 Thread Hoeun Ryu
From: Hoeun Ryu <hoeun@lge.com>

 Some SoCs, such as i.MX6DL/QL, have only one muxed SPI for the PMUs on a
multi-core system. On such systems a CPU can be interrupted by the overflow
irq while the overflow actually occurred on another CPU.
 This patch broadcasts the irq using smp_call_function() so that the other
CPUs can check and handle their own overflows when the overflow did not
actually occur on the interrupted CPU.

 Local irq is enabled and preemption is disabled temporarily to call
smp_call_function_many() in armpmu_dispatch_irq(), as smp_call_function_many()
may not be called with irq disabled.

 The callback for smp_call_function_many() is __armpmu_handle_irq(), and it
calls armpmu->handle_irq() with an invalid irq_num because smp_call_func_t
takes only one parameter and the armpmu pointer is passed through it. This
could be a problem if the irq_num parameter were used by a handler, but no
handler uses the irq parameter for now. Another approach would be to remove
the irq_num argument from handle_irq() entirely.

Signed-off-by: Hoeun Ryu <hoeun@lge.com>
---
 drivers/perf/arm_pmu.c | 51 +++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 1a0d340..3d65e44 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -322,6 +322,29 @@ validate_group(struct perf_event *event)
return 0;
 }
 
+static void __armpmu_handle_irq(void *dev)
+{
+   struct arm_pmu *armpmu;
+   u64 start_clock, finish_clock;
+   irqreturn_t ret;
+
+   armpmu = *(void **)dev;
+   start_clock = sched_clock();
+   /*
+* irq_num should not be used by the handler, we don't have irq_num for
+* the first place. There is no handler using the irq_num argument for 
now.
+* smp_call_func_t has one function argument and irq number cannot be 
handed
+* over to this callback because we need dev pointer here.
+* If you need valid irq_num, you need to declare a wrapper struct 
having
+* irq_num and dev pointer.
+*/
+   ret = armpmu->handle_irq(-1, armpmu);
+   if (ret == IRQ_HANDLED) {
+   finish_clock = sched_clock();
+   perf_sample_event_took(finish_clock - start_clock);
+   }
+}
+
 static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
 {
struct arm_pmu *armpmu;
@@ -340,9 +363,31 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
 
start_clock = sched_clock();
ret = armpmu->handle_irq(irq, armpmu);
-   finish_clock = sched_clock();
-
-   perf_sample_event_took(finish_clock - start_clock);
+   /*
+* The handler just returns with IRQ_NONE when it checks the overflow
+* and the overflow doesn't occur on the CPU.
+*
+* Some SoCs like i.MX6 have one muxed SPI on multi-core system.
+* On the systems , the irq should be broadcasted to other CPUs so that 
the
+* CPUs can check their own PMU overflow.
+*/
+   if (ret == IRQ_HANDLED) {
+   finish_clock = sched_clock();
+   perf_sample_event_took(finish_clock - start_clock);
+   } else if (ret == IRQ_NONE) {
+   struct cpumask mask;
+
+   cpumask_copy(&mask, cpu_online_mask);
+   cpumask_clear_cpu(raw_smp_processor_id(), &mask);
+   if (!cpumask_empty(&mask)) {
+   /* smp_call_function cannot be called with irq disabled 
*/
+   local_irq_enable();
+   preempt_disable();
+   smp_call_function_many(&mask, __armpmu_handle_irq, dev, 
0);
+   preempt_enable();
+   local_irq_disable();
+   }
+   }
return ret;
 }
 
-- 
2.1.4




[RFC 3/3] sched: depend on 64BIT_ATOMIC_ACCESS to determine if to use min_vruntime_copy

2017-08-23 Thread Hoeun Ryu
 'min_vruntime_copy' is copied whenever 'min_vruntime' of a cfs_rq is updated
and is used on the reader side to check whether the update of 'min_vruntime'
has completed.
 Because 'min_vruntime' is a 64bit variable, a mimic of seqlock is needed on
32bit machines to check that the variable is not being updated.

 On 64BIT_ATOMIC_ACCESS enabled machines, 64bit accesses are atomic even
though the machines are 32bit, so 'min_vruntime' can be accessed directly
on those architectures.

 Depend on CONFIG_64BIT_ATOMIC_ACCESS instead of CONFIG_64BIT to determine
whether the 'min_vruntime_copy' variable is used for synchronization or not.
Also align 'min_vruntime' to 8 bytes if 64BIT_ATOMIC_ALIGNED_ACCESS is true,
because a 64BIT_ATOMIC_ALIGNED_ACCESS enabled system can access the variable
atomically only when it is aligned.
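
 For reference, the reader side that the #ifndef blocks below guard looks
roughly like the following (the shape of the existing reader in
migrate_task_rq_fair(), simplified and shown with the config symbol from this
patch):

	u64 min_vruntime;

	#ifndef CONFIG_64BIT_ATOMIC_ACCESS
	/* 32bit without atomic 64bit loads: retry until a consistent pair is read. */
	u64 min_vruntime_copy;

	do {
		min_vruntime_copy = cfs_rq->min_vruntime_copy;
		smp_rmb();
		min_vruntime = cfs_rq->min_vruntime;
	} while (min_vruntime != min_vruntime_copy);
	#else
	/* 64bit, or 32bit with atomic 64bit accesses: a plain load is enough. */
	min_vruntime = cfs_rq->min_vruntime;
	#endif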

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 kernel/sched/fair.c  | 6 +++---
 kernel/sched/sched.h | 6 +-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c95880e..840658f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -536,7 +536,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 
/* ensure we never gain time by being placed backwards. */
cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime);
-#ifndef CONFIG_64BIT
+#ifndef CONFIG_64BIT_ATOMIC_ACCESS
smp_wmb();
cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
 #endif
@@ -5975,7 +5975,7 @@ static void migrate_task_rq_fair(struct task_struct *p)
struct cfs_rq *cfs_rq = cfs_rq_of(se);
u64 min_vruntime;
 
-#ifndef CONFIG_64BIT
+#ifndef CONFIG_64BIT_ATOMIC_ACCESS
u64 min_vruntime_copy;
 
do {
@@ -9173,7 +9173,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 {
cfs_rq->tasks_timeline = RB_ROOT;
cfs_rq->min_vruntime = (u64)(-(1LL << 20));
-#ifndef CONFIG_64BIT
+#ifndef CONFIG_64BIT_ATOMIC_ACCESS
cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
 #endif
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index eeef1a3..870010b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -421,8 +421,12 @@ struct cfs_rq {
unsigned int nr_running, h_nr_running;
 
u64 exec_clock;
+#ifndef CONFIG_64BIT_ATOMIC_ALIGNED_ACCESS
u64 min_vruntime;
-#ifndef CONFIG_64BIT
+#else
+   u64 min_vruntime __attribute__((aligned(sizeof(u64))));
+#endif
+#ifndef CONFIG_64BIT_ATOMIC_ACCESS
u64 min_vruntime_copy;
 #endif
 
-- 
2.7.4




[RFC 0/3] add 64BIT_ATOMIC_ACCESS and 64BIT_ATOMIC_ALIGNED_ACCESS

2017-08-23 Thread Hoeun Ryu
 On some 32-bit architectures, 64bit accesses are atomic when certain
conditions are satisfied.

 For example, on LPAE (Large Physical Address Extension) enabled ARM
architecture, 'ldrd/strd' (load/store doublewords) instructions are 64bit
atomic as long as the address is 64-bit aligned. This feature is to
guarantee atomic accesses on newly introduced 64bit wide descriptors in
the translation tables, and 's/u64' variables can be accessed atomically
when they are aligned(8) on LPAE enabled ARM architecture machines.

 Introduce 64BIT_ATOMIC_ACCESS and 64BIT_ATOMIC_ALIGNED_ACCESS, which can be
true for 32bit architectures as well as 64bit architectures.

 With them we can simplify some kernel code that uses seqlock (timekeeping) or
a mimic of it (like in sched/cfs) just to read or write 64bit variables.
 The existing code depends on CONFIG_64BIT to determine whether a 64bit
variable can be accessed directly or needs additional synchronization
primitives like seqlock. CONFIG_64BIT_ATOMIC_ACCESS can be used instead of
CONFIG_64BIT in those cases.

 64BIT_ATOMIC_ALIGNED_ACCESS can be used in the variable declaration to
indicate the alignment requirement to the compiler
(__attribute__((aligned(8)))) behind an #ifdef.

 The last patch "sched: depend on 64BIT_ATOMIC_ACCESS to determine if to
use min_vruntime_copy" is an example of this approach.

 I'd like to know what the architecture maintainers and kernel maintainers
think about it. I think I can make more examples (mostly removing seqlock
to access the 64bit variables on the machines) if this approach is
accepted.

Hoeun Ryu (3):
  arch: add 64BIT_ATOMIC_ACCESS to support 64bit atomic access on 32bit
machines
  arm: enable 64BIT_ATOMIC(_ALIGNED)_ACCESS on LPAE enabled machines
  sched: depend on 64BIT_ATOMIC_ACCESS to determine if to use
min_vruntime_copy

 arch/Kconfig | 20 
 arch/arm/mm/Kconfig  |  2 ++
 kernel/sched/fair.c  |  6 +++---
 kernel/sched/sched.h |  6 +-
 4 files changed, 30 insertions(+), 4 deletions(-)

-- 
2.7.4



[RFC 1/3] arch: add 64BIT_ATOMIC_ACCESS to support 64bit atomic access on 32bit machines

2017-08-23 Thread Hoeun Ryu
 On some 32-bit architectures, 64bit accesses are atomic when certain
conditions are satisfied.
 For example, on LPAE (Large Physical Address Extension) enabled ARM
architecture, 'ldrd/strd' (load/store doublewords) instructions are 64bit
atomic as long as the address is 64-bit aligned. This feature is to
guarantee atomic accesses on newly introduced 64bit wide descriptors in
the translation tables, and 's/u64' variables can be accessed atomically
when they are aligned(8) on LPAE enabled ARM architecture machines.

 By introducing 64BIT_ATOMIC_ACCESS and 64BIT_ATOMIC_ALIGNED_ACCESS, which
can be true for 32bit architectures as well as 64bit architectures, we can
simplify some kernel code that uses seqlock (timekeeping) or a mimic of it
(like in sched/cfs) just to read or write 64bit variables.
 The existing code depends on CONFIG_64BIT to determine whether a 64bit
variable can be accessed directly or needs additional synchronization
primitives like seqlock. CONFIG_64BIT_ATOMIC_ACCESS can be used instead of
CONFIG_64BIT in those cases.
 64BIT_ATOMIC_ALIGNED_ACCESS can be used in the variable declaration to
indicate the alignment requirement to the compiler
(__attribute__((aligned(8)))) behind an #ifdef.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 arch/Kconfig | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 21d0089..1def331 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -115,6 +115,26 @@ config UPROBES
managed by the kernel and kept transparent to the probed
application. )
 
+config 64BIT_ATOMIC_ACCESS
+   def_bool 64BIT
+   help
+ On some 32bit architectures as well as 64bit architectures,
+ 64bit accesses are atomic when certain conditions are satisfied.
+
+ This symbol should be selected by an architecture if 64 bit
+ accesses can be atomic.
+
+config 64BIT_ATOMIC_ALIGNED_ACCESS
+   def_bool n
+   depends on 64BIT_ATOMIC_ACCESS
+   help
+ On 64BIT_ATOMIC_ACCESS enabled system, the address should be
+ aligned by 8 to guarantee the accesses are atomic.
+
+ This symbol should be selected by an architecture if 64 bit
+ accesses are required to be 64 bit aligned to guarantee that
+ the 64bit accesses are atomic.
+
 config HAVE_64BIT_ALIGNED_ACCESS
def_bool 64BIT && !HAVE_EFFICIENT_UNALIGNED_ACCESS
help
-- 
2.7.4




[RFC 2/3] arm: enable 64BIT_ATOMIC(_ALIGNED)_ACCESS on LPAE enabled machines

2017-08-23 Thread Hoeun Ryu
 'ldrd/strd' (load/store doublewords) instructions are 64bit atomic as
long as the address is 64-bit aligned on LPAE (Large Physical Address
Extension) enabled architectures. This feature is to guarantee atomic
accesses on newly introduced 64bit wide descriptors in the translation
tables.

 With 64BIT_ATOMIC_ACCESS set, some kernel code that accesses 64bit
variables can be optimized by omitting seqlock or a mimic of it.
 Also set 64BIT_ATOMIC_ALIGNED_ACCESS, because the 64bit atomic access is
guaranteed only when the address is 64bit aligned.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 arch/arm/mm/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 60cdfdc..3142572 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -660,6 +660,8 @@ config ARM_LPAE
bool "Support for the Large Physical Address Extension"
depends on MMU && CPU_32v7 && !CPU_32v6 && !CPU_32v5 && \
!CPU_32v4 && !CPU_32v3
+   select 64BIT_ATOMIC_ACCESS
+   select 64BIT_ATOMIC_ALIGNED_ACCESS
help
  Say Y if you have an ARMv7 processor supporting the LPAE page
  table format and you would like to access memory beyond the
-- 
2.7.4




Re: [PATCHv3] arm:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores

2017-08-17 Thread Hoeun Ryu
Hello, All.

Would you please review this patch ?
I haven't had any response to this patch.

Thank you.

On Tue, 2017-08-08 at 10:22 +0900, Hoeun Ryu wrote:
>  Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
> version in panic path) introduced crash_smp_send_stop() which is a weak
> function and can be overriden by architecture codes to fix the side effect
> caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
> notifiers" option).
> 
>  ARM architecture uses the weak version function and the problem is that
> the weak function simply calls smp_send_stop() which makes other CPUs
> offline and takes away the chance to save crash information for nonpanic
> CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
> option is enabled.
> 
>  Calling smp_call_function(machine_crash_nonpanic_core, NULL, false) in
> the function is useless because all nonpanic CPUs are already offline by
> smp_send_stop() in this case and smp_call_function() only works against
> online CPUs.
> 
>  The result is that /proc/vmcore is not available with the error messages;
> "Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".
> 
>  crash_smp_send_stop() is implemented for ARM architecture to fix this
> problem and the function (strong symbol version) saves crash information
> for nonpanic CPUs using smp_call_function() and machine_crash_shutdown()
> tries to save crash information for nonpanic CPUs only when
> crash_kexec_post_notifiers kernel option is disabled.
> 
>  We might be able to implement the function like arm64 or x86 using a
> dedicated IPI (let's say IPI_CPU_CRASH_STOP), but we cannot implement this
> function like that because of the lack of IPI slots. Please see the commit
> e7273ff4 : (ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI)
> 
> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
> ---
>  v3:
>    - remove 'WARN_ON(num_online_cpus() > 1)' in machine_crash_shutdown().
>  it's a false check for the case when crash_kexec_post_notifiers
>  kernel option is disabled.
>  v2:
>    - calling crash_smp_send_stop() in machine_crash_shutdown() for the case
>  when crash_kexec_post_notifiers kernel option is disabled.
>    - fix commit messages for it.
> 
>  arch/arm/kernel/machine_kexec.c | 40 +---
>  1 file changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index fe1419e..82ef7c7 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -94,6 +94,34 @@ void machine_crash_nonpanic_core(void *unused)
>   cpu_relax();
>  }
>  
> +void crash_smp_send_stop(void)
> +{
> + static int cpus_stopped;
> + unsigned long msecs;
> +
> + /*
> +  * This function can be called twice in panic path, but obviously
> +  * we execute this only once.
> +  */
> + if (cpus_stopped)
> + return;
> +
> + cpus_stopped = 1;
> +
> + if (num_online_cpus() == 1)
> + return;
> +
> + atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
> + smp_call_function(machine_crash_nonpanic_core, NULL, false);
> + msecs = 1000; /* Wait at most a second for the other cpus to stop */
> + while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
> + mdelay(1);
> + msecs--;
> + }
> + if (atomic_read(&waiting_for_crash_ipi) > 0)
> + pr_warn("Non-crashing CPUs did not react to IPI\n");
> +}
> +
>  static void machine_kexec_mask_interrupts(void)
>  {
>   unsigned int i;
> @@ -119,19 +147,9 @@ static void machine_kexec_mask_interrupts(void)
>  
>  void machine_crash_shutdown(struct pt_regs *regs)
>  {
> - unsigned long msecs;
> -
>   local_irq_disable();
>  
> - atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
> - smp_call_function(machine_crash_nonpanic_core, NULL, false);
> - msecs = 1000; /* Wait at most a second for the other cpus to stop */
> - while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
> - mdelay(1);
> - msecs--;
> - }
> - if (atomic_read(&waiting_for_crash_ipi) > 0)
> - pr_warn("Non-crashing CPUs did not react to IPI\n");
> + crash_smp_send_stop();
>  
>   crash_save_cpu(regs, smp_processor_id());
>   machine_kexec_mask_interrupts();


[PATCHv3] arm64:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores

2017-08-16 Thread Hoeun Ryu
 Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
version in panic path) introduced crash_smp_send_stop() which is a weak
function and can be overridden by architecture codes to fix the side effect
caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
notifiers" option).

 ARM64 architecture uses the weak version function and the problem is that
the weak function simply calls smp_send_stop() which makes other CPUs
offline and takes away the chance to save crash information for nonpanic
CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
option is enabled.

 Calling smp_send_crash_stop() in machine_crash_shutdown() is useless
because all nonpanic CPUs are already offline by smp_send_stop() in this
case and smp_send_crash_stop() only works against online CPUs.

 The result is that the secondary CPUs' registers are not saved by
crash_save_cpu() and the vmcore file misreports these CPUs as being
offline.

 crash_smp_send_stop() is implemented to fix this problem by replacing the
existing smp_send_crash_stop() and adding a check for multiple calling to
the function. The function (strong symbol version) saves crash information
for nonpanic CPUs and machine_crash_shutdown() tries to save crash
information for nonpanic CPUs only when crash_kexec_post_notifiers kernel
option is disabled.

* crash_kexec_post_notifiers : false

  panic()
__crash_kexec()
  machine_crash_shutdown()
crash_smp_send_stop()<= save crash dump for nonpanic cores

* crash_kexec_post_notifiers : true

  panic()
crash_smp_send_stop()<= save crash dump for nonpanic cores
__crash_kexec()
  machine_crash_shutdown()
crash_smp_send_stop()<= just return.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
Reviewed-by: James Morse <james.mo...@arm.com>
Tested-by: James Morse <james.mo...@arm.com>
---
 v3:
   - fix typos in the commit log.
   - modify commit log about the result of this problem.
   - add Tested-by/Reviewed-by: James Morse.
 v2:
   - replace the existing smp_send_crash_stop() with crash_smp_send_stop()
 and adding called-twice logic to it.
   - modify the commit message.

 arch/arm64/include/asm/smp.h  |  2 +-
 arch/arm64/kernel/machine_kexec.c |  2 +-
 arch/arm64/kernel/smp.c   | 12 +++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 55f08c5..f82b447 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -148,7 +148,7 @@ static inline void cpu_panic_kernel(void)
  */
 bool cpus_are_stuck_in_kernel(void);
 
-extern void smp_send_crash_stop(void);
+extern void crash_smp_send_stop(void);
 extern bool smp_crash_stop_failed(void);
 
 #endif /* ifndef __ASSEMBLY__ */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 481f54a..11121f6 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -252,7 +252,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
local_irq_disable();
 
/* shutdown non-crashing cpus */
-   smp_send_crash_stop();
+   crash_smp_send_stop();
 
/* for crashing cpu */
crash_save_cpu(regs, smp_processor_id());
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index dc66e6e..73d8f5e 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -977,11 +977,21 @@ void smp_send_stop(void)
 }
 
 #ifdef CONFIG_KEXEC_CORE
-void smp_send_crash_stop(void)
+void crash_smp_send_stop(void)
 {
+   static int cpus_stopped;
cpumask_t mask;
unsigned long timeout;
 
+   /*
+* This function can be called twice in panic path, but obviously
+* we execute this only once.
+*/
+   if (cpus_stopped)
+   return;
+
+   cpus_stopped = 1;
+
if (num_online_cpus() == 1)
return;
 
-- 
2.7.4
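
For reference, the generic weak fallback that this patch overrides (introduced by commit 0ee5941 for the x86 kdump case) looks roughly like the sketch below. It only calls smp_send_stop(), which is why the secondary CPUs' registers are never saved on arm64 unless the architecture supplies its own strong version. This is a paraphrase for context, not part of the patch:

/* kernel/panic.c -- approximate shape of the weak default */
void __weak crash_smp_send_stop(void)
{
        static int cpus_stopped;

        /* May be called twice in the panic path; only act once. */
        if (cpus_stopped)
                return;

        /*
         * smp_send_stop() merely offlines the other CPUs; it does not
         * save their registers, so the vmcore sees them as offline.
         */
        smp_send_stop();
        cpus_stopped = 1;
}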



Re: [PATCHv2] arm64:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores

2017-08-16 Thread Hoeun Ryu
Hello, James.

Thank you for the meticulous test and review.

On Fri, 2017-08-11 at 18:02 +0100, James Morse wrote:
> Hi Hoeun,
> 
> On 07/08/17 06:09, Hoeun Ryu wrote:
> > 
> >  Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
> > version in panic path) introduced crash_smp_send_stop() which is a weak
> > function and can be overriden by architecture codes to fix the side effect
> (overridden)

It'll be fixed in the next version.

> 
> 
> > 
> > caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
> > notifiers" option).
> > 
> >  ARM64 architecture uses the weak version function and the problem is that
> > the weak function simply calls smp_send_stop() which makes other CPUs
> > offline and takes away the chance to save crash information for nonpanic
> > CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
> > option is enabled.
> > 
> >  Calling smp_send_crash_stop() in machine_crash_shutdown() is useless
> > because all nonpanic CPUs are already offline by smp_send_stop() in this
> > case and smp_send_crash_stop() only works against online CPUs.
> 
> > 
> >  The result is that /proc/vmcore is not available with the error messages;
> > "Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".
> When I tried this I got one of these warnings for each secondary CPU, but the
> vmcore file was still available. When I ran 'crash' on the vmcore it reported:
> > 
> > CPUS: 6 [OFFLINE: 5]
> Did I miss a step to reproduce this? If not, can we change this paragraph to
> say something like:
> > 
> > The result is that secondary CPUs registers are not saved by crash_save_cpu()
> > and the vmcore file misreports these CPUs as being offline.

Actually the commit log comes from the patch that fixes a similar issue in the ARM port.
I'll change the commit log to use your wording.

> 
> > 
> >  crash_smp_send_stop() is implemented to fix this problem by replacing the
> > exising smp_send_crash_stop() and adding a check for multiple calling to
> (existing)

It'll be fixed in the next version.

> 
> 
> > 
> > the function. The function (strong symbol version) saves crash information
> > for nonpanic CPUs and machine_crash_shutdown() tries to save crash
> > information for nonpanic CPUs only when crash_kexec_post_notifiers kernel
> > option is disabled.
> > 
> > * crash_kexec_post_notifiers : false
> > 
> >   panic()
> > __crash_kexec()
> >   machine_crash_shutdown()
> > crash_smp_send_stop()<= save crash dump for nonpanic cores
> > 
> > * crash_kexec_post_notifiers : true
> > 
> >   panic()
> > crash_smp_send_stop()<= save crash dump for nonpanic cores
> > __crash_kexec()
> >   machine_crash_shutdown()
> > crash_smp_send_stop()<= just return.
> 
> > 
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index dc66e6e..73d8f5e 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -977,11 +977,21 @@ void smp_send_stop(void)
> >  }
> >  
> >  #ifdef CONFIG_KEXEC_CORE
> > -void smp_send_crash_stop(void)
> > +void crash_smp_send_stop(void)
> >  {
> > +   static int cpus_stopped;
> >     cpumask_t mask;
> >     unsigned long timeout;
> >  
> > +   /*
> > +    * This function can be called twice in panic path, but obviously
> > +    * we execute this only once.
> > +    */
> > +   if (cpus_stopped)
> > +   return;
> > +
> > +   cpus_stopped = 1;
> > +
> This cpus_stopped=1 can't happen on multiple CPUs at the same time as any 
> second
> call is guaranteed to be on the same CPU, both are behind panic()s
> 'atomic_cmpxchg()'.

The 'cpus_stopped' variable is not there to handle a race between multiple CPUs.
It simply prevents 'smp_cross_call(&mask, IPI_CPU_CRASH_STOP)' from being issued
a second time from machine_crash_shutdown().
Please look at the following call path.
 
* crash_kexec_post_notifiers : true
 panic()
     crash_smp_send_stop() {
         ...
         cpus_stopped = 1             <= make it '1'
         smp_cross_call()             <= save crash dump for nonpanic cores
     }
     __crash_kexec()
         machine_crash_shutdown()
             crash_smp_send_stop() { 
                 if (cpus_stopped)
                     return           <= just return.
             }
> 
> 
> Other than my '/proc/vmcore is not available' question above, this looks fine 
> to me:
> Reviewed-by: James Morse <james.mo...@arm.com>
> Tested-by: James Morse <james.mo...@arm.com>
> 
> 
> Thanks!
> 
> James
> 
> 
> 
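
To make the two orderings above concrete, the relevant part of panic() can be sketched roughly as below (a simplified paraphrase of kernel/panic.c after commit 0ee5941, not the literal source). It shows why crash_smp_send_stop() can be entered twice when crash_kexec_post_notifiers is set:

/* Simplified sketch of the ordering inside panic() */
void panic_ordering_sketch(void)
{
        if (!_crash_kexec_post_notifiers) {
                /* Default path: kexec right away; machine_crash_shutdown()
                 * stops the other CPUs and saves their registers. */
                __crash_kexec(NULL);   /* -> machine_crash_shutdown()
                                        *    -> crash_smp_send_stop()  (only call) */
        } else {
                /* Stop and save the other CPUs before the notifiers run. */
                crash_smp_send_stop(); /* first call: sends the crash IPI */
        }

        atomic_notifier_call_chain(&panic_notifier_list, 0, NULL);
        kmsg_dump(KMSG_DUMP_PANIC);

        if (_crash_kexec_post_notifiers)
                __crash_kexec(NULL);   /* -> machine_crash_shutdown()
                                        *    -> crash_smp_send_stop()  (second call: returns early) */
}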


[PATCHv3] arm:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores

2017-08-07 Thread Hoeun Ryu
 Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
version in panic path) introduced crash_smp_send_stop() which is a weak
function and can be overridden by architecture codes to fix the side effect
caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
notifiers" option).

 ARM architecture uses the weak version function and the problem is that
the weak function simply calls smp_send_stop() which makes other CPUs
offline and takes away the chance to save crash information for nonpanic
CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
option is enabled.

 Calling smp_call_function(machine_crash_nonpanic_core, NULL, false) in
the function is useless because all nonpanic CPUs are already offline by
smp_send_stop() in this case and smp_call_function() only works against
online CPUs.

 The result is that /proc/vmcore is not available with the error messages;
"Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".

 crash_smp_send_stop() is implemented for ARM architecture to fix this
problem and the function (strong symbol version) saves crash information
for nonpanic CPUs using smp_call_function() and machine_crash_shutdown()
tries to save crash information for nonpanic CPUs only when
crash_kexec_post_notifiers kernel option is disabled.

 We might be able to implement the function like arm64 or x86 using a
dedicated IPI (let's say IPI_CPU_CRASH_STOP), but we cannot implement this
function like that because of the lack of IPI slots. Please see the commit
e7273ff4 : (ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI)

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 v3:
   - remove 'WARN_ON(num_online_cpus() > 1)' in machine_crash_shutdown().
 it's a false check for the case when crash_kexec_post_notifiers
 kernel option is disabled.
 v2:
   - calling crash_smp_send_stop() in machine_crash_shutdown() for the case
 when crash_kexec_post_notifiers kernel option is disabled.
   - fix commit messages for it.

 arch/arm/kernel/machine_kexec.c | 40 +---
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index fe1419e..82ef7c7 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -94,6 +94,34 @@ void machine_crash_nonpanic_core(void *unused)
cpu_relax();
 }
 
+void crash_smp_send_stop(void)
+{
+   static int cpus_stopped;
+   unsigned long msecs;
+
+   /*
+* This function can be called twice in panic path, but obviously
+* we execute this only once.
+*/
+   if (cpus_stopped)
+   return;
+
+   cpus_stopped = 1;
+
+   if (num_online_cpus() == 1)
+   return;
+
+   atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+   smp_call_function(machine_crash_nonpanic_core, NULL, false);
+   msecs = 1000; /* Wait at most a second for the other cpus to stop */
+   while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+   mdelay(1);
+   msecs--;
+   }
+   if (atomic_read(&waiting_for_crash_ipi) > 0)
+   pr_warn("Non-crashing CPUs did not react to IPI\n");
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
unsigned int i;
@@ -119,19 +147,9 @@ static void machine_kexec_mask_interrupts(void)
 
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-   unsigned long msecs;
-
local_irq_disable();
 
-   atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-   smp_call_function(machine_crash_nonpanic_core, NULL, false);
-   msecs = 1000; /* Wait at most a second for the other cpus to stop */
-   while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
-   mdelay(1);
-   msecs--;
-   }
-   if (atomic_read(&waiting_for_crash_ipi) > 0)
-   pr_warn("Non-crashing CPUs did not react to IPI\n");
+   crash_smp_send_stop();
 
crash_save_cpu(regs, smp_processor_id());
machine_kexec_mask_interrupts();
-- 
2.7.4
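
The busy-wait above counts waiting_for_crash_ipi down to zero; the decrement happens in the per-CPU handler that smp_call_function() runs on every non-panicking core. Roughly, paraphrasing the pre-existing ARM handler rather than quoting the patch:

/* arch/arm/kernel/machine_kexec.c -- approximate shape of the handler */
void machine_crash_nonpanic_core(void *unused)
{
        struct pt_regs regs;

        crash_setup_regs(&regs, NULL);             /* snapshot this CPU's registers    */
        crash_save_cpu(&regs, smp_processor_id()); /* write the ELF note for vmcore    */
        flush_cache_all();

        set_cpu_online(smp_processor_id(), false); /* mark ourselves offline           */
        atomic_dec(&waiting_for_crash_ipi);        /* tell the crashing CPU we're done */

        while (1)
                cpu_relax();                       /* park here forever                */
}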



[PATCHv2] arm64:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores

2017-08-06 Thread Hoeun Ryu
 Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
version in panic path) introduced crash_smp_send_stop() which is a weak
function and can be overriden by architecture codes to fix the side effect
caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
notifiers" option).

 ARM64 architecture uses the weak version function and the problem is that
the weak function simply calls smp_send_stop() which makes other CPUs
offline and takes away the chance to save crash information for nonpanic
CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
option is enabled.

 Calling smp_send_crash_stop() in machine_crash_shutdown() is useless
because all nonpanic CPUs are already offline by smp_send_stop() in this
case and smp_send_crash_stop() only works against online CPUs.

 The result is that /proc/vmcore is not available with the error messages;
"Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".

 crash_smp_send_stop() is implemented to fix this problem by replacing the
exising smp_send_crash_stop() and adding a check for multiple calling to
the function. The function (strong symbol version) saves crash information
for nonpanic CPUs and machine_crash_shutdown() tries to save crash
information for nonpanic CPUs only when crash_kexec_post_notifiers kernel
option is disabled.

* crash_kexec_post_notifiers : false

  panic()
__crash_kexec()
  machine_crash_shutdown()
crash_smp_send_stop()<= save crash dump for nonpanic cores

* crash_kexec_post_notifiers : true

  panic()
crash_smp_send_stop()<= save crash dump for nonpanic cores
__crash_kexec()
  machine_crash_shutdown()
crash_smp_send_stop()<= just return.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 v2:
   - replace the existing smp_send_crash_stop() with crash_smp_send_stop()
 and adding called-twice logic to it.
   - modify the commit message

 arch/arm64/include/asm/smp.h  |  2 +-
 arch/arm64/kernel/machine_kexec.c |  2 +-
 arch/arm64/kernel/smp.c   | 12 +++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 55f08c5..f82b447 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -148,7 +148,7 @@ static inline void cpu_panic_kernel(void)
  */
 bool cpus_are_stuck_in_kernel(void);
 
-extern void smp_send_crash_stop(void);
+extern void crash_smp_send_stop(void);
 extern bool smp_crash_stop_failed(void);
 
 #endif /* ifndef __ASSEMBLY__ */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 481f54a..11121f6 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -252,7 +252,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
local_irq_disable();
 
/* shutdown non-crashing cpus */
-   smp_send_crash_stop();
+   crash_smp_send_stop();
 
/* for crashing cpu */
crash_save_cpu(regs, smp_processor_id());
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index dc66e6e..73d8f5e 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -977,11 +977,21 @@ void smp_send_stop(void)
 }
 
 #ifdef CONFIG_KEXEC_CORE
-void smp_send_crash_stop(void)
+void crash_smp_send_stop(void)
 {
+   static int cpus_stopped;
cpumask_t mask;
unsigned long timeout;
 
+   /*
+* This function can be called twice in panic path, but obviously
+* we execute this only once.
+*/
+   if (cpus_stopped)
+   return;
+
+   cpus_stopped = 1;
+
if (num_online_cpus() == 1)
return;
 
-- 
2.7.4
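
The hunk above is cut short by the archive. For context, the remainder of the pre-existing function body that the rename keeps goes roughly as follows, using the dedicated IPI_CPU_CRASH_STOP instead of smp_call_function(); this is a paraphrase, not part of the diff:

        /* ...continued: pick every online CPU except ourselves */
        cpumask_copy(&mask, cpu_online_mask);
        cpumask_clear_cpu(smp_processor_id(), &mask);

        atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);

        smp_cross_call(&mask, IPI_CPU_CRASH_STOP);  /* each target saves its regs */

        /* Wait up to one second for the other CPUs to stop. */
        timeout = USEC_PER_SEC;
        while ((atomic_read(&waiting_for_crash_ipi) > 0) && timeout--)
                udelay(1);

        if (atomic_read(&waiting_for_crash_ipi) > 0)
                pr_warn("SMP: failed to stop secondary CPUs\n");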



Re: [PATCH] arm64:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores

2017-08-04 Thread Hoeun Ryu

> On 4 Aug 2017, at 8:43 PM, AKASHI Takahiro <takahiro.aka...@linaro.org> wrote:
> 
>> On Fri, Aug 04, 2017 at 11:38:16AM +0100, James Morse wrote:
>> Hi Hoeun,
>> 
>>> On 04/08/17 08:02, Hoeun Ryu wrote:
>>> Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
>>> version in panic path) introduced crash_smp_send_stop() which is a weak
>>> function and can be overriden by architecture codes to fix the side effect
>>> caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
>>> notifiers" option).
>> 
>> If I've understood correctly: if we boot with this option core code doesn't use
>> our machine_crash_shutdown(), and instead calls crash_smp_send_stop(), which we
> 
> No. Machine_crash_shutdown() is always called, but at that time,
> all the cpus other than the crashing cpu have already died in this case.
> 

You're right.

>> don't have, so it uses the default smp_send_stop(), which doesn't save the regs.
>> 
>> Thanks for catching this!
>> 
>> 
>> Could we rename smp_send_crash_stop() crash_smp_send_stop() and add the
>> called-twice logic there? They are similar enough that I'm getting them muddled
>> already!
>> 
> 
> Nice.

I'll reflect it in v2.
Thank you for the review.

> 
> -Takahiro AKASHI
> 
>> 
>> Thanks,
>> 
>> James
>> 
>> 
>>> ARM64 architecture uses the weak version function and the problem is that
>>> the weak function simply calls smp_send_stop() which makes other CPUs
>>> offline and takes away the chance to save crash information for nonpanic
>>> CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
>>> option is enabled.
>>> 
>>> Calling smp_send_crash_stop() in the function is useless because all
>>> nonpanic CPUs are already offline by smp_send_stop() in this case and
>>> smp_send_crash_stop() only works against online CPUs.
>>> 
>>> The result is that /proc/vmcore is not available with the error messages;
>>> "Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".
>>> 
>>> crash_smp_send_stop() is implemented for ARM64 architecture to fix this
>>> problem and the function (strong symbol version) saves crash information
>>> for nonpanic CPUs using smp_send_crash_stop() and machine_crash_shutdown()
>>> tries to save crash information for nonpanic CPUs only when
>>> crash_kexec_post_notifiers kernel option is disabled.
>> 
>> 


Re: [PATCH] arm64:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores

2017-08-04 Thread Hoeun Ryu

> On 4 Aug 2017, at 7:38 PM, James Morse <james.mo...@arm.com> wrote:
> 
> Hi Hoeun,
> 
>> On 04/08/17 08:02, Hoeun Ryu wrote:
>> Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
>> version in panic path) introduced crash_smp_send_stop() which is a weak
>> function and can be overriden by architecture codes to fix the side effect
>> caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
>> notifiers" option).
> 
> If I've understood correctly: if we boot with this option core code doesn't use
> our machine_crash_shutdown(), and instead calls crash_smp_send_stop(), which we
> don't have, so it uses the default smp_send_stop(), which doesn't save the regs.
> 
> Thanks for catching this!
> 
> 
> Could we rename smp_send_crash_stop() crash_smp_send_stop() and add the
> called-twice logic there? They are similar enough that I'm getting them muddled
> already!

I think it is possible, I will reflect it in v2.
Thank you for the review.

> 
> Thanks,
> 
> James
> 
> 
>> ARM64 architecture uses the weak version function and the problem is that
>> the weak function simply calls smp_send_stop() which makes other CPUs
>> offline and takes away the chance to save crash information for nonpanic
>> CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
>> option is enabled.
>> 
>> Calling smp_send_crash_stop() in the function is useless because all
>> nonpanic CPUs are already offline by smp_send_stop() in this case and
>> smp_send_crash_stop() only works against online CPUs.
>> 
>> The result is that /proc/vmcore is not available with the error messages;
>> "Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".
>> 
>> crash_smp_send_stop() is implemented for ARM64 architecture to fix this
>> problem and the function (strong symbol version) saves crash information
>> for nonpanic CPUs using smp_send_crash_stop() and machine_crash_shutdown()
>> tries to save crash information for nonpanic CPUs only when
>> crash_kexec_post_notifiers kernel option is disabled.
> 
> 


Re: [PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup

2017-08-04 Thread Hoeun Ryu


On 4 Aug 2017, at 7:04 PM, Robin Murphy <robin.mur...@arm.com> wrote:

>> On 04/08/17 07:07, Hoeun Ryu wrote:
>> Hello, Russell King.
>> 
>> The following patch has not merged yet.
>> Do you have a plan to accept and merge this patch ?
> 
> This should probably go through the ARM tree, so please submit it to
> Russell's patch-tracking system here:
> 
> http://www.armlinux.org.uk/developer/patches/

Thank you for the reply, I'll try it.

> 
> Robin.
> 
>> 
>> Thank you.
>> 
>>> On Mon, 2017-06-12 at 10:47 +0900, Hoeun Ryu wrote:
>>> Reading TTBCR in early boot stage might return the value of the previous
>>> kernel's configuration, especially in case of kexec. For example, if
>>> normal kernel (first kernel) had run on a configuration of PHYS_OFFSET <=
>>> PAGE_OFFSET and crash kernel (second kernel) is running on a configuration
>>> PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the
>>> reserved area for crash kernel, reading TTBCR and using the value to OR
>>> other bit fields might be risky because it doesn't have a reset value for
>>> TTBCR.
>>> 
>>> Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
>>> Suggested-by: Robin Murphy <robin.mur...@arm.com>
>>> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
>>> 
>>> ---
>>> 
>>> * add Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
>>> * v1: amended based on
>>> - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
>>>PHYS_OFFSET > PAGE_OFFSET"
>>> - https://lkml.org/lkml/2017/6/5/239
>>> 
>>> arch/arm/mm/proc-v7-3level.S | 3 +--
>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>> 
>>> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
>>> index 5e5720e..7d16bbc 100644
>>> --- a/arch/arm/mm/proc-v7-3level.S
>>> +++ b/arch/arm/mm/proc-v7-3level.S
>>> @@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
>>>    .macro  v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
>>>    ldr     \tmp, =swapper_pg_dir           @ swapper_pg_dir virtual address
>>>    cmp     \ttbr1, \tmp, lsr #12           @ PHYS_OFFSET > PAGE_OFFSET?
>>> -  mrc     p15, 0, \tmp, c2, c0, 2         @ TTB control register
>>> -  orr     \tmp, \tmp, #TTB_EAE
>>> +  mov     \tmp, #TTB_EAE                  @ for TTB control register
>>>    ALT_SMP(orr     \tmp, \tmp, #TTB_FLAGS_SMP)
>>>    ALT_UP(orr      \tmp, \tmp, #TTB_FLAGS_UP)
>>>    ALT_SMP(orr     \tmp, \tmp, #TTB_FLAGS_SMP << 16)
> 


[PATCH] arm64:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores

2017-08-04 Thread Hoeun Ryu
 Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
version in panic path) introduced crash_smp_send_stop() which is a weak
function and can be overridden by architecture codes to fix the side effect
caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
notifiers" option).

 ARM64 architecture uses the weak version function and the problem is that
the weak function simply calls smp_send_stop() which makes other CPUs
offline and takes away the chance to save crash information for nonpanic
CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
option is enabled.

 Calling smp_send_crash_stop() in the function is useless because all
nonpanic CPUs are already offline by smp_send_stop() in this case and
smp_send_crash_stop() only works against online CPUs.

 The result is that /proc/vmcore is not available with the error messages;
"Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".

 crash_smp_send_stop() is implemented for ARM64 architecture to fix this
problem and the function (strong symbol version) saves crash information
for nonpanic CPUs using smp_send_crash_stop() and machine_crash_shutdown()
tries to save crash information for nonpanic CPUs only when
crash_kexec_post_notifiers kernel option is disabled.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 arch/arm64/kernel/machine_kexec.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 481f54a..ec55cd8 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -213,6 +213,23 @@ void machine_kexec(struct kimage *kimage)
BUG(); /* Should never get here. */
 }
 
+void crash_smp_send_stop(void)
+{
+   static int cpus_stopped;
+
+   /*
+* This function can be called twice in panic path, but obviously
+* we execute this only once.
+*/
+   if (cpus_stopped)
+   return;
+
+   /* shutdown non-crashing cpus */
+   smp_send_crash_stop();
+
+   cpus_stopped = 1;
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
unsigned int i;
@@ -252,7 +269,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
local_irq_disable();
 
/* shutdown non-crashing cpus */
-   smp_send_crash_stop();
+   crash_smp_send_stop();
 
/* for crashing cpu */
crash_save_cpu(regs, smp_processor_id());
-- 
2.7.4



[PATCHv2] arm:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores

2017-08-04 Thread Hoeun Ryu
 Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
version in panic path) introduced crash_smp_send_stop() which is a weak
function and can be overridden by architecture codes to fix the side effect
caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
notifiers" option).

 ARM architecture uses the weak version function and the problem is that
the weak function simply calls smp_send_stop() which makes other CPUs
offline and takes away the chance to save crash information for nonpanic
CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
option is enabled.

 Calling smp_call_function(machine_crash_nonpanic_core, NULL, false) in
the function is useless because all nonpanic CPUs are already offline by
smp_send_stop() in this case and smp_call_function() only works against
online CPUs.

 The result is that /proc/vmcore is not available with the error messages;
"Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".

 crash_smp_send_stop() is implemented for ARM architecture to fix this
problem and the function (strong symbol version) saves crash information
for nonpanic CPUs using smp_call_function() and machine_crash_shutdown()
tries to save crash information for nonpanic CPUs only when
crash_kexec_post_notifiers kernel option is disabled.

 We might be able to implement the function like arm64 or x86 using a
dedicated IPI (let's say IPI_CPU_CRASH_STOP), but we cannot implement this
function like that because of the lack of IPI slots. Please see the commit
e7273ff4 : (ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI)

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 v2:
  - calling crash_smp_send_stop() in machine_crash_shutdown() for the case
when crash_kexec_post_notifiers kernel option is disabled.
  - fix commit messages for it.

 arch/arm/kernel/machine_kexec.c | 37 +++--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index fe1419e..b58a49a 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -94,6 +94,31 @@ void machine_crash_nonpanic_core(void *unused)
cpu_relax();
 }
 
+void crash_smp_send_stop(void)
+{
+   static int cpus_stopped;
+   unsigned long msecs;
+
+   /*
+* This function can be called twice in panic path, but obviously
+* we execute this only once.
+*/
+   if (cpus_stopped)
+   return;
+
+   atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+   smp_call_function(machine_crash_nonpanic_core, NULL, false);
+   msecs = 1000; /* Wait at most a second for the other cpus to stop */
+   while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+   mdelay(1);
+   msecs--;
+   }
+   if (atomic_read(&waiting_for_crash_ipi) > 0)
+   pr_warn("Non-crashing CPUs did not react to IPI\n");
+
+   cpus_stopped = 1;
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
unsigned int i;
@@ -119,19 +144,11 @@ static void machine_kexec_mask_interrupts(void)
 
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-   unsigned long msecs;
+   WARN_ON(num_online_cpus() > 1);
 
local_irq_disable();
 
-   atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-   smp_call_function(machine_crash_nonpanic_core, NULL, false);
-   msecs = 1000; /* Wait at most a second for the other cpus to stop */
-   while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
-   mdelay(1);
-   msecs--;
-   }
-   if (atomic_read(&waiting_for_crash_ipi) > 0)
-   pr_warn("Non-crashing CPUs did not react to IPI\n");
+   crash_smp_send_stop();
 
crash_save_cpu(regs, smp_processor_id());
machine_kexec_mask_interrupts();
-- 
2.7.4



[PATCHv2] arm:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores

2017-08-04 Thread Hoeun Ryu
 Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
version in panic path) introduced crash_smp_send_stop() which is a weak
function and can be overriden by architecture codes to fix the side effect
caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
notifiers" option).

 ARM architecture uses the weak version function and the problem is that
the weak function simply calls smp_send_stop() which makes other CPUs
offline and takes away the chance to save crash information for nonpanic
CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
option is enabled.

 Calling smp_call_function(machine_crash_nonpanic_core, NULL, false) in
the function is useless because all nonpanic CPUs are already offline by
smp_send_stop() in this case and smp_call_function() only works against
online CPUs.

 The result is that /proc/vmcore is not available with the error messages;
"Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".

 crash_smp_send_stop() is implemented for ARM architecture to fix this
problem and the function (strong symbol version) saves crash information
for nonpanic CPUs using smp_call_function() and machine_crash_shutdown()
tries to save crash information for nonpanic CPUs only when
crash_kexec_post_notifiers kernel option is disabled.

 We might be able to implement the function like arm64 or x86 using a
dedicated IPI (let's say IPI_CPU_CRASH_STOP), but we cannot implement this
function like that because of the lack of IPI slots. Please see the commit
e7273ff4 : (ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI)

Signed-off-by: Hoeun Ryu 
---
 v2:
  - calling crash_smp_send_stop() in machine_crash_shutdown() for the case
when crash_kexec_post_notifiers kernel option is disabled.
  - fix commit messages for it.

 arch/arm/kernel/machine_kexec.c | 37 +++--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index fe1419e..b58a49a 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -94,6 +94,31 @@ void machine_crash_nonpanic_core(void *unused)
cpu_relax();
 }
 
+void crash_smp_send_stop(void)
+{
+   static int cpus_stopped;
+   unsigned long msecs;
+
+   /*
+* This function can be called twice in panic path, but obviously
+* we execute this only once.
+*/
+   if (cpus_stopped)
+   return;
+
+   atomic_set(_for_crash_ipi, num_online_cpus() - 1);
+   smp_call_function(machine_crash_nonpanic_core, NULL, false);
+   msecs = 1000; /* Wait at most a second for the other cpus to stop */
+   while ((atomic_read(_for_crash_ipi) > 0) && msecs) {
+   mdelay(1);
+   msecs--;
+   }
+   if (atomic_read(_for_crash_ipi) > 0)
+   pr_warn("Non-crashing CPUs did not react to IPI\n");
+
+   cpus_stopped = 1;
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
unsigned int i;
@@ -119,19 +144,11 @@ static void machine_kexec_mask_interrupts(void)
 
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-   unsigned long msecs;
+   WARN_ON(num_online_cpus() > 1);
 
local_irq_disable();
 
-   atomic_set(_for_crash_ipi, num_online_cpus() - 1);
-   smp_call_function(machine_crash_nonpanic_core, NULL, false);
-   msecs = 1000; /* Wait at most a second for the other cpus to stop */
-   while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
-   mdelay(1);
-   msecs--;
-   }
-   if (atomic_read(&waiting_for_crash_ipi) > 0)
-   pr_warn("Non-crashing CPUs did not react to IPI\n");
+   crash_smp_send_stop();
 
crash_save_cpu(regs, smp_processor_id());
machine_kexec_mask_interrupts();
-- 
2.7.4



Re: [PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup

2017-08-04 Thread Hoeun Ryu
Hello, Russell King.

The following patch has not been merged yet.
Do you plan to accept and merge it?

Thank you.

On Mon, 2017-06-12 at 10:47 +0900, Hoeun Ryu wrote:
>  Reading TTBCR in early boot stage might return the value of the previous
> kernel's configuration, especially in case of kexec. For example, if
> normal kernel (first kernel) had run on a configuration of PHYS_OFFSET <=
> PAGE_OFFSET and crash kernel (second kernel) is running on a configuration
> PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the
> reserved area for crash kernel, reading TTBCR and using the value to OR
> other bit fields might be risky because it doesn't have a reset value for
> TTBCR.
> 
> Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
> Suggested-by: Robin Murphy <robin.mur...@arm.com>
> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
> 
> ---
> 
>  * add Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
>  * v1: amended based on
>  - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
> PHYS_OFFSET > PAGE_OFFSET"
>  - https://lkml.org/lkml/2017/6/5/239
> 
>  arch/arm/mm/proc-v7-3level.S | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> index 5e5720e..7d16bbc 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
>   .macro  v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
>   ldr \tmp, =swapper_pg_dir   @ swapper_pg_dir virtual address
>   cmp \ttbr1, \tmp, lsr #12   @ PHYS_OFFSET > PAGE_OFFSET?
> - mrc p15, 0, \tmp, c2, c0, 2 @ TTB control register
> - orr \tmp, \tmp, #TTB_EAE
> + mov \tmp, #TTB_EAE  @ for TTB control register
>   ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP)
>   ALT_UP(orr  \tmp, \tmp, #TTB_FLAGS_UP)
>   ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP << 16)



[PATCH] arm:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores

2017-08-04 Thread Hoeun Ryu
 Commit 0ee5941 ("x86/panic: replace smp_send_stop() with kdump friendly
version in panic path") introduced crash_smp_send_stop(), a weak function
that can be overridden by architecture code, to fix the side effect caused
by commit f06e515 ("kernel/panic.c: add "crash_kexec_post_notifiers"
option").

 The ARM architecture uses the weak version, and the problem is that it
simply calls smp_send_stop(), which takes the other CPUs offline and so
removes any chance to save crash information for the nonpanic CPUs in
machine_crash_shutdown() when the crash_kexec_post_notifiers kernel
option is enabled.

 Calling smp_call_function(machine_crash_nonpanic_core, NULL, false) in
machine_crash_shutdown() is useless in that case because all nonpanic CPUs
have already been taken offline by smp_send_stop(), and smp_call_function()
only targets online CPUs.

 The result is that /proc/vmcore is not available, with the error messages
"Warning: Zero PT_NOTE entries found" and "Kdump: vmcore not initialized".

 A strong version of crash_smp_send_stop() is implemented for the ARM
architecture to fix this problem: it saves crash information for the
nonpanic CPUs using smp_call_function(), while machine_crash_shutdown()
saves crash information only for the panic CPU.

 We might be able to implement the function like arm64 or x86, using a
dedicated IPI (say, IPI_CPU_CRASH_STOP), but we cannot do that here
because of the lack of free IPI slots; see commit e7273ff4 ("ARM: 8488/1:
Make IPI_CPU_BACKTRACE a "non-secure" SGI").

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 arch/arm/kernel/machine_kexec.c | 37 ++---
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index fe1419e..18d084a 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -94,6 +94,31 @@ void machine_crash_nonpanic_core(void *unused)
cpu_relax();
 }
 
+void crash_smp_send_stop(void)
+{
+   static int cpus_stopped;
+   unsigned long msecs;
+
+   /*
+* This function can be called twice in panic path, but obviously
+* we execute this only once.
+*/
+   if (cpus_stopped)
+   return;
+
+   atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+   smp_call_function(machine_crash_nonpanic_core, NULL, false);
+   msecs = 1000; /* Wait at most a second for the other cpus to stop */
+   while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+   mdelay(1);
+   msecs--;
+   }
+   if (atomic_read(&waiting_for_crash_ipi) > 0)
+   pr_warn("Non-crashing CPUs did not react to IPI\n");
+
+   cpus_stopped = 1;
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
unsigned int i;
@@ -119,20 +144,10 @@ static void machine_kexec_mask_interrupts(void)
 
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-   unsigned long msecs;
+   WARN_ON(num_online_cpus() > 1);
 
local_irq_disable();
 
-   atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-   smp_call_function(machine_crash_nonpanic_core, NULL, false);
-   msecs = 1000; /* Wait at most a second for the other cpus to stop */
-   while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
-   mdelay(1);
-   msecs--;
-   }
-   if (atomic_read(&waiting_for_crash_ipi) > 0)
-   pr_warn("Non-crashing CPUs did not react to IPI\n");
-
crash_save_cpu(regs, smp_processor_id());
machine_kexec_mask_interrupts();
 
-- 
2.7.4



Re: [PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup

2017-07-03 Thread Hoeun Ryu
Hello, Russell King.

Do you plan to include this patch in your tree?

Thank you.

On Mon, 2017-06-12 at 10:47 +0900, Hoeun Ryu wrote:
>  Reading TTBCR in early boot stage might return the value of the
> previous
> kernel's configuration, especially in case of kexec. For example, if
> normal kernel (first kernel) had run on a configuration of
> PHYS_OFFSET <=
> PAGE_OFFSET and crash kernel (second kernel) is running on a
> configuration
> PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the
> reserved area for crash kernel, reading TTBCR and using the value to
> OR
> other bit fields might be risky because it doesn't have a reset value
> for
> TTBCR.
> 
> Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
> Suggested-by: Robin Murphy <robin.mur...@arm.com>
> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
> 
> ---
> 
>  * add Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
>  * v1: amended based on
>  - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
> PHYS_OFFSET > PAGE_OFFSET"
>  - https://lkml.org/lkml/2017/6/5/239
> 
>  arch/arm/mm/proc-v7-3level.S | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-
> 3level.S
> index 5e5720e..7d16bbc 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
>   .macro  v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
>   ldr \tmp, =swapper_pg_dir   @ swapper_pg_dir virtual address
>   cmp \ttbr1, \tmp, lsr #12   @ PHYS_OFFSET > PAGE_OFFSET?
> - mrc p15, 0, \tmp, c2, c0, 2 @ TTB control register
> - orr \tmp, \tmp, #TTB_EAE
> + mov \tmp, #TTB_EAE  @ for TTB control register
>   ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP)
>   ALT_UP(orr  \tmp, \tmp, #TTB_FLAGS_UP)
>   ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP << 16)


Re: [RESEND PATCH 1/2] arm:omap2+: put omap_uart_phys/virt/lsr in .text section when ZIMAGE is true

2017-06-14 Thread Hoeun Ryu
On Tue, 2017-06-13 at 22:27 -0700, Tony Lindgren wrote:
> Hi,
> 
> * Hoeun Ryu <hoeun@gmail.com> [170612 18:18]:
> > 
> > --- a/arch/arm/include/debug/omap2plus.S
> > +++ b/arch/arm/include/debug/omap2plus.S
> > @@ -58,11 +58,22 @@
> >  
> >  #define UART_OFFSET(addr)  ((addr) & 0x00ff)
> >  
> > +/*
> > + * Definition of ZIMAGE is in arch/arm/boot/compressed/Makefile.
> > + * Place the following block in .text section only when this file
> > is
> > + * included by arch/arm/boot/compressed/* to make it possible to
> > + * enable CONFIG_DEBUG_UNCOMPRESS and DEBUG in
> > arch/arm/boot/compressed/head.S
> > + * on OMAP2+ SoCs.
> > + */
> > +#ifndef ZIMAGE
> >     .pushsection .data
> > +#endif
> >  omap_uart_phys:.word   0
> >  omap_uart_virt:.word   0
> >  omap_uart_lsr: .word   0
> > +#ifndef ZIMAGE
> >     .popsection
> > +#endif
> So I converted all these to use the 8250 debug_ll yesterday
> which should solve the DEBUG_UNCOMPRESS issue for you and
> allows us to remove this file. Will post the series shortly
> for testing with you in Cc after I've done a bit more testing
> here.

It sounds good.
Patch 01 is dropped then, but are you still taking the second patch?
"[PATCH 2/2] arm:omap2+: drop dependence on DEBUG_OMAP2PLUS_UART for
DEBUG_UNCOMPRESS"

Thank you.

> 
> Regards,
> 
> Tony



[RESEND PATCH 2/2] arm:omap2+: drop dependence on DEBUG_OMAP2PLUS_UART for DEBUG_UNCOMPRESS

2017-06-12 Thread Hoeun Ryu
 With patch 9a5151e1d1ba4df41bc4b8e8c0f17163247645ac ("arm:omap2+: put
omap_uart_phys/virt/lsr in .text section when ZIMAGE is true"), the
dependence on DEBUG_OMAP2PLUS_UART for DEBUG_UNCOMPRESS can be dropped,
because everything in omap2+'s DEBUG_LL_INCLUDE file ends up in the .text
section when it is included in the decompressor.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 * mail to relevant recipients, no response yet from them.
   - add TO=Tony Lindgren <t...@atomide.com>
   - add CC=linux-o...@vger.kernel.org
 * identical to previous patch

 arch/arm/Kconfig.debug | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index ba2cb63..52eb0bf 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -1735,8 +1735,7 @@ config DEBUG_UART_8250_FLOW_CONTROL
 config DEBUG_UNCOMPRESS
bool
depends on ARCH_MULTIPLATFORM || PLAT_SAMSUNG || ARM_SINGLE_ARMV7M
-   default y if DEBUG_LL && !DEBUG_OMAP2PLUS_UART && \
-(!DEBUG_TEGRA_UART || !ZBOOT_ROM) && \
+   default y if DEBUG_LL && (!DEBUG_TEGRA_UART || !ZBOOT_ROM) && \
 !DEBUG_BRCMSTB_UART
help
  This option influences the normal decompressor output for
-- 
2.7.4



[RESEND PATCH 1/2] arm:omap2+: put omap_uart_phys/virt/lsr in .text section when ZIMAGE is true

2017-06-12 Thread Hoeun Ryu
 omap_uart_phys, omap_uart_virt and omap_uart_lsr reside in the .data
section, and that is the correct placement. But because of this, we cannot
enable CONFIG_DEBUG_UNCOMPRESS. DEBUG_LL and DEBUG_UNCOMPRESS are very
useful tools for debugging the early boot stage when something goes wrong
and no hardware-based debugging tool such as a JTAG debugger is available.
 This patch puts the variables into the .text section instead, but only
when the DEBUG_LL_INCLUDE file is included in the kernel decompressor,
which is only when ZIMAGE is defined.
 This patch does not change anything when DEBUG_LL_INCLUDE is included in
other kernel parts like arch/arm/kernel/*.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 * mailed to relevant recipients, no response yet from them
   - TO=Tony Lindgren <t...@atomide.com>
   - CC=linux-o...@vger.kernel.org
 * identical to previous patch

 arch/arm/include/debug/omap2plus.S | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm/include/debug/omap2plus.S 
b/arch/arm/include/debug/omap2plus.S
index 6d867ae..6ce6ef9 100644
--- a/arch/arm/include/debug/omap2plus.S
+++ b/arch/arm/include/debug/omap2plus.S
@@ -58,11 +58,22 @@
 
 #define UART_OFFSET(addr)  ((addr) & 0x00ffffff)
 
+/*
+ * Definition of ZIMAGE is in arch/arm/boot/compressed/Makefile.
+ * Place the following block in .text section only when this file is
+ * included by arch/arm/boot/compressed/* to make it possible to
+ * enable CONFIG_DEBUG_UNCOMPRESS and DEBUG in arch/arm/boot/compressed/head.S
+ * on OMAP2+ SoCs.
+ */
+#ifndef ZIMAGE
.pushsection .data
+#endif
 omap_uart_phys:.word   0
 omap_uart_virt:.word   0
 omap_uart_lsr: .word   0
+#ifndef ZIMAGE
.popsection
+#endif
 
.macro  addruart, rp, rv, tmp
 
-- 
2.7.4



[PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup

2017-06-11 Thread Hoeun Ryu
 Reading TTBCR in the early boot stage might return the value of the
previous kernel's configuration, especially in the case of kexec. For
example, the normal kernel (first kernel) may have run with
PHYS_OFFSET <= PAGE_OFFSET while the crash kernel (second kernel) runs
with PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on
the area reserved for the crash kernel. Reading TTBCR and OR-ing other
bit fields into that value is risky because TTBCR has no architectural
reset value.
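
 In C-like form the change amounts to the following (illustration only;
read_ttbcr(), write_ttbcr() and ttb_flags are hypothetical names, the real
code is the assembly below):

        /* before: OR new fields into whatever the previous kernel left */
        unsigned long ttbcr = read_ttbcr();  /* stale T0SZ/T1SZ after kexec */
        ttbcr |= TTB_EAE | ttb_flags;
        write_ttbcr(ttbcr);

        /* after: build the value from scratch, independent of prior state */
        write_ttbcr(TTB_EAE | ttb_flags);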

Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
Suggested-by: Robin Murphy <robin.mur...@arm.com>
Signed-off-by: Hoeun Ryu <hoeun@gmail.com>

---

 * add Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
 * v1: amended based on
 - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
PHYS_OFFSET > PAGE_OFFSET"
 - https://lkml.org/lkml/2017/6/5/239

 arch/arm/mm/proc-v7-3level.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 5e5720e..7d16bbc 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
.macro  v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
ldr \tmp, =swapper_pg_dir   @ swapper_pg_dir virtual address
cmp \ttbr1, \tmp, lsr #12   @ PHYS_OFFSET > PAGE_OFFSET?
-   mrc p15, 0, \tmp, c2, c0, 2 @ TTB control register
-   orr \tmp, \tmp, #TTB_EAE
+   mov \tmp, #TTB_EAE  @ for TTB control register
ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP)
ALT_UP(orr  \tmp, \tmp, #TTB_FLAGS_UP)
ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP << 16)
-- 
2.7.4



Re: [PATCH 1/2] arm:omap2+: put omap_uart_phys/virt/lsr in .text section when ZIMAGE is true

2017-06-09 Thread Hoeun Ryu
Hello, Russell.

Would you please review this patch ?

Thank you

> On Jun 8, 2017, at 11:16 AM, Hoeun Ryu <hoeun@gmail.com> wrote:
> 
> omap_uart_phys, omap_uart_virt and omap_uart_lsr reside in .data section
> and it's right implementation. But because of this, we cannot enable
> CONFIG_DEBUG_UNCOMPRESS. LL_DEBUG and DEBUG_UNCOMPRESS are very useful tools
> for debugging early boot stage when something goes wrong if you don't have
> any hardware based debugging tools like a JTAG debugger.
> This patch is to put the variables into .text section instead only when the
> DEBUG_LL_INCLUDE file is included in the kernel decompressor, which is only
> when ZIMAGE is defined.
> This patch does not change anything when DEBUG_LL_INCLUDE is included in
> the other kernel parts like arch/arm/kernel/*
> 
> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
> ---
> arch/arm/include/debug/omap2plus.S | 11 +++
> 1 file changed, 11 insertions(+)
> 
> diff --git a/arch/arm/include/debug/omap2plus.S 
> b/arch/arm/include/debug/omap2plus.S
> index 6d867ae..6ce6ef9 100644
> --- a/arch/arm/include/debug/omap2plus.S
> +++ b/arch/arm/include/debug/omap2plus.S
> @@ -58,11 +58,22 @@
> 
> #define UART_OFFSET(addr)  ((addr) & 0x00ffffff)
> 
> +/*
> + * Definition of ZIMAGE is in arch/arm/boot/compressed/Makefile.
> + * Place the following block in .text section only when this file is
> + * included by arch/arm/boot/compressed/* to make it possible to
> + * enable CONFIG_DEBUG_UNCOMPRESS and DEBUG in 
> arch/arm/boot/compressed/head.S
> + * on OMAP2+ SoCs.
> + */
> +#ifndef ZIMAGE
> .pushsection .data
> +#endif
> omap_uart_phys: .word   0
> omap_uart_virt: .word   0
> omap_uart_lsr:  .word   0
> +#ifndef ZIMAGE
> .popsection
> +#endif
> 
> .macro  addruart, rp, rv, tmp
> 
> -- 
> 2.7.4
> 


Re: [PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup

2017-06-09 Thread Hoeun Ryu
Hello, Russell and Robin.

Would you please review this patch ?

Thank you

> On Jun 7, 2017, at 11:39 AM, Hoeun Ryu <hoeun@gmail.com> wrote:
> 
> Reading TTBCR in early boot stage might return the value of the previous
> kernel's configuration, especially in case of kexec. For example, if
> normal kernel (first kernel) had run on a configuration of PHYS_OFFSET <=
> PAGE_OFFSET and crash kernel (second kernel) is running on a configuration
> PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the
> reserved area for crash kernel, reading TTBCR and using the value to OR
> other bit fields might be risky because it doesn't have a reset value for
> TTBCR.
> 
> Suggested-by: Robin Murphy <robin.mur...@arm.com>
> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
> ---
> 
> * v1: amended based on
>  - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
> PHYS_OFFSET > PAGE_OFFSET"
>  - https://lkml.org/lkml/2017/6/5/239
> 
> arch/arm/mm/proc-v7-3level.S | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> index 5e5720e..7d16bbc 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
> .macro  v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
> ldr \tmp, =swapper_pg_dir   @ swapper_pg_dir virtual address
> cmp \ttbr1, \tmp, lsr #12   @ PHYS_OFFSET > PAGE_OFFSET?
> -mrc p15, 0, \tmp, c2, c0, 2 @ TTB control register
> -orr \tmp, \tmp, #TTB_EAE
> +mov \tmp, #TTB_EAE  @ for TTB control register
> ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP)
> ALT_UP(orr  \tmp, \tmp, #TTB_FLAGS_UP)
> ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP << 16)
> -- 
> 2.7.4
> 



[PATCH 2/2] arm:omap2+: drop dependence on DEBUG_OMAP2PLUS_UART for DEBUG_UNCOMPRESS

2017-06-07 Thread Hoeun Ryu
 With patch 9a5151e1d1ba4df41bc4b8e8c0f17163247645ac ("arm:omap2+: put
omap_uart_phys/virt/lsr in .text section when ZIMAGE is true"), the
dependence on DEBUG_OMAP2PLUS_UART for DEBUG_UNCOMPRESS can be dropped,
because everything in omap2+'s DEBUG_LL_INCLUDE file ends up in the .text
section when it is included in the decompressor.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 arch/arm/Kconfig.debug | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index ba2cb63..52eb0bf 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -1735,8 +1735,7 @@ config DEBUG_UART_8250_FLOW_CONTROL
 config DEBUG_UNCOMPRESS
bool
depends on ARCH_MULTIPLATFORM || PLAT_SAMSUNG || ARM_SINGLE_ARMV7M
-   default y if DEBUG_LL && !DEBUG_OMAP2PLUS_UART && \
-(!DEBUG_TEGRA_UART || !ZBOOT_ROM) && \
+   default y if DEBUG_LL && (!DEBUG_TEGRA_UART || !ZBOOT_ROM) && \
 !DEBUG_BRCMSTB_UART
help
  This option influences the normal decompressor output for
-- 
2.7.4



[PATCH 1/2] arm:omap2+: put omap_uart_phys/virt/lsr in .text section when ZIMAGE is true

2017-06-07 Thread Hoeun Ryu
 omap_uart_phys, omap_uart_virt and omap_uart_lsr reside in the .data
section, and that is the correct placement. But because of this, we cannot
enable CONFIG_DEBUG_UNCOMPRESS. DEBUG_LL and DEBUG_UNCOMPRESS are very
useful tools for debugging the early boot stage when something goes wrong
and no hardware-based debugging tool such as a JTAG debugger is available.
 This patch puts the variables into the .text section instead, but only
when the DEBUG_LL_INCLUDE file is included in the kernel decompressor,
which is only when ZIMAGE is defined.
 This patch does not change anything when DEBUG_LL_INCLUDE is included in
other kernel parts like arch/arm/kernel/*.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 arch/arm/include/debug/omap2plus.S | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm/include/debug/omap2plus.S 
b/arch/arm/include/debug/omap2plus.S
index 6d867ae..6ce6ef9 100644
--- a/arch/arm/include/debug/omap2plus.S
+++ b/arch/arm/include/debug/omap2plus.S
@@ -58,11 +58,22 @@
 
 #define UART_OFFSET(addr)  ((addr) & 0x00ffffff)
 
+/*
+ * Definition of ZIMAGE is in arch/arm/boot/compressed/Makefile.
+ * Place the following block in .text section only when this file is
+ * included by arch/arm/boot/compressed/* to make it possible to
+ * enable CONFIG_DEBUG_UNCOMPRESS and DEBUG in arch/arm/boot/compressed/head.S
+ * on OMAP2+ SoCs.
+ */
+#ifndef ZIMAGE
.pushsection .data
+#endif
 omap_uart_phys:.word   0
 omap_uart_virt:.word   0
 omap_uart_lsr: .word   0
+#ifndef ZIMAGE
.popsection
+#endif
 
.macro  addruart, rp, rv, tmp
 
-- 
2.7.4



[PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup

2017-06-06 Thread Hoeun Ryu
 Reading TTBCR in the early boot stage might return the value of the
previous kernel's configuration, especially in the case of kexec. For
example, the normal kernel (first kernel) may have run with
PHYS_OFFSET <= PAGE_OFFSET while the crash kernel (second kernel) runs
with PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on
the area reserved for the crash kernel. Reading TTBCR and OR-ing other
bit fields into that value is risky because TTBCR has no architectural
reset value.

Suggested-by: Robin Murphy <robin.mur...@arm.com>
Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---

* v1: amended based on
  - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
 PHYS_OFFSET > PAGE_OFFSET"
  - https://lkml.org/lkml/2017/6/5/239

 arch/arm/mm/proc-v7-3level.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 5e5720e..7d16bbc 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
.macro  v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
ldr \tmp, =swapper_pg_dir   @ swapper_pg_dir virtual address
cmp \ttbr1, \tmp, lsr #12   @ PHYS_OFFSET > PAGE_OFFSET?
-   mrc p15, 0, \tmp, c2, c0, 2 @ TTB control register
-   orr \tmp, \tmp, #TTB_EAE
+   mov \tmp, #TTB_EAE  @ for TTB control register
ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP)
ALT_UP(orr  \tmp, \tmp, #TTB_FLAGS_UP)
ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP << 16)
-- 
2.7.4



Re: [PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when PHYS_OFFSET > PAGE_OFFSET

2017-06-05 Thread Hoeun Ryu

>> On Jun 5, 2017, at 7:30 PM, Robin Murphy <robin.mur...@arm.com> wrote:
>> 
>> On 05/06/17 11:06, Hoeun Ryu wrote:
>> Clearing TTBCR.T1SZ explicitly when kernel runs on a configuration of
>> PHYS_OFFSET > PAGE_OFFSET.
>> Reading TTBCR in early boot stage might returns the value of the
>> previous kernel's configuration, especially in case of kexec. For
>> example, if normal kernel (first kernel) had run on a configuration
>> of PHYS_OFFSET <= PAGE_OFFSET and crash kernel (second kernel) is
>> running on a configuration PHYS_OFFSET > PAGE_OFFSET, which can happen
>> because it depends on the reserved area for crash kernel, reading
>> TTBCR and using the value without clearing TTBCR.T1SZ might risky
>> because the value doesn't have a reset value for TTBCR.T1SZ.
> 
> Hmm, but isn't that also strictly true of all the other fields we're
> ORing TTB_FLAGS_* into? Given that this is initial setup, I wonder
> whether we need to care about the previous TTBCR value at all - is there
> any reason for not simply building up the value we want from scratch?

Your idea seems better.
I'll send a new patch to build TTBCR from scratch without reading it for the 
initial value.

Russell, do you have any opinion?

Thank you for the review.

> 
> Robin.
> 
>> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
>> ---
>> arch/arm/mm/proc-v7-3level.S | 1 +
>> 1 file changed, 1 insertion(+)
>> 
>> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
>> index 5e5720e..81404b8 100644
>> --- a/arch/arm/mm/proc-v7-3level.S
>> +++ b/arch/arm/mm/proc-v7-3level.S
>> @@ -140,6 +140,7 @@ ENDPROC(cpu_v7_set_pte_ext)
>>* otherwise booting secondary CPUs would end up using TTBR1 for the
>>* identity mapping set up in TTBR0.
>>*/
>> +bichi   \tmp, \tmp, #(7 << 16)  @ clear TTBCR.T1SZ
>>  orrls   \tmp, \tmp, #TTBR1_SIZE @ TTBCR.T1SZ
>>  mcr p15, 0, \tmp, c2, c0, 2 @ TTBCR
>>  mov \tmp, \ttbr1, lsr #20
> 


[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when PHYS_OFFSET > PAGE_OFFSET

2017-06-05 Thread Hoeun Ryu
 Clear TTBCR.T1SZ explicitly when the kernel runs on a configuration of
PHYS_OFFSET > PAGE_OFFSET.
 Reading TTBCR in the early boot stage might return the value of the
previous kernel's configuration, especially in the case of kexec. For
example, the normal kernel (first kernel) may have run with
PHYS_OFFSET <= PAGE_OFFSET while the crash kernel (second kernel) runs
with PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on
the area reserved for the crash kernel. Reading TTBCR and using the
value without clearing TTBCR.T1SZ is risky because TTBCR.T1SZ has no
architectural reset value.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 arch/arm/mm/proc-v7-3level.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 5e5720e..81404b8 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -140,6 +140,7 @@ ENDPROC(cpu_v7_set_pte_ext)
 * otherwise booting secondary CPUs would end up using TTBR1 for the
 * identity mapping set up in TTBR0.
 */
+   bichi   \tmp, \tmp, #(7 << 16)  @ clear TTBCR.T1SZ
orrls   \tmp, \tmp, #TTBR1_SIZE @ TTBCR.T1SZ
mcr p15, 0, \tmp, c2, c0, 2 @ TTBCR
mov \tmp, \ttbr1, lsr #20
-- 
2.7.4



Re: [PATCH] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when PHYS_OFFSET > PAGE_OFFSET

2017-06-05 Thread Hoeun Ryu
On Mon, Jun 5, 2017 at 6:34 PM, Russell King - ARM Linux
<li...@armlinux.org.uk> wrote:
> On Mon, Jun 05, 2017 at 06:22:20PM +0900, Hoeun Ryu wrote:
>> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
>> index 5e5720e..9ac2bec 100644
>> --- a/arch/arm/mm/proc-v7-3level.S
>> +++ b/arch/arm/mm/proc-v7-3level.S
>> @@ -140,6 +140,7 @@ ENDPROC(cpu_v7_set_pte_ext)
>>* otherwise booting secondary CPUs would end up using TTBR1 for the
>>* identity mapping set up in TTBR0.
>>*/
>> + bichi   \tmp, \tmp, #(1 << 16)  @ clear TTBCR.T1SZ
>
> This looks insufficient.  There's two bits here:
>
>  * TTBR0/TTBR1 split (PAGE_OFFSET):
>  *   0x40000000: T0SZ = 2, T1SZ = 0 (not used)
>  *   0x80000000: T0SZ = 0, T1SZ = 1
>  *   0xc0000000: T0SZ = 0, T1SZ = 2
>
> but you seem to only be clearing one bit.

Oh, I'm sorry for the mistake; I'll fix this to #(7 << 16) in v2.
(There are 3 bits in TxSZ.)
Thank you for the review.
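
For the record, T1SZ occupies TTBCR[18:16] when LPAE (EAE=1) is in use, so
the corrected clear in v2 corresponds to a 3-bit mask (illustration only,
hypothetical macro names):

        #define TTBCR_T1SZ_SHIFT   16
        #define TTBCR_T1SZ_MASK    (7 << TTBCR_T1SZ_SHIFT)  /* 3 bits, not (1 << 16) */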

>
> --
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.


[PATCH] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when PHYS_OFFSET > PAGE_OFFSET

2017-06-05 Thread Hoeun Ryu
 Clear TTBCR.T1SZ explicitly when the kernel runs on a configuration of
PHYS_OFFSET > PAGE_OFFSET.
 Reading TTBCR in the early boot stage might return the value of the
previous kernel's configuration, especially in the case of kexec. For
example, the normal kernel (first kernel) may have run with
PHYS_OFFSET <= PAGE_OFFSET while the crash kernel (second kernel) runs
with PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on
the area reserved for the crash kernel. Reading TTBCR and using the
value without clearing TTBCR.T1SZ is risky because TTBCR.T1SZ has no
architectural reset value.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 arch/arm/mm/proc-v7-3level.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 5e5720e..9ac2bec 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -140,6 +140,7 @@ ENDPROC(cpu_v7_set_pte_ext)
 * otherwise booting secondary CPUs would end up using TTBR1 for the
 * identity mapping set up in TTBR0.
 */
+   bichi   \tmp, \tmp, #(1 << 16)  @ clear TTBCR.T1SZ
orrls   \tmp, \tmp, #TTBR1_SIZE @ TTBCR.T1SZ
mcr p15, 0, \tmp, c2, c0, 2 @ TTBCR
mov \tmp, \ttbr1, lsr #20
-- 
2.7.4



Re: [PATCH v2] mm: add VM_STATIC flag to vmalloc and prevent from removing the areas

2017-04-18 Thread Hoeun Ryu

> On Apr 18, 2017, at 3:59 PM, Michal Hocko <mho...@kernel.org> wrote:
> 
>> On Tue 18-04-17 14:48:39, Hoeun Ryu wrote:
>> vm_area_add_early/vm_area_register_early() are used to reserve vmalloc area
>> during boot process and those virtually mapped areas are never unmapped.
>> So `OR` VM_STATIC flag to the areas in vmalloc_init() when importing
>> existing vmlist entries and prevent those areas from being removed from the
>> rbtree by accident.
> 
> Has this been a problem in the past or currently so that it is worth
> handling?
> 
>> This flags can be also used by other vmalloc APIs to
>> specify that the area will never go away.
> 
> Do we have a user for that?
> 
>> This makes remove_vm_area() more robust against other kind of errors (eg.
>> programming errors).
> 
> Well, yes it will help to prevent from vfree(early_mem) but we have 4
> users of vm_area_register_early so I am really wondering whether this is
> worth additional code. It would really help to understand your
> motivation for the patch if we were explicit about the problem you are
> trying to solve.

I just think that it would be good to make it robust against various kinds
of errors.
You might think that's not enough of a reason to do so, though.

> 
> Thanks
> 
> -- 
> Michal Hocko
> SUSE Labs


[PATCH v2] mm: add VM_STATIC flag to vmalloc and prevent from removing the areas

2017-04-17 Thread Hoeun Ryu
 vm_area_add_early()/vm_area_register_early() are used to reserve vmalloc areas
during the boot process, and those virtually mapped areas are never unmapped.
So OR the VM_STATIC flag into the areas in vmalloc_init() when importing
existing vmlist entries, preventing those areas from being removed from the
rbtree by accident. This flag can also be used by other vmalloc APIs to
specify that an area will never go away.
 This makes remove_vm_area() more robust against other kinds of errors (e.g.
programming errors).

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
v2:
 - update changelog
 - add description to VM_STATIC
 - check VM_STATIC first before VM_VM_AREA in remove_vm_area()

 include/linux/vmalloc.h | 7 +++++++
 mm/vmalloc.c            | 9 ++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 46991ad..a42c210 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -19,6 +19,13 @@ struct notifier_block;   /* in notifier.h */
 #define VM_UNINITIALIZED   0x0020  /* vm_struct is not fully initialized */
 #define VM_NO_GUARD        0x0040  /* don't add guard page */
 #define VM_KASAN           0x0080  /* has allocated kasan shadow memory */
+/*
+ * static area, vmap_area will never go away.
+ * This flag is OR-ed automatically in vmalloc_init() if the area is inserted
+ * by vm_area_add_early()/vm_area_register_early() during early boot process
+ * or you can give this flag manually using other vmalloc APIs.
+ */
+#define VM_STATIC  0x0200
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 8ef8ea1..6bc6c39 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1262,7 +1262,7 @@ void __init vmalloc_init(void)
/* Import existing vmlist entries. */
for (tmp = vmlist; tmp; tmp = tmp->next) {
va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
-   va->flags = VM_VM_AREA;
+   va->flags = VM_VM_AREA | VM_STATIC;
va->va_start = (unsigned long)tmp->addr;
va->va_end = va->va_start + tmp->size;
va->vm = tmp;
@@ -1480,7 +1480,7 @@ struct vm_struct *remove_vm_area(const void *addr)
might_sleep();
 
va = find_vmap_area((unsigned long)addr);
-   if (va && va->flags & VM_VM_AREA) {
+   if (va && likely(!(va->flags & VM_STATIC)) && va->flags & VM_VM_AREA) {
struct vm_struct *vm = va->vm;
 
	spin_lock(&vmap_area_lock);
@@ -1510,7 +1510,7 @@ static void __vunmap(const void *addr, int deallocate_pages)
 
area = remove_vm_area(addr);
if (unlikely(!area)) {
-   WARN(1, KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n",
+   WARN(1, KERN_ERR "Trying to vfree() nonexistent or static vm area (%p)\n",
addr);
return;
}
@@ -2708,6 +2708,9 @@ static int s_show(struct seq_file *m, void *p)
if (v->phys_addr)
	seq_printf(m, " phys=%pa", &v->phys_addr);
 
+   if (v->flags & VM_STATIC)
+   seq_puts(m, " static");
+
if (v->flags & VM_IOREMAP)
seq_puts(m, " ioremap");
 
-- 
2.7.4
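
Conceptually, the change adds a sticky bit to the area's flags that
remove_vm_area() refuses to act on. A minimal standalone C sketch of that
pattern follows; the type, flag values and helper names are made up for
illustration and are not the kernel's API:

#include <stdbool.h>
#include <stdio.h>

/* Made-up flags mirroring the idea of VM_VM_AREA / VM_STATIC above. */
#define AREA_VALID   0x1u   /* area is live in the tree              */
#define AREA_STATIC  0x2u   /* registered early, must never go away  */

struct area {
        const char  *name;
        unsigned int flags;
};

/* Refuse to tear down an area marked static at boot, like the extra
 * !(va->flags & VM_STATIC) test added to remove_vm_area() in the diff. */
static bool remove_area(struct area *a)
{
        if (!(a->flags & AREA_VALID) || (a->flags & AREA_STATIC)) {
                fprintf(stderr, "refusing to remove %s\n", a->name);
                return false;
        }
        a->flags &= ~AREA_VALID;
        return true;
}

int main(void)
{
        struct area early  = { "early_vmlist",  AREA_VALID | AREA_STATIC };
        struct area normal = { "vmalloc_area",  AREA_VALID };

        remove_area(&early);    /* rejected: static area  */
        remove_area(&normal);   /* succeeds: normal area  */
        return 0;
}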



Re: [PATCH] mm: add VM_STATIC flag to vmalloc and prevent from removing the areas

2017-04-13 Thread Hoeun Ryu

> On Apr 13, 2017, at 1:17 PM, Anshuman Khandual <khand...@linux.vnet.ibm.com> 
> wrote:
> 
>> On 04/12/2017 10:31 AM, Hoeun Ryu wrote:
>> vm_area_add_early/vm_area_register_early() are used to reserve vmalloc area
>> during boot process and those virtually mapped areas are never unmapped.
>> So `OR` VM_STATIC flag to the areas in vmalloc_init() when importing
>> existing vmlist entries and prevent those areas from being removed from the
>> rbtree by accident.
> 
> I am wondering whether protection against accidental deletion
> of any vmap area should be done in remove_vm_area() function
> or the callers should take care of it. But I guess either way
> it works.
> 
>> 
>> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
>> ---
>> include/linux/vmalloc.h | 1 +
>> mm/vmalloc.c| 9 ++---
>> 2 files changed, 7 insertions(+), 3 deletions(-)
>> 
>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>> index 46991ad..3df53fc 100644
>> --- a/include/linux/vmalloc.h
>> +++ b/include/linux/vmalloc.h
>> @@ -19,6 +19,7 @@ struct notifier_block;/* in notifier.h */
>> #define VM_UNINITIALIZED0x0020/* vm_struct is not fully 
>> initialized */
>> #define VM_NO_GUARD0x0040  /* don't add guard page */
>> #define VM_KASAN0x0080  /* has allocated kasan shadow memory 
>> */
>> +#define VM_STATIC0x0200
> 
> You might want to add some description in the comment saying
> its a sticky VM area which will never go away or something.
> 

OK. I will add some description.

>> /* bits [20..32] reserved for arch specific ioremap internals */
>> 
>> /*
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 8ef8ea1..fb5049a 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -1262,7 +1262,7 @@ void __init vmalloc_init(void)
>>/* Import existing vmlist entries. */
>>for (tmp = vmlist; tmp; tmp = tmp->next) {
>>va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
>> -va->flags = VM_VM_AREA;
>> +va->flags = VM_VM_AREA | VM_STATIC;
>>va->va_start = (unsigned long)tmp->addr;
>>va->va_end = va->va_start + tmp->size;
>>va->vm = tmp;
>> @@ -1480,7 +1480,7 @@ struct vm_struct *remove_vm_area(const void *addr)
>>might_sleep();
>> 
>>va = find_vmap_area((unsigned long)addr);
>> -if (va && va->flags & VM_VM_AREA) {
>> +if (va && va->flags & VM_VM_AREA && likely(!(va->flags & VM_STATIC))) {
> 
> 
> You might want to move the VM_STATIC check before the VM_VM_AREA
> check so in cases where the former is set we can save one more
> conditional check.
> 

OK, I'll fix this in the next version.

Thank you for the review.
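
The last review point is plain C short-circuit evaluation: with &&, the left
operand is evaluated first, so testing VM_STATIC first skips the VM_VM_AREA
test whenever the area is static. A small self-contained sketch (made-up flag
values and a hypothetical test() helper that counts evaluations) shows the
difference:

#include <stdio.h>

#define VM_VM_AREA  0x1u
#define VM_STATIC   0x2u            /* made-up values for illustration */

static int ntests;

static int test(unsigned int flags, unsigned int bit)
{
        ntests++;                    /* count every flag test performed */
        return (flags & bit) != 0;
}

int main(void)
{
        unsigned int flags = VM_VM_AREA | VM_STATIC;   /* a static area */

        /* v1 ordering: VM_VM_AREA first, then VM_STATIC */
        ntests = 0;
        if (test(flags, VM_VM_AREA) && !test(flags, VM_STATIC))
                ;
        printf("VM_VM_AREA first: %d tests\n", ntests);   /* prints 2 */

        /* reviewer's ordering: VM_STATIC first short-circuits earlier */
        ntests = 0;
        if (!test(flags, VM_STATIC) && test(flags, VM_VM_AREA))
                ;
        printf("VM_STATIC first:  %d tests\n", ntests);   /* prints 1 */

        return 0;
}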

Re: [PATCH] mm: add VM_STATIC flag to vmalloc and prevent from removing the areas

2017-04-12 Thread Hoeun Ryu

> On Apr 12, 2017, at 3:02 PM, Christoph Hellwig <h...@infradead.org> wrote:
> 
>> On Wed, Apr 12, 2017 at 02:01:59PM +0900, Hoeun Ryu wrote:
>> vm_area_add_early/vm_area_register_early() are used to reserve vmalloc area
>> during boot process and those virtually mapped areas are never unmapped.
>> So `OR` VM_STATIC flag to the areas in vmalloc_init() when importing
>> existing vmlist entries and prevent those areas from being removed from the
>> rbtree by accident.
> 
> How would they be removed "by accident"?

I don't have an actual use-case in mind; I just want to make it robust against
things like programming errors.


