RE: [PATCH v2] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks
+CC sergey.senozhatsky.w...@gmail.com pmla...@suse.com

Please review this patch.

> -----Original Message-----
> From: Hoeun Ryu [mailto:hoeun@lge.com.com]
> Sent: Tuesday, June 05, 2018 11:19 AM
> To: Andrew Morton ; Kees Cook ; Andi Kleen ; Borislav Petkov ;
> Thomas Gleixner ; Steven Rostedt (VMware)
> Cc: Josh Poimboeuf ; Tejun Heo ; Vitaly Kuznetsov ; Hoeun Ryu ;
> linux-kernel@vger.kernel.org
> Subject: [PATCH v2] panic: move bust_spinlocks(0) after
> console_flush_on_panic() to avoid deadlocks
>
> From: Hoeun Ryu
>
> Many console device drivers hold the uart_port->lock spinlock with irq
> disabled (using spin_lock_irqsave()) while the device drivers are writing
> characters to their devices, but the device drivers just try to hold the
> spin lock (using spin_trylock_irqsave()) instead if "oops_in_progress" is
> equal to or greater than 1, to avoid deadlocks.
>
> There is a case where a deadlock occurs related to the lock and
> oops_in_progress. If the kernel lockup detector calls panic() while the
> device driver is holding the lock, it can cause a deadlock because panic()
> eventually calls console_unlock() and tries to hold the lock. Here is an
> example.
>
> CPU0
>
> local_irq_save()
>   .
>   foo()
>   bar()
>   .                                        // foo() + bar() takes long time
>   printk()
>     console_unlock()
>       call_console_drivers()               // close to watchdog threshold
>         some_slow_console_device_write()   // device driver code
>           spin_lock_irqsave(uart->lock)    // acquire uart spin lock
>             slow-write()
>   watchdog_overflow_callback()             // watchdog expired and call panic()
>     panic()
>       bust_spinlocks(0)                    // now, oops_in_progress = 0
>       console_flush_on_panic()
>         console_unlock()
>           call_console_drivers()
>             some_slow_console_device_write()
>               spin_lock_irqsave(uart->lock)
>                 deadlock                   // we could use spin_trylock_irqsave()
>
> console_flush_on_panic() is called in panic() and it eventually takes the
> uart lock, but the lock is already held by the preempted context (the same
> CPU in NMI context), so it is a deadlock.
>
> By moving bust_spinlocks(0) after console_flush_on_panic(), let the console
> device drivers think the Oops is still in progress, so that they call
> spin_trylock_irqsave() instead of spin_lock_irqsave() and avoid the
> deadlock.
>
> CPU0
>
> watchdog_overflow_callback()               // watchdog expired and call panic()
>   panic()
>     console_flush_on_panic()
>       console_unlock()
>         call_console_drivers()
>           some_slow_console_device_write()
>             spin_trylock_irqsave(uart->lock)  // oops_in_progress = 1
>                                               // use trylock, no deadlock
>     bust_spinlocks(0)                         // now, oops_in_progress = 0
>
> Signed-off-by: Hoeun Ryu
> ---
> v2: fix commit message on the reason of a deadlock, no code change.
>
>  kernel/panic.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 42e4874..b4063b6 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
>  	if (_crash_kexec_post_notifiers)
>  		__crash_kexec(NULL);
>
> -	bust_spinlocks(0);
> -
>  	/*
>  	 * We may have ended up stopping the CPU holding the lock (in
>  	 * smp_send_stop()) while still having some valuable data in the console
> @@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
>  	debug_locks_off();
>  	console_flush_on_panic();
>
> +	bust_spinlocks(0);
> +
>  	if (!panic_blink)
>  		panic_blink = no_blink;
>
> --
> 2.1.4
RE: [PATCH] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks
I misunderstood the cause of the deadlock. I sent v2 fixing the commit message
about the reason for the deadlock. Please ignore this and review v2. Thank you.

> -----Original Message-----
> From: Steven Rostedt [mailto:rost...@goodmis.org]
> Sent: Tuesday, June 05, 2018 10:44 AM
> To: Hoeun Ryu
> Cc: Andrew Morton ; Kees Cook ; Borislav Petkov ; Andi Kleen ;
> Josh Poimboeuf ; Hoeun Ryu ; linux-kernel@vger.kernel.org; Tejun Heo ;
> Vitaly Kuznetsov
> Subject: Re: [PATCH] panic: move bust_spinlocks(0) after
> console_flush_on_panic() to avoid deadlocks
>
> On Mon, 4 Jun 2018 14:45:57 +0900
> Hoeun Ryu wrote:
>
> > From: Hoeun Ryu
> >
> > Many console device drivers hold the uart_port->lock spinlock with irq
> > enabled (using spin_lock()) while the device drivers are writing
> > characters to their devices, but the device drivers just try to hold the
> > spin lock (using spin_trylock()) if "oops_in_progress" is equal to or
> > greater than 1, to avoid deadlocks.
> >
> > There is a case where a deadlock occurs related to the lock and
> > oops_in_progress. A CPU could be stopped by smp_send_stop() while it was
> > holding the port lock because irq was enabled. Once a CPU stops, it
> > doesn't respond to interrupts anymore and the lock stays locked forever.
> >
> > console_flush_on_panic() is called during panic() and it eventually takes
> > the uart lock, but the lock is held by another, stopped CPU, so it is a
> > deadlock. By moving bust_spinlocks(0) after console_flush_on_panic(), let
> > the console device drivers think the Oops is still in progress, so that
> > they call spin_trylock() instead of spin_lock() and avoid the deadlock.
> >
> > Signed-off-by: Hoeun Ryu
> > ---
> >  kernel/panic.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 42e4874..b4063b6 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
> >  	if (_crash_kexec_post_notifiers)
> >  		__crash_kexec(NULL);
> >
> > -	bust_spinlocks(0);
> > -
> >  	/*
> >  	 * We may have ended up stopping the CPU holding the lock (in
> >  	 * smp_send_stop()) while still having some valuable data in the console
> > @@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
> >  	debug_locks_off();
> >  	console_flush_on_panic();
> >
> > +	bust_spinlocks(0);
>
> Added a few more to Cc. This looks like it could have subtle
> side-effects. I'd like those that have been touching the code around
> here to have a look.
>
> -- Steve
>
> > +
> >  	if (!panic_blink)
> >  		panic_blink = no_blink;
[PATCH v2] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks
From: Hoeun Ryu

Many console device drivers hold the uart_port->lock spinlock with irq
disabled (using spin_lock_irqsave()) while the device drivers are writing
characters to their devices, but the device drivers just try to hold the spin
lock (using spin_trylock_irqsave()) instead if "oops_in_progress" is equal to
or greater than 1, to avoid deadlocks.

There is a case where a deadlock occurs related to the lock and
oops_in_progress. If the kernel lockup detector calls panic() while the device
driver is holding the lock, it can cause a deadlock because panic() eventually
calls console_unlock() and tries to hold the lock. Here is an example.

CPU0

local_irq_save()
  .
  foo()
  bar()
  .                                        // foo() + bar() takes long time
  printk()
    console_unlock()
      call_console_drivers()               // close to watchdog threshold
        some_slow_console_device_write()   // device driver code
          spin_lock_irqsave(uart->lock)    // acquire uart spin lock
            slow-write()
  watchdog_overflow_callback()             // watchdog expired and call panic()
    panic()
      bust_spinlocks(0)                    // now, oops_in_progress = 0
      console_flush_on_panic()
        console_unlock()
          call_console_drivers()
            some_slow_console_device_write()
              spin_lock_irqsave(uart->lock)
                deadlock                   // we could use spin_trylock_irqsave()

console_flush_on_panic() is called in panic() and it eventually takes the uart
lock, but the lock is already held by the preempted context (the same CPU in
NMI context), so it is a deadlock.

By moving bust_spinlocks(0) after console_flush_on_panic(), let the console
device drivers think the Oops is still in progress, so that they call
spin_trylock_irqsave() instead of spin_lock_irqsave() and avoid the deadlock.

CPU0

watchdog_overflow_callback()               // watchdog expired and call panic()
  panic()
    console_flush_on_panic()
      console_unlock()
        call_console_drivers()
          some_slow_console_device_write()
            spin_trylock_irqsave(uart->lock)  // oops_in_progress = 1
                                              // use trylock, no deadlock
    bust_spinlocks(0)                         // now, oops_in_progress = 0

Signed-off-by: Hoeun Ryu
---
v2: fix commit message on the reason of a deadlock, no code change.

 kernel/panic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 42e4874..b4063b6 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
 	if (_crash_kexec_post_notifiers)
 		__crash_kexec(NULL);

-	bust_spinlocks(0);
-
 	/*
 	 * We may have ended up stopping the CPU holding the lock (in
 	 * smp_send_stop()) while still having some valuable data in the console
@@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
 	debug_locks_off();
 	console_flush_on_panic();

+	bust_spinlocks(0);
+
 	if (!panic_blink)
 		panic_blink = no_blink;

--
2.1.4
[PATCH] panic: move bust_spinlocks(0) after console_flush_on_panic() to avoid deadlocks
From: Hoeun Ryu

Many console device drivers hold the uart_port->lock spinlock with irq enabled
(using spin_lock()) while the device drivers are writing characters to their
devices, but the device drivers just try to hold the spin lock (using
spin_trylock()) if "oops_in_progress" is equal to or greater than 1, to avoid
deadlocks.

There is a case where a deadlock occurs related to the lock and
oops_in_progress. A CPU could be stopped by smp_send_stop() while it was
holding the port lock because irq was enabled. Once a CPU stops, it doesn't
respond to interrupts anymore and the lock stays locked forever.

console_flush_on_panic() is called during panic() and it eventually takes the
uart lock, but the lock is held by another, stopped CPU, so it is a deadlock.
By moving bust_spinlocks(0) after console_flush_on_panic(), let the console
device drivers think the Oops is still in progress, so that they call
spin_trylock() instead of spin_lock() and avoid the deadlock.

Signed-off-by: Hoeun Ryu
---
 kernel/panic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 42e4874..b4063b6 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
 	if (_crash_kexec_post_notifiers)
 		__crash_kexec(NULL);

-	bust_spinlocks(0);
-
 	/*
 	 * We may have ended up stopping the CPU holding the lock (in
 	 * smp_send_stop()) while still having some valuable data in the console
@@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
 	debug_locks_off();
 	console_flush_on_panic();

+	bust_spinlocks(0);
+
 	if (!panic_blink)
 		panic_blink = no_blink;

--
2.1.4
[PATCH v2] printk: make printk_safe_flush safe in NMI context by skipping flushing
From: Hoeun Ryu

Make printk_safe_flush() safe in NMI context.

nmi_trigger_cpumask_backtrace() can be called in NMI context. For example, the
function is called in watchdog_overflow_callback() if the flag of hardlockup
backtrace (sysctl_hardlockup_all_cpu_backtrace) is true, and
watchdog_overflow_callback() is called in NMI context on some architectures.

Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually
tries to lock logbuf_lock in vprintk_emit(), which might already be part of
another non-NMI context on the same CPU or a soft- or hard-lockup on another
CPU. An example of the deadlock can be

CPU0

local_irq_save();
for (;;)
  req = blk_peek_request(q);
  if (unlikely(!scsi_device_online(sdev)))
    printk()
      vprintk_emit()
        console_unlock()
          logbuf_lock_irqsave()
            slow-serial-console-write()	   // close to watchdog threshold
watchdog_overflow_callback()
  trigger_allbutself_cpu_backtrace()
    printk_safe_flush()
      vprintk_emit()
        logbuf_lock_irqsave()
          deadlock

and some other cases.

This patch prevents a deadlock in printk_safe_flush() in NMI context. It makes
sure that we continue and eventually call printk_safe_flush_on_panic() from
panic(), which has better chances to succeed.

There is a risk that logbuf_lock was not part of a soft- or dead-lockup and we
might just lose the messages. But then there is a high chance that irq_work
will get called and the messages will get flushed the normal way.

Signed-off-by: Hoeun Ryu
Suggested-by: Petr Mladek
Suggested-by: Sergey Senozhatsky
---
v2: fix comments in commit message and code. no change in code itself.

 kernel/printk/printk_safe.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 3e3c200..3b5c660 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -254,6 +254,17 @@ void printk_safe_flush(void)
 {
 	int cpu;

+	/*
+	 * Just avoid a deadlock here. It makes sure that we continue and
+	 * eventually call printk_safe_flush_on_panic() from panic() that has
+	 * better chances to succeed. There is a risk that logbuf_lock was not
+	 * part of a soft- or dead-lockup and we might just lose the messages.
+	 * But then there is a high chance that irq_work will get called and
+	 * the messages will get flushed the normal way.
+	 */
+	if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK)
+		return;
+
 	for_each_possible_cpu(cpu) {
 #ifdef CONFIG_PRINTK_NMI
 		__printk_safe_flush(&per_cpu(nmi_print_seq, cpu).work);
--
2.1.4
RE: [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing
> -----Original Message-----
> From: Petr Mladek [mailto:pmla...@suse.com]
> Sent: Wednesday, May 30, 2018 5:32 PM
> To: Sergey Senozhatsky
> Cc: Hoeun Ryu ; Sergey Senozhatsky ; Steven Rostedt ; Hoeun Ryu ;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by
> skipping flushing
>
> On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote:
> > On (05/29/18 11:51), Hoeun Ryu wrote:
> > > Make printk_safe_flush() safe in NMI context.
> > > nmi_trigger_cpumask_backtrace() can be called in NMI context. For example the
> > > function is called in watchdog_overflow_callback() if the flag of hardlockup
> > > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and
> > > watchdog_overflow_callback() function is called in NMI context on some
> > > architectures.
> > > Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually tries
> > > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already locked in
> > > preempted contexts (task or irq in this case) or by other CPUs and it may cause
>
> The sentence "logbuf_lock can be already locked in preempted contexts" does not
> make much sense. It is a spin lock. It means that both interrupts and
> preemption are disabled.

I'd like to say that the preempting context is NMI, so the preempted contexts
could be task/irq/bh contexts on the same CPU.

> I would change it to something like:
>
> "Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually tries
> to lock logbuf_lock in vprintk_emit() that might already be part
> of a soft- or hard-lockup on another CPU."

It looks clearer. But I'd modify "be part of a soft- or hard-lockup on another
CPU." to "be part of another non-NMI context on the same CPU or a soft- or
hard-lockup on another CPU." How about it?

> > > deadlocks.
> > > By making printk_safe_flush() safe in NMI context, the backtrace triggering CPU
> > > just skips flushing if the lock is not available in NMI context. The messages in
> > > per-cpu nmi buffer of the backtrace triggering CPU can be lost if the CPU is in
> > > hard lockup (because irq is disabled here) but if panic() is not called. The
> > > flushing can be delayed by the next irq work in normal cases.
>
> I somehow miss there a motivation why the current state is better than
> the previous. It looks like we exchange the risk of a deadlock with
> a risk of losing the messages.
>
> I see it the following way:
>
> "This patch prevents a deadlock in printk_safe_flush() in NMI
> context. It makes sure that we continue and eventually call
> printk_safe_flush_on_panic() from panic() that has better
> chances to succeed.
>
> There is a risk that logbuf_lock was not part of a soft- or
> dead-lockup and we might just lose the messages. But then there is a high
> chance that irq_work will get called and the messages will get flushed
> the normal way."
>
> > Any chance we can add more info to the commit message? E.g. backtraces
> > which would describe "how" is this possible (like the one I posted in
> > another email). Just to make it more clear.
>
> I agree that a backtrace would be helpful. But it is not a must to
> have from my point of view.
>
> The patch itself looks good to me.
>
> Best Regards,
> Petr
RE: [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing
I appreciate the detailed correction. I will reflect the corrections in the next patch. plus, the explanation in the code will be fixed. > -Original Message- > From: Petr Mladek [mailto:pmla...@suse.com] > Sent: Wednesday, May 30, 2018 5:32 PM > To: Sergey Senozhatsky > Cc: Hoeun Ryu ; Sergey Senozhatsky > ; Steven Rostedt ; > Hoeun Ryu ; linux-kernel@vger.kernel.org > Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by > skipping flushing > > On Tue 2018-05-29 21:13:15, Sergey Senozhatsky wrote: > > On (05/29/18 11:51), Hoeun Ryu wrote: > > > Make printk_safe_flush() safe in NMI context. > > > nmi_trigger_cpumask_backtrace() can be called in NMI context. For > example the > > > function is called in watchdog_overflow_callback() if the flag of > hardlockup > > > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and > > > watchdog_overflow_callback() function is called in NMI context on some > > > architectures. > > > Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() > eventually tries > > > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be > already locked in > > > preempted contexts (task or irq in this case) or by other CPUs and it > may cause > > The sentence "logbuf_lock can be already locked in preempted contexts" > does not > make much sense. It is a spin lock. It means that both interrupts and > preemption are disabled. > > I would change it to something like: > > "Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually > tries > to lock logbuf_lock in vprintk_emit() that might be already be part > of a soft- or hard-lockup on another CPU." > > > > > deadlocks. > > > By making printk_safe_flush() safe in NMI context, the backtrace > triggering CPU > > > just skips flushing if the lock is not avaiable in NMI context. 
The > messages in > > > per-cpu nmi buffer of the backtrace triggering CPU can be lost if the > CPU is in > > > hard lockup (because irq is disabled here) but if panic() is not > called. The > > > flushing can be delayed by the next irq work in normal cases. > > I somehow miss there a motivation why the current state is better than > the previous. It looks like we exchange the risk of a deadlock with > a risk of loosing the messages. > > I see it the following way: > > "This patch prevents a deadlock in printk_safe_flush() in NMI > context. It makes sure that we continue and eventually call > printk_safe_flush_on_panic() from panic() that has better > chances to succeed. > > There is a risk that logbuf_lock was not part of a soft- or > dead-lockup and we might just loose the messages. But then there is a high > chance that irq_work will get called and the messages will get flushed > the normal way." > > > > Any chance we can add more info to the commit message? E.g. backtraces > > which would describe "how" is this possible (like the one I posted in > > another email). Just to make it more clear. > > I agree that a backtrace would be helpful. But it is not a must to > have from my point of view. > > The patch itself looks good to me. > > Best Regards, > Petr
RE: [PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing
> -Original Message- > From: Sergey Senozhatsky [mailto:sergey.senozhatsky.w...@gmail.com] > Sent: Tuesday, May 29, 2018 9:13 PM > To: Hoeun Ryu > Cc: Petr Mladek ; Sergey Senozhatsky > ; Steven Rostedt ; > Hoeun Ryu ; linux-kernel@vger.kernel.org > Subject: Re: [PATCH] printk: make printk_safe_flush safe in NMI context by > skipping flushing > > On (05/29/18 11:51), Hoeun Ryu wrote: > > Make printk_safe_flush() safe in NMI context. > > nmi_trigger_cpumask_backtrace() can be called in NMI context. For > example the > > function is called in watchdog_overflow_callback() if the flag of > hardlockup > > backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and > > watchdog_overflow_callback() function is called in NMI context on some > > architectures. > > Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() > eventually tries > > to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already > locked in > > preempted contexts (task or irq in this case) or by other CPUs and it > may cause > > deadlocks. > > By making printk_safe_flush() safe in NMI context, the backtrace > triggering CPU > > just skips flushing if the lock is not avaiable in NMI context. The > messages in > > per-cpu nmi buffer of the backtrace triggering CPU can be lost if the > CPU is in > > hard lockup (because irq is disabled here) but if panic() is not called. > The > > flushing can be delayed by the next irq work in normal cases. > > Any chance we can add more info to the commit message? E.g. backtraces > which would describe "how" is this possible (like the one I posted in > another email). Just to make it more clear. > OK, I will. > > @@ -254,6 +254,16 @@ void printk_safe_flush(void) > > { > > int cpu; > > > > + /* > > +* Just avoid deadlocks here, we could loose the messages in per- > cpu nmi buffer > > +* in the case that hardlockup happens but panic() is not called > (irq_work won't > > +* work). 
> > +* The flushing can be delayed by the next irq_work if flushing is > skippped here > ^^ skipped > Typo will be fixed. > -ss
[PATCH] printk: make printk_safe_flush safe in NMI context by skipping flushing
From: Hoeun Ryu Make printk_safe_flush() safe in NMI context. nmi_trigger_cpumask_backtrace() can be called in NMI context. For example the function is called in watchdog_overflow_callback() if the flag of hardlockup backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and watchdog_overflow_callback() function is called in NMI context on some architectures. Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually tries to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already locked in preempted contexts (task or irq in this case) or by other CPUs and it may cause deadlocks. By making printk_safe_flush() safe in NMI context, the backtrace triggering CPU just skips flushing if the lock is not avaiable in NMI context. The messages in per-cpu nmi buffer of the backtrace triggering CPU can be lost if the CPU is in hard lockup (because irq is disabled here) but if panic() is not called. The flushing can be delayed by the next irq work in normal cases. Suggested-by: Sergey Senozhatsky Signed-off-by: Hoeun Ryu --- kernel/printk/printk_safe.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 3e3c200..62bcc9b 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -254,6 +254,16 @@ void printk_safe_flush(void) { int cpu; + /* +* Just avoid deadlocks here, we could loose the messages in per-cpu nmi buffer +* in the case that hardlockup happens but panic() is not called (irq_work won't +* work). +* The flushing can be delayed by the next irq_work if flushing is skippped here +* in normal cases. +*/ + if (this_cpu_read(printk_context) & PRINTK_NMI_CONTEXT_MASK) + return; + for_each_possible_cpu(cpu) { #ifdef CONFIG_PRINTK_NMI __printk_safe_flush(_cpu(nmi_print_seq, cpu).work); -- 2.1.4
[PATCH] printk: make printk_safe_flush safe in NMI context
From: Hoeun Ryu Make printk_safe_flush() safe in NMI context. And printk_safe_flush_on_panic() is folded into this function. The prototype of printk_safe_flush() is changed to "void printk_safe_flush(bool panic)". nmi_trigger_cpumask_backtrace() can be called in NMI context. For example the function is called in watchdog_overflow_callback() if the flag of hardlockup backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and watchdog_overflow_callback() function is called in NMI context on some architectures. Calling printk_safe_flush() in nmi_trigger_cpumask_backtrace() eventually tries to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already locked in preempted contexts (task or irq in this case) or by other CPUs and it may cause deadlocks. By making printk_safe_flush() safe in NMI context, the backtrace triggering CPU just skips flushing if the lock is not avaiable in NMI context. The messages in per-cpu nmi buffer of the backtrace triggering CPU can be lost if the CPU is in hard lockup (because irq is disabled here) but if panic() is not called. The flushing can be delayed by the next irq work in normal cases. 
Suggested-by: Sergey Senozhatsky Signed-off-by: Hoeun Ryu --- arch/powerpc/kernel/traps.c| 2 +- arch/powerpc/kernel/watchdog.c | 2 +- include/linux/printk.h | 9 ++ kernel/kexec_core.c| 2 +- kernel/panic.c | 4 +-- kernel/printk/printk_safe.c| 62 +- lib/nmi_backtrace.c| 2 +- 7 files changed, 39 insertions(+), 44 deletions(-) diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c index 0904492..c50749c 100644 --- a/arch/powerpc/kernel/traps.c +++ b/arch/powerpc/kernel/traps.c @@ -160,7 +160,7 @@ extern void panic_flush_kmsg_start(void) extern void panic_flush_kmsg_end(void) { - printk_safe_flush_on_panic(); + printk_safe_flush(true); kmsg_dump(KMSG_DUMP_PANIC); bust_spinlocks(0); debug_locks_off(); diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c index 6256dc3..3c9138b 100644 --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -173,7 +173,7 @@ static void watchdog_smp_panic(int cpu, u64 tb) wd_smp_unlock(); - printk_safe_flush(); + printk_safe_flush(false); /* * printk_safe_flush() seems to require another print * before anything actually goes out to console. 
diff --git a/include/linux/printk.h b/include/linux/printk.h index 6d7e800..495fe26 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -203,8 +203,7 @@ void dump_stack_print_info(const char *log_lvl); void show_regs_print_info(const char *log_lvl); extern asmlinkage void dump_stack(void) __cold; extern void printk_safe_init(void); -extern void printk_safe_flush(void); -extern void printk_safe_flush_on_panic(void); +extern void printk_safe_flush(bool panic); #else static inline __printf(1, 0) int vprintk(const char *s, va_list args) @@ -273,11 +272,7 @@ static inline void printk_safe_init(void) { } -static inline void printk_safe_flush(void) -{ -} - -static inline void printk_safe_flush_on_panic(void) +static inline void printk_safe_flush(bool panic) { } #endif diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index 20fef1a..1b0876e 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -961,7 +961,7 @@ void crash_kexec(struct pt_regs *regs) old_cpu = atomic_cmpxchg(_cpu, PANIC_CPU_INVALID, this_cpu); if (old_cpu == PANIC_CPU_INVALID) { /* This is the 1st CPU which comes here, so go ahead. */ - printk_safe_flush_on_panic(); + printk_safe_flush(true); __crash_kexec(regs); /* diff --git a/kernel/panic.c b/kernel/panic.c index 42e4874..2f2c86c 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -193,7 +193,7 @@ void panic(const char *fmt, ...) * Bypass the panic_cpu check and call __crash_kexec directly. */ if (!_crash_kexec_post_notifiers) { - printk_safe_flush_on_panic(); + printk_safe_flush(true); __crash_kexec(NULL); /* @@ -218,7 +218,7 @@ void panic(const char *fmt, ...) atomic_notifier_call_chain(_notifier_list, 0, buf); /* Call flush even twice. 
It tries harder with a single online CPU */ - printk_safe_flush_on_panic(); + printk_safe_flush(true); kmsg_dump(KMSG_DUMP_PANIC); /* diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 3e3c200..35ea941 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -244,49 +244,49 @@ static void __printk_safe_flush(struct irq_work *work) } /** - * printk_safe_flush - flush all per-cpu nmi buffers. + * printk_safe_flush - flush all per-cpu nmi buffers. it can be called ev
[PATCH] nmi_backtrace: use printk_safe_flush_on_panic when triggering backtraces
From: Hoeun Ryu <hoeun@lge.com> Use printk_safe_flush_on_panic() in nmi_trigger_cpumask_backtrace(). nmi_trigger_cpumask_backtrace() can be called in NMI context. For example the function is called in watchdog_overflow_callback() if the flag of hardlockup backtrace (sysctl_hardlockup_all_cpu_backtrace) is true and watchdog_overflow_callback() function is called in NMI context on some architectures. printk_safe_flush() eventually tries to lock logbuf_lock in vprintk_emit() but the logbuf_lock can be already locked in preempted contexts (task or irq in this case) or by other CPUs and it may cause deadlocks. By using printk_safe_flush_on_panic() instead of printk_safe_flush() in in nmi_trigger_cpumask_backtrace(), the backtrace triggering CPU can access the lock safely as the lock is reinitialized before calling printk_safe_flush(). The messages in logbuf can be corrupted if the NMI context preempts other contexts or ignores the lock held by other CPUs writing the buffer. Signed-off-by: Hoeun Ryu <hoeun@lge.com> --- lib/nmi_backtrace.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c index 61a6b5a..0ace3c9 100644 --- a/lib/nmi_backtrace.c +++ b/lib/nmi_backtrace.c @@ -78,8 +78,11 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask, /* * Force flush any remote buffers that might be stuck in IRQ context * and therefore could not run their irq_work. +* Call nmi-safe version of printk_safe_flush() because this function can be +* called in NMI context like watchdog_overflow_callback() if +* sysctl_hardlockup_all_cpu_backtrace is true. */ - printk_safe_flush(); + printk_safe_flush_on_panic(); clear_bit_unlock(0, _flag); put_cpu(); -- 2.1.4
[PATCHv2] armpmu: broadcast overflow irq on multi-core system having one muxed SPI for PMU.
From: Hoeun Ryu <hoeun@lge.com> On some SoCs like i.MX6DL/QL have only one muxed SPI for multi-core system. On the systems, a CPU can be interrupted by overflow irq but it is possible that the overflow actually occurs on another CPU. This patch broadcasts the irq using smp_call_function_single_async() so that other CPUs can check and handle their overflows by themselves when a overflow doesn't actually occur on the interrupted CPU. Per-cpu call_single_data are allocated in arm_pmu structure for this purpose during initialization The callback for smp_call_function_single_async() is __armpmu_handle_irq() and the function calls armpmu->handle_irq() with an invalid irq_num because smp_call_func_t has only one parameter and armpmu pointer is handed over by the pointer. It can be a problem if irq_num parameter is used by handlers but no handler uses the irq parameter for now. We could have another approach removing irq_num argument itself in handle_irq() function. Signed-off-by: Hoeun Ryu <hoeun@lge.com> --- drivers/perf/arm_pmu.c | 62 ++-- include/linux/perf/arm_pmu.h | 3 +++ 2 files changed, 63 insertions(+), 2 deletions(-) diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 1a0d340..df024a0 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -322,6 +322,29 @@ validate_group(struct perf_event *event) return 0; } +static void __armpmu_handle_irq(void *dev) +{ + struct arm_pmu *armpmu; + u64 start_clock, finish_clock; + irqreturn_t ret; + + armpmu = *(void **)dev; + start_clock = sched_clock(); + /* +* irq_num should not be used by the handler, we don't have irq_num for +* the first place. There is no handler using the irq_num argument for now. +* smp_call_func_t has one function argument and irq number cannot be handed +* over to this callback because we need dev pointer here. +* If you need valid irq_num, you need to declare a wrapper struct having +* irq_num and dev pointer. 
+*/ + ret = armpmu->handle_irq(-1, armpmu); + if (ret == IRQ_HANDLED) { + finish_clock = sched_clock(); + perf_sample_event_took(finish_clock - start_clock); + } +} + static irqreturn_t armpmu_dispatch_irq(int irq, void *dev) { struct arm_pmu *armpmu; @@ -340,9 +363,34 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev) start_clock = sched_clock(); ret = armpmu->handle_irq(irq, armpmu); - finish_clock = sched_clock(); + /* +* The handler just returns with IRQ_NONE when it checks the overflow +* and the overflow doesn't occur on the CPU. +* +* Some SoCs like i.MX6 have one muxed SPI on multi-core system. +* On the systems , the irq should be broadcasted to other CPUs so that the +* CPUs can check their own PMU overflow. +*/ + if (ret == IRQ_HANDLED) { + finish_clock = sched_clock(); + perf_sample_event_took(finish_clock - start_clock); + } else if (ret == IRQ_NONE) { + int cpu; + struct cpumask mask; + + cpumask_copy(, cpu_online_mask); + cpumask_clear_cpu(raw_smp_processor_id(), ); + for_each_cpu(cpu, ) { + call_single_data_t *csd = + per_cpu_ptr(armpmu->ov_brdcast_csd, cpu); + + csd->func = __armpmu_handle_irq; + csd->info = dev; + + smp_call_function_single_async(cpu, csd); + } + } - perf_sample_event_took(finish_clock - start_clock); return ret; } @@ -790,6 +838,13 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags) goto out_free_pmu; } + pmu->ov_brdcast_csd = alloc_percpu_gfp(call_single_data_t, flags); + if (!pmu->ov_brdcast_csd) { + pr_info("failed to allocate per-cpu " + "overflow broadcasting call single data.\n"); + goto out_free_hw_events; + } + pmu->pmu = (struct pmu) { .pmu_enable = armpmu_enable, .pmu_disable= armpmu_disable, @@ -824,6 +879,8 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags) return pmu; +out_free_hw_events: + free_percpu(pmu->hw_events); out_free_pmu: kfree(pmu); out: @@ -844,6 +901,7 @@ struct arm_pmu *armpmu_alloc_atomic(void) void armpmu_free(struct arm_pmu *pmu) { free_percpu(pmu->hw_events); + 
free_percpu(pmu->ov_brdcast_csd); kfree(pmu); } diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 40036a5..a63da63 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -107,6 +10
RE: [PATCH] armpmu: broadcast overflow irq on multi-core system having one muxed SPI for PMU.
Thank you for the review. I understand your NACK. But I'd like to just fix the part of smp_call_function() in the next version. You can simply ignore it. > -Original Message- > From: Mark Rutland [mailto:mark.rutl...@arm.com] > Sent: Friday, May 11, 2018 7:39 PM > To: Hoeun Ryu <hoeun@lge.com> > Cc: 'Hoeun Ryu' <hoeun@lge.com.com>; 'Will Deacon' > <will.dea...@arm.com>; linux-arm-ker...@lists.infradead.org; linux- > ker...@vger.kernel.org > Subject: Re: [PATCH] armpmu: broadcast overflow irq on multi-core system > having one muxed SPI for PMU. > > On Fri, May 11, 2018 at 08:20:49AM +0900, Hoeun Ryu wrote: > > Thank you for the reply. > > > > > -Original Message- > > > From: Mark Rutland [mailto:mark.rutl...@arm.com] > > > Sent: Thursday, May 10, 2018 7:21 PM > > > To: Hoeun Ryu <hoeun@lge.com.com> > > > Cc: Will Deacon <will.dea...@arm.com>; Hoeun Ryu <hoeun@lge.com>; > > > linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org > > > Subject: Re: [PATCH] armpmu: broadcast overflow irq on multi-core > system > > > having one muxed SPI for PMU. > > > > Muxing the PMU IRQs is a really broken system design, and there's no > good > > > way of supporting it. > > > > What we should do for such systems is: > > > > > > * Add a flag to the DT to describe that the IRQs are muxed, as this > > > cannot be probed. > > > > > > * Add hrtimer code to periodically update the counters, to avoid > > > overflow (e.g. as we do in the l2x0 PMU). > > > > > > * Reject sampling for such systems, as this cannot be done reliably or > > > efficiently. > > > > > > NAK to broadcasting the IRQ -- there are a number of issues with the > > > general approach. > > > > The second solution would be good if sampling is necessary even like > those > > systems. > > Please note that I mean *all* of the above. There would be no sampling > on systems with muxed PMU IRQs, since there's no correlation between > overflow events and the hrtimer interrupts -- the results of sampling > would be misleading. 
> > > Actually I'm working on an FIQ-capable ARM32 system and trying to enable > the > > hard lockup detector by routing the PMU IRQ to FIQ. > > Because of that, I really need the interrupt event if it is a muxed SPI, > > besides I also need to make a dedicated IPI FIQ to broadcast the IRQ in > > this approach. > > What would you do if you were in the same situation ? > > I don't think that this can work with a muxed IRQ, sorry. > > It would be better to use some kind of timer. > > [...] > > > > Further, If you ever encounter a case where you need to avoid > preemption > > > across enabling IRQs, preemption must be disabled *before* enabling > IRQs. > > > > Ah, OK. > > Enabling IRQs can cause scheduling of tasks at the end of the exception or at > other > > scheduling points, right ? > > Yes. If an IRQ was taken *between* enabling IRQs and disabling > preemption, preemption may occur as part of the exception return. > > Thanks, > Mark.
[PATCHv2] armpmu: broadcast overflow irq on multi-core system having one muxed SPI for PMU.
From: Hoeun Ryu Some SoCs like i.MX6DL/QL have only one muxed SPI for the PMU on a multi-core system. On such systems, a CPU can be interrupted by the overflow irq, but it is possible that the overflow actually occurred on another CPU. This patch broadcasts the irq using smp_call_function_single_async() so that other CPUs can check and handle their overflows by themselves when an overflow doesn't actually occur on the interrupted CPU. Per-cpu call_single_data are allocated in the arm_pmu structure for this purpose during initialization. The callback for smp_call_function_single_async() is __armpmu_handle_irq() and the function calls armpmu->handle_irq() with an invalid irq_num because smp_call_func_t has only one parameter and the armpmu pointer is handed over via that pointer. It can be a problem if the irq_num parameter is used by handlers, but no handler uses the irq parameter for now. We could take another approach and remove the irq_num argument from handle_irq() itself. Signed-off-by: Hoeun Ryu --- drivers/perf/arm_pmu.c | 62 ++-- include/linux/perf/arm_pmu.h | 3 +++ 2 files changed, 63 insertions(+), 2 deletions(-) diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 1a0d340..df024a0 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -322,6 +322,29 @@ validate_group(struct perf_event *event) return 0; } +static void __armpmu_handle_irq(void *dev) +{ + struct arm_pmu *armpmu; + u64 start_clock, finish_clock; + irqreturn_t ret; + + armpmu = *(void **)dev; + start_clock = sched_clock(); + /* +* irq_num should not be used by the handler; we don't have irq_num in +* the first place. There is no handler using the irq_num argument for now. +* smp_call_func_t has one function argument and the irq number cannot be handed +* over to this callback because we need the dev pointer here. +* If you need a valid irq_num, you need to declare a wrapper struct holding +* irq_num and the dev pointer. 
+*/ + ret = armpmu->handle_irq(-1, armpmu); + if (ret == IRQ_HANDLED) { + finish_clock = sched_clock(); + perf_sample_event_took(finish_clock - start_clock); + } +} + static irqreturn_t armpmu_dispatch_irq(int irq, void *dev) { struct arm_pmu *armpmu; @@ -340,9 +363,34 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev) start_clock = sched_clock(); ret = armpmu->handle_irq(irq, armpmu); - finish_clock = sched_clock(); + /* +* The handler just returns with IRQ_NONE when it checks the overflow +* and the overflow doesn't occur on the CPU. +* +* Some SoCs like i.MX6 have one muxed SPI on a multi-core system. +* On such systems, the irq should be broadcast to other CPUs so that the +* CPUs can check their own PMU overflow. +*/ + if (ret == IRQ_HANDLED) { + finish_clock = sched_clock(); + perf_sample_event_took(finish_clock - start_clock); + } else if (ret == IRQ_NONE) { + int cpu; + struct cpumask mask; + + cpumask_copy(&mask, cpu_online_mask); + cpumask_clear_cpu(raw_smp_processor_id(), &mask); + for_each_cpu(cpu, &mask) { + call_single_data_t *csd = + per_cpu_ptr(armpmu->ov_brdcast_csd, cpu); + + csd->func = __armpmu_handle_irq; + csd->info = dev; + + smp_call_function_single_async(cpu, csd); + } + } - perf_sample_event_took(finish_clock - start_clock); return ret; } @@ -790,6 +838,13 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags) goto out_free_pmu; } + pmu->ov_brdcast_csd = alloc_percpu_gfp(call_single_data_t, flags); + if (!pmu->ov_brdcast_csd) { + pr_info("failed to allocate per-cpu " + "overflow broadcasting call single data.\n"); + goto out_free_hw_events; + } + pmu->pmu = (struct pmu) { .pmu_enable = armpmu_enable, .pmu_disable= armpmu_disable, @@ -824,6 +879,8 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags) return pmu; +out_free_hw_events: + free_percpu(pmu->hw_events); out_free_pmu: kfree(pmu); out: @@ -844,6 +901,7 @@ struct arm_pmu *armpmu_alloc_atomic(void) void armpmu_free(struct arm_pmu *pmu) { free_percpu(pmu->hw_events); + 
free_percpu(pmu->ov_brdcast_csd); kfree(pmu); } diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 40036a5..a63da63 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -107,6 +107,9 @@ struct arm_pmu { /* Only to be used
RE: smp_call_function() friends and irq/bottom_half context
> -Original Message- > From: Peter Zijlstra [mailto:pet...@infradead.org] > Sent: Thursday, May 10, 2018 7:17 PM > To: Hoeun Ryu > Cc: mi...@kernel.org; aaron...@intel.com; adobri...@gmail.com; > frede...@kernel.org; ying.hu...@intel.com; linux-kernel@vger.kernel.org > Subject: Re: smp_call_function() friends and irq/bottom_half context > > On Thu, May 10, 2018 at 01:58:29PM +0900, Hoeun Ryu wrote: > > Hi, all. > > > > I'm reading kernel/smp.c code and I found comments on smp_call_function() > > and smp_call_function_[single/many] > > saying that these functions cannot be called in interrupt disabled > > context or irq/bottom half handlers. > > > > I understand that there is a potential deadlock issue when the caller CPU of > > the functions is waiting for the completion of the callback of other > CPUs. > > But I was wondering if this is the case even when the caller CPU doesn't > > wait for the completion (wait == 0). > > IIRC yes, because csd_lock(). You can however use > smp_call_function_single_async() with your own csd. Be very careful > though, it is very easy to construct deadlocks. Thank you for the explanation and the suggestion. Would you please explain more on the csd lock in irq disabled or irq/bh context ? How can deadlocks happen when calling smp_call_function(wait=0) with irq disabled ? How can deadlocks happen when calling smp_call_function(wait=0) from irq or bottom half context ? I think of it as a 2d array like call_single_data[caller cpu][target cpu].flags. csd_lock(csd[caller][target]) spins on the flag only when LOCKED is already set, which means that the caller CPU already called smp_call_function(wait=0); at the second calling of the function, the caller CPU spins on csd_lock(csd[caller][target]) but the target CPU is still processing its callback list (maybe processing another csd[other cpu][target]) and has not yet processed csd[caller][target]. But the callback list will finally be flushed and the LOCKED flag of csd[caller][target] is cleared eventually ?
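Peter's csd_lock() point can be sketched with a tiny userspace model (a sketch only: `struct csd_model` and its helpers are illustrative names, not the kernel's real `call_single_data` API):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the per-CPU call_single_data LOCKED handshake
 * (illustrative names, not the kernel API). */
struct csd_model { bool locked; };

/* Caller side: the kernel's csd_lock() spins while LOCKED is set.
 * Here we only report whether it would have had to spin. */
static bool csd_model_try_lock(struct csd_model *c)
{
	if (c->locked)
		return false;	/* kernel would spin here: deadlock risk */
	c->locked = true;
	return true;
}

/* Target side: flushing the callback list clears LOCKED. */
static void csd_model_unlock(struct csd_model *c)
{
	c->locked = false;
}
```

In the kernel the caller spins instead of returning false; if the target CPU can never flush its callback list (for example, it is itself spinning with irqs disabled waiting on the caller), that spin never ends — which is why smp_call_function(wait=0) is still forbidden from irq-disabled context even though it does not wait for completion.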
[PATCH] armpmu: broadcast overflow irq on multi-core system having one muxed SPI for PMU.
From: Hoeun Ryu <hoeun@lge.com> Some SoCs like i.MX6DL/QL have only one muxed SPI for the PMU on a multi-core system. On such systems, a CPU can be interrupted by the overflow irq, but it is possible that the overflow actually occurred on another CPU. This patch broadcasts the irq using smp_call_function() so that other CPUs can check and handle their overflows by themselves when an overflow doesn't actually occur on the interrupted CPU. Local irq is enabled and preemption is disabled temporarily to call smp_call_function_many() in armpmu_dispatch_irq(), as smp_call_function_many() cannot be called with irqs disabled. The callback for smp_call_function_many() is __armpmu_handle_irq() and the function calls armpmu->handle_irq() with an invalid irq_num because smp_call_func_t has only one parameter and the armpmu pointer is handed over via that pointer. It can be a problem if the irq_num parameter is used by handlers, but no handler uses the irq parameter for now. We could take another approach and remove the irq_num argument from handle_irq() itself. Signed-off-by: Hoeun Ryu <hoeun@lge.com> --- drivers/perf/arm_pmu.c | 51 +++--- 1 file changed, 48 insertions(+), 3 deletions(-) diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 1a0d340..3d65e44 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -322,6 +322,29 @@ validate_group(struct perf_event *event) return 0; } +static void __armpmu_handle_irq(void *dev) +{ + struct arm_pmu *armpmu; + u64 start_clock, finish_clock; + irqreturn_t ret; + + armpmu = *(void **)dev; + start_clock = sched_clock(); + /* +* irq_num should not be used by the handler; we don't have irq_num in +* the first place. There is no handler using the irq_num argument for now. +* smp_call_func_t has one function argument and the irq number cannot be handed +* over to this callback because we need the dev pointer here. +* If you need a valid irq_num, you need to declare a wrapper struct holding +* irq_num and the dev pointer. 
+*/ + ret = armpmu->handle_irq(-1, armpmu); + if (ret == IRQ_HANDLED) { + finish_clock = sched_clock(); + perf_sample_event_took(finish_clock - start_clock); + } +} + static irqreturn_t armpmu_dispatch_irq(int irq, void *dev) { struct arm_pmu *armpmu; @@ -340,9 +363,31 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev) start_clock = sched_clock(); ret = armpmu->handle_irq(irq, armpmu); - finish_clock = sched_clock(); - - perf_sample_event_took(finish_clock - start_clock); + /* +* The handler just returns with IRQ_NONE when it checks the overflow +* and the overflow doesn't occur on the CPU. +* +* Some SoCs like i.MX6 have one muxed SPI on a multi-core system. +* On such systems, the irq should be broadcast to other CPUs so that the +* CPUs can check their own PMU overflow. +*/ + if (ret == IRQ_HANDLED) { + finish_clock = sched_clock(); + perf_sample_event_took(finish_clock - start_clock); + } else if (ret == IRQ_NONE) { + struct cpumask mask; + + cpumask_copy(&mask, cpu_online_mask); + cpumask_clear_cpu(raw_smp_processor_id(), &mask); + if (!cpumask_empty(&mask)) { + /* smp_call_function cannot be called with irq disabled */ + local_irq_enable(); + preempt_disable(); + smp_call_function_many(&mask, __armpmu_handle_irq, dev, 0); + preempt_enable(); + local_irq_disable(); + } + } return ret; } -- 2.1.4
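The mask construction in the hunk above (online CPUs minus the interrupted one) can be modeled in plain userspace C — a sketch in which a flat 32-bit bitmask stands in for `struct cpumask`, and `broadcast_mask()` is an illustrative name, not a kernel function:

```c
#include <assert.h>
#include <stdint.h>

/* Model of:
 *   cpumask_copy(&mask, cpu_online_mask);
 *   cpumask_clear_cpu(raw_smp_processor_id(), &mask);
 * Bit n of the mask represents CPU n being online. */
static uint32_t broadcast_mask(uint32_t online_mask, unsigned int this_cpu)
{
	/* keep every online CPU except the one that took the SPI */
	return online_mask & ~(UINT32_C(1) << this_cpu);
}
```

On a single-CPU system the resulting mask is empty, which is exactly the `!cpumask_empty()` guard in the patch: there is no one left to broadcast to.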
[RFC 3/3] sched: depend on 64BIT_ATOMIC_ACCESS to determine if to use min_vruntime_copy
'min_vruntime_copy' is copied when 'min_vruntime' is updated for cfs_rq and used to check if updating 'min_vruntime' is completed on the reader side. Because the 'min_vruntime' variable is 64bit, we need a mimic of seqlock to check if the variable is not being updated on 32bit machines. On 64BIT_ATOMIC_ACCESS enabled machines, 64bit accesses are atomic even though the machines are 32bit, so we can directly access 'min_vruntime' on those architectures. Depend on CONFIG_64BIT_ATOMIC_ACCESS instead of CONFIG_64BIT to determine whether the 'min_vruntime_copy' variable is used for synchronization or not. And align 'min_vruntime' by 8 if 64BIT_ATOMIC_ALIGNED_ACCESS is true because a 64BIT_ATOMIC_ALIGNED_ACCESS enabled system can access the variable atomically only when it's aligned. Signed-off-by: Hoeun Ryu <hoeun@gmail.com> --- kernel/sched/fair.c | 6 +++--- kernel/sched/sched.h | 6 +- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c95880e..840658f 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -536,7 +536,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq) /* ensure we never gain time by being placed backwards. 
*/ cfs_rq->min_vruntime = max_vruntime(cfs_rq->min_vruntime, vruntime); -#ifndef CONFIG_64BIT +#ifndef CONFIG_64BIT_ATOMIC_ACCESS smp_wmb(); cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; #endif @@ -5975,7 +5975,7 @@ static void migrate_task_rq_fair(struct task_struct *p) struct cfs_rq *cfs_rq = cfs_rq_of(se); u64 min_vruntime; -#ifndef CONFIG_64BIT +#ifndef CONFIG_64BIT_ATOMIC_ACCESS u64 min_vruntime_copy; do { @@ -9173,7 +9173,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq) { cfs_rq->tasks_timeline = RB_ROOT; cfs_rq->min_vruntime = (u64)(-(1LL << 20)); -#ifndef CONFIG_64BIT +#ifndef CONFIG_64BIT_ATOMIC_ACCESS cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime; #endif #ifdef CONFIG_SMP diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index eeef1a3..870010b 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -421,8 +421,12 @@ struct cfs_rq { unsigned int nr_running, h_nr_running; u64 exec_clock; +#ifndef CONFIG_64BIT_ATOMIC_ALIGNED_ACCESS u64 min_vruntime; -#ifndef CONFIG_64BIT +#else + u64 min_vruntime __attribute__((aligned(sizeof(u64)))); +#endif +#ifndef CONFIG_64BIT_ATOMIC_ACCESS u64 min_vruntime_copy; #endif -- 2.7.4
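The min_vruntime/min_vruntime_copy pairing this patch makes conditional is essentially a hand-rolled seqlock. A minimal single-threaded userspace sketch of the pattern (names are illustrative; the real kernel code orders the accesses with smp_wmb()/smp_rmb(), shown here only as comments):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the cfs_rq min_vruntime / min_vruntime_copy pairing used on
 * 32-bit kernels that lack atomic 64-bit loads (illustrative names). */
struct rq_model {
	uint64_t min_vruntime;
	uint64_t min_vruntime_copy;
};

static void model_update(struct rq_model *rq, uint64_t v)
{
	rq->min_vruntime = v;
	/* smp_wmb() in the kernel: publish the copy only after the value */
	rq->min_vruntime_copy = v;
}

static uint64_t model_read(const struct rq_model *rq)
{
	uint64_t v, copy;

	do {
		copy = rq->min_vruntime_copy;
		/* smp_rmb() in the kernel: order the two loads */
		v = rq->min_vruntime;
	} while (v != copy);	/* a torn/in-progress update is retried */

	return v;
}
```

On a machine where the 64-bit load is single-copy atomic (the 64BIT_ATOMIC_ACCESS case), the copy, the barriers, and the retry loop all disappear and the reader can load min_vruntime directly.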
[RFC 0/3] add 64BIT_ATOMIC_ACCESS and 64BIT_ATOMIC_ALIGNED_ACCESS
On some 32-bit architectures, 64bit accesses are atomic when certain conditions are satisfied. For example, on LPAE (Large Physical Address Extension) enabled ARM architecture, 'ldrd/strd' (load/store doublewords) instructions are 64bit atomic as long as the address is 64-bit aligned. This feature is to guarantee atomic accesses on newly introduced 64bit wide descriptors in the translation tables, and 's/u64' variables can be accessed atomically when they are aligned(8) on LPAE enabled ARM architecture machines. Introducing 64BIT_ATOMIC_ACCESS and 64BIT_ATOMIC_ALIGNED_ACCESS, which can be true for 32bit architectures as well as 64bit architectures, we can optimize some kernel codes using seqlock (timekeeping) or mimics of it (like in sched/cfs) simply to read or write 64bit variables. The existing codes depend on CONFIG_64BIT to determine whether the 64bit variables can be directly accessed or need additional synchronization primitives like seqlock. CONFIG_64BIT_ATOMIC_ACCESS can be used instead of CONFIG_64BIT in these cases. 64BIT_ATOMIC_ALIGNED_ACCESS can be used in the variable declaration to indicate the alignment requirement to the compiler (__attribute__((aligned(8)))) in the way of #ifdef. The last patch "sched: depend on 64BIT_ATOMIC_ACCESS to determine if to use min_vruntime_copy" is an example of this approach. I'd like to know what the architecture maintainers and kernel maintainers think about it. I think I can make more examples (mostly removing seqlock to access the 64bit variables on the machines) if this approach is accepted. Hoeun Ryu (3): arch: add 64BIT_ATOMIC_ACCESS to support 64bit atomic access on 32bit machines arm: enable 64BIT_ATOMIC(_ALIGNED)_ACCESS on LPAE enabled machines sched: depend on 64BIT_ATOMIC_ACCESS to determine if to use min_vruntime_copy arch/Kconfig | 20 arch/arm/mm/Kconfig | 2 ++ kernel/sched/fair.c | 6 +++--- kernel/sched/sched.h | 6 +- 4 files changed, 30 insertions(+), 4 deletions(-) -- 2.7.4
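The aligned(8) declaration the cover letter mentions can be sanity-checked in userspace; a minimal sketch mirroring the sched.h hunk (`struct sample` and its fields are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On a 32-bit ABI a u64 member may only be 4-byte aligned by default;
 * aligned(8) forces the 8-byte alignment that LPAE ldrd/strd need for
 * the access to be single-copy atomic. */
struct sample {
	uint32_t a;
	uint64_t val __attribute__((aligned(8)));
};
```

With the attribute, `offsetof(struct sample, val)` is a multiple of 8 on every ABI, so the whole struct inherits at least 8-byte alignment and the 64-bit member can be loaded/stored atomically on the architectures the series targets.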
[RFC 1/3] arch: add 64BIT_ATOMIC_ACCESS to support 64bit atomic access on 32bit machines
On some 32-bit architectures, 64bit accesses are atomic when certain conditions are satisfied. For example, on LPAE (Large Physical Address Extension) enabled ARM architecture, 'ldrd/strd' (load/store doublewords) instructions are 64bit atomic as long as the address is 64-bit aligned. This feature is to guarantee atomic accesses on newly introduced 64bit wide descriptors in the translation tables, and 's/u64' variables can be accessed atomically when they are aligned(8) on LPAE enabled ARM architecture machines. Introducing 64BIT_ATOMIC_ACCESS and 64BIT_ATOMIC_ALIGNED_ACCESS, which can be true for 32bit architectures as well as 64bit architectures, we can optimize some kernel codes using seqlock (timekeeping) or mimics of it (like in sched/cfs) simply to read or write 64bit variables. The existing codes depend on CONFIG_64BIT to determine whether the 64bit variables can be directly accessed or need additional synchronization primitives like seqlock. CONFIG_64BIT_ATOMIC_ACCESS can be used instead of CONFIG_64BIT in these cases. 64BIT_ATOMIC_ALIGNED_ACCESS can be used in the variable declaration to indicate the alignment requirement to the compiler (__attribute__((aligned(8)))) in the way of #ifdef. Signed-off-by: Hoeun Ryu <hoeun@gmail.com> --- arch/Kconfig | 20 1 file changed, 20 insertions(+) diff --git a/arch/Kconfig b/arch/Kconfig index 21d0089..1def331 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -115,6 +115,26 @@ config UPROBES managed by the kernel and kept transparent to the probed application. ) +config 64BIT_ATOMIC_ACCESS + def_bool 64BIT + help + On some 32bit architectures as well as 64bit architectures, + 64bit accesses are atomic when certain conditions are satisfied. + + This symbol should be selected by an architecture if 64 bit + accesses can be atomic. 
+ +config 64BIT_ATOMIC_ALIGNED_ACCESS + def_bool n + depends on 64BIT_ATOMIC_ACCESS + help + On 64BIT_ATOMIC_ACCESS enabled system, the address should be + aligned by 8 to guarantee the accesses are atomic. + + This symbol should be selected by an architecture if 64 bit + accesses are required to be 64 bit aligned to guarantee that + the 64bit accesses are atomic. + config HAVE_64BIT_ALIGNED_ACCESS def_bool 64BIT && !HAVE_EFFICIENT_UNALIGNED_ACCESS help -- 2.7.4
[RFC 2/3] arm: enable 64BIT_ATOMIC(_ALIGNED)_ACCESS on LPAE enabled machines
'ldrd/strd' (load/store doubleword) instructions are 64-bit atomic as long as the address is 64-bit aligned on LPAE (Large Physical Address Extension) enabled architectures. This feature exists to guarantee atomic accesses to the 64-bit wide descriptors newly introduced in the translation tables.

By making 64BIT_ATOMIC_ACCESS true, some kernel code that accesses 64-bit variables can be simplified by omitting the seqlock or its mimics. 64BIT_ATOMIC_ALIGNED_ACCESS is also made true because the 64-bit atomic access is guaranteed only when the address is 64-bit aligned.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 arch/arm/mm/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 60cdfdc..3142572 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -660,6 +660,8 @@ config ARM_LPAE
 	bool "Support for the Large Physical Address Extension"
 	depends on MMU && CPU_32v7 && !CPU_32v6 && !CPU_32v5 && \
 		!CPU_32v4 && !CPU_32v3
+	select 64BIT_ATOMIC_ACCESS
+	select 64BIT_ATOMIC_ALIGNED_ACCESS
 	help
 	  Say Y if you have an ARMv7 processor supporting the LPAE page
 	  table format and you would like to access memory beyond the
-- 
2.7.4
Re: [PATCHv3] arm:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
Hello, All.

Would you please review this patch? I haven't had any response to this patch. Thank you.

On Tue, 2017-08-08 at 10:22 +0900, Hoeun Ryu wrote:
> Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
> version in panic path) introduced crash_smp_send_stop() which is a weak
> function and can be overridden by architecture code to fix the side effect
> caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
> notifiers" option).
>
> ARM architecture uses the weak version of the function and the problem is
> that the weak function simply calls smp_send_stop() which makes other CPUs
> offline and takes away the chance to save crash information for nonpanic
> CPUs in machine_crash_shutdown() when the crash_kexec_post_notifiers kernel
> option is enabled.
>
> Calling smp_call_function(machine_crash_nonpanic_core, NULL, false) in
> the function is useless because all nonpanic CPUs are already offline by
> smp_send_stop() in this case and smp_call_function() only works against
> online CPUs.
>
> The result is that /proc/vmcore is not available with the error messages;
> "Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".
>
> crash_smp_send_stop() is implemented for the ARM architecture to fix this
> problem and the function (strong symbol version) saves crash information
> for nonpanic CPUs using smp_call_function() and machine_crash_shutdown()
> tries to save crash information for nonpanic CPUs only when the
> crash_kexec_post_notifiers kernel option is disabled.
>
> We might be able to implement the function like arm64 or x86 using a
> dedicated IPI (say, IPI_CPU_CRASH_STOP), but we cannot implement this
> function like that because of the lack of IPI slots. Please see commit
> e7273ff4 : (ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI)
>
> Signed-off-by: Hoeun Ryu
> ---
> v3:
>   - remove 'WARN_ON(num_online_cpus() > 1)' in machine_crash_shutdown().
>     it's a false check for the case when the crash_kexec_post_notifiers
>     kernel option is disabled.
> v2:
>   - call crash_smp_send_stop() in machine_crash_shutdown() for the case
>     when the crash_kexec_post_notifiers kernel option is disabled.
>   - fix commit messages for it.
>
>  arch/arm/kernel/machine_kexec.c | 40 +++++++++++++++++++++++-----------
>  1 file changed, 29 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
> index fe1419e..82ef7c7 100644
> --- a/arch/arm/kernel/machine_kexec.c
> +++ b/arch/arm/kernel/machine_kexec.c
> @@ -94,6 +94,34 @@ void machine_crash_nonpanic_core(void *unused)
>  		cpu_relax();
>  }
>  
> +void crash_smp_send_stop(void)
> +{
> +	static int cpus_stopped;
> +	unsigned long msecs;
> +
> +	/*
> +	 * This function can be called twice in panic path, but obviously
> +	 * we execute this only once.
> +	 */
> +	if (cpus_stopped)
> +		return;
> +
> +	cpus_stopped = 1;
> +
> +	if (num_online_cpus() == 1)
> +		return;
> +
> +	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
> +	smp_call_function(machine_crash_nonpanic_core, NULL, false);
> +	msecs = 1000; /* Wait at most a second for the other cpus to stop */
> +	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
> +		mdelay(1);
> +		msecs--;
> +	}
> +	if (atomic_read(&waiting_for_crash_ipi) > 0)
> +		pr_warn("Non-crashing CPUs did not react to IPI\n");
> +}
> +
>  static void machine_kexec_mask_interrupts(void)
>  {
>  	unsigned int i;
> @@ -119,19 +147,9 @@ static void machine_kexec_mask_interrupts(void)
>  
>  void machine_crash_shutdown(struct pt_regs *regs)
>  {
> -	unsigned long msecs;
> -
>  	local_irq_disable();
>  
> -	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
> -	smp_call_function(machine_crash_nonpanic_core, NULL, false);
> -	msecs = 1000; /* Wait at most a second for the other cpus to stop */
> -	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
> -		mdelay(1);
> -		msecs--;
> -	}
> -	if (atomic_read(&waiting_for_crash_ipi) > 0)
> -		pr_warn("Non-crashing CPUs did not react to IPI\n");
> +	crash_smp_send_stop();
>  
>  	crash_save_cpu(regs, smp_processor_id());
>  	machine_kexec_mask_interrupts();
[PATCHv3] arm64:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
version in panic path) introduced crash_smp_send_stop() which is a weak
function and can be overridden by architecture code to fix the side effect
caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
notifiers" option).

The ARM64 architecture uses the weak version of the function and the problem
is that the weak function simply calls smp_send_stop() which makes other CPUs
offline and takes away the chance to save crash information for nonpanic
CPUs in machine_crash_shutdown() when the crash_kexec_post_notifiers kernel
option is enabled.

Calling smp_send_crash_stop() in machine_crash_shutdown() is useless
because all nonpanic CPUs are already offline by smp_send_stop() in this
case and smp_send_crash_stop() only works against online CPUs.

The result is that secondary CPUs' registers are not saved by
crash_save_cpu() and the vmcore file misreports these CPUs as being
offline.

crash_smp_send_stop() is implemented to fix this problem by replacing the
existing smp_send_crash_stop() and adding a guard against multiple calls to
the function. The function (strong symbol version) saves crash information
for nonpanic CPUs and machine_crash_shutdown() tries to save crash
information for nonpanic CPUs only when the crash_kexec_post_notifiers
kernel option is disabled.

* crash_kexec_post_notifiers : false

  panic()
    __crash_kexec()
      machine_crash_shutdown()
        crash_smp_send_stop()    <= save crash dump for nonpanic cores

* crash_kexec_post_notifiers : true

  panic()
    crash_smp_send_stop()        <= save crash dump for nonpanic cores
    __crash_kexec()
      machine_crash_shutdown()
        crash_smp_send_stop()    <= just return.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
Reviewed-by: James Morse <james.mo...@arm.com>
Tested-by: James Morse <james.mo...@arm.com>
---
v3:
  - fix typos in the commit log.
  - modify commit log about the result of this problem.
  - add Tested-by/Reviewed-by: James Morse.
v2:
  - replace the existing smp_send_crash_stop() with crash_smp_send_stop()
    and add called-twice logic to it.
  - modify the commit message.

 arch/arm64/include/asm/smp.h      |  2 +-
 arch/arm64/kernel/machine_kexec.c |  2 +-
 arch/arm64/kernel/smp.c           | 12 +++++++++++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 55f08c5..f82b447 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -148,7 +148,7 @@ static inline void cpu_panic_kernel(void)
  */
 bool cpus_are_stuck_in_kernel(void);
 
-extern void smp_send_crash_stop(void);
+extern void crash_smp_send_stop(void);
 extern bool smp_crash_stop_failed(void);
 
 #endif /* ifndef __ASSEMBLY__ */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 481f54a..11121f6 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -252,7 +252,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
 	local_irq_disable();
 
 	/* shutdown non-crashing cpus */
-	smp_send_crash_stop();
+	crash_smp_send_stop();
 
 	/* for crashing cpu */
 	crash_save_cpu(regs, smp_processor_id());
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index dc66e6e..73d8f5e 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -977,11 +977,21 @@ void smp_send_stop(void)
 }
 
 #ifdef CONFIG_KEXEC_CORE
-void smp_send_crash_stop(void)
+void crash_smp_send_stop(void)
 {
+	static int cpus_stopped;
 	cpumask_t mask;
 	unsigned long timeout;
 
+	/*
+	 * This function can be called twice in panic path, but obviously
+	 * we execute this only once.
+	 */
+	if (cpus_stopped)
+		return;
+
+	cpus_stopped = 1;
+
 	if (num_online_cpus() == 1)
 		return;
-- 
2.7.4
Re: [PATCHv2] arm64:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
Hello, James.

Thank you for the meticulous test and review.

On Fri, 2017-08-11 at 18:02 +0100, James Morse wrote:
> Hi Hoeun,
>
> On 07/08/17 06:09, Hoeun Ryu wrote:
> >
> > Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
> > version in panic path) introduced crash_smp_send_stop() which is a weak
> > function and can be overriden by architecture codes to fix the side effect
>
> (overridden)

It'll be fixed in the next version.

> > caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
> > notifiers" option).
> >
> > ARM64 architecture uses the weak version function and the problem is that
> > the weak function simply calls smp_send_stop() which makes other CPUs
> > offline and takes away the chance to save crash information for nonpanic
> > CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
> > option is enabled.
> >
> > Calling smp_send_crash_stop() in machine_crash_shutdown() is useless
> > because all nonpanic CPUs are already offline by smp_send_stop() in this
> > case and smp_send_crash_stop() only works against online CPUs.
> >
> > The result is that /proc/vmcore is not available with the error messages;
> > "Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".
>
> When I tried this I got one of these warnings for each secondary CPU, but
> the vmcore file was still available. When I ran 'crash' on the vmcore it
> reported:
>
> > CPUS: 6 [OFFLINE: 5]
>
> Did I miss a step to reproduce this? If not, can we change this paragraph
> to say something like:
>
> > The result is that secondary CPUs' registers are not saved by
> > crash_save_cpu() and the vmcore file misreports these CPUs as being
> > offline.

Actually the commit log comes from the patch to fix a similar issue in the
arm port. I'll change the commit log with yours.

> > crash_smp_send_stop() is implemented to fix this problem by replacing the
> > exising smp_send_crash_stop() and adding a check for multiple calling to
>
> (existing)

It'll be fixed in the next version.

> > the function. The function (strong symbol version) saves crash information
> > for nonpanic CPUs and machine_crash_shutdown() tries to save crash
> > information for nonpanic CPUs only when crash_kexec_post_notifiers kernel
> > option is disabled.
> >
> > * crash_kexec_post_notifiers : false
> >
> >   panic()
> >     __crash_kexec()
> >       machine_crash_shutdown()
> >         crash_smp_send_stop()    <= save crash dump for nonpanic cores
> >
> > * crash_kexec_post_notifiers : true
> >
> >   panic()
> >     crash_smp_send_stop()        <= save crash dump for nonpanic cores
> >     __crash_kexec()
> >       machine_crash_shutdown()
> >         crash_smp_send_stop()    <= just return.
>
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index dc66e6e..73d8f5e 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -977,11 +977,21 @@ void smp_send_stop(void)
> >  }
> >  
> >  #ifdef CONFIG_KEXEC_CORE
> > -void smp_send_crash_stop(void)
> > +void crash_smp_send_stop(void)
> >  {
> > +	static int cpus_stopped;
> >  	cpumask_t mask;
> >  	unsigned long timeout;
> >  
> > +	/*
> > +	 * This function can be called twice in panic path, but obviously
> > +	 * we execute this only once.
> > +	 */
> > +	if (cpus_stopped)
> > +		return;
> > +
> > +	cpus_stopped = 1;
> > +
>
> This cpus_stopped=1 can't happen on multiple CPUs at the same time as any
> second call is guaranteed to be on the same CPU, both are behind panic()'s
> 'atomic_cmpxchg()'.

The 'cpus_stopped' variable is not for a race between multiple CPUs. This
variable is simply to prevent calling 'smp_cross_call(&mask,
IPI_CPU_CRASH_STOP)' twice in machine_crash_shutdown(). Please look at the
following call path.

 * crash_kexec_post_notifiers : true

 panic()
   crash_smp_send_stop() {
     ...
     cpus_stopped = 1         <= make it '1'
     smp_cross_call()         <= save crash dump for nonpanic cores
   }
   __crash_kexec()
     machine_crash_shutdown()
       crash_smp_send_stop() {
         if (cpus_stopped)
           return             <= just return.
       }

> Other than my '/proc/vmcore is not available' question above, this looks
> fine to me:
>
> Reviewed-by: James Morse <james.mo...@arm.com>
> Tested-by: James Morse <james.mo...@arm.com>
>
> Thanks!
>
> James
[PATCHv3] arm:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
version in panic path) introduced crash_smp_send_stop() which is a weak
function and can be overridden by architecture code to fix the side effect
caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
notifiers" option).

ARM architecture uses the weak version of the function and the problem is
that the weak function simply calls smp_send_stop() which makes other CPUs
offline and takes away the chance to save crash information for nonpanic
CPUs in machine_crash_shutdown() when the crash_kexec_post_notifiers kernel
option is enabled.

Calling smp_call_function(machine_crash_nonpanic_core, NULL, false) in
the function is useless because all nonpanic CPUs are already offline by
smp_send_stop() in this case and smp_call_function() only works against
online CPUs.

The result is that /proc/vmcore is not available with the error messages;
"Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".

crash_smp_send_stop() is implemented for the ARM architecture to fix this
problem and the function (strong symbol version) saves crash information
for nonpanic CPUs using smp_call_function() and machine_crash_shutdown()
tries to save crash information for nonpanic CPUs only when the
crash_kexec_post_notifiers kernel option is disabled.

We might be able to implement the function like arm64 or x86 using a
dedicated IPI (say, IPI_CPU_CRASH_STOP), but we cannot implement this
function like that because of the lack of IPI slots. Please see commit
e7273ff4 : (ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI)

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
v3:
  - remove 'WARN_ON(num_online_cpus() > 1)' in machine_crash_shutdown().
    it's a false check for the case when the crash_kexec_post_notifiers
    kernel option is disabled.
v2:
  - call crash_smp_send_stop() in machine_crash_shutdown() for the case
    when the crash_kexec_post_notifiers kernel option is disabled.
  - fix commit messages for it.

 arch/arm/kernel/machine_kexec.c | 40 +++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index fe1419e..82ef7c7 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -94,6 +94,34 @@ void machine_crash_nonpanic_core(void *unused)
 		cpu_relax();
 }
 
+void crash_smp_send_stop(void)
+{
+	static int cpus_stopped;
+	unsigned long msecs;
+
+	/*
+	 * This function can be called twice in panic path, but obviously
+	 * we execute this only once.
+	 */
+	if (cpus_stopped)
+		return;
+
+	cpus_stopped = 1;
+
+	if (num_online_cpus() == 1)
+		return;
+
+	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+	smp_call_function(machine_crash_nonpanic_core, NULL, false);
+	msecs = 1000; /* Wait at most a second for the other cpus to stop */
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+		mdelay(1);
+		msecs--;
+	}
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warn("Non-crashing CPUs did not react to IPI\n");
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
 	unsigned int i;
@@ -119,19 +147,9 @@ static void machine_kexec_mask_interrupts(void)
 
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-	unsigned long msecs;
-
 	local_irq_disable();
 
-	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-	smp_call_function(machine_crash_nonpanic_core, NULL, false);
-	msecs = 1000; /* Wait at most a second for the other cpus to stop */
-	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
-		mdelay(1);
-		msecs--;
-	}
-	if (atomic_read(&waiting_for_crash_ipi) > 0)
-		pr_warn("Non-crashing CPUs did not react to IPI\n");
+	crash_smp_send_stop();
 
 	crash_save_cpu(regs, smp_processor_id());
 	machine_kexec_mask_interrupts();
-- 
2.7.4
[PATCHv3] arm:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly version in panic path) introduced crash_smp_send_stop() which is a weak function and can be overriden by architecture codes to fix the side effect caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_ notifiers" option). ARM architecture uses the weak version function and the problem is that the weak function simply calls smp_send_stop() which makes other CPUs offline and takes away the chance to save crash information for nonpanic CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel option is enabled. Calling smp_call_function(machine_crash_nonpanic_core, NULL, false) in the function is useless because all nonpanic CPUs are already offline by smp_send_stop() in this case and smp_call_function() only works against online CPUs. The result is that /proc/vmcore is not available with the error messages; "Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized". crash_smp_send_stop() is implemented for ARM architecture to fix this problem and the function (strong symbol version) saves crash information for nonpanic CPUs using smp_call_function() and machine_crash_shutdown() tries to save crash information for nonpanic CPUs only when crash_kexec_post_notifiers kernel option is disabled. We might be able to implement the function like arm64 or x86 using a dedicated IPI (let's say IPI_CPU_CRASH_STOP), but we cannot implement this function like that because of the lack of IPI slots. Please see the commit e7273ff4 : (ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI) Signed-off-by: Hoeun Ryu --- v3: - remove 'WARN_ON(num_online_cpus() > 1)' in machine_crash_shutdown(). it's a false check for the case when crash_kexec_post_notifiers kernel option is disabled. v2: - calling crash_smp_send_stop() in machine_crash_shutdown() for the case when crash_kexec_post_notifiers kernel option is disabled. - fix commit messages for it. 
 arch/arm/kernel/machine_kexec.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index fe1419e..82ef7c7 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -94,6 +94,34 @@ void machine_crash_nonpanic_core(void *unused)
 		cpu_relax();
 }
 
+void crash_smp_send_stop(void)
+{
+	static int cpus_stopped;
+	unsigned long msecs;
+
+	/*
+	 * This function can be called twice in panic path, but obviously
+	 * we execute this only once.
+	 */
+	if (cpus_stopped)
+		return;
+
+	cpus_stopped = 1;
+
+	if (num_online_cpus() == 1)
+		return;
+
+	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+	smp_call_function(machine_crash_nonpanic_core, NULL, false);
+	msecs = 1000; /* Wait at most a second for the other cpus to stop */
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+		mdelay(1);
+		msecs--;
+	}
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warn("Non-crashing CPUs did not react to IPI\n");
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
 	unsigned int i;
@@ -119,19 +147,9 @@ static void machine_kexec_mask_interrupts(void)
 
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-	unsigned long msecs;
-
 	local_irq_disable();
 
-	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-	smp_call_function(machine_crash_nonpanic_core, NULL, false);
-	msecs = 1000; /* Wait at most a second for the other cpus to stop */
-	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
-		mdelay(1);
-		msecs--;
-	}
-	if (atomic_read(&waiting_for_crash_ipi) > 0)
-		pr_warn("Non-crashing CPUs did not react to IPI\n");
+	crash_smp_send_stop();
 
 	crash_save_cpu(regs, smp_processor_id());
 	machine_kexec_mask_interrupts();
-- 
2.7.4
[PATCHv2] arm64:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
Commit 0ee5941 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path") introduced crash_smp_send_stop(), a weak function that architecture code can override, to fix the side effect caused by commit f06e515 ("kernel/panic.c: add 'crash_kexec_post_notifiers' option").

The arm64 architecture uses the weak version, and the problem is that the weak function simply calls smp_send_stop(), which takes the other CPUs offline and thereby takes away the chance to save crash information for nonpanic CPUs in machine_crash_shutdown() when the crash_kexec_post_notifiers kernel option is enabled. Calling smp_send_crash_stop() in machine_crash_shutdown() is useless in this case because all nonpanic CPUs are already offline by smp_send_stop(), and smp_send_crash_stop() only works against online CPUs. The result is that /proc/vmcore is not available, with the error messages "Warning: Zero PT_NOTE entries found" and "Kdump: vmcore not initialized".

crash_smp_send_stop() is implemented to fix this problem by renaming the existing smp_send_crash_stop() and adding a guard against multiple calls. The function (strong symbol version) saves crash information for nonpanic CPUs, and machine_crash_shutdown() saves crash information for nonpanic CPUs only when the crash_kexec_post_notifiers kernel option is disabled.

* crash_kexec_post_notifiers : false

    panic()
      __crash_kexec()
        machine_crash_shutdown()
          crash_smp_send_stop()    <= saves crash dump for nonpanic cores

* crash_kexec_post_notifiers : true

    panic()
      crash_smp_send_stop()        <= saves crash dump for nonpanic cores
      __crash_kexec()
        machine_crash_shutdown()
          crash_smp_send_stop()    <= just returns

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
v2:
 - replace the existing smp_send_crash_stop() with crash_smp_send_stop()
   and add called-twice logic to it.
 - modify the commit message

 arch/arm64/include/asm/smp.h      |  2 +-
 arch/arm64/kernel/machine_kexec.c |  2 +-
 arch/arm64/kernel/smp.c           | 12 +++++++++++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 55f08c5..f82b447 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -148,7 +148,7 @@ static inline void cpu_panic_kernel(void)
  */
 bool cpus_are_stuck_in_kernel(void);
 
-extern void smp_send_crash_stop(void);
+extern void crash_smp_send_stop(void);
 extern bool smp_crash_stop_failed(void);
 
 #endif /* ifndef __ASSEMBLY__ */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 481f54a..11121f6 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -252,7 +252,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
 	local_irq_disable();
 
 	/* shutdown non-crashing cpus */
-	smp_send_crash_stop();
+	crash_smp_send_stop();
 
 	/* for crashing cpu */
 	crash_save_cpu(regs, smp_processor_id());
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index dc66e6e..73d8f5e 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -977,11 +977,21 @@ void smp_send_stop(void)
 }
 
 #ifdef CONFIG_KEXEC_CORE
-void smp_send_crash_stop(void)
+void crash_smp_send_stop(void)
 {
+	static int cpus_stopped;
 	cpumask_t mask;
 	unsigned long timeout;
 
+	/*
+	 * This function can be called twice in panic path, but obviously
+	 * we execute this only once.
+	 */
+	if (cpus_stopped)
+		return;
+
+	cpus_stopped = 1;
+
 	if (num_online_cpus() == 1)
 		return;
-- 
2.7.4
Re: [PATCH] arm64:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
> On 4 Aug 2017, at 8:43 PM, AKASHI Takahiro <takahiro.aka...@linaro.org> wrote:
>
>> On Fri, Aug 04, 2017 at 11:38:16AM +0100, James Morse wrote:
>> Hi Hoeun,
>>
>>> On 04/08/17 08:02, Hoeun Ryu wrote:
>>> Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
>>> version in panic path) introduced crash_smp_send_stop() which is a weak
>>> function and can be overridden by architecture code to fix the side effect
>>> caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
>>> notifiers" option).
>>
>> If I've understood correctly: if we boot with this option core code doesn't use
>> our machine_crash_shutdown(), and instead calls crash_smp_send_stop(), which we
>
> No. machine_crash_shutdown() is always called, but at that time,
> all the cpus other than the crashing cpu have already died in this case.
>

You're right.

>> don't have, so it uses the default smp_send_stop(), which doesn't save the regs.
>>
>> Thanks for catching this!
>>
>> Could we rename smp_send_crash_stop() crash_smp_send_stop() and add the
>> called-twice logic there? They are similar enough that I'm getting them
>> muddled already!
>
> Nice. I'll reflect it in v2. Thank you for the review.
>
> -Takahiro AKASHI
>
>> Thanks,
>>
>> James
>>
>>> ARM64 architecture uses the weak version function and the problem is that
>>> the weak function simply calls smp_send_stop() which makes other CPUs
>>> offline and takes away the chance to save crash information for nonpanic
>>> CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
>>> option is enabled.
>>>
>>> Calling smp_send_crash_stop() in the function is useless because all
>>> nonpanic CPUs are already offline by smp_send_stop() in this case and
>>> smp_send_crash_stop() only works against online CPUs.
>>>
>>> The result is that /proc/vmcore is not available with the error messages;
>>> "Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".
>>>
>>> crash_smp_send_stop() is implemented for ARM64 architecture to fix this
>>> problem and the function (strong symbol version) saves crash information
>>> for nonpanic CPUs using smp_send_crash_stop() and machine_crash_shutdown()
>>> tries to save crash information for nonpanic CPUs only when
>>> crash_kexec_post_notifiers kernel option is disabled.
Re: [PATCH] arm64:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
> On 4 Aug 2017, at 7:38 PM, James Morse <james.mo...@arm.com> wrote:
>
> Hi Hoeun,
>
>> On 04/08/17 08:02, Hoeun Ryu wrote:
>> Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly
>> version in panic path) introduced crash_smp_send_stop() which is a weak
>> function and can be overridden by architecture code to fix the side effect
>> caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_
>> notifiers" option).
>
> If I've understood correctly: if we boot with this option core code doesn't use
> our machine_crash_shutdown(), and instead calls crash_smp_send_stop(), which we
> don't have, so it uses the default smp_send_stop(), which doesn't save the regs.
>
> Thanks for catching this!
>
> Could we rename smp_send_crash_stop() crash_smp_send_stop() and add the
> called-twice logic there? They are similar enough that I'm getting them
> muddled already!

I think it is possible, I will reflect it in v2. Thank you for the review.

> Thanks,
>
> James
>
>> ARM64 architecture uses the weak version function and the problem is that
>> the weak function simply calls smp_send_stop() which makes other CPUs
>> offline and takes away the chance to save crash information for nonpanic
>> CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel
>> option is enabled.
>>
>> Calling smp_send_crash_stop() in the function is useless because all
>> nonpanic CPUs are already offline by smp_send_stop() in this case and
>> smp_send_crash_stop() only works against online CPUs.
>>
>> The result is that /proc/vmcore is not available with the error messages;
>> "Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized".
>>
>> crash_smp_send_stop() is implemented for ARM64 architecture to fix this
>> problem and the function (strong symbol version) saves crash information
>> for nonpanic CPUs using smp_send_crash_stop() and machine_crash_shutdown()
>> tries to save crash information for nonpanic CPUs only when
>> crash_kexec_post_notifiers kernel option is disabled.
Re: [PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup
On 4 Aug 2017, at 7:04 PM, Robin Murphy <robin.mur...@arm.com> wrote:

>> On 04/08/17 07:07, Hoeun Ryu wrote:
>> Hello, Russell King.
>>
>> The following patch has not been merged yet.
>> Do you have a plan to accept and merge this patch?
>
> This should probably go through the ARM tree, so please submit it to
> Russell's patch-tracking system here:
>
> http://www.armlinux.org.uk/developer/patches/

Thank you for the reply, I'll try it.

> Robin.
>
>> Thank you.
>>
>>> On Mon, 2017-06-12 at 10:47 +0900, Hoeun Ryu wrote:
>>> Reading TTBCR in early boot stage might return the value of the previous
>>> kernel's configuration, especially in case of kexec. For example, if
>>> normal kernel (first kernel) had run on a configuration of PHYS_OFFSET <=
>>> PAGE_OFFSET and crash kernel (second kernel) is running on a configuration
>>> PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the
>>> reserved area for crash kernel, reading TTBCR and using the value to OR
>>> other bit fields might be risky because it doesn't have a reset value for
>>> TTBCR.
>>>
>>> Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
>>> Suggested-by: Robin Murphy <robin.mur...@arm.com>
>>> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
>>>
>>> ---
>>>
>>>  * add Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
>>>  * v1: amended based on
>>>    - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
>>>      PHYS_OFFSET > PAGE_OFFSET"
>>>    - https://lkml.org/lkml/2017/6/5/239
>>>
>>>  arch/arm/mm/proc-v7-3level.S | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
>>> index 5e5720e..7d16bbc 100644
>>> --- a/arch/arm/mm/proc-v7-3level.S
>>> +++ b/arch/arm/mm/proc-v7-3level.S
>>> @@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
>>>  	.macro	v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
>>>  	ldr	\tmp, =swapper_pg_dir		@ swapper_pg_dir virtual address
>>>  	cmp	\ttbr1, \tmp, lsr #12		@ PHYS_OFFSET > PAGE_OFFSET?
>>> -	mrc	p15, 0, \tmp, c2, c0, 2		@ TTB control register
>>> -	orr	\tmp, \tmp, #TTB_EAE
>>> +	mov	\tmp, #TTB_EAE			@ for TTB control register
>>>  	ALT_SMP(orr	\tmp, \tmp, #TTB_FLAGS_SMP)
>>>  	ALT_UP(orr	\tmp, \tmp, #TTB_FLAGS_UP)
>>>  	ALT_SMP(orr	\tmp, \tmp, #TTB_FLAGS_SMP << 16)
[PATCH] arm64:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
Commit 0ee5941 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path") introduced crash_smp_send_stop(), a weak function that architecture code can override, to fix the side effect caused by commit f06e515 ("kernel/panic.c: add 'crash_kexec_post_notifiers' option").

The arm64 architecture uses the weak version, and the problem is that the weak function simply calls smp_send_stop(), which takes the other CPUs offline and thereby takes away the chance to save crash information for nonpanic CPUs in machine_crash_shutdown() when the crash_kexec_post_notifiers kernel option is enabled. Calling smp_send_crash_stop() in that function is useless in this case because all nonpanic CPUs are already offline by smp_send_stop(), and smp_send_crash_stop() only works against online CPUs. The result is that /proc/vmcore is not available, with the error messages "Warning: Zero PT_NOTE entries found" and "Kdump: vmcore not initialized".

crash_smp_send_stop() is implemented for the arm64 architecture to fix this problem: the function (strong symbol version) saves crash information for nonpanic CPUs using smp_send_crash_stop(), and machine_crash_shutdown() saves crash information for nonpanic CPUs only when the crash_kexec_post_notifiers kernel option is disabled.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 arch/arm64/kernel/machine_kexec.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 481f54a..ec55cd8 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -213,6 +213,23 @@ void machine_kexec(struct kimage *kimage)
 	BUG(); /* Should never get here. */
 }
 
+void crash_smp_send_stop(void)
+{
+	static int cpus_stopped;
+
+	/*
+	 * This function can be called twice in panic path, but obviously
+	 * we execute this only once.
+	 */
+	if (cpus_stopped)
+		return;
+
+	/* shutdown non-crashing cpus */
+	smp_send_crash_stop();
+
+	cpus_stopped = 1;
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
 	unsigned int i;
@@ -252,7 +269,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
 	local_irq_disable();
 
 	/* shutdown non-crashing cpus */
-	smp_send_crash_stop();
+	crash_smp_send_stop();
 
 	/* for crashing cpu */
 	crash_save_cpu(regs, smp_processor_id());
-- 
2.7.4
[PATCHv2] arm:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
Commit 0ee5941 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path") introduced crash_smp_send_stop(), a weak function that architecture code can override, to fix the side effect caused by commit f06e515 ("kernel/panic.c: add 'crash_kexec_post_notifiers' option").

The ARM architecture uses the weak version, and the problem is that the weak function simply calls smp_send_stop(), which takes the other CPUs offline and thereby takes away the chance to save crash information for nonpanic CPUs in machine_crash_shutdown() when the crash_kexec_post_notifiers kernel option is enabled. Calling smp_call_function(machine_crash_nonpanic_core, NULL, false) in that function is useless in this case because all nonpanic CPUs are already offline by smp_send_stop(), and smp_call_function() only works against online CPUs. The result is that /proc/vmcore is not available, with the error messages "Warning: Zero PT_NOTE entries found" and "Kdump: vmcore not initialized".

crash_smp_send_stop() is implemented for the ARM architecture to fix this problem: the function (strong symbol version) saves crash information for nonpanic CPUs using smp_call_function(), and machine_crash_shutdown() saves crash information for nonpanic CPUs only when the crash_kexec_post_notifiers kernel option is disabled.

We might be able to implement the function like arm64 or x86 using a dedicated IPI (say, IPI_CPU_CRASH_STOP), but we cannot do so because of the lack of IPI slots. Please see commit e7273ff4 ("ARM: 8488/1: Make IPI_CPU_BACKTRACE a 'non-secure' SGI").

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
v2:
 - call crash_smp_send_stop() in machine_crash_shutdown() for the case
   when the crash_kexec_post_notifiers kernel option is disabled.
 - fix commit messages for it.
 arch/arm/kernel/machine_kexec.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index fe1419e..b58a49a 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -94,6 +94,31 @@ void machine_crash_nonpanic_core(void *unused)
 		cpu_relax();
 }
 
+void crash_smp_send_stop(void)
+{
+	static int cpus_stopped;
+	unsigned long msecs;
+
+	/*
+	 * This function can be called twice in panic path, but obviously
+	 * we execute this only once.
+	 */
+	if (cpus_stopped)
+		return;
+
+	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+	smp_call_function(machine_crash_nonpanic_core, NULL, false);
+	msecs = 1000; /* Wait at most a second for the other cpus to stop */
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+		mdelay(1);
+		msecs--;
+	}
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warn("Non-crashing CPUs did not react to IPI\n");
+
+	cpus_stopped = 1;
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
 	unsigned int i;
@@ -119,19 +144,11 @@ static void machine_kexec_mask_interrupts(void)
 
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-	unsigned long msecs;
+	WARN_ON(num_online_cpus() > 1);
 
 	local_irq_disable();
 
-	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-	smp_call_function(machine_crash_nonpanic_core, NULL, false);
-	msecs = 1000; /* Wait at most a second for the other cpus to stop */
-	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
-		mdelay(1);
-		msecs--;
-	}
-	if (atomic_read(&waiting_for_crash_ipi) > 0)
-		pr_warn("Non-crashing CPUs did not react to IPI\n");
+	crash_smp_send_stop();
 
 	crash_save_cpu(regs, smp_processor_id());
 	machine_kexec_mask_interrupts();
-- 
2.7.4
[PATCHv2] arm:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
Commit 0ee5941 : (x86/panic: replace smp_send_stop() with kdump friendly version in panic path) introduced crash_smp_send_stop() which is a weak function and can be overriden by architecture codes to fix the side effect caused by commit f06e515 : (kernel/panic.c: add "crash_kexec_post_ notifiers" option). ARM architecture uses the weak version function and the problem is that the weak function simply calls smp_send_stop() which makes other CPUs offline and takes away the chance to save crash information for nonpanic CPUs in machine_crash_shutdown() when crash_kexec_post_notifiers kernel option is enabled. Calling smp_call_function(machine_crash_nonpanic_core, NULL, false) in the function is useless because all nonpanic CPUs are already offline by smp_send_stop() in this case and smp_call_function() only works against online CPUs. The result is that /proc/vmcore is not available with the error messages; "Warning: Zero PT_NOTE entries found", "Kdump: vmcore not initialized". crash_smp_send_stop() is implemented for ARM architecture to fix this problem and the function (strong symbol version) saves crash information for nonpanic CPUs using smp_call_function() and machine_crash_shutdown() tries to save crash information for nonpanic CPUs only when crash_kexec_post_notifiers kernel option is disabled. We might be able to implement the function like arm64 or x86 using a dedicated IPI (let's say IPI_CPU_CRASH_STOP), but we cannot implement this function like that because of the lack of IPI slots. Please see the commit e7273ff4 : (ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI) Signed-off-by: Hoeun Ryu --- v2: - calling crash_smp_send_stop() in machine_crash_shutdown() for the case when crash_kexec_post_notifiers kernel option is disabled. - fix commit messages for it. 
 arch/arm/kernel/machine_kexec.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index fe1419e..b58a49a 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -94,6 +94,31 @@ void machine_crash_nonpanic_core(void *unused)
 		cpu_relax();
 }
 
+void crash_smp_send_stop(void)
+{
+	static int cpus_stopped;
+	unsigned long msecs;
+
+	/*
+	 * This function can be called twice in panic path, but obviously
+	 * we execute this only once.
+	 */
+	if (cpus_stopped)
+		return;
+
+	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+	smp_call_function(machine_crash_nonpanic_core, NULL, false);
+	msecs = 1000; /* Wait at most a second for the other cpus to stop */
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+		mdelay(1);
+		msecs--;
+	}
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warn("Non-crashing CPUs did not react to IPI\n");
+
+	cpus_stopped = 1;
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
 	unsigned int i;
@@ -119,19 +144,11 @@ static void machine_kexec_mask_interrupts(void)
 
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-	unsigned long msecs;
+	WARN_ON(num_online_cpus() > 1);
 
 	local_irq_disable();
 
-	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-	smp_call_function(machine_crash_nonpanic_core, NULL, false);
-	msecs = 1000; /* Wait at most a second for the other cpus to stop */
-	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
-		mdelay(1);
-		msecs--;
-	}
-	if (atomic_read(&waiting_for_crash_ipi) > 0)
-		pr_warn("Non-crashing CPUs did not react to IPI\n");
+	crash_smp_send_stop();
 
 	crash_save_cpu(regs, smp_processor_id());
 	machine_kexec_mask_interrupts();
-- 
2.7.4
Re: [PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup
Hello, Russell King.

The following patch has not been merged yet. Do you have a plan to accept and merge this patch? Thank you.

On Mon, 2017-06-12 at 10:47 +0900, Hoeun Ryu wrote:
> Reading TTBCR in early boot stage might return the value of the previous
> kernel's configuration, especially in case of kexec. For example, if
> normal kernel (first kernel) had run on a configuration of PHYS_OFFSET <=
> PAGE_OFFSET and crash kernel (second kernel) is running on a configuration
> PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the
> reserved area for crash kernel, reading TTBCR and using the value to OR
> other bit fields might be risky because it doesn't have a reset value for
> TTBCR.
>
> Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
> Suggested-by: Robin Murphy <robin.mur...@arm.com>
> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
>
> ---
>
>  * add Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
>  * v1: amended based on
>    - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
>      PHYS_OFFSET > PAGE_OFFSET"
>    - https://lkml.org/lkml/2017/6/5/239
>
>  arch/arm/mm/proc-v7-3level.S | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> index 5e5720e..7d16bbc 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
>  	.macro	v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
>  	ldr	\tmp, =swapper_pg_dir		@ swapper_pg_dir virtual address
>  	cmp	\ttbr1, \tmp, lsr #12		@ PHYS_OFFSET > PAGE_OFFSET?
> -	mrc	p15, 0, \tmp, c2, c0, 2		@ TTB control register
> -	orr	\tmp, \tmp, #TTB_EAE
> +	mov	\tmp, #TTB_EAE			@ for TTB control register
>  	ALT_SMP(orr	\tmp, \tmp, #TTB_FLAGS_SMP)
>  	ALT_UP(orr	\tmp, \tmp, #TTB_FLAGS_UP)
>  	ALT_SMP(orr	\tmp, \tmp, #TTB_FLAGS_SMP << 16)
[PATCH] arm:kexec: have own crash_smp_send_stop() for crash dump for nonpanic cores
Commit 0ee5941 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path") introduced crash_smp_send_stop(), a weak function that can be overridden by architecture code, to fix the side effect caused by commit f06e515 ("kernel/panic.c: add "crash_kexec_post_notifiers" option").

The ARM architecture uses the weak version, and the problem is that it simply calls smp_send_stop(), which takes the other CPUs offline and so removes the chance to save crash information for non-panic CPUs in machine_crash_shutdown() when the crash_kexec_post_notifiers kernel option is enabled. Calling smp_call_function(machine_crash_nonpanic_core, NULL, false) in machine_crash_shutdown() is then useless, because all non-panic CPUs are already offline after smp_send_stop() and smp_call_function() works only against online CPUs. The result is that /proc/vmcore is not available, with the error messages "Warning: Zero PT_NOTE entries found" and "Kdump: vmcore not initialized".

Implement crash_smp_send_stop() for the ARM architecture (the strong symbol version) to fix this problem: it saves crash information for non-panic CPUs using smp_call_function(), and machine_crash_shutdown() saves crash information only for the panic CPU.

We might be able to implement the function like arm64 or x86 using a dedicated IPI (say, IPI_CPU_CRASH_STOP), but we cannot because of the lack of IPI slots.
Please see commit e7273ff4 ("ARM: 8488/1: Make IPI_CPU_BACKTRACE a "non-secure" SGI").

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
 arch/arm/kernel/machine_kexec.c | 37 ++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index fe1419e..18d084a 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -94,6 +94,31 @@ void machine_crash_nonpanic_core(void *unused)
 		cpu_relax();
 }
 
+void crash_smp_send_stop(void)
+{
+	static int cpus_stopped;
+	unsigned long msecs;
+
+	/*
+	 * This function can be called twice in panic path, but obviously
+	 * we execute this only once.
+	 */
+	if (cpus_stopped)
+		return;
+
+	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+	smp_call_function(machine_crash_nonpanic_core, NULL, false);
+	msecs = 1000; /* Wait at most a second for the other cpus to stop */
+	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+		mdelay(1);
+		msecs--;
+	}
+	if (atomic_read(&waiting_for_crash_ipi) > 0)
+		pr_warn("Non-crashing CPUs did not react to IPI\n");
+
+	cpus_stopped = 1;
+}
+
 static void machine_kexec_mask_interrupts(void)
 {
 	unsigned int i;
@@ -119,20 +144,10 @@ static void machine_kexec_mask_interrupts(void)
 
 void machine_crash_shutdown(struct pt_regs *regs)
 {
-	unsigned long msecs;
+	WARN_ON(num_online_cpus() > 1);
 
 	local_irq_disable();
 
-	atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
-	smp_call_function(machine_crash_nonpanic_core, NULL, false);
-	msecs = 1000; /* Wait at most a second for the other cpus to stop */
-	while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
-		mdelay(1);
-		msecs--;
-	}
-	if (atomic_read(&waiting_for_crash_ipi) > 0)
-		pr_warn("Non-crashing CPUs did not react to IPI\n");
-
 	crash_save_cpu(regs, smp_processor_id());
 	machine_kexec_mask_interrupts();
-- 
2.7.4
Re: [PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup
Hello, Russell King.

Do you have a plan to include this patch in your tree? Thank you.

On Mon, 2017-06-12 at 10:47 +0900, Hoeun Ryu wrote:
> Reading TTBCR in early boot stage might return the value of the previous
> kernel's configuration, especially in case of kexec. For example, if
> normal kernel (first kernel) had run on a configuration of PHYS_OFFSET <=
> PAGE_OFFSET and crash kernel (second kernel) is running on a configuration
> PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the
> reserved area for crash kernel, reading TTBCR and using the value to OR
> other bit fields might be risky because it doesn't have a reset value for
> TTBCR.
>
> Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
> Suggested-by: Robin Murphy <robin.mur...@arm.com>
> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
>
> ---
>
>  * add Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
>  * v1: amended based on
>    - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
>      PHYS_OFFSET > PAGE_OFFSET"
>    - https://lkml.org/lkml/2017/6/5/239
>
>  arch/arm/mm/proc-v7-3level.S | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> index 5e5720e..7d16bbc 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
>  	.macro	v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
>  	ldr	\tmp, =swapper_pg_dir		@ swapper_pg_dir virtual address
>  	cmp	\ttbr1, \tmp, lsr #12		@ PHYS_OFFSET > PAGE_OFFSET?
> -	mrc	p15, 0, \tmp, c2, c0, 2		@ TTB control register
> -	orr	\tmp, \tmp, #TTB_EAE
> +	mov	\tmp, #TTB_EAE			@ for TTB control register
>  	ALT_SMP(orr	\tmp, \tmp, #TTB_FLAGS_SMP)
>  	ALT_UP(orr	\tmp, \tmp, #TTB_FLAGS_UP)
>  	ALT_SMP(orr	\tmp, \tmp, #TTB_FLAGS_SMP << 16)
Re: [RESEND PATCH 1/2] arm:omap2+: put omap_uart_phys/virt/lsr in .text section when ZIMAGE is true
On Tue, 2017-06-13 at 22:27 -0700, Tony Lindgren wrote:
> Hi,
>
> * Hoeun Ryu <hoeun@gmail.com> [170612 18:18]:
> >
> > --- a/arch/arm/include/debug/omap2plus.S
> > +++ b/arch/arm/include/debug/omap2plus.S
> > @@ -58,11 +58,22 @@
> >
> >  #define UART_OFFSET(addr)	((addr) & 0x00ff)
> >
> > +/*
> > + * Definition of ZIMAGE is in arch/arm/boot/compressed/Makefile.
> > + * Place the following block in .text section only when this file is
> > + * included by arch/arm/boot/compressed/* to make it possible to
> > + * enable CONFIG_DEBUG_UNCOMPRESS and DEBUG in arch/arm/boot/compressed/head.S
> > + * on OMAP2+ SoCs.
> > + */
> > +#ifndef ZIMAGE
> >  	.pushsection .data
> > +#endif
> >  omap_uart_phys:	.word	0
> >  omap_uart_virt:	.word	0
> >  omap_uart_lsr:	.word	0
> > +#ifndef ZIMAGE
> >  	.popsection
> > +#endif
>
> So I converted all these to use the 8250 debug_ll yesterday,
> which should solve the DEBUG_UNCOMPRESS issue for you and
> allows us to remove this file. Will post the series shortly
> for testing with you in Cc after I've done a bit more testing
> here.

It sounds good. Patch 01 is dropped, then; are you going to use the second
patch, "[PATCH 2/2] arm:omap2+: drop dependence on DEBUG_OMAP2PLUS_UART for
DEBUG_UNCOMPRESS"?

Thank you.

> Regards,
>
> Tony
[RESEND PATCH 2/2] arm:omap2+: drop dependence on DEBUG_OMAP2PLUS_UART for DEBUG_UNCOMPRESS
With patch 9a5151e1d1ba4df41bc4b8e8c0f17163247645ac, "arm:omap2+: put omap_uart_phys/virt/lsr in .text section when ZIMAGE is true", the dependence on DEBUG_OMAP2PLUS_UART for DEBUG_UNCOMPRESS can be dropped, because all code of omap2+'s DEBUG_LL_INCLUDE file is in the .text section when it is included in the decompressor.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---

 * mail to relevant recipients; no response yet from them.
   - add TO=Tony Lindgren <t...@atomide.com>
   - add CC=linux-o...@vger.kernel.org
 * identical to the previous patch

 arch/arm/Kconfig.debug | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index ba2cb63..52eb0bf 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -1735,8 +1735,7 @@ config DEBUG_UART_8250_FLOW_CONTROL
 config DEBUG_UNCOMPRESS
 	bool
 	depends on ARCH_MULTIPLATFORM || PLAT_SAMSUNG || ARM_SINGLE_ARMV7M
-	default y if DEBUG_LL && !DEBUG_OMAP2PLUS_UART && \
-		     (!DEBUG_TEGRA_UART || !ZBOOT_ROM) && \
+	default y if DEBUG_LL && (!DEBUG_TEGRA_UART || !ZBOOT_ROM) && \
 		     !DEBUG_BRCMSTB_UART
 	help
 	  This option influences the normal decompressor output for
-- 
2.7.4
[RESEND PATCH 1/2] arm:omap2+: put omap_uart_phys/virt/lsr in .text section when ZIMAGE is true
omap_uart_phys, omap_uart_virt and omap_uart_lsr reside in the .data section, and that is the right implementation. But because of this, we cannot enable CONFIG_DEBUG_UNCOMPRESS. DEBUG_LL and DEBUG_UNCOMPRESS are very useful tools for debugging the early boot stage when something goes wrong, if you don't have any hardware-based debugging tools like a JTAG debugger.

This patch puts the variables into the .text section instead, only when the DEBUG_LL_INCLUDE file is included in the kernel decompressor, which is only when ZIMAGE is defined. This patch does not change anything when DEBUG_LL_INCLUDE is included in other kernel parts like arch/arm/kernel/*.

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---

 * mail to relevant recipients; no response yet from them.
   - TO=Tony Lindgren <t...@atomide.com>
   - CC=linux-o...@vger.kernel.org
 * identical to the previous patch

 arch/arm/include/debug/omap2plus.S | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm/include/debug/omap2plus.S b/arch/arm/include/debug/omap2plus.S
index 6d867ae..6ce6ef9 100644
--- a/arch/arm/include/debug/omap2plus.S
+++ b/arch/arm/include/debug/omap2plus.S
@@ -58,11 +58,22 @@
 
 #define UART_OFFSET(addr)	((addr) & 0x00ff)
 
+/*
+ * Definition of ZIMAGE is in arch/arm/boot/compressed/Makefile.
+ * Place the following block in .text section only when this file is
+ * included by arch/arm/boot/compressed/* to make it possible to
+ * enable CONFIG_DEBUG_UNCOMPRESS and DEBUG in arch/arm/boot/compressed/head.S
+ * on OMAP2+ SoCs.
+ */
+#ifndef ZIMAGE
 	.pushsection .data
+#endif
 omap_uart_phys:	.word	0
 omap_uart_virt:	.word	0
 omap_uart_lsr:	.word	0
+#ifndef ZIMAGE
 	.popsection
+#endif
 
 	.macro	addruart, rp, rv, tmp
-- 
2.7.4
[PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup
Reading TTBCR in the early boot stage might return the value of the previous kernel's configuration, especially in the case of kexec. For example, if the normal kernel (first kernel) had run on a configuration of PHYS_OFFSET <= PAGE_OFFSET and the crash kernel (second kernel) is running on a configuration of PHYS_OFFSET > PAGE_OFFSET — which can happen because it depends on the reserved area for the crash kernel — then reading TTBCR and ORing other bit fields into that value is risky, because TTBCR does not have a reset value.

Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
Suggested-by: Robin Murphy <robin.mur...@arm.com>
Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---

 * add Acked-by: Russell King <rmk+ker...@armlinux.org.uk>
 * v1: amended based on
   - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
     PHYS_OFFSET > PAGE_OFFSET"
   - https://lkml.org/lkml/2017/6/5/239

 arch/arm/mm/proc-v7-3level.S | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 5e5720e..7d16bbc 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
 	.macro	v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
 	ldr	\tmp, =swapper_pg_dir		@ swapper_pg_dir virtual address
 	cmp	\ttbr1, \tmp, lsr #12		@ PHYS_OFFSET > PAGE_OFFSET?
-	mrc	p15, 0, \tmp, c2, c0, 2		@ TTB control register
-	orr	\tmp, \tmp, #TTB_EAE
+	mov	\tmp, #TTB_EAE			@ for TTB control register
 	ALT_SMP(orr	\tmp, \tmp, #TTB_FLAGS_SMP)
 	ALT_UP(orr	\tmp, \tmp, #TTB_FLAGS_UP)
 	ALT_SMP(orr	\tmp, \tmp, #TTB_FLAGS_SMP << 16)
-- 
2.7.4
Re: [PATCH 1/2] arm:omap2+: put omap_uart_phys/virt/lsr in .text section when ZIMAGE is true
Hello, Russell.

Would you please review this patch? Thank you.

> On Jun 8, 2017, at 11:16 AM, Hoeun Ryu <hoeun@gmail.com> wrote:
>
> omap_uart_phys, omap_uart_virt and omap_uart_lsr reside in .data section
> and it's right implementation. But because of this, we cannot enable
> CONFIG_DEBUG_UNCOMPRESS. DEBUG_LL and DEBUG_UNCOMPRESS are very useful tools
> for debugging early boot stage when something goes wrong if you don't have
> any hardware based debugging tools like a JTAG debugger.
> This patch is to put the variables into .text section instead only when the
> DEBUG_LL_INCLUDE file is included in the kernel decompressor, which is only
> when ZIMAGE is defined.
> This patch does not change anything when DEBUG_LL_INCLUDE is included in
> the other kernel parts like arch/arm/kernel/*
>
> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
> ---
>  arch/arm/include/debug/omap2plus.S | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/arch/arm/include/debug/omap2plus.S
> b/arch/arm/include/debug/omap2plus.S
> index 6d867ae..6ce6ef9 100644
> --- a/arch/arm/include/debug/omap2plus.S
> +++ b/arch/arm/include/debug/omap2plus.S
> @@ -58,11 +58,22 @@
>
>  #define UART_OFFSET(addr)	((addr) & 0x00ff)
>
> +/*
> + * Definition of ZIMAGE is in arch/arm/boot/compressed/Makefile.
> + * Place the following block in .text section only when this file is
> + * included by arch/arm/boot/compressed/* to make it possible to
> + * enable CONFIG_DEBUG_UNCOMPRESS and DEBUG in
> arch/arm/boot/compressed/head.S
> + * on OMAP2+ SoCs.
> + */
> +#ifndef ZIMAGE
> 	.pushsection .data
> +#endif
> omap_uart_phys:	.word	0
> omap_uart_virt:	.word	0
> omap_uart_lsr:	.word	0
> +#ifndef ZIMAGE
> 	.popsection
> +#endif
>
> 	.macro	addruart, rp, rv, tmp
>
> --
> 2.7.4
>
Re: [PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup
Hello, Russell and Robin.

Would you please review this patch? Thank you.

> On Jun 7, 2017, at 11:39 AM, Hoeun Ryu <hoeun@gmail.com> wrote:
>
> Reading TTBCR in early boot stage might return the value of the previous
> kernel's configuration, especially in case of kexec. For example, if
> normal kernel (first kernel) had run on a configuration of PHYS_OFFSET <=
> PAGE_OFFSET and crash kernel (second kernel) is running on a configuration
> PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the
> reserved area for crash kernel, reading TTBCR and using the value to OR
> other bit fields might be risky because it doesn't have a reset value for
> TTBCR.
>
> Suggested-by: Robin Murphy <robin.mur...@arm.com>
> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
> ---
>
> * v1: amended based on
>   - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when
>     PHYS_OFFSET > PAGE_OFFSET"
>   - https://lkml.org/lkml/2017/6/5/239
>
> arch/arm/mm/proc-v7-3level.S | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> index 5e5720e..7d16bbc 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext)
> 	.macro	v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
> 	ldr	\tmp, =swapper_pg_dir		@ swapper_pg_dir virtual address
> 	cmp	\ttbr1, \tmp, lsr #12		@ PHYS_OFFSET > PAGE_OFFSET?
> -	mrc	p15, 0, \tmp, c2, c0, 2		@ TTB control register
> -	orr	\tmp, \tmp, #TTB_EAE
> +	mov	\tmp, #TTB_EAE			@ for TTB control register
> 	ALT_SMP(orr	\tmp, \tmp, #TTB_FLAGS_SMP)
> 	ALT_UP(orr	\tmp, \tmp, #TTB_FLAGS_UP)
> 	ALT_SMP(orr	\tmp, \tmp, #TTB_FLAGS_SMP << 16)
> --
> 2.7.4
>
Re: [PATCH 1/2] arm:omap2+: put omap_uart_phys/virt/lsr in .text section when ZIMAGE is true
Hello, Russell. Would you please review this patch ? Than you > On Jun 8, 2017, at 11:16 AM, Hoeun Ryu wrote: > > omap_uart_phys, omap_uart_virt and omap_uart_lsr reside in .data section > and it's right implementation. But because of this, we cannot enable > CONFIG_DEBUG_UNCOMPRESS. LL_DEBUG and DEBUG_UNCOMPRESS are very useful tools > for debugging early boot stage when something goes wrong if you don't have > any hardware based debugging tools like a JTAG debugger. > This patch is to put the variables into .text section instead only when the > DEBUG_LL_INCLUDE file is included in the kernel decompressor, which is only > when ZIMAGE is defined. > This patch does not change anything when DEBUG_LL_INCLUDE is included in > the other kernel parts like arch/arm/kernel/* > > Signed-off-by: Hoeun Ryu > --- > arch/arm/include/debug/omap2plus.S | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/arch/arm/include/debug/omap2plus.S > b/arch/arm/include/debug/omap2plus.S > index 6d867ae..6ce6ef9 100644 > --- a/arch/arm/include/debug/omap2plus.S > +++ b/arch/arm/include/debug/omap2plus.S > @@ -58,11 +58,22 @@ > > #define UART_OFFSET(addr)((addr) & 0x00ff) > > +/* > + * Definition of ZIMAGE is in arch/arm/boot/compressed/Makefile. > + * Place the following block in .text section only when this file is > + * included by arch/arm/boot/compressed/* to make it possible to > + * enable CONFIG_DEBUG_UNCOMPRESS and DEBUG in > arch/arm/boot/compressed/head.S > + * on OMAP2+ SoCs. > + */ > +#ifndef ZIMAGE >.pushsection .data > +#endif > omap_uart_phys:.word0 > omap_uart_virt:.word0 > omap_uart_lsr:.word0 > +#ifndef ZIMAGE >.popsection > +#endif > >.macroaddruart, rp, rv, tmp > > -- > 2.7.4 >
[PATCH 2/2] arm:omap2+: drop dependence on DEBUG_OMAP2PLUS_UART for DEBUG_UNCOMPRESS
With commit 9a5151e1d1ba4df41bc4b8e8c0f17163247645ac ("arm:omap2+: put omap_uart_phys/virt/lsr in .text section when ZIMAGE is true"), the dependence on DEBUG_OMAP2PLUS_UART for DEBUG_UNCOMPRESS can be dropped, because all of the code in omap2+'s DEBUG_LL_INCLUDE file is in the .text section when it is included in the decompressor. Signed-off-by: Hoeun Ryu <hoeun@gmail.com> --- arch/arm/Kconfig.debug | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug index ba2cb63..52eb0bf 100644 --- a/arch/arm/Kconfig.debug +++ b/arch/arm/Kconfig.debug @@ -1735,8 +1735,7 @@ config DEBUG_UART_8250_FLOW_CONTROL config DEBUG_UNCOMPRESS bool depends on ARCH_MULTIPLATFORM || PLAT_SAMSUNG || ARM_SINGLE_ARMV7M - default y if DEBUG_LL && !DEBUG_OMAP2PLUS_UART && \ -(!DEBUG_TEGRA_UART || !ZBOOT_ROM) && \ + default y if DEBUG_LL && (!DEBUG_TEGRA_UART || !ZBOOT_ROM) && \ !DEBUG_BRCMSTB_UART help This option influences the normal decompressor output for -- 2.7.4
[PATCH 1/2] arm:omap2+: put omap_uart_phys/virt/lsr in .text section when ZIMAGE is true
omap_uart_phys, omap_uart_virt and omap_uart_lsr reside in the .data section, and that is the right implementation. But because of this, we cannot enable CONFIG_DEBUG_UNCOMPRESS. DEBUG_LL and DEBUG_UNCOMPRESS are very useful tools for debugging the early boot stage when something goes wrong and you don't have any hardware-based debugging tools like a JTAG debugger. This patch puts the variables into the .text section instead, but only when the DEBUG_LL_INCLUDE file is included in the kernel decompressor, which is only when ZIMAGE is defined. This patch does not change anything when DEBUG_LL_INCLUDE is included in the other kernel parts like arch/arm/kernel/*. Signed-off-by: Hoeun Ryu <hoeun@gmail.com> --- arch/arm/include/debug/omap2plus.S | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/arm/include/debug/omap2plus.S b/arch/arm/include/debug/omap2plus.S index 6d867ae..6ce6ef9 100644 --- a/arch/arm/include/debug/omap2plus.S +++ b/arch/arm/include/debug/omap2plus.S @@ -58,11 +58,22 @@ #define UART_OFFSET(addr) ((addr) & 0x00ff) +/* + * Definition of ZIMAGE is in arch/arm/boot/compressed/Makefile. + * Place the following block in .text section only when this file is + * included by arch/arm/boot/compressed/* to make it possible to + * enable CONFIG_DEBUG_UNCOMPRESS and DEBUG in arch/arm/boot/compressed/head.S + * on OMAP2+ SoCs. + */ +#ifndef ZIMAGE .pushsection .data +#endif omap_uart_phys: .word 0 omap_uart_virt: .word 0 omap_uart_lsr: .word 0 +#ifndef ZIMAGE .popsection +#endif .macro addruart, rp, rv, tmp -- 2.7.4
[PATCH] arm:lpae: build TTB control register value from scratch in v7_ttb_setup
Reading TTBCR in the early boot stage might return the value of the previous kernel's configuration, especially in the case of kexec. For example, if the normal kernel (first kernel) ran on a configuration of PHYS_OFFSET <= PAGE_OFFSET and the crash kernel (second kernel) is running on a configuration of PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the area reserved for the crash kernel, reading TTBCR and ORing other bit fields into the value is risky because TTBCR has no reset value. Suggested-by: Robin Murphy <robin.mur...@arm.com> Signed-off-by: Hoeun Ryu <hoeun@gmail.com> --- * v1: amended based on - "[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when PHYS_OFFSET > PAGE_OFFSET" - https://lkml.org/lkml/2017/6/5/239 arch/arm/mm/proc-v7-3level.S | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S index 5e5720e..7d16bbc 100644 --- a/arch/arm/mm/proc-v7-3level.S +++ b/arch/arm/mm/proc-v7-3level.S @@ -129,8 +129,7 @@ ENDPROC(cpu_v7_set_pte_ext) .macro v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp ldr \tmp, =swapper_pg_dir @ swapper_pg_dir virtual address cmp \ttbr1, \tmp, lsr #12 @ PHYS_OFFSET > PAGE_OFFSET? - mrc p15, 0, \tmp, c2, c0, 2 @ TTB control register - orr \tmp, \tmp, #TTB_EAE + mov \tmp, #TTB_EAE @ for TTB control register ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP) ALT_UP(orr \tmp, \tmp, #TTB_FLAGS_UP) ALT_SMP(orr \tmp, \tmp, #TTB_FLAGS_SMP << 16) -- 2.7.4
Re: [PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when PHYS_OFFSET > PAGE_OFFSET
>> On Jun 5, 2017, at 7:30 PM, Robin Murphy <robin.mur...@arm.com> wrote: >> >> On 05/06/17 11:06, Hoeun Ryu wrote: >> Clearing TTBCR.T1SZ explicitly when kernel runs on a configuration of >> PHYS_OFFSET > PAGE_OFFSET. >> Reading TTBCR in early boot stage might return the value of the >> previous kernel's configuration, especially in case of kexec. For >> example, if normal kernel (first kernel) had run on a configuration >> of PHYS_OFFSET <= PAGE_OFFSET and crash kernel (second kernel) is >> running on a configuration PHYS_OFFSET > PAGE_OFFSET, which can happen >> because it depends on the reserved area for crash kernel, reading >> TTBCR and using the value without clearing TTBCR.T1SZ might be risky >> because the value doesn't have a reset value for TTBCR.T1SZ. > > Hmm, but isn't that also strictly true of all the other fields we're > ORing TTB_FLAGS_* into? Given that this is initial setup, I wonder > whether we need to care about the previous TTBCR value at all - is there > any reason for not simply building up the value we want from scratch? Your idea seems better. I'll send a new patch to build TTBCR from scratch without reading it for the initial value. Russell, do you have any opinion? Thank you for the review. > > Robin. > >> Signed-off-by: Hoeun Ryu <hoeun@gmail.com> >> --- >> arch/arm/mm/proc-v7-3level.S | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S >> index 5e5720e..81404b8 100644 >> --- a/arch/arm/mm/proc-v7-3level.S >> +++ b/arch/arm/mm/proc-v7-3level.S >> @@ -140,6 +140,7 @@ ENDPROC(cpu_v7_set_pte_ext) >> * otherwise booting secondary CPUs would end up using TTBR1 for the >> * identity mapping set up in TTBR0. >> */ >> + bichi \tmp, \tmp, #(7 << 16) @ clear TTBCR.T1SZ >> orrls \tmp, \tmp, #TTBR1_SIZE @ TTBCR.T1SZ >> mcr p15, 0, \tmp, c2, c0, 2 @ TTBCR >> mov \tmp, \ttbr1, lsr #20 >
[PATCHv2] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when PHYS_OFFSET > PAGE_OFFSET
Clear TTBCR.T1SZ explicitly when the kernel runs on a configuration of PHYS_OFFSET > PAGE_OFFSET. Reading TTBCR in the early boot stage might return the value of the previous kernel's configuration, especially in the case of kexec. For example, if the normal kernel (first kernel) ran on a configuration of PHYS_OFFSET <= PAGE_OFFSET and the crash kernel (second kernel) is running on a configuration of PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the area reserved for the crash kernel, using the TTBCR value without clearing TTBCR.T1SZ is risky because TTBCR.T1SZ has no reset value. Signed-off-by: Hoeun Ryu <hoeun@gmail.com> --- arch/arm/mm/proc-v7-3level.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S index 5e5720e..81404b8 100644 --- a/arch/arm/mm/proc-v7-3level.S +++ b/arch/arm/mm/proc-v7-3level.S @@ -140,6 +140,7 @@ ENDPROC(cpu_v7_set_pte_ext) * otherwise booting secondary CPUs would end up using TTBR1 for the * identity mapping set up in TTBR0. */ + bichi \tmp, \tmp, #(7 << 16) @ clear TTBCR.T1SZ orrls \tmp, \tmp, #TTBR1_SIZE @ TTBCR.T1SZ mcr p15, 0, \tmp, c2, c0, 2 @ TTBCR mov \tmp, \ttbr1, lsr #20 -- 2.7.4
Re: [PATCH] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when PHYS_OFFSET > PAGE_OFFSET
On Mon, Jun 5, 2017 at 6:34 PM, Russell King - ARM Linux <li...@armlinux.org.uk> wrote: > On Mon, Jun 05, 2017 at 06:22:20PM +0900, Hoeun Ryu wrote: >> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S >> index 5e5720e..9ac2bec 100644 >> --- a/arch/arm/mm/proc-v7-3level.S >> +++ b/arch/arm/mm/proc-v7-3level.S >> @@ -140,6 +140,7 @@ ENDPROC(cpu_v7_set_pte_ext) >> * otherwise booting secondary CPUs would end up using TTBR1 for the >> * identity mapping set up in TTBR0. >> */ >> + bichi \tmp, \tmp, #(1 << 16) @ clear >> TTBCR.T1SZ > > This looks insufficient. There's two bits here: > > * TTBR0/TTBR1 split (PAGE_OFFSET): > * 0x4000: T0SZ = 2, T1SZ = 0 (not used) > * 0x8000: T0SZ = 0, T1SZ = 1 > * 0xc000: T0SZ = 0, T1SZ = 2 > > but you seem to only be clearing one bit. Oh, I'm sorry for the mistake; I'll fix this to use #(7 << 16) in v2. (There are 3 bits in TxSZ.) Thank you for the review. > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net.
[PATCH] arm: LPAE: kexec: clear TTBCR.T1SZ explicitly when PHYS_OFFSET > PAGE_OFFSET
Clear TTBCR.T1SZ explicitly when the kernel runs on a configuration of PHYS_OFFSET > PAGE_OFFSET. Reading TTBCR in the early boot stage might return the value of the previous kernel's configuration, especially in the case of kexec. For example, if the normal kernel (first kernel) ran on a configuration of PHYS_OFFSET <= PAGE_OFFSET and the crash kernel (second kernel) is running on a configuration of PHYS_OFFSET > PAGE_OFFSET, which can happen because it depends on the area reserved for the crash kernel, using the TTBCR value without clearing TTBCR.T1SZ is risky because TTBCR.T1SZ has no reset value. Signed-off-by: Hoeun Ryu <hoeun@gmail.com> --- arch/arm/mm/proc-v7-3level.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S index 5e5720e..9ac2bec 100644 --- a/arch/arm/mm/proc-v7-3level.S +++ b/arch/arm/mm/proc-v7-3level.S @@ -140,6 +140,7 @@ ENDPROC(cpu_v7_set_pte_ext) * otherwise booting secondary CPUs would end up using TTBR1 for the * identity mapping set up in TTBR0. */ + bichi \tmp, \tmp, #(1 << 16) @ clear TTBCR.T1SZ orrls \tmp, \tmp, #TTBR1_SIZE @ TTBCR.T1SZ mcr p15, 0, \tmp, c2, c0, 2 @ TTBCR mov \tmp, \ttbr1, lsr #20 -- 2.7.4
Re: [PATCH v2] mm: add VM_STATIC flag to vmalloc and prevent from removing the areas
> On Apr 18, 2017, at 3:59 PM, Michal Hocko <mho...@kernel.org> wrote: > >> On Tue 18-04-17 14:48:39, Hoeun Ryu wrote: >> vm_area_add_early/vm_area_register_early() are used to reserve vmalloc area >> during boot process and those virtually mapped areas are never unmapped. >> So `OR` VM_STATIC flag to the areas in vmalloc_init() when importing >> existing vmlist entries and prevent those areas from being removed from the >> rbtree by accident. > > Has this been a problem in the past or currently so that it is worth > handling? > >> This flag can also be used by other vmalloc APIs to >> specify that the area will never go away. > > Do we have a user for that? > >> This makes remove_vm_area() more robust against other kinds of errors (e.g. >> programming errors). > > Well, yes it will help to prevent from vfree(early_mem) but we have 4 > users of vm_area_register_early so I am really wondering whether this is > worth additional code. It would really help to understand your > motivation for the patch if we were explicit about the problem you are > trying to solve. I just think that it would be good to make it robust against various kinds of errors. You might think that's not enough of a reason to do so though. > > Thanks > > -- > Michal Hocko > SUSE Labs
[PATCH v2] mm: add VM_STATIC flag to vmalloc and prevent from removing the areas
vm_area_add_early/vm_area_register_early() are used to reserve vmalloc areas during the boot process, and those virtually mapped areas are never unmapped. So OR the VM_STATIC flag into the areas in vmalloc_init() when importing existing vmlist entries, and prevent those areas from being removed from the rbtree by accident. This flag can also be used by other vmalloc APIs to specify that the area will never go away. This makes remove_vm_area() more robust against other kinds of errors (e.g. programming errors). Signed-off-by: Hoeun Ryu <hoeun@gmail.com> --- v2: - update changelog - add description to VM_STATIC - check VM_STATIC first before VM_VM_AREA in remove_vm_area() include/linux/vmalloc.h | 7 +++++++ mm/vmalloc.c | 9 ++++++--- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 46991ad..a42c210 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -19,6 +19,13 @@ struct notifier_block; /* in notifier.h */ #define VM_UNINITIALIZED 0x0020 /* vm_struct is not fully initialized */ #define VM_NO_GUARD 0x0040 /* don't add guard page */ #define VM_KASAN 0x0080 /* has allocated kasan shadow memory */ +/* + * static area, vmap_area will never go away. + * This flag is OR-ed automatically in vmalloc_init() if the area is inserted + * by vm_area_add_early()/vm_area_register_early() during the early boot process, + * or you can set this flag manually using other vmalloc APIs. + */ +#define VM_STATIC 0x0200 /* bits [20..32] reserved for arch specific ioremap internals */ /* diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 8ef8ea1..6bc6c39 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1262,7 +1262,7 @@ void __init vmalloc_init(void) /* Import existing vmlist entries. */ for (tmp = vmlist; tmp; tmp = tmp->next) { va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT); - va->flags = VM_VM_AREA; + va->flags = VM_VM_AREA | VM_STATIC; va->va_start = (unsigned long)tmp->addr; va->va_end = va->va_start + tmp->size; va->vm = tmp; @@ -1480,7 +1480,7 @@ struct vm_struct *remove_vm_area(const void *addr) might_sleep(); va = find_vmap_area((unsigned long)addr); - if (va && va->flags & VM_VM_AREA) { + if (va && likely(!(va->flags & VM_STATIC)) && va->flags & VM_VM_AREA) { struct vm_struct *vm = va->vm; spin_lock(&vmap_area_lock); @@ -1510,7 +1510,7 @@ static void __vunmap(const void *addr, int deallocate_pages) area = remove_vm_area(addr); if (unlikely(!area)) { - WARN(1, KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n", + WARN(1, KERN_ERR "Trying to vfree() nonexistent or static vm area (%p)\n", addr); return; } @@ -2708,6 +2708,9 @@ static int s_show(struct seq_file *m, void *p) if (v->phys_addr) seq_printf(m, " phys=%pa", &v->phys_addr); + if (v->flags & VM_STATIC) + seq_puts(m, " static"); + if (v->flags & VM_IOREMAP) seq_puts(m, " ioremap"); -- 2.7.4
[PATCH v2] mm: add VM_STATIC flag to vmalloc and prevent from removing the areas
vm_area_add_early/vm_area_register_early() are used to reserve vmalloc area during boot process and those virtually mapped areas are never unmapped. So `OR` VM_STATIC flag to the areas in vmalloc_init() when importing existing vmlist entries and prevent those areas from being removed from the rbtree by accident. This flags can be also used by other vmalloc APIs to specify that the area will never go away. This makes remove_vm_area() more robust against other kind of errors (eg. programming errors). Signed-off-by: Hoeun Ryu --- v2: - update changelog - add description to VM_STATIC - check VM_STATIC first before VM_VM_AREA in remove_vm_area() include/linux/vmalloc.h | 7 +++ mm/vmalloc.c| 9 ++--- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 46991ad..a42c210 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -19,6 +19,13 @@ struct notifier_block; /* in notifier.h */ #define VM_UNINITIALIZED 0x0020 /* vm_struct is not fully initialized */ #define VM_NO_GUARD0x0040 /* don't add guard page */ #define VM_KASAN 0x0080 /* has allocated kasan shadow memory */ +/* + * static area, vmap_area will never go away. + * This flag is OR-ed automatically in vmalloc_init() if the area is inserted + * by vm_area_add_early()/vm_area_register_early() during early boot process + * or you can give this flag manually using other vmalloc APIs. + */ +#define VM_STATIC 0x0200 /* bits [20..32] reserved for arch specific ioremap internals */ /* diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 8ef8ea1..6bc6c39 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1262,7 +1262,7 @@ void __init vmalloc_init(void) /* Import existing vmlist entries. 
*/ for (tmp = vmlist; tmp; tmp = tmp->next) { va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT); - va->flags = VM_VM_AREA; + va->flags = VM_VM_AREA | VM_STATIC; va->va_start = (unsigned long)tmp->addr; va->va_end = va->va_start + tmp->size; va->vm = tmp; @@ -1480,7 +1480,7 @@ struct vm_struct *remove_vm_area(const void *addr) might_sleep(); va = find_vmap_area((unsigned long)addr); - if (va && va->flags & VM_VM_AREA) { + if (va && likely(!(va->flags & VM_STATIC)) && va->flags & VM_VM_AREA) { struct vm_struct *vm = va->vm; spin_lock(_area_lock); @@ -1510,7 +1510,7 @@ static void __vunmap(const void *addr, int deallocate_pages) area = remove_vm_area(addr); if (unlikely(!area)) { - WARN(1, KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n", + WARN(1, KERN_ERR "Trying to vfree() nonexistent or static vm area (%p)\n", addr); return; } @@ -2708,6 +2708,9 @@ static int s_show(struct seq_file *m, void *p) if (v->phys_addr) seq_printf(m, " phys=%pa", >phys_addr); + if (v->flags & VM_STATIC) + seq_puts(m, " static"); + if (v->flags & VM_IOREMAP) seq_puts(m, " ioremap"); -- 2.7.4
[PATCH v2] mm: add VM_STATIC flag to vmalloc and prevent from removing the areas
vm_area_add_early()/vm_area_register_early() are used to reserve vmalloc
areas during the boot process, and those virtually mapped areas are never
unmapped. So OR the VM_STATIC flag into the areas in vmalloc_init() when
importing existing vmlist entries, preventing those areas from being
removed from the rbtree by accident. The flag can also be given by other
vmalloc APIs to specify that an area will never go away. This makes
remove_vm_area() more robust against other kinds of errors (e.g.
programming errors).

Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
---
v2:
 - update changelog
 - add description to VM_STATIC
 - check VM_STATIC first before VM_VM_AREA in remove_vm_area()

 include/linux/vmalloc.h | 7 +++++++
 mm/vmalloc.c            | 9 ++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 46991ad..a42c210 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -19,6 +19,13 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x0020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x0040	/* don't add guard page */
 #define VM_KASAN		0x0080	/* has allocated kasan shadow memory */
+/*
+ * Static area: the vmap_area will never go away.
+ * This flag is OR-ed automatically in vmalloc_init() if the area is inserted
+ * by vm_area_add_early()/vm_area_register_early() during the early boot
+ * process, or it can be given manually using other vmalloc APIs.
+ */
+#define VM_STATIC		0x0200
 /* bits [20..32] reserved for arch specific ioremap internals */

 /*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 8ef8ea1..6bc6c39 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1262,7 +1262,7 @@ void __init vmalloc_init(void)
 	/* Import existing vmlist entries. */
 	for (tmp = vmlist; tmp; tmp = tmp->next) {
 		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
-		va->flags = VM_VM_AREA;
+		va->flags = VM_VM_AREA | VM_STATIC;
 		va->va_start = (unsigned long)tmp->addr;
 		va->va_end = va->va_start + tmp->size;
 		va->vm = tmp;
@@ -1480,7 +1480,7 @@ struct vm_struct *remove_vm_area(const void *addr)
 	might_sleep();

 	va = find_vmap_area((unsigned long)addr);
-	if (va && va->flags & VM_VM_AREA) {
+	if (va && likely(!(va->flags & VM_STATIC)) && va->flags & VM_VM_AREA) {
 		struct vm_struct *vm = va->vm;

 		spin_lock(&vmap_area_lock);
@@ -1510,7 +1510,7 @@ static void __vunmap(const void *addr, int deallocate_pages)
 	area = remove_vm_area(addr);
 	if (unlikely(!area)) {
-		WARN(1, KERN_ERR "Trying to vfree() nonexistent vm area (%p)\n",
+		WARN(1, KERN_ERR "Trying to vfree() nonexistent or static vm area (%p)\n",
 			addr);
 		return;
 	}
@@ -2708,6 +2708,9 @@ static int s_show(struct seq_file *m, void *p)
 	if (v->phys_addr)
 		seq_printf(m, " phys=%pa", &v->phys_addr);

+	if (v->flags & VM_STATIC)
+		seq_puts(m, " static");
+
 	if (v->flags & VM_IOREMAP)
 		seq_puts(m, " ioremap");
--
2.7.4
Re: [PATCH] mm: add VM_STATIC flag to vmalloc and prevent from removing the areas
> On Apr 13, 2017, at 1:17 PM, Anshuman Khandual <khand...@linux.vnet.ibm.com> wrote:
>
>> On 04/12/2017 10:31 AM, Hoeun Ryu wrote:
>> vm_area_add_early/vm_area_register_early() are used to reserve vmalloc area
>> during boot process and those virtually mapped areas are never unmapped.
>> So `OR` VM_STATIC flag to the areas in vmalloc_init() when importing
>> existing vmlist entries and prevent those areas from being removed from the
>> rbtree by accident.
>
> I am wondering whether protection against accidental deletion
> of any vmap area should be done in remove_vm_area() function
> or the callers should take care of it. But I guess either way
> it works.
>
>>
>> Signed-off-by: Hoeun Ryu <hoeun@gmail.com>
>> ---
>> include/linux/vmalloc.h | 1 +
>> mm/vmalloc.c            | 9 ++++++---
>> 2 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>> index 46991ad..3df53fc 100644
>> --- a/include/linux/vmalloc.h
>> +++ b/include/linux/vmalloc.h
>> @@ -19,6 +19,7 @@ struct notifier_block;	/* in notifier.h */
>> #define VM_UNINITIALIZED	0x0020	/* vm_struct is not fully initialized */
>> #define VM_NO_GUARD		0x0040	/* don't add guard page */
>> #define VM_KASAN		0x0080	/* has allocated kasan shadow memory */
>> +#define VM_STATIC		0x0200
>
> You might want to add some description in the comment saying
> its a sticky VM area which will never go away or something.
>

OK. I will add some description.

>> /* bits [20..32] reserved for arch specific ioremap internals */
>>
>> /*
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 8ef8ea1..fb5049a 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -1262,7 +1262,7 @@ void __init vmalloc_init(void)
>> 	/* Import existing vmlist entries. */
>> 	for (tmp = vmlist; tmp; tmp = tmp->next) {
>> 		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
>> -		va->flags = VM_VM_AREA;
>> +		va->flags = VM_VM_AREA | VM_STATIC;
>> 		va->va_start = (unsigned long)tmp->addr;
>> 		va->va_end = va->va_start + tmp->size;
>> 		va->vm = tmp;
>> @@ -1480,7 +1480,7 @@ struct vm_struct *remove_vm_area(const void *addr)
>> 	might_sleep();
>>
>> 	va = find_vmap_area((unsigned long)addr);
>> -	if (va && va->flags & VM_VM_AREA) {
>> +	if (va && va->flags & VM_VM_AREA && likely(!(va->flags & VM_STATIC))) {
>
> You might want to move the VM_STATIC check before the VM_VM_AREA
> check so in cases where the former is set we can save one more
> conditional check.
>

OK, I'll fix this in the next version. Thank you for the review.
Re: [PATCH] mm: add VM_STATIC flag to vmalloc and prevent from removing the areas
> On Apr 12, 2017, at 3:02 PM, Christoph Hellwig <h...@infradead.org> wrote:
>
>> On Wed, Apr 12, 2017 at 02:01:59PM +0900, Hoeun Ryu wrote:
>> vm_area_add_early/vm_area_register_early() are used to reserve vmalloc area
>> during boot process and those virtually mapped areas are never unmapped.
>> So `OR` VM_STATIC flag to the areas in vmalloc_init() when importing
>> existing vmlist entries and prevent those areas from being removed from the
>> rbtree by accident.
>
> How would they be removed "by accident"?

I don't have actual use cases in mind; I just want to make the code robust
against things like programming errors.